
arXiv:cs/0608095v5 [cs.IT] 4 Feb 2009

Stationary Algorithmic Probability

Markus Müller

M. Müller is with the Institute of Mathematics, Technical University of Berlin (e-mail: [email protected]).

Abstract—Kolmogorov complexity and algorithmic probability are defined only up to an additive resp. multiplicative constant, since their actual values depend on the choice of the universal reference computer. In this paper, we analyze a natural approach to eliminate this machine-dependence.

Our method is to assign algorithmic probabilities to the different computers themselves, based on the idea that "unnatural" computers should be hard to emulate. Therefore, we study the Markov process of universal computers randomly emulating each other. The corresponding stationary distribution, if it existed, would give a natural and machine-independent probability measure on the computers, and also on the binary strings.

Unfortunately, we show that no stationary distribution exists on the set of all computers; thus, this method cannot eliminate machine-dependence. Moreover, we show that the reason for failure has a clear and interesting physical interpretation, suggesting that every other conceivable attempt to get rid of those additive constants must fail in principle, too.

However, we show that restricting to some subclass of computers might help to get rid of some amount of machine-dependence in some situations, and the resulting stationary computer and string probabilities have beautiful properties.

Index Terms—Algorithmic Probability, Kolmogorov Complexity, Markov Chain, Emulation, Emulation Complexity

I. INTRODUCTION AND MAIN RESULTS

SINCE algorithmic probability has first been studied in the 1960s by Solomonoff, Levin, Chaitin and others (cf. [1], [2], [3]), it has revealed a variety of interesting properties, including applications in computer science, inductive inference and statistical mechanics (cf. [4], [5], [6]). The algorithmic probability of a binary string s is defined as the probability that a universal prefix computer U outputs s on random input, i.e.

    P_U(s) := \sum_{x ∈ {0,1}^*: U(x)=s} 2^{-|x|},   (1)

where |x| denotes the length of a binary string x. It follows from the Kraft inequality that

    \sum_{s ∈ {0,1}^*} P_U(s) =: Ω_U < 1,

where Ω_U is Chaitin's famous halting probability. So algorithmic probability is a subnormalized probability distribution or semimeasure on the binary strings. It is closely related to prefix Kolmogorov complexity K_U(s), which is defined [4] as the length of the shortest computer program that outputs s:

    K_U(s) := min {|x| : U(x) = s}.

The relation between the two can be written as

    K_U(s) = -log P_U(s) + O(1),   (2)

where the O(1)-term denotes equality up to an additive constant.

Both Kolmogorov complexity and algorithmic probability depend on the choice of the universal reference computer U. However, they do not depend on U "too much": If U and V are both universal prefix computers, then it follows from the fact that one can emulate the other that the complexities K_U and K_V differ from each other only up to an additive constant, and Equation (2) shows that the corresponding algorithmic probabilities differ only up to a multiplicative constant.

This kind of "weak" machine independence is good enough for many applications: if the strings are long enough, then the fixed additive constant does not matter too much. However, there are many occasions where it would be desirable to get rid of the arbitrariness which comes from the choice of the universal reference computer. Examples are Artificial Intelligence [6] and physics [7], where one often deals with finite and short binary strings.

We start with a simple example, to show that the machine-dependence of algorithmic probability can be drastic, and to illustrate the main idea of our approach. Suppose that U_nice is a "natural" universal prefix computer, given by a Turing machine model that we might judge as "simple". Now choose an arbitrary string s which is, say, attained by a million tosses of a fair coin; say, s consists of a million random bits. With high probability, there is no short program for s (otherwise toss the coin again and use a different string s). We thus expect that

    K_{U_nice}(s) ≈ 1.000.000 and P_{U_nice}(s) ≈ 2^{-1.000.000}.

Now we define another prefix computer U_bad which computes

    U_bad(x) := { s           if x = 0,
                  U_nice(y)   if x = 1 ⊗ y,
                  undefined   if x = λ.

The computer U_bad is universal, since it emulates the universal computer U_nice if we just prepend a "1" to the input. Since U_bad(0) = s, we have

    K_{U_bad}(s) = 1 and P_{U_bad}(s) ≥ 2^{-1}.

The computer U_bad seems quite unnatural, but in algorithmic information theory, all the universal computers are created equal — there is no obvious way to distinguish between them and to say which one of them is a "better choice" than the other. Hence the algorithmic probability of s depends drastically on the choice of the universal computer U.

So what is "the" algorithmic probability of the single string s? It seems clear that 2^{-1.000.000} is a better answer than 2^{-1}, but the question is how we can make mathematical sense of this statement. How can we give a sound formal meaning to the statement that U_nice is more "natural" than U_bad? A possible answer is that in the process of randomly constructing a computer from scratch, one is very unlikely to end up with U_bad, while there is some larger probability to encounter U_nice.

This suggests that we might hope to find some natural probability distribution µ on the universal computers, in such a way that µ(U_bad) ≪ µ(U_nice). Then we could define the "machine-independent" algorithmic probability P(s) of some string s as the weighted average of all algorithmic probabilities P_U(s),

    P(s) := \sum_{U universal} µ(U) P_U(s).   (3)

Clearly, both ideas below are equivalent if the probabilities P_U(V) are the transition probabilities of the aforementioned Markov process.

But how can we find such a probability distribution µ on the computers? The key idea here is to compare the capabilities of the two computers to emulate each other. Namely, by comparing U_nice and U_bad, one observes that

• it is very "easy" for the computer U_bad to emulate the computer U_nice: just prepend a "1" to the input. On the other hand,
• it is very "difficult" for the computer U_nice to emulate U_bad: to do the simulation, we have to supply U_nice with the long string s as additional data.

[Diagram: U_nice reaches U_bad only via the long string x ("difficult"), while U_bad reaches U_nice via a single bit ("easy").]
Fig. 1. Computers that emulate each other.

The idea is that this observation holds true more generally: "Unnatural" computers are harder to emulate. There are two obvious approaches to construct some computer probability µ from this observation — interestingly, both turn out to be equivalent:

• The situation in Figure 1 looks like the graph of some Markov process. If one starts with either one of the two computers depicted there and interprets the line widths as transition probabilities, then in the long run of more and more moves, one tends to have larger probability to end up at U_nice than at U_bad. So let's apply this idea more generally and define a Markov process of all the universal computers, randomly emulating each other. If the process has a stationary distribution (e.g. if it is positive recurrent), this is a good candidate for computer probability.

• Similarly as in Equation (1), there should be a simple way to define probabilities P_U(V) for computers U and V, that is, the probability that U emulates V on random input. Then, whatever the desired computer probability µ looks like, to make any sense, it should satisfy

    µ(U) = \sum_{V universal} µ(V) P_V(U).

But if we enumerate all universal computers as {U_1, U_2, U_3, ...}, this equation can be written as

    ( µ(U_1) )   ( P_{U_1}(U_1)  P_{U_2}(U_1)  ... )   ( µ(U_1) )
    ( µ(U_2) ) = ( P_{U_1}(U_2)  P_{U_2}(U_2)  ... ) · ( µ(U_2) )
    ( µ(U_3) )   ( P_{U_1}(U_3)  P_{U_2}(U_3)  ... )   ( µ(U_3) )
    (  ...   )   (     ...           ...       ... )   (  ...   )

Thus, we should look for the unknown stationary probability eigenvector µ of the "emulation matrix" (P_{U_i}(U_j))_{i,j}. Guided by Equation (2), we could then define "machine-independent Kolmogorov complexity" via

    K(s) := -log P(s).
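To make the eigenvector picture concrete, here is a minimal numerical sketch (our illustration, not part of the paper): it runs power iteration on a small, made-up stochastic matrix standing in for a finite truncation of the emulation matrix. The size, the entries and all names are assumptions only — the true emulation matrix is infinite and its entries are noncomputable.

    # Illustrative sketch only: a tiny, hypothetical 3x3 "emulation matrix".
    # Row i holds the probabilities that computer i emulates computer j in
    # one step; rows sum to 1.
    def power_iteration(E, steps=1000):
        """Approximate the stationary row vector mu with mu * E = mu."""
        n = len(E)
        mu = [1.0 / n] * n  # start from the uniform distribution
        for _ in range(steps):
            mu = [sum(mu[i] * E[i][j] for i in range(n)) for j in range(n)]
        return mu

    # Assumed transition probabilities: computer 0 ("nice") is easy to
    # emulate from everywhere, computer 2 ("bad") is hard to reach.
    E = [[0.5, 0.4, 0.1],
         [0.7, 0.2, 0.1],
         [0.8, 0.1, 0.1]]

    print(power_iteration(E))  # computer 0 carries most stationary weight

For a finite, irreducible, aperiodic matrix this iteration always converges; the substance of the paper is precisely that for the infinite matrix of all universal computers, no such stationary vector exists.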
Now we give a synopsis of the paper and explain our main results:

• Section II contains some notational preliminaries, and defines the output frequency of a string as the frequency that this string is output by a computer. For prefix computers, this notion equals algorithmic probability (Example 2.2).

• In Section III, we define the emulation Markov process that we have motivated above, and analyze if it has a stationary distribution or not. Here is the construction for the most important case (the case of the full set of computers) in a nutshell: we say that a computer C emulates computer D via the string x, and write C --x--> D and D = (C --x-->), if C(x ⊗ y) = D(y) for all strings y. A computer is universal if it emulates every other computer. Given a universal computer C, at least one of the two computers (C --0-->) and (C --1-->) must be universal, too. Thus, we can consider the universal computers as the vertices of a graph, with directed edges going from U to V if U --0--> V or U --1--> V. Every vertex (universal computer) has either one or two outgoing edges (corresponding to the two bits). The random walk on this connected graph defines a Markov process: we start at some computer, follow the outgoing edges, and if there are two edges, we follow each of them with probability 1/2. This is schematically depicted in Figure 2.

If this process had a stationary distribution, this would be a good candidate for a natural algorithmic probability measure on the universal computers. Unfortunately, no stationary distribution exists: this Markov process is transient. We prove this in Theorem 3.13. The idea is to construct a sequence of universal computers M_1, M_2, M_3, ... such that M_i emulates M_{i+1} with high probability — in fact, with probability tending to 1 fast as i gets large. The corresponding part of the emulation Markov process is depicted in Figure 3. The outgoing edges in the upwards direction lead back to a fixed universal reference computer, which ensures that every computer M_i is universal.

[Diagram: universal computers U_1, ..., U_6 as vertices of a graph, with directed edges labeled by the input bits 0 and 1.]
Fig. 2. Schematic diagram of the emulation Markov process. Note that the zeroes and ones represent input bits, not transition probabilities; for example, U_1(1 ⊗ y) = U_3(y) for every string y. In our notation, we have for example U_1 --110--> U_6.

As our Markov process has only transition probabilities 1/2 and 1, the edges going from M_i to M_{i+1} in fact consist of several transitions (edges). As those transition probabilities are constructed to tend to 1 very fast, the probability to stay on this M_i-path forever (and not return to any other computer) is positive, which forces the process to be transient.

[Diagram: chain M_1 → M_2 → M_3 → M_4 → ..., with transition probabilities 1/2, 3/4, 7/8, 15/16, ... along the chain and 1/2, 1/4, 1/8, 1/16, ... leaving it.]
Fig. 3. This construction in Theorem 3.13 proves that the emulation Markov chain is transient, and no stationary distribution exists (the numbers are transition probabilities). It is comparable to a "computer virus" in the sense that each M_i emulates "many" slightly modified copies of itself.

Yet, it is still possible to construct analogous Markov processes for restricted sets of computers Φ. Some of those sets yield processes which have stationary distributions; a non-trivial example is given in Example 3.14.

• For those computer sets Φ with positive recurrent emulation process, the corresponding computer probability has nice properties that we study in Section IV. The computer probability induces in a natural way a probability distribution on the strings s ∈ {0,1}^* (Definition 4.1) as the probability that the random walk described above encounters some output which equals s. This probability is computer-independent and can be written in several equivalent ways (Theorem 4.2).

• A symmetry property of computer probability yields another simple and interesting proof why for the set of all computers — and for many other natural computer sets — the corresponding Markov process cannot be positive recurrent (Theorem 4.7). In short, if σ is a computable permutation, then a computer C and the output permuted computer σ ∘ C must have the same probability as long as both are in the computer set Φ (Theorem 4.6). If there are infinitely many of them, they all must have probability zero, which contradicts positive recurrence.

• For the same reason, there cannot be one particular "natural" choice of a computer set Φ with positive recurrent Markov process, because σ ∘ Φ is always another good (positive recurrent) candidate, too (Theorem 4.8).

• This has a nice physical interpretation which we explain in Section V: algorithmic probability and Kolmogorov complexity always contain at least the ambiguity which is given by permuting the output strings. This permutation can be interpreted as "renaming" the objects that the strings are describing.

We argue that this kind of ambiguity will be present in any attempt to eliminate machine-dependence from algorithmic probability or complexity, even if it is different from the approach in this paper. This conclusion can be seen as the main result of this work.

Finally, we show in the appendix that the string probability that we have constructed equals, under certain conditions, the weighted average of output frequency — this is a particularly unexpected and beautiful result (Theorem A.6) which needs some technical steps to be proved. The main tool is the study of input transformations, i.e., to permute the strings before the computation. The appendix is the technically most difficult part of this paper and can be skipped on first reading.

II. PRELIMINARIES AND OUTPUT FREQUENCY

We start by fixing some notation. In this paper, we only consider finite, binary strings, which we denote by

    {0,1}^* := \bigcup_{n=0}^∞ {0,1}^n = {λ, 0, 1, 00, 01, ...}.

The symbol λ denotes the empty string, and we write the length of a string s ∈ {0,1}^* as |s|, while the cardinality of a set S is denoted #S. To avoid confusion with the composition of mappings, we denote the concatenation of strings with the symbol ⊗, e.g.

    101 ⊗ 001 = 101001.

In particular, we have |λ| = 0 and |x ⊗ y| = |x| + |y|.
A computer C is a partial-recursive function C : {0,1}^* → {0,1}^*, and we denote the set of all computers by Ξ. Note that our computers do not necessarily have to have prefix-free domain (unless otherwise stated). If C ∈ Ξ does not halt on some input x ∈ {0,1}^*, then we write C(x) = ∞ as an abbreviation for the fact that C(x) is undefined. Thus, we can also interpret computers C as mappings from {0,1}^* to \overline{{0,1}^*}, where

    \overline{{0,1}^*} := {0,1}^* ∪ {∞}.

As usual, we denote by K_C(x) the Kolmogorov complexity of the string x ∈ {0,1}^* with respect to the computer C ∈ Ξ,

    K_C(x) := min {|s| : s ∈ {0,1}^*, C(s) = x},

or as ∞ if this set is empty.
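For toy computers that halt on every input, K_C can be approximated by brute-force search over inputs in order of increasing length. The following sketch is our illustration only — the cutoff max_len and the example machine are arbitrary assumptions, and the true K_C is of course uncomputable in general:

    from itertools import product

    def kolmogorov_complexity(C, x, max_len=12):
        """Brute-force K_C(x): length of the shortest s with C(s) = x,
        searching only strings up to length max_len."""
        for n in range(max_len + 1):
            for bits in product('01', repeat=n):
                s = ''.join(bits)
                if C(s) == x:
                    return n
        return float('inf')  # stands in for "infinity if the set is empty"

    # Hypothetical toy computer: doubles its input.
    C = lambda s: s + s
    print(kolmogorov_complexity(C, '0101'))  # 2, via the program s = '01'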

What would be a first, naive try to define algorithmic probability? Since we do not restrict our approach to prefix computers, we cannot take Equation (1) as a definition. Instead we may try to count how often a string is produced by the computer as output:

Definition 2.1 (Output Frequency): For every C ∈ Ξ, n ∈ N_0 and s ∈ {0,1}^*, we set

    µ_C^{(n)}(s) := #{x ∈ {0,1}^n | C(x) = s} / 2^n.

For later use in Section III, we also define for every C, D ∈ Ξ and n ∈ N_0

    µ_C^{(n)}(D) := #{x ∈ {0,1}^n | C --x--> D} / 2^n,

where the expression C --x--> D is given in Definition 3.1. Our final definition of algorithmic probability will look very different, but it will surprisingly turn out to be closely related to this output frequency notion.

The existence of the limit lim_{n→∞} µ_C^{(n)}(s) depends on the computer C and may be hard to decide, but in the special case of prefix computers, the limit exists and agrees with the classical notion of algorithmic probability as given in Equation (1):

Example 2.2 (Prefix Computers): A computer C ∈ Ξ is called prefix if the following holds:

    C(x) ≠ ∞ ⟹ C(x ⊗ y) = ∞ for every y ≠ λ.

This means that if C halts on some input x ∈ {0,1}^*, it must not halt on any extension x ⊗ y. Such computers are traditionally studied in algorithmic information theory. To fit our approach, we need to modify the definition slightly. Call a computer C_p ∈ Ξ prefix-constant if the following holds true:

    C_p(x) ≠ ∞ ⟹ C_p(x ⊗ y) = C_p(x) for every y ∈ {0,1}^*.

It is easy to see that for every prefix computer C, one can find a prefix-constant computer C_p with C_p(x) = C(x) whenever C(x) ≠ ∞. It is constructed in the following way: Suppose x ∈ {0,1}^* is given as input into C_p, then it

• computes the set of all prefixes {x_i}_{i=0}^{|x|} of x (e.g. for x = 100 we have x_0 = λ, x_1 = 1, x_2 = 10 and x_3 = 100),
• starts |x| + 1 simulations of C at the same time, which are supplied with x_0 up to x_{|x|} as input,
• waits until one of the simulations produces an output s ∈ {0,1}^* (if this never happens, C_p will loop forever),
• finally outputs s.

Fix an arbitrary string s ∈ {0,1}^*. Consider the set

    T^{(n)}(s) := {x ∈ {0,1}^* | |x| ≤ n, C(x) = s}.

Every string x ∈ T^{(n)}(s) can be extended (by concatenation) to a string x′ of length n. By construction, it follows that C_p(x′) = s. There are 2^{n-|x|} possible extensions x′, thus

    µ_{C_p}^{(n)}(s) = ( \sum_{x ∈ T^{(n)}(s)} 2^{n-|x|} ) / 2^n = \sum_{x ∈ {0,1}^*: |x| ≤ n, C(x)=s} 2^{-|x|}.

It follows that the limit µ_{C_p}(s) := lim_{n→∞} µ_{C_p}^{(n)}(s) exists, and it holds

    µ_{C_p}(s) = \sum_{x ∈ {0,1}^*: C(x)=s} 2^{-|x|},

so the output frequency as given in Definition 2.1 converges for n → ∞ to the classical algorithmic probability as given in Equation (1). Note that Ω_C = 1 - µ_{C_p}(∞).

It is easy to construct examples of computers which are not prefix, but which have an output frequency which either converges, or at least does not tend to zero as n → ∞. Thus, the notion of output frequency generalizes the idea of algorithmic probability to a larger class of computers.
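For toy total computers, the output frequency of Definition 2.1 can be evaluated literally, by running the machine on all 2^n inputs of length n. A small sketch (our illustration; the example machine is hypothetical, and real partial-recursive computers would additionally need a halting cutoff):

    from itertools import product

    def output_frequency(C, s, n):
        """mu_C^(n)(s): fraction of inputs x of length n with C(x) = s."""
        inputs = (''.join(bits) for bits in product('01', repeat=n))
        return sum(1 for x in inputs if C(x) == s) / 2.0 ** n

    # Hypothetical toy computer: outputs everything after the first '1'.
    def C(x):
        i = x.find('1')
        return '' if i < 0 else x[i + 1:]

    print(output_frequency(C, '0', 4))  # 1/16: only the input '0010' works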

III. STATIONARY COMPUTER PROBABILITY

As explained in the introduction, it will be an essential part of this work to analyze in detail how "easily" one computer C emulates another computer D. Our first definition specializes what we mean by "emulation":

Definition 3.1 (Emulation): A computer C ∈ Ξ emulates the computer D ∈ Ξ via x ∈ {0,1}^*, denoted

    C --x--> D resp. D = (C --x-->),

if C(x ⊗ s) = D(s) for every s ∈ {0,1}^*. We write C --> D if there is some x ∈ {0,1}^* such that C --x--> D.

It follows easily from the definition that C --λ--> C and

    C --x--> D and D --y--> E ⟹ C --x⊗y--> E.

Now that we have defined emulation, it is easy to extend the notion of Kolmogorov complexity to emulation complexity:

Definition 3.2 (Emulation Complexity): For every C, D ∈ Ξ, the emulation complexity K_C(D) is defined as

    K_C(D) := min {|s| : s ∈ {0,1}^*, C --s--> D},   (4)

or as ∞ if the corresponding set is empty.

Note that similar definitions have already appeared in the literature, see for example Def. 4.4 and Def. 4.5 in [8], or the definition of the constant "sim(C)" in [9].

Definition 3.3 (Universal Computer): Let Φ ⊂ Ξ be a set of computers. If there exists a computer U ∈ Φ such that U --> X for every X ∈ Φ, then Φ is called connected, and U is called a Φ-universal computer. We use the notation Φ^U := {C ∈ Φ | C is Φ-universal}, and we write Φ̄^U := {C ∈ Ξ | C --> D ∀D ∈ Φ and ∃X ∈ Φ: X --> C}. Note that Φ^U ⊂ Φ̄^U and Φ^U = ∅ ⇔ Φ̄^U = ∅.

Examples of connected sets of computers include the set Ξ of all computers and the set of prefix-constant computers, whereas the set of computers which always halt on every input cannot be connected, as is easily seen by diagonalization. For convenience, we give a short proof of the first statement:

Proposition 3.4: The set of all computers Ξ is connected.

Proof. It is well-known that there is a computer U that takes a description d_M ∈ {0,1}^* of any computer M ∈ Ξ together with some input x ∈ {0,1}^*, and simulates M on input x, i.e.

    U(⟨d_M, x⟩) = M(x) for every x ∈ {0,1}^*,

where ⟨·,·⟩ : {0,1}^* × {0,1}^* → {0,1}^* is a bijective and computable encoding of two strings into one. We can construct the encoding in such a way that ⟨d_M, x⟩ = d̃_M ⊗ x, i.e. the description is encoded into some prefix code d̃_M that is appended to the left-hand side of x. It follows that U --d̃_M--> M, and since this works for every M ∈ Ξ, U is Ξ-universal. □

Here is a basic property of Kolmogorov and emulation complexity:

Theorem 3.5 (Invariance of Complexities): Let Φ ⊂ Ξ be connected, then for every U ∈ Φ^U and V ∈ Φ, it holds that

    K_U(D) ≤ K_U(V) + K_V(D) for every D ∈ Φ,
    K_U(s) ≤ K_U(V) + K_V(s) for every s ∈ {0,1}^*.

Proof. Since U ∈ Φ^U, it holds U --> V. Let x be a shortest string such that U(x ⊗ t) = V(t) for every t ∈ {0,1}^*, i.e. |x| = K_U(V). If p_D resp. p_s are shortest strings such that V --p_D--> D resp. V(p_s) = s, then |p_D| = K_V(D) and |p_s| = K_V(s), and additionally U --x⊗p_D--> D and U(x ⊗ p_s) = s. Thus, K_U(D) ≤ |x ⊗ p_D| = |x| + |p_D| and K_U(s) ≤ |x ⊗ p_s| = |x| + |p_s|. □
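The emulation relation of Definition 3.1 quantifies over all inputs and is undecidable in general, but a candidate emulation string can at least be falsified on all inputs up to a bounded length. The following sketch (ours; the bound and the toy machines are assumptions) performs this finite, necessary-but-not-sufficient test:

    from itertools import product

    def emulates_up_to(C, D, x, max_len=6):
        """Check C(x ⊗ s) == D(s) for all s with |s| <= max_len: a finite,
        necessary test of 'C emulates D via x' (Definition 3.1)."""
        for n in range(max_len + 1):
            for bits in product('01', repeat=n):
                s = ''.join(bits)
                if C(x + s) != D(s):
                    return False
        return True

    # Hypothetical machines: C strips one leading bit, D is the identity.
    C = lambda x: x[1:] if x else ''
    D = lambda s: s
    print(emulates_up_to(C, D, '0'))  # True: C('0' ⊗ s) = s = D(s)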

[Diagram: the binary tree of all inputs λ, 0, 1, 00, 01, 10, 11, 000, ..., 111; C sits at the root λ, with D = (C --1-->) and E = (C --10-->) marked.]
Fig. 4. Emulation as a tree, with a branching subset Φ (bold lines).

Suppose some computer C ∈ Ξ emulates another computer E via the string 10, i.e. C --10--> E. We can decompose this into two steps: Let D := (C --1-->), then

    C --1--> D and D --0--> E.

Similarly, we can decompose every emulation C --x--> D into |x| parts, just by parsing the string x bit by bit, while getting a corresponding "chain" of emulated computers. A clear way to illustrate this situation is in the form of a tree, as shown in Figure 4. We start at the root λ. Since C --λ--> C, this string corresponds to the computer C itself. Then, we are free to choose 0 or 1, yielding the computer (C --0-->) or (C --1-->) = D respectively. Ending up with D, we can choose the next bit (taking a 0 we will end up with E = (D --0-->) = (C --10-->)) and so on.

In general, some of the emulated computers will themselves be elements of Φ and some not. As in Figure 4, we can mark every path that leads to a computer that is itself an element of Φ by a thick line. (In this case, for example C, D, E ∈ Φ, but (C --11-->) ∉ Φ.) If we want to restrict the process of parsing through the tree to the marked (thick) paths, then we need the following property:

Definition 3.6: A set of computers Φ ⊂ Ξ is called branching, if for every C ∈ Φ, the following two conditions are satisfied:

• For every x, y ∈ {0,1}^*, it holds (C --x⊗y-->) ∈ Φ ⟹ (C --x-->) ∈ Φ.
• There is some x ∈ {0,1}^* \ {λ} such that (C --x-->) ∈ Φ.

If Φ is branching, we can parse through the corresponding marked subtree without encountering any dead end, with the possibility to reach every leaf of the subtree. In particular, these requirements are fulfilled by sets of universal computers:

Proposition 3.7: Let Φ ⊂ Ξ be connected and #Φ^U ≥ 2, then Φ̄^U is branching.

Proof. Let C ∈ Φ̄^U and (C --x⊗y-->) ∈ Φ̄^U, then (C --x⊗y-->) --> D for every D ∈ Φ, and so (C --x-->) --> D for every D ∈ Φ. Moreover, there is some X ∈ Φ such that X --> C, so in particular, X --> (C --x-->). Thus, (C --x-->) ∈ Φ̄^U. On the other hand, since #Φ^U ≥ 2, there are computers C, D ∈ Φ^U such that C ≠ D. By definition of Φ^U, there is some X ∈ Φ such that X --> D. Since C emulates every computer in Φ, we have C --> X, so C --z--> D for some z ≠ λ. □

As illustrated in the bold subtree in Figure 4, we can define the process of a random walk on this subtree if its corresponding computer subset Φ is branching: we start at the root λ, follow the branches, and at every bifurcation, we turn "left or right" (i.e. input an additional 0 or 1) with probability 1/2. This random walk generates a probability distribution on the subtree:

Definition 3.8 (Path and Computer Probability): Let Φ ⊂ Ξ be branching and let C ∈ Φ. We define the Φ-tree of C as the set of all inputs x ∈ {0,1}^* that make C emulate a computer in Φ and denote it by C^{-1}(Φ), i.e.

    C^{-1}(Φ) := {x ∈ {0,1}^* | (C --x-->) ∈ Φ}.

To every x in the Φ-tree of C, we can associate its path probability µ_{C^{-1}(Φ)}(x) as the probability of arriving at x on a random walk on this tree. Formally,

    µ_{C^{-1}(Φ)}(λ) := 1,
    µ_{C^{-1}(Φ)}(x ⊗ b) := { (1/2) µ_{C^{-1}(Φ)}(x)  if x ⊗ b̄ ∈ C^{-1}(Φ),
                              µ_{C^{-1}(Φ)}(x)        otherwise,

for every bit b ∈ {0,1} with x ⊗ b ∈ C^{-1}(Φ), where b̄ denotes the inverse bit. The associated n-step computer probability of D ∈ Φ is defined as the probability of arriving at computer D on a random walk of n steps on this tree, i.e.

    µ_C^{(n)}(D|Φ) := \sum_{x ∈ {0,1}^n: C --x--> D} µ_{C^{-1}(Φ)}(x).
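The path probability of Definition 3.8 is easy to evaluate recursively once membership in the Φ-tree is given as an oracle. The sketch below (ours) reproduces the example values for Figure 4 quoted after this definition, with the bold subtree hard-coded as a finite set — an assumption for illustration only:

    def path_probability(x, in_tree):
        """mu_{C^{-1}(Phi)}(x) from Definition 3.8, given a membership oracle."""
        if x == '':
            return 1.0
        prefix, b = x[:-1], x[-1]
        sibling = prefix + ('1' if b == '0' else '0')
        p = path_probability(prefix, in_tree)
        return p / 2.0 if in_tree(sibling) else p  # halve only at bifurcations

    # Hand-made branching subtree, mimicking the bold lines of Figure 4.
    tree = {'', '0', '1', '00', '01', '10', '001', '010', '100', '101'}
    for x in ['0', '10', '100', '001']:
        print(x, path_probability(x, tree.__contains__))
    # prints 0.5, 0.5, 0.25, 0.25 — matching the values in the text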
It is clear that for Φ = Ξ, we get back the notion of output frequency as given in Definition 2.1: For every C, D ∈ Ξ, it holds

    µ_C^{(n)}(D) = µ_C^{(n)}(D|Ξ).

The condition that Φ shall be branching guarantees that \sum_{x ∈ {0,1}^n ∩ C^{-1}(Φ)} µ_{C^{-1}(Φ)}(x) = 1 for every n ∈ N_0, i.e. the conservation of probability. For example, the path probability in Figure 4 has values µ_{C^{-1}(Φ)}(0) = µ_{C^{-1}(Φ)}(1) = 1/2, µ_{C^{-1}(Φ)}(00) = µ_{C^{-1}(Φ)}(01) = 1/4, µ_{C^{-1}(Φ)}(10) = 1/2, µ_{C^{-1}(Φ)}(001) = µ_{C^{-1}(Φ)}(010) = 1/4 = µ_{C^{-1}(Φ)}(100) = µ_{C^{-1}(Φ)}(101).

It is almost obvious that the random walk on the subtree that generates the computer probabilities µ_C^{(n)}(D|Φ) is a Markov process, which is the statement of the next lemma. For later reference, we first introduce some notation for the corresponding transition matrix:

Definition 3.9 (Emulation Matrix): Let Φ ⊂ Ξ be branching, and enumerate the computers in Φ in arbitrary order: Φ = {C_1, C_2, ...}. Then, we define the (possibly infinite) emulation matrix E_Φ as

    (E_Φ)_{i,j} := µ_{C_i}^{(1)}(C_j|Φ).

Lemma 3.10 (Markovian Emulation Process): If Φ ⊂ Ξ is branching, then the computer probabilities µ_C^{(n)}(·|Φ) are n-step probabilities of some Markov process (which we also denote Φ) whose transition matrix is given by the emulation matrix E_Φ. Explicitly, with δ_i := (0, ..., 0, 1, 0, ...) (the 1 at position i),

    ( µ_{C_i}^{(n)}(C_1|Φ), µ_{C_i}^{(n)}(C_2|Φ), µ_{C_i}^{(n)}(C_3|Φ), ... ) = δ_i · (E_Φ)^n   (5)

for every n ∈ N_0, and we have the Chapman-Kolmogorov equation

    µ_C^{(m+n)}(D|Φ) = \sum_{X ∈ Φ} µ_C^{(m)}(X|Φ) µ_X^{(n)}(D|Φ)   (6)

for every m, n ∈ N_0. Also, Φ ⊂ Ξ (resp. E_Φ) is irreducible if and only if C --> D for every C, D ∈ Φ.

Proof. Equation (5) is trivially true for n = 0 and is shown in full generality by induction (the proof details are not important for the following argumentation and can be skipped):

    µ_{C_i}^{(n+1)}(C_j|Φ) = \sum_{x ∈ {0,1}^n, b ∈ {0,1}: C_i --x⊗b--> C_j} µ_{C^{-1}(Φ)}(x ⊗ b)
        = \sum_{C_k ∈ Φ} \sum_{x ∈ {0,1}^n: C_i --x--> C_k} \sum_{b ∈ {0,1}: C_k --b--> C_j} µ_{C^{-1}(Φ)}(x ⊗ b)
        = \sum_{C_k ∈ Φ} \sum_{x ∈ {0,1}^n: C_i --x--> C_k} µ_{C^{-1}(Φ)}(x) µ_{C_k}^{(1)}(C_j|Φ)
        = \sum_{C_k ∈ Φ} µ_{C_i}^{(n)}(C_k|Φ) (E_Φ)_{k,j}.

The Chapman-Kolmogorov equation follows directly from the theory of Markov processes. The stochastic matrix E_Φ is irreducible iff for every i, j ∈ N there is some n ∈ N such that 0 < ((E_Φ)^n)_{i,j} = µ_{C_i}^{(n)}(C_j|Φ), which is equivalent to the existence of some x ∈ {0,1}^n such that C_i --x--> C_j. □

The next proposition collects some relations between the emulation Markov process and the corresponding set of computers. We assume that the reader is familiar with the basic vocabulary from the theory of Markov chains.

Proposition 3.11 (Irreducibility and Aperiodicity): Let Φ ⊂ Ξ be a set of computers.

• Φ is irreducible ⇔ Φ = Φ^U ⇔ Φ ⊂ Φ̄^U.
• If Φ is connected and #Φ^U ≥ 2, then Φ̄^U is irreducible and branching.
• If Φ is branching, then we can define the period of C ∈ Φ as d(C) := gcd {n ∈ N | µ_C^{(n)}(C|Φ) > 0} (resp. ∞ if this set is empty). If Φ ⊂ Ξ is irreducible, then d(C) = d(D) =: d < ∞ for every C, D ∈ Φ holds true. In this case, d will be called the period of Φ, and if d = 1, then Φ is called aperiodic.

Proof. To prove the first equivalence, suppose that Φ ⊂ Ξ is irreducible, i.e. for every C, D ∈ Φ it holds C --> D. Thus, Φ is connected and C ∈ Φ^U, so Φ ⊂ Φ^U, and since always Φ^U ⊂ Φ, it follows that Φ = Φ^U. On the other hand, if Φ = Φ^U, then for every C, D ∈ Φ it holds C --> D, since C ∈ Φ^U. Thus, Φ is irreducible. For the second equivalence, suppose that Φ is irreducible, thus Φ = Φ^U ⊂ Φ̄^U. If on the other hand Φ ⊂ Φ̄^U, it follows in particular for every C ∈ Φ that C --> X for every X ∈ Φ, so Φ is irreducible.

For the second statement, let C, X ∈ Φ̄^U be arbitrary. By definition of Φ̄^U, it follows that there is some V ∈ Φ such that V --> X, and it holds C --> V, so C --> X, and Φ̄^U is irreducible. By Proposition 3.7 and #Φ̄^U ≥ #Φ^U ≥ 2, Φ̄^U must be branching. The third statement is well-known from the theory of Markov processes. □

A basic general result about Markov processes now gives us the desired absolute computer probability — almost, at least:

Theorem 3.12 (Stationary Alg. Computer Probability): Let Φ ⊂ Ξ be branching, irreducible and aperiodic. Then, for every C, D ∈ Φ, the limit ("computer probability")

    µ(D|Φ) := lim_{n→∞} µ_C^{(n)}(D|Φ)

exists and is independent of C. There are two possible cases:

(1) The Markov process which corresponds to Φ is transient or null recurrent. Then,

    µ(D|Φ) = 0 for every D ∈ Φ.

(2) The Markov process which corresponds to Φ is positive recurrent. Then,

    µ(D|Φ) > 0 for every D ∈ Φ, and \sum_{D ∈ Φ} µ(D|Φ) = 1.

In this case, the vector µ_Φ := (µ(C_1|Φ), µ(C_2|Φ), ...) is the unique stationary probability eigenvector of E_Φ, i.e. the unique probability vector solution to µ_Φ · E_Φ = µ_Φ.

Note that we have derived this result under quite weak conditions — e.g. in contrast to classical algorithmic probability, we do not assume that our computers have prefix-free domain. Nevertheless, we are left with the problem to determine whether a given set Φ of computers is positive recurrent (case (2) given above) or not (case (1)).

The most interesting case is Φ = Ξ^U, i.e. the set of computers that are universal in the sense that they can simulate every other computer without any restriction. This set is "large" — apart from universality, we do not assume any additional property like e.g. being prefix. By Proposition 3.11, Ξ^U is irreducible and branching. Moreover, fix any universal computer U ∈ Ξ^U and consider the computer V ∈ Ξ, given by

    V(x) := { λ     if x = λ,
              V(s)  if x = 0 ⊗ s,
              U(s)  if x = 1 ⊗ s.

As V --1--> U, we know that V ∈ Ξ^U, and since V --0--> V, it follows that µ_V^{(1)}(V) > 0, and so d(V) = 1 = d(Ξ^U). Hence Ξ^U is aperiodic.

So is Ξ^U positive recurrent or not? Unfortunately, the answer turns out to be negative: Ξ^U is transient. The idea to prove this is to construct a sequence of universal computers M_1, M_2, M_3, ... such that each computer M_i emulates the next computer M_{i+1} with large probability, that is, the probability tends to one as i gets large. Thus, starting the random walk on, say, M_1, it will with positive probability stay on this M_i-path forever and never return to any other computer. See also Figure 3 in the Introduction for illustration.

Theorem 3.13 (Markoff Chaney Virus): Ξ^U is transient, i.e. there is no stationary algorithmic computer probability on the universal computers.

Proof. Let U ∈ Ξ^U be an arbitrary universal computer with U(λ) = 0. We define another computer M_1 ∈ Ξ as follows: If some string s ∈ {0,1}^* is supplied as input, then M_1

• splits the string s into parts s_1, s_2, ..., s_k, s_tail, such that s = s_1 ⊗ s_2 ⊗ ... ⊗ s_k ⊗ s_tail and |s_i| = i for every 1 ≤ i ≤ k. We also demand that |s_tail| < k + 1 (for example, if s = 101101101011, then s_1 = 1, s_2 = 01, s_3 = 101, s_4 = 1010 and s_tail = 11),
• tests if there is any i ∈ {1, ..., k} such that s_i = 0^i (i.e. s_i contains only zeros). If yes, then M_1 computes and outputs U(s_{i+1} ⊗ ... ⊗ s_k ⊗ s_tail) (if there are several i with s_i = 0^i, then it shall take the smallest one). If not, then M_1 outputs 1^k = 1...1 (k times).

Let M_2 := (M_1 --1-->), M_3 := (M_1 --1⊗11-->), M_4 := (M_1 --1⊗11⊗111-->) and so on, in general M_n := (M_1 --1^{1+2+...+(n-1)}-->) resp. M_i --1^i--> M_{i+1}. We also have M_i --0^i--> U, so M_i ∈ Ξ^U for every i ∈ N. Thus, the computers M_i are all universal. Also, since M_i(λ) = M_1(1^{1+...+(i-1)}) = 1^{i-1}, the computers M_i are mutually different from each other, i.e. M_i ≠ M_j for i ≠ j.

Now consider the computers (M_i --s-->) for |s| = i, but s ≠ 0^i. It holds M_i(s ⊗ x) = M_1(1 ⊗ 11 ⊗ ... ⊗ 1^{i-1} ⊗ s ⊗ x). The only property of s that affects the outcome of M_1's computation is the property to be different from 0^i. But this property is shared by the string 1^i, i.e. M_1(1 ⊗ 11 ⊗ ... ⊗ 1^{i-1} ⊗ s ⊗ x) = M_1(1 ⊗ 11 ⊗ ... ⊗ 1^{i-1} ⊗ 1^i ⊗ x), resp. M_i(s ⊗ x) = M_i(1^i ⊗ x) for every x ∈ {0,1}^*. Thus, (M_i --s-->) = (M_i --1^i-->) = M_{i+1} for every 0^i ≠ s ∈ {0,1}^i, and so

    µ_{M_i}^{(i)}(M_{i+1}|Ξ^U) = 1 - 2^{-i} for every i ∈ N.

Iterated application of the Chapman-Kolmogorov equation (6) yields for every n ∈ N

    µ_{M_1}^{(1+2+...+(n-1))}(M_n|Ξ^U) ≥ µ_{M_1}^{(1)}(M_2|Ξ^U) · µ_{M_2}^{(2)}(M_3|Ξ^U) · ... · µ_{M_{n-1}}^{(n-1)}(M_n|Ξ^U)
        = \prod_{i=1}^{n-1} (1 - 2^{-i}) > \prod_{i=1}^{∞} (1 - 2^{-i}) = 0.2887...

With at least this probability, the Markov process corresponding to Ξ^U will follow the sequence of computers {M_i}_{i∈N} forever, without ever returning to M_1. (Note that also the intermediately emulated computers like (M_1 --11-->) are different from M_1, since M_1(λ) = λ, but (M_1 --11-->)(λ) ≠ λ.) Thus, the eventual return probability to M_1 is strictly less than 1. □

In this proof, every computer M_{i+1} is a modified copy of its ancestor M_i. In some sense, M_1 can be seen as some kind of "computer virus" that undermines the existence of a stationary computer probability. The theorem's name "Markoff Chaney Virus" was inspired by a fictitious character in Robert Anton Wilson's "Illuminatus!" trilogy¹.

    ¹ "The Midget, whose name was Markoff Chaney, was no relative of the famous Chaneys of Hollywood, but people did keep making jokes about that. [...] Damn the science of mathematics itself, the line, the square, the average, the whole measurable world that pronounced him a bizarre random factor. Once and for all, beyond fantasy, in the depth of his soul he declared war on the 'statutory ape', on law and order, on predictability, on negative entropy. He would be a random factor in every equation; from this day forward, unto death, it would be civil war: the Midget versus the Digits..."
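The infinite product bounding the escape probability converges quickly and is easy to check numerically; a short sketch (ours):

    from functools import reduce

    # Evaluate prod_{i=1}^{N} (1 - 2**-i): the lower bound on the probability
    # of following the M_i-chain forever in the proof of Theorem 3.13.
    def chain_probability(N):
        return reduce(lambda p, i: p * (1.0 - 2.0 ** -i), range(1, N + 1), 1.0)

    for N in [1, 2, 4, 8, 16, 64]:
        print(N, chain_probability(N))
    # converges rapidly to 0.288788..., in line with the bound 0.2887... above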
The set Ξ^U is in some sense too large to allow the existence of a stationary algorithmic probability distribution. Yet, there exist computer sets Φ that are actually positive recurrent and thus have such a probability distribution; here is an explicit example:

Example 3.14 (A Positive Recurrent Computer Set): Fix an arbitrary string u ∈ {0,1}^* with |u| ≥ 2, and let U be a universal computer, i.e. U ∈ Ξ^U, with the property that it emulates every other computer via some string that does not contain u as a substring, i.e.

    ∀D ∈ Ξ ∃d ∈ {0,1}^*: U --d--> D and u not substring of d.

If C ∈ Ξ is any computer, define a corresponding computer C_{u,U} by C_{u,U}(x) = U(y) if x = w ⊗ u ⊗ y and y does not contain u as a substring, and as C_{u,U}(x) = C(x) otherwise (that is, if x does not contain u). The string u is a "synchronizing word" for the computer C_{u,U}, in the sense that any occurrence of u in the input forces C_{u,U} to "reset" and to emulate U.

We get a set of computers

    Φ_{u,U} := {C_{u,U} | C ∈ Ξ}.

Whenever x does not contain u as a substring, it holds

    C --x--> D ⟹ C_{u,U} --x--> D_{u,U}.

It follows that V := U_{u,U} is a universal computer for Φ_{u,U}. Thus Φ_{u,U} is connected, and it is easy to see that Φ̄^U_{u,U} = Φ^U_{u,U} and #Φ^U_{u,U} ≥ 2. According to Proposition 3.11, Φ^U_{u,U} is irreducible and branching. An argument similar to that before Theorem 3.13 (where it was proved that Ξ^U is aperiodic) proves that Φ^U_{u,U} is also aperiodic. Moreover, by construction it holds for every computer C ∈ Φ^U_{u,U} and ℓ := |u|

    µ_C^{(ℓ)}(V|Φ^U_{u,U}) ≥ 2^{-ℓ}.

The Chapman-Kolmogorov equation (6) then yields

    µ_C^{(n+ℓ)}(V|Φ^U_{u,U}) = \sum_{X ∈ Φ^U_{u,U}} µ_C^{(n)}(X|Φ^U_{u,U}) µ_X^{(ℓ)}(V|Φ^U_{u,U})
                             ≥ 2^{-ℓ} \sum_{X ∈ Φ^U_{u,U}} µ_C^{(n)}(X|Φ^U_{u,U}) = 2^{-ℓ}.

Consequently, lim sup_{n→∞} µ_C^{(n)}(V|Φ^U_{u,U}) ≥ 2^{-ℓ}. According to Theorem 3.12, it follows that Φ^U_{u,U} is positive recurrent. In particular, µ(V|Φ^U_{u,U}) ≥ 2^{-|u|}. Note also that #Φ^U_{u,U} = ∞, so we do not have the trivial situation of a finite computer set.

Obviously, the computer set Φ_{u,U} in the previous example depends on the choice of the string u and the computer U; different choices yield different computer sets and different probabilities. In the next section, we will see in Theorem 4.8 that every positive recurrent computer set contains an unavoidable "amount of arbitrariness", and this fact has an interesting physical interpretation.

Given any positive recurrent computer set Φ (as in the previous example), the actual numerical values of the corresponding stationary computer probability µ(·|Φ) will in general be noncomputable.
For this reason, the following lemma may be interesting, giving a rough bound on stationary computer probability in terms of emulation complexity:

Lemma 3.15: Let Φ ⊂ Ξ be positive recurrent². Then, for every C, D ∈ Φ, we have the inequality

    2^{-K_C(D)} ≤ µ(D|Φ) / µ(C|Φ) ≤ 2^{K_D(C)}.

Proof. We start with the limit m → ∞ in the Chapman-Kolmogorov equation (6) and obtain

    µ(D|Φ) = \sum_{U ∈ Φ} µ(U|Φ) µ_U^{(n)}(D|Φ) ≥ µ(C|Φ) µ_C^{(n)}(D|Φ)

for every n ∈ N_0. Next, we specialize n := K_C(D), then µ_C^{(n)}(D|Φ) ≥ 2^{-n}. This proves the left hand side of the inequality. The right hand side can be obtained simply by interchanging C and D. □

    ² In the following, by stating that some computer set Φ ⊂ Ξ is positive recurrent, we shall always assume that Φ is also branching, irreducible and aperiodic.

IV. SYMMETRIES AND STRING PROBABILITY

The aim of this section is twofold: on the one hand, we will derive an alternative proof of the non-existence of a stationary computer probability distribution on Ξ^U (which we have already proved in Theorem 3.13). The benefit of this alternative proof will be to generalize our no-go result much further: it will supply us with an interesting physical interpretation why getting rid of machine-dependence must be impossible. We discuss this in more detail in Section V.

On the other hand, we would like to explore what happens for computer sets Φ that actually are positive recurrent. In particular, we show that such sets generate a natural algorithmic probability on the strings — after all, finding such a probability distribution was our aim from the beginning (cf. the Introduction). Actually, this string probability turns out to be useful in proving our no-go generalization. Moreover, it shows that the hard part is really to define computer probability — once this is achieved, string probability follows almost trivially.

Here is how we define string probability. While computer probability µ(C|Φ) was defined as the probability of encountering C on a random walk on the Φ-tree, we analogously define the probability of a string s as the probability of getting the output s on this random walk:

Definition 4.1 (String Probability): Let Φ ⊂ Ξ be branching and let C ∈ Φ. The n-step string probability of s ∈ {0,1}^* is defined as the probability of arriving at output s on a random walk of n steps on the Φ-tree of C, i.e.

    µ_C^{(n)}(s|Φ) := \sum_{x ∈ {0,1}^n ∩ C^{-1}(Φ): C(x)=s} µ_{C^{-1}(Φ)}(x).

Theorem 4.2 (Stationary Algorithmic String Probability): If Φ ⊂ Ξ is positive recurrent, then for every C ∈ Φ and s ∈ \overline{{0,1}^*} the limit

    µ(s|Φ) := lim_{n→∞} µ_C^{(n)}(s|Φ) = \sum_{U ∈ Φ} µ(U|Φ) µ_U^{(0)}(s|Φ) = \sum_{U ∈ Φ: U(λ)=s} µ(U|Φ)

exists and is independent of C.

Proof. It is easy to see from the definition of n-step string probability that

    µ_C^{(n)}(s|Φ) = \sum_{U ∈ Φ: U(λ)=s} µ_C^{(n)}(U|Φ).

Taking the limit n → ∞, Theorem 3.12 yields equality of left and right hand side, and thus existence of the limit and independence of C. □

In general, µ(·|Φ) is a probability distribution on \overline{{0,1}^*} rather than on {0,1}^*, i.e. the undefined string can have positive probability, µ(∞|Φ) > 0, so \sum_{s ∈ {0,1}^*} µ(s|Φ) < 1.
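Theorem 4.2 reduces string probability to computer probability: µ(s|Φ) is simply the total stationary weight of those computers that output s on the empty input. A toy illustration (ours, with made-up weights and outputs, not data from any actual computer set):

    # Hypothetical stationary computer probabilities and empty-input outputs.
    mu = {'U1': 0.5, 'U2': 0.3, 'U3': 0.2}            # assumed mu(U|Phi)
    out_on_empty = {'U1': '0', 'U2': '0', 'U3': '1'}  # assumed U(lambda)

    def string_probability(s):
        """mu(s|Phi) = sum of mu(U|Phi) over U with U(lambda) = s (Thm 4.2)."""
        return sum(p for U, p in mu.items() if out_on_empty[U] == s)

    print(string_probability('0'), string_probability('1'))  # 0.8 0.2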

We continue by showing a Chapman-Kolmogorov-like equation (analogous to Equation (6)) for the string probability. Note that this equation differs from the much deeper result of Theorem A.6 in the following sense: it describes a weighted average of probabilities µ_U^{(n)}(s|Φ), and those probabilities do not only depend on the computer U (as in Theorem A.6), but also on the choice of the subset Φ.

Proposition 4.3 (Chapman-Kolmogorov for String Prob.): If Φ ⊂ Ξ is positive recurrent, then

    µ_C^{(m+n)}(s|Φ) = \sum_{U ∈ Φ} µ_C^{(m)}(U|Φ) µ_U^{(n)}(s|Φ)

for every C ∈ Φ, m, n ∈ N_0 and s ∈ {0,1}^*.

Proof. For x, y ∈ {0,1}^*, we use the notation

    δ_{x,y} := { 0 if x ≠ y,
                 1 if x = y,

and calculate

    µ_C^{(m+n)}(s|Φ) = \sum_{x ∈ {0,1}^{m+n} ∩ C^{-1}(Φ)} µ_{C^{-1}(Φ)}(x) · δ_{s,C(x)}
        = \sum_{U ∈ Φ} µ_C^{(m+n)}(U|Φ) µ_U^{(0)}(s|Φ)
        = \sum_{U ∈ Φ} \sum_{V ∈ Φ} µ_C^{(m)}(V|Φ) µ_V^{(n)}(U|Φ) µ_U^{(0)}(s|Φ)
        = \sum_{V ∈ Φ} µ_C^{(m)}(V|Φ) \sum_{U ∈ Φ} µ_V^{(n)}(U|Φ) µ_U^{(0)}(s|Φ).

The second sum equals µ_V^{(n)}(s|Φ) and the claim follows. □

For prefix computers C, algorithmic probability P_C(s) of any string s as defined in Equation (1) and the expression 2^{-K_C(s)} differ only by a multiplicative constant [4]. Here is an analogous inequality for stationary string probability:

Lemma 4.4: Let Φ ⊂ Ξ be positive recurrent and C ∈ Φ some arbitrary computer, then

    µ(s|Φ) ≥ µ(C|Φ) · 2^{-K_C(s)} for all s ∈ {0,1}^*.

Proof. We start with the limit m → ∞ in the Chapman-Kolmogorov equation given in Proposition 4.3 and get

    µ(s|Φ) = \sum_{U ∈ Φ} µ(U|Φ) µ_U^{(n)}(s|Φ) ≥ µ(C|Φ) µ_C^{(n)}(s|Φ)

for every n ∈ N_0. Then we specialize n := K_C(s) and use µ_C^{(n)}(s|Φ) ≥ 2^{-n} for this choice of n. □

Looking for further properties of stationary string probability, it seems reasonable to conjecture that, for many computer sets Φ, a string s ∈ {0,1}^* (like s = 10111) and its inverse s̄ (in this case s̄ = 01000) have the same probability µ(s|Φ) = µ(s̄|Φ), since both seem to be in some sense algorithmically equivalent. A general approach to prove such conjectures is to study output transformations:

Definition 4.5 (Output Transformation σ): Let σ : {0,1}^* → {0,1}^* be a computable permutation. For every C ∈ Ξ, the map σ ∘ C is itself a computer, defined by σ ∘ C(x) := σ(C(x)). The map C ↦ σ ∘ C will be called an output transformation and will also be denoted σ. Moreover, for computer sets Φ ⊂ Ξ, we use the notation

    σ ∘ Φ := {σ ∘ C | C ∈ Φ}.

Under reasonable conditions, string and computer probability are invariant with respect to output transformations:

Theorem 4.6 (Output Symmetry): Let Φ ⊂ Ξ be positive recurrent and closed³ with respect to some output transformation σ and its inverse σ^{-1}. Then, we have for every C ∈ Φ

    µ(C|Φ) = µ(σ ∘ C|Φ)

and for every s ∈ {0,1}^*

    µ(s|Φ) = µ(σ(s)|Φ).

Proof. Note that Φ = σ ∘ Φ. Let C, D ∈ Φ. Suppose that C --b--> D for some bit b ∈ {0,1}. Then,

    σ ∘ C(b ⊗ x) = σ(D(x)) = σ ∘ D(x).

Thus, we have σ ∘ C --b--> σ ∘ D. It follows for the 1-step transition probabilities that

    (E_Φ)_{i,j} = µ_{C_i}^{(1)}(C_j|Φ) = µ_{σ∘C_i}^{(1)}(σ ∘ C_j|Φ) = (E_{σ∘Φ})_{i,j}

for every i, j. Thus, the emulation matrix E_Φ does not change if every computer C (or rather its number in the list of all computers) is exchanged with (the number of) its transformed computer σ ∘ C, yielding the transformed emulation matrix E_{σ∘Φ}. But then, E_Φ and E_{σ∘Φ} must have the same unique stationary probability eigenvector

    µ_Φ = (µ(C_k|Φ))_{k=1}^{#Φ} = µ_{σ∘Φ} = (µ(σ ∘ C_k|Φ))_{k=1}^{#Φ}.

This proves the first identity, while the second identity follows from the calculation

    µ(s|Φ) = \sum_{U ∈ Φ: U(λ)=s} µ(U|Φ) = \sum_{U ∈ Φ: U(λ)=s} µ(σ ∘ U|Φ)
           = \sum_{V ∈ σ∘Φ: V(λ)=σ(s)} µ(V|Φ) = µ(σ(s)|Φ). □

    ³ We say that a computer set Φ ⊂ Ξ is closed with respect to some transformation T : Ξ → Ξ if Φ ⊃ T(Φ) := {T(C) | C ∈ Φ}.

Thus, if some computer set Φ ⊂ Ξ contains e.g. for every computer C also the computer C̄ which always outputs the bitwise inverse of C, then µ(s|Φ) = µ(s̄|Φ) holds. In some sense, this shows that the approach taken in this paper successfully eliminates properties of single computers (e.g. to prefer the string 10111 over 01000) and leaves only general algorithmic properties related to the set of computers.
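Concretely, an output transformation just post-composes a computable permutation with the computer; a minimal sketch of Definition 4.5 (ours), using the bitwise inverse from the preceding paragraph:

    def bitwise_inverse(s):
        """A computable permutation sigma on {0,1}^*: flip every bit."""
        return ''.join('1' if c == '0' else '0' for c in s)

    def output_transform(sigma, C):
        """The transformed computer sigma ∘ C of Definition 4.5."""
        return lambda x: sigma(C(x))

    C = lambda x: x[1:] if x else ''  # hypothetical toy computer
    C_bar = output_transform(bitwise_inverse, C)
    print(C('101'), C_bar('101'))     # '01' and its bitwise inverse '10'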

Moreover, Theorem 4.6 allows for an alternative proof that Ξ^U and similar computer sets cannot be positive recurrent. We call a set of computable permutations S := {σ_i}_{i∈N} cyclic if every string s ∈ {0,1}^* is mapped to infinitely many other strings by application of finite compositions of those permutations, i.e. if for every s ∈ {0,1}^*

    #{σ_{i_1} ∘ σ_{i_2} ∘ ... ∘ σ_{i_N}(s) | N ∈ N, i_n ∈ N} = ∞,

and if S contains with each permutation σ also its inverse σ^{-1}. Then, many computer subsets cannot be positive recurrent:

Theorem 4.7 (Output Symmetry and Positive Recurrence): Let Φ ⊂ Ξ be closed with respect to a cyclic set of output transformations, then Φ is not positive recurrent.

Proof. Suppose Φ is positive recurrent. Let S := {σ_i}_{i∈N} be the corresponding cyclic set of output transformations. Let s ∈ {0,1}^* be an arbitrary string, then for every composition σ := σ_{i_1} ∘ ... ∘ σ_{i_N}, we have by Theorem 4.6

    µ(s|Φ) = µ(σ(s)|Φ).

Since S is cyclic, there are infinitely many such transformations σ, producing infinitely many strings σ(s) which all have the same probability. It follows that µ(s|Φ) = 0. Since s ∈ {0,1}^* was arbitrary, this is a contradiction. □

Again, we conclude that Ξ^U is not positive recurrent, since this computer set is closed with respect to all output transformations.

Although Ξ^U is not positive recurrent, there might be a unique, natural, "maximal" or "most interesting" subset Φ ⊂ Ξ which is positive recurrent. What can we say about this idea? In fact, the following theorem says that this is also impossible. As this theorem is only a simple generalization of Theorem 4.6, we omit the proof.

Theorem 4.8 (Non-Uniqueness): If Φ ⊂ Ξ is positive recurrent, then so is σ ∘ Φ for every computable permutation (=output transformation) σ. Moreover,

    µ(C|Φ) = µ(σ ∘ C|σ ∘ Φ)

for every C ∈ Φ, and

    µ(s|Φ) = µ(σ(s)|σ ∘ Φ)

for every s ∈ {0,1}^*.

This means that there cannot be a unique "natural" positive recurrent computer set Φ: for every such set Φ, there exist output transformations σ such that σ ∘ Φ ≠ Φ (this follows from Theorem 4.7). But then, Theorem 4.8 proves that σ ∘ Φ is positive recurrent, too — and it is thus another candidate for the "most natural" computer set.

V. CONCLUSIONS AND INTERPRETATION

We have studied a natural approach to get rid of machine-dependence in the definition of algorithmic probability. The idea was to look at a Markov process of universal computers emulating each other, and to take the stationary distribution as a natural probability measure on the computers.

This approach was only partially successful: as the corresponding Markov process on the set of all computers is not positive recurrent and thus has no unique stationary distribution, one has to choose a subset Φ of the computers, which introduces yet another source of ambiguity.

However, we have shown (cf. Example 3.14) that there exist non-trivial, infinite sets Φ of computers that are actually positive recurrent and possess a stationary algorithmic probability distribution. This distribution has beautiful properties and eliminates at least some of the machine-dependence arising from choosing a single, arbitrary universal computer as a reference machine (e.g. Theorem 4.6). It gives probabilities for computers as well as for strings (Theorem 4.2), agrees with the average output frequency (Theorem A.6), and does not assume that the computers have any specific structural property like e.g. being prefix-free.

The second main result can be stated as follows: There is no way to get completely rid of machine-dependence, neither in the approach of this paper nor in any other similar but different approach. To understand why this is true, recall that the main reason for our no-go result was the symmetry of computer probability with respect to output transformations C ↦ σ ∘ C, where σ is a computable permutation on the strings. This can be seen in two places:

• In Theorem 4.7, this symmetry yields the result that any computer set which is "too large" (like Ξ^U) cannot be positive recurrent.
• Theorem 4.8 states that if a set Φ is positive recurrent, then σ ∘ Φ must be positive recurrent, too. Since in this case Φ ≠ σ ∘ Φ for many σ, this means that there cannot be a unique "natural" choice of the computer set Φ.

Output transformations have a natural physical interpretation as "renaming the objects that the strings are describing". To see this, suppose we want to define the complexity of the microstate of a box of gas in thermodynamics (this can sometimes be useful, see [4]). Furthermore, suppose we are only interested in a coarse-grained description such that there are only countably many possibilities what the positions, velocities etc. of the gas particles might look like. Then, we can encode every microstate into a binary string, and define the complexity of a microstate as the complexity of the corresponding string (assuming that we have fixed an arbitrary complexity measure K on the strings).

But there are always many different possibilities how to encode the microstate into a string (specifying the velocities in different data formats, specifying first the positions and then the velocities or the other way round etc.). If every encoding is supposed to be one-to-one and can be achieved by some machine, then two different encodings will always be related to each other by a computable permutation.

In more detail, if one encoding e_1 maps microstates m to encoded strings e_1(m) ∈ {0,1}^*, then another encoding e_2 will map microstates m to e_2(m) = σ(e_1(m)), where σ is a computable permutation on the strings (that depends on e_1 and e_2). Choosing encoding e_1, a microstate m will be assigned the complexity K(e_1(m)), while for encoding e_2, it will be assigned the complexity K(σ ∘ e_1(m)). That is, there is an unavoidable ambiguity which arises from the arbitrary choice of an encoding scheme. Switching between the two encodings amounts to "renaming" the microstates, and this is exactly an output transformation in the sense of this paper.

Even if we do not have the situation that the strings shall describe physical objects, we encounter a similar ambiguity already in the definition of a computer: a computer, i.e. a partial recursive function, is described by a Turing machine computing that function. Whenever we look at the output of a Turing machine, we have to "read" the output from the machine's tape, which can potentially be done in several inequivalent ways, comparable to the different "encodings" described above.

Every kind of attempt to get rid of those additive constants in Kolmogorov complexity will have to face this ambiguity of "renaming". This is why we think that all those attempts must fail.

APPENDIX
STRING PROBABILITY IS THE WEIGHTED AVERAGE OF OUTPUT FREQUENCY

This appendix is rather technical and can be skipped on first reading. Its aim is to prove Theorem A.6. This theorem says that the string probability which has been introduced in Definition 4.1 in Section IV is exactly what we really wanted to have from the beginning: in the introduction, our main motivation to find a probability measure on the computers was to define machine-independent algorithmic probability of strings as the weighted mean over all universal computers as stated in Equation (3). Theorem A.6 says that string probability can be written exactly in this way, given some natural assumptions on the reference set of computers.

Note that Theorem A.6 is a surprising result for the following reason: string probability, as defined in Definition 4.1, only depends on the outputs of the computers on the "universal subtree", that is, on the leaves in Figure 4 which correspond to bold lines. But output frequency, as given on the right-hand side in Theorem A.6 and defined in Definition 2.1, counts the outputs on all leaves — that is, output frequency is a property of the whole tree.

Definition A.1 (k-Equivalence): For every k ∈ N, two computers C, D ∈ Ξ are called k-equivalent, denoted C ∼_k D, if C(x) = D(x) for every x ∈ {0,1}^* with |x| ≥ k. The corresponding equivalence classes are denoted [C]_k, and we set

    [Φ]_k := {[C]_k | C ∈ Φ}.

A computer set Φ ⊂ Ξ is called complete if for every C ∈ Φ and k ∈ N it holds [C]_k ⊂ Φ. If Φ ⊂ Ξ is positive recurrent and complete, we set for every [C]_k ∈ [Φ]_k

    µ([C]_k|Φ) := \sum_{C ∈ [C]_k} µ(C|Φ).

It is easy to see that for every C, D ∈ Ξ it holds

    C ∼_k D ⇔ [ (C --s-->) ∼_k (D --s-->) for every s ∈ {0,1}^* ];

thus, the definition µ_{[C]_k}^{(n)}([D]_k|Φ) := \sum_{D ∈ [D]_k} µ_C^{(n)}(D|Φ) makes sense for n ∈ N and [C]_k, [D]_k ∈ [Φ]_k and is independent of the choice of the representative C ∈ [C]_k. Enumerating the equivalence classes [Φ]_k = {[C_1]_k, [C_2]_k, [C_3]_k, ...} in arbitrary order, we can define an associated emulation matrix E_{Φ,k} as

    (E_{Φ,k})_{i,j} := µ_{[C_i]_k}^{(1)}([C_j]_k|Φ).

It is easily checked that if Φ is positive recurrent, then the Markov process described by the transition matrix E_{Φ,k} must also be irreducible, aperiodic and positive recurrent, and µ_{Φ,k} := (µ([C_1]_k|Φ), µ([C_2]_k|Φ), µ([C_3]_k|Φ), ...) is the unique probability vector solution to the equation µ_{Φ,k} · E_{Φ,k} = µ_{Φ,k}.

Now we can define input transformations:

Definition A.2 (Input Transformation I_σ): Let σ : {0,1}^n → {0,1}^n be a permutation such that there is at least one string x ∈ {0,1}^n for which x_1 ≠ σ(x)_1, where x_1 denotes the first bit of x. For every s ∈ {0,1}^*, let I_σ(s) be the string that is generated by applying σ to the last n bits of s (e.g. if n = 1, σ(1) = 0 and s = 1011, then I_σ(s) = 1010). If |s| < n, we set I_σ(s) := s. The number n =: |σ| is called the order of I_σ, and for every C ∈ Ξ, the input transformed computer I_σ(C) is defined by I_σ(C)(s) := C(I_σ(s)).
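The input transformation I_σ of Definition A.2 only touches the last n bits of its argument; a small sketch (ours, with σ given as a finite dictionary and a hypothetical helper name) reproducing the example from the definition:

    def make_input_transform(sigma, n):
        """I_sigma from Definition A.2: permute the last n bits; strings
        shorter than n are left unchanged. sigma is a dict on {0,1}^n."""
        def I(s):
            return s if len(s) < n else s[:-n] + sigma[s[-n:]]
        return I

    # Example from the text: n = 1, sigma(1) = 0 (and sigma(0) = 1).
    I = make_input_transform({'0': '1', '1': '0'}, 1)
    print(I('1011'))  # '1010', as in Definition A.2

    # The transformed computer is then x -> C(I(x)).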

The action of an input transformation is depicted in Figure 5: changing e.g. the last bit of the input causes a permutation of the outputs corresponding to neighboring branches. As long as Φ is complete and closed with respect to that input transformation, the emulation structure will not be changed. This is a byproduct of the proof of the following theorem:

Theorem A.3 (Input Symmetry): Let Φ ⊂ Ξ be positive recurrent, complete and closed with respect to an input transformation I_σ. Then, for every k ≥ |σ|,

    µ([C]_k|Φ) = µ([I_σ(C)]_k|Φ).

Proof. Suppose that [C]_k --0--> [C_0]_k, i.e. C(0 ⊗ x) = C_0(x) for every |x| ≥ k, C ∈ [C]_k and C_0 ∈ [C_0]_k. As |σ| ≤ k,

    (I_σ(C))(0 ⊗ x) = C(I_σ(0 ⊗ x)) = C(0 ⊗ I_σ(x)) = C_0(I_σ(x)) = I_σ(C_0)(x),

so [I_σ(C)]_k --0--> [I_σ(C_0)]_k. Analogously, from [C]_k --1--> [C_1]_k it follows that [I_σ(C)]_k --1--> [I_σ(C_1)]_k and vice versa. Thus,

    (E_{Φ,k})_{i,j} = µ_{[C_i]_k}^{(1)}([C_j]_k|Φ) = µ_{[I_σ(C_i)]_k}^{(1)}([I_σ(C_j)]_k|Φ).

So interchanging every equivalence class of computers with its transformed class leaves the emulation matrix invariant. A similar argument as in Theorem 4.6 proves the claim. □

We are now heading towards an analogue of Equation (3), i.e. towards a proof that our algorithmic string probability equals the weighted average of output frequency. This needs some preparation:

Definition A.4 (Input Symmetry Group): Let I_σ be an input transformation of order n ∈ N. A computer C ∈ Ξ is called I_σ-symmetric if I_σ(C) = C (which is equivalent to [I_σ(C)]_n = [C]_n). The input symmetry group of C is defined as

    I-SYM(C) := {I_σ input transformation | I_σ(C) = C}.

Every transformation of order n ∈ N can also be interpreted as a transformation on {0,1}^N for N > n, by setting

    σ(x_1 ⊗ x_2 ⊗ ... ⊗ x_N) := (x_1 ⊗ ... ⊗ x_{N-n}) ⊗ σ(x_{N-n+1}, ..., x_N)

whenever x_i ∈ {0,1}. With this identification, I-SYM(C) is a group.

Proposition A.5 (Input Symmetry and Irreducibility): Let Φ ⊂ Ξ be irreducible. Then I-SYM(C) is the same for every C ∈ Φ and can be denoted I-SYM(Φ).

Proof. Let Φ ⊂ Ξ be irreducible, and let C ∈ Φ be I_σ-symmetric, i.e. C(I_σ(s)) = C(s) for every s ∈ {0,1}^*. Let D ∈ Φ be an arbitrary computer. Since Φ is irreducible, it holds C --> D, i.e. there is a string x ∈ {0,1}^* with C(x ⊗ s) = D(s) for every s ∈ {0,1}^*. Let |s| ≥ |σ|, then

    D(s) = C(x ⊗ s) = C(I_σ(x ⊗ s)) = C(x ⊗ I_σ(s)) = D(I_σ(s)),

and D is also I_σ-symmetric. □

For most irreducible computer sets like Φ = Ξ^U, the input symmetry group will only consist of the identity, i.e. I-SYM(Φ) = {Id}.

Now we are ready to state the most interesting result of this section:

Theorem A.6 (Equivalence of Definitions): If Φ ⊂ Ξ is positive recurrent, complete and closed with respect to every input transformation I_σ with |σ| ≤ n ∈ N_0, then

    µ(s|Φ) = \sum_{U ∈ Φ} µ(U|Φ) µ_U^{(n)}(s) for every s ∈ {0,1}^*,

where µ_U^{(n)}(s) is the output frequency as introduced in Definition 2.1.

Proof. The case n = 0 is trivial, so let n ≥ 1. It is convenient to introduce another equivalence relation on the computer classes. We define the corresponding equivalence classes ("transformation classes") as

    {V}_k := {[X]_k ∈ [Φ]_k | ∃I_σ : |σ| ≤ k, [I_σ(V)]_k = [X]_k}.

Thus, two computer classes [X]_k and [Y]_k are elements of the same transformation class if one is an input transformation (of order less than k) of the other. Again, we set {Φ}_k := {{X}_k | X ∈ Φ}.

For every X ∈ [X]_n, the probability µ_X^{(n)}(s|Φ) is the same and can be denoted µ_{[X]_n}^{(n)}(s|Φ). According to Proposition 4.3, we have

    µ(s|Φ) = \sum_{{X}_n ∈ {Φ}_n} \sum_{[Y]_n ∈ {X}_n} µ([Y]_n|Φ) µ_{[Y]_n}^{(n)}(s|Φ).

Due to Theorem A.3, the probability µ([Y]_n|Φ) is the same for every [Y]_n ∈ {X}_n. Let [X]_n be an arbitrary representative of {X}_n, then

    µ({X}_n|Φ) := \sum_{[Y]_n ∈ {X}_n} µ([Y]_n|Φ) = #{X}_n · µ([X]_n|Φ).

The two equations yield

    µ(s|Φ) = \sum_{{X}_n ∈ {Φ}_n} ( µ({X}_n|Φ) / #{X}_n ) \sum_{[Y]_n ∈ {X}_n} µ_{[Y]_n}^{(n)}(s|Φ).

Let S_{2^n} be the set of all permutations on {0,1}^n. Two permutations σ_1, σ_2 ∈ S_{2^n} are called Φ-equivalent if there exists a σ ∈ I-SYM(Φ) such that σ_1 = σ ∘ σ_2 (recall that Φ is irreducible). This is the case if and only if I_{σ_1}(C) = I_{σ_2}(C) for one and thus every computer C ∈ Φ. The set of all Φ-equivalence classes will be denoted S_n(Φ). Every computer class [Y]_n ∈ {X}_n is generated from [X]_n by some input transformation. If X is an arbitrary representative of [X]_n, we thus have

    µ(s|Φ) = \sum_{{X}_n ∈ {Φ}_n} ( µ({X}_n|Φ) / #{X}_n ) \sum_{[σ] ∈ S_n(Φ)} µ_{[I_σ(X)]_n}^{(n)}(s|Φ),

where σ ∈ [σ] is an arbitrary representative. For every equivalence class [σ], it holds true #[σ] = #(S_{2^n} ∩ I-SYM(Φ)), thus

    µ(s|Φ) = \sum_{{X}_n ∈ {Φ}_n} ( µ({X}_n|Φ) / #{X}_n ) · ( 1 / #(I-SYM(Φ) ∩ S_{2^n}) ) \sum_{σ ∈ S_{2^n}} µ_{[I_σ(X)]_n}^{(n)}(s|Φ).

By definition of the set S_n(Φ),

    #{X}_n · #(I-SYM(Φ) ∩ S_{2^n}) = #S_{2^n} = (2^n)!.

Using that #{X}_n = #S_n(Φ), we obtain

    µ(s|Φ) = \sum_{{X}_n ∈ {Φ}_n} ( µ({X}_n|Φ) / (2^n)! ) \sum_{σ ∈ S_{2^n}} µ_{I_σ(X)}^{(n)}(s|Φ)
           = \sum_{{X}_n ∈ {Φ}_n} ( µ({X}_n|Φ) / (2^n)! ) \sum_{σ ∈ S_{2^n}} \sum_{x ∈ {0,1}^n} δ_{I_σ(X)(x),s} µ_{X^{-1}(Φ)}(x).

As |x| = n ≥ |σ|, it holds I_σ(X)(x) = X(I_σ(x)) = X(σ(x)). The substitution y := σ(x) yields

    µ(s|Φ) = \sum_{{X}_n ∈ {Φ}_n} ( µ({X}_n|Φ) / (2^n)! ) \sum_{y ∈ {0,1}^n} δ_{X(y),s} \sum_{σ ∈ S_{2^n}} µ_{X^{-1}(Φ)}(σ^{-1}(y)).

Up to normalization, the rightmost sum is the average of all permutations of the probability vector µ_{X^{-1}(Φ)}, thus

    ( 1 / (2^n)! ) \sum_{σ ∈ S_{2^n}} µ_{X^{-1}(Φ)}(σ^{-1}(y)) = 2^{-n}.

Recall that X was an arbitrary representative of an arbitrary representative of {X}_n. The last two equations yield

    µ(s|Φ) = \sum_{{X}_n ∈ {Φ}_n} µ({X}_n|Φ) \sum_{y ∈ {0,1}^n} δ_{X(y),s} · 2^{-n}
           = \sum_{{X}_n ∈ {Φ}_n} \sum_{[X]_n ∈ {X}_n} µ([X]_n|Φ) µ_X^{(n)}(s)
           = \sum_{{X}_n ∈ {Φ}_n} \sum_{[X]_n ∈ {X}_n} \sum_{X ∈ [X]_n} µ(X|Φ) µ_X^{(n)}(s)
           = \sum_{X ∈ Φ} µ(X|Φ) µ_X^{(n)}(s).

Note that if X and Y are representatives of representatives of an arbitrary transformation class {X}_n, then µ_X^{(n)}(s) = µ_Y^{(n)}(s). □

This theorem is the promised analogue of Equation (3): it shows that the string probability that we have defined in Definition 4.1 is the weighted average of output frequency as defined in Definition 2.1. For a discussion why this is interesting and surprising, see the first few paragraphs of this appendix.

ACKNOWLEDGMENTS

The author would like to thank N. Ay, D. Gross, S. Guttenberg, M. Ioffe, T. Krüger, D. Schleicher, F.-J. Schmitt, R. Siegmund-Schultze, R. Seiler, and A. Szkoła for helpful discussions and kind support.

REFERENCES

[1] R. J. Solomonoff, "A preliminary report on a general theory of inductive inference", Tech. Rept. ZTB-138, Zator Company, Cambridge, Mass., 1960.
[2] A. K. Zvonkin, L. A. Levin, "The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms", Russian Math. Surveys, 25/6, pp. 83-124, 1970.
[3] G. J. Chaitin, "A Theory of Program Size Formally Identical to Information Theory", J. Assoc. Comput. Mach., vol. 22, pp. 329-340, 1975.
[4] M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, New York, 1997.
[5] R. J. Solomonoff, "The Discovery of Algorithmic Probability", Journal of Computer and System Sciences, vol. 55/1, pp. 73-88, 1997.
[6] M. Hutter, Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability, Springer, Berlin, 2005.
[7] R. Schack, "Algorithmic Information and Simplicity in Statistical Physics", Int. J. Theor. Phys. 36, pp. 209-226, 1997.
[8] N. J. Hay, "Universal Semimeasures: An Introduction", CDMTCS Research Report 300, pp. 87-88, 2007.
[9] G. J. Chaitin, Algorithmic information theory, Cambridge University Press, Cambridge, 1987.