
Universal Artificial Intelligence
Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decisions

Marcus Hutter
Istituto Dalle Molle di Studi sull'Intelligenza Artificiale
IDSIA, Galleria 2, CH-6928 Manno-Lugano, Switzerland
[email protected], http://www.idsia.ch/~marcus
2000 – 2002

    Decision Theory = Probability + Utility Theory
  + Universal Induction = Ockham + Epicurus + Bayes
  = Universal Artificial Intelligence without Parameters

Table of Contents

• Sequential Decision Theory
• Iterative and Functional AIμ Model
• Algorithmic Complexity Theory
• Universal Sequence Prediction
• Definition of the Universal AIξ Model
• Universality of ξ^AI and Credit Bounds
• Sequence Prediction (SP)
• Strategic Games (SG)
• Function Minimization (FM)
• Supervised Learning by Examples (EX)
• The Time-Bounded AIξ Model
• Aspects of AI Included in AIξ
• Outlook & Conclusions

Overview

• Decision theory solves the problem of rational agents in uncertain worlds if the environmental probability distribution is known.
• Solomonoff's theory of universal induction solves the problem of sequence prediction for an unknown prior distribution.
• We combine both ideas and obtain a parameterless model of universal artificial intelligence:

    Decision Theory = Probability + Utility Theory
  + Universal Induction = Ockham + Epicurus + Bayes
  = Universal Artificial Intelligence without Parameters

Preliminary Remarks

• The goal is to mathematically define a unique model superior to any other model in any environment.
• The AIξ model is unique in the sense that it has no parameters which could be adjusted to the actual environment in which it is used.
• In this first step toward a universal theory we are not interested in computational aspects.
• Nevertheless, we are interested in maximizing a utility function, which means learning in as few cycles as possible. The interaction cycle is the basic unit, not the computation time per unit.

The Cybernetic or Agent Model

[Diagram: agent p and environment q, each with its own working tape, interact in a loop: p emits outputs y_1 y_2 y_3 ... and receives inputs x_1 x_2 x_3 ..., where x_k = r_k x'_k.]

- p: X* → Y* is a cybernetic system / agent with a chronological function / Turing machine.
- q: Y* → X* is a deterministic computable (only partially accessible) chronological environment.

The Agent-Environment Interaction Cycle

for k := 1 to T do
  - p thinks/computes/modifies its internal state.
  - p writes output y_k ∈ Y.
  - q reads output y_k.
  - q computes/modifies its internal state.
  - q writes reward/utility input r_k := r(x_k) ∈ R.
  - q writes regular input x'_k ∈ X'.
  - p reads input x_k := r_k x'_k ∈ X.
endfor

- T is the lifetime of the system (total number of cycles).
- Often R = {0,1} = {bad, good} = {error, correct}.

Utility Theory for Deterministic Environments

The (agent, environment) pair (p, q) produces the unique I/O sequence

  ω(p,q) := y_1^{pq} x_1^{pq} y_2^{pq} x_2^{pq} y_3^{pq} x_3^{pq} ...

The total reward (value) in cycles k to m is defined as

  V_{km}(p,q) := r(x_k^{pq}) + ... + r(x_m^{pq})

The optimal agent is the one which maximizes the total reward:

  p^{best} := maxarg_p V_{1T}(p,q)  ⇒  V_{kT}(p^{best}, q) ≥ V_{kT}(p,q)  ∀p

Probabilistic Environment

Replace q by a probability distribution μ(q) over environments. The total expected reward in cycles k to m is

  V^μ_{km}(p | ẏẋ_{<k}) := (1/𝒩) Σ_{q: q(ẏ_{<k}) = ẋ_{<k}} μ(q) · V_{km}(p,q)

The history is no longer uniquely determined; ẏẋ_{<k} := ẏ_1ẋ_1 ... ẏ_{k−1}ẋ_{k−1} is the actual history. AIμ maximizes the expected future reward by looking h_k ≡ m_k − k + 1 cycles ahead (the horizon). For m_k = T, AIμ is optimal.
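The interaction cycle described above can be sketched in a few lines of Python. The `Agent` and `Environment` classes below are hypothetical stand-ins for the chronological functions p and q (the toy environment and trivial policy are illustrative, not from the slides); the reward alphabet is R = {0, 1}.

```python
class Agent:
    """Trivial agent p: outputs 0 in every cycle (placeholder policy)."""
    def act(self, history):
        return 0

class Environment:
    """Toy environment q: rewards the agent for matching a fixed bit."""
    def __init__(self, target_bit=1):
        self.target_bit = target_bit
    def respond(self, y_k):
        r_k = 1 if y_k == self.target_bit else 0   # reward input r_k
        x_prime = 0                                # regular input x'_k
        return r_k, x_prime

def run(agent, env, T):
    """Run T interaction cycles; return the total reward V_{1T}."""
    history, total_reward = [], 0
    for k in range(1, T + 1):
        y_k = agent.act(history)          # p writes output y_k
        r_k, x_prime = env.respond(y_k)   # q writes reward and regular input
        history.append((y_k, r_k, x_prime))  # p reads x_k = r_k x'_k
        total_reward += r_k
    return total_reward
```

Note that the loop counts interaction cycles, not computation steps, matching the remark that the cycle is the basic unit.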
  ẏ_k := maxarg_{y_k} max_{p: p(ẋ_{<k}) = ẏ_{<k} y_k} V^μ_{k m_k}(p | ẏẋ_{<k})

The environment responds with ẋ_k with probability determined by μ. This functional form of AIμ is suitable for theoretical considerations; the iterative form (next section) is more suitable for 'practical' purposes.

Iterative AIμ Model

The probability that the environment produces input x_k in cycle k, under the condition that the history is y_1 x_1 ... y_{k−1} x_{k−1} y_k, is abbreviated by

  μ(yx_{<k} y\underline{x}_k) ≡ μ(y_1 x_1 ... y_{k−1} x_{k−1} y_k \underline{x}_k)

With Bayes' rule, the probability of input x_1...x_k if the system outputs y_1...y_k is

  μ(y\underline{x}_{1:k}) = μ(y\underline{x}_1) · μ(yx_1 y\underline{x}_2) · ... · μ(yx_{<k} y\underline{x}_k)

Underlined arguments are probability variables; all others are conditions. μ is called a chronological probability distribution.

The expectimax sequence/algorithm: take the reward expectation over the x_i and the maximum over the y_i in chronological order to incorporate the correct dependency of the x_i and y_i on the history.

  V^{best}_{km}(ẏẋ_{<k}) = max_{y_k} Σ_{x_k} max_{y_{k+1}} Σ_{x_{k+1}} ... max_{y_m} Σ_{x_m} (r(x_k) + ... + r(x_m)) · μ(ẏẋ_{<k} y\underline{x}_{k:m})

  ẏ_k = maxarg_{y_k} Σ_{x_k} max_{y_{k+1}} Σ_{x_{k+1}} ... max_{y_{m_k}} Σ_{x_{m_k}} (r(x_k) + ... + r(x_{m_k})) · μ(ẏẋ_{<k} y\underline{x}_{k:m_k})

This is the essence of decision theory:

  Decision Theory = Probability + Utility Theory

Functional AIμ ≡ Iterative AIμ

The functional and iterative AIμ models behave identically with the natural identification

  μ(y\underline{x}_{1:k}) = Σ_{q: q(y_{1:k}) = x_{1:k}} μ(q)

Remaining problems:
• Computational aspects.
• The true prior probability is usually not (even approximately) known.

Limits We Are Interested In

  1 ≪ ⟨l(y_k x_k)⟩ ≪ k ≪ T ≪ |Y × X|
      (a)             (b)   (c)   (d)
  1 ≪ 2^16          ≪ 2^24 ≪ 2^32 ≪ 2^65536

(a) The agent's interface is wide.
(b) The interface can be sufficiently explored.
(c) Death is far away.
(d) Most inputs/outputs do not occur.
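The expectimax recursion for a known chronological μ can be sketched recursively. In the sketch below, `mu(history, y, x)`, the alphabets `X`, `Y`, and the reward map `r` are hypothetical interfaces standing in for the slides' notation; no memoization is attempted, so the cost is exponential in the horizon m − k.

```python
def value(mu, history, k, m, X, Y, r):
    """Best expected reward sum V*_{km} for cycles k..m (expectimax)."""
    if k > m:
        return 0.0
    best = float("-inf")
    for y in Y:                       # max over current output y_k
        expected = 0.0
        for x in X:                   # expectation over current input x_k
            p = mu(history, y, x)
            if p > 0:
                expected += p * (r(x) + value(mu, history + [(y, x)],
                                              k + 1, m, X, Y, r))
        best = max(best, expected)
    return best

def best_output(mu, history, k, m, X, Y, r):
    """maxarg version: the output y_k chosen in cycle k."""
    def expected(y):
        return sum(mu(history, y, x)
                   * (r(x) + value(mu, history + [(y, x)], k + 1, m, X, Y, r))
                   for x in X)
    return max(Y, key=expected)
```

For instance, with an echo environment `mu(h, y, x) = 1.0 if x == y else 0.0` and `r(x) = x`, `best_output` selects y_k = 1, as the recursion predicts.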
These limits are never used in proofs, but we are only interested in theorems which do not degenerate under the above limits.

Induction = Predicting the Future

Extrapolate past observations to the future; but how can we know anything about the future? Philosophical dilemma:

• Hume's negation of induction.
• Epicurus' principle of multiple explanations.
• Ockham's razor (simplicity) principle.
• Bayes' rule for conditional probabilities.

Given the sequence x_1...x_{k−1}, what is the next letter x_k? If the true prior probability μ(x_1...x_n) is known, and we want to minimize the number of prediction errors, then the optimal scheme is to predict the x_k with highest conditional μ probability, i.e. maxarg_{x_k} μ(x_{<k}\underline{x}_k).

Solomonoff solved the problem of the unknown prior μ by introducing a universal probability distribution ξ related to Kolmogorov complexity.

Algorithmic Complexity Theory

The Kolmogorov complexity of a string x is the length of the shortest (prefix) program producing x:

  K(x) := min_p {l(p) : U(p) = x},  U = universal Turing machine

The universal semimeasure ξ(x) is the probability that the output of U starts with x when the input is provided by fair coin flips:

  ξ(x) := Σ_{p: U(p) = x*} 2^{−l(p)}    [Solomonoff 64]

Universality property of ξ: ξ dominates every computable probability distribution up to a multiplicative constant:

  ξ(x) ≥ 2^{−K(ρ)} · ρ(x)  ∀ρ

Furthermore, the μ-expected squared distance sum between ξ and μ is finite for computable μ (up to an additive constant):

  Σ_{k=1}^∞ Σ_{x_{1:k}} μ(x_{<k}) (ξ(x_{<k}\underline{x}_k) − μ(x_{<k}\underline{x}_k))^2 ≤ ln 2 · K(μ)

[Solomonoff 78], for binary alphabet.

Universal Sequence Prediction

  ξ(x_{<n}\underline{x}_n) → μ(x_{<n}\underline{x}_n)  for n → ∞, with μ probability 1

⇒ Replacing μ by ξ might not introduce many additional prediction errors.

General scheme: predict x_k with probability ρ(x_{<k}\underline{x}_k). This includes deterministic predictors as well.
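Solomonoff's ξ is incomputable, but the mixture mechanism behind its dominance property can be sketched by replacing the class of all computable (semi)measures with a small finite model class. Below, Bernoulli(θ) models with uniform prior weights are an illustrative stand-in, not part of the original construction; exact rationals avoid floating-point noise.

```python
from fractions import Fraction

def mixture(x, thetas):
    """Mixture probability xi(x) = sum_i w_i * P_theta_i(x), uniform w_i."""
    w = Fraction(1, len(thetas))
    total = Fraction(0)
    for th in thetas:
        p = Fraction(1)
        for bit in x:                  # product of per-bit probabilities
            p *= th if bit == 1 else 1 - th
        total += w * p
    return total

def predict_next(x, thetas):
    """Conditional probability xi(x_k = 1 | x_{<k}) used for prediction."""
    return mixture(x + [1], thetas) / mixture(x, thetas)
```

After observing a run of ones, the mixture's conditional probability of another 1 rises toward the best model in the class — the same convergence effect the ξ-to-μ bound formalizes for the universal class.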
The number of expected prediction errors is

  E_{nρ} := Σ_{k=1}^n Σ_{x_{1:k}} μ(x_{1:k}) (1 − ρ(x_{<k}\underline{x}_k))

Θξ predicts the x_k of maximal ξ(x_{<k}\underline{x}_k):

  E_{nΘξ}/E_{nρ} ≤ 1 + O(E_{nρ}^{−1/2}) → 1  for n → ∞    [Hutter 99]

Θξ is asymptotically optimal with rapid convergence. For every (passive) game of chance for which there exists a winning strategy, you can make money by using Θξ even if you don't know the underlying probabilistic process/algorithm. Θξ finds and exploits every regularity.

Definition of the Universal AIξ Model

Universal AI = Universal Induction + Decision Theory. Replace μ^AI in the decision-theoretic model AIμ by an appropriate generalization of ξ:

  ξ(y\underline{x}_{1:k}) := Σ_{q: q(y_{1:k}) = x_{1:k}} 2^{−l(q)}

  ẏ_k = maxarg_{y_k} Σ_{x_k} max_{y_{k+1}} Σ_{x_{k+1}} ... max_{y_{m_k}} Σ_{x_{m_k}} (r(x_k) + ... + r(x_{m_k})) · ξ(ẏẋ_{<k} y\underline{x}_{k:m_k})

Functional form: μ(q) ↪ ξ(q) := 2^{−l(q)}.

Bold claim: AIξ is the most intelligent environment-independent agent possible.

Universality of ξ^AI

The proof is analogous to that for sequence prediction; the inputs y_k are pure spectators. Up to a multiplicative constant,

  ξ(y\underline{x}_{1:n}) ≥ 2^{−K(ρ)} · ρ(y\underline{x}_{1:n})  ∀ chronological ρ

Convergence of ξ^AI to μ^AI: the y_i are again pure spectators. To generalize the SP case to an arbitrary alphabet we need

  Σ_{i=1}^{|X|} (y_i − z_i)^2 ≤ Σ_{i=1}^{|X|} y_i ln(y_i/z_i)  for  Σ_{i=1}^{|X|} y_i = 1 ≥ Σ_{i=1}^{|X|} z_i

  ⇒ ξ^AI(yx_{<n} y\underline{x}_n) → μ^AI(yx_{<n} y\underline{x}_n)  for n → ∞, with μ probability 1

Does replacing μ^AI with ξ^AI lead to an AIξ system with asymptotically optimal behaviour and rapid convergence? This looks promising from the analogy with SP!

Value Bounds (Optimality of AIξ)

A naive reward bound, analogous to the error bound for SP, would be

  V^μ_{1T}(p^ξ) ≥ V^μ_{1T}(p) − o(...)  ∀μ, p

The problem class (set of environments) {μ_0, μ_1} with Y = X = {0,1} and r_k = δ_{i y_1} in environment μ_i violates this reward bound: the first output y_1 decides whether all future r_k = 1 or 0.
Alternative: the μ probability D_{nμξ}/n that in the first n cycles AIξ produces an output different from (and hence suboptimal relative to) AIμ tends to zero:

  D_{nμξ}/n → 0,    D_{nμξ} := ⟨ Σ_{k=1}^n (1 − δ_{ẏ_k^μ, ẏ_k^ξ}) ⟩_μ

This is a weak asymptotic convergence claim.
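The {μ_0, μ_1} counterexample above can be made concrete in a few lines; `total_reward` and `worst_case_value` are illustrative helpers, not notation from the slides.

```python
# In environment mu_i every reward is r_k = delta_{i, y_1}, so the first
# output alone fixes the entire future reward stream.

def total_reward(i, y1, T):
    """V_{1T} in environment mu_i when the agent's first output is y1."""
    return (1 if y1 == i else 0) * T

def worst_case_value(y1, T):
    """Reward guaranteed by committing to y1, over both environments."""
    return min(total_reward(0, y1, T), total_reward(1, y1, T))
```

Whichever first output a prior-free agent commits to, one of the two environments yields total reward 0 while the μ-informed agent collects T, so no uniform o(T)-type reward bound of the SP kind can hold; only the averaged asymptotic claim above survives.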