Towards a Universal Theory of Artificial Intelligence Based on Algorithmic Probability and Sequential Decisions

Marcus Hutter - 1 - Universal Artificial Intelligence Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decisions Marcus Hutter Istituto Dalle Molle di Studi sull’Intelligenza Artificiale IDSIA, Galleria 2, CH-6928 Manno-Lugano, Switzerland [email protected], http://www.idsia.ch/»marcus 2000 – 2002 Decision Theory = Probability + Utility Theory + + Universal Induction = Ockham + Epicur + Bayes = = Universal Artificial Intelligence without Parameters Marcus Hutter - 2 - Universal Artificial Intelligence Table of Contents ² Sequential Decision Theory ² Iterative and functional AI¹ Model ² Algorithmic Complexity Theory ² Universal Sequence Prediction ² Definition of the Universal AI» Model ² Universality of »AI and Credit Bounds ² Sequence Prediction (SP) ² Strategic Games (SG) ² Function Minimization (FM) ² Supervised Learning by Examples (EX) ² The Timebounded AI» Model ² Aspects of AI included in AI» ² Outlook & Conclusions Marcus Hutter - 3 - Universal Artificial Intelligence Overview ² Decision Theory solves the problem of rational agents in uncertain worlds if the environmental probability distribution is known. ² Solomonoff’s theory of Universal Induction solves the problem of sequence prediction for unknown prior distribution. ² We combine both ideas and get a parameterless model of Universal Artificial Intelligence. Decision Theory = Probability + Utility Theory + + Universal Induction = Ockham + Epicur + Bayes = = Universal Artificial Intelligence without Parameters Marcus Hutter - 4 - Universal Artificial Intelligence Preliminary Remarks ² The goal is to mathematically define a unique model superior to any other model in any environment. ² The AI» model is unique in the sense that it has no parameters which could be adjusted to the actual environment in which it is used. ² In this first step toward a universal theory we are not interested in computational aspects. ² Nevertheless, we are interested in maximizing a utility function, which means to learn in as minimal number of cycles as possible. The interaction cycle is the basic unit, not the computation time per unit. Marcus Hutter - 5 - Universal Artificial Intelligence The Cybernetic or Agent Model 0 0 0 0 0 0 r1 x r2 x r3 x r4 x r5 x r6 x ... 1 2 3 Pi4 5 6 PP PP ) PP Agent Environ¡ working tape ... working tape ... p ment q 1P PP PP PPq y1 y2 y3 y4 y5 y6 ... - p:X¤ !Y ¤ is cybernetic system / agent with chronological function / Turing machine. - q :Y ¤ !X¤ is deterministic computable (only partially accessible) chronological environment. Marcus Hutter - 6 - Universal Artificial Intelligence The Agent-Environment Interaction Cycle for k:=1 to T do - p thinks/computes/modifies internal state. - p writes output yk 2Y . - q reads output yk. - q computes/modifies internal state. - q writes reward/utlitity input rk :=r(xk)2R. 0 0 - q write regular input xk 2X . 0 - p reads input xk :=rkxk 2X. endfor - T is lifetime of system (total number of cycles). - Often R=f0; 1g=fbad; goodg=ferror; correctg. Marcus Hutter - 7 - Universal Artificial Intelligence Utility Theory for Deterministic Environment The (system,environment) pair (p,q) produces the unique I/O sequence pq pq pq pq pq pq !(p; q) := y1 x1 y2 x2 y3 x3 ::: Total reward (value) in cycles k to m is defined as pq pq Vkm(p; q) := r(xk ) + ::: + r(xm ) Optimal system is system which maximizes total reward best p := maxarg V1T (p; q) p + best VkT (p ; q) ¸ VkT (p; q) 8p Marcus Hutter - 8 - Universal Artificial Intelligence Probabilistic Environment Replace q by a probability distribution ¹(q) over environments. The total expected reward in cycles k to m is 1 X V ¹ (pjy˙x˙ ) := ¹(q) ¢ V (p; q) km <k N km q:q(y ˙<k)=x ˙ <k The history is no longer uniquely determined. y˙x˙ <k :=y ˙1x˙ 1:::y˙k¡1x˙ k¡1 :=actual history. AI¹ maximizes expected future reward by looking hk ´mk ¡k+1 cycles ahead (horizon). For mk =T , AI¹ is optimal. y˙ := maxarg max V ¹ (pjy˙x˙ ) k kmk <k yk p:p(x ˙ <k)=y ˙<kyk Environment responds with x˙ k with probability determined by ¹. This functional form of AI¹ is suitable for theoretical considerations. The iterative form (next section) is more suitable for ’practical’ purpose. Marcus Hutter - 9 - Universal Artificial Intelligence Iterative AI¹ Model The probability that the environment produces input xk in cycle k under the condition that the history is y1x1:::yk¡1xk¡1yk is abbreviated by ¹(yx<kyxk) ´ ¹(y1x1:::yk¡1xk¡1ykxk) With Bayes’ Rule, the probability of input x1:::xk if system outputs y1:::yk is ¹(y1x1:::ykxk) = ¹(yx1)¢¹(yx1yx2)¢ ::: ¢¹(yx<kyxk) Underlined arguments are probability variables, all others are conditions. ¹ is called a chronological probability distribution. Marcus Hutter - 10 - Universal Artificial Intelligence Iterative AI¹ Model The Expectimax sequence/algorithm: Take reward expectation over the xi and maximum over the yi in chronological order to incorporate correct dependency of xi and yi on the history. X X X best Vkm (y ˙x˙ <k) = max max ::: max yk yk+1 ym xk xk+1 xm (r(xk)+ ::: +r(xm))¢¹(y ˙x˙ <kyxk:m) X X X y˙k = maxarg max ::: max y y yk k+1 mk xk xk+1 xmk (r(x )+ ::: +r(x ))¢¹(y ˙x˙ yx ) k mk <k k:mk This is the essence of Decision Theory. Decision Theory = Probability + Utility Theory Marcus Hutter - 11 - Universal Artificial Intelligence Functional AI¹ ´ Iterative AI¹ The functional and iterative AI¹ models behave identically with the natural identification X ¹(yx1:k) = ¹(q) q:q(y1:k)=x1:k Remaining Problems: ² Computational aspects. ² The true prior probability is usually not (even approximately not) known. Marcus Hutter - 12 - Universal Artificial Intelligence Limits we are interested in 1 ¿ hl(ykxk)i ¿ k ¿ T ¿ jY £ Xj a b c d 1 ¿ 216 ¿ 224 ¿ 232 ¿ 265536 (a) The agents interface is wide. (b) The interface can be sufficiently explored. (c) The death is far away. (d) Most input/outputs do not occur. These limits are never used in proofs but ... ... we are only interested in theorems which do not degenerate under the above limits. Marcus Hutter - 13 - Universal Artificial Intelligence Induction = Predicting the Future Extrapolate past observations to the future, but how can we know something about the future? Philosophical Dilemma: ² Hume’s negation of Induction. ² Epicurus’ principle of multiple explanations. ² Ockhams’ razor (simplicity) principle. ² Bayes’ rule for conditional probabilities. Given sequence x1:::xk¡1 what is the next letter xk? If the true prior probability ¹(x1:::xn) is known, and we want to minimize the number of prediction errors, then the optimal scheme is to predict the xk with highest conditional ¹ probability, i.e. maxarg ¹(x y ). yk <k k Solomonoff solved the problem of unknown prior ¹ by introducing a universal probability distribution » related to Kolmogorov Complexity. Marcus Hutter - 14 - Universal Artificial Intelligence Algorithmic Complexity Theory The Kolmogorov Complexity of a string x is the length of the shortest (prefix) program producing x. K(x) := minfl(p): U(p) = xg ;U = univ.TM p The universal semimeasure is the probability that output of U starts with x when the input is provided with fair coin flips X »(x) := 2¡l(p) [Solomonoff 64] p : U(p)=x¤ Universality property of »: » dominates every computable probability distribution £ »(x) ¸ 2¡K(½) ¢½(x) 8½ Furthermore, the ¹ expected squared distance sum between » and ¹ is finite for computable ¹ X1 X 2 + ¹(x<k)(»(x<kxk) ¡ ¹(x<kxk)) < ln 2 ¢ K(¹) k=1 x1:k [Solomonoff 78] for binary alphabet Marcus Hutter - 15 - Universal Artificial Intelligence Universal Sequence Prediction n!1 ) »(x<nxn) ¡! ¹(x<nxn) with ¹ probability 1. ) Replacing ¹ by » might not introduce many additional prediction errors. General scheme: Predict xk with prob. ½(x<kxk). This includes deterministic predictors as well. Number of expected prediction errors: P P E := n ¹(x )(1¡½(x x )) n½ k=1 x1:k 1:k <k k Θ» predicts xk of maximal »(x<kxk). ¡1=2 n!1 EnΘ» =En½ · 1 + O(En½ ) ¡! 1 [Hutter 99] Θ» is asymptotically optimal with rapid convergence. For every (passive) game of chance for which there exists a winning strategy, you can make money by using Θ» even if you don’t know the underlying probabilistic process/algorithm. Θ» finds and exploits every regularity. Marcus Hutter - 16 - Universal Artificial Intelligence Definition of the Universal AI» Model Universal AI = Universal Induction + Decision Theory Replace ¹AI in decision theory model AI¹ by an appropriate generalization of » . X ¡l(q) »(yx1:k) := 2 q:q(y1:k)=x1:k X X X y˙k = maxarg max ::: max y y yk k+1 mk xk xk+1 xmk (r(x )+ ::: +r(x ))¢»(y ˙x˙ yx ) k mk <k k:mk Functional form: ¹(q) ,! »(q):=2¡l(q). Bold Claim: AI» is the most intelligent environmental independent agent possible. Marcus Hutter - 17 - Universal Artificial Intelligence Universality of »AI The proof is analog as for sequence prediction. Inputs yk are pure spectators. £ ¡K(½) »(yx1:n) ¸ 2 ½(yx1:n) 8 chronological ½ Convergence of »AI to ¹AI yi are again pure spectators. To generalize SP case to arbitrary alphabet we need jXj jXj jXj jXj X X y X X (y ¡z )2 · y ln i for y = 1 ¸ z i i i z i i i=1 i=1 i i=1 i=1 AI n!1 AI ) » (yx<nyxn) ¡! ¹ (yx<nyxn) with ¹ prob. 1. Does replacing ¹AI with »AI lead to AI» system with asymptotically optimal behaviour with rapid convergence? This looks promising from the analogy with SP! Marcus Hutter - 18 - Universal Artificial Intelligence Value Bounds (Optimality of AI») Naive reward bound analogously to error bound for SP ¹ ¤ ¹ V1T (p ) ¸ V1T (p) ¡ o(:::) 8¹; p Problem class (set of environments) f¹0; ¹1g with Y =V = f0; 1g and rk = ±iy1 in environment ¹i violates reward bound. The first output y1 decides whether all future rk =1 or 0. Alternative: ¹ probability Dn¹»=n of suboptimal outputs of AI» different from AI¹ in the first n cycles tends to zero Xn Dn¹»=n ! 0 ;Dn¹» :=h 1¡± ¹ » i¹ y˙k ;y˙k k=1 This is a weak asymptotic convergence claim.

Load more