Synchronous and Asynchronous Learning by Responsive Learning Automata

Eric J. Friedman
Department of Economics, Rutgers University, New Brunswick, NJ
friedman@fas-econ.rutgers.edu

Scott Shenker
Xerox PARC, Coyote Hill Road, Palo Alto, CA
shenker@parc.xerox.com

November

Abstract

We consider the ability of economic agents to learn in a decentralized environment in which agents do not know the stochastic payoff matrix and cannot observe their opponents' actions; they merely know, at each stage of the game, their own action and the resulting payoff. We discuss the requirements for learning in such an environment and show that a simple probabilistic learning algorithm satisfies two important optimizing properties: (i) when placed in an unknown, but eventually stationary, random environment, they converge in bounded time, in a sense we make precise, to strategies that maximize average payoff; (ii) they satisfy a monotonicity property related to the "law of effect", in which increasing the payoffs for a given strategy increases the probability of that strategy being played in the future. We then study how groups of such learners interact in a general game. We show that synchronous groups of these learners converge to the serially undominated set. In contrast, asynchronous groups do not necessarily converge to the serially undominated set, highlighting the importance of timing in decentralized learning. However, these asynchronous learners do converge to the serially unoverwhelmed set, which we define. Thus, the serially unoverwhelmed set can be viewed as an appropriate solution concept in decentralized settings.

Introduction

We consider learning in a repeated-game environment in which agents have extremely limited information. At each stage of the repeated game, each agent takes an action and then observes her resulting payoff. Agents do not know the payoff matrix, which may be stochastic and changing over time, nor can they directly observe the number of other agents or their actions. Because the agents have no a priori information about the game structure, they must learn the appropriate strategies by trial and error. To gain insight into the general nature of learning in such environments, we study the behavior of a simple probabilistic learning algorithm, the responsive learning automaton.

[Footnote 1] The degree of uncertainty about the environment is too complex for any kind of Bayesian or hyper-rational learning to be practical. This idea is also contained in Roth and Erev, and Sarin and Vahid, who require that in their models of learning players base their decisions only on what they observe and do not construct detailed descriptions of beliefs. We do not model such complexity issues formally; there have been many such proposals in the literature (see, e.g., Rubinstein, Friedman and Oren, Knoblauch), but no consensus on a reasonable measure of complexity.

[Footnote 2] Our model is based on ordinary Learning Automata, which were developed in biology (Tsetlin) and psychology (Norman) as models of learning, and have been studied extensively in engineering (Lakshmivarahan, Narendra and Thatcher). Recently they have also been used by Arthur as a model of human learning in experimental economics. Also, Borgers and Sarin have recently compared their behavior with an evolutionary model of learning.
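The responsive learning automaton itself is defined later in the paper; as a rough illustration of this class of learners, the sketch below shows an agent that keeps a mixed strategy over its own actions, samples an action each period, observes only the resulting payoff, and shifts probability toward actions that paid well while keeping every action's probability above a small floor. The specific update rule and the parameters `step` and `floor` are assumptions made for this sketch, not the exact responsive update rule analyzed in the paper.

```python
import random

class LearningAutomaton:
    """Minimal probabilistic learner: it sees only its own action and payoff."""

    def __init__(self, n_actions, step=0.1, floor=0.01):
        self.n = n_actions
        self.step = step      # how strongly one payoff moves the mixed strategy
        self.floor = floor    # every action stays playable, keeping the learner responsive
        self.p = [1.0 / n_actions] * n_actions

    def choose(self):
        """Sample an action from the current mixed strategy."""
        return random.choices(range(self.n), weights=self.p)[0]

    def update(self, action, payoff):
        """Shift probability toward `action` in proportion to its payoff in [0, 1]."""
        for a in range(self.n):
            if a == action:
                self.p[a] += self.step * payoff * (1.0 - self.p[a])
            else:
                self.p[a] -= self.step * payoff * self.p[a]
        # Re-impose the floor and renormalize; the floor keeps exploration
        # alive, which is what lets changes in the environment be detected.
        self.p = [max(q, self.floor) for q in self.p]
        total = sum(self.p)
        self.p = [q / total for q in self.p]
```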
This setting, with its extreme lack of a priori and observable information, is representative of many decentralized systems involving shared resources, where there is typically little or no knowledge of the game structure and the actions of other agents are not observable. Computer networks are one example of such a decentralized system, and learning there inherently involves experimentation. One can consider bandwidth usage from a game-theoretic perspective (see, e.g., Shenker), where the actions are transmission rates and the resulting payoffs are network delays; these delays depend on the transmission rates of all other agents and on the details of the network infrastructure. Network users typically do not know much, if anything, about the actions and preferences of other agents or about the nature of the underlying network infrastructure. Thus the agents in a computer network are completely unaware of the payoff structure and must rely on experimentation to determine the appropriate transmission rate. There is a large literature on such experimentation algorithms, called congestion control algorithms; see Jacobson or Lefelhocz et al. for a description of the algorithms used in practice, and Shenker for a more theoretical description.

Note that learning in this context differs significantly from the standard game-theoretic discussion of learning in repeated games, in which agents play best response to either the previous play (Cournot), some average of historical play (Carter and Maddock), or some complex Bayesian updating over priors of either the opponents' strategic behavior (Fudenberg and Kreps; Kalai and Lehrer) or the structure of the game (Jordan). The first three types of learning require the observability of the opponents' actions and knowledge of the payoff matrix, while the fourth will typically be impractical in the situations in which we are interested. Also, none of these allow for the possibility that the payoff matrix may change over time.

In our context, where agents do not even know their own payoff function, trial and error learning is necessary even if the play of other agents is static. Since such experimentation plays a central role, our discussion naturally combines the incentive issues normally addressed by game-theoretic discussions of learning with the stochastic sampling theory pivotal to the theory of decentralized control. This combination of issues is of great practical importance in the sharing of decentralized resources (e.g., computer networks, water and power systems), yet it has received rather little attention in the game-theoretic learning literature.

To better understand learning in such contexts, we address two basic questions: what properties should a reasonable learning algorithm have in this context, and how can one describe the set of asymptotic plays when all agents use a reasonable learning algorithm? We address these two questions in turn.

[Footnote 3] In theory, the set of priors could be enlarged to include the possible changes to the payoff matrix; however, this would enlarge the already large set of priors substantially. Such computations would clearly be beyond the ability of any real person or computer.

[Footnote 4] Kalai and Lehrer study learning in a similar setting in the context of Bayesian learning.

What properties should a reasonable learning algorithm have in this context?

One natural requirement is that, when playing in a stationary random environment (i.e., when the payoff for a given action in any period is drawn from a fixed probability distribution), the algorithm eventually learns to play the strategy or strategies with the highest average payoff. This is clearly an absolutely minimal requirement; it was studied by Hannan and, more recently, by Fudenberg and Levine, and is also focal in the decentralized control literature. In fact, we posit a stronger condition, which says that even in a nonstationary environment, an action which is always optimal in the sense of expected payoff should be learned.
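As a toy illustration of this minimal, stationary-environment requirement, the following simulation (reusing the hypothetical `LearningAutomaton` sketch from above, assumed to be in scope) places the learner in an environment where each action's payoff is drawn from a fixed distribution; its mixed strategy should concentrate, up to the probability floor, on the action with the highest average payoff. The payoff probabilities are invented for the illustration.

```python
import random

# Stationary random environment: action k pays 1 with a fixed probability, else 0.
payoff_prob = [0.2, 0.5, 0.8]                     # action 2 has the highest average payoff
env = lambda action: 1.0 if random.random() < payoff_prob[action] else 0.0

learner = LearningAutomaton(n_actions=3, step=0.05, floor=0.01)
for t in range(20_000):
    a = learner.choose()
    learner.update(a, env(a))

print(learner.p)   # nearly all probability mass should sit on action 2
```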
There is also a second natural requirement for learning algorithms. Recall that in our context players do not directly observe the payoff function, so if the game has changed, this can only be detected through their experimentation. Such changes in environment are quite common in many distributed control situations: in computer networks, the underlying payoff function changes every time a new user enters or the link topology changes (often due to crashes), both common occurrences. The agents are not explicitly notified that the payoff structure has changed, and thus the learning algorithm should automatically adapt to changes in the environment.

While many standard models of learning, such as the one introduced by Kalai and Lehrer, and almost all decentralized control algorithms can optimize against a stationary environment, they are typically not suitable for changing environments. For example, most Bayesian learning procedures (e.g., Kalai and Lehrer) and all absolutely expedient (Narendra and Thatcher) learning algorithms have the property that after a finite time they may discard strategies forever: at any time there is a nonzero probability that they will never play a particular strategy again. Typically, with high probability, these discarded strategies are not currently the optimal strategy; however, if the environment changes and the payoffs for some of these discarded strategies increase, the learner will not be able to detect and react to this change. Thus we conclude that such algorithms are ill-suited to changing environments.

[Footnote 5] Our approach has much in common with the experimental psychology literature and its recent application by Roth and Erev to model experiments on extensive form games.

[Footnote 6] In the simple model we consider, a stationary environment corresponds to i.i.d. actions by other agents and nature. Clearly, in other cases stationarity can be interpreted more generally. For example, in computer networks there are often cyclical variations, such as day vs. night; in this case stationarity could be defined with respect to the natural cycles. Similarly, simple Markovian effects could also be incorporated into the definition.

[Footnote 7] Note that in the Bayesian optimality setting (e.g., Kalai and Lehrer) such convergence may not occur with high probability if the agents' priors are sufficiently skewed.
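To illustrate the contrast drawn above between learners that may discard strategies forever and learners that keep exploring, the sketch below (again reusing the hypothetical `LearningAutomaton` from the earlier sketch) runs the same learner with and without a probability floor in an environment whose optimal action switches midway. With the floor, the temporarily inferior action keeps being sampled, so the learner can notice when it becomes optimal again; without the floor, that action's probability decays toward zero and the change may effectively never be detected. The scenario and parameter values are invented for illustration.

```python
def run(floor):
    learner = LearningAutomaton(n_actions=2, step=0.05, floor=floor)
    for t in range(40_000):
        # The environment changes at t = 20000: the optimal action switches from 0 to 1.
        best = 0 if t < 20_000 else 1
        a = learner.choose()
        payoff = 0.9 if a == best else 0.1
        learner.update(a, payoff)
    return learner.p

print("with floor:   ", run(0.01))  # should recover and favor action 1 after the switch
print("without floor:", run(0.0))   # action 1's probability may stay stuck near zero
```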