Learning from Collective Behavior
Michael Kearns
Computer and Information Science, University of Pennsylvania
[email protected]

Jennifer Wortman
Computer and Information Science, University of Pennsylvania
[email protected]
Abstract population after observing their interactions during a train- ing phase of polynomial length. We assume that each agent Inspired by longstanding lines of research in soci- i in a population of size N acts according to a fixed but un- ology and related fields, and by more recent large- known strategy ci drawn from a known class . A strategy C population human subject experiments on the In- probabilistically maps the current population state to the next ternet and the Web, we initiate a study of the com- state or action for that agent, and each agent’s strategy may putational issues in learning to model collective be different. As is common in much of the literature cited behavior from observed data. We define formal above, there may also be a network structure governing the models for efficient learning in such settings, and population interaction, in which case strategies may map the provide both general theory and specific learning local neighborhood state to next actions. algorithms for these models. Learning algorithms in our model are given training data of the population behavior, either as repeated finite-length trajectories from multiple initial states (an episodic model), 1 Introduction or in a single unbroken trajectory from a fixed start state (a no-reset model). In either case, they must efficiently (poly- Collective behavior in large populations has been a subject nomially) learn to accurately predict or simulate (properties of enduring interest in sociology and economics, and a more of) the future behavior of the same population. Our frame- recent topic in fields such as physics and computer science. 
work may be viewed as a computational model for learning There is consequently now an impressive literature on math- the dynamics of an unknown Markov process — more pre- ematical models for collective behavior in settings as di- cisely, a dynamic Bayes net — in which our primary interest verse as the diffusion of fads or innovation in social net- is in Markov processes inspired by simple models for social works [10, 1, 2, 18], voting behavior [10], housing choices behavior. and segregation [22], herding behaviors in financial mar- As a simple, concrete example of the kind of system we kets [27, 8], Hollywood trends [25, 24], critical mass phe- have in mind, consider a population in which each agent nomena in group activities [22], and many others. The ad- makes a series of choices from a fixed set over time (such as vent of the Internet and the Web have greatly increased the what restaurant to go to, or what political party to vote for). number of both controlled experiments [7, 17, 20, 21, 8] and Like many previously studied models, we consider agents open-ended systems (such as Wikipedia and many other in- who have a desire to behave like the rest of the population stances of “human peer-production”) that permit the logging (because they want to visit the popular restaurants, or want and analysis of detailed collective behavioral data. It is nat- to vote for “electable” candidates). On the other hand, each ural to ask if there are learning methods specifically tailored agent may also have different and unknown intrinsic prefer- to such models and data. ences over the choicesas well (based on cuisine and decor, or The mathematical models of the collective behavior liter- the actual policies of the candidates). We consider models in ature differ from one another in important details, such as the which each agent balances or integrates these two forces in extent to which individual agents are assumed to act accord- deciding how to behave at each step [12]. 
Our main question ing to traditional notions of rationality, but they generally is: Can a learning algorithm watching the collective behavior share the significant underlying assumption that each agent’s of such a population for a short period produce an accurate current behavior is entirely or largely determined by the re- model of their future choices? cent behavior of the other agents. Thus the collective behav- The assumptions of our model fit nicely with the litera- ior is a social phenomenon, and the population evolves over ture cited in the first paragraph, much of which indeed pro- time according to its own internal dynamics — there is no poses simple stochastic models for how individual agents re- exogenous “Nature” being reacted to, or injecting shocks to act to the current population state. We emphasize from the the collective. outset the difference between our interests and those com- In this paper, we introduce a computational theory of mon in multiagent systems and learning in games. In those learning from collective behavior, in which the goal is to fields, it is often the case that the agents themselves are accurately model and predict the future behavior of a large acting according to complex and fairly general learning al- gorithms (such as Q-learning [26], no-regret learning [9], large) class, but is otherwise unknown. The learner’s ulti- fictitious play [3], and so on), and the central question is mate goal is not to discover each individual agent strategy whether and when the population converges to particular, per se, but rather to make accurate predictions of the collec- “nice” states (such as Nash or correlated equilibria). In con- tive behavior in novel situations. 
trast, while the agent strategies we consider are certainly “adaptive” in a reactive sense, they are much simpler than 2.1 Agent Strategies and Collective Trajectories general-purpose learning algorithms, and we are interested We now describe the main components of our framework: in learning algorithms that model the full collective behavior no matter what its properties; there is no special status given State Space. At each time step, each agent i is in some • either to particular states nor to any notion of convergence. state si chosen from a known, finite set of size K. S Thus our interest is not in learning by the agents themselves, We often think of K as being large, and thus want al- but at the higher level of an observer of the population. gorithms whose running time scales polynomially in K Our primary contributions are: and other parameters. We view si as the action taken by agent i in response to the recent population behavior. The introduction of a computational model for learning The joint action vector ~s N describes the current • from collective behavior. global state of the collective.∈ S The developmentof some general theory for this model, Initial State Distribution. We assume that the initial • including a polynomial-time reduction of learning from • population state ~s 0 is drawn according to a fixed but collective behavior to learning in more traditional, unknown distribution P over N . During training, the single-target I.I.D. settings, and a separation between learner is able to see trajectoriesS of the collective behav- efficient learnability in collective models in which the ior in which the initial state is drawn from P , and as in learner does and does not see all intermediate popula- many standard learning models, must generalize with tion states. respect to this same distribution. (We also consider a no-reset variant of our model in Section 2.3.) 
The definition of specific classes of agent strategies, • including variants of the “crowd affinity” strategies Agent Strategy Class. We assume that each agent’s • sketched above, and complementary “crowd aversion” strategy is drawn from a known class of (typically C classes. probabilistic) mappings from the recent collective be- havior into the agent’s next state or action in . We S Provably efficient algorithms for learning from collec- mainly consider the case in which ci probabilisti- • tive behavior for these same classes. cally maps the current global state ~s into∈ C agent i’s next state. However, much of the theory we develop ap- The outline of the paper is as follows. In Section 2, we plies equally well to more complex strategies that might introduce our main model for learning from collective be- incorporate a longer history of the collective behavior havior, and then discuss two natural variants. Section 3 in- on the current trajectory, or might depend on summary troduces and motivates a number of specific agent strategy statistics of that history. classes that are broadly inspired by earlier sociological mod- els, and provides brief simulations of the collective behaviors Given these components, we can now define what is meant they can generate. Section 4 provides a general reduction of by a collective trajectory. learning from collective behavior to a generalized PAC-style model for learning from I.I.D. data, which is used subse- Definition 1 Let ~c N be the vector of strategies for the ∈ C quently in Section 5, where we give provably efficient algo- N agents, P be the initial state distribution, and T 1 be an ≥ rithms for learning some of the strategy classes introduced in integer. A T -trajectory of ~c with respect to P is a random Section 3. Brief conclusions and topics for further research variable ~s 0, , ~s T in which the initial state ~s 0 N are given in Section 6. 
is drawnh according· · · toiP , and for each t 1, ,T∈, S the t t ∈ { · · · } component si of the joint state ~s is obtained by applying t−1 2 The Model the strategy ci to ~s . (Again, more generally we may also allow the strategies ci to depend on the full sequence In this section we describe a learning model in which the ~s 0,...,~s t−1, or on summary statistics of that history.) observed data is generated from observations of trajectories (defined shortly) of the collective behavior of N interacting Thus, a collective trajectory in our model is simply a agents. The key feature of the model is the fact that each Markovian sequence of states that factors according to the agent’s next state or action is always determined by the recent N agent strategies — that is, a dynamic Bayes net [19]. Our actions of the other agents, perhaps combined with some in- interest is in cases in which this Markov process is generated trinsic “preferences” or behaviors of the particular agent. As by particular models of social behavior, some of which are we shall see, we can view our model as one for learning cer- discussed in Section 3. tain kinds of factored Markov processes that are inspired by models common in sociology and related fields. 2.2 The Learning Model Each agent may follow a different and possibly proba- We now formally define the learning model we study. In bilistic strategy. We assume that the strategy followed by our model, learning algorithms are given access to an oracle each agent is constrained to lie in a known (and possibly (~c, P, T ) that returns a T -trajectory ~s 0, , ~s T of OEXP h · · · i ~c with respect to P . This is thus an episodic or reset model, behavior if there exists an algorithm A such that for any in which the learner has the luxury of repeatedly observing population size N 1, any ~c N , any time horizon T , the population behavior from random initial conditions. 
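As a concrete illustration of Definition 1 (this code is our own minimal sketch, not part of the paper; the function and strategy names are ours), a T-trajectory can be drawn by sampling the initial joint state from P and then letting every agent apply its own strategy to the previous joint state:

```python
import random

def sample_trajectory(strategies, init_dist, T, rng=random.Random(0)):
    """Draw a T-trajectory: s^0 ~ P, then s^t_i = c_i(s^{t-1})."""
    s = init_dist(rng)                      # initial joint state s^0 ~ P
    traj = [tuple(s)]
    for _ in range(T):
        # each agent i applies its own (possibly probabilistic) strategy
        s = [c(s, rng) for c in strategies]
        traj.append(tuple(s))
    return traj

# Toy example: N = 3 agents over K = 2 actions; each agent copies the
# previous action of a uniformly random agent (a crude "follow the crowd").
def copy_random_agent(prev, rng):
    return prev[rng.randrange(len(prev))]

N, K, T = 3, 2, 5
P = lambda rng: [rng.randrange(K) for _ in range(N)]
traj = sample_trajectory([copy_random_agent] * N, P, T)
print(len(traj))  # T + 1 states, s^0 through s^T
```

Note that the trajectory factors exactly as the dynamic Bayes net described below: each component s^t_i depends only on the full previous state.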
It any distribution P over≥ N , and∈ any C ǫ> 0 and δ > 0, given S is most applicable in (partially) controlled, experimental set- access to the oracle EXP(~c, P, T ), algorithm A runs in time tings [7, 17, 20, 21, 8] where such “population resets” can polynomial in N, TO, dim( ), 1/ǫ, and 1/δ, and outputs a C be implemented or imposed. In Section 2.3 below we de- polynomial-time model Mˆ such that with probability at least fine a perhaps more broadly applicable variant of the model 1 δ, ε(QMˆ ,Q~c) ǫ. in which resets are not available; the algorithms we provide − ≤ can be adapted for this model as well (Section 5.3). We now discuss two reasonable variations on the model The goal of the learner is to find a generative model that we have presented. can efficiently produce trajectories from a distribution that is arbitrarily close to that generated by the true population. 2.3 A No-Reset Variant ˆ 0 Thus, let M(~s ,T ) be a (randomized) model output by a The model above assumes that learning algorithms are given 0 learning algorithm that takes as input a start state ~s and access to repeated, independent trajectories via the oracle time horizon T , and outputs a random T -trajectory, and let , which is analogous to the episodic setting of rein- ˆ OEXP QMˆ denote the distribution over trajectories generated by M forcement learning. As in that field, we may also wish to when the start state is distributed according to P . Similarly, consider an alternative “no-reset” model in which the learner let Q~c denote the distribution over trajectories generated by has access only to a single, unbroken trajectory of states gen- EXP(~c, P, T ). Then the goal of the learning algorithm is to erated by the Markov process. 
To do so we must formulate O find a model Mˆ making the distance ε(Q ,Q~c) between an alternative notion of generalization, since on the one hand, L1 Mˆ QMˆ and Q~c small, where the (distribution of the) initial state may quickly become ir- relevant as the collective behavior evolves, but on the other, ε(Q ,Q~c) Mˆ ≡ the state space is exponentially large and thus it is unrealistic 0 T 0 T to expect to model the dynamics from an arbitrary state in Q ˆ ( ~s , , ~s ) Q~c( ~s , , ~s ) . | M h · · · i − h · · · i | polynomial time. h~s 0,··· ,~s T i X One natural formulationallows the learner to observe any A couple of remarks are in order here. First, note that polynomially long prefix of a trajectory of states for training, we have defined the output of the learning algorithm to be and then to announce its readiness for the test phase. If ~s is a “black box” that simply produces trajectories from initial the final state of the training prefix, we can simply ask that states. Of course, it would be natural to expect that this black the learner output a model Mˆ that generates accurate T -step box operates by having good approximations to every agent trajectories forward from the current state ~s. In other words, strategy in ~c, and using collective simulations of these to pro- Mˆ should generate trajectories from a distribution close to duce trajectories, but we choose to define the output Mˆ in a the distribution over T -step trajectories that would be gener- more general way since there may be other approaches. Sec- ated if each agent continued choosing actions according to ond, we note that our learning criteria is both strong (see his strategy. The length of the training prefix is allowed to be below for a discussion of weaker alternatives) and useful, polynomial in T and the other parameters. 
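Written out, the L1 distance between the learned and true trajectory distributions is:

```latex
\varepsilon(Q_{\hat{M}}, Q_{\vec{c}}) \;\equiv\;
  \sum_{\langle \vec{s}^{\,0}, \dots, \vec{s}^{\,T} \rangle}
  \left| \, Q_{\hat{M}}\!\left(\langle \vec{s}^{\,0}, \dots, \vec{s}^{\,T} \rangle\right)
       - Q_{\vec{c}}\!\left(\langle \vec{s}^{\,0}, \dots, \vec{s}^{\,T} \rangle\right) \right|
```

where the sum ranges over all (exponentially many) length-(T+1) state sequences.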
in the sense that if ε(QMˆ ,Q~c) is smaller than ǫ, then we can While aspects of the general theory described below are sample Mˆ to obtain O(ǫ)-good approximations to the expec- particular to our main (episodic) model, we note here that the tation of any (bounded) function of trajectories. Thus, for in- algorithms we give for specific classes can in fact be adapted stance, we can use Mˆ to answer questions like “What is the to work in the no-reset model as well. Such extensions are expected number of agents playing the plurality action after discussed briefly in Section 5.3. T steps?” or “What is the probability the entire population is playing the same action after T steps?” (In Section 2.4 be- 2.4 Weaker Criteria for Learnability low we discuss a weaker model in which we care only about We have chosen to formulate learnability in our model us- one fixed outcome function.) ing a rather strong success criterion — namely, the ability to Our algorithmic results consider cases in which the agent (approximately) simulate the full dynamics of the unknown strategies may themselves already be rather rich, in which Markov process induced by the population strategy ~c. In or- case the learning algorithm should be permitted resources der to meet this strong criterion, we have also allowed the commensurate with this complexity. For example, the crowd learner access to a rather strong oracle, which returns all in- affinity models have a number of parameters that scales with termediate states of sampled trajectories. the number of actions K. More generally, we use dim( ) to There may be natural scenarios, however, in which we denote the complexity or dimension of ; in all of our imag-C are interested only in specific fixed properties of collective ined applications dim( ) is either the VCC dimension for de- behavior, and thus a weaker data source may suffice. 
For in- terministic classes, or one· of its generalizations to probabilis- stance, suppose we have a fixed, real-valued outcome func- tic classes (such as pseudo-dimension [11], fat-shattering di- tion F (~s T ) of final states (for instance, the fraction of agents mension [15], combinatorial dimension [11], etc.). playing the plurality action at time T ), with our goal being We are now ready to define our learning model. to simply learn a function G that maps initial states ~s 0 and a time horizon T to real values, and approximately minimizes Definition 2 Let be an agent strategy class over actions C 0 T . We say that is polynomially learnable from collective E 0 G(~s ,T ) E T [F (~s )] S C ~s ∼P − ~s where ~s T is a random variable that is the final state of a gate of h at time T = D, pairs of the form ~s 0, F (~s T ) T -trajectory of ~c from the initial state ~s 0. Clearly in such a provide exactly the same data as the PAC modelh for h underi model, while it certainly would suffice, there may be no need D, and thus must be equally hard. to directly learn a full dynamical model. It may be feasible For the polynomial learnability of from collective be- to satisfy this criterion without even observing intermediate havior, we note that is clearly PACC learnable, since it is states, but only seeing initial state and final outcome pairs just Boolean combinationsC of 1 or 2 inputs. In Section 4 ~s 0, F (~s T ) , closer to a traditional regression problem. we give a general reduction from collective learning of any h It is noti difficult to define simple agent strategy classes agent strategy class to PAC learning the class, thus giving the for which learning from only ~s 0, F (~s T ) pairs is provably claimed result. intractable, yet efficient learningh is possiblei in our model. This idea is formalized in Theorem 3 below. 
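In symbols, the weaker criterion of Section 2.4 asks only for a function G approximately minimizing

```latex
\mathbb{E}_{\vec{s}^{\,0} \sim P}
  \left[ \, \bigl| \, G(\vec{s}^{\,0}, T)
    - \mathbb{E}_{\vec{s}^{\,T}}\!\left[ F(\vec{s}^{\,T}) \right] \bigr| \, \right]
```

where \(\vec{s}^{\,T}\) is the final state of a T-trajectory of \(\vec{c}\) from the initial state \(\vec{s}^{\,0}\).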
Here the popu- Conversely, it is also not difficult to concoct cases in lation forms a rather powerful computational device map- which learning the full dynamics in our sense is intractable, ping initial states to final states. In particular, it can be but we can learn to approximate a specific outcome func- tion from only ~s 0, F (~s T ) pairs. Intuitively, if each agent thought of as a circuit of depth T with “gates” chosen from h i , with the only real constraint being that each layer of the strategy is very complex but the outcome function applied to circuitC is an identical sequence of N gates which are applied final states is sufficiently simple (e.g., constant), we cannot to the outputs of the previous layer. Intuitively, if only initial but do not need to model the full dynamics in order to learn states and final outcomes are provided to the learner, learn- to approximate the outcome. ing should be as difficult as a corresponding PAC-style prob- We note that there is an analogy here to the distinc- lem. On the other hand, by observing intermediate state vec- tion between direct and indirect approaches to reinforcement tors we can build arbitrarily accurate models for each agent, learning [16]. In the former, one learns a policy that is spe- which in turn allows us to accurately simulate the full dy- cific to a fixed reward function without learning a model of namical model. next-state dynamics; in the latter, at possibly greater cost, one learns an accurate dynamical model, which can in turn Theorem 3 Let be the class of 2-input AND and OR gates, be used to compute good policies for any reward function. and one-input NOTC gates. Then is polynomially learnable For the remainder of this paper, we focus on the model as we from collective behavior, but thereC exists a binary outcome formalized it in Definition 2, and leave for future work the function F such that learning an accurate mapping from investigation of such alternatives. 
start states ~s^0 to outcomes F(~s^T) without observing intermediate state data is intractable.

3 Social Strategy Classes
Proof: (Sketch) We first sketch the hardness construction. Before providing our general theory, including the reduction Let be any class of Boolean circuits (that is, with gates in from collective learning to I.I.D. learning, we first illustrate ) thatH is not polynomially learnable in the standard PAC and motivate the definitions so far with some concrete exam- model;C under standard cryptographic assumptions, such a ples of social strategy classes, some of which we analyze in class exists. Let D be a hard distribution for PAC learning detail in Section 5. . Let h be a Boolean circuit with R inputs, S gates, 3.1 Crowd Affinity: Mixture Strategies andH depth ∈D. H To embed the computation by h in a collective problem, we let N = R + S and T = D. We introduce The first class of agent strategies we discuss are meant to an agent for each of the R inputs to h, whose value after the model settings in which each individual wishes to balance initial state is set according to an arbitrary AND, OR, or NOT their intrinsic personal preferences with a desire to “follow gate. We additionally introduce one agent for every gate g the crowd.” We broadly refer to strategies of this type as in h. If a gate g in h takes as its inputs the outputs of gates crowd affinity strategies (in contrast to the crowd aversion g′ and g′′, then at each time step the agent corresponding to strategies discussed shortly), and examine a couple of natural g computes the corresponding function of the states of the variants. agents corresponding to g′ and g′′ at the previous time step. As a motivating example, imagine that there are K Finally, by convention we always have the Nth agent be the restaurants, and each week, every member of a population agent corresponding to the output gate of h, and define the chooses one of the restaurants in which to dine. On the one output function as F (~s) = sN . 
The distribution P over ini- hand, each agent has personal preferences over the restau- tial states of the N agents is identical to D on the R agents rants based on the cuisine, service, ambiance, and so on. On corresponding to the inputs of h, and arbitrary (e.g., inde- the other, each agent has some desire to go to the currently pendent and uniform) on the remaining S agents. “hot” restaurants — that is, where many or most other agents Despite the fact that this construction introduces a great have been recently. To model this setting, let be the set of N S deal of spurious computation (for instance, at the first time K restaurants, and suppose ~s is the population state ∈ S step, many or most gates may simply be computing Boolean vector indicating where each agent dined last week. We can functions of the random bits assigned to non-input agents), it summarize the population behavior by the vector or distribu- K is clear that if gate g is at depth d in h, then at time d in the tion f~ [0, 1] , where fa is the fraction of agents dining collective simulation of the agents, the corresponding agent in restaurant∈ a in ~s. Similarly, we might represent the per- has exactly the value computed by g under the inputs to h sonal preferences of a specific agent by another distribution K (which are distributed according to D). Because the outcome ~w [0, 1] in which wa represents the probability this agent function is the value of the agent corresponding to the output would∈ attend restaurant a in the absence of any information 100 100 20
Figure 1: Sample simulations of the (a) crowd affinity mixture model; (b) crowd affinity multiplicative model; (c) agent affinity model. Horizontal axis is population state; vertical axis is simulation time. See text for details. about what the population is doing. One natural way for the havior. Broadly speaking, it is such properties we would like agent to balance their preferences with the population behav- a learning algorithm to model and predict from sufficient ob- ior would be to choose a restaurant according to the mixture servations. distribution (1 α)f~ + α ~w for some agent-dependent mix- ture coefficient−α. Such models have been studied in the 3.2 Crowd Affinity: Multiplicative Strategies sociology literature [12] in the context of belief formation. One possible objection to the crowd affinity mixture strate- We are interested in collective systems in which every gies described above is that each agent can be viewed as agent i has some unknown preferences ~wi and mixture co- randomly choosing whether to entirely follow the population efficient αi, and in each week t chooses its next restaurant distribution (with probability 1 α)orto entirely follow their t − according to (1 αi)f~ + αi ~wi, which thus probabilisti- personal preferences (with probability α) at each time step. − cally yields the next population distribution f~t+1. How do A more realistic model might have each agent truly combine such systems behave? And how can we learn to model their the population behavior with their preferences at every step. macroscopic properties from only observed behavior, espe- Consider, for instance, how an American citizen might cially when the number of choices K is large? alter their anticipated presidential voting decision over time An illustration of the rich collective behavior that can al- in response to recent primary or polling news. 
If their first ready be generated from such simple strategies is shown in choice of candidate — say, an Independent or Libertarian Figure 1(a). Here we show a single but typical 1000-step candidate — appears over time to be “unelectable” in the simulation of collective behavior under this model, in which general election due to their inability to sway large num- N = 100 and each agent’s individual preference vector ~w bers of Democratic and Republican voters, a natural and typ- puts all of its weight on just one of 10 possible actions (rep- ical response is for the citizen to shift their intended vote to resented as colors); this action was selected independently at whichever of the front-runners they most prefer or least dis- random for each agent. All agents have an α value of just like. In other words, the low popularity of their first choice 0.01, and thus are selecting from the population distribution causes that choice to be dampened or eradicated; unlike the 99% of the time. Each row shows the population state at a mixture model above, where weight α is always given to per- given step, with time increasing down the horizontal axis of sonal preferences, here there may remain no weight on this the image. The initial state was chosen uniformly at random. candidate. One natural way of defining a general such class of It is interesting to note the dramatic difference between strategies is as follows. As above, let f~ [0, 1]K, where α = 0 (in which rapid convergence to a common color ∈ is certain) and this small value for α; despite the fact that fa is the fraction of agents dining in restaurant a in the current state ~s. Similar to the mixture strategies above, let almost all agents play the population distribution at every K ~wi [0, 1] be a vector of weights representing the intrinsic step, revolving horizontal waves of near-consensus to dif- ∈ ferent choices are present, with no final convergence in preferences of agent i over actions. 
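The crowd affinity mixture dynamics are straightforward to simulate. The sketch below is our own minimal Python illustration (not the authors' code), mirroring the Figure 1(a) setup: each agent's next action is drawn from the mixture (1 − α)·f + α·w, with each preference vector concentrated on a single random action and α = 0.01.

```python
import random

def mixture_step(state, prefs, alphas, K, rng):
    """One step of the crowd affinity mixture dynamics."""
    N = len(state)
    # empirical action distribution f of the current population state
    f = [0.0] * K
    for a in state:
        f[a] += 1.0 / N
    nxt = []
    for i in range(N):
        # agent i samples from the mixture (1 - alpha_i) f + alpha_i w_i
        p = [(1 - alphas[i]) * f[a] + alphas[i] * prefs[i][a] for a in range(K)]
        r, acc = rng.random(), 0.0
        choice = K - 1
        for a in range(K):
            acc += p[a]
            if r < acc:
                choice = a
                break
        nxt.append(choice)
    return nxt

rng = random.Random(1)
N, K = 100, 10
# each preference vector puts all weight on one random action (Figure 1(a))
prefs = [[0.0] * K for _ in range(N)]
for w in prefs:
    w[rng.randrange(K)] = 1.0
state = [rng.randrange(K) for _ in range(N)]
for _ in range(50):
    state = mixture_step(state, prefs, [0.01] * N, K, rng)
print(len(state))  # the population size is preserved across steps
```

Running this for many steps and plotting the state rows reproduces the qualitative behavior described above: waves of near-consensus that never fully settle.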
Then define the prob- ability that agent i plays action a to be fa wi,a/Z(f,~ ~wi), sight. The slight “personalization” of population-only be- · havior is enough to dramatically change the collective be- where the normalizing factor is Z(f,~ ~wi) = fb wi,b. b∈S · P Thus, in such multiplicative crowd affinity models, the prob- For a fixed agent, such a strategy can be modeled by a ability the agent takes an action is always proportional to the weight vector ~w [0, 1]N , with one weight for each agent product of their preference for it and its current popularity. in the populationrather∈ than each action. We define the prob- Despite their similar motivation, the mixture and mul- ability that this agent takes action a if the current global state is N to be proportional to w . In this class of tiplicative crowd affinity strategies can lead to dramatically ~s i:si=a i different collective behavior. Perhaps the most obvious dif- strategies,∈ S the strength of the agent’s desire to take the same ference is that in the mixture case, if agent i has a strong action as agent i is determined byP how large the weight wi preference for action a there is always some minimum prob- is. The overall behavior of this agent is then probabilistically ability (αiwi,a) they take this action, whereas in the mul- determined by summing over all agents in the fashion above. tiplicative case even a strong preference can be eradicated In Figure 1(c), we show a single but typical simulation, from expression by small or zero values for the popularity again with N = 100 but now with a much shorter time hori- fa. zon of 200 steps and a much larger set of 100 actions. 
All In Figure 1(b), we again show a single but typical 1000- agents have random distributions as their preferences over step, N = 100 simulation for the multiplicative model other agents; this model is similar to traditional diffusion dy- in which agent’s individual preference distributions ~w are namics in a dense, random (weighted) network, and quickly chosen to be random normalized vectors over 10 actions. converges to global consensus. The dynamics are now quite different than for the additive We leave the analysis of this strategy class to future work, crowd affinity model. In particular, now there is never near- but remark that in the simple case in which K =2, learning consensus but a gradual dwindling of the colors represented this class is closely related to the problem of learning per- in the population — from the initial full diversity down to 3 ceptrons under certain noise models in which the intensity of colors remaining at approximately t = 100, until by t = 200 the noise increases with proximity to the separator [5, 4] and there is a stand-off in the population between red and light seems at least as difficult. green. Unlike the additive models, colorsdie out in the popu- lation permanently. There is also clear vertical structure cor- 3.5 Incorporating Network Structure responding to strong conditional preferences of the agents Many of the social models inspiring this work involve a net- once the stand-off emerges. work structure that dictates or restricts the interactions be- tween agents [18]. It is natural to ask if the strategy classes 3.3 Crowd Aversion and Other Variants discussed here can be extended to the scenario in which each agent is influenced only by his neighbors in a given network. It is easy to transform the mixture or multiplicative crowd Indeed, it is straightforward to extend each of the strategy affinity strategies into crowd aversion strategies — that is, in classes introduced in this section to a network setting. 
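The "inverse" distribution used for crowd aversion is a one-liner; here is a minimal sketch (our own illustration), assuming f is the current empirical action distribution:

```python
def invert_distribution(f):
    """Map popularity f to aversion weights g_a = (1 - f_a) / (K - 1)."""
    K = len(f)
    # the normalizer is sum_b (1 - f_b) = K - 1, since f sums to 1
    return [(1 - fa) / (K - 1) for fa in f]

g = invert_distribution([0.7, 0.2, 0.1])
print(g)  # the least popular action receives the most weight
```

Applying any of the affinity strategies to g rather than f yields the corresponding aversion strategy.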
For which agents wish to balance or combine their personal pref- example, to adapt the crowd affinity and aversion strategy erences with a desire to act differently than the population at classes, it suffices to redefine fa for each agent i to be the large. This can be accomplished in a variety of simple ways. fraction of agents in the local neighborhoodof agent i choos- For instance, if f~ is the current distributions over actions in ing action a. To adapt the agent affinity and aversion classes, the population, we can simply define a kind of “inverse” to it is necessary only to require that wj = 0 for every agent j the distribution by letting ga = (1 fa)/(K 1), where outside the local neighborhood of agent . By making these − − i K 1= (1 fb) is the normalizing factor, and apply- simple modifications, the learning algorithms discussed in − b∈S − ing the strategies above to ~g rather than f~. Now each agent Section 5 can immediately be applied to settings in which a P exhibits a tendency to “avoid the crowd”, moderated as be- network structure is given. fore by their own preferences. Of course, there is no reason to assume that the entire 4 A Reduction to I.I.D. Learning population is crowd-seeking, or crowd-avoiding; more gen- Since algorithms in our framework are attempting to learn to erally we would allow there to be both types of individuals model the dynamics of a factored Markov process in which present. Furthermore, we might entertain other transforms each component is known to lie in the class , it is natural of the population distribution than just g above. For in- a to investigate the relationship between learningC just a single stance, we might wish to still consider crowd affinity, but to strategy in and the entire Markovian dynamics. 
One main first “sharpen” the distribution by replacing each f with f 2 a a concern mightC be effects of dynamic instability — that is, and normalizing, then applying the models discussed above that even small errors in models for each of the compo- to the resulting vector. This has the effect of magnifying the N nents could be amplified exponentially in the overall popula- attraction to the most popular actions. In general our algo- tion model. rithmic results are robust to a wide range of such variations. In this section we show that this can be avoided. More precisely, we prove that if the component errors are all small 3.4 Agent Affinity and Aversion Strategies compared to 1/(NT )2, the population model also has small In the two versions of crowd affinity strategies discussed error. Thus fast rates of learning for individual compo- above, an agent has personal preferences over actions, and nents are polynomially preserved in the resulting population also reacts to the current population behavior, but only in an model. aggregate fashion. An alternative class of strategies that we To show this, we give a reduction showing that if a class call agent affinity strategies instead allows agents to prefer to of (possibly probabilistic) strategies is polynomially learn- agree (or disagree) in their choice with specific other agents. ableC (in a sense that we describe shortly) from I.I.D. data, then is also polynomially learnable from collective behav- 4.2 A General Reduction C ior. The key step in the reduction is the introduction of the Multiple analogs of the definition of learnability in the PAC experimental distribution, defined below. Intuitively, the ex- model have been proposed for distribution learning settings. 
perimental distribution is meant to capture the distribution The probabilistic concept model [15] presents a definition over states that are encountered in the collective setting over for learning conditional distributions over binary outcomes, repeated trials. Polynomial I.I.D. learning on this distribu- while later work [13] proposes a definition for learning un- tion leads to polynomial learning from the collective. conditional distributions over larger outcome spaces. We combine the two into a single PAC-style model for learn- 4.1 A Reduction for Deterministic Strategies ing conditional distributions over large outcome spaces from In order to illustrate some of the key ideas we use in the I.I.D. data as follows. more general reduction, we begin by examining the simple case in which the number of actions K = 2 and and each Definition 5 Let be a class of probabilistic mappings from strategy c is deterministic. We show that if is polyno- an input ~x toC an output y where is a finite set. We mially learnable∈C in the (distribution-free) PAC model,C then say that is∈polynomially X learnable∈ Y if thereY exists an algo- is polynomially learnable from collective behavior. C rithm A Csuch that for any c and any distribution D over In order to exploit the fact that is PAC learnable, it is , if A is given access to an∈C oracle producing pairs ~x, c(~x) first necessary to define a single distributionC over states on Xwith x distributed according to D, then for any ǫ,δh > 0, al-i which we would like to learn. gorithm A runs in time polynomial in 1/ǫ, 1/δ, and dim( ) and outputs a function cˆ such that with probability 1 δ, C − Definition 4 For any initial state distribution P , strategy vector ~c, and sequence length T , the experimental distribu- E~x∼D Pr(c(~x)= y) Pr(ˆc(~x)= y) ǫ . 
tion DP,~c,T is the distribution over state vectors ~s obtained | − | ≤ y∈Y by querying (~c, P, T ) to obtain ~s 0, , ~s T , choos- X OEXP h · · · i ing t uniformly at random from 0, ,T 1 , and setting We could have chosen instead to require that the expected t. { · · · − } ~s = ~s KL divergence between c and cˆ be bounded. Using Jensen’s inequality and Lemma 12.6.1 of Cover and Thomas [6], it We denote this distribution simply as D when P , ~c, and T is simple to show that if the expected KL divergence be- are clear from context. Given access to the oracle , we OEXP tween two distributions is bounded by ǫ, then the expected can sample pairs ~s, ci(~s) where ~s is distributed according h i 1 distance is bounded by 2 ln(2)ǫ. Thus any class that is to D using the following procedure: polynomiallyL learnable under this alternate definition is also polynomially learnable underp ours. 1. Query (~c, P, T ) to obtain ~s 0, , ~s T . OEXP h · · · i Theorem 6 For any class , if is polynomially learnable 2. Choose t 0, ,T 1 uniformly at random. ∈{ · · · − } according to Definition 5, thenC C is polynomially learnable from collective behavior. C 3. Return ~s t,s t+1 . h i i Proof: This proof is very similar in spirit to the proof of the If is polynomially learnable in the PAC model, then by reduction for deterministic case. However, several tricks are definition,C with access to the oracle , for any δ,ǫ > 0, EXP needed to deal with the fact that trajectories are now random it is possible to learn a model cˆ suchO that with probability i variables, even given a fixed start state. In particular, it is no 1 (δ/N), − longer the case that we can argue that starting at a given start ǫ state and executing a set of strategies that are “close to” the Pr~s D[ˆci(~s) = ci(~s)] ∼ 6 ≤ NT true strategy vector usually yields the same full trajectory we would have obtained by executing the true strategies of each in time polynomial in N, T , 1/ǫ, 1/δ, and the VC dimension agent. 
Instead, due to the inherent randomness in the strate- of using the sampling procedure above; the dependence C gies, we must argue that the distribution over trajectories is on N and T come from the fact that we are requesting a similar when the estimated strategies are sufficiently close to confidence of 1 (δ/N) and an accuracy of ǫ/(TN). We − the true strategies. can learn a set of such strategies cˆi for all agents i at the cost To make this argument, we begin by introducing the idea of an additional factor of N. of sampling from a distribution P1 using a “filtered” version Consider a new sequence ~s 0, , ~s T returned by the of a second distribution P2 as follows. First, draw an out- oracle . By the union bound,h · · with · probabilityi 1 δ, OEXP − come ω Ω according to P2. If P1(ω) P2(ω), output the probability that there exists any agent i and any t ω. Otherwise,∈ output ω with probability P≤(ω)/P (ω), and t t ∈ 1 2 0, ,T 1 , such that cˆi(~s ) = ci(~s ) is less than ǫ. with probability 1 P (ω)/P (ω), output an alternate action { · · · − } t6 t 1 2 If this is not the case (i.e., if cˆi(~s ) = ci(~s ) for all i and − drawn according to a third distribution P3, where t) then the same sequence of states would have been reached 0 if we had instead started at state ~s and generated each ad- P1(ω) P2(ω) t t t−1 P3(ω)= − ′ ′ ditional state ~s by letting s = ci(~s ). This implies that ′ ′ ′ i ω :P2(ω )
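The filtered sampling procedure used in the proof of Theorem 6 is easy to realize concretely. The sketch below is our own illustration over a finite outcome space (the function name and dictionary representation are ours, not the paper's): drawing from $P_2$ and rerouting the rejected mass through $P_3$ yields a sample whose law is exactly $P_1$.

```python
import random

def filtered_sample(p1, p2):
    """Draw one outcome with law p1 by 'filtering' a draw from p2.

    p1, p2: dicts mapping the same finite outcome space to probabilities.
    """
    outcomes = list(p2)
    # First, draw omega according to p2.
    omega = random.choices(outcomes, weights=[p2[o] for o in outcomes])[0]
    # If p1 puts at least as much mass on omega, keep it outright.
    if p1[omega] >= p2[omega]:
        return omega
    # Otherwise keep omega only with probability p1(omega)/p2(omega) ...
    if random.random() < p1[omega] / p2[omega]:
        return omega
    # ... and with the remaining probability redraw from p3, which is
    # supported exactly where p1 exceeds p2.
    deficit = {o: p1[o] - p2[o] for o in outcomes if p1[o] > p2[o]}
    return random.choices(list(deficit), weights=list(deficit.values()))[0]
```

A useful property of this coupling is that the output differs from the initial draw from $P_2$ only when that draw lands where $P_2$ exceeds $P_1$, which happens with probability equal to the total variation distance between the two distributions.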
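Similarly, the three-step procedure of Section 4.1 for sampling pairs $\langle \vec{s}^{\,t}, s_i^{t+1} \rangle$ from the experimental distribution can be written directly. In the sketch below, a toy deterministic simulator stands in for the oracle $\mathcal{O}_{EXP}$; the simulator, the majority strategy, and all names are our own illustrative assumptions.

```python
import random

def run_trajectory(strategies, init_state, T):
    """Toy stand-in for O_EXP: roll the population dynamics forward T steps.

    strategies: one function per agent mapping the full state tuple to that
    agent's next action; init_state: tuple of initial actions.
    """
    traj = [tuple(init_state)]
    for _ in range(T):
        s = traj[-1]
        traj.append(tuple(c(s) for c in strategies))
    return traj  # the sequence <s^0, ..., s^T>

def sample_pair(strategies, init_state, T, i):
    """Sample <s^t, s_i^{t+1}> with s^t drawn from the experimental distribution."""
    traj = run_trajectory(strategies, init_state, T)
    t = random.randrange(T)          # t uniform in {0, ..., T-1}
    return traj[t], traj[t + 1][i]   # observed state and agent i's next action

# Example: N = 3 crowd-following agents over actions {0, 1}.
def majority(s):
    return int(sum(s) > len(s) / 2)

strategies = [majority] * 3
state, action = sample_pair(strategies, (1, 0, 1), T=10, i=0)
```

Repeated calls to `sample_pair` yield the I.I.D. examples on which a standard PAC learner for the single-strategy class can then be run.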
5.2 Learning Crowd Affinity Multiplicative Models

In Section 3.2, we introduced the crowd affinity multiplicative model. In this model, strategies are parameterized only by a weight vector $\vec{w}$. The probability that agent i chooses action a is simply $f_a w_a / \sum_{b \in S} f_b w_b$. Although the motivation for this model is similar to that for the mixture model, the dynamics of the system are quite different (see the simulations and discussion in Section 3), and a very different algorithm is necessary to learn individual strategies. In this section, we show that this class is polynomially learnable from collective behavior, and sketch the corresponding learning algorithm.

The algorithm we present is based on a simple but powerful observation. In particular, consider the following random variable:

$\chi_a^m = 1/f_{m,a}$ if $f_{m,a} > 0$ and $a_m = a$, and $\chi_a^m = 0$ otherwise.

Suppose that for all m such that $\mathcal{I}_m \in \mathcal{M}$, it is the case that $f_{m,a} > 0$. Then by the definition of the strategy class and linearity of expectation,

$E\left[\sum_{m:\mathcal{I}_m\in\mathcal{M}} \chi_a^m\right] = \sum_{m:\mathcal{I}_m\in\mathcal{M}} \frac{1}{f_{m,a}} \cdot \frac{f_{m,a} w_a}{\sum_{s\in S} f_{m,s} w_s} = w_a \sum_{m:\mathcal{I}_m\in\mathcal{M}} \frac{1}{\sum_{s\in S} f_{m,s} w_s},$

where the expectation is over the randomness in the agent's strategy. Notice that this expression is the product of two terms: the weight $w_a$ itself, and a factor that is the same for every action a. Ratios of these expectations across actions therefore recover ratios of the underlying weights; the following lemma quantifies the rate of convergence of the empirical versions of these quantities.

Lemma 11  Suppose that $f_{m,a} > 0$ and $f_{m,b} > 0$ for all m such that $\mathcal{I}_m \in \mathcal{M}$, and let $\chi_a = \sum_{m:\mathcal{I}_m\in\mathcal{M}} \chi_a^m$ and $\chi_b = \sum_{m:\mathcal{I}_m\in\mathcal{M}} \chi_b^m$. Then for any $\delta > 0$, with probability $1-\delta$, for any $\beta > 0$, if $\chi_a \leq \beta\chi_b$ and $\chi_b \geq \frac{1}{|\mathcal{M}|}\sqrt{N\ln(2/\delta)/2}$, then

$\left|\frac{w_a}{w_b} - \frac{\chi_a}{\chi_b}\right| \leq \frac{(1+\beta)\sqrt{N\ln(2/\delta)}}{\sqrt{2}\,|\mathcal{M}| - \sqrt{N\ln(2/\delta)}}.$

If we are fortunate enough to have a sufficient number of data instances for which $f_{m,a} > 0$ for all $a \in S$, then this lemma supplies us with a way of approximating the ratios between all pairs of weights, and subsequently approximating the weights themselves. In general, however, this may not be the case. Luckily, it is possible to estimate the ratio of the weights of each pair of actions a and b that are used together frequently by the population, using only those data instances in which at least one agent is choosing each. Formally, define

$\mathcal{M}_{a,b} = \{\mathcal{I}_m \in \mathcal{M} : f_{m,a} > 0, f_{m,b} > 0\}.$

Lemma 11 tells us that if $\mathcal{M}_{a,b}$ is sufficiently large, and there is at least one instance $\mathcal{I}_m \in \mathcal{M}_{a,b}$ for which $a_m = b$, then we can approximate the ratio between $w_a$ and $w_b$ well.

What if one of these assumptions does not hold? If we are not able to collect sufficiently many instances in which $f_{m,a} > 0$ and $f_{m,b} > 0$, then standard uniform convergence results can be used to show that it is very unlikely that we see a new instance for which $f_a > 0$ and $f_b > 0$. This idea is formalized in the following lemma, the proof of which is in the appendix.

Lemma 12  For any $M' < |\mathcal{M}|$ and any $\delta \in (0,1)$, with probability $1-\delta$, the probability over a new draw $\vec{f} \sim D_f$ that there exist actions a and b with $f_a > 0$, $f_b > 0$, and $|\mathcal{M}_{a,b}| < M'$ vanishes as $|\mathcal{M}|$ grows, at a rate polynomial in K, $M'$, and $\ln(1/\delta)$.

Theorem 13  The class of crowd affinity multiplicative model strategies is polynomially learnable from collective behavior.
Acknowledgments

We thank Nina Balcan and Eyal Even-Dar for early discussions on models of social learning, and Duncan Watts for helpful conversations and pointers to relevant literature.

References

[1] S. Bikhchandani, D. Hirshleifer, and I. Welch. A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of Political Economy, 100:992–1026, 1992.

[2] S. Bikhchandani, D. Hirshleifer, and I. Welch. Learning from the behavior of others: Conformity, fads, and informational cascades. The Journal of Economic Perspectives, 12:151–170, 1998.

[3] G. W. Brown. Iterative solutions of games by fictitious play. In T. C. Koopmans, editor, Activity Analysis of Production and Allocation. Wiley, 1951.

[4] T. Bylander. Learning noisy linear threshold functions. Technical report, 1998.

[5] E. Cohen. Learning noisy perceptrons by a perceptron in polynomial time. In 38th IEEE Annual Symposium on Foundations of Computer Science, 1997.

[6] T. Cover and J. Thomas. Elements of Information Theory. John Wiley & Sons, New York, NY, 1991.

[7] P. Dodds, R. Muhamad, and D. J. Watts. An experimental study of search in global social networks. Science, 301:828–829, August 2003.

[8] M. Drehmann, J. Oechssler, and A. Roider. Herding and contrarian behavior in financial markets: An Internet experiment. American Economic Review, 95(5):1403–1426, 2005.

[18] J. Kleinberg. Cascading behavior in networks: Algorithmic and economic issues. In N. Nisan, T. Roughgarden, E. Tardos, and V. Vazirani, editors, Algorithmic Game Theory. Cambridge University Press, 2007.

[19] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 1995.

[20] M. Salganik, P. Dodds, and D. J. Watts. Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311(5762):854–856, 2006.

[21] M. Salganik and D. J. Watts. Social influence, manipulation, and self-fulfilling prophecies in cultural markets. Preprint, 2007.

[22] T. Schelling. Micromotives and Macrobehavior. Norton, New York, NY, 1978.

[23] R. Sutton and A. Barto. Reinforcement Learning. MIT Press, 1998.

[24] A. De Vany. Hollywood Economics: How Extreme Uncertainty Shapes the Film Industry. Routledge, London, 2004.

[25] A. De Vany and C. Lee. Quality signals in information cascades and the dynamics of the distribution of motion picture box office revenues. Journal of Economic Dynamics and Control, 25:593–614, 2001.

[26] C. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3):279–292, 1992.

[27] I. Welch. Herding among security analysts. Journal of Financial Economics, 58:369–396, 2000.
[9] D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 29:7–35, 1999.

A Proofs from Section 5.1

A.1 Proof of Lemma 8

For the first direction,

$\frac{u}{v} - \frac{\hat{u}}{\hat{v}} \leq \frac{u}{v} - \frac{u-\epsilon}{v+k\epsilon} = \frac{u(v+k\epsilon) - v(u-\epsilon)}{v(v+k\epsilon)} = \frac{uk\epsilon + \epsilon v}{v(v+k\epsilon)} = \frac{\epsilon(v+uk)}{v(v+\epsilon k)} \leq \frac{\epsilon(v+uk)}{v(v-\epsilon k)}.$

Similarly,

$\frac{\hat{u}}{\hat{v}} - \frac{u}{v} \leq \frac{u+\epsilon}{v-k\epsilon} - \frac{u}{v} = \frac{v(u+\epsilon) - u(v-k\epsilon)}{v(v-k\epsilon)} = \frac{\epsilon v + uk\epsilon}{v(v-k\epsilon)} = \frac{\epsilon(v+uk)}{v(v-\epsilon k)}.$

A.2 Proof of Lemma 7

We bound the error of these estimations in two parts. First, since from Equation 6 we know that for any $\delta_1 > 0$, with probability $1-\delta_1$,

$\left|\hat{\alpha}\sum_{m:\mathcal{I}_m\in\mathcal{M}} f_{m,a} - \alpha\sum_{m:\mathcal{I}_m\in\mathcal{M}} f_{m,a}\right| \leq \frac{1}{Z_{a^*}}\sqrt{M\ln(4/\delta_1)} \quad\text{and}\quad \left|(1-\hat{\alpha})M - (1-\alpha)M\right| \leq \frac{1}{Z_{a^*}}\sqrt{M\ln(4/\delta_1)},$

we have by Lemma 8 that for sufficiently large M,

$\left|\frac{\hat{\alpha}\sum_{m:\mathcal{I}_m\in\mathcal{M}} f_{m,a}}{(1-\hat{\alpha})M} - \frac{\alpha\sum_{m:\mathcal{I}_m\in\mathcal{M}} f_{m,a}}{(1-\alpha)M}\right| \leq \frac{((1-\hat{\alpha})Z_{a^*}/\sqrt{2} + 1)\sqrt{\ln(4/\delta_1)}}{Z_{a^*}(1-\hat{\alpha})^2\sqrt{M} - (1-\hat{\alpha})\sqrt{\ln(4/\delta_1)}}.$

Now, by Hoeffding's inequality and the union bound, for any $\delta_2 > 0$, with probability $1-\delta_2$, for all a,

$\left|\sum_{m:\mathcal{I}_m\in\mathcal{M}} I(a_m = a) - E\left[\sum_{m:\mathcal{I}_m\in\mathcal{M}} I(a_m = a)\right]\right| \leq \sqrt{M\ln(2K/\delta_2)/2}.$

Setting $\delta_2 = K\delta_1/2$, we can again apply Lemma 8 and obtain the analogous bound for the second estimate for sufficiently large M. Setting $\delta_1 = \delta/(1 + K/2)$ and applying the union bound yields the lemma.

A.3 Proof of Lemma 9

As long as M is sufficiently large, for any fixed $\vec{f}$,

$\sum_{a\in S}\left|(\alpha f_a + (1-\alpha)w_a) - (\hat{\alpha} f_a + (1-\hat{\alpha})\hat{w}_a)\right| \leq \sum_{a\in S}|\alpha-\hat{\alpha}| f_a + \sum_{a\in S}\left|(1-\alpha)w_a - (1-\hat{\alpha})\hat{w}_a\right|$
$\leq |\alpha-\hat{\alpha}| + \sum_{a\in S}\left(|\alpha-\hat{\alpha}|\,\hat{w}_a + |w_a-\hat{w}_a|(1-\hat{\alpha}) - |\alpha-\hat{\alpha}|\cdot|w_a-\hat{w}_a|\right)$
$\leq 2|\alpha-\hat{\alpha}| + (1-\hat{\alpha})\sum_{a\in S}|w_a-\hat{w}_a| - |\alpha-\hat{\alpha}|\sum_{a\in S}|w_a-\hat{w}_a|$
$\leq \frac{2\sqrt{\ln((4+2K)/\delta)}}{Z_{a^*}\sqrt{M}} + \min\left\{\frac{K(Z_{a^*}/\sqrt{2}+2)\sqrt{\ln((4+2K)/\delta)}}{Z_{a^*}(1-\hat{\alpha})\sqrt{M} - \sqrt{\ln((4+2K)/\delta)}},\; 2(1-\hat{\alpha})\right\}.$

Notice that this holds uniformly for all $\vec{f}$, so the same bound holds when we take an expectation over $\vec{f}$.

A.4 Handling the case where $Z_{a^*}$ is small

Suppose that for an action a, $Z_a < \epsilon$. Let $\eta_a$ and $\mu_a$ be the true median and mean, respectively, of the distribution from which the random variables $f_{m,a}$ are drawn. (Assume for now that it is never the case that $f_{m,a} = \eta_a$; this simplifies the explanation, although everything still holds if $f_{m,a}$ can be $\eta_a$.) Let $f_a^{high}$ be the mean of the distribution over $f_{m,a}$ conditioned on $f_{m,a} > \eta_a$, and similarly let $f_a^{low}$ be the mean conditioned on $f_{m,a} < \eta_a$, so $\mu_a = (f_a^{low} + f_a^{high})/2$. Let $\bar{f}_a^{high}$ be the empirical average of the $f_{m,a}$ conditioned on $f_{m,a} > \eta_a$, and $\bar{f}_a^{low}$ the empirical average conditioned on $f_{m,a} < \eta_a$. (Note that we cannot actually compute $\bar{f}_a^{high}$ and $\bar{f}_a^{low}$, since $\eta_a$ is unknown.) Finally, let $\hat{f}_a^{high} = (2/M)\sum_{m:\mathcal{I}_m\in\mathcal{M}_a^{high}} f_{m,a}$ and $\hat{f}_a^{low} = (2/M)\sum_{m:\mathcal{I}_m\in\mathcal{M}_a^{low}} f_{m,a}$, and notice that $Z_a = \hat{f}_a^{high} - \hat{f}_a^{low}$.

We show first that $f_a^{high}$ and $f_a^{low}$ are close to $\bar{f}_a^{high}$ and $\bar{f}_a^{low}$ respectively. Next we show that $\bar{f}_a^{high}$ and $\bar{f}_a^{low}$ are close to $\hat{f}_a^{high}$ and $\hat{f}_a^{low}$ respectively. Finally, we show that this implies that if $Z_a$ is small, then the probability that a random value of $f_{m,a}$ is far from $\eta_a$ is small; this in turn implies a small $L_1$ distance between our estimated model and the real model for each agent.

To bound the difference between $f_a^{high}$ and $\bar{f}_a^{high}$, it is first necessary to lower bound the number of points in the empirical sample with $f_{m,a} > \eta_a$. Let $z_m$ be a random variable that is 1 if $f_{m,a} > \eta_a$ and 0 otherwise. Clearly $\Pr(z_m = 1) = \Pr(z_m = 0) = 1/2$. By a straightforward application of Hoeffding's inequality, for any $\delta_3$, with probability $1-\delta_3$,

$\left|\sum_{m:\mathcal{I}_m\in\mathcal{M}} z_m - \frac{M}{2}\right| \leq \sqrt{\frac{\ln(2/\delta_3)M}{2}},$

and so the number of samples averaged to get $\bar{f}_a^{high}$ is at least $M/2 - \sqrt{\ln(2/\delta_3)M/2}$. Applying Hoeffding's inequality again, with probability $1-\delta_4$,

$\left|f_a^{high} - \bar{f}_a^{high}\right| \leq \sqrt{\frac{\ln(2/\delta_4)}{M - \sqrt{2\ln(2/\delta_3)M}}}.$

Now, $\hat{f}_a^{high}$ is an empirical average of $M/2$ values in $[0,1]$, while $\bar{f}_a^{high}$ is an empirical average of the same points, plus or minus up to $\sqrt{\ln(2/\delta_3)M/2}$ points. In the worst case, $\bar{f}_a^{high}$ either includes an additional $\sqrt{\ln(2/\delta_3)M/2}$ points with values lower than any point in $\mathcal{M}_a^{high}$, or excludes the $\sqrt{\ln(2/\delta_3)M/2}$ lowest-valued points in $\mathcal{M}_a^{high}$. Combining these bounds with the symmetric bounds for the low halves, this implies that in the worst case,

$f_a^{high} - f_a^{low} \leq \epsilon + 2\sqrt{\frac{\ln(2/\delta_4)}{M - \sqrt{2\ln(2/\delta_3)M}}} + \frac{2\sqrt{\ln(2/\delta_3)}}{\sqrt{M}/2 - \sqrt{\ln(2/\delta_3)}}.$

Call this quantity $\epsilon'$. Clearly we have

$\mu_a - \epsilon' \leq f_a^{low} \leq \mu_a \leq f_a^{high} \leq \mu_a + \epsilon'.$

Since $f_a^{high}$ is an average of points which are all higher than $f_a^{low}$, and similarly $f_a^{low}$ is an average of points all lower than $f_a^{high}$, this implies that for any $\tau \geq 0$,

$\Pr(|f_{m,a} - \mu_a| \geq \epsilon' + \tau) \leq \Pr(f_{m,a} \geq f_a^{high} + \tau) + \Pr(f_{m,a} \leq f_a^{low} - \tau) \leq \frac{\epsilon'}{\epsilon' + \tau}.$

Recall that when $Z_a$ is small for all a, we set $\hat{\alpha} = 0$ and $\hat{w}_a = \frac{1}{M}\sum_{m:\mathcal{I}_m\in\mathcal{M}} I(a_m = a)$. Let $\bar{w}_a = \alpha\mu_a + (1-\alpha)w_a$, and notice that $E[\hat{w}_a] = \bar{w}_a$. Applying Hoeffding's inequality yet again, with probability $1-\delta_5$, $|\hat{w}_a - \bar{w}_a| \leq \sqrt{\ln(2/\delta_5)/2M}$. For any $\tau > 0$,

$E_{\vec{f}\sim D_f}\left[\sum_{a\in S}\left|(\alpha f_a + (1-\alpha)w_a) - (\hat{\alpha} f_a + (1-\hat{\alpha})\hat{w}_a)\right|\right] = E_{\vec{f}\sim D_f}\left[\sum_{a\in S}|\alpha f_a + (1-\alpha)w_a - \hat{w}_a|\right]$
$\leq E_{\vec{f}\sim D_f}\left[\sum_{a\in S}\left(|\alpha f_a + (1-\alpha)w_a - \bar{w}_a| + |\bar{w}_a - \hat{w}_a|\right)\right]$
$\leq \alpha\sum_{a\in S} E_{\vec{f}\sim D_f}[|f_a - \mu_a|] + K\sqrt{\ln(2/\delta_5)/2M}$
$\leq K\left(1 - \frac{\epsilon'}{\epsilon'+\tau}\right)(\epsilon'+\tau) + K\frac{\epsilon'}{\epsilon'+\tau} + K\sqrt{\ln(2/\delta_5)/2M}$
$= K\left(\tau + \frac{\epsilon'}{\epsilon'+\tau} + \sqrt{\ln(2/\delta_5)/2M}\right).$

B Proofs from Section 5.2

B.1 Proof of Lemma 12

By the union bound,

$\Pr_{\vec{f}\sim D_f}\left[\exists a, b : f_a > 0, f_b > 0, |\mathcal{M}_{a,b}| < M'\right] \leq \sum_{a,b\,:\,|\mathcal{M}_{a,b}| < M'} \Pr_{\vec{f}\sim D_f}[f_a > 0, f_b > 0],$

and each term on the right-hand side can be bounded by standard uniform convergence arguments, since any pair (a, b) for which $|\mathcal{M}_{a,b}|$ is small can have been jointly active only rarely on the training data.

B.2 Bounding the $L_1$ Error

Here we show that with high probability (over the choice of $\mathcal{M}$ and the draw of $\vec{f}$), if $|\mathcal{M}|$ is sufficiently large,

$\sum_{a\in S}\left|\frac{w_a f_a}{\sum_{s\in S} w_s f_s} - \frac{\hat{w}_a f_a}{\sum_{s\in S}\hat{w}_s f_s}\right| \leq \frac{2(1+\beta)KN\sqrt{N\ln(2K/\delta)}}{\sqrt{2}M - (1+\beta)K(N+1)\sqrt{N\ln(2K/\delta)}}.$

First note that we can rewrite the expression on the left-hand side of the inequality above as

$\sum_{a\in S'}\left|\frac{w'_a f_a}{\sum_{s\in S'} w'_s f_s} - \frac{\hat{w}_a f_a}{\sum_{s\in S'}\hat{w}_s f_s}\right|,$

where for all a, $w'_a = w_a/(\sum_{s\in S'} w_s)$. We know from Equation 9 that with high probability (over the data set and the choice of $\vec{f}$), for all $a \in S'$,

$|w'_a f_a - \hat{w}_a f_a| \leq \frac{f_a(1+\beta)|S'|\sqrt{N\ln(2K^2/\delta)}}{\sqrt{2}M - (1+\beta)|S'|\sqrt{N\ln(2K^2/\delta)}},$

and

$\left|\sum_{s\in S'} w'_s f_s - \sum_{s\in S'}\hat{w}_s f_s\right| \leq \sum_{s\in S'} f_s|w'_s - \hat{w}_s| \leq \frac{(1+\beta)|S'|\sqrt{N\ln(2K^2/\delta)}}{\sqrt{2}M - (1+\beta)|S'|\sqrt{N\ln(2K^2/\delta)}}.$

Thus, using the fact that $\sum_{s\in S'}\hat{w}_s f_s \geq 1/N$, we can apply Lemma 8 once again to get

$\left|\frac{w'_a f_a}{\sum_{s\in S'} w'_s f_s} - \frac{\hat{w}_a f_a}{\sum_{s\in S'}\hat{w}_s f_s}\right| \leq \frac{(1+\beta)KN\sqrt{N\ln(2K/\delta)}\left(f_a + \sum_{s\in S'}\hat{w}_s f_s\right)}{\sqrt{2}M - (1+\beta)K(N+1)\sqrt{N\ln(2K/\delta)}},$

and, summing over $a \in S'$,

$\sum_{a\in S'}\left|\frac{w'_a f_a}{\sum_{s\in S'} w'_s f_s} - \frac{\hat{w}_a f_a}{\sum_{s\in S'}\hat{w}_s f_s}\right| \leq \frac{2(1+\beta)KN\sqrt{N\ln(2K/\delta)}}{\sqrt{2}M - (1+\beta)K(N+1)\sqrt{N\ln(2K/\delta)}}.$