
Dynamic Social Choice with Evolving Preferences

David C. Parkes, School of Engineering and Applied Sciences, Harvard University, [email protected]
Ariel D. Procaccia, Computer Science Department, Carnegie Mellon University, [email protected]

Abstract

Social choice theory provides insights into a variety of collective decision making settings, but nowadays some of its tenets are challenged by internet environments, which call for dynamic decision making under constantly changing preferences. In this paper we model the problem via Markov decision processes (MDPs), where the states of the MDP coincide with preference profiles and a (deterministic, stationary) policy corresponds to a social choice function. We can therefore employ the axioms studied in the social choice literature as guidelines in the design of socially desirable policies. We present tractable algorithms that compute optimal policies under different prominent social choice constraints. Our machinery relies on techniques for exploiting symmetries and isomorphisms between MDPs.

1 Introduction

Social choice theory has its roots in the writings of the marquis de Condorcet and the chevalier de Borda, and over the centuries has evolved so that nowadays we have a comprehensive mathematical understanding of social decision making processes. However, social choice theory falls short in the context of today's online communities. The internet and its myriad of applications has created a need for fast-paced, dynamic social decision making, which begs the question: is it possible to augment social choice theory to make it relevant for this new reality?

In our model of dynamic social decision making, a sequence of decisions must be made in the context of a population with constantly changing preferences, where the evolution of future preferences depends on past preferences and past decisions. As a running example, we consider online public policy advocacy groups, which are quickly gaining popularity and influence on the web and in social networks such as Facebook (via the application Causes); to be concrete we focus on MoveOn, which boasts more than five million members. Ideally the causes or issues that are advocated by MoveOn directly stem from the collective preferences of the members. A salient feature of MoveOn is that the time frame between deciding on a cause and acting on it is very short. Crucially, when a cause is chosen and advocated the preferences of the members will usually change, and this should have an impact on the next cause to be chosen. So, we are faced with a situation where both the current cause and the preferences of the members are constantly shifting. This calls for a consistent and socially desirable mechanism that sequentially selects the current cause given the current preferences of MoveOn members.

Our model. The common social choice setting concerns a set of agents (members, in the example) and a set of alternatives (causes or issues, in the example); the preferences of each agent are given by a ranking of the alternatives. A preference profile is a collection of the agents' preferences. The outcome is determined by a social choice function, which maps a given preference profile to the winning alternative. We introduce dynamic preferences into this static setting by representing the preferences of agents and their stochastic transitions using a social choice MDP. One of the virtues of this model is that a state of the MDP corresponds to a preference profile. In other words, a static snapshot of the social choice MDP at any given time reduces, in a sense, to the traditional social choice setting. As in the traditional setting, the set of actions available in each state (which trigger transitions) coincides with the set of alternatives, and indeed in our example members' opinions about a cause that is currently advocated – and hence their preferences – are likely to change.

We say that agents that have identical transition models share the same type. In our running example, we imagine MoveOn staffers categorizing their membership according to carefully selected features (e.g., "moderate" or "progressive") and eliciting the vector of features from each member. Each vector of features can then be associated with a transition model in an approximate way. Constructing the transition models is a nontrivial problem that is beyond the scope of this paper (more on this in Section 6).

A deterministic (stationary) policy in an MDP maps each state to the action taken in this state. The crucial insight, which will enable us to relate the dynamic setting to traditional social choice theory, is that we interpret a deterministic policy in a social choice MDP as a social choice function. Once this connection has been made, the usual axiomatic properties of social choice are imposed on policies and not just on decisions in specific states. Still, some axioms, such as Pareto optimality (in social choice: whenever all the agents prefer x to y, y would not be elected), have a local interpretation, in the sense that they can be interpreted state by state. In the case of Pareto optimality, if at any point the members all prefer one choice to another then the latter choice should not be made by the organization. But other axioms are nonlocal. For example, a policy is onto if every possible choice is selected by the policy in some state. Similarly, the requirement that a policy be nondictatorial rules out selecting a particular choice in a state only if it is the most-preferred choice of the same member who is also getting his most-preferred choice in every other state.

The final component of a social choice MDP model is a reward function, which associates a given action taken in a given state with a reward (for the designer); the goal is to optimize the infinite sum of discounted rewards. The existence of such a customizable objective is novel from a social choice perspective (despite much work on optimization in social choice, e.g., of latent utility functions (Boutilier et al. 2012)). Unfortunately, a policy that optimizes discounted rewards may not satisfy basic social choice properties, and hence may be undesirable as a social decision making mechanism. The key algorithmic challenge that we introduce is therefore:

    Given a social choice MDP, tractably compute an optimal deterministic policy subject to given social choice constraints.

Our results. Observe that the state space of a social choice MDP is huge; if there are n agents and m alternatives then its cardinality is (m!)^n. To make things manageable we assume in our algorithmic results that there is only a constant number of alternatives, i.e., m = O(1), and moreover that the number of types is also bounded by a constant. We wish to design algorithms that are polynomial time in n. We argue that these assumptions are consistent with our motivation: while the number of MoveOn members is in the millions, the number of causes is usually rather small, and the number of types is limited by the number of features, which must be individually elicited from each member.

As mentioned above, local social choice axioms restrict individual states to certain actions in a way that is independent of the actions selected for other states. A local axiom is anonymous if it is indifferent to the identities of the agents. Some of the most prominent social choice axioms are local and anonymous. Our main result is an algorithm that, given a social choice MDP, an anonymous reward function, and an anonymous local axiom, computes an optimal policy that satisfies the axiom in polynomial time in n. Turning to nonlocal axioms, we prove that, given a social choice MDP and an anonymous reward function, we can find an optimal policy that is onto or nondictatorial in polynomial time in n.

Related work. Our work is conceptually related to the literature on dynamic incentive mechanisms (see, e.g., Parkes et al. 2010 for an overview). Specifically, our model is inspired by models where the preferences of agents also evolve via an MDP (Cavallo, Parkes, and Singh 2006; Bergemann and Välimäki 2010). However, the focus of the work on dynamic incentive mechanisms is the design of monetary transfers to the agents in a way that incentivizes truthful reporting of preferences.

Dynamic preferences can also be found in papers on iterative voting (see, e.g., Meir et al. 2010), but this work is fundamentally different in that it fixes a social choice function and studies a strategic best response process; it is not the true preferences that are changing, but rather the reported preferences.

The problem of constructing optimal policies for constrained MDPs has received some attention (Altman 1999; Dolgov and Durfee 2005; 2006). Unfortunately the constraints in question usually have a specific structure; to the best of our knowledge the constraints considered in previous work are not sufficiently general to capture our social choice constraints, and moreover we insist on deterministic policies.

A recent paper by Boutilier and Procaccia (2012) builds on our model of dynamic social choice (which was publicly available earlier) to relate a notion of distances in social choice (Elkind, Faliszewski, and Slinko 2009) to an operational measure of social desirability. Very informally, they show that a social choice function selects alternatives that are closest to being consensus winners if and only if that social choice function is an optimal policy of a specific, arguably natural, social choice MDP. The conceptual implication is that alternatives that are close to consensus can become winners faster in a dynamic process.

2 Preliminaries

In this section we formally introduce basic notions from the MDP literature and social choice theory.

2.1 Markov Decision Processes

Below we briefly review the basics of Markov decision processes; the reader is referred to the book by Puterman (1994) for more details. A Markov decision process (MDP) is a 4-tuple M = (S, A, R, P) where S is a finite set of states; A is a finite set of actions; R : S × A → ℝ is a reward function, where for s ∈ S and a ∈ A, R(s, a) is the reward obtained when taking action a in state s; and P is the transition function, where P(s′ | s, a) is the probability of moving to state s′ when action a is taken in state s. Note that the transition function is Markovian in that it depends only on the current state and not on the history.

A deterministic policy is a function π : S → A, which prescribes which action π(s) is taken in state s ∈ S. This definition implicitly assumes that the policy is also stationary, that is, it does not depend on the history.

There are several variations to how the cumulative reward is calculated. We consider the most common approach where there is an infinite horizon and a discount factor γ ∈ [0, 1). Given an MDP M, a policy π is associated with a value function V_π : S → ℝ, where V_π(s) is the cumulative discounted reward that is obtained if the initial state is s ∈ S and the action prescribed by π is taken at each step. It is known that for any (unconstrained) MDP there is an optimal policy π* that is deterministic. Such an optimal policy can be found in polynomial time, e.g., by computing the optimal values via linear programming and then greedily assigning actions that achieve the maximum at every state.
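To make the computational side of these preliminaries concrete, the following minimal sketch (ours, not from the paper; the function name, the dictionary encodings of R and P, and the numeric defaults are all illustrative assumptions) computes an optimal deterministic stationary policy for a small finite MDP by value iteration followed by a greedy step. The paper's polynomial-time claim refers to solving for the optimal values by linear programming, but the greedy extraction of a deterministic policy from optimal values is the same.

```python
# Minimal value-iteration sketch for a finite MDP M = (S, A, R, P).
# Illustrative only; the paper's complexity claim refers to linear
# programming, but the final greedy step is identical.

def value_iteration(S, A, R, P, gamma=0.9, tol=1e-8):
    """S, A: lists of states/actions; R[(s, a)]: reward;
    P[(s, a)]: dict mapping next state -> probability."""
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            best = max(
                R[(s, a)] + gamma * sum(p * V[t] for t, p in P[(s, a)].items())
                for a in A
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedily assign an action achieving the maximum at every state:
    # this yields a deterministic stationary policy pi : S -> A.
    pi = {
        s: max(A, key=lambda a: R[(s, a)]
               + gamma * sum(p * V[t] for t, p in P[(s, a)].items()))
        for s in S
    }
    return V, pi
```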

Figure 1 (graphs omitted): (a) the two type transition models, type 1 (left) and type 2 (right); (b) the transition model of the MDP. In this example there are three agents and two alternatives, A = {x, y}. There are two types and Θ̂ = {{1, 2}, {3}}, that is, agents 1 and 2 are of type 1 and agent 3 is of type 2. The transition models of the two types are illustrated in (a). We only show the transitions that are associated with action x. A node is labeled with the top preference of an agent, e.g., a node labeled by x corresponds to the preference x ≻ y. The transition model of the MDP is shown in (b), where again only the transitions that are associated with the action x are illustrated. A node is labeled by the top preferences of the three agents, e.g., xyx corresponds to the preference profile where x ≻_1 y, y ≻_2 x, and x ≻_3 y.

2.2 Social choice

Let N = {1, . . . , n} be a set of agents, and let A be a set of alternatives where |A| = m; we overload the notation A, as below the actions in our MDP coincide with the set of alternatives. Each agent is associated with strict linear preferences over the alternatives, that is, a strict ranking of the alternatives; x ≻_i y means that agent i prefers x to y. Let L = L(A) denote the set of strict linear preferences over A. A collection ≻⃗ = (≻_1, . . . , ≻_n) ∈ L^n of the agents' preferences is called a preference profile. A social choice function is a function f : L^n → A that designates a winning alternative given a preference profile. Given ≻ ∈ L we denote by top(≻) the alternative that is most preferred in ≻.

A prominent approach in social choice theory compares social choice functions based on their axiomatic properties. Two properties are considered absolutely essential and are satisfied by all commonly studied social choice functions. A social choice function f is onto if for every a ∈ A there exists ≻⃗ ∈ L^n such that f(≻⃗) = a, that is, every alternative can be elected. A social choice function f is dictatorial if there exists an agent i ∈ N such that for every ≻⃗ ∈ L^n, f(≻⃗) = top(≻_i); f is nondictatorial if there is no such agent.

Below we define some other prominent axioms; the first two axioms provide a notion of consensus, and require that a social choice function elect an alternative when this notion is satisfied. We say that a* ∈ A is a Condorcet winner in ≻⃗ if for all a ∈ A \ {a*}, |{i ∈ N : a* ≻_i a}| > n/2, that is, a majority of agents prefer a* to any other alternative. A social choice function f is Condorcet-consistent if f(≻⃗) = a* whenever a* is a Condorcet winner in ≻⃗; it is unanimous if for every ≻⃗ ∈ L^n such that top(≻_i) = a* for every i ∈ N, f(≻⃗) = a*, i.e., it always elects an alternative that is ranked first by all the agents. A related axiom is Pareto optimality, which requires that for every ≻⃗ ∈ L^n where x ≻_i y for all i ∈ N, f(≻⃗) ≠ y.

3 The Model

Let N = {1, . . . , n} be the set of agents and A be the set of alternatives; denote |A| = m. We presently describe the MDP M = (S, A, R, P) that we deal with. Specifically, we will describe S, A, and P, and leave the restrictions on R for later. The state transitions in our MDP are factored across agents and thus the MDP is a very special case of a factored MDP (Boutilier, Dearden, and Goldszmidt 1995).

The set of actions A of our MDP coincides with the set of alternatives (which is also denoted by A). For each agent i we have a random variable X_i, which takes values in L. The current value of X_i indicates the current preferences of agent i. A state s ∈ S defines a value ≻_i ∈ L for every variable X_i, i ∈ N. Therefore, each state is a preference profile, and the size of the state space is huge: (m!)^n. Given a state s ∈ S, we denote by s(i) the value of X_i in this state, that is, the preferences of agent i. Furthermore, given a permutation µ : N → N, we also denote by µ(s) the state such that s(i) = µ(s)(µ(i)) for all i ∈ N, that is, µ can also be seen as a permutation over states.

For each i ∈ N we have a transition model P_i, where P_i(≻′_i | ≻_i, a) is the probability of X_i taking the value ≻′_i when the current value is ≻_i ∈ L and the action a ∈ A is taken. Below we assume that there are only t possible transition models, where t ≤ n. We say that agents with the same transition model have the same type. We let Θ̂ be the partition of agents into types, where each θ ∈ Θ̂ is a set of agents with the same type. In the following we will find it useful to construct partitions that are more refined, and sometimes coarser, than this basic type-based partition.

We define the transition model of the MDP M by letting

    P(s′ | s, a) = ∏_{i ∈ N} P_i(s′(i) | s(i), a).    (1)

See Figure 1 for an illustration. Intuitively, at every step we elect an alternative, and this choice affects the preferences of all the agents. Note that the preference ranking of each agent transitions independently given the selection of a particular alternative.

Definition 3.1. An MDP M = (S, A, R, P), where S, A, and P are as above, is called a social choice MDP.
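To illustrate how Equation (1) can be evaluated under this state representation, here is a small hypothetical sketch (not from the paper): a state is a tuple of rankings, one per agent, each ranking listing the alternatives from most to least preferred, and the probability of a joint transition is the product of the per-agent transition probabilities. The toy type models below are assumptions loosely in the spirit of Figure 1.

```python
from itertools import permutations

# A state is a preference profile: a tuple of rankings, one per agent.
# Each ranking is a tuple of alternatives, most preferred first.
# Per-agent models are given as P_agent[i][(rank, a)] -> {next_rank: prob}.

def transition_prob(s, a, s_next, P_agent):
    """Equation (1): P(s'|s, a) = prod_i P_i(s'(i) | s(i), a)."""
    prob = 1.0
    for i, (rank, rank_next) in enumerate(zip(s, s_next)):
        prob *= P_agent[i][(rank, a)].get(rank_next, 0.0)
    return prob

# Tiny instance with A = {x, y} and three agents (hypothetical numbers).
A = ('x', 'y')
rankings = list(permutations(A))            # [('x','y'), ('y','x')]
x_top, y_top = rankings[0], rankings[1]

# Type 1 keeps x on top under action 'x'; type 2 flips with probability 0.5.
type1 = {(r, 'x'): {x_top: 1.0} for r in rankings}
type2 = {(r, 'x'): {x_top: 0.5, y_top: 0.5} for r in rankings}
P_agent = {0: type1, 1: type1, 2: type2}    # agents 1, 2 of type 1; agent 3 of type 2

s = (x_top, y_top, x_top)                   # the profile "xyx"
print(transition_prob(s, 'x', (x_top, x_top, x_top), P_agent))   # 0.5
```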
4 Symmetries and Local Axioms

Having defined the social choice MDP, and in particular with states corresponding to preference profiles, we interpret a deterministic policy π as a social choice function. A policy defines a mapping from preference profiles to alternatives. We can therefore seek policies that satisfy the traditional social choice axioms; this is indeed the focus of our technical results.

We ask whether it is possible to efficiently compute an optimal policy despite the large number of states. We will show that we can provide positive answers by exploiting symmetries between states. In particular, we show that the concepts of symmetry are sufficiently general that they facilitate the efficient computation of optimal policies that satisfy what we call anonymous local axioms.

4.1 Exploiting symmetries

The symmetries in a social choice MDP stem from the identical transition models associated with agents of the same type. Intuitively, rather than concerning ourselves with which ranking is currently held by each agent, it should be enough to keep track of how many agents of each type possess each ranking.

To make this precise we use a formalism introduced by Zinkevich and Balch (2001). Given an equivalence relation E over a set B, we denote by E(x) = {x′ ∈ B : (x, x′) ∈ E} the equivalence class of x ∈ B, and by Ē the set of equivalence classes of E; in particular E(x) ∈ Ē. Ē is a partition of the set B.

Definition 4.1. Let M = (S, A, R, P) be an MDP and let E be an equivalence relation over S.
1. R is symmetric with respect to E if for all (s, s′) ∈ E and all a ∈ A, R(s, a) = R(s′, a).
2. P is symmetric with respect to E if for all (s, s′) ∈ E, every a ∈ A, and every equivalence class S̄ ∈ Ē,
       ∑_{s′′ ∈ S̄} P(s′′ | s, a) = ∑_{s′′ ∈ S̄} P(s′′ | s′, a).
3. M is symmetric with respect to E if R and P are symmetric with respect to E.
4. A policy π is symmetric with respect to E if for all (s, s′) ∈ E, π(s) = π(s′).

Intuitively, symmetry of an MDP with respect to an equivalence relation requires, for every action and every equivalence class on states, that both the reward and the probability of transitioning to any particular equivalence class are independent of the exact state in the current equivalence class.

Zinkevich and Balch (2001) show that if M is symmetric with respect to E then there is an optimal deterministic policy that is identical on the equivalence classes of E, and thus symmetric with respect to E. This is useful because an optimal policy can be computed by contracting the state space of M, replacing each equivalence class in E with one state. More formally, the following lemma is an adaptation of (Zinkevich and Balch 2001, Theorem 2).

Lemma 4.2 (Zinkevich and Balch 2001). Let M = (S, A, R, P) be an MDP and let E be an equivalence relation over S. Assume that M is symmetric with respect to E. Then an optimal deterministic policy for M that is symmetric with respect to E can be computed in time that is polynomial in |Ē| and |A|.

In order to employ Lemma 4.2 we must construct an equivalence relation for which the social choice MDP is symmetric. For this, we define a partition Θ on agents that induces an equivalence relation E_Θ on states. As a special case, we will be interested in the partition Θ̂ of agents into types, but a formulation in terms of arbitrary partitions over the agents is useful for the result of Section 5.

Given a partition Θ of agents, we define E_Θ as follows. For all s, s′ ∈ S, (s, s′) ∈ E_Θ if and only if for all θ ∈ Θ and ≻ ∈ L, |{i ∈ θ : s(i) = ≻}| = |{i ∈ θ : s′(i) = ≻}|. Informally, two states s and s′ are equivalent given partition Θ on agents if one state can be obtained from the other by a permutation of the preferences of agents in the same subset θ ∈ Θ. For example, if Θ = {{1}, {2}, {3}} then all states are distinct. If Θ = {{1, 2}, {3}}, then any pair of states where agents 1 and 2 swap rankings are equivalent.

Note that for each θ ∈ Θ, each ≻ ∈ L, and any s ∈ S, |{i ∈ θ : s(i) = ≻}| ∈ {0, . . . , n}. This immediately implies the following lemma.[1]

Lemma 4.3. Let Θ be a partition of the agents such that |Θ| = k. Then |Ē_Θ| ≤ (n + 1)^{k(m!)}.

[1] The bound given in Lemma 4.3 is far from being tight, but it is sufficient for our purposes.

Thus, we see that when m is constant, the number of equivalence classes on states induced by a partition Θ of constant size k is polynomial in n.

Definition 4.4. Given a partition Θ of some (arbitrary) set, a partition Θ′ is called a refinement of Θ if for every θ′ ∈ Θ′ there exists θ ∈ Θ such that θ′ ⊆ θ.

We observe that refining the partition over the set of agents also refines the associated partition of states into equivalence classes. Formally:

Observation 4.5. Let Θ be a partition of agents, and let Θ′ be a refinement of Θ. Then E_{Θ′} is a refinement of E_Θ, where these are the associated partitions on states.

We wish to prove that a social choice MDP is symmetric with respect to equivalence classes induced by refinements of the partition of agents into types. The next lemma establishes this symmetry for the transition model.

Lemma 4.6. Let M = (S, A, R, P) be a social choice MDP, and let Θ̂ be the partition of the agents into types. Let Θ be a refinement of Θ̂. Then P is symmetric with respect to E_Θ.

Proof. Let s, s′ ∈ S such that (s, s′) ∈ E_Θ. Consider a permutation µ : N → N such that for every θ ∈ Θ and every i ∈ θ, µ(i) ∈ θ, and s′ = µ(s); µ is guaranteed to exist by the definition of E_Θ. Since Θ is a refinement of Θ̂, for every θ ∈ Θ and every i, j ∈ θ, P_i ≡ P_j, and in particular for every i ∈ N, P_i ≡ P_{µ(i)}. It follows from (1) that

    P(s′′ | s, a) = P(µ(s′′) | µ(s), a) = P(µ(s′′) | s′, a)

for all s′′ ∈ S. Therefore, for every a ∈ A and every equivalence class S̄ ∈ Ē_Θ,

    ∑_{s′′ ∈ S̄} P(s′′ | s, a) = ∑_{s′′ ∈ S̄} P(µ(s′′) | s′, a).    (2)

Next, recall that µ(i) ∈ θ for each θ ∈ Θ and i ∈ θ, and hence for every S̄ ∈ Ē_Θ and s′′ ∈ S̄ it holds that (s′′, µ(s′′)) ∈ E_Θ. We wish to claim that µ restricted to S̄ is a permutation on S̄. It is sufficient to prove that µ is one-to-one; this is true since if s′′_1 ≠ s′′_2 then there exists i ∈ N such that s′′_1(i) ≠ s′′_2(i), and therefore µ(s′′_1)(µ(i)) ≠ µ(s′′_2)(µ(i)). We conclude that

    ∑_{s′′ ∈ S̄} P(µ(s′′) | s′, a) = ∑_{s′′ ∈ S̄} P(s′′ | s′, a).    (3)

The combination of (2) and (3) yields

    ∑_{s′′ ∈ S̄} P(s′′ | s, a) = ∑_{s′′ ∈ S̄} P(s′′ | s′, a),

as desired.
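The contraction that Lemma 4.2 exploits can be made concrete by mapping each state to the signature counted in the definition of E_Θ: for every subset θ of the partition and every ranking, the number of agents in θ that currently hold that ranking. Two states are E_Θ-equivalent exactly when their signatures coincide, and the number of distinct signatures is the quantity bounded in Lemma 4.3. The snippet below is an illustrative sketch under these assumptions, not the paper's implementation.

```python
from collections import Counter
from itertools import permutations

def signature(s, partition, alternatives):
    """Map a state (tuple of rankings, one per agent) to its E_Theta class:
    for each theta in the partition and each ranking, count the agents in
    theta that currently hold that ranking."""
    rankings = list(permutations(alternatives))
    sig = []
    for theta in partition:                  # theta is a set of agent indices
        counts = Counter(s[i] for i in theta)
        sig.append(tuple(counts.get(r, 0) for r in rankings))
    return tuple(sig)

# With |Theta| = k and m alternatives, a signature is a vector of k * m!
# counts in {0, ..., n}, giving at most (n + 1)^(k * m!) classes (Lemma 4.3).

A = ('x', 'y')
partition = [{0, 1}, {2}]                    # two types, as in Figure 1
s1 = (('x', 'y'), ('y', 'x'), ('x', 'y'))    # profile xyx
s2 = (('y', 'x'), ('x', 'y'), ('x', 'y'))    # agents 1 and 2 swapped
print(signature(s1, partition, A) == signature(s2, partition, A))   # True
```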
So far we have imposed no restrictions on the reward function R. Below we will require it to be anonymous, that is, it should be symmetric with respect to the equivalence relation E_{N} induced by the coarsest partition of the agents, {N}. Equivalently, a reward function R is anonymous if for every permutation µ : N → N on the agents, all s ∈ S and all a ∈ A, R(s, a) = R(µ(s), a). This is a stronger restriction than what we need technically, but it seems quite natural.

Definition 4.7. Let M = (S, A, R, P) be a social choice MDP. A reward function R : S × A → ℝ is anonymous if it is symmetric with respect to E_{N}.

As an example of an anonymous reward function, consider a designer who wishes to reach a social consensus. As described in Section 2.2, three common notions of consensus are a Condorcet winner, a majority winner, which is ranked first by a majority of agents, and (the stronger notion of) a unanimous winner, that is, an alternative ranked first by all agents. The reward function can then be, e.g., of the form

    R(s, a) = 1 if a is a Condorcet winner in s, and R(s, a) = 0 otherwise.

The abovementioned notions of social consensus are indifferent to permuting the agents and hence lead to anonymous reward functions.

The following observation is a direct corollary of Observation 4.5 and the fact that any partition on agents is a refinement of {N}.

Observation 4.8. Let M = (S, A, R, P) be a social choice MDP. Let Θ be any agent partition, and let R : S × A → ℝ be an anonymous reward function. Then R is symmetric with respect to E_Θ.

To summarize, a social choice MDP with an anonymous reward function is symmetric with respect to refinements of E_Θ̂ (by Lemma 4.6 and Observation 4.8), and this suggests a method for computing an optimal deterministic policy that is polynomial in n by Lemma 4.2.
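As a sketch of the Condorcet-based anonymous reward just described (illustrative code, not the paper's), one can score a state-action pair by checking pairwise majorities; since only the counts of agents preferring one alternative to another matter, permuting the agents leaves the reward unchanged, which is exactly the anonymity of Definition 4.7.

```python
def is_condorcet_winner(a, s):
    """True if a beats every other alternative in a strict pairwise majority
    under the profile s (a tuple of rankings, one per agent)."""
    n = len(s)
    alternatives = set(s[0])
    for b in alternatives - {a}:
        prefer_a = sum(1 for rank in s if rank.index(a) < rank.index(b))
        if prefer_a <= n / 2:
            return False
    return True

def reward(s, a):
    """Anonymous reward: 1 if a is a Condorcet winner in s, else 0."""
    return 1.0 if is_condorcet_winner(a, s) else 0.0

# Example: under the profile (x > y, x > y, y > x), x is the Condorcet winner.
s = (('x', 'y'), ('x', 'y'), ('y', 'x'))
print(reward(s, 'x'), reward(s, 'y'))   # 1.0 0.0
```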
We want to prove a more general result though, which will also enable the restriction of the actions available in some states. The purpose of handling restrictions is twofold. First, it will directly facilitate the computation of optimal policies that are consistent with certain local axioms. Second, it is crucial for the result of Section 5 that addresses some nonlocal axioms.

Definition 4.9. Let M = (S, A, R, P) be a social choice MDP.
1. A restriction is a subset Ψ ⊆ S × A. A deterministic policy π is Ψ-consistent if for every s ∈ S, π(s) ∈ {a ∈ A : (s, a) ∈ Ψ}.
2. Let Θ be a partition of the agents. A restriction Ψ is symmetric with respect to the induced equivalence relation E_Θ on states if for all (s, s′) ∈ E_Θ and all a ∈ A, (s, a) ∈ Ψ ⇔ (s′, a) ∈ Ψ.

That is, a restriction to particular actions in particular states is symmetric with respect to an equivalence relation if the same restriction holds across all equivalent states. In some of the MDP literature a restriction is considered to be a component of the MDP, but we consider the MDPs and restrictions separately, as we see the restriction as an external constraint that is imposed on the MDP.

We are finally ready to present our main result. A short discussion is in order, though, regarding the parameters of the problem. As mentioned above, the size of the state space of a social choice MDP is (m!)^n. Even if m is constant, this is still exponential in n. In order to obtain running time that is tractable with respect to the number of agents n we also assume that the number of types is constant.

Theorem 4.10. Let M = (S, A, R, P) be a social choice MDP, let R be an anonymous reward function, let Θ̂ be the partition of the agents into types, let Θ be a refinement of Θ̂, and let Ψ ⊆ S × A be symmetric with respect to E_Θ. Furthermore, assume that |A| = O(1) and |Θ| = O(1). Then an optimal deterministic Ψ-consistent policy for M (which will be symmetric with respect to E_Θ) can be computed in polynomial time in the number of agents n.

Proof. We define R′ : S × A → ℝ such that[2]

    R′(s, a) = R(s, a) if (s, a) ∈ Ψ, and R′(s, a) = −∞ otherwise.

From the facts that R is symmetric with respect to E_Θ (by Observation 4.8) and Ψ is symmetric with respect to E_Θ (by assumption), it follows that R′ is also symmetric with respect to E_Θ. By Lemma 4.6, P is symmetric with respect to E_Θ. Let M′ = (S, A, R′, P); we have that M′ is symmetric with respect to E_Θ. By Lemma 4.2 we can find an optimal policy for M′ in time polynomial in |A| = O(1) and |Ē_Θ|. By Lemma 4.3, using |A| = O(1) and |Θ| = O(1), we have that |Ē_Θ| ≤ (n + 1)^{O(1)}, therefore an optimal deterministic policy for M′ can be computed in polynomial time. The definition of R′ simply rules out the use of state-action pairs that are not in Ψ, hence this policy is an optimal deterministic Ψ-consistent policy for M.

[2] It is possible to replace −∞ with a sufficiently negative constant, e.g., −(1/(1 − γ)) max_{s,a} R(s, a).
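The construction in the proof of Theorem 4.10 can be sketched directly: replace the reward on forbidden state-action pairs by a prohibitively negative constant (footnote 2's finite substitute for −∞) and solve the resulting MDP M′ with any unconstrained solver, e.g., the value-iteration sketch earlier. The helper below is hypothetical and assumes nonnegative rewards, as in the Condorcet example.

```python
def restricted_reward(R, Psi, S, A, gamma):
    """R' from the proof of Theorem 4.10: keep R on pairs in the restriction
    Psi and assign a sufficiently negative constant elsewhere, so that no
    optimal policy of M' = (S, A, R', P) uses a pair outside Psi."""
    # Footnote 2: -(1/(1 - gamma)) * max_{s,a} R(s, a) can replace -infinity
    # (this sketch assumes rewards are nonnegative, e.g., the Condorcet reward).
    penalty = -(1.0 / (1.0 - gamma)) * max(R[(s, a)] for s in S for a in A)
    return {(s, a): (R[(s, a)] if (s, a) in Psi else penalty)
            for s in S for a in A}
```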
Informally, a local axiom prescribes which actions are al- Conceptually though, we view Corollary 4.13 as a far lowed in each state; it is local in the sense that this does not more important result because it captures many of the most depend on the policy’s choices in other states. Anonymity prominent social choice axioms. Theorem 5.1 should be requires that the acceptability, or not, of an action does not seen as a proof of concept: computing optimal policies sub- depend on the names of agents. For example, Condorcet- ject to some natural nonlocal constraints is tractable (albeit consistency is a local axiom since it simply restricts the set quite difficult). It remains open whether a similar result of actions (to a singleton) in every state where a Condorcet can be obtained for other prominent nonlocal axioms, e.g., winner exists. Condorcet-consistency is also anonymous: if monotonicity (pushing an alternative upwards can only help an alternative is a Condorcet winner, it would remain one it), anonymity (symmetry with respect to agents), and neu- for any permutation of preferences amongst agents. More trality (symmetry with respect to alternatives). generally, we have the following observation. Observation 4.12. The following axioms are local and 6 Discussion anonymous: Condorcet consistency, Pareto-optimality, and We have not addressed the challenge of constructing appro- unanimity.4 priate transition models for the agents; rather these transition Now the following result is an immediate corollary of models are assumed as input to our algorithms. There is a Theorem 4.10. variety of techniques that may be suitable, ranging from au- tomated methods such as (see, e.g., Craw- Corollary 4.13. Let M = (S, A, R, P ) be a social choice ford and Veloso 2005) and hidden Markov models, to mar- MDP, let R be an anonymous reward function, let Θb be the keting and psychological approaches. We aim to provide the partition of the agents into types, and let Ψ be the restric- first principled approach to decision making in environments tion that represents an anonymous local axiom. Further- where currently one is not available. Even a coarse partition more, assume that |A| = O(1) and |Θb| = t = O(1). Then of the agents into a few types, and subsequent application an optimal deterministic Ψ-consistent policy for M can be of our algorithmic results, would be an improvement over computed in polynomial time in the number of agents n. the status quo in organizations like MoveOn. Over time this initial partition can be gradually refined to yield better and 5 Nonlocal Axioms better approximations of reality. Many other challenges remain open, e.g., dealing with The axioms that we were able to handle in Section 4 are lo- dynamically arriving and departing agents, dropping the as- cal, in the sense that the restrictions on the available actions sumptions on the number of alternatives and types, and tack- in a state are not conditional on the actions taken in other ling prominent nonlocal axioms. In addition to these techni- 3Local axioms are also known as intraprofile conditions; see, cal challenges, we hope that the conceptual connection that e.g., (Roberts 1980). we have made between social choice and MDPs — and be- 4There are additional axioms that are local and anonymous, tween seemingly unrelated strands of AI research — will e.g., Smith consistency, mutual majority consistency, and invari- spark a vigorous discussion around these topics. 
5 Nonlocal Axioms

The axioms that we were able to handle in Section 4 are local, in the sense that the restrictions on the available actions in a state are not conditional on the actions taken in other states. However, many important axioms do not possess this property; e.g., neither ontoness nor nondictatorship is local. Analogously to Corollary 4.13, we can show:

Theorem 5.1. Let M = (S, A, R, P) be a social choice MDP, let R be an anonymous reward function, let Θ̂ be the partition of the agents into types, and let ŝ ∈ S. Furthermore, assume that m = |A| = O(1) and |Θ̂| = O(1). Then an optimal deterministic policy for M that is either onto or nondictatorial can be computed in polynomial time in the number of agents n.

The theorem's intricate proof is relegated to the full version of the paper.[5] On a high level, ontoness and nondictatorship stand out in that they admit a tractable guided search approach. For example, there are n dictatorial policies that must be ruled out in obtaining a nondictatorial policy. The technical challenge is to exclude policies through search without breaking the symmetries. To do this we must carefully refine our equivalence classes, without causing an exponential growth in their number. Our algorithms therefore amount to a balancing act, where we refine the equivalence relations on states while avoiding very detailed symmetries that lead to a bad running time.

[5] Available from: http://www.cs.cmu.edu/~arielpro/papers.html.

Conceptually though, we view Corollary 4.13 as a far more important result because it captures many of the most prominent social choice axioms. Theorem 5.1 should be seen as a proof of concept: computing optimal policies subject to some natural nonlocal constraints is tractable (albeit quite difficult). It remains open whether a similar result can be obtained for other prominent nonlocal axioms, e.g., monotonicity (pushing an alternative upwards can only help it), anonymity (symmetry with respect to agents), and neutrality (symmetry with respect to alternatives).

6 Discussion

We have not addressed the challenge of constructing appropriate transition models for the agents; rather, these transition models are assumed as input to our algorithms. There is a variety of techniques that may be suitable, ranging from automated methods such as preference learning (see, e.g., Crawford and Veloso 2005) and hidden Markov models, to marketing and psychological approaches. We aim to provide the first principled approach to decision making in environments where currently one is not available. Even a coarse partition of the agents into a few types, and subsequent application of our algorithmic results, would be an improvement over the status quo in organizations like MoveOn. Over time this initial partition can be gradually refined to yield better and better approximations of reality.

Many other challenges remain open, e.g., dealing with dynamically arriving and departing agents, dropping the assumptions on the number of alternatives and types, and tackling prominent nonlocal axioms. In addition to these technical challenges, we hope that the conceptual connection that we have made between social choice and MDPs – and between seemingly unrelated strands of AI research – will spark a vigorous discussion around these topics.

References

Altman, E. 1999. Constrained Markov Decision Processes. Chapman and Hall.
Bergemann, D., and Välimäki, J. 2010. The dynamic pivot mechanism. Econometrica 78:771–789.
Boutilier, C., and Procaccia, A. D. 2012. A dynamic rationalization of distance rationalizability. In Proceedings of the 26th AAAI Conference on Artificial Intelligence (AAAI), 1278–1284.
Boutilier, C.; Caragiannis, I.; Haber, S.; Lu, T.; Procaccia, A. D.; and Sheffet, O. 2012. Optimal social choice functions: A utilitarian view. In Proceedings of the 13th ACM Conference on Electronic Commerce (EC), 197–214.
Boutilier, C.; Dearden, R.; and Goldszmidt, M. 1995. Exploiting structure in policy construction. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), 1104–1111.
Cavallo, R.; Parkes, D. C.; and Singh, S. 2006. Optimal coordinated planning amongst self-interested agents with private state. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI), 55–62.
Crawford, E., and Veloso, M. 2005. Learning dynamic preferences in multi-agent meeting scheduling. In Proceedings of the 5th IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT), 487–490.
Dolgov, D., and Durfee, E. 2005. Stationary deterministic policies for constrained MDPs with multiple rewards, costs, and discount factors. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), 1326–1331.
Dolgov, D., and Durfee, E. 2006. Resource allocation among agents with MDP-induced preferences. Journal of Artificial Intelligence Research 27:505–549.
Elkind, E.; Faliszewski, P.; and Slinko, A. 2009. On distance rationalizability of some voting rules. In Proceedings of the 12th Conference on Theoretical Aspects of Rationality and Knowledge (TARK), 108–117.
Meir, R.; Polukarov, M.; Rosenschein, J. S.; and Jennings, N. R. 2010. Convergence to equilibria in plurality voting. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI), 823–828.
Parkes, D. C.; Cavallo, R.; Constantin, F.; and Singh, S. 2010. Dynamic incentive mechanisms. AI Magazine 31(4):79–94.
Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley.
Roberts, K. W. S. 1980. Social choice theory: The single-profile and multi-profile approaches. Review of Economic Studies 47(2):441–450.
Tideman, N. 2006. Collective Decisions and Voting. Ashgate.
Zinkevich, M., and Balch, T. 2001. Symmetry in Markov decision processes and its implications for single agent and multiagent learning. In Proceedings of the 18th International Conference on Machine Learning (ICML), 632–640.