Coalition Formation Pro cesses with Belief
Revision among Bounded-Rational
Self-Interested Agents
Fernando Tohm e
Tuomas Sandholm
Email: ftohme, [email protected]
Department of Computer Science
Washington University
Campus Box 1045
One Bro okings Drive
St. Louis, MO 63130
Phone : 314 935-6160
Fax : 314 935-7302
Abstract
This pap er studies coalition formation among self-interested agents
that cannot make sidepayments. We show that -core stability reduces
to analyzing whether some utility pro le is maximal for all agents. We
also show that strategy pro les that lead to the -core are a subset
of Strong Nash equilibria. This fact carries our -core-based stability
results directly over to two other strategic solution concepts: Nash
equilibrium and Coalition-Pro of Nash equilibrium.
This material is based up on work supp orted by the National Science Foundation under
Tuomas Sandholm's CAREER Award IRI-9703122, Grant IRI-9610122, and Grant I IS-
9800994. Fernando Tohm ewas also supp orted by the Departamento de Econom a of
the Universidad Nacional del Sur, in Argentina and byaFellowship from the National
Research Council CONICET of Argentina. An early version of this pap er app eared in a
workshop [41] 1
The main fo cus of the pap er is to analyze the dynamic pro cess
of coalition formation by explicitly mo deling the costs of communica-
tion and delib eration. We describ e an algorithm for sequential action
choice where each agent greedily stepwise maximizes its payo given
its b eliefs. Conditions are derived under which this pro cess leads to
convergence of the agents' b eliefs and to a stable coalition structure.
We derive these results for the case where the length of the pro cess is
exogenously restricted as well as for the case where agents can cho ose
it.
Finally, we show that the outcome of any communica-
tion/delib eration pro cess that leads to a stable coalition structure is
Pareto-optimal for the original game which do es not incorp orate com-
munication or delib eration. Conversely,anyPareto-optimal outcome
can b e supp orted by a communication/delib eration pro cess that leads
to a stable coalition structure.
1 Intro duction
In manymultiagent settings, self-interested agents|e.g. representing real
world companies or individuals|can op erate more e ectively by forming
coalitions and co ordinating their activities within each coalition. Therefore,
ecient metho ds for coalition formation are of key imp ortance in multia-
gent systems. Coalition formation involves three activities: coalition struc-
1
ture generation partitioning the agents into disjoint coalitions , solving
each coalition's optimization problem within the coalition, and dividing
the value of each coalition among memb er agents in case of net cost, this
value may b e negative.
Coalition formation among self-interested agents has b een widely studied
in game theory [40, 11 , 2, 1 , 4, 9]. The main solution concepts are geared
toward payo division among agents in ways that guarantee forms of stabil-
ity of the coalition structure. Most of these solution concepts fo cus on the
nal solution, and usually do not address the dynamic pro cess that leads to
the solution. DAI work on coalition formation has intro duced proto cols for
dynamic coalition formation, but the role of strategies and the pro cess itself
1
This is the usual de nition of coalition structures in game theory.For a non-partitional
de nition see [38]. 2
have not b een included in the solution concept. Although the outcomes sat-
isfy di erent forms of stability, there is often no guarantee that the pro cess
itself is stable, i.e. that individual agents would b e b est o by adhering to
the imp osed strategies [37, 39 ]. Also, it is usually assumed that the agents
can compute the coalition values exactly [43, 37 , 39], but sometimes deter-
mining the value of a coalition involves solving an intractable combinatorial
optimization problem, e.g. solving how a coalition of dispatch centers should
route their p o oled vehicles to handle their p o oled delivery tasks. Some DAI
work has addressed the complexity of coalition value computation by explic-
itly incorp orating computational actions in the solution concept [32, 33 ]. This
allows one to game-theoretically trade o computation cost against solution
quality.However, that work do es not include proto cols for dynamic coalition
structure generation all coalition structures were exhaustively enumerated,
nor do es it address b elief revision.
In this pap er we will analyze coalition formation pro cesses from a norma-
tive p ersp ective. Each agent's decision to participate in a coalition dep ends
on strategic considerations since the parties are self-interested and evalu-
ate p ossible agreements based on the advantages they can get from their
memb ership to a given coalition. Since our approach is normative, it is nec-
essary to identify the motivations of each agent. Utilities constitute a natural
representation for the di erences among agents: utilities are numerical rep-
resentations of their di erent preferences.
In some multiagent systems agents can make sidepayments. On the other
hand, in many settings it is desirable to b e able to handle the interactions
without sidepayments. First, the agents might not have enough money,or
a secure mechanism for transferring money might not exist. Second, most in-
teraction proto cols mechanisms that use sidepayments are only guaranteed
2
to work if every agent's utility is linear in money, which is not often the case.
3
This pap er fo cuses on games where agents cannot make sidepayments.
4
Previous research has mostly fo cused on superadditive games [11 , 43 ].
Sup eradditivity means that any pair of coalitions is b est o by merging into
2
This is the case, for example, with the classic Clarke tax mechanism [6, 8, 10, 19]. See
also the discussion on mo dulo mechanisms in [20].
3
2-agent negotiations in MAS have already often b een analyzed in the setting where
agents cannot make sidepayments [27].
4
However, algorithms for coalition structure generation in non-sup eradditive settings
have also b een presented [34, 32, 37, 38, 39,13]. 3
one. Classically it is argued that almost all games are sup eradditive b ecause,
at worst, the agents in a comp osite coalition can use solutions that they
5
had when they were in separate coalitions. This pap er will also fo cus on
sup eradditive games. However it should b e p ointed out that not all games
need to b e sup eradditive. Sup eradditivitymay b e violated, for example, if
the setting has anti-cartel p enalties or co ordination overhead such as com-
munication costs or computation costs whichmay increase as the number
of agents in a coalition increases [32 , 33 ]. Since the agents studied in this
pap er are self-interested, the appropriate solution concept is one that em-
phasizes the stability of the coalition structure. This means trying to reach
agreements where no subgroup of agents is motivated to deviate from the
solution. The core is one of such solution concept [19]. The core is usually
used for games with transferable utility [40, 11 ] but it can b e extended to
games with nontransferable utility, for example using Aumann's notion of
the -core [2] which will b e used in this pap er as well.
Anovel asp ect of our approach is the role that b elief formation plays
in the coalition formation pro cess. This pro cess can b e characterized as a
sequence of delib eration computation and communication actions that the
agents take in the dynamic pro cess of coalition formation. Agents' incom-
plete information leads to standard solution concepts, suchasBayes-Nash
equilibrium [19 ] or sequential equilibrium [16 ]. Here we will b e concerned
with a solution concept that combines b oth coalition stability and incomplete
information. The basic idea is that a belief of an agent in a given stage of
the coalition formation pro cess is a conditional probability distribution on
the outcomes, given the previous steps of the pro cess. A coalition structure
obtains stability when the b eliefs of the agents converge. We show that this
coalition structure supp orts a Pareto optimal outcome. The price paid is
tractability: the computation of the optimal coalition formation pro cess can
b e exp onential in the numb er of agents and in the length of the negotiation
6
pro cess.
5
A sp ecial case of coalition formation where agents cannot make sidepayments is the
exchange economy, where parties exchange multidimensional endowments. Exchange
economies exhibit sup eradditivity: the b est outcome can b e reached by the grand coalition
consisting of all the agents [21].
6
Since the problem addressed here is coalition formation with indep endent, self-
interested agents with incomplete information, exp onential complexity seems an almost
unavoidable consequence. See [26] for a pro of of the exp onential complexity inherenteven 4
As is common practice [11 , 39 , 43 , 13 , 32 , 34 ], we start out by studying
coalition formation in characteristic function games CFGs. In such games,
7
the value of each coalition is indep endent of nonmemb ers' actions. In general
the value of a coalition could dep end on nonmemb ers' actions due to p ositive
and negative externalities interactions of the agents' solutions. Negative
externalities b etween a coalition and nonmemb ers are often caused by shared
resources. Once nonmemb ers are using the resource to a certain extent,
not enough of that resource is available to agents in the coalition to carry
out the planned solution at the minimum cost. Negative externalities can
also b e caused by con icting goals. In satisfying their goals, nonmemb ers
may actually move the world further from the coalition's goal states [27 ].
Positive externalities are often caused by partially overlapping goals. In
satisfying their goals, nonmemb ers may actually move the world closer to
the coalition's goal states. From there the coalition can reach its goals less
exp ensively than it could have without the actions of nonmemb ers. General
settings with p ossible externalities can b e mo deled as normal form games
NFGs. CFGs are a strict subset of NFGs. However, many real-world
multiagent problems happ en to b e CFGs [32]. Also, as we show, the claims
that we will make using the convenient notation of CFGs directly carry over
to NFGs.
The game theoretic context for this work is the Nash program whichwas
originally presented in [23], and which has recently b een widely adopted as
the analysis metho d of choice in computational multiagent systems consist-
ing of self-interested agents [31, 36, 35 , 14]. The idea is that interactions
should b e studied by analyzing stable combinations of strategies|one for
each agent. The sequential pro cess of coalition formation is similar to the
sequential pro cess of bargaining where agents try to reach an agreement
by exchanging prop osals [28]. However, incomplete information intro duces
complications that are related to the credibility of statements and cheap talk:
noncommital statements can induce a multiplicity of equilibria, called \bab-
bling" equilibria [7]. Negotiation has b een prop osed as a solution to this
problem. According to this idea, negotiation using credible not necessarily
in the most ecient decentralized allo cation pro cesses.
7
These coalition values may represent the quality of the optimal solution for each
coalition's optimization problem, or they may represent the b est b ounded-rational value
that a coalition can get given limited or costly computational resources for solving the
problem [32,33]. 5
truthful statements would lead to coherent agreements where the agents'
b eliefs are mutually consistent [22 ].
Finally, b elief revision in this pap er is based on a representation of b e-
liefs by means of probability distributions. Bayes theorem o ers a to ol for
up dating them. This is a common p oint shared by our approach and b oth
the game theoretical treatments of incomplete information [3], [15 ] and
Bayesian metho ds in arti cial intelligence [24].
The rest of the pap er is organized as follows. Section 2 intro duces the
classic game theoretic framework for coalition formation among agents that
cannot make sidepayments. Section 3 analyzes outcomes statically with the
-core solution concept. Section 4 shows generally that results derived un-
der the -core solution concept carry over directly to strategic solution con-
cepts such as the Nash equilibrium, the Strong Nash equilibrium, and the
Coalition-Pro of Nash equilibrium. Section 5 intro duces the dynamic coali-
tion formation pro cess which incorp orates delib eration and communication.
It shows that stability of the coalition formation pro cess is equivalentto
convergence of the agents' b eliefs for b oth exogenously and endogenously
terminated negotiation, and that the outcome is Pareto-optimal.
2 Games and solutions
This section reviews the concept of a coalition game and an approach for
de ning the value of a coalition characteristic function in games where
nonmemb ers' actions a ect the value of the coalition, and agents cannot
transfer sidepayments. We b egin by de ning a game.
n
De nition 1 A game G =S ; U is de ned by the set of agents I =
i
i=1
f1;:::;ng, the set S of possible strategies for each agent i 2 I , and the
i
Q
n
n
resulting utility pro le U : S !<, where for each strategy pro le
i
i=1
Q
n
s ;:::;s 2 S ,
1 n i
i=1
U s ;:::;s =u s ;:::;s ;:::;u s ;:::;s
1 n 1 1 n n 1 n
Q
n
where u : S !< is the utility of agent i.
i i
i=1
A solution concept de nes the reasonable ways that a game can b e played
by self-interested agents: 6
n
De nition 2 Given a game G =S ; U ,asolution concept in pure
i
i=1
Q
n
8
strategies is a correspondence : G ! S [;, and each
i
i=1
s =s ;:::;s 2 G is cal led a solution of G.
1 n
The range of a corresp ondence includes the empty set in order to encom-
pass games that do not have a solution of the typ e prescrib ed by the solution
concept.
An example of solution concept is given by Nash equilibria: for each game
G they are elements of G, where is the Nash correspondence:
N N
De nition 3 s =s ;:::;s ;:::;s is in G if for each i and for each
1 i n N
0 0
s 6= s , u s ;:::;s ;:::;s u s ;:::;s ;:::;s . Each such strategy
i i 1 n i 1 i n
i i
pro le is a Nash equilibrium.
In other words, in a Nash equilibrium no agent is motivated to deviate
from its strategy given that the others do not deviate.
De nition 1 characterizes games in terms of the strategies of agents and
the corresp onding payo s. These games are said to b e in normal form. The
normal form is a general representation that can b e used to mo del the fact
that nonmemb ers' actions a ect the value of the coalition [32 , 9 ]. However,
coalition formation has b een mostly studied in a strict subset of normal form
games|characteristic function games|where the value of a coalition do es
not dep end on nonmemb ers' actions, and it can therefore b e represented by
a coalition sp eci c characteristic function which provides a payo for each
coalition T i.e. set of agents [40 , 11 , 43 , 39 ]. Characteristic functions are
a desirable representation, so one would like to de ne such mathematical
entities for normal form games. In such general games, a characteristic func-
tion can only b e de ned by making sp eci c assumptions ab out nonmemb ers'
strategies. In this pap er we follow Aumann's classic approach of making
the -assumption, i.e. assuming that nonmemb ers pick strategies that are
worst for the coalition. Each coalition can lo cally guarantee itself a payo
that is no less than the one prescrib ed by an analysis under this p essimistic
9
assumption. Later in the pap er we show that the results that we obtain
8
A pure strategy is a strategy that do es not involve randomization by the agent. De ni-
tion 2 can b e easily extended to mixed strategies that involves randomization, by replacing
each S byS, the set of probability distributions on S .
i i i
9
The -assumption may b e imp ossibly p essimistic. A given nonmemb er can b e assumed
to pick di erent strategies when di erent coalitions are evaluated. This is in contrast with
the fact that in any realization, the nonmemb er can only pick one strategy. 7
under the -assumption carry over to strategic solution concepts Nash equi-
librium and its re nements that can b e used directly in normal form games
without any assumptions ab out nonmemb ers' strategies.
In games where agents can make sidepayments to each other [37, 39 , 32,
11 ], the characteristic function gives the sum of the payo s of the agents
in a coalition. Instead, our analysis fo cuses on games where agents cannot
make sidepayments. In such games, the characteristic function gives a set
of utility vectors that are achievable [2]. This is in order to provide the
coalition with a set of alternative utility divisions among memb er agents.
The set contains only Pareto-optimal utilityvectors: no agent can b e made
b etter o without making some other agentworse o . The next de nition
formalizes this vector-valued characteristic function under the -assumption.
n
De nition 4 Given a game G =S ; U , with U such that its compo-
i
i=1
nents are non-transferable, we say that the characteristic function is
I
I <
v :2 !2
such that for each coalition T I
I
v T <
and v T is the set of optimal achievable utilities for T .
The -assumption comes into play in the de nition of these optimal achiev-
able utilities:
n
De nition 5 Given a game G =S ; U , and a T I , the set of op-
i
i=1
timal achievable utilities for coalition T = fj ;:::;j g is the set of utility
1 jTj
pro les
U =:::;u ;:::;u ;:::;u ;:::
T j j j
1 2
jTj
such that
n
Y
9s 2 S ;Us=U
i T
i=1
and
Y Y
T I T
S 69s 2 S : 8s 2
j j
j2T
j62T
T I T
U s ;s = 8
0 0 0 0
U =:::;u ;:::;u ;:::;u ;:::
T j j j
1 2
jTj
0 0
with, for al l j 2 T , u u and for at least one j 2 T; u > u .
i j j
j i j
i
The next result of ours follows trivially:
Prop osition 1 For every game G in normal form, v exists.
n
Pro of. Suppose that for a game G =S ; U , v cannot be de ned. So,
i
i=1
for at least one coalition T , v T cannot be determined. But that means
that a set of U scannot be de ned such that
T
Y Y
T I T
69s 2 S : 8s 2 S ;
j j
j2T j62T
T I T
U s ;s =
0 0 0 0
U =:::;u ;:::;u ;:::;u ;:::
T j j j
1 2
jTj
0 0
> u . and such that, for al l j 2 T , u u and for at least one j 2 T; u
j i j
j i j
i
Q
n
Given this condition, we proceed by evaluating U for each s 2 S . If,
i
i=1
by hypothesis, v T is not determinate, then U is not de ned for every s.
Contradiction. 2
To summarize, the normal form notation emphasizes the strategic asp ects
of the interactions among agents. Each agent's actions are strictly individual
even as a memb er of a coalition. Therefore, a characteristic function has to
indicate, for each coalition, howmuch each of its memb ers gets for joining.
Moreover, the characteristic function has to indicate the b est combinations
of individual payo s that the coalition can get. Prop osition 1 just states that
it is p ossible to de ne this kind of characteristic function for every strategic
game. Once this issue is settled, we can move on to see what kinds of solutions
are relevant for this typ e of coalitional games.
3 The -core and sup eradditivity
The -assumption gives rise to the -core solution concept which de nes
a stability criterion for the coalition structure. The idea is that strategy
pro les that do not have an optimal achievable utility are not candidates for
the solution. A vector of joint strategies, is said to b e blocked by a coalition
if its memb ers can b e b etter o bymoving to another vector: 9
De nition 6 Acoalition T blo cks a strategy pro le s =s ;:::;s if for
1 n
Q
0
n
every j 2 T there exists an s 2 S such that
i i
i=1
0 0
8j,u su sand for at least one j , u s >u s; and
i j j i j j
i i i i
0 0 0
s ;::: 2 v T. :::;u s ;:::;u s ;:::;u
j j j
1 2
jTj
The blo cking relation de nes a particular set of stable joint strategies,
the -core. The -core is the set of joint strategies where no coalition can b e
formed if its memb ers are b etter o changing their individual strategies, given
that nonmemb ers pick strategies that are worst for the coalition. In other
words, it is the set of joint strategies for which a stable collective agreement
can b e reached. Formally, the -core corresp ondence is de ned as follows [2]:
De nition 7 A strategy pro le s =s ;:::;s is in the -core if there
1 n C
is no coalition T that blocks s.
As with the Nash corresp ondence, the -core corresp ondence can b e
empty for some games. The following example demonstrates this.
Example 1 G =S ;S ;U, where the set of players is fa; bg
a b
S = S = fc; ncg
a b
and
U = fhc; nci; h0; 10i; hc; ci; h5; 5 i; hnc; ci ; h10 ; 0i; hnc; nci ; h2; 2 ig
where hs ;s i;hu s ;u s i is the general form of the elements of U . This
a b a a b b
is an instance of the prisoner's dilemma [18]. The corresponding values of
the characteristic function are:
v fag= fh10; 0ig
v fbg=fh0; 10ig
v fa; bg= fh5; 5ig
It is easy to see that
fagblocks fhc; ci; hc; nci; hnc; ncig 10
fbgblocks fhc; ci; hnc; ci; hnc; ncig
fa; bg blocks fhnc; ci; hc; nci; hnc; ncig
Therefore, thereisno hs ;s i that is not blockedbyatleast one coalition. In
a b
other words, G is empty.
C
Now, under what conditions do es a stable coalition structure exist? In
other words, what are the conditions for non-emptyness of the -core? In
the rest of this section we will show that surprisingly simple conditions are
necessary and sucient for stability. The concept of superadditivity will b e
used to develop an intuition of this phenomenon. Sup eradditivity implies
that anytwo coalitions are b est o merging:
De nition 8 A game G is sup eradditive if given any two coalitions T ;T ,
1 2
10
T \ T = ;, v T \ v T v T [ T .
1 2 1 2 1 2
The following result of ours establishes an interesting prop erty that re-
lates characteristic functions and sup eradditivity: if the intersection of the
optimal ly achievable utilities for al l the players is not empty, then the game
is superadditive. This condition on characteristic functions will b e later used
to discuss stability.
T
Prop osition 2 For a game G,if v fig6=;then G is superadditive.
i2I
Pro of. We wil l prove this result by induction on the cardinality of coalitions:
Given T ;T , T \ T = ;, jT j =1, jTj=1, it is clear that there
1 2 1 2 1 2
exist agents i; j 2 I such that T = fig and T = fj g. If U =
1 2
T
u ;:::;u 2 v fig, then in particular U 2 v T \ v T .
1 2
i2I
1 n
Q
n
Suppose U 62 v T [ T . Then, there exists s 2 S such that
1 2 i
i=1
U s=:::;u s;:::;u s;::: and u s u and u s u with
i j i j
i j
strict inequality for one of them, say i. But then, U 62 v fig,con-
tradiction. So, v T \ v T v T [ T .
1 2 1 2
10
This de nition byShubik [40] for games without sidepayments di ers technically
from the de nition of sup eradditivity for games with sidepayments [40, 11, 32, 43, 37, 39].
However, they are conceptually the same. 11
Assume that U 2 v T \ v T v T [ T , for any pair of
1 2 1 2
coalitions T ;T , T \ T = ;, jT [ T jk 1 2 1 2 1 2 0 0 i 62 T ;i 62 T , and T = T [fig. Then T \ T = ; and of course 1 2 1 2 1 1 0 0 U 2 v T by the inductive assumption because jT jk, so U 2 1 1 0 0 v T \ v T . Suppose that U 62 v T [ T .Again, this means that 2 2 1 1 Q n exists s 2 S such that U s=:::;u s;:::;u s;:::, where i j j 1 i=1 k 0 0 T [ T = fj ;:::;j g, and u s u for al l j 2 T [ T , with strict 2 1 k j i 2 1 i j 1 i inequality for one of them, say j . Suppose without loss of generality i 0 0 0 that j 2 T , but then, U 62 v T . Contradiction. i 0 1 1 So, v T \ v T v T [ T for any pair of coalitions T ;T , T \T = ;, 1 2 1 2 1 2 1 2 jT [ T j n. That is, G is superadditive. 2 1 2 To see that sup eradditivity is a necessary but not a sucient condition T for v fig 6= ;, recall the prisoner's dilemma of Example 1. It is a i2I sup eradditive game, but v fag and v fbghave no element in common. Our next result relates the condition of the previous prop osition to sta- bility of the coalition structure non-emptyness of the -core: T Lemma 1 For a game G, G 6= ; i v T 6= ;. I C T 22 ; Pro of. ! If G 6= ;, then there exists an s that is not blockedbyany C coalition. But then, by the de nition of blocked joint strategy, it is clear that for each coalition T , U s 2 v T , and therefore s 2 T v T I T 22 ff;g T If v T 6= ;, then there exists at least one U 2 v T I T 22 f;g Q n for every possible coalition T and thereforeans2 S such that i i=1 U s=U . So, s is not blocked by any coalition and thus s 2 G. 2 C This lemma is useful for proving the following theorem of ours. The theorem shows that to characterize the stability of the coalition structure in terms of the -core, only the utilities and the corresp onding actions of individual agents are required. One do es not need to compare utilities and actions of coalitions. T Theorem 1 For a game G, G 6= ; i v fig 6= ;. C i2I 12 Pro of. Q n ! If G 6= ;, there exists a s 2 S such that no coalition C i i=1 blocks it. So for each coalition T , U s 2 v T .Inparticular for al l T the coalitions with a single member, T = fig. So, U s 2 v fig. i2I T By Lemma 1, it is enough to prove that v T 6= ;. The I T 22 f;g proof wil l be by induction on the size of coalitions: T { given that by hypothesis 9U 2 v fig, then for each i 2 I , i2I U 2 v fig { assume that for each coalition T with jT j = k For any i 2 I; i 62 T by proposition 1: U 2 v T \ v fig v T [fig 0 0 0 So, U 2 v T , for any T such that jT j = k +1. Therefore, 00 00 I U 2 T for any T 2 2 f;g. 2 Since, for each agent i, v fig represents that agent's optimal achievable utilities, this result states that the non-emptyness of the -core is equivalent to the existence of at least one utilityvector that is maximal for every agent. 0 This vector, say U ,isPareto-optimal: there is no other U such that for all 0 i, U U , with strict inequality for at least one i. i i It follows from Theorem 1 that in games without sidepayments, the coali- tion structure can b e stable only if every p ossible pair of coalitions is b est o merging [ -core 6= ;] sup eradditivity. This di ers from games with sidepayments: there the core can b e nonemptyeven if the game is not su- p eradditive [32]. On the other hand|as in games with sidepayments|the coalition structure may b e unstable even if every pair of coalitions is b est o merging sup eradditivity 6 [ -core6= ;]. These results show that the -core is nonempty only in sup eradditive games. In all other games there exists at least one coalition that blo cks the solution. While this result is interesting per se, it could p erhaps b e interpreted as a criticism of the -core solution concept itself. However, with self-interested agents with non-transferable utilities, all market-like games are sup eradditive [21]. Market economies are natural examples of this typ e of games and electronic economies constitute a natural eld of application for the results of this pap er. 13 4 Relationships between axiomatic and strategic solution concepts In this section we present some new relationships b etween axiomatic and strategic solution concepts. The imp ortance of these relationships lies in the fact that they allow us to imp ort the other results of this pap er which will b e derived for the axiomatic -core solution concept directly to strategic solution concepts like Nash equilibrium and its re nements. The notion of the -core is axiomatic in that it only characterizes the outcome without a direct reference to strategic b ehavior. The Nash corre- sp ondence is, instead, a strategic solution concept: it is based only on the self-interested strategy choices of agents. Sp eci cally, it analyzes what an agent's b est strategy is, given the strategies of others. A strategy pro le is in Nash equilibrium if every agent's strategy is a b est resp onse to the strate- gies of the others. Nash equilibrium do es not account for the p ossibility that groups of agents coalitions can change their strategies in a co ordi- nated manner. Aumann has intro duced a strategic solution concept called the Strong Nash equilibrium to address this issue [1, 4 ]. A strategy pro le is in Strong Nash equilibrium if no subgroup of agents is motivated to change their strategies given that others do not change: Q n De nition 9 A strategy pro le s 2 S , in a game G,isaStrong Nash i i=1 Q T equilibrium if for any T I and for al l s 2 S there exists an i 2 T j 0 j 2T T I T such that u s u s ;s . i i 0 0 This concept gives rise to the Strong Nash corresp ondence, , i.e. the SN set of Strong Nash equilibria. We show a close relationship b etween the Strong Nash solution concept and the -core solution concept: Theorem 2 For any game G, G G. C SN Pro of. Suppose that s 2 G but s 62 G. Then, there exist an T I C SN Q T T I T and an s 2 S such that for al l j 2 C , u s j j j j 2T 0 0 means that U s is not in v T , so thereisas such that U s 2 v T and 0 u s u s. Contradiction. 2 j j Often the Strong Nash equilibrium is to o strong a solution concept, since in many games no such equilibrium exists. Recently, the Coalition-Proof 14 Nash equilibrium [4] has b een suggested as a partial remedy to this problem. This solution concept requires that there is no subgroup that can makea mutually b ene cial deviation keeping the strategies of nonmemb ers xed in a way that the deviation itself is stable according to the same criterion. A conceptual problem with this solution concept is that the deviation may b e stable within the deviating group, but the solution concept ignores the p ossibility that some of the agents that deviated may prefer to deviate again with agents that did not originally deviate. Furthermore, even these kinds of solutions do not exist in all games. On the other hand, in games where a solution is stable according to the -core, the solution is also stable according to the Coalition-Pro of Nash equilibrium solution concept. This is b ecause G G which follows from our result G G and the C CP N C SN known fact that G G. SN CP N We can also relate the Nash equilibrium itself to the -core this could al- ternatively b e deduced from Theorem 2 and the fact that G G: SN N Theorem 3 For any game G, G G. C N Pro of. Given s 2 G, we wil l show that s is a Nash equilibrium in C pure strategies for G. Suppose not. By Theorem 1 it is enough to con- sider what happens with single individuals. Then, for a i 2 I , given 0 0 the vector s ;:::;s ;s ;:::;s , the b est resp onse for i is s with 1 i 1 i +1 n 0 0 0 i 0 0 u s ;:::;s ;:::;s >u s. But that means that fig blocks s, and there- i 1 n i 0 i 0 0 fore s 62 G.Contradiction. This proves that G G . Example C C N 1 shows that the converse is not true: G= fhnc; ncig and G=;. N C Therefore G G.2 C N An implication of the results in this section is that the other results of this pap er which are derived for the -core carry over directly to analyses that use strategic solution concepts Nash equilibrium, Coalition-Pro of Nash equilibrium or Strong Nash equilibrium. Sp eci cally,any solution that is stable according to the -core is also stable according to these three solution concepts. Another implication is that to verify that a strategy pro le is in the - 11 core, one needs to only consider strategy pro les that are Pareto-optimal 11 The non-emptyness of the -core is equivalent to the existence of a utilityvector U which is common to all sets v fig for all agents i. By de nition 4, this means that there 15 and in Nash equilibrium. Alternatively, one can restrict this searchtoPareto- optimal Strong Nash equilibria or Pareto-optimal Coalition-Pro of Nash equi- libria. 5 Bounded rationality in coalition formation In the previous section it was assumed that delib eration is costless. To relax this assumption weintro duce delib eration and communication actions explicitly into the mo del: De nition 10 For each agent i in a game G, let D be a set of delib era- i tion/communication activities that i can perform in order to choose a strategy s to be executed. Each d 2 D is associated with a C d , i.e. the cost for i i i i i 12 iofperforming the activity d . i In order to avoid unnecessary complications, we assume that C can b e i expressed in the same units as u . We will not assume any sp ecial structure i 13 on D , except the following: i De nition 11 For each agent i,weconsider its process of communica- t 0 1 t i tion/delib eration fa ;a ;::: a g, where a 2 D , for t =0;1;:::;t 1, i i i i i i t i and a 2 S .IfN = max t , we say that the coalition structure has been i i2I i i t t i formedin N steps. For any i and t such that t t i i i The idea b ehind this de nition is that the agents delib erate and exchange messages until each one decides on a strategy to follow. We also assume that this pro cess is nite and each agent stays commited to its choice once it has reached a decision. We use a general characterization of the communication/delib eration pro- cess without going into the details of how an action leads to another one e.g. 0 0 do es not exist a U such that U U for all i, with strict inequality for at least one i i i. That is, U is Pareto-optimal. Therefore one can restrict the searchtoPareto-optimal outcomes. 12 Implicit in this de nition is the existence of another level of delib eration, assumed costless, needed for cho osing delib eration/communication actions. Although strong, this assumption cannot b e dropp ed without leading to a p otential in nite regress [17, 32, 29, 5, 30, 42]. 13 Activities in D and strategies in S will b e called the actions of agent i in the pro cess. i i 16 how delib eration actions lead to the choice of physical actions. Wesay that the payo of agent i in an N -p erio d pro cess is determined as follows: De nition 12 If the communication/deliberation process of agent i is a^ = i t t 0 i i fa ;:::;a g a = s , the payo is i i i i t 0 i ^a ;:::;a^ = u s ;:::;s ;:::;s C a +::: + C a C N t i i n i 1 i n i i i i i i where C > 0 is the waiting cost, which is assumedconstant per time unit. i As this de nition states, we assume that costs of activities are indep en- 1 N dent: if the pro cess is ^a =a ;:::;a , its cost C ^a is equal to the sum of i i i i i t 0 i the costs of the activities, C a +::: + C a +C N t . i i i i i i A new game can b e de ned which explicitly considers the delib eration and communication actions as part of each agent's strategy. This follows the approach of Sandholm and Lesser [32, 33 ] in the sense that such actions are explicitly incorp orated into the solution concept. It di ers from other DAI approaches to coalition formation where the solution concept stability criterion is only applied to the nal outcomes [43, 37 , 39 ]. n t De nition 13 Given G, fD g and t> 0, a new game is de ned, G = i i=1 Q n t n t n D S ;P, where P : D S !< such that for each ^a 2 i i i i=1 i=1 i Q n t D S , P ^a= ^a ;:::; ^a . i 1 1 n n i=1 i The length of the game, t, dep ends on the available communica- tion/delib eration activities and on the sequential choice of activities. For nowwe assume that the time limit is given in Subsection 5.2 we will relax this assumption. To justify this, we supp ose that each agent, i, has a degree of impatience, given by a maximum time, t , to make a nal decision. i In order to maximizepayo s, our self-interested agents engage in negoti- ations. The nal outcomes represent the result of agreements among agents. Coalitions are formed during the negotiations. A particular pro cess gener- ates a stable coalitional structure if the nal outcome in the original game Q n with strategy pro le space S cannot b e blo cked by a coalition formed i i=1 in another pro cess. Formally, Q t 1 n 0 t De nition 14 Aprocess a^ = fa ;:::;a g2 D S leads to a stable i i i=1 Q n t coalition structure if a 2 S cannot be blocked by any coalition formed i i=1 0 0 0 0 t = fa ;:::;a g. in another process a^ 17 The relationship b etween stable coalition structures and the -core is given by the following lemma. Q t 1 n 0 t Lemma 2 If the process a^ = fa ;:::;a g2 D S is in the -core i i=1 i t of game G then a^ leads to a stable coalition structure. Pro of. Suppose that there exists a coalition T for which there exists an Q n t t s 2 S such that u s u a for j 2 T and u s >u afor i j j j j i=1 0 0 0 j 2 T . Then ^a = fa ;:::;sg isaprocess in which T is formed and obtains 0 s, where s = s and P ^a P ^a, for j 2 T . Contradiction because ^a is j j j j t . 2 in the -coreofG We can easily restate the notions given in Section 2 in order to nd conditions for the stability of the coalition structure. First, we de ne the t t characteristic function for G , v , via replacing the optimal achievable util- G ities by the optimal achievable payo s which incorp orate delib eration and communication: t t n De nition 15 Given G =D S ;P, and T I , the set of optimal i i i=1 achievable payo s for T = fj ;:::;j g is the set of P s such that 1 jTj Q n t there exists ^a 2 D S and P ^a=P, and i i=1 i Q 0 0 n t 69a^2 D S such that P ^a=P with P P for al l j 2 T , i j i i i=1 i j i 0 >P for at least one j 2 T . and P j j This means that P is an optimal achievable payo for coalition T if there is no other payo vector such that the payo is no worse for any member and it is b etter for at least one|for all in particular for the worst pro cesses that nonmemb ers can pick. The following prop osition shows the analog of Theorem 1 for the game that includes delib eration/communication: Q t 1 n t Prop osition 3 a^ 2 D S is such that P ^a 2 v fig for each i i G i i=1 t i a^ is in G . C Pro of. Immediate from Theorem 1. 2 This means that a communication/delib eration-action pro cess in the - core corresp onds to a Pareto-optimal payo vector. In the rest of this pap er 18 we will fo cus on -core pro cesses since they are not only Pareto-optimal in Q n t G but also lead according to Lemma 2 to an outcome s 2 S that i i=1 cannot b e blo cked by a coalition formed in any other pro cess. For many games the prisoner's dilemma b eing an example a Pareto- optimal payo can b e reached only through the co ordinated activityof agents, i.e. the Pareto-optimal outcome is reached only if each agent acts in agreement with the other agents. Let us return to Example 1 to show how communication/delib eration activities allow to reach a co ordination on aPareto-optimal outcome: Example 2 Consider again the game G of Example 1, and assume that the communication/deliberation actions at every step are 0 00 D = D = fd; d ;d g a b and the associate costs are C d=C d=0:1 a b 0 0 C d= C d =0:1 a b 00 00 C d = C d =0:5 a b Say that the actions are abstractly described as fol lows: d: engage in negotiations 0 d :reach an agreement 00 d : sign a contract to enforce the agreement Moreover, we assume that engaging in negotiations, d, has as a consequence the realization that if no enforceable agreement is reached, the outcome wil l be hnc; nci. 0 0 00 00 The sequence hd; di; hd ;d i;hd ;d i; hc; ci is in the -corebecause the payo for every player, 5 0:1+0:1+0:5, is higher than any other payo , considering that d is an unavoidable step in the process. If an agent would choose the process fd; ncg it would know that the other agent would do the same, and the payo would be only 2 0:1. 19 5.1 Incorp orating b elief revision The previous example is very simple, but it shows the sequential nature of an agent's choice of action. This subsection intro duces a more sophisticated decision making mo del for an agent that takes part in coalition formation. This mo del is used to show results on the joint outcomes and the joint pro cess. Tocho ose the action that maximizes exp ected payo at each step, an agentmay need to evaluate the exp ected payo s of di erent actions. We will show when this pro cedure leads to the formation of a stable coalition structure. To give a mathematical characterization, weintro duce the notion of \exp ected payo " which is based on each agent's sub jective probabilities: t De nition 16 Given a sequence of actions performed by an agent i, a^ = i t+1 0 t a ;:::;a 2 D , we say that agent i can de ne a subjective probability i i i Q t+1 n t distribution on S such that sja is the conditional probability of i i i=1 i t+1 an outcome s, given that the next action is a .Aprobability distribution i Q n on the total costs associated with the process to reach s 2 S can be also i i=1 t+1 t de ned: C sja is the conditional probability of a cost C s, given that i i i i t+1 the next action is a . Then the expectedpayo , given that the next action i t+1 is a ,is i X X t t+1 t t+1 t t+1 a = u s sja C s C sja i i i i i i i i i Q Q n n s2 S C s;s2 S i i i i=1 i=1 An agent can try to maximize its exp ected payo in each step, i.e. to t t cho ose an a 2 D [ S that maximizes . This is a greedy pro cedure, i i i i and agents that use it may not always converge on a joint solution. Akey element here is the b elief formation pro cess that generates the t+1 t+1 t t conditional probabilities sja and C sja . We assume in the i i i i i Q Q n n following that S , D and the length of the pro cess, N , are common i i i=1 i=1 knowledge. Agents are assumed to up date their b eliefs using Bayes Rule [9]. The general decision pro cess is as follows: 0 Algorithm 1 At stage k =0 agent i generates a probability distribution i Q Q Q Q n n n n N 1 N 1 over D S , such that for each ^a 2 D S , i i i i i=1 i=1 i=1 i=1 0 14 15 ^a > 0. , i 14 0 Tocho ose a , agent i p erforms steps 2;:::;6 with k +1 = 0 i 15 Bayesian up dating requires p ositive probabilityofevery pro cess: if Bayes Rule is applied, a pro cess with zero probability cannot get a p ositive probability thereafter. 20 For k =1:::N 1: k k k 1. The last common action observedisa =a ;:::;a . Then, for every 1 n k +1 0 N a =a ;:::;a , the new distribution is such that i k +1 k k a=0 if a 6= a i k a i k k k +1 if a = a and a= P i k Q Q n n k +1 a N 1 i S : 6=0 D a2 i i i i=1 i=1 j 2. For each a 2 D , j =1;:::;jD j, i i i X j k+1 k +1 sja = a i i i j k+1 N a =a ;a =s i i j 3. For each a 2 D , j =1;:::;jD j, i i i j k+1 k+1 sja = a i i i j k +1 N 0 N such that a = a , a = s and C s= C a ;:::;a i i i i i i j j k +1 4. For each a 2 D , the expectedpayo a is i i i i X X j j j k +1 k +1 k +1 a = u s sja C s C sja i i i i i i i i i Q Q n n s2 S C s;s2 S i i i i=1 i=1 j j k +1 5. Agent i selects the action a that maximizes the expectedpayo a i i i j 6. The action to perform is the a found in the previous step. i In words: each agent considers, at k = 0, the set of all p ossible pro cesses and cho oses an action. After observing the actions of the other agents, it discards all pro cesses that at the rst stage di er from the observed set of actions. Then it cho oses an action according to its new b eliefs. The pro cess is rep eated until in stage N 1 it has to cho ose a domain action in the space S .At each stage, the set of p ossible pro cesses and therefore the set of i p ossible outcomes is narrowed, until a single outcome is pinp ointed. 21 Algorithm 1 has, in each stage, aworst case complexityof4B+ Q Q Q n n n 2 N 1 jDjj S jB + B , where B = j D S j is the number i i i i i=1 i=1 i=1 of p ossible pro cesses. If D = D for all i and j , the worst case complex- i j nN ity b ecomes p olynomial in jD j . This clearly shows that Algorithms 1 is i 16 exp onential in the numb er of agents and the length of negotiations. When this pro cedure is p erformed in conjunction with co ordination among agents in the sense that they happ en to follow a pro cess that is in the -core, they will converge to the b elief that a particular outcome s is the most probable one later we show that s is Pareto-optimal. 0 t N Prop osition 4 If a^ =a;:::;a ;:::;a is in the -core Proposition 4 t t t showed that this means that for each t, a =a;:::;a is the vector of 1 n Q n optimal decisions then 9m such that for t> m there exists an s 2 S i i=1 t 1 t Q n which gives the max sja for each i. i s2 S i i i=1 Pro of. If a^ is in the -core, P ^a is in v N fig by Theorem 1. Suppose G Q n that for each m there exists t>m such that theredoes not exist s 2 S i i=1 t 1 t Q n which gives the max sja . In particular, given m = N 1, j s2 S i i i=1 N 1 N Q n for t = N theredoes not exist s giving max sja . So, it i s2 S i i i=1 means that at least one agent wil l deviate, making another strategy pro le N moreprobable. Then, since a 2 S it is clear that for at least one agent i i 0 0 0 0 0 1 N N N i , ^a > ^a, where a =a ;:::;a , a = a = s for j 6= i and i i j j j 0 N N N a 6= a = s . Contradiction because P ^a 2 v fi g. 2 i G i i The converse is not true. A pro cess that leads to a stable coalition struc- ture might not b e in the -core. It is intuitive that a stable structure can b e formed in a cost-inecient pro cess. This pro cess could b e blo cked by an- other one leading to the same coalition structure, thus preserving stability. Therefore, Prop osition 4 only gives a necessary condition for a pro cess to b e an element of the -core. However, this is all we need since the following result shows that a coalition structure is stable if it leads to convergence of b eliefs ab out the strategy pro le to b e chosen. 16 This seems to b e a discouraging result. However, it is p erhaps less so in lightofthe fact that in economic pro cesses where agents have indep endent preferences, exp onential complexity in the numb er of agents or in the length of pro cesses is p ervasive [26]. 22 N 0 N Theorem 4 For G ,aprocess ^a =a;:::;a leads to a stable coali- Q n tion structurei 9msuch that 8t > m 9s 2 S that gives the i i=1 t 1 t Q n sja for each i. max i S i s2 i i=1 Pro of. N !Trivial: if a coalition T formed in a stage t m can block ^a Q 0 0 n N N N it means that there exists a^ 2 S , such that u a u a i j j i=1 0 0 N N N for j 2 T and for one j 2 T , u a u a and a maximizes j j N 1 N . Contradiction because the optimal decision in N is a . j Suppose that for al l m there exists t> msuch that thereisno Q n t 1 t Q n s2 S that gives the max sja for each i. If so, for i i=1 i i s2 S i i=1 N 1 N s > m = N 1, there exists i and s 6= a verifying that i N 1 N N a , but then, fi g isacoalition that blocks a . Contradiction. i 2 Alternatively, this result can b e stated in a more detailed way b ecause utilities and costs are indep endent: N 0 N Theorem 5 In G ,aprocess a^ =a;:::;a leads to a stable coali- tion structurei 9msuch that 8t> m and for each i, there exists a pair Q t 1 t 1 n N t a ;a 2 S D such that u j and C C j achieve i i i i i i i i i=1 N t 17 a maximum and a minimum respectively in a ;a . i Pro of. t !Assume that for al l m there exists i 2 I such that for al l a 2 i t 1 t 1 D [ S there exists t> m for which u j and C C j i i i i i i i N t ;a . Taking do not achieve a maximum and a minimum in a i Q n m = N 1 it fol lows that there exists s 2 S such that i i=1 N 1 N 1 N 1 N N N N u s sjs >ua a ja and C s C sja < i i i i i i i i i i N 1 N N N N C a C a ja . Therefore i can block a . Contradiction. i i i i 17 A trivial example where this conditions can b e ful lled is that of a game G with a unique Nash-equilibrium that is also Pareto-optimal: each individual action chosen will b e one that leads to that maximal outcome and will therefore minimize costs. There are other examples as well. 23 suppose that there exists a coalition T I that can be formed 0 in another process ^a . Therefore for at least one i 2 T , there exists Q 0 N 1 N 1 n N N N N s 2 S s = a such that u s sjs >u a a ja i i i i i i i i=1 i i N 1 N 1 N N and C s C sjs i i i i i i i i t 1 t 1 N N cause for a ;a u j and C C j achieve a maxi- i i i i i i mum and a minimum. 2 Put together, Theorem 5 gives the conditions under which Algorithm 1 leads to a stable coalition structure. Sometimes the process that leads to this coalition structure is stable according to the -core and sometimes not. In either case, stability is equivalent to the coincidenceofbeliefs about the process. That is, if the negotiation do es not lead the agents to share the same exp ectations ab out the nal outcome, the result will not b e supp orted by a stable coalition structure. To see how this works, let us revisit Example 2, this time incorp orating b elief revision: Example 3 Consider the game G of Example 2, and assume that each agent can choose among eight possible sequences of actions each sequence has at most four stages: 0 00 I d; d ;d ;c 0 00 II d; d ;d ;nc 00 III d; d ;c 00 IV d; d ;nc 0 00 V d ;d ;c 0 00 VI d ;d ;nc 00 VII d ;c 00 V III d ;nc 24 We assume that agents a and b have symmetric prior probability distri- 0 18 butions. Speci cal ly, for i = a; b, the distribution is as fol lows: i 0 0 sequences where the rst action is d: I =0:512; III= 0:128; i i 0 0 II=0:128; IV = 0:032. i i 0 0 0 sequences where the rst action is d : V =0:08; VII=0:02. i i 00 0 0 sequences where the rst action is d : VII= 0:08; V III= i i 0:02. It is immediate that 1 0 0 hc; ci;d= I+ III= 0:64 i i i 1 0 0 hnc; nci;d= II+ IV =0:16 i i i 0 1 0 hc; ci;d = V=0:08 i i 0 1 0 hnc; nci;d = VI=0:02 i i 00 1 0 hc; ci;d = VII= 0:08 i i 00 1 0 hnc; nci;d = V III=0:02 i i Let the costs for i = a; b be C hc; ci=0:7 with probability 0:512if i chooses sequence I i C hc; ci=0:6 with probability 0:128if i chooses sequence III i 18 We assume that all other p ossible cases have probability 0. This is contrary to Algo- rithm 1, but it simpli es the exp osition. 25 C hc; ci=0:6 with probability 0:08if i chooses sequence V i C hc; ci=0:5 with probability 0:08if i chooses sequence VII i Note that C hc; ci varies depending on the communication/deliberation i process that leads to the domain actions hc; ci.Wecan also include a \breach- ing cost" into the model, say 0:6. This means that if the agents agreeona particular pro le of domain strategies, but some agent deviates, then that agent has to pay the breaching cost. With the breaching cost includedwe have: C hnc; nci=1:3 with probability 0:128if i chooses sequence II i C hnc; nci=1:2 with probability 0:032if i chooses sequence IV i C hnc; nci=1:2 with probability 0:02if i chooses sequence VI i C hnc; nci=1:1 with probability 0:02if i chooses sequence V III i Therefore, the agent wil l have the fol lowing beliefs about costs: 1 C hc; cijd=0:64 choosing sequences I or III i i 1 C hnc; ncijd=0:16 choosing sequences II or IV i i 0 1 C hc; cijd =0:08 choosing sequence V i i 0 1 C hnc; ncijd =0:02 choosing sequence VI i i 00 1 C hc; cijd =0:08 choosing sequence VII i i 26 00 1 C hnc; ncijd =0:02 choosing sequence V III i i Final ly, the possible expectedpayo s in the rst iteration are, for i = a; b: 1 1 1 d = u hc; ci hc; ci;d+u hnc; nci hnc; nci;d i i i i i 1 1