Reinforcement Learning and Nonparametric Detection of Game-Theoretic Equilibrium Play in Social Networks
Omid Namvar Gharehshiran, William Hoiles, Student Member, IEEE, Vikram Krishnamurthy, Fellow, IEEE

Abstract—This paper studies two important signal processing aspects of equilibrium behavior in non-cooperative games arising in social networks, namely, reinforcement learning and detection of equilibrium play. The first part of the paper presents a reinforcement learning (adaptive filtering) algorithm that facilitates learning an equilibrium by resorting to diffusion cooperation strategies in a social network. Agents form homophilic social groups, within which they exchange past experiences over an undirected graph. It is shown that, if all agents follow the proposed algorithm, their global behavior is attracted to the correlated equilibria set of the game. The second part of the paper provides a test to detect if the actions of agents are consistent with play from the equilibrium of a concave potential game. The theory of revealed preference is used to construct a non-parametric decision test and statistical test which only require the probe and associated actions of agents. A stochastic gradient algorithm is given to optimize the probe in real time to minimize the Type-II error probabilities of the detection test subject to a specified Type-I error probability. We provide a real-world example using the energy market, and a numerical example to detect malicious agents in an online social network.

Index Terms—Multi-agent signal processing, non-cooperative games, social networks, correlated equilibrium, diffusion cooperation, homophily behavior, revealed preferences, Afriat's theorem, stochastic approximation algorithm.

arXiv:1501.01209v1 [cs.GT] 11 Dec 2014

[Fig. 1: "Equilibrium Behavior in Games," with two branches: "Reinforcement Learning of Equilibrium" (regret-matching; diffusion cooperation) and "Detection of Equilibrium Play" (revealed preferences; Afriat's theorem).]
Fig. 1. Two aspects of game-theoretic equilibrium behavior in social networks discussed in this paper. Both aspects involve multi-agent signal processing over a network. Stochastic approximation algorithms are used in both cases to devise the desired scheme.

I. INTRODUCTION

Learning and equilibrium in games are of central importance in the analysis of social networks. Game theory has traditionally been used in economics and social sciences with a focus on fully rational interactions where strong assumptions are made on the information patterns available to individual agents. In comparison, social networks are comprised of agents with limited cognition and communication capabilities, and it is the dynamic interactions among agents that are of interest. This, together with the interdependence of agents' choices, motivates the need for game-theoretic learning models for agents interacting in social networks.

The game-theoretic notion of equilibrium describes a condition of global coordination where all agents are content with their social welfare. Reaching an equilibrium, however, involves a complex process of agents guessing what each other will do. Game-theoretic learning explains how such coordination might arise as a consequence of a long-run process of learning from interactions and adapting behavior [1].

A. Main Ideas and Organization

The two aspects of equilibrium behavior in games that are addressed in this paper, along with the main tools used, are illustrated in Fig. 1. These two aspects are relevant to the broad area of machine learning of equilibria in social networks. The main results of this paper are summarized below:

1) Reinforcement learning dynamics in social networks: The first part of this paper (Sec. II to Sec. IV) addresses the questions: Can a social network of self-interested agents that possess limited sensing and communication capabilities reach a global equilibrium behavior in a distributed fashion? If so, can formation of social groups that exhibit identical homophilic characteristics facilitate the learning dynamics within the network? The main idea is to propose a diffusion-based stochastic approximation algorithm (learning scheme) such that, if each agent deploys it, the collective behavior of the social network converges to a correlated equilibrium.

Sec. II-A introduces non-cooperative games with homophilic social groups in a social network. Homophily^1 refers to a tendency of various types of individuals to exchange information with others who are similar to themselves. The detection of social groups that show common behavioural characteristics can be performed using such methods as matched sample estimation [3]. Sec. II-B introduces correlated equilibrium [4] as the solution concept for such games. Correlated equilibrium is a generalization of Nash equilibrium; however, it is more realistic in multi-agent learning scenarios since observation of the past history of decisions (or their associated outcomes) naturally correlates agents' future decisions.

In Sec. III we present a regret-based reinforcement learning algorithm that, resorting to diffusion cooperation strategies [5], [6], implements cooperation among members of a social group. The proposed algorithm is based on the well-known regret-matching algorithm [7]–[9]. The proposed algorithm^2 suits the emerging information patterns in social networks, and allows combining the past experiences of agents across the network, which facilitates the learning dynamics and enables agents to respond in real time to changes underlying the network. To the best of our knowledge, this is the first work that uses diffusion cooperation strategies to implement collaboration in game-theoretic reinforcement learning. Sec. IV shows that if each agent individually follows the proposed algorithm, the experienced regret will be at most ϵ after sufficient repeated plays of the game. Moreover, if all agents follow the proposed algorithm independently, their collective behavior across the network will converge to an ϵ-distance of the polytope of correlated equilibria.

An interesting aspect of both parts of this paper is the ordinal nature of decision making. In the learning dynamics of Sec. III, the actions taken are ordinal; in the data-set parsing of Sec. V, the utility function obtained is ordinal. Humans make ordinal decisions^3 since humans tend to think in symbolic ordinal terms.

B. Literature

Game-theoretic models for social networks have been studied widely [11], [12]. For example, [13] formulates a graphical games model where each agent's influence is restricted to its immediate neighbors. The reader is referred to [14] for a treatment of interactive sensing and decision making in social networks from the signal processing perspective.

1 In [2] the following illustrative example is provided for homophily behavior in social networks: "If your friend jumped off a bridge, would you jump too?" A possible reason for answering "yes" is that you are friends as a result of your fondness for jumping off bridges. Notice that this is different to contagion behavior, where "your friend inspired you to jump off the bridge." Due to space restrictions we do not consider contagion behavior in this paper.

O. N. Gharehshiran, W. Hoiles, and V. Krishnamurthy are with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, V6T 1Z4, Canada (e-mail: [email protected]; [email protected]; [email protected]). This research was supported by the NSERC Strategic grant and the Canada Research Chairs program.
2) Detection of equilibrium play in social networks: The second part of the paper (Sec. V to Sec. VI) addresses the question: Given datasets of the external influence and actions of agents in a social network, is it possible to detect if the behavior of agents is consistent with play from the equilibrium of a concave potential game? The theory of revealed preference from microeconomics is used to construct a non-parametric decision test which only requires the time-series of data D = {(p_t, x_t) : t ∈ {1, 2, ..., T}}, where p_t ∈ R^m denotes the external influence and x_t ∈ R^m denotes the action of an agent. These questions are fundamentally different to the model-based theme that is widely used in the signal processing literature, in which an objective function (typically convex) is proposed and then algorithms are constructed to compute the minimum. In contrast, the revealed preference approach is data-centric—we wish to determine whether the dataset is obtained from the interaction of utility maximizers.

Sec. V introduces how revealed preferences can be used to detect if the actions of agents originated from play from a concave potential game using only the external influence and actions of the agents. Specifically, in Sec. V-A we introduce the preliminary tools of revealed preference for detecting utility maximization of single agents. Sec. V-B provides a non-parametric test to detect if the actions of agents are a result of play from a concave potential game. If the actions are measured in noise, then Sec. V-C provides a non-parametric statistical test for play from a concave potential game with guaranteed Type-I error probability. To reduce the probability of Type-II errors of the statistical test, Sec. V-D provides a simultaneous perturbation stochastic gradient (SPSA) algorithm to adjust the external influence in real time. Two examples are provided to illustrate the application of the decision test, statistical test, and SPSA algorithm developed in Sec. V-B to Sec. V-D. Concave potential games are considered for the detection test because the restrictions a dataset D must satisfy to be consistent with a Nash equilibrium alone are very weak [10]. In this paper, the requirement that D be consistent with a Nash equilibrium of a concave potential game provides stronger restrictions when compared to only a Nash equilibrium, while still encompassing a large set of utility functions.

Reinforcement Learning Dynamics: Regret-matching [7]–[9] is known to guarantee convergence to the set of correlated equilibria [4] under no structural assumptions on the game model. The correlated equilibrium arguably provides a natural way to capture conformity to social norms [15]. It can be interpreted as a mediator [16] instructing people to take actions according to some commonly known probability distribution. The regret-based adaptive procedure in [7] assumes a fully connected network topology, whereas the regret-based reinforcement learning algorithm in [8] assumes a set of isolated agents who neither know the game, nor share information of their past decisions with others. In [17], a regret-matching algorithm is developed for agents that exchange information over a non-degenerate network connectivity graph.

The cooperation strategies in adaptive networked systems can be classified as: (i) incremental strategies [18], (ii) consensus strategies [19], and (iii) diffusion strategies [5]. In the first class of strategies, information is passed from one agent to the next over a cyclic path until all agents are visited. In contrast, in the latter two, cooperation is enforced among multiple agents rather than between two adjacent neighbors. Diffusion strategies are shown to outperform consensus strategies in [20]; therefore, we concentrate on the former to implement cooperation among social agents in Sec. III. Diffusion strategies have been used previously for distributed estimation, distributed decision making, and distributed Pareto optimization in adaptive networks [6].

Detection of Equilibrium Play: Humans can be viewed as social sensors that interact over a social network to provide information about their environment. Social sensors go beyond physical sensors—for example, user preferences for a particular movie are available on Rotten Tomatoes but are difficult to measure via physical sensors. Social sensors present unique challenges from a statistical estimation point of view. First, social sensors interact with and influence other social sensors. Second, due to privacy concerns and time constraints, social sensors typically do not reveal their personal preference rankings between actions. In classical revealed preference theory in the microeconomics literature, Afriat's theorem gives a non-parametric finite-sample test to decide if an agent's actions in response to an external influence are consistent with utility maximization [21]. The revealed preference test for single agents has been applied to measuring the welfare effect of price discrimination, analyzing the relationship between aggregated broadband Internet access and time of use, and to auctions for advertisement position placement on page search results from Google [22].

For interacting agents in a social network (i.e., players in a game), single-agent tests are not suitable. Typically, the study of interacting agents in a game requires parametric assumptions on the form of the utility function of the agents. Deb [10] was the first to propose a detection test for players engaged in a concave potential game, based on Varian's and Afriat's work [21], [23]. Potential games were introduced by Monderer and Shapley [24] and are used extensively in the literature to study the strategic behaviour of utility-maximizing agents. A classical example is the congestion game [25], in which the utility of each agent depends on the amount of resource it and other agents in the social network use. Recently, the analysis of energy use scheduling and demand side management schemes in the energy market was performed using potential games [26].

A generic joint action profile of all agents is denoted by a = (a^1, ..., a^K) ∈ A^K, where A^K = A^1 × ... × A^K, and × denotes the Cartesian product. Following the common notation in game theory, one can rearrange a as a = (a^k, a^{-k}), where a^{-k} = (a^1, ..., a^{k-1}, a^{k+1}, ..., a^K) denotes the action profile of all agents excluding agent k.

3. Utility function: u^k : A^K → R is bounded, and determines the payoff to agent k as a function of the action profile taken by all agents. The interpretation of such a payoff is the aggregated rewards and costs associated with the chosen action as the outcome of the interaction. The payoff function can be quite general: It could reflect reputation or privacy, using the models in [29], [30], or benefits and costs associated with maintaining links in a social network, using the models in [27], [31]. It could also reflect benefits of consumption and the costs of production, download, and upload in content production and sharing over peer-to-peer networks [32], or the capacity available to users in communication networks [33].

Throughout the paper, we restrict our attention to non-cooperative games in social networks in which agents have identical homophilic characteristics. These situations are modeled by a symmetric non-cooperative game in the economics literature, formally defined as follows:

Definition 2.1: A normal-form non-cooperative game is symmetric if agents have identical action spaces, i.e., A^k = A = {1, ..., A} for all k ∈ K, and for all k, l ∈ K:

u^k(a^k, a^{-k}) = u^l(a^l, a^{-l}), if a^k = a^l, a^{-k} = a^{-l}.   (1)

2 Although we use the term "algorithm," the learning procedure mimics human behavior; it involves minimizing a moving average regret and random experimentation.

3 Humans typically convert numerical attributes to ordinal scales before making decisions. For example, it does not matter if the cost of a meal at a restaurant is $200 or $205; an individual would classify this cost as "high." Also, credit rating agencies use ordinal symbols such as AAA, AA, A.

II. LEARNING EQUILIBRIA IN NON-COOPERATIVE GAMES WITH HOMOPHILIC SOCIAL GROUPS

This section introduces a class of non-cooperative games with homophilic social groups. Homophily refers to a tendency of various types of individuals to associate with others who are similar to themselves—see footnote 1. Agents in homophilic relationships share common characteristics that motivate their communication. We then proceed to present and elaborate on a prominent solution concept in non-cooperative games, namely, correlated equilibrium.

A. Non-Cooperative Game Model

The standard representation of a non-cooperative game, known as a normal-form or strategic-form game, is comprised of three elements:

1. Set of agents: K = {1, ..., K}. Essentially, an agent models an entity that is entitled to making decisions. Agents may be people, sensors, mobile devices, etc., and are indexed by k ∈ K.

2. Set of actions: A^k = {1, ..., A^k}, which denotes the actions, also referred to as pure strategies, available to agent k at each decision point. A generic action taken by agent k is denoted by a^k. The actions of agents may range from deciding to establish or abolish links with other agents [27] to choosing among different technologies [28].

Intuitively speaking, in a symmetric game, the identities of agents can be changed without transforming the payoffs associated with decisions. Symmetric games have been used in the literature to model interaction of buyers and sellers in the global electronic market [34], clustering [35], cooperative spectrum sensing [36], and network formation models with costs for establishing links [37].

Agents can further subscribe to social groups within which they share information about their past experiences. This is referred to as neighborhood monitoring [17]. The communication among agents, hence their level of "social knowledge," can be captured by a connectivity graph, defined as follows:

Definition 2.2: The connectivity graph is a simple graph^4 G = (E, K), where agents form the vertices of the graph, and (k, l) ∈ E ⇔ agents k and l exchange information. The open and closed neighborhoods of each agent k are then, respectively, defined by

N^k := {l ∈ K; (k, l) ∈ E}, and N_c^k := N^k ∪ {k}.   (2)

Agents are, in fact, oblivious to the existence of other agents except their immediate neighbors on the network topology, nor are they aware of the dependence of the outcome of their decisions on those of other agents outside their social group. Besides exchanging past decisions with neighbors, agents realize the stream of payoffs as the outcome of their choices. At each time n = 1, 2, ..., each agent k makes a decision a_n^k, and realizes her utility

u_n^k(a_n^k) = u^k(a_n^k, a_n^{-k}).   (3)

Here, we assume that the agents are unaware of the exact form of the utility functions. However, even if agent k knew the utility function, computing utilities of unchosen actions would be impossible as she observes some (but not all) elements of a_n^{-k}.

Remark 2.1: It is straightforward to generalize the game model described above to social networks in which agents form multiple homophilic groups, within which each agent forms a social group of its own. The algorithm that we present next can be employed in such clustered networks with no further modification. However, for simplicity of presentation, we continue to use the single homophilic group in the rest of this paper.

III. REGRET-BASED COLLABORATIVE DECISION MAKING

This section presents the adaptive decision making algorithm that combines the regret-based reinforcement learning procedure [8] from the economics literature with the diffusion cooperation strategies [5], [6], which have recently attracted much attention in the signal processing community.

A. Agents' Beliefs

Time is discrete, n = 1, 2, .... At each time n, agent k makes a decision a_n^k according to a decision strategy p_n^k = (p_n^k(1), ..., p_n^k(A)), which relies on the agent's belief matrix R_n^k = [r_n^k(i, j)].

4 A simple graph is an unweighted, undirected graph containing no self loops or multiple edges.
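The symmetry condition (1) of Definition 2.1 is easy to check numerically in the two-agent case. The following sketch (our own illustration, with hypothetical payoff matrices) encodes u^k as a matrix U_k with U_k[i, j] = u^k(a^1 = i, a^2 = j); condition (1) then reduces to U1 equalling the transpose of U2.

```python
import numpy as np

def is_symmetric_game(U1, U2):
    """Definition 2.1 for K = 2: interchanging the agents' identities
    must leave payoffs unchanged, i.e., U1[i, j] == U2[j, i] for all
    action pairs (i, j), which is U1 == U2 transposed."""
    return bool(np.array_equal(np.asarray(U1), np.asarray(U2).T))
```

For instance, a prisoner's-dilemma-style payoff matrix U1 together with U2 = U1ᵀ forms a symmetric game, while pairing U1 with itself generally does not.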
Each element r_n^k(i, j) records the discounted time-averaged regret—the loss in utility—had the agent selected action j every time it played action i in the past, and is updated via the recursive expression (5) [shown at the bottom of the next page]. In (5), 0 < ε ≪ 1 is a small parameter that represents the adaptation rate of the strategy update procedure, and is required when agents face a game whose parameters (e.g., utility functions) slowly jump change over time [17]. Further, I(X) denotes the indicator operator: I(X) = 1 if statement X is true, and 0 otherwise. Note that the update mechanism in (5) relies only on the realized utilities, defined in (3).

Positive r_n^k(i, j) implies the opportunity to gain by switching from action i to j in the future. Therefore, the regret-matching reinforcement learning procedure, which we present later in this section, assigns positive probabilities to all actions j for which r_n^k(i, j) > 0. In fact, the probabilities of switching to different actions are proportional to their regrets relative to the current action, hence the name "regret-matching."

B. Correlated Equilibrium

In the first part, we focus on the correlated equilibrium, which is defined as follows:

Definition 2.3 (Correlated Equilibrium): Let π denote a joint distribution on the joint action space A^K, i.e., π(a) ≥ 0, ∀a ∈ A^K, and Σ_{a ∈ A^K} π(a) = 1. The set of correlated ϵ-equilibria is the convex polytope C_ϵ [see (4), shown at the bottom of the page], where π^k(i, a^{-k}) denotes the probability that agent k picks action i and the rest play a^{-k}. If ϵ = 0, the convex polytope represents the set of correlated equilibria, and is denoted by C.

Several reasons motivate adopting the correlated equilibrium in large-scale social networks. It is structurally and computationally simpler than the Nash equilibrium. The coordination among agents in the correlated equilibrium can further lead to potentially higher utilities than if agents take their actions independently (as required by Nash equilibrium) [4]. Finally, it is more realistic, as the observation of the common history of actions naturally correlates agents' future decisions [8].

An intuitive interpretation of correlated equilibrium is "coordination in decision-making." Suppose a mediator is observing a repeated interactive decision making process among multiple selfish agents. The mediator, at each period, gives private recommendations as to what action to take to each agent. The recommendations are correlated, as the mediator draws them from a joint probability distribution on the action profile of all agents; however, each agent is only given recommendations about her own decision. Each agent can freely interpret the recommendations and decide whether to follow them. A correlated equilibrium results if no agent wants to deviate from the provided recommendation. That is, in correlated equilibrium, agents' decisions are coordinated as if there exists a global coordinating device that all agents trust to follow.

B. Diffusion Cooperation Strategy

Inspired by the idea of diffusion least mean squares over adaptive networks [6], [38], we enforce cooperation among neighboring agents via exchanging and fusing regret information. Such diffusion of regret information is rewarding since agents belong to the same homophilic group—see Definition 2.1. That is, all agents attempt to optimize the same utility function, however, in the presence of interdependence among their decisions. It has been shown in [6], [38] that such cooperation strategies can lead to faster resolution of uncertainties in decentralized inference and optimization problems over adaptive networks, and enable agents to respond in real time to changes underlying such problems. In view of these benefits, this paper studies, for the first time, the application of such diffusion cooperation strategies in the game-theoretic reinforcement learning context in social networks.

At the end of each decision period n, agent k shares the belief matrix R_n^k with the neighbors N^k on the network connectivity graph G (see Definition 2.2).

C_ϵ = { π : Σ_{a^{-k}} π^k(i, a^{-k}) [ u^k(j, a^{-k}) − u^k(i, a^{-k}) ] ≤ ϵ, ∀i, j ∈ A^k, k ∈ K }   (4)

r_{n+1}^k(i, j) = r_n^k(i, j) + ε [ (p_n^k(i) / p_n^k(j)) · u^k(a_n^k) · I(a_n^k = j) − u^k(a_n^k) · I(a_n^k = i) − r_n^k(i, j) ]   (5)
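The belief update (5) can be transcribed directly into a few lines of code; the sketch below (array layout and names are our own, not the paper's) emphasizes that only the realized utility is needed, with the importance weight p(i)/p(j) standing in for the utility of the unplayed action.

```python
import numpy as np

def regret_update(R, p, a, u_realized, eps):
    """One pass of (5): R is the A x A regret matrix R_n^k, p the current
    randomized strategy p_n^k, a the action a_n^k just played, u_realized
    the realized utility u^k(a_n^k), and eps the adaptation rate."""
    A = len(p)
    F = np.zeros((A, A))
    for i in range(A):
        for j in range(A):
            # estimate of "utility had j been played instead of i"
            F[i, j] = (p[i] / p[j]) * u_realized * (a == j) \
                      - u_realized * (a == i)
    return R + eps * (F - R)
```

Starting from zero regrets with a uniform strategy over two actions, playing action 0 with realized utility 1 raises the regret for switching 1 → 0 and lowers it for switching 0 → 1, as the sign pattern of (5) dictates.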

Agent k then fuses the collected information via a linear combiner [5], [6]:

R_n^k := Σ_{l ∈ N_c^k} w_kl R_n^l   (6)

where w_kl denotes the weight that agent k assigns to the regrets experienced by agent l in her immediate neighborhood on the network connectivity graph. These weights give rise to a global weight matrix W = [w_kl] in the network of agents. In this paper, we assume:

W := I_K + εC,   (7)

where ε is the same as the step-size in (5), and

C′ = C, C 1_K = 0_K, and |c_kl| ≤ 1, c_kl ≥ 0 for k ≠ l, c_kl > 0 iff (k, l) ∈ E.   (8)

Here, I_K denotes the K × K identity matrix, (·)′ denotes the transpose operator, and 1_K and 0_K represent the K × 1 vectors of all ones and all zeros, respectively. Agent k then combines the fused information with her own realized utility at the current period, and updates the decision policy for the next period.

Algorithm 1 Regret-Matching With Diffusion Cooperation

Initialization: Set
μ^k > A · |u^k_max − u^k_min|,
where u^k_max and u^k_min denote the upper and lower bounds on the utility function, respectively. Set the step-size 0 < ε ≪ 1, and initialize p_0^k = (1/A) · 1_A, R_0^k = 0.

Step 1: Choose Action Based on Past Regret.
a_n^k ∼ p_n^k = (p_n^k(1), ..., p_n^k(A)),
where p_n^k(i) is given in (9), and |x|^+ = max{0, x}.

Step 2: Update Individual Regret.
R_{n+1}^k = R_n^k + ε [ F^k(a_n^k) − R_n^k ]   (10)
where F^k(a_n^k) = [f_{ij}^k(a_n^k)] is an A × A matrix with elements
f_{ij}^k(a_n^k) = (p_n^k(i) / p_n^k(j)) · u^k(a_n^k) · I(a_n^k = j) − u^k(a_n^k) · I(a_n^k = i).

Step 3: Fuse Regrets with Members of Social Group.
R_{n+1}^k = Σ_{l ∈ N_c^k} w_kl R_{n+1}^l.   (11)

Recursion: Set n ← n + 1, and go to Step 1.

Fig. 2. Regret-matching with diffusion cooperation (information exchange within the social group).
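The weight-matrix construction (7)-(8) and the fusion step (6)/(11) are straightforward to implement; here is a minimal sketch for a two-member social group (variable names are ours; the 2 × 2 matrix C matches the one used in the paper's numerical example in Sec. IV).

```python
import numpy as np

# Weight matrix W = I_K + eps * C from (7); C is symmetric with zero row
# sums per (8), so the rows of W sum to one for small eps.
eps = 0.01
C = np.array([[-0.25,  0.25],
              [ 0.25, -0.25]])
W = np.eye(2) + eps * C

def fuse_regrets(W, regrets):
    """Fusion step (6)/(11): agent k's new regret matrix is the
    w_kl-weighted combination of the regret matrices in its closed
    neighborhood (here, the whole two-member group)."""
    K = len(regrets)
    return [sum(W[k, l] * regrets[l] for l in range(K)) for k in range(K)]
```

The condition C 1_K = 0_K in (8) is what guarantees that each agent's fused regret is a convex combination of the group's regrets, so fusion neither inflates nor shrinks the regret scale.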
It is shown in [39] that, by properly rescaling the periods at which observation of individual agents and fusion of neighboring beliefs take place, the standard diffusion cooperation strategy in [38] can be approximated by the diffusion strategy with the weight matrix W in (7). This further allows using the well-known ordinary differential equation (ODE) method [40], [41] for the convergence analysis. In light of (6), the belief of each agent is a function of both her own past experience and those of neighboring agents. This enables agents to respond in real time to the non-stationarities underlying the game.

C. Regret-Matching With Diffusion Cooperation

The proposed reinforcement learning algorithm is summarized in the following protocol, which mimics the human learning process:

Social Decision Protocol
Step 1: Individual k chooses action a_n^k randomly from a weight vector (probabilities) p_n^k. This weight vector is an ordinal function^5 of regret due to its previous actions.
Step 2: The individual updates regrets based on its actions and the associated outcomes as Σ_{τ=1}^n (1 − ε)^{n−τ} F(a_τ^k), where F(·) is an ordinal function of the action. The exponential discounting places more importance on recent actions.
Step 3: The individual then shares and fuses its regrets with other individuals in the social group.

Below we abstract the above social decision protocol into Algorithm 1 so as to facilitate analysis of the global behavior; see also Fig. 2 for a schematic illustration of Algorithm 1. The first two steps implement the regret-matching reinforcement learning procedure [7], whereas the last step implements the diffusion protocol [6], [38].

D. Discussion and Intuition

Distinct properties of the local adaptation and learning algorithm summarized in Algorithm 1 are as follows:

a) Decision strategy: The randomized strategy (9) is simply a weighted average of two probability vectors: The first term, with weight 1 − δ, is proportional to the positive part of the regrets. Taking the minimum with 1/A guarantees that p_n^k is a valid probability distribution, i.e., Σ_i p_n^k(i) = 1. The second term, with weight δ, is just a uniform distribution over the action space [8], [17]. It forces every action to be played with some minimal frequency (strictly speaking, with probability δ/A).

5 An ordinal function orders pairs of alternatives such that one is considered to be worse than, equal to, or better than the other—see item c) in Sec. III-D.

p_n^k(i) = (1 − δ) · min{ (1/μ^k) |r_n^k(a_{n−1}^k, i)|^+, 1/A } + δ/A,  if i ≠ a_{n−1}^k;
p_n^k(i) = 1 − Σ_{j ≠ i} p_n^k(j),  if i = a_{n−1}^k.   (9)
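A direct transcription of (9) (names are ours; the regret matrix and the previous action are assumed given) makes the three ingredients visible: the clipped regret-proportional term, the uniform exploration term, and the residual mass assigned to the previous action.

```python
import numpy as np

def decision_strategy(R, a_prev, mu, delta):
    """Randomized strategy (9): for i != a_prev, mix the clipped positive
    regret (capped at 1/A) with uniform exploration delta/A; the previous
    action a_prev absorbs the remaining probability mass (inertia)."""
    A = R.shape[0]
    p = np.zeros(A)
    for i in range(A):
        if i != a_prev:
            p[i] = (1 - delta) * min(max(R[a_prev, i], 0.0) / mu, 1.0 / A) \
                   + delta / A
    p[a_prev] = 1.0 - p.sum()   # second branch of (9)
    return p
```

Since each off-diagonal probability is at most (1 − δ)/A + δ/A = 1/A, the previous action always retains probability at least 1/A, which is the "inertia" property discussed in item d) below.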

The exploration factor δ is essential to be able to estimate the contingent utilities using only the realized utilities; it can, as well, be interpreted as exogenous statistical "noise." As will be discussed later, a larger δ leads to convergence of the global behavior to a larger ϵ-distance of the correlated equilibria set.

b) Adaptive behavior: In (10), ε essentially introduces an exponential forgetting of the regrets experienced in the past, and facilitates adaptivity to the evolution of the non-cooperative game model over time. As agents successively take actions, the effect of the old experiences on their current decisions vanishes. This enables tracking time variations on a timescale that is as fast as the adaptation rate of Algorithm 1.

c) Ordinal choice of actions: The decision strategy is an ordinal function of the experienced regrets. Actions are ordered based on the regret values, with the exception that all actions with negative regret are considered to be equally desirable—see footnote 3.

d) Inertia: The choice of μ^k guarantees that there is always a positive probability of picking the same action as in the last period. Therefore, μ^k can be viewed as an "inertia" parameter. It mimics humans' decision making process and plays a significant role in breaking away from bad cycles. This inertia is, in fact, the very factor that makes convergence to the correlated equilibria set possible under (almost) no structural assumptions on the underlying game [7].

e) Markov chain construction: The sequence {a_n^k, R_n^k} is a Markov chain with state space A, and transition probability matrix

P(a_n^k = i | a_{n−1}^k = j) = P_ji(R_n^k).

The above transition matrix is continuous, irreducible and aperiodic for each R_n^k. It is (conditionally) independent of other agents' action profile, which may be correlated. More precisely, let h_n = {a_τ}_{τ=1}^n, where a_τ = (a_τ^k, a_τ^{-k}), denote the history of decisions made by all agents up to time n. Then,

Pr(a_n^k = i, a_n^{-k} = a | h_{n−1}) = Pr(a_n^k = i | h_{n−1}) · P(a_n^{-k} = a | h_{n−1}) = P_{a_{n−1}^k, i}(R_n^k) · P(a_n^{-k} = a | h_{n−1}).   (12)

The sample path of this Markov chain {a_n^k} is fed back into the stochastic approximation algorithm that updates R_n^k, which in turn affects its transition matrix. This interpretation is useful in Sec. A when deriving the limit dynamical system representing the behavior of Algorithm 1.

IV. EMERGENCE OF RATIONAL GLOBAL BEHAVIOR

This section characterizes the global behavior emerging from agents individually following Algorithm 1.

A. Global Behavior

The global behavior z_n of the network at each time n is defined as the discounted empirical frequency of the joint action profile of all agents up to period n. Formally,

z_n = (1 − ε)^{n−1} e_{a_1} + ε Σ_{2 ≤ τ ≤ n} (1 − ε)^{n−τ} e_{a_τ},   (13)

where e_{a_τ} is a unit vector on the space of all possible joint action profiles A^K with the element corresponding to the joint play a_τ being equal to one. The small parameter 0 < ε ≪ 1 is the same as the adaptation rate in (10). It introduces an exponential forgetting of the past decision profiles to enable adaptivity of the network behavior to the evolution of the game model. That is, the effect of the old game model on the decisions of agents vanishes as they repeatedly take actions. Given z_n, the average utility accrued by each agent can be straightforwardly evaluated, hence the name global behavior. It is more convenient to define z_n via the stochastic approximation recursion

z_n = z_{n−1} + ε [ e_{a_n} − z_{n−1} ].   (14)

B. Asymptotic Local and Global Behavior

In what follows, we present the main theorem that reveals both the local and global behavior emerging from each agent individually following Algorithm 1 in a static game model. The regret matrices R_n^k, k ∈ K, and z_n will be used as indicators of the agents' local and global experience, respectively. We use stochastic averaging [41] in order to characterize the asymptotic behavior of Algorithm 1. The basic idea is that, via a "local" analysis, the noise effects in the stochastic algorithm are averaged out so that the asymptotic behavior is determined by that of a "mean" dynamical system. To this end, in lieu of working with the discrete-time iterates directly, one works with continuous-time interpolations of the iterates. Accordingly, define the piecewise constant interpolated processes

R^{k,ε}(t) = R_n^k, z^ε(t) = z_n for t ∈ [nε, (n + 1)ε).   (15)

Further, with slight abuse of notation, denote by R_n^k the regret matrix rearranged as a vector of length A^2—rather than an A × A matrix—and let R^{k,ε}(·) represent the associated interpolated vector process; see (15). Let further ‖·‖ denote the Euclidean norm, and R_− represent the negative orthant in the Euclidean space of appropriate dimension. The following theorem characterizes the local and global behavior emergent from following Algorithm 1.

Theorem 4.1: Let {t_ε} be any sequence of real numbers satisfying t_ε → ∞ as ε → 0. For each ϵ, there exists an upper bound δ̂(ϵ) on the exploration parameter δ such that, if every agent follows Algorithm 1 with 0 < δ < δ̂(ϵ) in (9), as ε → 0, the following results hold:
(i) The regret vector R^{k,ε}(· + t_ε) converges in probability to an ϵ-distance of the negative orthant. That is, for any β > 0,

lim_{ε→0} P( dist[R^{k,ε}(· + t_ε), R_−] − ϵ > β ) = 0   (16)

where dist[·, ·] denotes the usual distance function.
(ii) The global behavior vector z^ε(· + t_ε) converges in probability to the correlated ϵ-equilibria set C_ϵ in the sense that

dist[z^ε(· + t_ε), C_ϵ] = inf_{z ∈ C_ϵ} ‖z^ε(· + t_ε) − z‖ → 0.   (17)

Proof: See Appendix A for a sketch of the proof.

TABLE I Consider a non-cooperative game among three agents = AGENTS’PAYOFFS IN A SYMMETRIC NON-COOPERATIVE GAME K 1, 2, 3 with action set = 1, 2 . Agents 1 and 2 exhibit 2 2 2 2 { } A { } a = 1 a = 2 a = 1 a = 2 identical homophilic characteristics and, hence, form a social a1 = 1 (2, 2, 5) (3, 6, 4) (1, 1, 3) (1, 4, 5) group. That is, = (1, 2), (2, 1) in the network connectivity 1 E { } a = 2 (6, 3, 4) (4, 4, 6) (4, 1, 0) (6, 6, 4) graph —see Definition 2.2. In contrast, agent 3 is isolated G a3 = 1 a3 = 2 from agents 1 and 2 and, in fact, unaware of their existence. Table I presents agents’ utilities in normal form: Each element 0 10 (x, y, z) in the table represents the utility of agents 1, 2, and 3,

n respectively, corresponding to the particular choice of action. d

10−1 Note in Table I that the game is symmetric between agents 1 and 2. Such situations arise in social networks when a homophilic group of agents aims to coordinate their decisions 10−2 in response to the actions of other (homophilic groups of) agents. Further, we set

−3   10 0.25 0.25 C = − Reinforcement learning procedure [8] 0.25 0.25 Distance to Correlated Equilibrium Regret-matching with diffusion cooperation − −4 10 101 102 in the weight matrix W , defined in (7). That is, agents 1 Iteration Number n and 2 place 1/4 weight on the information they receive from their neighbor on the connectivity graph, and 3/4 on their own Fig. 3. Distance to correlated equilibrium vs. iteration number. beliefs. We further set the exploration factor δ = 0.15 in the decision strategy (9), and the step-size ε = 0.01 in (10). of at most  after sufficient repeated plays of the game. As the benchmark, we use the standard reinforcement learn- Indeed,  can be made arbitrarily small by properly choosing ing procedure [8] to evaluate the performance of Algorithm 1. the exploration parameter δ in (9). It further states that, if However, we replace its decreasing step-size with the constant now all agents in the networked multi-agent system start step-size ε so as to make the two algorithms both adaptive following Algorithm 1 independently, their collective behavior and comparable. In view of the last step in the proof of converges to the correlated -equilibria set. Differently put, Theorem 4.1 in Appendix A, the distance to the polytope agents can coordinate their strategies in a distributed fashion of correlated equilibrium can be evaluated by the distance of so that the distribution of their joint behavior is close to the the agents’ regrets to the negative orthant. More precisely, we correlated equilibria polytope. From the game-theoretic point quantify the distance to correlated equilibrium set by of view, it shows that non-fully rational local behavior of q P k +2 dn = max i,j rn(i, j) . (18) agents—due to utilizing a ‘better-response’ rather than a ‘best- k∈K | | response’ strategy—can lead to the manifestation of globally Fig. 3 shows how d diminishes with time n for both sophisticated and rational behavior at the network level. 
Note n algorithms. Each point of the depicted sample paths is an in the above theorem that the convergence arguments are to a average over 100 independent runs of the algorithms. As is set rather than a particular point in that set. clearly evident, cooperation with neighboring agents over the Remark 4.1: The constant step-size in Algorithm 1 enables network topology improves the rate of convergence to the it to adapt to changes underlying the game model. Using correlated equilibrium. Algorithm 1 outperforms the reinforce- weak convergence methods [41], it can be shown that the ment learning procedure [8] particularly in the initial stages of first result in Theorem 4.1 holds if the parameters underlying the learning process, where sharing experiences (regrets) with the game undergo random changes on a timescale that is no neighbors leads to d monotonically decreasing with n. faster than the timescale determined by the adaptation rate of n Algorithm 1. The second result in Theorem 4.1 will further hold if the changes occur on a slower timescale. The reader V. DETECTIONOF EQUILIBRIUM PLAY IN GAMES is referred to [17] for further details. We now move on to the second part of the paper, namely, us- ing the principle of revealed preferences to detect equilibrium play of agents in a social network. The setup is depicted in Fig. C. Numerical Example 4. The main questions addressed are: Is it possible to detect if The limiting behavior of Algorithm 1 follows a differential the agents are utility maximizers? If yes, can the behavior of inclusion—see the proof of Theorem 4.1 in Appendix A. the agents be learned using the data from the social network? Differential inclusions are generalizations of ordinary differ- As mentioned in Sec. 
I, these questions are fundamentally ential equations (ODEs) in which the sample paths belongs different to the model-based theme that is widely used in the to a set; therefore, independent runs lead to different sample signal processing literature in which an objective function (typ- paths. This prohibits deriving an analytical rate of convergence ically convex) is proposed and then algorithms are constructed for reinforcement learning algorithms of this type. Here, we to compute the minimum. In contrast, the revealed preference resort to Monte Carlo simulations to illustrate and compare approach is data-centric–we wish to determine whether the the performance of Algorithm 1. dataset is obtained from the interaction of utility maximizers. 8
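To make the distance metric (18) concrete, here is a minimal Python sketch. The function name `distance_to_ce` and the sample regret values are hypothetical, used only to illustrate computing d_n from the agents' regret matrices, where |r|^+ = max(r, 0).

```python
import math

def distance_to_ce(regrets):
    """Distance (18) of agents' regrets to the negative orthant:
    d_n = max over agents k of sqrt(sum_{i,j} (max(r_k[i][j], 0))^2)."""
    d = 0.0
    for r_k in regrets:  # r_k is the |A| x |A| regret matrix of agent k
        s = sum(max(x, 0.0) ** 2 for row in r_k for x in row)
        d = max(d, math.sqrt(s))
    return d

# Hypothetical regrets for two agents with two actions each.
r1 = [[-0.2, 0.3], [0.0, -0.1]]    # one positive entry (0.3) contributes
r2 = [[-0.5, -0.4], [-0.1, -0.3]]  # all negative: zero contribution
print(distance_to_ce([r1, r2]))    # 0.3
```

When all entries of every regret matrix are non-positive, d_n = 0, i.e., the regrets lie in the negative orthant, consistent with part (i) of Theorem 4.1.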

Classical revealed preference theory seeks to determine if an agent is a utility maximizer subject to a budget constraint based on observing its actions over time, and is widely studied in the micro-economics literature. The reader is referred to the works [22] by Varian (chief economist at Google) for details.

Fig. 4. Schematic of a social network containing n interacting agents where p_t ∈ R^m denotes the external influence, and x_t^i ∈ R^m the action of agent i in response to the external influence and other agents at time t. Note that the dotted line denotes consumers 4, ..., n−1. The aim is to determine if the dataset D = {(p_t, x_t^1, x_t^2, ..., x_t^n) : t ∈ {1, 2, ..., T}} is consistent with play from a Nash equilibrium of players engaged in a concave potential game.

A. Preliminaries: Utility Maximization and Afriat's Theorem

Deterministic revealed preference tests for utility maximization were pioneered by Afriat [21], and further developed by Diewert [42] and Varian [23]. Given a time-series of data D = {(p_t, x_t) : t ∈ {1, 2, ..., T}}, where p_t ∈ R^m denotes the external influence, x_t denotes the action of an agent, and t denotes the time index, is it possible to detect if the agent is a utility maximizer? An agent is a utility maximizer at each time t if for every external influence p_t, the selected action x_t satisfies

    x_t(p_t) ∈ arg max_{p_t' x ≤ I_t} u(x)          (19)

with u(x) a non-satiated utility function. Non-satiated means that an increase in any element of action x results in the utility function increasing. The non-satiated assumption rules out trivial cases such as a constant utility function, which can be optimized by any action; as shown by Diewert [42], without local non-satiation the maximization problem (19) may have no solution. In (19) the social budget constraint p_t' x_t ≤ I_t denotes the total amount of resources available to the social sensor for selecting the action x_t in response to the external influence p_t. An example is the aggregate power consumption of agents in the energy market. The external influence is the cost of using a particular resource, and the action is the amount of resources used. The social impact budget is therefore the total cost of using the resources and is given by p_t' x_t = I_t. Further insight into the social impact budget constraint is provided in Sec. VI.

The celebrated "Afriat's theorem" provides necessary and sufficient conditions for a finite dataset D to have originated from a utility maximizer.

Theorem 5.1 (Afriat's Theorem): Given a dataset D = {(p_t, x_t) : t ∈ {1, 2, ..., T}}, the following statements are equivalent:

1) The agent is a utility maximizer and there exists a non-satiated and concave utility function that satisfies (19).
2) For scalars u_t and λ_t > 0 the following set of inequalities has a feasible solution:

    u_τ − u_t − λ_t p_t'(x_τ − x_t) ≤ 0   for t, τ ∈ {1, 2, ..., T}.          (20)

3) A non-satiated and concave utility function that satisfies (19) is given by:

    u(x) = min_{t∈T} { u_t + λ_t p_t'(x − x_t) }.          (21)

4) The dataset D satisfies the Generalized Axiom of Revealed Preference (GARP), namely for any k ≤ T,

    p_t' x_t ≥ p_t' x_{t+1}  ∀ t ≤ k − 1  ⇒  p_k' x_k ≤ p_k' x_1.

As pointed out in [23], a remarkable feature of Afriat's theorem is that if the dataset can be rationalized by a non-trivial utility function, then it can be rationalized by a continuous, concave, monotonic utility function. "Put another way, violations of continuity, concavity, or monotonicity cannot be detected with only a finite number of demand observations".

Verifying GARP (statement 4 of Theorem 5.1) on a dataset D comprising T points can be done using Warshall's algorithm with O(T^3) computations [23]. Alternatively, determining if Afriat's inequalities (20) are feasible can be done via an LP feasibility test (using, for example, interior point methods [43]). Note that the utility (21) is not unique and is ordinal by construction. Ordinal means that any monotone increasing transformation of the utility function will also satisfy Afriat's theorem. Therefore the utility mimics the ordinal behavior of humans. Geometrically, the estimated utility (21) is the lower envelope of a finite number of hyperplanes that is consistent with the dataset D.

B. Decision Test for Nash Rationality

We now consider a version of Afriat's theorem for deciding if a dataset D from a social network is generated by agents playing from the equilibrium of a potential game. Potential games have been used in telecommunication networking for tasks such as routing, congestion control, power control in wireless networks, and peer-to-peer file sharing [44], and in social networks to study the diffusion of technologies, advertisements, and influence [45].

Consider the social network of interconnected agents in Fig. 4. Given a time-series of data from n agents D = {(p_t, x_t^1, ..., x_t^n) : t ∈ {1, 2, ..., T}}, with p_t ∈ R^m the external influence, x_t^i the action of agent i, and t the time index, is it possible to detect if the dataset originated from agents that play a potential game? In Fig. 4 the actions of agents are dependent on both the external influence p_t and the actions of the other agents in the social network. The utility function of the agent now includes the actions of other agents: formally, if there are n agents, each has a utility function u^i(x^i, x^{−i}), with x^i denoting the action of agent i, x^{−i} the actions of the other n − 1 agents, and u^i(·) the utility of agent i. Given a dataset D, is it possible to detect if the data is consistent with agents playing a game and maximizing their individual utilities? Deb, following Varian's and Afriat's work, shows that refutable restrictions exist for the dataset D, given by (22), to satisfy Nash equilibrium (23) [10]. These refutable restrictions are, however, satisfied by most D [10]. The detection of agents engaged in a concave potential game, and generating actions that satisfy Nash equilibrium, provides stronger restrictions on the dataset D [10]. We denote this behaviour as Nash rationality, defined as follows:

Definition 5.1 ([10]): Given a dataset

    D = {(p_t, x_t^1, x_t^2, ..., x_t^n) : t ∈ {1, 2, ..., T}},          (22)

D is consistent with Nash equilibrium play if there exist utility functions u^i(x^i, x^{−i}) such that

    x_t^i = x_t^{i*}(p_t) ∈ arg max_{p_t' x^i ≤ I_t^i} u^i(x^i, x_t^{−i}).          (23)

In (23), u^i(x^i, x^{−i}) is a non-satiated utility function in x^i, x^{−i} = {x^j}_{j≠i} for i, j ∈ {1, 2, ..., n}, and the elements of p_t are strictly positive. Non-satiated means that for any ϵ > 0, there exists a x^i with ‖x^i − x_t^i‖_2 < ϵ such that u^i(x^i, x_t^{−i}) > u^i(x_t^i, x_t^{−i}). If for all x_t^i, x_t^j ∈ X^i, there exists a concave potential function V that satisfies

    u^i(x^i, x^{−i}) − u^i(x^j, x^{−i}) > 0  iff  V(x^i, x^{−i}) − V(x^j, x^{−i}) > 0          (24)

for all the utility functions u^i(·) with i ∈ {1, 2, ..., n}, then the dataset D satisfies Nash rationality.

Just as with the utility maximization budget constraint in (19), the budget constraint p_t' x^i ≤ I_t^i in (23) models the total amount of resources available to the agent for selecting the action x_t^i in response to the external influence p_t.

The following theorem provides necessary and sufficient conditions for a dataset D (22) to be consistent with Nash rationality (Definition 5.1). The proof is analogous to Afriat's Theorem when the concave potential function of the game is differentiable [10], [46].

Theorem 5.2 (Multi-agent Afriat's Theorem): Given a dataset D (22), the following statements are equivalent:

1) D is consistent with Nash rationality (Definition 5.1) for an n-player concave potential game.
2) Given scalars v_t and λ_t^i > 0, the following set of inequalities has a feasible solution for t, τ ∈ {1, ..., T}:

    v_τ − v_t − Σ_{i=1}^n λ_t^i p_t'(x_τ^i − x_t^i) ≤ 0.          (25)

3) A concave potential function that satisfies (23) is given by:

    V̂(x^1, x^2, ..., x^n) = min_{t∈T} { v_t + Σ_{i=1}^n λ_t^i p_t'(x^i − x_t^i) }.          (26)

4) The dataset D satisfies the Potential Generalized Axiom of Revealed Preference (PGARP) if the following two conditions are satisfied:
   a) For every dataset D_τ^i = {(p_t, x_t^i) : t ∈ {1, 2, ..., τ}}, for all i ∈ {1, ..., n} and all τ ∈ {1, ..., T}, D_τ^i satisfies GARP.
   b) The actions x_t^i originated from agents in a concave potential game.

Note that if only a single agent (i.e. n = 1) is considered, then Theorem 5.2 is identical to Afriat's Theorem. Similar to Afriat's Theorem, the constructed concave potential function (26) is ordinal, that is, unique up to positive monotone transformations. Therefore several possible options for V̂(·) exist that would produce identical preference relations to the actual potential function V(·). In statement 4) of Theorem 5.2, the first condition only provides necessary and sufficient conditions for the dataset D to be consistent with a Nash equilibrium of a game; therefore the second condition is required to ensure consistency with the other statements in the Multi-agent Afriat's Theorem. The intuition that connects statements 1 and 3 in Theorem 5.2 is provided by the following result from [24]: for any smooth potential game that admits a concave potential function V, a sequence of responses {x^i}_{i∈{1,2,...,n}} is generated by a pure-strategy Nash equilibrium if and only if it is a maximizer of the potential function,

    x_t = {x_t^1, x_t^2, ..., x_t^n} ∈ arg max V({x^i}_{i∈{1,2,...,n}})
    s.t. p_t' x^i ≤ I_t^i   ∀ i ∈ {1, 2, ..., n}          (27)

for each probe vector p_t ∈ R_+^m.

The non-parametric test for Nash rationality involves determining if (25) has a feasible solution. Computing the parameters v_t and λ_t^i > 0 in (25) involves solving a linear program with T^2 linear constraints in (n + 1)T variables, which has polynomial time complexity [43]. In the special case of one agent, the constraint set in (25) is the dual of the shortest path problem in network flows. The parameters u_t and λ_t in (20) can be computed using Warshall's algorithm with O(T^3) computations [23].

C. Statistical Test for Nash Rationality

In real-world analysis a dataset D may fail the Nash rationality test (25) as a result of the agents' actions x_t being measured in noise. In this section a statistical test is provided to detect for Nash rationality when the actions are measured in noise. Here we consider additive noise w_t such that the measured dataset is given by:

    D_obs = {(p_t, y_t^1, y_t^2, ..., y_t^n) : t ∈ {1, 2, ..., T}},          (28)

consisting of external influences p_t and noisy observations of the agents' actions y_t^i = x_t^i + w_t^i. In such cases a feasibility test is required to test if the clean dataset D satisfies Nash rationality (25). Let H_0 and H_1 denote the null hypothesis that the clean dataset D satisfies Nash rationality, and the alternative hypothesis that D does not satisfy Nash rationality. In devising a statistical test for H_0 vs H_1, there are two possible sources of error:

    Type-I errors: Reject H_0 when H_0 is valid.
    Type-II errors: Accept H_0 when H_0 is invalid.          (29)

Given the noisy dataset D_obs (28), the following statistical test can be used to detect if a group of agents select actions that satisfy Nash equilibrium (23) when playing a concave potential game:

    ∫_{Φ*{y}}^{+∞} f_M(ψ) dψ  ≷_{H_1}^{H_0}  γ.          (30)
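As an illustration of statement 4 of Theorem 5.1, the following Python sketch checks GARP by forming the direct revealed-preference relation and computing its transitive closure with Warshall's algorithm, the O(T^3) procedure mentioned in [23]. The function name and the toy datasets are hypothetical, not from the paper.

```python
def satisfies_garp(p, x):
    """Check GARP: p[t], x[t] are m-vectors (lists) for t = 0..T-1."""
    T = len(p)
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    # Direct revealed preference: R[t][s] iff p_t . x_t >= p_t . x_s
    R = [[dot(p[t], x[t]) >= dot(p[t], x[s]) for s in range(T)]
         for t in range(T)]
    # Transitive closure via Warshall's algorithm, O(T^3).
    for k in range(T):
        for i in range(T):
            if R[i][k]:
                for j in range(T):
                    if R[k][j]:
                        R[i][j] = True
    # GARP: if x_t is revealed preferred to x_s, then x_s must not be
    # strictly (directly) revealed preferred to x_t.
    for t in range(T):
        for s in range(T):
            if R[t][s] and dot(p[s], x[s]) > dot(p[s], x[t]):
                return False
    return True

# A consistent toy dataset and a cycling (violating) one.
p = [[1.0, 2.0], [2.0, 1.0]]
x_ok = [[2.0, 1.0], [1.0, 2.0]]
x_bad = [[1.0, 2.0], [2.0, 1.0]]
print(satisfies_garp(p, x_ok), satisfies_garp(p, x_bad))  # True False
```

In the violating dataset each bundle is revealed preferred to the other while also being strictly cheaper at the other's prices, exactly the cycle that GARP rules out.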

In the statistical test (30):

(i) γ is the "significance level" of the statistical test.

(ii) The "test statistic" Φ*{y} is the solution of the following constrained optimization problem for y = {(y_t^1, y_t^2, ..., y_t^n)}_{t∈{1,2,...,T}}:

    min Φ
    s.t. v_τ − v_t − Σ_{i=1}^n λ_t^i p_t'(y_τ^i − y_t^i) − Σ_{i=1}^n λ_t^i Φ ≤ 0          (31)
         λ_t^i > 0,  Φ ≥ 0   for t, τ ∈ {1, 2, ..., T}.

(iii) f_M is the probability density function of the random variable M, where

    M ≡ max_{t,τ} { Σ_{i=1}^n |p_t'(w_t^i − w_τ^i)| }.          (32)

The following theorem characterizes the performance of the statistical test (30). The proof is in the appendix.

Theorem 5.3: Consider the noisy dataset D_obs (28) of external influences and actions. The probability that the statistical test (30) yields a Type-I error (rejects H_0 when it is true) is less than γ. (Recall H_0 and H_1 are defined in (29).)

Note that (31) is non-convex due to the term λ_t^i Φ; however, since the objective function is given by the scalar Φ, for any fixed value of Φ, (31) becomes a set of linear inequalities, allowing feasibility to be straightforwardly determined [47].

D. Stochastic Gradient Algorithm to Minimize Type-II Errors

Theorem 5.3 above guarantees that the probability of Type-I errors is less than γ for the statistical test (30) for the detection of Nash rationality (Definition 5.1) from a noisy dataset D_obs (28). In this section, the statistical test (30) is enhanced by adaptively optimizing the external influence vectors p = [p_1, p_2, ..., p_T] to reduce Type-II errors.

Reducing the Type-II error probability can be achieved by dynamically optimizing the external influence p = [p_1, ..., p_T]. The external influence p is selected as the solution of

    p* ∈ arg min_{p ∈ R_+^{m×T}} J(p) = P( ∫_{Φ*(y)}^{+∞} f_M(β) dβ > α | {p, x(p)} ∈ A ),          (33)

where J(p) is the probability of a Type-II error. In (33), y = x(p) + w with y defined above (31), f_M is the probability density function of the random variable M (32), and P(·|·) denotes the conditional probability that (30) accepts H_0 for all agents given that H_0 is false. The set A contains all elements {p, x(p)}, with x(p) = [x^1(p_t), ..., x^n(p_t)], where {p, x(p)} does not satisfy Nash rationality (Definition 5.1).

To compute (33) requires a stochastic optimization algorithm, as the probability density function f_M is not known explicitly. Given that we must estimate the gradient of the objective function in (33), and that p ∈ R_+^{m×T} can comprise a large-dimensional matrix, the simultaneous perturbation stochastic approximation (SPSA) algorithm is utilized to estimate p from (33) [48]. The SPSA allows the gradient to be estimated using only two measurements of the objective function corrupted by noise, and for decreasing step size the algorithm reaches a local stationary point with probability one. The SPSA algorithm used to compute p is provided below.

Step 1: Choose the initial probe p_o = [p_1, p_2, ..., p_T] ∈ R_+^{m×T}.
Step 2: For iterations q = 1, 2, 3, ...:
• Estimate the cost (i.e. probability of Type-II errors) in (33) using

    Ĵ_q(p_q) = (1/K) Σ_{k=1}^K I( F_M(Φ*(y_k)) ≤ 1 − α ),          (34)

where I(·) denotes the indicator function, and F_M(·) is an estimate of the cumulative distribution function of M constructed by generating random samples according to (32). In (34), Φ*(y_k) is computed using (31) with the noisy observations y_k = x(p_q) + w_k. Note that w_k is a fixed realization of w, and the dataset {p_q, x(p_q)} ∈ A defined below (33). The parameter K in (34) controls the accuracy of the empirical probability of Type-II errors (33).
• Compute the gradient estimate ∇̂_p Ĵ_q(p_q):

    ∇̂_p Ĵ_q(p_q) = ( Ĵ_q(p_q + ∆_q σ) − Ĵ_q(p_q − ∆_q σ) ) / (2σ∆_q),
    ∆_q(i) = −1 with probability 0.5, +1 with probability 0.5,          (35)

with gradient step size σ > 0.
• Update the probe vector with step size ϵ > 0: p_{q+1} = p_q − ϵ ∇̂_p Ĵ_q(p_q).

The benefit of using the SPSA algorithm is that the estimated gradient ∇̂_p Ĵ_q(p_q) in (35) can be computed using only two measurements of the function (34) per iteration; see [48] for convergence and a tutorial exposition of the SPSA algorithm. In particular, for constant step size ϵ, it converges weakly (in probability) to a local stationary point [47].

VI. EXAMPLES OF EQUILIBRIUM PLAY: ENERGY MARKET AND DETECTION OF MALICIOUS AGENTS

In this section we provide two examples of how the decision test (25), statistical detection test (30), and stochastic optimization algorithm (35) from Sec. V can be applied to detect for Nash rationality (Definition 5.1) in a social network. The first example uses real-world aggregate power consumption data from the Ontario energy market social network. The second is the detection of malicious agents in an online social network comprised of normal agents, malicious agents, and an authentication agent.

A. Nash Rationality in Ontario Electrical Energy Market

In this section we consider the aggregate power consumption of different zones in the Ontario power grid. A sampling period of T = 79 days starting from January 2014 is used to generate the dataset D for the analysis. All price and power consumption data is available from the Independent Electricity System Operator (IESO) website, http://ieso-public.sharepoint.com/. Each zone is considered as an agent in the corporate network illustrated in Fig. 5. The study of corporate social networks was pioneered by Granovetter [49], which shows that the social structure of the network can have important economic outcomes. Examples include agents' choice of alliance partners, assumption of rational behavior, self-interest behavior, and the learning of other agents' behavior. Here we test for rational behavior (i.e. utility maximization and Nash rationality) and, if true, then learn the associated behavior of the zones. This analysis provides useful information for constructing demand side management (DSM) strategies for controlling power consumption in the electricity market.

Fig. 5. Schematic of the electrical distribution network in the Ontario power grid. The nodes 1, 2, ..., 10 correspond to the distribution zones: Northwest, Northeast, Bruce, Southwest, Essa, Ottawa, West, Niagara, Toronto, and East. The dotted circles indicate zones with external interconnections; these nodes can import/export power to the external network which includes Manitoba, Quebec, Michigan, and New York. The network can be considered a corporate social network of chief financial officers.

The zones' power consumption is regulated by the associated price of electricity set by the senior management officer in each respective zone. Since there is a finite amount of power in the grid, each officer must communicate with other officers in the network to set the price of electricity. Here we utilize the aggregate power consumption from each of the n = 10 zones in the Ontario power grid and apply the non-parametric tests for utility maximization (20) and Nash rationality (25) to detect if the zones are demand responsive. If the utility maximization or Nash rationality tests are satisfied, then the power consumption behaviour is modelled by constructing the associated utility function (21) or concave potential function of the game (26).

To perform the analysis the external influence p_t and actions of agents x_t must be defined. In the Ontario power grid the wholesale price of electricity is dependent on several factors such as consumer behaviour, weather, and economic conditions. Therefore the external influence is defined as p_t = [p_t(1), p_t(2)], with p_t(1) the average electricity price between midnight and noon, and p_t(2) the average between noon and midnight, with t denoting the day. The actions of each zone correspond to the total aggregate power consumption in each respective time period associated with p_t(1) and p_t(2), and are given by x_t^i = [x_t^i(1), x_t^i(2)] with i ∈ {1, 2, ..., n}. The budget I_t^i of each zone has units of dollars, as p_t has units of $/kWh and x_t^i units of kWh.

We found that the aggregate consumption data of each zone does not satisfy utility maximization (20). Is this a result of measurement noise? Assuming the power consumption of agents is independent and identically distributed, the central limit theorem suggests that the aggregate consumption of regions follows a zero-mean normal distribution with variance σ². The noise term w in (32) is given by the normal distribution N(0, σ²). Therefore, to test if the failure is a result of noise, the statistical test (30) is applied for each region, and the noise level σ² estimated for the dataset D_obs to satisfy the γ = 95% confidence interval for utility maximization. The results are provided in Fig. 6.

Fig. 6. Average consumption (gray) and associated noise level σ (black) for the price and demand data to satisfy utility maximization in each of the 1, ..., 10 zones in the Ontario power grid defined in Fig. 5. The average hourly consumption is over the T = 79 days starting from January 2014.

As seen from Fig. 6, the Essa, West, Toronto, and East zones do not satisfy the utility maximization requirement. This results as the required noise level σ for the stochastic utility maximization test to pass is too high compared with the average power consumption. Therefore, if each zone is independently maximizing, then only 60% of the Ontario power grid satisfies the utility maximization test. However, it is likely that the zones are engaged in a concave potential game; this would not be a surprising result, as network congestion games have been shown to reduce peak power demand in distributed demand management schemes [50].

To test if the dataset D is consistent with Nash rationality, the detection test (25) is applied. The dataset for the power consumption in the Ontario power grid is consistent with Nash rationality. Using (25) and (26), a concave potential function for the game is constructed. Using the constructed potential function, when do agents prefer to consume power? The marginal rate of substitution (MRS) can be used to determine the preferred time for power usage. (The MRS is the amount of one good that an agent is willing to give up in exchange for another good while maintaining the same level of utility.) Formally, the MRS of x^i(1) for x^i(2) is given by

    MRS_12 = (∂V̂/∂x^i(1)) / (∂V̂/∂x^i(2)).

From the constructed potential function we find that MRS_12 > 1, suggesting that the agents prefer to use power in the time period associated with x_t(1); that is, the agents are willing to give up MRS_12 kWh of power in the time period associated with x^i(2) for 1 additional kWh of power in the time period associated with x^i(1).

The analysis in this section suggests that the power consumption behavior of agents is consistent with players engaged in a concave potential game. Using the Multi-agent Afriat's Theorem, the agents' preference for using power was estimated.

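The MRS computation used in the energy-market analysis can be illustrated numerically: for any smooth concave potential V̂, MRS_12 is a ratio of partial derivatives, which can be approximated by central differences. The potential below is a hypothetical stand-in, not the one estimated from the IESO data.

```python
import math

def mrs12(V, x, h=1e-6):
    """MRS of x(1) for x(2): (dV/dx1) / (dV/dx2) via central differences."""
    d1 = (V(x[0] + h, x[1]) - V(x[0] - h, x[1])) / (2 * h)
    d2 = (V(x[0], x[1] + h) - V(x[0], x[1] - h)) / (2 * h)
    return d1 / d2

# Hypothetical concave potential (log-utility stand-in).
V = lambda x1, x2: 2.0 * math.log(x1) + math.log(x2)
print(mrs12(V, (1.0, 1.0)))  # ~2: give up ~2 kWh of x(2) per extra kWh of x(1)
```

For this V the exact MRS is 2·x2/x1, so an MRS_12 > 1 at the observed consumption indicates a preference for the first time period, matching the interpretation in the text.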
This information can be used to improve the DSM strategies to control power consumption in the electricity market.

B. Detecting Malicious Agents in Online Social Networks

Socialbots and spambots are autonomous programs which attempt to imitate human behavior and are prevalent on popular social networking sites such as Facebook and Twitter. In this section we consider the detection of malicious agents in an online social network comprised of normal agents, malicious agents, and a network authentication agent, as depicted in Fig. 7.

Fig. 7. Schematic of an online social network with a network authentication agent that is able to create fictitious agents (black) to interact with real agents in the social network. Two types of agents are considered: normal agents (white), and malicious agents (grey). The goal is for the authentication agent to be able to detect and eliminate malicious agents from the online social network. The parameters p_t and actions x_t^i are defined in Sec. VI-B.

Recent techniques for detecting malicious agents in the social network (i.e. socialbots and spambots) use a method known as behavioural blacklisting which attempts to detect emails, tweets, friend and follower requests, and URLs which have originated from malicious agents [51]. Behavioural blacklisting works as socialbots and spambots tend to have different behaviors than humans. For example, in Twitter, socialbots and spambots tend to re-tweet far more than normal (i.e. human) agents, and by contrast normal accounts tend to receive more replies, mentions, and re-tweets [52]. The goal of malicious agents is to increase their connectivity in the social network to deliver harmful content such as viruses, gaining followers and friends, marketing, and political campaigning.

Consider the network topology depicted in Fig. 7. The authentication agent is designed to detect and eliminate malicious agents in the network. To this end the authentication agent is able to construct fictitious accounts to study the actions of other agents in the network. Denoting p_t ∈ R_+^m as the queries for authentication for the m fictitious accounts produced by the authentication node at time t, the response of agent i is given by x_t^i ∈ R_+^m and is the total number of successfully targeted followers and friends of each of the m fictitious accounts. Note that larger values of p_t indicate a stronger query for authentication. We consider the following utility function for malicious agents:

    r^i(x^i, x^{−i}) = ln( x^i(1) x^i(2) / Σ_{i=1}^n Σ_{j=1}^n x^i(1) x^j(2) )
    s^i(x^i; β) = ln( Π_{j=1}^l (1 + x^i(j)/β(j)) )          (36)
    u^i(x^i, x^{−i}) = r^i(x^i, x^{−i}) + s^i(x^i; β)

where x_t^{−i} ∈ R_+^{m×(n−1)} is the actions of the other (n − 1) agents, r represents the interdependence of the total targets, and s represents each agent's preference to avoid detection. The static inaccuracy (i.e. noise-to-signal ratio) of each query for authentication is contained in the elements of β ∈ R_+^m. The malicious social budget of each agent i is given by I_t^i. The total resources available to the authentication agent are limited such that in any operating period t the total resources available for queries for authentication is given by Σ_{j=1}^m p_t(j). Consider the case with m = 2 fictitious agents. As the authentication agent commits larger resources to increase the queries for authentication p_t(1), the associated number of friends and followers captured by the malicious agent x_t(1) decreases. Given that the total resources available to the authentication agent are limited, as p_t(1) increases, p_t(2) must decrease. This causes an increase in the friends and followers x_t(2) captured by the malicious agent for the m = 2 fictitious agent. Therefore the malicious social budget is considered to satisfy the linear relation I_t^i = p_t' x_t^i. Malicious agents are those that are engaged in a concave potential game and attempt to maximize their respective utility function (36); normal agents have no target preference and therefore select x_t^i in a uniform random fashion. At each observation t, a noisy measurement y_t^i (defined in (28)) is made of the actions x_t^i. Given the dataset D_obs (28), a statistical test can be used to detect if malicious agents are present.

The dataset D (22) for malicious agents is generated by computing the maximizer {x_t^i}_{i∈{1,2,...,n}} of the concave potential function V = Σ_{i=1}^n u^i(x^i, x^{−i}), with u^i(·) defined by (36), for a given probe p_t (refer to (27)). The parameter values for the numerical example are n = 3, m = 2, β = [0.03, 0.08], and γ = 0.05, where n, m, β are defined in Sec. VI-B. The malicious social budget for each of the n = 3 agents is generated from the normal distributions: I_t^1 ∼ N(20, 1), I_t^2 ∼ N(50, 1), and I_t^3 ∼ N(80, 4). The queries for authentication are generated from the uniform distribution p_t ∼ U(1, 5). For normal agents, D (22) is constructed from x_t^i obtained from the uniform random variable x_t^i ∼ U(1, 50). The datasets D_obs (28) are obtained using the clean dataset D and additive noise w^i ∼ U(0, κ), where κ represents the magnitude of the measurement error.

Fig. 8 plots the estimated cost (34) versus iterates generated by the SPSA algorithm (35) for σ = 0.1, ϵ = 0.2, κ = 0.1, and T = 20 observations. Fig. 8 illustrates that by judiciously adapting the external influence via a stochastic gradient algorithm, the probability of Type-II errors can be decreased to approximately 30%, allowing the statistical test (30) to adequately reject normal agents.

Fig. 9 plots the probability that a dataset D_obs (28) satisfies the decision test (25) and statistical test (30) for agents
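For concreteness, the malicious-agent utility (36) can be evaluated directly. The sketch below assumes m = 2 targets per fictitious account and the β from the numerical example; the action profiles X are hypothetical, chosen only to illustrate the computation.

```python
import math

def utility(i, X, beta):
    """u^i = r^i + s^i per (36). X[i] = [x_i(1), x_i(2)]; beta = [beta(1), beta(2)]."""
    num = X[i][0] * X[i][1]
    # Interdependence of the total targets: all cross products x_a(1) * x_b(2).
    den = sum(X[a][0] * X[b][1] for a in range(len(X)) for b in range(len(X)))
    r = math.log(num / den)
    # Preference to avoid detection, scaled by the noise-to-signal ratios beta.
    s = sum(math.log(1.0 + X[i][j] / beta[j]) for j in range(len(beta)))
    return r + s

beta = [0.03, 0.08]                           # from the numerical example, Sec. VI-B
X = [[10.0, 5.0], [20.0, 8.0], [30.0, 12.0]]  # hypothetical actions of n = 3 agents
print([round(utility(i, X, beta), 3) for i in range(3)])
```

Both terms are increasing in the agent's own targets, so an agent capturing more friends and followers obtains higher utility, which is the incentive the authentication agent's probes are designed to perturb.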

The locally optimized external influence p was obtained from the results of the SPSA algorithm above, allowing the malicious and normal agents to be distinguished. As seen, the occurrence of Type-I errors in the statistical test is less than 5%, as expected from Theorem 5.3 because γ = 5%.

Fig. 8. Performance of the SPSA algorithm (35) for computing the locally optimal external influence p to reduce the probability of Type-II errors of the statistical test (30); the empirical probability of Type-II errors is plotted versus the iteration count. The parameters are defined in Sec. VI-B.

Fig. 9. Performance of the decision test (25) and statistical test (30) for the detection of malicious agents and normal agents; the percentage of datasets that pass the maximization test is plotted versus the noise variance. The parameters are defined in Sec. VI-B.

VII. SUMMARY

The unifying theme of this paper was to study equilibrium play in non-cooperative games amongst agents in a social network. The first part focused on the distributed reinforcement learning aspect of an equilibrium notion, namely, correlated equilibrium. Agents with identical homophilic characteristics formed social groups wherein they shared past experiences over the network topology graph. A reinforcement learning algorithm was presented that, relying on diffusion cooperation strategies in adaptive networks, facilitated the learning dynamics. It was shown that, if all agents follow the proposed algorithm, the global behavior of the network of agents is attracted to the correlated equilibria set of the game. The second part focused on parsing datasets from a social network to detect play from the equilibrium of a concave potential game. A non-parametric decision test and a statistical test were constructed to detect equilibrium play; they required only the external influence and actions of agents. To reduce the probability of Type-II errors, a stochastic gradient algorithm was given to adapt the external influence in real time. Finally, we illustrated the application of the decision test, statistical test, and stochastic gradient algorithm in a real-world example using the energy market, and provided a numerical example to detect malicious agents in an online social network. An important property of both aspects considered in this paper is their ordinal nature, which provides a useful approximation to human behavior.

APPENDIX A
SKETCH OF THE PROOF OF THEOREM 4.1

The convergence analysis is based on [53] and is organized into three steps. For brevity and better readability, details of each step of the proof are omitted; however, adequate references are provided for the interested reader.

Step 1: The first step uses weak convergence methods to characterize the limit individual behavior of agents following Algorithm 1 as a dynamical system represented by a differential inclusion. Differential inclusions are generalizations of ODEs [54]. Below, we provide a precise definition.

Definition A.1: A differential inclusion is a dynamical system of the form

    dX/dt ∈ F(X),    (37)

where X ∈ R^r and F : R^r → R^r is a Marchaud map [54]. That is: i) the graph and domain of F are nonempty and closed; ii) the values F(X) are convex; and iii) the growth of F is linear, i.e., there exists C > 0 such that, for every X ∈ R^r,

    sup_{Y ∈ F(X)} ||Y|| ≤ C (1 + ||X||),    (38)

where ||·|| denotes any norm on R^r.

We proceed to study the properties of the sequence {a_n^k} of decisions made according to Algorithm 1, which forms a finite-state Markov chain—see Sec. III-D. Standard results on Markov chains show that the transition matrix (9) admits (at least) one invariant measure, denoted by σ^k. The following lemma characterizes the properties of such an invariant measure.

Lemma A.1: The invariant measure σ^k(R^k) of the transition probabilities (9) takes the form

    σ^k(R^k) = (1 − δ) ψ^k(R^k) + (δ/A^k) · 1_{A^k},    (39)

where ψ^k(R^k) satisfies

    Σ_{j≠i} ψ_j^k |r^k(j, i)|^+ = ψ_i^k Σ_{j≠i} |r^k(i, j)|^+,    (40)

and |x|^+ = max{x, 0}.

In light of the diffusion protocol (6), agents' successive decisions affect not only their own future decision strategies, but also their neighbors' policies. This suggests looking at the dynamics of the regret for the entire network: R_n := col(R_n^1, ..., R_n^K). Using techniques from the theory of stochastic approximation [41], we work with the piecewise-constant continuous-time interpolation of R_n, defined by

    R^ε(t) = R_n for t ∈ [nε, (n + 1)ε),    (41)

to derive the limiting process associated with R_n. Let ∆_{A^{−k}} represent the simplex of all probability distributions over the joint action profiles of all agents excluding agent k, and let ⊗ denote the Kronecker product.
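The balance condition (40) identifies ψ^k with the stationary distribution of a Markov chain whose off-diagonal transition probabilities are proportional to the positive parts |r^k(i, j)|^+ of the regrets. This can be checked numerically; a minimal sketch, in which the 3×3 regret matrix and the normalizing constant mu are illustrative and not from the paper:

```python
import numpy as np

def stationary_psi(R, mu=None):
    """Sketch: the distribution psi solving the balance condition (40), computed
    as the stationary distribution of the chain whose off-diagonal transition
    probabilities are |r(i, j)|^+ / mu.  mu is any constant large enough to
    keep the rows sub-stochastic; psi does not depend on its value."""
    Rp = np.maximum(R, 0.0)            # positive parts |r(i, j)|^+
    np.fill_diagonal(Rp, 0.0)
    if mu is None:
        mu = Rp.sum(axis=1).max() + 1.0
    P = Rp / mu
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    # Solve psi P = psi with sum(psi) = 1: left Perron eigenvector of P
    w, v = np.linalg.eig(P.T)
    psi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return np.abs(psi) / np.abs(psi).sum()

# Illustrative 3x3 regret matrix (not from the paper)
R = np.array([[0.0, 0.5, 0.1],
              [0.2, 0.0, 0.4],
              [0.3, 0.0, 0.0]])
psi = stationary_psi(R)
```

Because the common normalization mu cancels on both sides of (40), any admissible choice yields the same ψ; only the relative sizes of the positive regrets matter.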

Theorem A.1: Consider the interpolated process R^ε(·) defined in (41). Then, as ε → 0, R^ε(·) converges weakly^8 to R(·), which is a solution of the system of interconnected differential inclusions

    dR/dt ∈ −H(R) + (C − I)R,    (42)

where C = C ⊗ I_A (see (7)), A denotes the cardinality of the agents' action set, and

    h_ij(R) = { [u^κ(ι, p^{−κ}) − u^κ(j, p^{−κ})] ψ_ι^κ ; p^{−κ} ∈ ∆_{A^{−κ}} },
    ι = i mod A,  κ = ⌊i/A⌋ + 1.

Further, ψ^k represents the stationary distribution characterized in Lemma A.1.

Proof: The proof relies on stochastic averaging theory and is omitted for brevity. The interested reader is referred to [14, Appendix B] for a detailed proof.

^8 Let Z_n and Z be R^r-valued random vectors. Z_n converges weakly to Z, denoted by Z_n ⇒ Z, if for any bounded and continuous function ψ(·), Eψ(Z_n) → Eψ(Z) as n → ∞.

Step 2: Next, we examine stability of the limit system (42) and show that its set of global attractors comprises an ε-neighborhood of the negative orthant. With a slight abuse of notation, we rearrange the elements of the global regret matrix R as a vector, but still denote it by R.

Theorem A.2: Consider the limit dynamical system (42). Let

    R^−_ε = { x ∈ R^{K(A)²} ; |x|^+ ≤ ε·1 },    (43)

and let R(0) = R_0. Then, for each ε ≥ 0, there exists δ̂(ε) ≥ 0 such that, if δ ≤ δ̂(ε) in (39) (or, equivalently, in the decision strategy (9)), the set R^−_ε is globally asymptotically stable for the limit system (42). That is,

    lim_{t→∞} dist[R(t), R^−_ε] = 0,    (44)

where dist[·, ·] denotes the usual distance function.

Proof: The proof uses Lyapunov stability theory and is omitted for brevity.

Subsequently, we study asymptotic stability by looking at the case where ε → 0, n → ∞, and εn → ∞. Nevertheless, instead of considering a two-stage limit by first letting ε → 0 and then t → ∞, we study R^ε(t + t_ε) and require t_ε → ∞ as ε → 0. The following corollary asserts that the results of Theorem A.2 also hold for the interpolated processes.

Corollary A.1: Denote by {t_ε} any sequence of real numbers satisfying t_ε → ∞ as ε → 0. Suppose {R_n : ε > 0, n < ∞} is tight or bounded in probability. Then, for each ε ≥ 0, there exists δ̂(ε) ≥ 0 such that, if 0 < δ ≤ δ̂(ε) in (9), R^ε(· + t_ε) → R^−_ε in probability, where R^−_ε is defined in (43).

The above corollary completes the proof of the first result in Theorem 4.1.

Step 3: In the final step, we show that the convergence of the regrets of individual agents to an ε-neighborhood of the negative orthant provides the necessary and sufficient condition for convergence of their global behavior to the correlated ε-equilibria set. This is summarized in the following theorem.

Theorem A.3: Recall the interpolated processes for the global regret matrix, R^ε(·), defined in (41), and for the agents' collective behavior, z^ε(·), defined in (15). Then, z^ε(·) converges in probability to the set of correlated ε-equilibria if and only if R^ε(·) → R^−_ε in probability, where R^−_ε is defined in (43).

Proof: The proof relies on how the regrets are defined. The interested reader is referred to [17, Section IV-D] for a somewhat similar proof.

The above theorem, together with Corollary A.1, completes the proof of the second result in Theorem 4.1.

APPENDIX B
PROOF OF THEOREM 5.3

Consider a dataset D (22) that satisfies Nash rationality (25). Given D, the inequalities (25) have a feasible solution. Denote the solution parameters of (25), given D, by {λ_t^{io} > 0, V_t^o}. Substituting x_t^i = y_t^i − w_t^i, from (28), into the inequalities obtained from the solution of (25) given D, we obtain the inequalities

    V_τ^o − V_t^o − Σ_{i=1}^n λ_t^{io} p_t' (y_τ^i − y_t^i) ≤ Σ_{i=1}^n λ_t^{io} p_t' (w_t^i − w_τ^i).    (45)

The goal is to compute an upper bound on the r.h.s. of (45) that is independent of λ_t^{io}. Notice that the following inequalities provide such an upper bound:

    Σ_{i=1}^n λ_t^{io} p_t' (w_t^i − w_τ^i) ≤ Σ_{i=1}^n λ_t^{io} |p_t' (w_t^i − w_τ^i)|
        ≤ ( Σ_{i=1}^n λ_t^{io} ) ( Σ_{i=1}^n |p_t' (w_t^i − w_τ^i)| )
        ≤ Λ_t M,    (46)

with Λ_t = Σ_{i=1}^n λ_t^{io} and M defined by (32). Substituting (46) into (45), the following inequalities are obtained:

    (1/Λ_t) ( V_τ^o − V_t^o − Σ_{i=1}^n λ_t^{io} p_t' (y_τ^i − y_t^i) ) ≤ M.    (47)

A solution of (31) given D_obs, defined by (28), is denoted by {Φ*{y}, λ_t^{i*}, V_t^*}. By comparing the inequalities obtained from the solution of (31) given D_obs with the inequalities (47), notice that {Φ*{y} = M, λ_t^{i*} = λ_t^{io}, V_t^* = V_t^o} is a feasible, but not necessarily optimal, solution of (31) given D_obs. Therefore, for D satisfying Nash rationality, it must be the case that Φ*{y} ≤ M.
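The feasibility argument above rests on checking linear (Afriat-style) inequalities in the unknown multipliers and values. The following sketch poses the single-agent version of such a check as a linear program; the inequality form u_τ − u_t − λ_t p_t'(x_τ − x_t) ≤ 0 and the demo data are illustrative stand-ins for the multi-agent condition (25).

```python
import numpy as np
from scipy.optimize import linprog

def afriat_feasible(P, X):
    """Sketch: do utility levels u_t and multipliers lam_t >= 1 exist with
    u_tau - u_t - lam_t * p_t.(x_tau - x_t) <= 0 for all t, tau?
    (Single-agent stand-in for the multi-agent condition (25).)"""
    T = len(P)
    A_ub, b_ub = [], []
    for t in range(T):
        for tau in range(T):
            if tau == t:
                continue
            row = np.zeros(2 * T)                 # variables [u_1..u_T, lam_1..lam_T]
            row[tau], row[t] = 1.0, -1.0          # u_tau - u_t
            row[T + t] = -P[t] @ (X[tau] - X[t])  # -lam_t * p_t.(x_tau - x_t)
            A_ub.append(row)
            b_ub.append(0.0)
    bounds = [(None, None)] * T + [(1.0, None)] * T   # lam_t >= 1
    res = linprog(np.zeros(2 * T), A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds)
    return res.status == 0                        # status 0: a feasible point was found

# Data from maximizing the concave utility log(x1) + log(x2) under the
# budget p.x = 10 (maximizer x_j = 10 / (2 p_j)) should be accepted:
P_good = np.array([[1.0, 2.0], [2.0, 1.0], [1.0, 1.0]])
X_good = 10.0 / (2.0 * P_good)

# A strict revealed-preference cycle should be rejected:
P_bad = np.array([[3.0, 1.0], [1.0, 3.0]])
X_bad = np.array([[3.0, 1.0], [1.0, 3.0]])
```

Data generated by budget-constrained maximization of a concave utility always admits such a solution (Afriat's theorem [21]), while a revealed-preference cycle forces contradictory signs on u_1 − u_2 and is rejected.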
This asserts that, under the null hypothesis H_0, Φ*{y} is upper bounded by M. For a given Φ*{y}, the integral in (30) is the probability that Φ*{y} ≤ M; therefore, the conditional probability of rejecting H_0 when it is true is less than γ. ∎

REFERENCES

[1] D. Fudenberg and D. K. Levine, "Learning and equilibrium," Annual Review of Economics, vol. 1, pp. 385–420, 2009.
[2] C. Shalizi and A. Thomas, "Homophily and contagion are generically confounded in observational social network studies," Sociological Methods & Research, vol. 40, no. 2, pp. 211–239, 2011.
[3] S. Aral, L. Muchnik, and A. Sundararajan, "Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks," Proceedings of the National Academy of Sciences, vol. 106, no. 51, pp. 21544–21549, 2009.

[4] R. J. Aumann, "Correlated equilibrium as an expression of Bayesian rationality," Econometrica, vol. 55, no. 1, pp. 1–18, 1987.
[5] A. H. Sayed, "Adaptive networks," Proc. IEEE, vol. 102, no. 4, pp. 460–497, 2014.
[6] ——, "Adaptation, learning, and optimization over networks," Foundations and Trends in Machine Learning, vol. 7, no. 4–5, pp. 311–801, 2014.
[7] S. Hart and A. Mas-Colell, "A simple adaptive procedure leading to correlated equilibrium," Econometrica, vol. 68, no. 5, pp. 1127–1150, 2000.
[8] ——, "A reinforcement procedure leading to correlated equilibrium," in Economics Essays: A Festschrift for Werner Hildenbrand, pp. 181–200, 2001.
[9] S. Hart, A. Mas-Colell, and Y. Babichenko, Simple Adaptive Strategies: From Regret-Matching to Uncoupled Dynamics. World Scientific Publishing, 2013.
[10] R. Deb, "A testable model of consumption with externalities," J. Econ. Theory, vol. 144, no. 4, pp. 1804–1816, 2009.
[11] M. O. Jackson and A. Wolinsky, "A strategic model of social and economic networks," J. Econ. Theory, vol. 71, no. 1, pp. 44–74, 1996.
[12] C. Griffin and A. Squicciarini, "Toward a game theoretic model of information release in social media with experimental results," in Proc. of the IEEE Symp. on Security and Privacy Workshops, San Francisco, CA, 2012, pp. 113–116.
[13] M. Kearns, "Graphical games," in Algorithmic Game Theory, N. Nisan, T. Roughgarden, E. Tardos, and V. V. Vazirani, Eds. Cambridge Univ. Press, 2007, vol. 3, pp. 159–180.
[14] V. Krishnamurthy, O. N. Gharehshiran, and M. Hamdi, "Interactive sensing and decision making in social networks," Foundations and Trends in Signal Processing, vol. 7, no. 1–2, pp. 1–196, 2014.
[15] E. Cartwright and M. Wooders, "Correlated equilibrium, conformity and stereotyping in social groups," Journal of Public Economic Theory, vol. 16, no. 5, pp. 743–766, 2014.
[16] J. Xu and M. van der Schaar, "Social norm design for information exchange systems with limited observations," IEEE J. Sel. Areas Commun., vol. 30, no. 11, pp. 2126–2135, Dec. 2012.
[17] O. N. Gharehshiran, V. Krishnamurthy, and G. Yin, "Distributed tracking of correlated equilibria in regime switching noncooperative games," IEEE Trans. Autom. Control, vol. 58, no. 10, pp. 2435–2450, 2013.
[18] D. P. Bertsekas, "A new class of incremental gradient methods for least squares problems," SIAM J. Optim., vol. 7, no. 4, pp. 913–926, 1997.
[19] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Singapore: Athena Scientific, 1997.
[20] S.-Y. Tu and A. H. Sayed, "Diffusion strategies outperform consensus strategies for distributed estimation over adaptive networks," IEEE Trans. Signal Process., vol. 60, no. 12, pp. 6217–6234, 2012.
[21] S. Afriat, "The construction of utility functions from expenditure data," Int. Econ. Rev., vol. 8, no. 1, pp. 67–77, 1967.
[22] H. Varian, "Revealed preference and its applications," The Economic Journal, vol. 122, no. 560, pp. 332–338, 2012.
[23] ——, "Non-parametric tests of consumer behaviour," Rev. Econ. Stud., vol. 50, no. 1, pp. 99–110, 1983.
[24] D. Monderer and L. Shapley, "Potential games," Games Econ. Behav., vol. 14, no. 1, pp. 124–143, 1996.
[25] R. Rosenthal, "A class of games possessing pure-strategy Nash equilibria," Int. J. Game Theory, vol. 2, no. 1, pp. 65–67, 1973.
[26] A. Chapman, G. Verbic, and D. Hill, "A healthy dose of reality for game-theoretic approaches to residential demand response," in Proc. of the 2013 IREP Symposium – Bulk Power System Dynamics and Control-IX Optimization, Security and Control of the Emerging Power Grid. IEEE, 2013, pp. 1–13.
[27] V. Bala and S. Goyal, "A noncooperative model of network formation," Econometrica, vol. 68, no. 5, pp. 1181–1229, 2000.
[28] K. Zhu and J. P. Weyant, "Strategic decisions of new technology adoption under asymmetric information: A game-theoretic model," Decision Sciences, vol. 34, no. 4, pp. 643–675, 2003.
[29] E. Gudes, N. Gal-Oz, and A. Grubshtein, "Methods for computing trust and reputation while preserving privacy," in Data and Applications Security XXIII, ser. Lecture Notes in Computer Science, E. Gudes and J. Vaidya, Eds., 2009, vol. 5645, pp. 291–298.
[30] L. Mui, "Computational models of trust and reputation: Agents, evolutionary games, and social networks," Ph.D. dissertation, MIT, 2002.
[31] Y. Zhang and M. van der Schaar, "Strategic networks: Information dissemination and link formation among self-interested agents," IEEE J. Sel. Areas Commun., vol. 31, no. 6, pp. 1115–1123, 2013.
[32] P. Golle, K. Leyton-Brown, I. Mironov, and M. Lillibridge, "Incentives for sharing in peer-to-peer networks," in Electronic Commerce, ser. Lecture Notes in Computer Science, L. Fiege, G. Mühl, and U. Wilhelm, Eds., 2001, vol. 2232, pp. 75–87.
[33] E. A. Jorswieck, E. G. Larsson, M. Luise, H. V. Poor, and A. Leshem, "Game theory and the frequency selective interference channel," IEEE J. Sel. Topics Signal Process., vol. 6, no. 2, pp. 73–75, 2012.
[34] S. Ba, A. B. Whinston, and H. Zhang, "The dynamics of the electronic market: An evolutionary game approach," Information Systems Frontiers, vol. 2, no. 1, pp. 31–40, 2000.
[35] M. Pelillo, "What is a cluster? Perspectives from game theory," in Proc. of the NIPS Workshop on Clustering Theory, 2009.
[36] B. Wang, K. J. R. Liu, and T. C. Clancy, "Evolutionary game framework for behavior dynamics in cooperative spectrum sensing," in Proc. of IEEE GLOBECOM, New Orleans, LA, 2008, pp. 1–5.
[37] M. Slikker and A. van den Nouweland, "Network formation models with costs for establishing links," Review of Economic Design, vol. 5, no. 3, pp. 333–362, 2000.
[38] C. G. Lopes and A. H. Sayed, "Diffusion least-mean squares over adaptive networks: Formulation and performance analysis," IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3122–3136, 2008.
[39] O. N. Gharehshiran, V. Krishnamurthy, and G. Yin, "Distributed energy-aware diffusion least mean squares: Game-theoretic learning," IEEE J. Sel. Topics Signal Process., vol. 7, no. 5, pp. 821–836, 2013.
[40] A. Benveniste, M. Métivier, and P. Priouret, Adaptive Algorithms and Stochastic Approximations. New York, NY: Springer-Verlag, 1990.
[41] H. J. Kushner and G. Yin, Stochastic Approximation and Recursive Algorithms and Applications, 2nd ed., ser. Stochastic Modeling and Applied Probability. New York, NY: Springer-Verlag, 2003.
[42] W. Diewert, "Afriat's theorem and some extensions to choice under uncertainty," The Economic Journal, vol. 122, no. 560, pp. 305–331, 2012.
[43] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, UK: Cambridge Univ. Press, 2004.
[44] P. Maillé and B. Tuffin, Telecommunication Network Economics: From Theory to Applications. Cambridge, UK: Cambridge Univ. Press, 2014.
[45] N. Alon, M. Feldman, A. Procaccia, and M. Tennenholtz, "A note on competitive diffusion through social networks," Information Processing Letters, vol. 110, no. 6, pp. 221–225, 2010.
[46] W. Hoiles and V. Krishnamurthy, "Nonparametric demand forecasting and detection of demand-responsive consumers," IEEE Trans. Smart Grid.
[47] V. Krishnamurthy and W. Hoiles, "Afriat's test for detecting malicious agents," IEEE Signal Process. Lett., vol. 19, no. 12, pp. 801–804, 2012.
[48] J. C. Spall, Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. Wiley, 2003.
[49] M. Granovetter, "The impact of social structure on economic outcomes," J. Econ. Perspect., vol. 19, no. 1, pp. 33–50, 2005.
[50] C. Ibars, M. Navarro, and L. Giupponi, "Distributed demand management in smart grid with a congestion game," in Proc. of the 1st IEEE Intl. Conf. on Smart Grid Communications, Gaithersburg, MD, 2010, pp. 495–500.
[51] A. Ramachandran, N. Feamster, and S. Vempala, "Filtering spam with behavioral blacklisting," in Proc. of the 14th ACM Conf. on Computer and Communications Security. ACM, 2007, pp. 342–351.
[52] E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini, "The rise of social bots," arXiv preprint, 2014.
[53] M. Benaïm, J. Hofbauer, and S. Sorin, "Stochastic approximations and differential inclusions, part II: Applications," Math. Oper. Res., vol. 31, no. 4, pp. 673–695, 2006.
[54] J. P. Aubin and A. Cellina, Differential Inclusions: Set-Valued Maps and Viability Theory. New York, NY: Springer-Verlag, 1984.