SWAPNIL DHAMAL ET AL. A STOCHASTIC GAME FRAMEWORK FOR ANALYZING COMPUTATIONAL INVESTMENT STRATEGIES IN DISTRIBUTED COMPUTING 1

A Stochastic Game Framework for Analyzing Computational Investment Strategies in Distributed Computing

Swapnil Dhamal, Walid Ben-Ameur, Tijani Chahed, Eitan Altman, Albert Sunny, and Sudheer Poojary

Abstract—We study a stochastic game framework with dynamic set of players, for modeling and analyzing their computational investment strategies in distributed computing. Players obtain a certain reward for solving the problem or for providing their computational resources, while incur a certain cost based on the invested time and computational power. We first study a scenario where the reward is offered for solving the problem, such as in blockchain mining. We show that, in Markov perfect equilibrium, players with cost parameters exceeding a certain threshold, do not invest; while those with cost parameters less than this threshold, invest maximal power. Here, players need not know the system state. We then consider a scenario where the reward is offered for contributing to the computational power of a common central entity, such as in volunteer computing. Here, in Markov perfect equilibrium, only players with cost parameters in a relatively low range in a given state, invest. For the case where players are homogeneous, they invest proportionally to the ‘reward to cost’ ratio. For both the scenarios, we study the effects of players’ arrival and departure rates on their utilities using simulations and provide additional insights. !

1 INTRODUCTION

ISTRIBUTED computing systems comprise computers which inputs from a very large search space. A miner is rewarded for D coordinate to solve large problems. In a classical sense, a mining a block, if it finds one of the rare inputs that generates a distributed computing system could be viewed as several providers hash value satisfying certain constraints, before the other miners. of computational power contributing to the power of a common Given the cryptographic hash function, the best known method for central entity (e.g. in volunteer computing [1], [2]). The central finding such an input is randomized search. Since the proof-of- entity could, in turn, use the combined power for either fulfilling work procedure is computationally intensive, successful mining its own computational needs or distribute it to the next level of requires a miner to invest significant computational power, result- requesters of power (e.g. by a computing service provider to ing in the miner incurring some cost. Once a block is mined, it is its customers in a utility computing model). The center would transmitted to all the miners. A miner’s objective is to maximize decide the time for which the system is to be run, and hence its utility based on the offered reward for mining a block before the compensation or reward to be given out per unit time to others, by strategizing on the amount of power to invest. There is the providers. This compensation or reward would be distributed a natural tradeoff: a higher investment increases a miner’s chance among the providers based on their respective contributions. A of solving the problem before others, while a lower investment provider incurs a certain cost per unit time for investing a certain reduces its incurred cost. amount of power. So, in the most natural setting where the reward In this paper, we study the stochastic game where players per unit time is distributed to the providers in proportion to their (miners or providers of computational power) can arrive and depart contributed power, a higher power investment by a provider is during the mining of a block or during a run of volunteer comput- likely to fetch it a higher reward while also increasing its incurred ing. We consider two of the most common scenarios in distributed cost, thus resulting in a tradeoff. computing, namely, (1) in which the reward is offered for solving Distributed computing has gained more popularity than ever the problem (such as in blockchain mining) and (2) in which the owing to the advent of blockchain. Blockchain has found applica- reward is offered for contributing to the computational power of a

arXiv:1809.03143v3 [cs.GT] 16 Nov 2019 tion in various fields [3], such as cryptocurrencies, smart contracts, common central entity (such as in volunteer computing). security services, public services, Internet of Things, etc. Its 1.1 Preliminaries functioning relies on a proof-of-work procedure, where miners (providers of computational power) collect block data consisting Stochastic Game. [4] It is a dynamic game with probabilistic tran- of a number of transactions, and repeatedly compute hashes on sitions across different system states. Players’ payoffs and state transitions depend on the current state and players’ strategies. The • Contact author: Swapnil Dhamal ([email protected]) game continues until it reaches a terminal state, if any. Stochastic • Swapnil Dhamal is a postdoctoral researcher with Chalmers University of games are thus a generalization of both Markov decision processes Technology, Sweden. A part of this work was done when he was a post- and repeated games. doctoral researcher with INRIA Sophia Antipolis-Mediterran´ ee,´ France and Tel´ ecom´ SudParis, France. Walid Ben-Ameur and Tijani Chahed Markov Perfect Equilibrium (MPE). MPE [5] is an adaptation are professors with Tel´ ecom´ SudParis, France. Eitan Altman is a senior of perfect to stochastic games. An research scientist with INRIA Sophia Antipolis-Mediterran´ ee,´ France. MPE of a player is a policy function describing its strategy Albert Sunny is an assistant professor with Indian Institute of Technology, for each state, while ignoring history. Each player computes its Palakkad, India. A part of this work was done when he was a postdoctoral researcher with INRIA Sophia Antipolis-Mediterran´ ee,´ France. Sudheer strategy in each state by foreseeing the effects of Poojary is a senior lead engineer with Qualcomm India Pvt. Ltd. A its actions on the state transitions and the resulting utilities, and part of this work was done when he was a postdoctoral researcher with the strategies of other players. A player’s MPE policy is a best Laboratoire Informatique d’Avignon, Universite´ d’Avignon, France. response to the other players’ MPE policies. SWAPNIL DHAMAL ET AL. A STOCHASTIC GAME FRAMEWORK FOR ANALYZING COMPUTATIONAL INVESTMENT STRATEGIES IN DISTRIBUTED COMPUTING 2 It is worth noting that, while game theoretic solution concepts computing considering the set of players to be dynamic. We such as MPE, Nash equilibrium, etc. may seem impractical owing consider the most general case of heterogeneous players; the cases to the assumption, they provide a strategy of homogeneous as well as multi-type players (which also have not profile which can be suggested to players (e.g. by a mediator) been studied in the literature) are special cases of this study. from which no player would unilaterally deviate. Alternatively, if players play the game repeatedly while observing each other’s 2 OUR MODEL actions, they would likely settle at such a strategy profile. Consider a distributed computing system wherein agents provide 1.2 Related Work their computational power to the system, and receive a certain reward for successfully solving a problem or for providing their Stochastic games have been studied from theoretical perspec- computational resources. We first model the scenario where the tive [6], [7], [8], [9], [10] as well as in applications such as reward is offered for solving the problem, such as in blockchain computer networks [11], cognitive radio networks [12], wireless mining, and explain it in detail. We then model the scenario where network virtualization [13], queuing systems [14], multiagent the reward is offered for contributing to the computational power reinforcement learning [15], and complex living systems [16]. of a common central entity, such as in volunteer computing. We We enlist some of the important works on stochastic games. hence point out the similarities and differences between the utility Altman and Shimkin [17] consider a processor-sharing system, functions of the players in the two scenarios. where an arriving customer observes the current load on the shared system and chooses whether to join it or use a constant- 2.1 Scenario 1: Model cost alternative. Nahir et al. [18] study a similar setup, with the difference that customers consider using the system over a We present our model for blockchain mining, one of the most in- long time scale and for multiple jobs. Hassin and Haviv [19] demand contemporary applications of the scenario where reward propose a version of subgame perfect Nash equilibrium for games is offered for solving the problem. We conclude this subsection by where players are identical; each player selects strategy based showing that the utility function thus obtained, generalizes to other on its private information regarding the system state. Wang and distributed computing applications belonging to this scenario. Zhang [20] investigate Nash equilibrium in a queuing system, Let r be the reward offered to a miner for successfully solving where reentering the system is a strategic decision. Hu and Well- a problem, that is, for finding a solution before all the other miners. man [21] use the framework of general-sum stochastic games to Players. We consider that there are broadly two types of players extend Q-learning to a noncooperative multiagent context. There (miners) in the system, namely, (a) strategic players who can exist works which develop algorithms for computing good, not arrive and depart while a problem is being solved (e.g., during necessarily optimal, strategies in a state-learning setting [22], [23]. the mining of a block) and can modulate the invested power based Distributed systems have been studied from game theoretic on the system state so as to maximize their expected reward and perspective in the literature [24], [25]. Wei et al. [26] study a (b) fixed players who are constantly present in the system and resource allocation game in a cloud-based network, with con- invest a constant amount of power for large time durations (such straints on quality of service. Chun et al. [27] analyze the selfish as typical large mining firms). In blockchain mining, for instance, caching game, where selfish server nodes incur either cost for the universal set of players during the mining of a block consists of replicating resources or cost for access to a remote replica. Grosu all those who are registered as miners at the time. In particular, we and Chronopoulos [28] propose a game theoretic framework for denote by U, the set of strategic players during the mining of the obtaining a user-optimal load balancing scheme in heterogeneous block under consideration. We denote by `, the constant amount distributed systems. of power invested by the fixed players throughout the mining of Zheng and Xie [3] present a survey on blockchain. Sapirshtein the block under consideration. We consider ` > 0 (which is true et al. [29] study selfish mining attacks, where a miner postpones in actual mining owing to mining firms); so the mining does not transmission of its mined blocks so as to prevent other miners from stall even if the set of strategic players is empty. Since the fixed starting the mining of the next block immediately. Lewenberg et al. players are constantly present in the system and invest a constant [30] study pooled mining, where miners form coalitions and share amount of power, we denote them as a single aggregate player k, the obtained rewards, so as to reduce the variance of the reward who invests a constant power of ` irrespective of the system state. received by each player. Xiong et al. [31] consider that miners Since it may not be feasible for a player to manually modulate can offload the mining process to an edge computing service its invested power as and when the system changes its state, we provider. They study a Stackelberg game where the provider sets consider that the power to be invested is modulated by a pre- price for its services, and the miners determine the amount of configured automated software running on the player’s machine. services to request. Altman et al. [32] model the competition over The player can strategically determine the policy, that is, how several blockchains as a non-cooperative game, and hence show much to invest if the system is in a given state. the existence of pure Nash equilibria using a congestion game We denote by cost parameter ci, the cost incurred by player i approach. Kiayias et al. [33] consider a stochastic game, where for investing unit amount of power for unit time. We consider each state corresponds to the mined blocks and the players who that players are not constrained by the cost they could incur. mined them; players strategize on which blocks to mine and when Instead, they aim to maximize their expected utilities (the expected to transmit them. reward they would obtain minus the expected cost they would In general, there exist game theoretic studies for distributed incur henceforth), while forgetting the cost they have incurred thus systems, as well as stochastic games for applications including far. That is, players are Markovian. In our work, we assume that blockchain mining (where a state, however, signifies the state of the cost parameters of all the players are common knowledge. This the chain of blocks). To the best of our knowledge, this work could be integrated in a blockchain mining or volunteer computing is the first to study a stochastic game framework for distributed interface where players can declare their cost parameters. This SWAPNIL DHAMAL ET AL. A STOCHASTIC GAME FRAMEWORK FOR ANALYZING COMPUTATIONAL INVESTMENT STRATEGIES IN DISTRIBUTED COMPUTING 3

information is then made available to the interfaces of all other TABLE 1 players (that is, to the automated software running on the players’ Notation machines). In real world, it may not be practical to make the r reward parameter players’ cost parameters a common knowledge, and furthermore, c cost incurred by player i when it invests unit power for unit time players may not reveal them truthfully. To account for such i limitations, a mean field approach could be used by assuming λi arrival rate corresponding to player i homogeneous or multi-type players (which are special cases of µi departure rate corresponding to player i our analysis). Furthermore, it is an interesting future direction to U universal set of strategic players design incentives for the players to reveal their true costs. ` constant amount of power invested by the fixed players k aggregate player accounting for all the fixed players Arrival and Departure of Players. For modeling the arrivals and S set of strategic players currently present in the system departures of players, we consider a standard queueing setting. (S) A player j, who is not in the system, arrives after time which xi strategy of player i in state S (S) is exponentially distributed with mean 1/λj (that is, the rate x strategy profile of players in state S parameter is λj); this is in line with the Poission arrival process x policy profile (S) where the time for the first arrival is exponentially distributed Γ(S,x ) rate of problem getting solved in state S under strategy profile x(S) with the rate parameter corresponding to the Poisson arrival. R(S,x) expected utility of i computed in state S under policy profile x Further, the departure time of a player j, who is in the system, i is exponentially distributed with rate parameter µj. The stochastic arrival of players is natural, like in most applications. Further, Rate of Problem Getting Solved. As explained earlier, the time players would usually shut down their computers on a regular required to find a solution in a large search space is independent basis, or terminate the computationally demanding mining task of the search space explored thus far. We consider this time to be (by closing the automated software) so as to run other critical exponentially distributed to model its memoryless property (∵ if tasks. Note that since players are Markovian, they do not account a continuous random variable has the memoryless property over for how much computation they have invested thus far for mining the set of reals, it is necessarily exponentially distributed). Let (S) the current block. Also, as we shall later see, the computation Γ(S,x ) be the corresponding rate of problem getting solved in itself is memoryless, that is, the time required to find the solution state S, when players’ strategy profile is x(S). Since the time does not depend on the time invested thus far. Owing to these two required is independent of the search space explored thus far, the reasons, the players do not monitor block mining progress, and probability that a player finds a solution before others at time τ is hence depart stochastically. proportional to its invested power at time τ. State Space. Due to the arrivals and departures of strategic Note that the time required for the problem to get solved is players, we could view this as a continuous time multi-state the minimum of the times required by the players to solve the process, where a state corresponds to the set of strategic players problem. Now, the minimum of exponentially distributed random present in the system. So, if the set of strategic players in the variables, is another exponentially distributed random variable system is S (which excludes the fixed players), we say that the with rate which is the sum of the rates corresponding to the system is in state S. So, we have S ⊆ U or equivalently, S ∈ 2U . original random variables. Furthermore, the probability of an In addition, we have |U| + 1 absorbing states corresponding to original random variable being the minimum, is proportional to its (S,x(S)) the problem being solved by the respective player (one of the rate. Let Pj be the rate (corresponding to an exponentially strategic players in U or a fixed player). The players involved at distributed random variable) of player j solving the problem in (S) any given time would influence each others’ utilities, thus resulting state S, when the strategy profile is x(S). So, we have Γ(S,x ) = in a game. The stochastic arrival and departure of players makes it P (S,x(S)) j∈S∪{k} Pj . Since the probability that player i solves the a stochastic game. As we will see, there are also other stochastic problem before the other players is proportional to its invested events in addition to the arrivals and departures, and which depend computational power at that time, we have that the rate of player (S) (S) (S) on the players’ strategies. (S,x ) xi (S,x ) i solving the problem is Pi = P (S) Γ , and j∈S xj +` Players’ Strategies. Let τ = 0 denote the time when the mining (S) (S,τ) (S,x ) of the current block begins. Let x denote the strategy of player the rate of other players solving the problem is Qi = i (S) (S) P x +` (S) i (amount of power it decides to invest) at time τ when the system P P(S,x ) = j∈S\{i} j Γ(S,x ). j∈(S\{i})∪{k} j P x(S)+` is in state S. Since players use a randomized search approach over j∈S j a search space which is exponentially large as compared to the The Continuous Time Markov Chain. Owing to the players solution space, the time required to find the solution is independent being Markovian, when the system transits from state S to state 0 0 of the search space explored thus far. That is, the search follows S , each player j ∈ S ∩ S could be viewed as effectively memoryless property. Also, note that a player has no incentive reentering the system. So, the expected utility could be written to change its strategy amidst a state owing to this memoryless in a recursive form, which we now derive. Table 1 presents the U property and if no other player changes its strategy. Hence in our notation. The possible events that can occur in a state S ∈ 2 are: analysis, we consider that no player changes its strategy within a (S,x(S)) 0 1) the problem gets solved by player i with rate Pi , thus (S,τ) (S,τ ) 0 state. So we have xi = xi for any τ, τ ; hence player i’s terminating the game in the absorbing state where i gets a (S) strategy could be written as a function of the state, that is, xi . reward of r; For a state S where j∈ / S, we have x(S) = 0 by convention. 2) the problem gets solved by one of the other players in (S \ j (S) (S) (S,x ) Let x denote the strategy profile of the players in state S. Let {i}) ∪ {k} with rate Qi , thus terminating the game in (S) x = (x )S⊆U denote the policy profile. an absorbing state where player i gets no reward; SWAPNIL DHAMAL ET AL. A STOCHASTIC GAME FRAMEWORK FOR ANALYZING COMPUTATIONAL INVESTMENT STRATEGIES IN DISTRIBUTED COMPUTING 4 3) a new player j ∈ U \ S arrives and the system transits to state 2.2 Scenario 2: Model S ∪ {j} with rate λj; 4) one of the players j ∈ S departs and the system transits to We now consider the scenario where the reward is offered for state S \{j} with rate µj. contributing to the computational power of a common central In what follows, we unambiguously write j ∈ U \ S as j∈ / S, for entity, such as in volunteer computing. Here, the reward offered (S) (S) (S,x ) (S,x ) (S,x(S)) per unit time is inversely proportional to the expected time for brevity. Since Pi + Qi = Γ , the sojourn time which the center decides to run the system. Considering that the (S,x(S)) P P −1 (S,x) in state S is (Γ + j∈ /S λj + j∈S µj) . Let D = time for which the center plans to run the system is exponentially (S,x(S)) P P Γ + j∈ /S λj + j∈S µj. So, the expected cost incurred distributed with rate parameter β, the reward offered per unit time (S) 1 cix is inversely proportional to , and hence directly proportional to by player i while the system is in state S is i . β D(S,x) β. Hence, let the offered reward per unit time be rβ, where r is the Utility Function. The probability of an event occurring before reward constant of proportionality. Furthermore, the reward given any other event is equivalent to the corresponding exponentially to a player is proportional to its computational investment. So, (S) distributed random variable being the minimum, which in turn, is xi the revenue received by player i per unit time is P (S) rβ, proportional to its rate. So, player i’s expected utility as computed j∈S xj +` (S) in state S is xi (S) and hence its net profit per unit time is P (S) rβ − cixi . j∈S xj +` The sojourn time in state S, similar to the previous scenario, is (S) (S) (S) (S) P (S,x ) x (S,x ) j∈S\{i} xj +` 1 (S,x) P P i (S,x) , where D = β + λj + µj (here, we Γ (S) Γ (S) D j∈ /S j∈S P x +` P x +` (S) R(S,x) = j∈S j ·r + j∈S j ·0 have β instead of Γ(S,x )). So, the net expected profit made by i D(S,x) D(S,x) (S) player i in state S before the system transits to another state, is (S) λj (S∪{j},x) µj (S\{j},x) cix x X X i i (S) + ·R + ·R − rβ−cix (S,x) i (S,x) i (S,x) P (S) i D D D j∈S x +` j∈ /S j∈S j (S,x) . (1) D Hence, player i’s expected utility as computed in state S is Note that we do not incorporate an explicit discounting factor

with time. However, the utility of player i can be viewed as (S) xi (S) discounting the future owing to the possibility that the problem (S) rβ − cix P x +` i (S,x) j∈S j X λj (S∪{j},x) can get solved in a state S where i∈ / S. Moreover, our anal- R = + ·R i D(S,x) D(S,x) i yses are easily generalizable if an explicit discounting factor is j∈ /S X µj (S\{j},x) incorporated. + ·R (3) D(S,x) i For distributed computing applications with a fixed objective j∈S such as finding a solution to a given problem, it is reasonable to assume that the rate of the problem getting solved is proportional (S,x) P P to the total power invested by the providers of computation. We, Note that since D = β + j∈ /S λj + j∈S µj here, Ex- (S)   (S,x ) P (S) (S,x(S)) hence, consider that Γ = γ j∈S xj + ` , where γ pression (3) is obtainable from Expression (1), when Γ = is the rate constant of proportionality determined by the problem β. being solved. Hence, player i’s expected utility as computed in Other Variants of Scenario 2. We considered that the time state S is for which the center decides to run the system is exponentially distributed with rate parameter β, where β is a constant. For (S) (S,x) x X λj (S∪{j},x) theoretical interest, one could consider a generalization where the R = (γr − c ) i + ·R i i D(S,x) D(S,x) i system may dynamically determine this parameter based on the set j∈ /S of players S ∪{k} present in the system. Let such a rate parameter X µj (S\{j},x) + ·R (2) D(S,x) i be given by f(S). Since the fixed players and their invested power j∈S do not change, these could be encoded in f(·), thus making it a function of only the set of strategic players. The center could (S,x) P (S)  P P where D = γ j∈S xj + ` + j∈ /S λj + j∈S µj. determine f(S) based on the cost parameters of the players in set Other Applications of Scenario 1. We derived Expression (1) S, the past records of the investments of players in set S, etc. If for the expected utility by considering that the probability of the time for which the system is to run is independent of the set of player i being the first to solve the problem is proportional to players currently present in the system, we have the special case: its invested power at the time, and hence obtains the reward r f(S) = β, ∀S. It can be easily seen that the analysis presented in with this probability. Now, consider another type of system which this paper (Section 4) goes through directly by replacing β with (S,x(S)) aims to solve an NP-hard problem where players search for a f(S), since Γ = f(S) is also independent of the players’ solution, and the system rewards the players in proportion to investment strategies. their invested power when the problem gets solved. In this case, Further, note that if the rate parameter is not just dependent the first two terms of Expression (1) are replaced with the term on the set of players present in the system but also proportional (S) ! (S) (S) x (S,x ) (S,x ) i to their invested power, it could be written as Γ = Γ (S) r P x +`   j∈S j γ P x(S) + ` D(S,x) . So, the mathematical form stays the j∈S j . This leads to the utility function being given (S)   (S,x ) P (S) by Equation (2) and hence its analysis is same as that of Scenario same, and so when Γ = γ j∈S xj + ` , our analysis presented in Section 3 holds for this case too. 1 (Section 3). SWAPNIL DHAMAL ET AL. A STOCHASTIC GAME FRAMEWORK FOR ANALYZING COMPUTATIONAL INVESTMENT STRATEGIES IN DISTRIBUTED COMPUTING 5

Convergence of Expected Utility 3 SCENARIO 1: ANALYSIS OF MPE ˆ(S,x) Note that Equation (1) encompasses both scenarios, where Let Ri be the equilibrium utility of player i in state S, that (S)   (S,x ) P (S) is, when i plays its best response strategy to the equilibrium Γ = γ j∈S xj + ` leads to Scenario 1, while (S) strategies of the other players j ∈ S\{i} (while foreseeing effects Γ(S,x ) = β leads to Scenario 2. We now show the conver- of its actions on state transitions and resulting utilities). We can gence of this recursive equation, and hence derive a closed-form determine MPE similar to optimal policy in MDP (using policy- expression for utility function. value iterations to reach a fixed point). Here, for maximizing Let us define an ordering O on sets which presents a one-to- Rˆ(S,x), we could assume that we have optimized for other states one mapping from a set S ⊆ U to an integer between 1 and 2|U|, i and use those values to find an optimizing x for maximizing both inclusive. Let R(x) be the vector whose component O(S) i Rˆ(S,x). In our case, we have a closed form expression for vector is R(S,x). We now show that R(x) computed using the recursive i i i R(x) in terms of policy x (Proposition 1); so we could effectively Equation (1), converges for any policy profile x. Let W(x) be the i determine the fixed point directly. state transition matrix, among the states corresponding to the set of A policy is said to be proper if from any initial state, the strategic players present in the system. In what follows, instead of probability of reaching a terminal state is strictly positive. Con- writing W (x)(O(S), O(S0)), we simply write W (x)(S, S0) since sider the condition that, there exists at least one proper policy, and it does not introduce any ambiguity. So, the elements of W(x) are for any non-proper policy, there exists at least one state where as follows: the value function is negatively unbounded. It is known that, λ For j∈ / S : W (x)(S, S ∪ {j}) = j under this condition, the optimal value function is bounded, and i D(S,x) µ it is the unique fixed point of the optimal Bellman operator [35]. For j ∈ S : W (x)(S, S \{j}) = j , i D(S,x) Our model satisfies this condition, since there does not exist any non-proper policy as the probability of reaching a terminal state All other elements of W(x) are 0. corresponding to the problem getting solved (either by player i (S) Since ` > 0, we have that Γ(S,x ) > 0. So, D(S,x) > or any other player including the fixed players) is strictly positive P P (x) (S,x(S)) j∈ /S λj + j∈S µj. Hence,Wi is strictly substochastic (sum (∵ Γ > 0). of the elements in each of its rows is less than 1). Now, from Equation (2), the Bellman equations over states (x) (S,x) U Let Zi be the vector whose component O(S) is Zi , where S ∈ 2 for player i can be written as ( (S) (S) ! (S,x ) (S) (S,x) xi X λj (S∪{j},x) (S,x) Γ x Rˆ = max (γr − c ) + ·Rˆ Z = r−c i , i i (S,x) (S,x) i i (S) i (S,x) x D D P x + ` D j∈ /S j∈S j ) X µj (S\{j},x) + ·Rˆ (x) (x) −1 (x) D(S,x) i Proposition 1. Ri = (I − W ) Zi . j∈S |U| (x) (1,x) (2 ,x) T Proof. Let Rihti = (Rihti ,...,Rihti ) , where t is the iter- We now derive some results, leading to the derivation of MPE. ation number and (·)T stands for matrix transpose. The iteration (x) Lemma 1. In Scenario 1, for any state S and policy profile x, we for the value of R starts at t = 0; we examine if it converges (S,x) ci (S,x) ci ihti have Ri ci, and Ri >r− if γr

(x) (x) (S,x(S)) Now, since W is strictly substochastic, its spectral radius is less Let Vi be the vector whose component O(S) is Vi . (x) t (x) (x) (S) than 1. So when t → ∞, we have limt→∞(W ) = 0. Since (x) (S,x ) Let Zi = Y Vi . Note that when Γ = (x) (x) t (x)  (S)  R is a finite constant, we have limt→∞(W ) R = 0. P (x) ih0i ih0i γ j∈S xj + ` , we have that Y is a diagonal ma- Pt−1 (x) η (x) −1   Further, limt→∞ (W ) = (I − W ) [34]. This P (S) η=0 γ j∈S xj +` (x) (x) implicitly means that (I − W ) is invertible. Hence, trix, with diagonal elements Y (S, S) = D(S,x) = P (S)  γ j∈S xj +` ∞ !  t  η P P P (S)  . It can hence be seen that (x) (x) (x) X (x) (x) j∈ /S λj + j∈S µj +γ j∈S xj +` lim R = lim W R + W Zi t→∞ ihti t→∞ ih0i (x) (x) η=0 W + Y is a stochastic matrix (the sum of elements in each = 0 + (I − W(x))−1Z(x) of its rows is 1). i Let U(x) = (I − W(x))−1Y(x)1, where 1 is the vector whose each element is 1. It is clear that all the elements of U(x) Owing to the requirement of deriving the inverse of I−W(x), are non-negative. We will now show that ||U(x)|| ≤ 1, that it is clear that a general analysis of the concerned stochastic game ∞ is, the maximum element of the vector U(x) is not more than when considering an arbitrary W(x) is intractable. In this work, 1. Let u be the element with the maximum value (one of the we consider two special scenarios that we motivated earlier in the S0 maximum, if there are multiple). Suppose u(x) = ||U(x)|| > 1. context of distributed computing systems, for which we show that S0 ∞ the analysis turns out to be tractable. So, we would have SWAPNIL DHAMAL ET AL. A STOCHASTIC GAME FRAMEWORK FOR ANALYZING COMPUTATIONAL INVESTMENT STRATEGIES IN DISTRIBUTED COMPUTING 6

U(x) = (I − W(x))−1Y(x)1 (S,x(S)) (S,x(S)) (S) (S,x) ˆ(x) (γr − ci)Ei − γ(Ei + γxi )W Ri (x) (x) (x) (x) =⇒ U = W U + Y 1 (S,x(S)) (S,x(S)) (S) ˆ(S,x) (S,x) = (γr − ci)Ei − γ(Ei + γxi )(Ri − Zi ) (x) X (x) (x) (x) =⇒ u = u W (S ,S) + Y (S ,S ) (S) (S) S0 S 0 0 0 (S,x ) (S,x) (S,x ) (S) = (γr − ci)E −γRˆ (E +γx ) S∈2U i i i i (S) (S) (x) (x) X (x) (x) (x) (γr − ci)xi (S,x ) (S) =⇒ uS < uS W (S0,S) + uS Y (S0,S0) +γ (E +γx ) 0 0 0 (S,x(S)) (S) i i U S∈2 Ei +γxi (x) (x) (S,x(S)) (S,x) (S,x(S)) (S) (S) [ max uS = uS > 1] ˆ ∵ S 0 = (γr−ci)Ei −γRi (Ei +γxi )+γ(γr−ci)xi X (x) (x) (S,x(S)) ˆ(S,x) (S,x(S)) (S) ˆ(S,x) =⇒ W (S0,S) + Y (S0,S0) > 1 = (γr−ci)Ei −γRi Ei +γxi (γr−ci −γRi ) S∈2U (S,x(S)) ˆ(S,x) (S) ˆ(S,x) = Ei (γr − ci − γRi ) + γxi (γr − ci − γRi ) (x) (x) (S,x) (S,x(S)) (S) However, this is a contradiction since W + Y is a = (γr − ci − γRˆ )(E + γx ) (x) i i i stochastic matrix. So, we have shown that ||U ||∞ = ||(I −  ci (S,x) (S,x(S)) (S) (x) −1 (x) (x) −1 (x) = γ r − − Rˆ (E + γx ) W ) Y 1||∞ ≤ 1. That is, (I − W ) Y is either γ i i i stochastic or substochastic. (x) (x) −1 (x) (x) From Proposition 1, R =(I − W ) Y V . Since (I − (S,x(S)) (S) (S,x) i i Since E + γx is positive, and (r − ci − Rˆ ) has (x) −1 (x) (S,x) i i γ i W ) Y is stochastic or substochastic, Ri for each S is (S,x) dRi a linear combination (with weights summing to less than or equal the same sign as (γr − ci) from Lemma 1, we have that (S) (S) dxi (S,x ) U to 1) of Vi over all S ∈ 2 . has the same sign as (γr − ci). Also, note that if γr = ci, we (S) (S) (S,x) (S) (S,x )  c  x U (S,x ) i i have Ri = 0, ∀S ∈ 2 from Proposition 1 when Γ = For each S, Vi = r − γ P (S) . So we have   j∈S xj +` P (S) (S) (S) γ j∈S xj + ` . (S,x ) ci (S,x ) ci Vi ci, and Vi >r − γ if γr ci, no power if γr < ci, and any (S) (S,x ) U amount of power if γr = ci. Since the maximal power of a player summing to less than or equal to 1) of Vi over all S ∈ 2 , (S,x) ci (S,x) ci i would be bounded (let the bound be xi), it would invest xi if we have Ri < r − γ if γr > ci, and Ri > r − γ if γr > ci. Hence, we have a consistent solution for the Bellman γr ci, 0 if γr < ci, and (S,x) (S) Lemma 2. In Scenario 1,Ri is a monotone function of xi . any amount of power in the range [0, xi] if γr = ci. Proof. We define the following for simplifying notation. Thus, the MPE strategy of a player follows a threshold policy,

(S,x) X (S∪{j},x) X (S\{j},x) with a threshold on its cost parameter ci (whether it is lower than A = λ Rˆ + µ Rˆ i j i j i γr) or alternatively, a threshold on the offered reward r (whether j∈ /S j∈S ci it is higher than γ ). Note that though a player i invests maximal ! power when γr > c , this is not inefficient since the power would (S,x(S)) X X X (S) i Ei = λj + µj +γ xj + ` be spent for less time as the problem would get solved faster. An j∈ /S j∈S j∈S\{i} intuition behind this result is that, when there are several miners in the system, the competition drives miners to invest heavily. On the Hence, we can write other hand, when there are few miners in the system, miners invest A(S,x) + (γr − c )x(S) heavily so that the problem gets solved faster (before arrival of R(S,x) = i i i i (S) more competition). Also, since the MPE strategies do not depend E(S,x ) + γx(S) i i on S, the assumption of state knowledge can be relaxed.

(S,x) (S,x(S)) (S,x) We now provide an intuition for why the MPE strategies are dRi (γr − ci)Ei − γAi =⇒ = 2 independent of the arrival and departure rates. From Proposition 1, (S)  (S,x(S)) (S) (x) (x) (S) dxi (x) −1 Ei + γxi Ri = (I − W ) Zi . For γr > ci, when power xi (x) (x) −1 increases, Zi increases and (I − W ) decreases. But The denominator is positive, while the numerator is a constant (x) (S) (S) Ri increases with xi when γr > ci (shown in the proof with respect to x(S), since A(S,x) and E(S,x ) do not depend on (x) i i i of Proposition 2), implying that the rate of increase of Zi (S) (S,x) (S) (x) −1 xi . So, Ri is a monotone function of xi , and whether it dominates the rate of decrease of (I − W ) . So, the effect of is increasing or decreasing, depends on the sign of the numerator: W(x) and hence state transitions is relatively weak, thus resulting (S,x(S)) (S,x) (γr − ci)Ei − γAi . in Markovian players playing strategies that are independent of the arrival and departure rates. Similar argument holds for γr ≤ ci. It Proposition 2. In MPE for Scenario 1, a player i invests its would be interesting to study scenarios where the rate of problem maximal power if γr > ci, no power if γr < ci, and any amount getting solved is a non-linear function of the players’ invested of power if γr = ci. powers. While a linear function is suited to most distributed (S,x) (x) (S,x) computing applications, a non-linear function could possibly see Proof. Let W be the row O(S) of W . Note that Ai = (S) (x) (S,x ) (S) (S,x) ˆ(x) W having a strong effect leading to MPE being dependent on (Ei + γxi )W Ri . From the proof of Lemma 2, (S,x) (S) the arrival and departure rates. dRi (S,x ) (S,x) (S) has same sign as (γr − ci)Ei − γAi , which can For analyzing the expected utility of a strategic player j, dxi be written as let us consider that the power available to it is very large, SWAPNIL DHAMAL ET AL. A STOCHASTIC GAME FRAMEWORK FOR ANALYZING COMPUTATIONAL INVESTMENT STRATEGIES IN DISTRIBUTED COMPUTING 7

say xj. Following our result on MPE, every player j satis- ! ψ(S) fying cj < γr would invest xj entirely. So, we have that X (S) (S) ˆ X xj + ` = ψ |S| − cj + ` γ(P x + `) is very large, and hence D(S,x) (which rβ j∈S,cj <γr j j∈S j∈Sˆ now approximates to γ(P x + `)) is also very large. j∈S,cj <γr j (S∪{j},x) (S\{j},x) Substituting P x(S) + ` as ψ(S), we get Since we know that Ri and Ri are bounded j∈S j by a small quantity from Lemma 1, the limit of the expected 2 (S,x) 1 X  (S) (S) utility Ri computed in any state S (from Equation (2)) is cj ψ − (|Sˆ| − 1)ψ − ` = 0 xi cixi rβ P r − P . To get further insight ˆ xj +` γ( xj +`) j∈S j∈S,cj <γr j∈S,cj <γr into this, say ` is considered insignificant, that is, the computation (S) is dominated by strategic players. Furthermore, say for every Solving this equation for positive value of ψ , we get strategic player i, ci < γr, and let the very large amount of power q 2 4` P available to these players be the same (xi = xj, ∀i, j ∈ U). |Sˆ| − 1 + (|Sˆ| − 1) + ˆ cj (S,x) (S) rβ j∈S Thus, the limit of the expected utility R computed in any ψ = rβ P i 2 ˆ cj r ci j∈S state S simplifies to |S| − γ|S| . This is intuitive, since if x = x , ∀i, j ∈ U i j , the reward would be won by the players Substituting this expression for ψ(S) in Equation (4) gives the with equal probability (hence the term r ), and the cost is reduced |S| MPE strategy of player i. owing to the reduced time due to the combined rate of the problem getting solved (hence the term ci ). P γ|S| (S) 2 j∈Sˆ cj So, x > 0 iff ci < q . That is, i ˆ ˆ 2 4` P |S|−1+ (|S|−1) + rβ j∈Sˆ cj 4 SCENARIO 2: ANALYSIS OF MPE only players with cost parameters in a relatively low range in a Proposition 3. In MPE for Scenario 2, a player i invests given state, invest. The constraint implies that if player i invests, then player j with c < c also invests. So, there exists a threshold   c ψ(S)   j i x(S) = max ψ(S) 1 − i , 0 player ˆi such that any player j with c > c would not invest. i rβ j ˆi Hence, set Sˆ can be constructed iteratively (initiating from an q |Sˆ|−1+ (|Sˆ|−1)2+ 4` P c empty set) by adding players j from set S \ Sˆ one at a time, in (S) rβ j∈Sˆ j where ψ = rβ P . Here, Sˆ is the 2 j∈Sˆ cj ascending order of cj, until the above constraint is violated for the maximal set of players j ∈ S which collectively satisfy the cost parameter of the newly added player. rβ constraints cj < ψ(S) . To get a better understanding of this result, if the power ` (x) invested by fixed players is considered insignificant, we have Proof. Recall that since W is a strictly substochastic ma- (S) |Sˆ|−1 (S) (x) −1 Pt−1 (x) η ψ = rβ P c and the condition for xi > 0 simplifies trix, (I − W ) = limt→∞ η=0(W ) . Since all the el- j∈Sˆ j P c ements of W(x) are non-negative, all the elements of (W(x))η to c < j∈Sˆ j . i |Sˆ|−1 also are non-negative for any natural number η, and hence all Furthermore, if the strategic players are homogeneous (ci = (x) −1 the elements of (I − W ) are non-negative. Also, since cj, ∀i, j ∈ U), the cost constraint is satisfied for all players (x) (x) −1 (x) (x) |S|c Ri = (I − W ) Zi (Proposition 1) and since W in S (since c < ) and so, all the strategic players invest (S) |S|−1 |S|−1 is independent of xi in this scenario, maximizing the compo- rβ  (x) (S,x) (S) c |S|2 . That is, if the computation is dominated by strategic nents of Zi (namely, Zi ) individually with respect to xi players which are homogeneous, they would invest proportionally (x) would essentially maximize all the elements of Ri . Now, since to the ‘reward to cost parameter’ ratio in MPE. (S) Γ(S,x ) = β in this scenario, we have Since the transition probabilities, and hence W(x), are con- ! stant w.r.t. players’ strategies in this scenario, a player’s MPE (S) (S,x) (S,x) β xi Z = r−c utility computed in state S (Ri ) is a linear combination i P (S) i D(S,x) j∈S xj + ` (with constant non-negative weights) of its utilities over all states computed without accounting for state transitions. Hence, the (S,x) (S) As D is independent of xi in this scenario, it can be shown MPE strategies are independent of the arrival and departure rates. (S,x) (S) that Zi is a concave function with respect to xi (the second Note that while the decision regarding whether or not to invest −2r`β derivative turns out to be P (S) 3 (S,x) ). The first order was independent of the cost parameters of the other players in ( j∈S xj +`) D (S,x) the system in Scenario 1, this decision highly depends on the cost dZi condition (S) = 0 gives parameters of other players in Scenario 2. dxi ! ! (S) X (S) ci  X (S)  x = x + ` 1 − x + ` 5 SIMULATION STUDY i j rβ j j∈S j∈S Throughout the paper, we determined MPE strategies, which we (S) P (S) (S) observed to be independent of players’ arrival and departure rates. Let ψ = j∈S xj + `. As xi is non-negative, we have However, it is clear from Equations (1), (2), (3) and Proposition 1  (S)  (S) (S) ψ  that the players’ utilities would indeed depend on these rates. x = max ψ 1 − ci , 0 (4) i rβ We now study the effects of these rates on the utilities in MPE using simulations. In order to reliably obtain an accurate relation ˆ (S) Let S = {j ∈ S : xj > 0}. We later show how to determine between the arrival/departure rates and the expected utilities of set Sˆ. Summing the above over all players in S and then adding ` the players, we consider that the computation is dominated by on both sides, we get the strategic players (that is, the power invested by the fixed SWAPNIL DHAMAL ET AL. A STOCHASTIC GAME FRAMEWORK FOR ANALYZING COMPUTATIONAL INVESTMENT STRATEGIES IN DISTRIBUTED COMPUTING 8

5 5 players is insignificant: ` → 0) and the strategic players are 10 6 = 0 10 7 = 0 homogeneous (their arrival/departure rates and their cost param- 6 = 1 7 = 1 4 4 eters are the same). Let λ, µ, c denote the common arrival rate, 10 6 = 10 10 7 = 10 6 = 100 7 = 100 departure rate, and cost parameter, respectively. Note that if the 103 6 = 1000 103 7 = 1000 strategic players are considered homogeneous, the players’ sets SNE SNE

(states) can be mapped to their cardinalities. We observe how the 102 102 Expected utility Expected expected utility of a player changes as a function of the number of utility Expected 101 101 other players present in the system, for different arrival/departure rates. In our simulations, we consider the following values: 100 100 0 1 2 3 4 0 1 2 3 4 r = 105, γ = β = 0.1, |U| = 104, c = 0.003 (a justification 10 10 10 10 10 10 10 10 10 10 of these values is provided in Appendix). Number of other players Number of other players (a) for different λ’s (µ = 10) (b) for different µ’s (λ = 10) Statewise Nash Equilibrium. For a comparative study, we also look at the equilibrium strategy profile of a given set of players Fig. 1. Expected utility of a player in Scenario 1 S, when there are no arrivals and departures (λj = 0, ∀j∈ / S and µj = 0, ∀j ∈ S). We call this, statewise Nash equilibrium 5 5 10 6 = 0 10 7 = 0 (SNE) in state S. Since the MPE strategies of the players are 6 = 1 7 = 1 independent of the arrival and departure rates, a player’s SNE 103 6 = 10 103 7 = 10 6 = 100 7 = 100 strategy in a state is same as its MPE strategy corresponding to 6 = 1000 7 = 1000 1 1 that state. Note, however, that the expected utilities in SNE would 10 SNE 10 SNE be different from those in MPE, since the expected utilities highly

10-1 10-1 Expected utility Expected depend on the arrival and departure rates (Equations (1), (2), (3) utility Expected and Proposition 1). Also, since SNE does not account for change 10-3 10-3 of the set of players present in the system, the expected utilities in SNE for different values on X-axis in the plots are computed 100 101 102 103 104 100 101 102 103 104 independently of each other. Number of other players Number of other players (a) for different λ’s (µ = 10) (b) for different µ’s (λ = 10)

5.1 Simulation Results Fig. 2. Expected utility of a player in Scenario 2 In Figures 1 and 2, the plots for expected utility largely follow near-linear curve (of negative slope) on log-log scale, with respect Scenario 2. Since a player’s SNE strategy in a state is same to the number of players in the system. That is, they nearly follow as its MPE strategy corresponding to that state, a player’s SNE power law, which means that scaling the number of players by startegy is to invest rβ |S|−1  in state S (as explained at the end a constant factor would lead to proportionate scaling of expected c |S|2 of Section 4 when computation is dominated by strategic players utility. that are homogeneous). Furthermore, in SNE, the expected utility Scenario 1. Figure 1 presents plots for expected utilities with r of each player can be shown to be |S|2 in state S (this can be MPE policy for various values of λ and µ, and compares them seen by substituting in Equation (3): λj = 0 ∀j∈ / S, µj = 0, with expected utilities in SNE. Following are some insights: (S) rβ |S|−1  cj = c, xj = c |S|2 , ∀j ∈ S, and ` → 0). Figure 2 • As seen at the end of Section 3, if the mining is dominated by presents the plots for expected utilities with the analyzed MPE strategic players which are homogeneous, the expected utilities policy for different values of λ and µ, and compares them against r c in MPE are bounded by |S| − γ|S| . It can be similarly shown that SNE. Following are some insights: the limit of the players’ expected utilities in SNE is r − c |S| γ|S| • An increase in the number of players increases competition for (this can be seen by substituting in Equation (2): λj = 0 ∀j∈ / the offered reward and hence reduces the reward per unit time (S) S, µj = 0, cj = c, xj → ∞, ∀j ∈ S, and ` → 0). Owing to received by each player, with no balancing factor (unlike in this, the expected utilities in MPE are bounded by the expected Scenario 1); so the expected utility decreases. utilities in SNE, which is reflected in Figure 1. • For higher λ, there is higher likelihood of system having • In Scenario 1, a higher λ results in a higher likelihood of the more players, thus resulting in lower expected utility owing to system having more players, which results in a higher rate of aforementioned reason. Also, from Figure 2(a), if λ is not very the problem getting solved as well as more competition. This, high, an increase in µ is likely to reduce the competition to the in turn, reduces the time spent in the system as well as the prob- extent that the expected MPE utility when the number of players ability of winning for each player, which hence reduces the cost in the system is large, can exceed the corresponding SNE utility incurred as well as the expected reward. Figure 1(a) suggests r ( |S|2 , which would be very low when the number of players in that, as λ changes, the change in cost incurred balances with the the system is large). change in expected reward, since the change in expected utility • A higher µ likely results in less competition, however it also is insignificant. results in a higher probability of player i departing from the • For a given µ, if the number of players changes, there is a system and hence losing out on the reward for the time it stays balanced tradeoff between the cost and the expected reward as out; this leads to a tradeoff. Figure 2(b) shows that the effect of above; so the change in expected utility is insignificant. But a the probability of player i departing from the system dominates higher µ results in a higher probability of player i departing the effect of the reduction in competition. For similar reasons as from the system and staying out when the problem gets solved, above, the expected MPE utility when the number of players in thus lowering its expected utility (Figure 1(b)). the system is large, can exceed the corresponding SNE utility. SWAPNIL DHAMAL ET AL. A STOCHASTIC GAME FRAMEWORK FOR ANALYZING COMPUTATIONAL INVESTMENT STRATEGIES IN DISTRIBUTED COMPUTING 9

6 FUTURE WORK [13] F. Fu and U. C. Kozat, “Stochastic game for wireless network virtualiza- tion,” IEEE/ACM Transactions on Networking, vol. 21, no. 1, pp. 84–97, One could study a variant of Scenario 1 where the rate of problem 2013. getting solved (and perhaps also the cost) increases non-linearly [14] E. Altman, “Non zero-sum stochastic games in admission, service and with the invested power. Since players are seldom completely routing control in queueing systems,” Queueing Systems, vol. 23, no. rational in real world, it would be useful to study the game under 1-4, pp. 259–279, 1996. [15] M. Bowling and M. Veloso, “An analysis of stochastic for . To develop a more sophisticated stochastic multiagent reinforcement learning,” Carnegie-Mellon University Pitts- model, one could obtain real data concerning the arrivals and burgh Pennsylvania School of Computer Science Technical Report No. departures of players and their investment strategies. Another CMU-CS-00-165, Tech. Rep., 2000. [16] N. Bellomo, Modeling complex living systems: A kinetic theory and promising possibility is to incorporate state-learning in our model. stochastic game approach. Springer Science & Business Media, 2008. A Stackelberg game could be studied, where the system de- [17] E. Altman and N. Shimkin, “Individual equilibrium and learning in cides the amount of reward to offer, and then the computational processor sharing systems,” Operations Research, vol. 46, no. 6, pp. 776– providers decide how much power to invest based on the offered 784, 1998. [18] A. Nahir, A. Orda, and D. Raz, “Workload factoring with the cloud: reward. A game-theoretic perspective,” in IEEE International Conference on Computer Communications (INFOCOM), vol. 12. IEEE, 2012, pp. APPENDIX 2566–2570. [19] R. Hassin and M. Haviv, “Nash equilibrium and subgame perfection in We take cues from bitcoin mining for our numerical simulations. observable queues,” Annals of Operations Research, vol. 113, no. 1-4, The current offered reward for successfully mining a block is 12.5 pp. 15–26, 2002. bitcoins. Assuming 1 bitcoin ≈ $8000, the reward translates to [20] J. Wang and F. Zhang, “Strategic joining in M/M/1 retrial queues,” $105. The bitcoin problem complexity is set such that it takes European Journal of Operational Research, vol. 230, no. 1, pp. 76–87, 2013. around 10 minutes on average for a block to get mined. That is, the [21] J. Hu and M. P. Wellman, “Nash Q-learning for general-sum stochastic rate of problem getting solved is 0.1 per minute on average. One games,” Journal of Machine Learning Research, vol. 4, no. Nov, pp. of the most powerful ASIC (application-specific integrated circuit) 1039–1069, 2003. currently available in market is Antminer S9, which performs [22] C. Jiang, Y. Chen, Y.-H. Yang, C.-Y. Wang, and K. R. Liu, “Dynamic Chinese restaurant game: Theory and application to cognitive radio computations of upto 13 TeraHashes per sec, while consuming networks,” IEEE Transactions on Wireless Communications, vol. 13, about 1.5 kWh in 1 hour, which translates to $0.18 per hour (at no. 4, pp. 1960–1973, 2014. the rate of $0.12 per kWh), equivalently $0.003 per minute. As [23] C.-Y. Wang, Y. Chen, and K. R. Liu, “Game-theoretic cross social media analytic: How Yelp ratings affect deal selection on Groupon?” IEEE per BitNode (bitnodes.earn.com), a crawler developed to estimate Transactions on Knowledge and Data Engineering, vol. 30, no. 5, pp. the size of bitcoin network, the number of bitcoin miners is 908–921, 2018. around 104. Hence, we consider r = 105, γ = β = 0.1, c = [24] I. Abraham, D. Dolev, R. Gonen, and J. Halpern, “Distributed computing 0.003, |U| = 104. meets game theory: Robust mechanisms for rational secret sharing and multiparty computation,” in ACM Symposium on Principles of Dis- tributed Computing. ACM, 2006, pp. 53–62. ACKNOWLEDGMENT [25] Y.-K. Kwok, S. Song, and K. Hwang, “Selfish grid computing: Game- The work is partly supported by CEFIPRA grant No. IFC/DST- theoretic modeling and NAS performance results,” in IEEE International Symposium on Cluster Computing and the Grid. IEEE, 2005. Inria-2016-01/448 “Machine Learning for Network Analytics”. [26] G. Wei, A. V. Vasilakos, Y. Zheng, and N. Xiong, “A game-theoretic method of fair resource allocation for cloud computing services,” The REFERENCES Journal of Supercomputing, vol. 54, no. 2, pp. 252–269, 2010. [27] B.-G. Chun, K. Chaudhuri, H. Wee, M. Barreno, C. H. Papadimitriou, and [1] L. F. G. Sarmenta, “Volunteer computing,” Ph.D. dissertation, Mas- J. Kubiatowicz, “Selfish caching in distributed systems: A game-theoretic sachusetts Institute of Technology, 2001. analysis,” in ACM Symposium on Principles of Distributed Computing. [2] D. P. Anderson and G. Fedak, “The computational and storage potential ACM, 2004, pp. 21–30. of volunteer computing,” in Sixth IEEE International Symposium on [28] D. Grosu and A. T. Chronopoulos, “Noncooperative load balancing in Cluster Computing and the Grid (CCGRID’06), vol. 1. IEEE, 2006, distributed systems,” Journal of Parallel and Distributed Computing, pp. 73–80. vol. 65, no. 9, pp. 1022–1034, 2005. [3] Z. Zheng and S. Xie, “Blockchain challenges and opportunities: A [29] A. Sapirshtein, Y. Sompolinsky, and A. Zohar, “Optimal selfish mining survey,” International Journal of Web and Grid Services, 2018. strategies in bitcoin,” in International Conference on Financial Cryptog- [4] L. S. Shapley, “Stochastic games,” Proceedings of the National Academy raphy and Data Security. Springer, 2016, pp. 515–532. of Sciences, vol. 39, no. 10, pp. 1095–1100, 1953. [30] Y. Lewenberg, Y. Bachrach, Y. Sompolinsky, A. Zohar, and J. S. Rosen- [5] E. Maskin and J. Tirole, “Markov perfect equilibrium: I. observable schein, “Bitcoin mining pools: A cooperative game theoretic analysis,” in actions,” Journal of Economic Theory, vol. 100, no. 2, pp. 191–219, International Conference on Autonomous Agents and Multiagent Systems 2001. (AAMAS). IFAAMAS, 2015, pp. 919–927. [6] D. Gillette, “Stochastic games with zero stop probabilities,” Contribu- [31] Z. Xiong, S. Feng, D. Niyato, P. Wang, and Z. Han, “Optimal pricing- tions to the Theory of Games, vol. 3, pp. 179–187, 1957. based edge computing resource management in mobile blockchain,” in [7] A. M. Fink et al., “Equilibrium in a stochastic n-person game,” Journal IEEE International Conference on Communications (ICC). IEEE, 2018, of Science of the Hiroshima University, series A-I (Mathematics), vol. 28, pp. 1–6. no. 1, pp. 89–93, 1964. [32] E. Altman, A. Reiffers, D. S. Menasche,´ M. Datar, S. Dhamal, and [8] J.-F. Mertens and A. Neyman, “Stochastic games,” International Journal C. Touati, “Mining competition in a multi-cryptocurrency ecosystem at of Game Theory, vol. 10, no. 2, pp. 53–66, 1981. the network edge: A congestion game approach,” ACM SIGMETRICS [9] J. K. Goeree and C. A. Holt, “Stochastic game theory: For playing Performance Evaluation Review, vol. 46, no. 3, pp. 114–117, 2019. games, not just for doing theory,” Proceedings of the National Academy [33] A. Kiayias, E. Koutsoupias, M. Kyropoulou, and Y. Tselekounis, of Sciences, vol. 96, no. 19, pp. 10 564–10 567, 1999. “Blockchain mining games,” in ACM Conference on and [10] E. Altman, T. Boulogne, R. El-Azouzi, T. Jimenez,´ and L. Wynter, Computation (EC). ACM, 2016, pp. 365–382. “A survey on networking games in telecommunications,” Computers & [34] J. H. Hubbard and B. B. Hubbard, Vector calculus, linear algebra, and Operations Research, vol. 33, no. 2, pp. 286–311, 2006. differential forms: A unified approach. Matrix Editions, 2015. [11] E. Altman, R. El-Azouzi, and T. Jimenez, “Slotted Aloha as a stochastic [35] D. P. Bertsekas and J. N. Tsitsiklis, “Neuro-dynamic programming: an game with partial information,” in WiOpt’03: Modeling and Optimization overview,” in IEEE Conference on Decision and Control (CDC), vol. 1. in Mobile, Ad Hoc and Wireless Networks, 2003, p. 9 pages. IEEE, 1995, pp. 560–564. [12] B. Wang, Y. Wu, K. R. Liu, and T. C. Clancy, “An anti-jamming stochastic game for cognitive radio networks,” IEEE Journal on Selected Areas in Communications, vol. 29, no. 4, pp. 877–889, 2011.