Repeated Dollar Auctions: A Multi-Armed Bandit Approach

Marcin Waniek, University of Warsaw, [email protected]
Long Tran-Thanh, University of Southampton, [email protected]
Tomasz Michalak, University of Oxford & University of Warsaw, [email protected]

ABSTRACT

We investigate the repeated version of Shubik's [34] dollar auction, in which the type of the opponent and her level of rationality are not known in advance. We formulate the problem as an adversarial multi-armed bandit, and we show that a modified version of the ELP algorithm [25], tailored to our setting, can achieve Õ(√(|S_0|T)) performance loss (compared to the best fixed strategy), where |S_0| is the cardinality of the set of available strategies and T is the number of (sequential) auctions. We also show that under some further conditions, these bounds can be improved to Õ(|S_0|^{1/4}·√T) and Õ(√T), respectively. Finally, we consider the case of spiteful players. We prove that when a non-spiteful player bids against a malicious one, the game converges in performance to a Nash equilibrium if both players apply our strategy to place their bids.

CCS Concepts

• Theory of computation → Online learning algorithms; Computational pricing and auctions;

Keywords

dollar auction; online learning; multi-armed bandits

1. INTRODUCTION

Conflicts are an integral part of human culture [8]. Whether it is a competition of rival technological companies [38], a battle of lobbyists trying to acquire a government contract [15] or the arms race of modern empires [31], conflict analysis has always been of interest to game theorists and economists alike [18]. The analysis of conflict is also of natural interest in AI and, especially, in the Multi-Agent Systems (MAS) literature [27, 37, 11]. Typically, conflicts in MAS used to be considered as failures or synchronisation problems [40, 36]. More recently, however, the need for more advanced studies has been recognized [10], including analyses of conflict generation, escalation, or detection in MAS.

A very simple dollar auction introduced by Shubik [34] provides a powerful framework to formalise and study conflict situations. In this auction two participants compete for a dollar by making bids just as in the regular English auction. However, both the winner and the loser have to pay their final bids. In other words, this is an all-pay auction, which means that, once a player decides to participate and make her first bid, any invested resources are irretrievably lost. Consequently, the only way to gain any profit (or minimize the loss) is to win the dollar. However, since both players are in the same situation, the dollar auction often ends up with an irrational conflict escalation [23].

Interestingly, however, in a celebrated work, O'Neill [31] proved that the first player has a winning strategy in the dollar auction assuming pure strategies. In particular, if she makes a bid that is a specific combination of the auction's stake, the budget of both players and the minimal bid increment, her opponent will never win the auction and should therefore pass. For instance, if the price increment is $0.05 and both players have budgets of $2.50 each, then the first player that has a chance to bid should offer precisely $0.60 and the other player should drop out. As a result, escalation should never occur. O'Neill's result thus suggests that if escalation does occur in real-life experiments, it has to be related to human bounded rationality or other factors not accounted for by the auction model.

However, O'Neill's result was re-examined by Leininger [23], who showed that the advantage of the first bidder vanishes when mixed-strategy equilibria are considered, and the auction escalates. Demange [13] proved the same for settings where players are uncertain about each other's strength. In addition, Waniek et al. [39] showed that when used against a spiteful or malicious player, the optimal strategy proposed by O'Neill can lead to severe loss.

One of the key deficiencies of all of the above results is that they concern the one-shot auction case, where the goal is to investigate what would be an optimal bidding strategy in a single auction. However, many real-world applications in MAS consist of a sequence of auctions repeatedly played against the same opponent [7, 20, 17]. In fact, many examples of R&D competition, arms races, lobbying, or interactions in MAS can be seen not as a single instance of an auction, but rather as a series of confrontations. We will show later in this paper that the nature of repeated encounters can be leveraged in order to achieve good performance.

Typically in such repeated settings, there is little (if any) prior knowledge about the opponent's incentives, nor about her rationality level. Second, while it is typical to assume some (perfect or bounded) rationality model of the players, it might be the case that the chosen bids do not follow such assumptions (e.g., due to lack of computational power, or lack of perfect execution). As such, efficient bidding strategies need to consider the uncertainty in both the incentives and the rationality of the opponent.

In this paper, we reconsider Shubik's dollar auction, but now in the repeated setting. We ask whether it is possible to design efficient bidding strategies, given the high level of uncertainty and no prior knowledge of the opponent. Put differently, we investigate whether a participant's knowledge about the opponent (gained in the previous encounters) can give the participant an edge over her opponent so that escalation does not occur. To this end, we view the problem from the perspective of online learning theory, where the auctions are played against an adversary.

More specifically, we formulate the problem using an adversarial multi-armed bandit model [5, 25], where at each round we choose from a (finite) set of possible strategies S_0 to play against the opponent. At the end of each round, we observe the outcome of the auction and update our estimation of the strategy's fitness (i.e., how much expected profit the strategy can achieve). Our model also assumes that there is some level of interdependence between the different strategies. That is, by observing an outcome of a specific strategy, we can get some side information about the fitness of other strategies. For example, if a particular strategy wins the auction with a (final) bid b, then any other strategy that shares the same behaviour as the current one up to b (i.e., the latter is a prefix of the former) would also win the auction. As such, we can also update our fitness estimates of the interdependent strategies.

Given this, we prove that by applying ELP (i.e., "Exponentially weighted algorithm with Linear Programming"), a state-of-the-art online learning algorithm proposed by [25], we can achieve Õ(√(|S_0|T))¹ performance loss against the best fixed bidding strategy (i.e., playing the same strategy each round) in hindsight. In particular, since the regret is sub-linear due to the square root in Õ(√(|S_0|T)), the average regret per time step converges to 0 as T tends to infinity. Thus, the behaviour of ELP converges to that of the best fixed strategy in hindsight. While this result is a direct consequence of the performance analysis of ELP from [25], our main contributions are to improve this result in more concrete cases where we have some additional conditions. Specifically:

• By allowing some additional side-information structure (about the other strategies) that we obtain when observing the outcome of the chosen strategy, we prove that the performance loss can be bounded with Õ(|S_0|^{1/4}·√T), which is an improvement of |S_0|^{1/4} (note that this improvement is significant, especially when |S_0| is large);

• When S_0 is the class of non-spiteful optimal bidding strategies with different budget limits, and the opponent is deterministic (i.e., she chooses the next bids in a deterministic manner), the performance loss of the ELP-based bidding strategy is at most Õ(√T).

Finally, we show that when the auction is played between a non-spiteful and a malicious player, if both use ELP, they will converge to a mixed Nash equilibrium in performance.

The remainder of the paper is organised as follows. We start by introducing the basic definitions and notation, and then outline our multi-armed bandit based bidding model. We then describe the ELP algorithm, and analyse its performance in different settings. The case of a non-spiteful against a malicious player is discussed afterwards, followed by the conclusions. The Appendix provides a notation table for the reader's convenience.

¹The logarithmic terms are hidden in the Õ notation.

2. PRELIMINARIES

In this section, we formally describe the two building blocks of our problem, namely the dollar auction and the multi-armed bandit model.

2.1 The Dollar Auction

The dollar auction [34] is an all-pay auction between two players N = {0, 1}. We will often denote them by i and j. The winner of an auction is given the stake s ∈ ℕ. The players can make bids that are multiples of a minimal bid increment δ. Without loss of generality, we assume that δ = 1.

While Shubik [34] considered the dollar auction without any budget limit, following O'Neill [31], most subsequent studies considered the setting in which players have limited budgets b_0, b_1 ∈ ℕ. In this paper, we assume that both players have equal budgets, i.e., b_0 = b_1 = b.

The players move in turns and the starting player is determined randomly at the beginning of the auction. Moreover, the starting player i can either make the first bid, pass and leave the auction, or let j move first. Once the first bid has been made, any player making the move can either make a bid higher than the opponent's, or pass. In other words, offering the turn to the opponent is only permitted before the first bid is made. We denote by X_b the set {(x, y) ∈ {0, . . . , b − 1} × {0, . . . , b − 1} : y > x} ∪ {(0, 0)}, where x is the last bid of the player currently choosing her bid and y is the last bid of her opponent. Whenever it is a player's turn to make a bid, the dollar auction is in one of the states in X_b.

We assume that all the strategies in our setting are deterministic. Let us now consider S, the set of all possible strategies in the dollar auction. We have S ⊂ X_b → {0, . . . , b}, where for any strategy f ∈ S, the value f(x, y) represents the bid to be made by the player whose last bid was x and whose opponent's last bid was y. For valid strategies we have f(x, y) ≥ y, where f(0, 0) = 0 represents the decision to let the opponent move first if the player starts (as it is never rational for any player to pass if her opponent has not made any bids yet), and f(x, y) = y represents the decision to pass in all other cases. The set of strategies S contains all valid strategies, i.e., strategies that meet the conditions described above.

An auction ends when one of the players either passes or makes a bid that her opponent cannot top (in a setting with equal budgets this means bidding the entire budget). The stake is then given to the higher bidder. However, since this is an all-pay auction, both players have to pay their final bids. The profit of player i from an auction that ends with bids y_i and y_j is therefore:

  p_i = s − y_i  if y_i > y_j,
  p_i = −y_i   if y_i < y_j.
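To make the auction dynamics above concrete, the following is a minimal Python sketch (our own illustration, not from the paper) that plays a single dollar auction between two deterministic strategies given as functions of the state (x, y), using the stopping rule and the all-pay profit p_i defined above. The function name and the fixed choice of player A as the opening bidder are simplifying assumptions (the paper randomises the starter).

```python
def play_dollar_auction(strategy_a, strategy_b, stake, budget):
    """Play one dollar auction and return the pair (profit_a, profit_b).

    Each strategy maps the state (own last bid x, opponent's last bid y)
    to its next bid; returning y (not raising) means passing, and
    returning 0 in the initial state (0, 0) means offering the first move
    to the opponent, as in Section 2.1.
    """
    bids = [0, 0]
    strategies = [strategy_a, strategy_b]
    turn, first_move_offered = 0, False
    while True:
        x, y = bids[turn], bids[1 - turn]
        bid = strategies[turn](x, y)
        if x == 0 and y == 0 and bid == 0:        # let the opponent move first
            if first_move_offered:                # nobody wants to open: no bids at all
                return (0, 0)
            first_move_offered, turn = True, 1 - turn
            continue
        if bid <= y:                              # pass: the opponent wins
            break
        bids[turn] = min(bid, budget)             # bids cannot exceed the budget
        if bids[turn] == budget:                  # the opponent cannot top this bid
            break
        turn = 1 - turn
    winner = 0 if bids[0] > bids[1] else 1
    profits = [-bids[0], -bids[1]]                # all-pay: both pay their final bids
    profits[winner] += stake
    return tuple(profits)

# A strategy that always raises by 1 against one that never enters the auction.
raise_by_one = lambda x, y: y + 1
never_bid = lambda x, y: y if (x, y) != (0, 0) else 0
print(play_dollar_auction(raise_by_one, never_bid, stake=20, budget=50))   # (19, 0)
```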

2.2 The Multi-Armed Bandit Model

We now describe the adversarial bandit model [6], a variation of the multi-armed bandit problem [33]. In particular, it consists of T rounds, and a player can choose one of K actions (traditionally described as different slot machines, or bandits, hence the name of the multi-armed bandit model) to play at each round. However, before the player chooses her action to play, an adversary assigns an arbitrary reward vector r(t) ∈ [0, 1]^K, where r(t) = (r_1(t), . . . , r_K(t)), without revealing it to the player. After the player chooses her action i_t to play, the value r_{i_t}(t) ∈ [0, 1], which represents the reward for performing action i_t in round t, is revealed to the player as her (collected) reward. The goal of the player then is to maximise her total collected reward over T rounds.

Let A denote the algorithm of playing actions chosen by the player, which selects the action i_{t+1} to perform in round t + 1 (the choice of action may be based on the observed results r_{i_1}(1), . . . , r_{i_t}(t) of the previously chosen actions i_1, . . . , i_t). The total collected reward, or the return, of algorithm A after T rounds is as follows:

  R_A(T) = Σ_{t=1}^{T} r_{i_t}(t).

Since the value of each r(t) is not revealed to the player, it is impossible to maximise the term above without having any additional information about r(t). Instead, the performance of an algorithm is measured with its regret, which is defined as the difference between the return of the best fixed action over time and the return of the algorithm, namely:

  U_A(T) = max_i Σ_{t=1}^{T} r_i(t) − Σ_{t=1}^{T} r_{i_t}(t).

Within our bandit formulation, the repeated dollar auction problem provides side information. Within the bandit literature, the first work that fits our setting is the work of Mannor and Shamir [25], which introduced the multi-armed bandit model with side information. In particular, within their model, action j is a neighbour of action i chosen by the algorithm in round t if and only if for some fixed parameter d we are able to create an unbiased estimator r̂_j(t) of the reward r_j(t), i.e., E[r̂_j(t) | action i chosen in round t] = r_j(t) and P(|r̂_j(t)| ≤ d) = 1. Intuitively, we can use the knowledge gained from the execution of action i to estimate the rewards of the actions in its neighbourhood.

This extra information can be represented as a sequence of graphs G_1, . . . , G_T with K nodes each. Let N_i(t) denote the neighbourhood of node i in graph G_t, i.e., an edge from i to j exists in graph G_t if and only if j ∈ N_i(t). Mannor and Shamir proposed an algorithm, called ELP, to efficiently tackle this bandit model. Here we will rely on the model of Mannor and Shamir and the ELP algorithm, and fit them to our setting of the repeated dollar auction.
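The return and regret definitions above can be phrased in a few lines of code. The sketch below (our own illustration) computes U_A(T) for a sequence of chosen actions given the full reward matrix, which in the bandit setting is of course never revealed to the player.

```python
import numpy as np

def regret(rewards, chosen):
    """Regret U_A(T) of a sequence of chosen actions.

    rewards: T x K array, rewards[t, i] = r_i(t) assigned by the adversary.
    chosen:  length-T list, chosen[t] = action i_t picked by the algorithm.
    """
    T = rewards.shape[0]
    algorithm_return = rewards[np.arange(T), chosen].sum()      # R_A(T)
    best_fixed_action_return = rewards.sum(axis=0).max()        # max_i sum_t r_i(t)
    return best_fixed_action_return - algorithm_return

# Tiny example: two actions over three rounds.
r = np.array([[1.0, 0.0], [0.2, 0.8], [0.9, 0.1]])
print(regret(r, chosen=[1, 1, 0]))   # best fixed action is 0: 2.1 - (0.0 + 0.8 + 0.9) = 0.4
```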
3. META-BIDDING ALGORITHM

We now turn to the description of our bidding algorithm. In particular, this algorithm is a meta-strategy that consists of multiple dollar auction strategies played over the rounds, in order to minimize the regret from the series of auctions. To do so, we first describe the multi-armed bandit formulation of our problem. We then define the neighbourhood of a strategy within the strategy space. Finally, we discuss how to tailor the ELP algorithm to our setting.

3.1 Model Description

In our setting, a player takes part in a series of T dollar auctions against the same opponent; each auction corresponds to a single round in the multi-armed bandit model. We assume that the player has at her disposal a set of strategies S_0 ⊆ S. Before each round t the player has to choose a strategy f_t ∈ S_0 to be used in this particular round. The choice of strategy corresponds to the choice of an action (a playing arm) in the multi-armed bandit model. On the other hand, her opponent also chooses her strategy (the equivalent of setting the payoffs for each action). We do not put any explicit restrictions on the strategy of the opponent.

The auction is played using the chosen strategies and the player achieves profit p_{f_t}(t) ∈ {−b + 1, . . . , s}. The profit is then mapped to a reward r_{f_t}(t), corresponding to the reward in the multi-armed bandit model. We assume that always r_{f_t}(t) ∈ [0, 1] (we normalise the rewards for the sake of simplicity). In the case of a non-spiteful player (i.e., a player who just wishes to maximize her profit) the reward depends only on her profit. We discuss the case of a malicious player, whose reward depends on the profit of her opponent, in Section 6.

Given this, a meta-strategy is an algorithm that chooses a strategy to be used in each auction, based on the knowledge about previous auctions. Note that it corresponds to the algorithm choosing an arm for each round in the multi-armed bandit model. The goal of the meta-strategy A is to minimize the regret after a series of auctions, the regret being the difference between the sum of rewards after using the single best strategy in hindsight and after using the sequence of strategies chosen by the meta-strategy:

  U_A(T) = max_{g∈S_0} Σ_{t=1}^{T} r_g(t) − Σ_{t=1}^{T} r_{f_t}(t).

3.2 Neighbourhood of a Strategy

We now define the relation of being a prefix strategy. In particular, strategy g ∈ S is a prefix of strategy f ∈ S if and only if ∀(x,y)∈X_b: g(x, y) = f(x, y) ∨ g(x, y) = y. Intuitively, at every moment of the auction g either makes the same bid as f or passes. Notice that f is also a prefix of itself. We can finally define the neighbourhood of a strategy f as N_f = {g ∈ S : g is a prefix of f}. In fact, the following lemma justifies that playing a strategy provides side information about its prefixes.

Lemma 1. Assuming that the player used strategy f in the dollar auction, we know the result of the auction had the player used a strategy g that is a prefix of f.

Proof. After the auction using strategy f in round t we know the sequence of states (x_1, y_1), . . . , (x_n, y_n) in which the player was asked for a bid. If ∀i: g(x_i, y_i) = f(x_i, y_i), then the reward r_g(t) for using g in the auction would have been exactly the actual reward for using f, i.e., r_g(t) = r_f(t). Otherwise, the reward for using g would have been r_g(t) = −x_j, where j is the lowest index such that g(x_j, y_j) = y_j, as the player would pass earlier than when using strategy f.
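A minimal sketch (our own illustration, with strategies represented as Python dictionaries over the states in X_b) of the prefix test and of the reward reconstruction used in the proof of Lemma 1. The helper names are assumptions, and for simplicity the reconstruction returns the raw profit at the divergence point rather than the normalised reward.

```python
def is_prefix(g, f, states):
    """g is a prefix of f iff at every state it either copies f's bid or passes (bids y)."""
    return all(g[(x, y)] == f[(x, y)] or g[(x, y)] == y for (x, y) in states)

def reconstruct_reward(g, f, visited_states, reward_f):
    """Side information from Lemma 1: the outcome g (a prefix of f) would have
    obtained in an auction where f visited `visited_states` and earned `reward_f`."""
    for (x, y) in visited_states:
        if g[(x, y)] != f[(x, y)]:     # first state where the prefix g passes instead of bidding
            return -x                  # g would have lost, paying its last bid x
    return reward_f                    # g bids exactly like f along the play, so same outcome
```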

3.3 The ELP Algorithm

We now describe how the ELP algorithm [25] can be adapted as a meta-bidding method in the dollar auction. The adaptation is based on using the previously defined set of prefixes as the neighbourhood of a strategy, in order to determine the potential profit of other strategies.

Algorithm 1 presents the pseudocode of ELP. The algorithm uses strategies from the provided set S_0 ⊆ S to play a series of T dollar auctions, selecting a single strategy for each auction. Typically for multi-armed bandit algorithms, the ELP algorithm chooses between exploration (checking new strategies) and exploitation (using the so far most promising strategies to generate profit). This trade-off is expressed by the parameters γ(t) and q_f(t) of the algorithm. We will discuss the exact values of these parameters, which provide low regret bounds, later in the paper.

Algorithm 1: The ELP algorithm
  Input: S_0 ⊆ S, β, (γ(t))_{t=1}^{T}, (∪_{f∈S_0}{q_f(t)})_{t=1}^{T}
  ∀f∈S_0: w_f(1) ← 1/|S_0|
  for t = 1, . . . , T do
    for g ∈ S_0 do
      P_g(t) ← (1 − γ(t)) · w_g(t) / Σ_{h∈S_0} w_h(t) + γ(t)·q_g(t)
    f_t ∼ P(t)
    Play f_t, getting reward r_{f_t}(t)
    for g ∈ N_{f_t} do
      Compute r̂_g(t)
    for g ∈ S_0 do
      if f_t ∈ N_g then
        r̃_g(t) ← r̂_g(t) / Σ_{h∈N_g(t)} P_h(t)
      else
        r̃_g(t) ← 0
    for g ∈ S_0 do
      w_g(t + 1) ← w_g(t) · exp(β·r̃_g(t))
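A compact Python rendering of Algorithm 1 follows. It is a sketch under our own design choices: the rewards are assumed to be already normalised to [0, 1], the exploration distribution q_g(t) is taken as uniform instead of the LP-optimised one from Theorem 2, the neighbourhood graph is assumed constant over rounds, and `play_and_estimate` is a hypothetical callback (not part of the paper) that plays an auction and returns the side-information rewards.

```python
import math
import random

def elp(strategies, neighbours, play_and_estimate, T, beta, gamma):
    """Sketch of Algorithm 1 (ELP) with a uniform exploration distribution.

    strategies: the set S_0 (a list of hashable strategy identifiers).
    neighbours: dict g -> set N_g of strategies whose play reveals g's reward.
    play_and_estimate(f): plays one auction with f and returns a dict
        {g: estimated reward r_hat_g in [0, 1]} for every strategy g whose
        reward the auction reveals (cf. Lemma 1).
    """
    w = {g: 1.0 / len(strategies) for g in strategies}
    q = {g: 1.0 / len(strategies) for g in strategies}      # uniform instead of the LP solution
    for _ in range(T):
        total = sum(w.values())
        P = {g: (1 - gamma) * w[g] / total + gamma * q[g] for g in strategies}
        f = random.choices(strategies, weights=[P[g] for g in strategies])[0]
        r_hat = play_and_estimate(f)                         # rewards revealed by playing f
        for g in strategies:
            if f in neighbours[g]:                           # playing f gives information about g
                r_tilde = r_hat.get(g, 0.0) / sum(P[h] for h in neighbours[g])
            else:
                r_tilde = 0.0
            w[g] *= math.exp(beta * r_tilde)                 # exponential weights update
    return w
```

In the dollar auction, `neighbours` would be built from the prefix relation of Section 3.2 and the callback would use the reconstruction from Lemma 1.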

In round t the algorithm selects a strategy f_t ∈ S_0 using a probability distribution P(t) that incorporates the knowledge gained in previous rounds. Knowing the reward r_{f_t}(t) received for using the selected strategy, the algorithm computes the reward r̂_g(t) that would have been received for playing each strategy g from the neighbourhood of f_t in round t. The method of computing this reward is given in the proof of Lemma 1. Basically, for an auction where the player was making decisions in states (x_1, y_1), . . . , (x_n, y_n), we have r̂_g(t) = r_f(t) if ∀i: g(x_i, y_i) = f(x_i, y_i), and r̂_g(t) = −x_j (where j is the lowest index such that g(x_j, y_j) = y_j) otherwise.

This reward is used to construct the probability distribution in the next round. The higher the reward, the more probable it is that the strategy will be used in the subsequent rounds.

3.4 Regret Analysis

We now give an upper bound for the regret of the ELP algorithm:

Theorem 2. Assume that the ELP algorithm is run using β ∈ (0, 1/(2|S_0|)), with the parameters (q_f(t))_{f∈S_0} set such that ∀f: q_f(t) ≥ 0, Σ_f q_f(t) = 1, and the value of

  min_{f∈S_0} Σ_{g∈N_f(t)} q_g(t)

is maximal (such values can be computed with linear programming). Moreover, assume that the γ(t) parameters are set to

  γ(t) = β / min_{f∈S_0} Σ_{g∈N_f(t)} q_g(t).

In addition, let ε = min_{t,g} γ(t)·q_g(t) > 0. Then the upper bound on the regret of ELP is

  U_A(T) ≤ 9βT·α(G)·log(6|S_0|/(α(G)ε)) + log(|S_0|)/β,

where α(G) is the independence number of the underlying neighbourhood graph G of the strategies in S_0.

Proof. Following the proof of Theorem 3 in [25]², we can show that

  U_A(T) ≤ βT·α(G) + 2β·Σ_{t=1}^{T} P_{f_t}(t) / Σ_{s∈N_{f_t}(t)} P_s(t) + log(|S_0|)/β.

From Lemma 13 of [2], we can further bound the second term of the RHS as follows:

  β·Σ_{t=1}^{T} P_{f_t}(t) / Σ_{s∈N_{f_t}(t)} P_s(t) ≤ 2α(G)·log(1 + (|S_0|²/(α(G)ε) + |S_0| + 1)/α(G)) + 2α(G).

By using elementary algebra, and exploiting the facts that 1 ≤ α(G) ≤ |S_0| and 0 < ε < 1, the RHS can be further bounded by 2α(G)·log(6|S_0|/(α(G)ε)). This implies that

  U_A(T) ≤ 9βT·α(G)·log(6|S_0|/(α(G)ε)) + log(|S_0|)/β,

which concludes the proof.

Choosing the value of β such that both components of the bound are equal, we get the following:

Corollary 3. For β = √( log(|S_0|) / (9T·α(G)·log(6|S_0|/(α(G)ε))) ), the bound defined in Theorem 2 equals

  U_A(T) ≤ 6·√( α(G)·log(6|S_0|/(α(G)ε))·log(|S_0|)·T ).

Note that here we have U_A(T) = Õ(√(α(G)T)). However, as S_0 can be arbitrary, in the worst case scenario (when playing any particular strategy from S_0 provides no side information at all), we have U_A(T) = Õ(√(|S_0|T)), which can be quite large if the set of possible strategies S_0 is large.

²Note that the original proof of this theorem only considers oblivious opponents. However, by using the argument introduced in [32], we can apply the proof to adaptive adversaries as well.
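Theorem 2 asks for (q_f(t)) maximising min_{f∈S_0} Σ_{g∈N_f(t)} q_g(t) over the probability simplex, which is a small linear program. The sketch below is our own formulation of that LP (the paper only states that such values can be computed with linear programming); the example neighbourhood structure is purely illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def exploration_distribution(strategies, neighbours):
    """Find q on the simplex maximising min_f sum_{g in N_f} q_g.
    Returns (q as a dict, the attained minimum coverage)."""
    n = len(strategies)
    index = {g: k for k, g in enumerate(strategies)}
    # Variables: q_1, ..., q_n, z.  Maximise z  <=>  minimise -z.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    A_ub = np.zeros((n, n + 1))
    for row, f in enumerate(strategies):           # z - sum_{g in N_f} q_g <= 0
        for g in neighbours[f]:
            A_ub[row, index[g]] = -1.0
        A_ub[row, -1] = 1.0
    A_eq = np.ones((1, n + 1))
    A_eq[0, -1] = 0.0                              # sum_g q_g = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * (n + 1))
    return {g: res.x[index[g]] for g in strategies}, res.x[-1]

# Illustrative cyclic coverage structure; the optimal value is 2/3 (e.g., uniform q).
neigh = {'a': {'a', 'b'}, 'b': {'b', 'c'}, 'c': {'c', 'a'}}
print(exploration_distribution(['a', 'b', 'c'], neigh))
```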

A possible way to improve the bound is to make the underlying neighbourhood graph denser. However, it is not trivial to provide such improvements. For example, even assuming a neighbourhood graph with a minimum degree of constant m (i.e., m independent from |S_0|) would not help much. In fact, we state the following:

Theorem 4. There exists S_0 ⊆ S with minimum neighbourhood degree m and a (randomised) opponent strategy, such that if T > 374|S_0|³, we have U_A(T) ≥ 0.06·√(|S_0|T/(m+1)) for any meta-strategy A.

Proof. From Theorem 4 of [25], we know that under the conditions given in the theorem, the following holds:

  U_A(T) ≥ 0.06·√(α(G)T).

Now, we just need to show that there exists S_0 with minimum neighbourhood degree m but α(G) ≥ |S_0|/(m+1). To do so, consider a set of groups of m + 1 strategies such that within each group there is one chosen strategy and the rest are prefixes of the chosen one. It is clear that the underlying neighbourhood graph G has at least |S_0|/(m+1) cliques, which implies that α(G) ≥ |S_0|/(m+1).

That is, even in this case, we still have U_A(T) = Ω(√(|S_0|T)). Nevertheless, in the following sections, we investigate some special cases of the problem, in which this regret bound can be significantly improved.

4. BIDDING WITH RANDOM NEIGHBOURHOOD GRAPH

We first start with the scenario where some additional information is randomly added to the model. In particular, we assume that apart from the side information observed for the prefixes of the chosen strategy, we have access to some additional information that can reveal the potential outcome of strategies other than the prefixes of the chosen one.

4.1 Random Neighbourhood

By random additional information, we mean that there is an oracle in the system, to whom we can send requests to infer the outcome of other (not-played) strategies, after observing the outcome of our chosen one. The assumption of having such an oracle is quite common in the online learning literature [42, 25]. Note that we do not consider query costs in our model. In addition, we also assume that the result of a query is probabilistic, that is, it is revealed with some probability v > 0. As such, we can represent the neighbourhood graph of the dollar auction as a random graph G(S_0, v).

A random neighbourhood can model some additional expert knowledge acquired by the player. For example, during negotiations (modelled by the dollar auction) some strategies can be considered more aggressive than others (and likely to give similar results), despite not being prefixes of each other (in the strict sense described in the previous sections). Other possible interpretations of the additional information include the results of intelligence gathering in the arms race scenario (identifying the maximal number of troops or weapon units that our opponent can use during the conflict) or industrial espionage in the R&D competition scenario (abandoning research on technologies already tested by the competition).
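The random-neighbourhood model can be simulated directly: prefix-based side information is always available, and every other potential observation is revealed independently with probability v. A short sketch of one possible construction (our own; it reuses the hypothetical `is_prefix` helper from the earlier sketch):

```python
import random

def random_neighbourhood_graph(strategies, states, v, is_prefix, seed=None):
    """Build a side-information graph in the spirit of Section 4: prefix edges
    are always present, other observations are revealed with probability v."""
    rng = random.Random(seed)
    neighbours = {f: set() for f in strategies}
    for f in strategies:
        for g in strategies:
            if is_prefix(g, f, states):          # deterministic side information
                neighbours[f].add(g)
            elif rng.random() < v:               # oracle answer revealed with probability v
                neighbours[f].add(g)
    return neighbours
```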

4.2 Regret Analysis

Given the random neighbourhood model, we now turn to the regret analysis of the algorithm. In particular, we add the following conditions to the model:

  |S_0| ≥ 8000    (1)

  1 ≥ v ≥ |S_0|^{−1/10}    (2)

The first condition is reasonable as we typically consider large sets of strategies. The second condition guarantees that the neighbourhood graph is quite dense in general. We state the following:

Theorem 5. Suppose that conditions (1)-(2) hold. Under the same conditions as Theorem 2, with probability at least 1 − o(|S_0|), we have the following regret bound for ELP³:

  U_A(T) ≤ 3|S_0|^{1/4}·√( 2·log(36|S_0|/ε²)·log(|S_0|)·T ).

Proof. From [16] we have that with probability at least 1 − o(|S_0|) the following holds:

  α(G) ≤ (2/v)·( log(|S_0|v) − log log(|S_0|v) − log 2 + 2 ).

Since |S_0| > 8000 and v ≥ |S_0|^{−1/10}, this can be further bounded as:

  α(G) ≤ (2/v)·( log(|S_0|v) + 2 ) ≤ 4·log(|S_0|)/v.

Note that we use v ≤ 1 here. Now, since v ≥ |S_0|^{−1/10}, we have that

  α(G) ≤ 4·log(|S_0|)/|S_0|^{−1/10} ≤ √|S_0|.

The last inequality holds if |S_0| ≥ 8000. Applying this to Theorem 2 concludes the proof.

Note that here we have U_A(T) = Õ(|S_0|^{1/4}·√T), which is a |S_0|^{1/4} improvement compared to the regret bound of the general case. In the next section, we will show how this bound can be further improved by taking a different approach of modifying the neighbourhood model.

5. BIDDING WITH THRESHOLD

Besides adding further side information to the model, another way to simplify the underlying structure of the neighbourhood graph, and thus to improve the induced regret bound, is to restrict the type of behaviour each player can follow. In this spirit, in this section we investigate a class of meta-strategies that provides this simplification.

To do so, we first discuss the optimal non-spiteful strategy, proposed by O'Neill, that provides the basis of our class of behaviours. We then describe how to create a class of strategies which we call optimal strategies with threshold. Finally, we show that by restricting ourselves to this class of strategies, we can further improve the previous regret bound by another factor of |S_0|^{1/4}.

³Note that the o(·) notation here means that o(|S_0|) → 0 as |S_0| tends to infinity.

5.1 The Optimal Non-Spiteful Strategy

We first start with the discussion of the pure strategy for the dollar auction against a non-spiteful opponent that was proposed by O'Neill [31]. In particular, O'Neill proved that this strategy is optimal in terms of maximising the expected utility of a non-spiteful player, when played against a non-spiteful opponent.

Let the initial bid of the strategy be x_0 = (b − 1) mod (s − 1) + 1, where b is the budget of both players, s is the stake and 1 is the minimal bid increment. In addition, let x_i = x_0 + i(s − 1) for i > 0 and let x_{−1} = 0. Given this, the optimal strategy of O'Neill, f̂, for y ∈ {x_{i−1}, . . . , x_i − 1} is:

  f̂(x, y) = x_i  if x = x_{i−1},
  f̂(x, y) = y   otherwise.

5.2 Optimal Strategies with Threshold

Based on the optimal strategy of O'Neill, we now propose a special class of strategies, the use of which allows us to achieve a better regret bound for ELP. In particular, let f̂ ∈ S be the optimal strategy of O'Neill described in the previous section. We call a strategy f_θ ∈ S a strategy with threshold θ if and only if ∀y<θ: f_θ(x, y) = f̂(x, y) and ∀y≥θ: f_θ(x, y) = y.

Note that a strategy with threshold follows the optimal strategy of O'Neill up to the moment when the opponent's bid exceeds the threshold, and then passes. The underlying intuition of this class of strategies is that in many cases the bidder is rational (i.e., bidding optimally), but is also risk-aware. To model the latter, we assume that the bidder is only willing to bid up to a certain threshold value, although her bidding budget would allow her to go beyond that threshold. As such, we argue that this class of strategies with threshold represents realistic behaviour in many real-world situations.
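A sketch of the strategy f̂ and of its threshold variants as Python functions (our own encoding; all quantities are in units of the minimal bid increment). With the introduction's numbers, s = 20 and b = 50 units of $0.05, the opening bid is x_0 = (b − 1) mod (s − 1) + 1 = 12 units, i.e., $0.60.

```python
def oneill_strategy(b, s):
    """O'Neill's optimal non-spiteful strategy f_hat for budget b and stake s."""
    x0 = (b - 1) % (s - 1) + 1
    milestones = [0, x0]                  # x_{-1} = 0, x_0, x_1 = x_0 + (s - 1), ..., ending at b
    while milestones[-1] < b:
        milestones.append(milestones[-1] + (s - 1))

    def f_hat(x, y):
        # Find the interval {x_{i-1}, ..., x_i - 1} containing the opponent's bid y.
        for prev, nxt in zip(milestones, milestones[1:]):
            if prev <= y < nxt:
                return nxt if x == prev else y   # jump to the next milestone or pass
        return y                                  # outside all intervals: pass
    return f_hat

def with_threshold(f_hat, theta):
    """Strategy with threshold theta: follow f_hat while the opponent's bid is
    below theta, pass otherwise (Section 5.2)."""
    return lambda x, y: f_hat(x, y) if y < theta else y

f = oneill_strategy(b=50, s=20)
print(f(0, 0))   # 12, i.e. an opening bid of $0.60 when the increment is $0.05
```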
5.3 Regret Analysis

We now turn to the regret analysis of ELP for the case of bidding with threshold. In particular, we assume that we only have access to the above defined class of non-spiteful strategies with threshold.

Theorem 6. Assume that the set S_0 contains all strategies with threshold θ ∈ {0, . . . , b} and no other strategies. Under the abovementioned conditions and the settings of Theorem 2, the regret bound of the ELP algorithm is at most:

  U_A(T) ≤ 9βT·log(6|S_0|/ε) + log(|S_0|)/β.

Proof. From Theorem 2 we have:

  U_A(T) ≤ 9βT·α(G)·log(6|S_0|/(α(G)ε)) + log(|S_0|)/β,

where α(G) is the independence number of the information graph G.

Given two strategies with thresholds f_θ and f_Θ, where θ < Θ, there is always an edge f_Θ → f_θ in the information graph for all rounds. If the auction was lost while using f_Θ, then the auction would have also been lost using f_θ and the result would be equal to −x, where x is the last bid of f_θ. If the auction was won while using f_Θ, we know the last bid y of the opponent's strategy. Since f_θ and f_Θ follow the same sequence of bids, then for θ ≤ y the auction would have been lost with the result of −x (x being the last bid made by f_θ), and for θ > y the auction would have been won with the result of s − x (x being the lowest bid made by f_θ higher than y). Therefore, for the setting described in the theorem, G is a clique and α(G) = 1, which concludes the proof.

Again, choosing the value of β such that both components of the bound are equal, we get the following:

Corollary 7. For β = √( log(|S_0|) / (9T·log(6|S_0|/ε)) ), the bound defined in Theorem 6 equals:

  U_A(T) ≤ 6·√( log(6|S_0|/ε)·log(|S_0|)·T ).

Note that U_A(T) = Õ(√T), which is an improvement by a factor of |S_0|^{1/2} compared to the regret bound of the general case.

6. AUCTION AGAINST MALICIOUS OPPONENT

So far we have shown that by applying ELP, we can still achieve low-regret performance, despite the fact that there is no prior knowledge about the opponent's motives, nor about her rationality. We now introduce the concept of spitefulness and consider an auction against a malicious opponent. As argued by Waniek et al. [39], there are various real-world applications in which the dollar auctions are played between spiteful players, whose objective function involves some linear combination of maximising one's own utility and minimising the opponent's utility. For instance, one may argue that certain moves of the USA or the Soviet Union during the Cold War were designed to harm the opponent even if it meant incurring some extra cost.

6.1 The Concept of Spitefulness

We interpret spitefulness as the desire of a player to hurt her opponent. We follow the definition of spitefulness introduced by Brandt et al. [9].

Any spiteful player i is characterised by a spitefulness coefficient σ_i ∈ [0, 1]. The higher the coefficient, the more player i is interested in minimizing the profit of her opponent j. The utility function of a spiteful player is then:

  u_i = (1 − σ_i)·p_i − σ_i·p_j,

where p_i and p_j are the profits of the corresponding players. A player with spitefulness coefficient σ = 0 is called a non-spiteful player, and she is only interested in maximizing her own profit. On the other hand, a player with spitefulness coefficient σ = 1 is called a malicious player, and she is only interested in minimizing the profit of her opponent.

6.2 Mixed Nash Equilibrium

We now consider the case when a non-spiteful bidder plays the repeated dollar auctions against a malicious opponent, not knowing that the opponent is malicious. In addition, our opponent might not have full information about our non-spitefulness either and might not follow rational behaviour. Given this, we assume that her strategy is taken from a set S_1 ⊆ S.

If we knew the set S_1 in advance (and the opponent also knew our set of strategies S_0), a mixed Nash equilibrium could be easily calculated using standard arguments.

However, as this knowledge is typically not available in advance, it is impossible to calculate the Nash equilibrium. Nevertheless, we show that if both players apply the ELP algorithm to play their strategies, they converge to a Nash equilibrium point in performance.

Theorem 8. Let f_i(t) ∈ S_i, i ∈ {0, 1}, denote the bidding strategy of player i at round t, and denote by f_j(t) the strategy chosen by i's opponent. Let the mixed strategies

  f̃_i = (1/T)·Σ_{t=1}^{T} f_i(t),    f̃_j = (1/T)·Σ_{t=1}^{T} f_j(t)

denote the averages of these chosen strategies. Given this, for each T, there exists a mixed Nash equilibrium profile (f*_i, f*_j) such that

  r_i(f*_i, f*_j) − r_i(f̃_i, f̃_j) ≤ Õ(√(|S_0|/T)).

Proof. For the sake of simplicity we now map the profit of player i to a reward r_i ∈ [−1, 1] (instead of r_i ∈ [0, 1] as before). Reward r_i = 1 of the non-spiteful player i corresponds to a profit equal to the budget, p_i = b, and reward r_i = −1 of the non-spiteful player corresponds to a loss equal to the budget, p_i = −b. The reward of the malicious player is expressed as the utility described in the previous section, i.e., reward r_i = 1 of the malicious player i corresponds to a loss of the non-spiteful player j equal to the budget, p_j = −b, and reward r_i = −1 of the malicious player i corresponds to a profit of the non-spiteful player j equal to the budget, p_j = b.

Given that assumption, note that when a non-spiteful player bids against a malicious one, our problem can be regarded as a repeated zero-sum game. In fact, within this game, each player i (with i ∈ {0, 1}) chooses from a set of bidding strategies S_i, the received reward of player i is r_i, and the received reward of her opponent (due to the malicious behaviour) is r_j = −r_i.

It is well known that in a 2-player zero-sum game, the minimax value exists and coincides with the (mixed) Nash equilibrium. Given this, we only need to show that the average behaviour of the players, when both apply ELP, converges in performance to the minimax solution. To do so, let u* denote the minimax value, which can be defined as follows:

  u* = max_{f_i} min_{f_j} r_i(f_i, f_j) = min_{f_j} max_{f_i} r_i(f_i, f_j),

where f_i and f_j are mixed strategies of players i and j, respectively. For now we consider the case that player i (non-spiteful) uses ELP, and player j can play any meta-strategy {f_j(t)}_{t=1}^{T}. In addition, let BR(f_i(t)) denote the best response mixed strategy against player i's bidding strategy f_i(t). By denoting

  ∆_T = 6·√( α(G)·log(6|S_0|/(α(G)ε))·log(|S_0|) / T ),
from Theorem 2 we have that

  (1/T)·Σ_{t=1}^{T} r_i(t) ≥ max_{f∈S_i} (1/T)·Σ_t r_i(f, f_j(t)) − ∆_T = max_f (1/T)·Σ_t r_i(f, f_j(t)) − ∆_T,

where f is a mixed strategy over S_i. The first inequality is obtained from Theorem 2, while the second step follows from the fact that a mixed strategy belongs to the convex hull of the pure strategies, and thus cannot exceed the best pure strategy. This can be further bounded as follows:

  max_f (1/T)·Σ_t r_i(f, f_j(t)) − ∆_T ≥ max_f min_g (1/T)·Σ_t r_i(f, g) − ∆_T,

where g is a mixed strategy over S_j. This can be further bounded as:

  max_f min_g (1/T)·Σ_t r_i(f, g) ≥ max_f min_g r_i(f, g) − ∆_T = u* − ∆_T.

That is, we have

  (1/T)·Σ_{t=1}^{T} r_i(t) ≥ u* − ∆_T.

Similarly, we can prove (from the malicious player's perspective) that

  (1/T)·Σ_{t=1}^{T} r_i(t) ≤ u* + ∆_T,

which concludes the proof.

Note that this convergence in performance is a weaker property than convergence in strategy (i.e., the average behaviour of the players converging to the distribution of the mixed strategies in the equilibrium point), as the latter automatically implies the former, while the converse is not true in general.
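Since the non-spiteful versus malicious case is a two-player zero-sum game, the minimax value u* that Theorem 8 refers to can be computed with the standard linear program over the row player's mixed strategies. The sketch below is our own illustration using scipy; the payoff matrix is a stand-in for r_i(f, g) over small strategy sets S_0 and S_1.

```python
import numpy as np
from scipy.optimize import linprog

def minimax_value(payoff):
    """Value u* of a zero-sum game given the row player's payoff matrix,
    together with the row player's optimal mixed strategy."""
    m, n = payoff.shape
    # Variables: x_1, ..., x_m (mixed strategy), v (game value).  Maximise v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])   # v - sum_i x_i * A[i, j] <= 0 for every column j
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                                # x is a probability vector
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[-1], res.x[:-1]

# Matching pennies as a stand-in payoff matrix: the value is 0, reached by (1/2, 1/2).
value, strategy = minimax_value(np.array([[1.0, -1.0], [-1.0, 1.0]]))
print(round(value, 3), strategy)
```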

7. RELATED WORK

In this section we describe the bodies of literature concerning the important aspects of our setting: all-pay auctions, repeated auctions and multi-armed bandit models.

7.1 All-Pay Auctions and the Dollar Auction

All-pay auctions have been studied as a model of multiple settings where the expenditure of resources does not always guarantee profit [24]. Some of them include political campaigns, R&D competitions of oligopolistic companies [38], realizing crowdsourcing enterprises [14] and international situation analysis [34]. Aside from theoretical studies, there is also experimental research [12]. The dollar auction is a widely studied type of an all-pay auction [23, 22]. It is often used in classroom experiments serving the study of conflict escalation among people [29, 19].

Recently, Waniek et al. [39] offered a preliminary study of a dollar auction in which a spiteful player plays against a rational opponent. The authors considered the setting in which a non-spiteful bidder unwittingly bids against a spiteful one. In various scenarios in this setting the conflict escalates. In particular, the spiteful bidder is able to force the non-spiteful opponent to spend most of the budget. Still, it is the spiteful bidder who often wins the prize. Furthermore, a malicious player with a smaller budget is likely to plunge the opponent more than a malicious player with a bigger budget. Thus, a malicious player should not only hide his real preferences but also the real size of his budget.

7.2 Repeated Auctions

Modelling various processes as repeated auctions is a well established idea [41]. Among other applications, it was used for allocating tasks to robots [30] and assigning channels in a radio network [17]. Many authors concentrate on the problem of possible collusion in repeated auctions [3, 35, 26], focusing their attention on the effect of cooperation between players on the income of an auctioneer [7]. Most of the published results consider second-price auctions [20, 7]; however, the all-pay auction format has also been studied [28].

7.3 Multi-Armed Bandits

The first multi-armed bandit models were designed to tackle problems with stochastic rewards [33, 1, 4]. The first adversarial setting was proposed by Auer et al. [5, 6]. However, in the early years, adversarial bandit models were believed to be suitable only against non-adaptive opponents. The argument that justified the usage of multi-armed bandits against adaptive opponents was first proposed by Poland [32], whose techniques were adopted by many more recent works. Mannor and Shamir [25] extended the adversarial bandit model to the case when additional side information is provided to improve the regret bounds of the model. Their work was later improved by [2] and [21]. In particular, the former improved the regret bound, while the latter focuses on the cases when the neighbourhood graph cannot be fully revealed in advance (i.e., before the chosen arm is played).

8. CONCLUSIONS

In this paper, we investigated the problem of repeated dollar auctions in which players do not have prior knowledge about the opponent's spitefulness, nor her rationality level. We proposed an adversarial multi-armed bandit model to efficiently tackle this problem, in which the goal is to maximise the players' total utility over time. We showed that by using an adversarial bandit based meta-strategy, ELP, we can indeed provably achieve good performance. We then further improved this performance by considering additional settings of the model: (i) dollar auctions with random neighbourhood graphs; and (ii) playing optimal strategies with thresholds. We also proved that in the special case of a non-spiteful player versus a malicious player, if both use ELP to make their decisions, the game converges to a mixed Nash equilibrium in performance.

As potential future work, we aim to extend our work to repeated auctions where the auctions share the same budget. That is, apart from making a decision about which strategy to use in each auction, the players also have to choose the budget allocated to a particular auction, without exceeding an a priori given global limit. As our current techniques significantly rely on the fact that the current budgets are given, it seems to be highly non-trivial to extend our model to such settings. Furthermore, until now, the literature on the dollar auction has focused on the case of two players. Hence, it would be very interesting to extend all the results to multiple players.

Acknowledgments

Tran-Thanh gratefully acknowledges funding from the UK Research Council for project "ORCHID", grant EP/I011587/1. Tomasz Michalak was supported by the European Research Council under Advanced Grant 291528 ("RACE").

APPENDIX

Table 1: Notation

Symbol — Meaning
i, j — Players participating in the auction
s — Stake of the auction
δ — Minimal bid increment
b — Budget of both players
X_b — States where a player can make a bid
S — Set of all valid strategies in the auction
S_0 — Set of available strategies
p_i — Profit of player i
T — Number of rounds (separate auctions)
A — Algorithm selecting strategies
R_A(T) — Total reward of algorithm A after T rounds
U_A(T) — Regret of algorithm A after T rounds
r_f(t) — Reward for using strategy f in round t
G — Neighbourhood graph
α(G) — Independence number of graph G
N_f — Neighbourhood of strategy f
f_t — Strategy selected for use in round t
β, γ(t), q_f(t) — Parameters of the ELP algorithm
w_f(t) — Weight of choosing strategy f in round t
P_g(t) — Probability of selecting strategy g in round t
r̂_f(t) — Estimated reward for using strategy f in round t
v — Probability of revealing additional information
G(S_0, v) — Random neighbourhood graph of strategies S_0
θ — Threshold of a strategy
σ_i — Spitefulness coefficient of player i

REFERENCES

[1] R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27:1054–1078, 1995.
[2] N. Alon, N. Cesa-Bianchi, C. Gentile, and Y. Mansour. From bandits to experts: A tale of domination and independence. In Advances in Neural Information Processing Systems, pages 1610–1618, 2013.
[3] M. Aoyagi. Bid rotation and collusion in repeated auctions. Journal of Economic Theory, 112(1):79–105, 2003.
[4] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47:235–256, 2002.
[5] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Foundations of Computer Science, 1995. Proceedings., 36th Annual Symposium on, pages 322–331. IEEE, 1995.
[6] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.

[7] S. Bikhchandani. Reputation in repeated second-price auctions. Journal of Economic Theory, 46(1):97–119, 1988.
[8] K. E. Boulding. Conflict and defense: A general theory. 1962.
[9] F. Brandt, T. Sandholm, and Y. Shoham. Spiteful bidding in sealed-bid auctions. In Computing and Markets, 2005.
[10] J. Campos, C. Martinho, and A. Paiva. Conflict inside out: A theoretical approach to conflict from an agent point of view. In AAMAS '13, pages 761–768. International Foundation for Autonomous Agents and Multiagent Systems, 2013.
[11] C. Castelfranchi. Conflict ontology. In Computational Conflicts, pages 21–40. Springer, 2000.
[12] E. Dechenaux, D. Kovenock, and R. M. Sheremeta. A survey of experimental research on contests, all-pay auctions and tournaments. WZB Discussion Paper SP II 2012-109, Berlin, 2012.
[13] G. Demange. Rational escalation. Annales d'Économie et de Statistique, (25/26):227–249, 1992.
[14] D. DiPalantino and M. Vojnovic. Crowdsourcing and all-pay auctions. In EC '09, pages 119–128. ACM, 2009.
[15] H. Fang. Lottery versus all-pay auction models of lobbying. Public Choice, 112(3-4):351–371, 2002.
[16] A. M. Frieze. On the independence number of random graphs. Discrete Mathematics, 81(2):171–175, 1990.
[17] Z. Han, R. Zheng, and H. V. Poor. Repeated auctions with Bayesian nonparametric learning for spectrum access in cognitive radio networks. Wireless Communications, IEEE Transactions on, 10(3):890–900, 2011.
[18] A. J. Jones. Game theory: Mathematical models of conflict. Elsevier, 2000.
[19] J. H. Kagel and D. Levin. Auctions: a survey of experimental research, 1995–2008. 2008.
[20] J. L. Knetsch, F.-F. Tang, and R. H. Thaler. The endowment effect and repeated market trials: Is the Vickrey auction demand revealing? Experimental Economics, 4(3):257–269, 2001.
[21] T. Kocák, G. Neu, M. Valko, and R. Munos. Efficient learning by implicit exploration in bandit problems with side observations. In Advances in Neural Information Processing Systems, pages 613–621, 2014.
[22] V. Krishna and J. Morgan. An analysis of the war of attrition and the all-pay auction. Journal of Economic Theory, 72(2):343–362, 1997.
[23] W. Leininger. Escalation and cooperation in conflict situations: the dollar auction revisited. Journal of Conflict Resolution, 33(2):231–254, 1989.
[24] Y. Lewenberg, O. Lev, Y. Bachrach, and J. S. Rosenschein. Agent failures in all-pay auctions. IJCAI/AAAI, 2013.
[25] S. Mannor and O. Shamir. From bandits to experts: On the value of side-observations. In Advances in Neural Information Processing Systems, pages 684–692, 2011.
[26] R. C. Marshall and L. M. Marx. Bidder collusion. Journal of Economic Theory, 133(1):374–402, 2007.
[27] H. J. Müller and R. Dieng. On conflicts in general and their use in AI in particular. In Computational Conflicts, pages 1–20. Springer, 2000.
[28] J. Münster. Repeated contests with asymmetric information. Journal of Public Economic Theory, 11(1):89–118, 2009.
[29] J. K. Murnighan. A very extreme case of the dollar auction. Journal of Management Education, 26(1):56–69, 2002.
[30] M. Nanjanath and M. Gini. Repeated auctions for robust task execution by a robot team. Robotics and Autonomous Systems, 58(7):900–909, 2010.
[31] B. O'Neill. International escalation and the dollar auction. Journal of Conflict Resolution, 30(1):33–50, 1986.
[32] J. Poland. FPL analysis for adaptive bandits. In 3rd Symposium on Stochastic Algorithms, Foundations and Applications (SAGA '05), pages 58–69, 2005.
[33] H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the AMS, 55:527–535, 1952.
[34] M. Shubik. The dollar auction game: A paradox in noncooperative behavior and escalation. Journal of Conflict Resolution, pages 109–111, 1971.
[35] A. Skrzypacz and H. Hopenhayn. Tacit collusion in repeated auctions. Journal of Economic Theory, 114(1):153–169, 2004.
[36] C. Tessier, L. Chaudron, and H. J. Müller. Agents' conflicts: New issues. In Conflicting Agents, volume 1 of Multiagent Systems, Artificial Societies, and Simulated Organizations, pages 1–30. Springer US, 2002.
[37] W. W. Vasconcelos, M. J. Kollingbaum, and T. J. Norman. Normative conflict resolution in multi-agent systems. AAMAS '09, 19(2):124–152, 2009.
[38] X. Vives. Technological competition, uncertainty, and oligopoly. Journal of Economic Theory, 48(2):386–415, 1989.
[39] M. Waniek, A. Niescieruk, T. Michalak, and T. Rahwan. Spiteful bidding in the dollar auction. In Proceedings of the 24th International Conference on Artificial Intelligence, pages 667–673. AAAI Press, 2015.
[40] D. Weyns and T. Holvoet. Regional synchronization for simultaneous actions in situated multi-agent systems. pages 497–510, 2003.
[41] E. Wolfstetter. Auctions: an introduction. Journal of Economic Surveys, 10(4):367–420, 1996.
[42] J. Y. Yu and S. Mannor. Piecewise-stationary bandit problems with side observations. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1177–1184. ACM, 2009.
