Clustering Confusion Matrices to Identify Avatar Aliases Olivier Cavadenti, Victor Codocedo, Jean-François Boulicaut, Mehdi Kaytoue

When cyberathletes conceal their game: Clustering confusion matrices to identify avatar aliases Olivier Cavadenti, Victor Codocedo, Jean-François Boulicaut, Mehdi Kaytoue To cite this version: Olivier Cavadenti, Victor Codocedo, Jean-François Boulicaut, Mehdi Kaytoue. When cyberathletes conceal their game: Clustering confusion matrices to identify avatar aliases. IEEE In- ternational Conference on Data Science and Advanced Analytics, Oct 2015, Paris, France. 10.1109/DSAA.2015.7344824. hal-01243504 HAL Id: hal-01243504 https://hal.archives-ouvertes.fr/hal-01243504 Submitted on 15 Dec 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. When Cyberathletes Conceal Their Game: Clustering Confusion Matrices to Identify Avatar Aliases ? Olivier Cavadenti, Victor Codocedo, Jean-François Boulicaut and Mehdi Kaytoue Universite´ de Lyon. CNRS, INSA-Lyon, LIRIS. UMR5205, F-69621, France. contact:fi[email protected] Abstract—Video game is a very lucrative industry, unleashed for several academic fields such as artificial intelligence [12], by the ubiquity of gaming devices, multi-player networks and computer human interactions [21] and cognitive sciences [16]. live broadcasting platforms. Games generate large amounts of behavioural data which are valuable to face the new chal- In this paper, we introduce a new problem: the avatar lenges of video game analytics such as detecting balance issues, aliases identification problem. In most of online competitive bugs and cheaters. In electronic sports (e-sports), cyberathletes games, players need an “avatar” (i.e., an online identity) to conceal their online training using different aliases or avatars log in the game network. Nothing forbids a player to have (virtual identities), which allow them not being recognized by several avatars and actually, it is a very common practice the opponents they may face in future competitions (with cash for professional players, or cyberathletes, of several video prices challenging already most of the traditional sports). It was game franchises (e.g. League of Legends, Starcraft 2, Dota 2, recently suggested that behavioural data generated by the games Counter Strike). Players generally have one official avatar for allows predicting the avatar associated to a game play with high accuracy. However, when a player uses several avatars, accuracy official tournaments, and several others to conceal their tactics drastically drops as prediction models cannot easily differentiate while playing without being recognized by other players they the player’s different avatar aliases. Since mappings between may meet online: global rankings and leagues are public just players and avatars do not exist, we introduce the avatar aliases as in chess and tennis, while game logs (called replays) are identification problem and propose an original approach for alias available and prone to analysis by means of visualization1, resolution based on supervised classification and Formal Concept machine learning and data mining techniques [9], [14] just as Analysis. We thoroughly evaluate our method with the video game in standard sport analytics. Accordingly, we are facing a set Starcraft 2 which has a very wide and active community with of players, generating behavioural data (game logs or traces), players from diverse cultures and nations. We show that under in a one-to-many relationship with avatars (the many-to-many some circumstances, the avatars of a given player can easily relationship will be discussed later). Our goal is not to discover be recognized as such. These results are valuable for e-sport structures (to help preparing tournaments), and game editors this mapping, since players privacy is protected. Instead, (detecting cheaters or usurpers). the avatar aliases identification problem aims at discovering groups of avatars belonging to the same player. Solving this problem is motivated by the growing need of e-sport structures I. INTRODUCTION to study the games and strategies of the opponents (tournament Currently, video games are a popular and lucrative industry preparation), and the security challenges of game editors generating more revenue than both Cinema and Music. This (detecting usurpers). recent and fast development has been catalysed by several Recently, it was suggested that behavioural data hide factors. First, an increasing part of the population has access to individual identifying patterns [16], [21], (tested on the real connected electronic devices (computers, smart-phones, etc.), time strategy game Starcraft 2 ( c Blizzard)). After choosing and to the Web including a myriad of single and/or multi- features related to keyboard usage, Yan et al. showed that player games. Video games are now a popular leisure activity. a classifier can be trained to predict with high accuracy Furthermore, the recent development of competitive gaming the avatars involved in a game play [21]. Nevertheless, they and live video game streaming platforms [8] makes electronic purposely considered datasets without players having several sports (e-sports) a reality [15]. Professional players on contract avatars (what we call avatar aliases). Indeed, in presence with teams and sponsors are taking part in competitions with of such avatar aliases, the prediction accuracy drastically cash prices challenging already most of the traditional sports. degrades, since prediction models fail at differentiating two Cyberathletes are widely followed in so-called off-line events avatars of the same player. We build on this work by proposing and daily supported on the Web through their own live broad- an approach called DEBROUILLE´ for answering the avatar casts [8] generating the fourth largest stream of data in 2015 for aliases identification problem: it relies on mining confusion the platform, Twitch.tv. People not only enjoy playing, but also matrices yielded by a supervised classifier using Formal Con- enjoy watching the others for a variety reasons [3], [8]. Games cept Analysis [5], and exploits the confusion a classifier has generate tremendous amounts of behavioural data, valuable in presence of avatar aliases when they belong to the same to face the many new industrial challenges such as detecting player. Experimental evaluation shows very good results under balance issues [2], bugs and cheaters [18], but also valuable certain restrictions imposed to the set of players involved ? This research has been partially funded by the French National Project in a dataset. The remainder of this paper is organized as FUI AAP 14 Tracaverre 2012-2016. 1For example, https://sites.google.com/site/sc2gears/ for Starcraft 2 follows. Section II introduces the problem settings, our claims called Base, assigns it to hotkey 1 (a), selects some units with and goals. We formally present an original methodology in the mouse (s), select units attached to hotkey 2 (s), etc. Section III for discovering avatar aliases and the evaluation strategy in Section IV. Our results are assessed through an Avatar Game trace Outcome RorO s,s,hotkey4a,s,hotkey3a,s,hotkey3s, ... Lose extensive evaluation in Section V. Related work is discussed TAiLS Base,hotkey1a,s,hotkey1s,s,hotkey1s, ... Win in Section VI before we conclude. Table I: Traces of a match for two players II. PROBLEM SETTINGS C. Predicting avatars from behavioural traces A. Starcraft 2 in competitive gaming Let us consider the following scenario. Given a trace, is it We study the real-time strategy game (RTS) Starcraft 2, possible to find its associated avatar? In Table I, given the trace a franchise released in 2010 which met success in compet- in the first row, is it possible to say that the avatar associated to itive gaming and e-sports [15]. A standard game involves it is RorO? We can approach this question from a classification two players and each chooses a faction (Zerg, Protoss or point of view. Given a training set of traces labelled with their Terran) with different weaknesses and strengths (following a associated avatars, we would like to classify new “instances” rock/paper/scissors principle). Players control buildings and (new traces) in the space of classes represented by the set of units through an aerial view of the battleground map, collect avatars. As such, we can apply any supervised classification resources to build an army and achieve victory by destroying technique in this direction. Recently, Yan et al. [21] suggested the opponent’s forces. A player needs an avatar to enter that behavioural data, as presented in the previous subsection, the dedicated multi-player system called Battle.net. In offer predictions with high accuracy, when there are no aliases the context of e-sport, our careful attention is given to the in the dataset. In presence of aliases however, the accuracy avatar aliases identification problem that affects the game2. drastically drops, as trained models are not able to differentiate Nothing forbids a user from having multiple avatars, which properly different avatars of a same individual. is actually a common practice among professional players

Clustering Confusion Matrices to Identify Avatar Aliases Olivier Cavadenti, Victor Codocedo, Jean-François Boulicaut, Mehdi Kaytoue

CMPSCI 585 Programming Assignment 1 Spam Filtering Using Naive Bayes

Using Machine Learning to Improve Dense and Sparse Matrix Multiplication Kernels

Arxiv:1902.03680V3 [Cs.LG] 17 Jun 2019 1

Performance Metric Elicitation from Pairwise Classifier Comparisons Arxiv:1806.01827V2 [Stat.ML] 18 Jan 2019

MASTER TP^ Ostrlbunon of THIS DOCUMENT IS UNL.M.TED Kaon Content of Three-Prong Decays of the Tan Lepton

Designing Alternative Representations of Confusion Matrices to Support Non-Expert Public Understanding of Algorithm Performance

A Generalized Hierarchical Approach for Data Labeling Zachary D. Blanks

PREDICTING ACADEMIC PERFORMANCE of POTENTIAL ELECTRICAL ENGINEERING MAJORS a Thesis by SANJHANA SUNDARARAJ Submitted to the Offi

Jets + Missing Energy Signatures at the Large Hadron Collider

Multiple Classifier Systems Incorporating Uncertainty

Classifier Uncertainty: Evidence, Potential Impact, and Probabilistic Treatment

Evaluating Prediction Strategies in an Enhanced Meta-Learning Framework