Question Difficulty Estimation in Community Question Answering Services


Jing Liu†, Quan Wang‡, Chin-Yew Lin♯, Hsiao-Wuen Hon♯
†Harbin Institute of Technology, Harbin 150001, P.R. China
‡Peking University, Beijing 100871, P.R. China
♯Microsoft Research Asia, Beijing 100080, P.R. China
[email protected] [email protected] {cyl,hon}@microsoft.com

∗ This work was done when Jing Liu and Quan Wang were visiting students at Microsoft Research Asia. Quan Wang is currently affiliated with the Institute of Information Engineering, Chinese Academy of Sciences.

Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 85–90, Seattle, Washington, USA, 18-21 October 2013. © 2013 Association for Computational Linguistics

Abstract

In this paper, we address the problem of estimating question difficulty in community question answering services. We propose a competition-based model for estimating question difficulty by leveraging pairwise comparisons between questions and users. Our experimental results show that our model significantly outperforms a PageRank-based approach. Most importantly, our analysis shows that the text of question descriptions reflects the question difficulty. This implies the possibility of predicting question difficulty from the text of question descriptions.

1 Introduction

In recent years, community question answering (CQA) services such as Stack Overflow¹ and Yahoo! Answers² have seen rapid growth. A great deal of research effort has been devoted to CQA, including: (1) question search (Xue et al., 2008; Duan et al., 2008; Suryanto et al., 2009; Zhou et al., 2011; Cao et al., 2010; Zhang et al., 2012; Ji et al., 2012); (2) answer quality estimation (Jeon et al., 2006; Agichtein et al., 2008; Bian et al., 2009; Liu et al., 2008); (3) user expertise estimation (Jurczyk and Agichtein, 2007; Zhang et al., 2007; Bouguessa et al., 2008; Pal and Konstan, 2010; Liu et al., 2011); and (4) question routing (Zhou et al., 2009; Li and King, 2010; Li et al., 2011).

¹ http://stackoverflow.com
² http://answers.yahoo.com

However, less attention has been paid to question difficulty estimation in CQA. Question difficulty estimation can benefit many applications: (1) Experts are usually under time constraints. We do not want to bore experts by routing every question (including both easy and hard ones) to them. Assigning questions to experts by matching question difficulty with expertise level, not just question topic, will make better use of the experts' time and expertise (Ackerman and McDonald, 1996). (2) Nam et al. (2009) found that winning the point awards offered by the reputation system is a driving factor in user participation in CQA. Question difficulty estimation would be helpful in designing a better incentive mechanism by assigning higher point awards to more difficult questions. (3) Question difficulty estimation can help analyze user behavior in CQA, since users may make strategic choices when encountering questions of different difficulty levels.

To the best of our knowledge, not much research has been conducted on the problem of estimating question difficulty in CQA. The most relevant work is a PageRank-based approach proposed by Yang et al. (2008) to estimate task difficulty in crowdsourcing contest services. Their key idea is to construct a graph of tasks, creating an edge from a task $t_1$ to a task $t_2$ when a user $u$ wins task $t_1$ but loses task $t_2$, which implies that task $t_2$ is likely to be more difficult than task $t_1$. The standard PageRank algorithm is then run on the task graph to estimate the PageRank score (i.e., difficulty score) of each task. This approach implicitly assumes that task difficulty is the only factor affecting the outcomes of competitions (i.e., the best answer). However, the outcomes of competitions depend on both the difficulty levels of tasks and the expertise levels of competitors (i.e., other answerers).

Inspired by Liu et al. (2011), we propose a competition-based approach which jointly models question difficulty and user expertise level. Our approach is based on two intuitive assumptions: (1) given a question answering thread, the difficulty score of the question is higher than the expertise score of the asker, but lower than that of the best answerer; (2) the expertise score of the best answerer is higher than that of the asker as well as all other answerers. Given the two assumptions, we can determine the question difficulty score and user expertise score through pairwise comparisons between (1) a question and an asker, (2) a question and a best answerer, (3) a best answerer and an asker, and (4) a best answerer and all other non-best answerers.

The main contributions of this paper are:
• We propose a competition-based approach to estimate question difficulty (Sec. 2). Our model significantly outperforms the PageRank-based approach (Yang et al., 2008) for estimating question difficulty on the data of Stack Overflow (Sec. 3.2).
• Additionally, we calibrate question difficulty scores across two CQA services to verify the effectiveness of our model (Sec. 3.3).
• Most importantly, we demonstrate that different words or tags in the question descriptions indicate question difficulty levels. This implies the possibility of predicting question difficulty purely from the text of question descriptions (Sec. 3.4).

2 Competition based Question Difficulty Estimation

CQA is a virtual community where people can ask questions and seek opinions from others. Formally, when an asker $u_a$ posts a question $q$, several answerers respond to it. One of the received answers is selected as the best answer, either by the asker $u_a$ or by community vote. The user who provides the best answer is called the best answerer $u_b$, and we denote the set of all non-best answerers as $S = \{u_{o_1}, \cdots, u_{o_M}\}$, where $M = |S|$. Assuming that question difficulty scores and user expertise scores are expressed on the same scale, we make the following two assumptions:
• The difficulty score of question $q$ is higher than the expertise score of asker $u_a$, but lower than that of the best answerer $u_b$. This is intuitive since the best answerer $u_b$ correctly answers question $q$, which asker $u_a$ could not answer.
• The expertise score of the best answerer $u_b$ is higher than that of asker $u_a$ and all answerers in $S$. This is straightforward since the best answerer $u_b$ solves question $q$ better than asker $u_a$ and all non-best answerers in $S$.

Let us view question $q$ as a pseudo user $u_q$. From a competitive viewpoint, each pairwise comparison can be viewed as a two-player competition with one winner and one loser. These include (1) one competition between pseudo user $u_q$ and asker $u_a$, (2) one competition between pseudo user $u_q$ and the best answerer $u_b$, (3) one competition between the best answerer $u_b$ and asker $u_a$, and (4) $|S|$ competitions between the best answerer $u_b$ and each of the non-best answerers in $S$. Pseudo user $u_q$ wins the first competition, and the best answerer $u_b$ wins all remaining $(|S| + 2)$ competitions.

Hence, the problem of estimating the question difficulty score (and the user expertise score) is cast as a problem of learning the relative skills of players from the win-loss results of the generated two-player competitions. Formally, let $Q$ denote the set of all questions in one category (or topic), and let $R_q$ denote the set of all two-player competitions generated from question $q \in Q$, i.e.,

$R_q = \{(u_a \prec u_q), (u_q \prec u_b), (u_a \prec u_b), (u_{o_1} \prec u_b), \cdots, (u_{o_{|S|}} \prec u_b)\}$,

where $j \prec i$ means that user $i$ beats user $j$ in the competition. Define

$R = \bigcup_{q \in Q} R_q$    (1)

as the set of all two-player competitions. Our problem is then to learn the relative skills of players from $R$. The learned skills of the pseudo question users are question difficulty scores, and the learned skills of all other users are their expertise scores.

TrueSkill. In this paper, we follow Liu et al. (2011) and apply TrueSkill to learn the relative skills of players from the set of generated competitions $R$ (Eq. 1). TrueSkill (Herbrich et al., 2007) is a Bayesian skill rating model developed for estimating the relative skill levels of players in games. In this paper, we present a two-player version of TrueSkill with no draws.

TrueSkill assumes that the practical performance of each player in a game follows a normal distribution $N(\mu, \sigma^2)$, where $\mu$ is the skill level of the player and $\sigma$ is the uncertainty of the estimated skill level. Basically, TrueSkill learns the skill levels of players by leveraging Bayes' theorem. Given the current estimated skill levels of two players (prior probability) and the outcome of a new game between them (likelihood), the TrueSkill model updates its estimate of the players' skill levels (posterior probability). TrueSkill updates the skill level $\mu$ and the uncertainty $\sigma$ intuitively: (a) if the outcome of a new competition is expected, i.e., the player with the higher skill level wins the game, it will cause small updates in skill level $\mu$ and uncertainty $\sigma$; (b) if the outcome of a new competition is unexpected, i.e. [...]

[...] and mathematics⁴ (SO/Math) questions for our main experiments. Additionally, we use the data of Math Overflow⁵ (MO) for calibrating question difficulty scores across communities (Sec. 3.3). The statistics of these data sets are shown in Table 1.

                  SO/CPP     SO/Math    MO
# of questions    122,012    51,174     27,333
# of answers      357,632    94,488     65,966
# of users        67,819     16,961     12,064

Table 1: The statistics of the data sets.

To evaluate the effectiveness of our proposed model for estimating question difficulty scores, we [...]