Big Data at King
Fabio Scanu, Senior Data Warehouse Engineer - [email protected]

A bit about King

King in numbers
• 356 million MAU
• 1.5 billion game plays per day
• 9 game studios
• 1700 employees
• Studios in Stockholm, London, Barcelona, Malmo, Berlin, Singapore and Seattle
• Offices in San Francisco, New York, Malta, Tokyo, Seoul and Shanghai

And lots and lots of data...
• 32 billion rows per day
• 1.5 TB of new data per day
• > 9 PB stored

And for fun:
• 100000s of hours played
• Trillions of candies matched

A bit about Activision Blizzard

Activision Blizzard in numbers
• Headquartered in Santa Monica, California
• 9000 employees
• Focused on games for Xbox, PS, computer, etc
• Call of Duty, Guitar Hero, Diablo, Warcraft, etc
• Offices pretty much all over the US

Players are different
We have more players than the entire US: 356 m (King players) vs 320 m (US)

What is Big Data?
Big data is… What's your definition of Big Data?

What is Big Data?
Big data is… We predict player behaviour…
Actionable / Effective / Predictable / Good stuff

Our data is… growing

Our data is… not that useful raw
20130117T060000.142+0100 23 102 1387107022 1137497977 0 0 fb notif giveGoldToUser
20130117T060000.277+0100 23 10101 1000524045 1 2 5107
20130117T060000.281+0100 23 21 1025951084 0 134 1358388857
20130117T060000.282+0100 23 69 1025951084 0 134 0 1358398800 facebook bookmark_favorites 0 fb_source=bookmark_favorites&ref=bookmarks&count=3&fb_bmpos=9_3
20130117T060000.285+0100 23 38 1025951084 ad1c792b WINDOWS_XP CHROME 24.0.1312.52 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17
20130117T060000.287+0100 23 10101 1140113442 -1 4 5101
20130117T060000.288+0100 23 10005 1140113442 4 3 1358398800288
20130117T060000.305+0100 23 10005 1111576364 5 2 1358398800305
20130117T060000.306+0100 23 10006 1031413225 7 13 0 0 8 1358398598520 -1
20130117T060000.350+0100 23 10101 1151246251 -1 0 5101
20130117T060000.351+0100 23 10005 1151246251 5 7 1358398800351
20130117T060000.358+0100 23 10006 1376461814 4 3 0 0 72 1358398575940 -10001

Our data: System architecture
[Diagram components: game servers, TSV log files, log server, raw data, ETL, data warehouse, data mart, reports, data scientists]

Our data: Why build a dimensional model?
• Ease of use
• Flexible framework
• Huge bag of techniques & tricks
• Structures thinking

Our data is… actually well structured

Our data: TSV
20130117T060000.142+0100 23 102 1387107022 1137497977 0 0 fb notif giveGoldToUser
20130117T060000.277+0100 23 10101 1000524045 1 2 5107
20130117T060000.281+0100 23 21 1025951084 0 134 1358388857
20130117T060000.282+0100 23 69 1025951084 0 134 0 1358398800 facebook bookmark_favorites 0 fb_source=bookmark_favorites&ref=bookmarks&count=3&fb_bmpos=9_3
20130117T060000.285+0100 23 38 1025951084 ad1c792b WINDOWS_XP CHROME 24.0.1312.52 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17
20130117T060000.287+0100 23 10101 1140113442 -1 4 5101
20130117T060000.288+0100 23 10005 1140113442 4 3 1358398800288
20130117T060000.305+0100 23 10005 1111576364 5 2 1358398800305
20130117T060000.306+0100 23 10006 1031413225 7 13 0 0 8 1358398598520 -1
20130117T060000.350+0100 23 10101 1151246251 -1 0 5101
20130117T060000.351+0100 23 10005 1151246251 5 7 1358398800351
20130117T060000.358+0100 23 10006 1376461814 4 3 0 0 72 1358398575940 -10001

Our data: Hadoop strengths and weaknesses
Strengths                              Weaknesses
Scalability                            Structured data performance
Resiliency                             Ease of use
Flexibility                            Maintenance
Low cost accessible storage            Fast data exploration
Unstructured / semi-structured data    JOINs

Our data: Data platform 1.0
[Diagram components: games, event data, ETL, Hive, reports, data scientists]

Our data: Data platform 1.5
[Diagram components: games, event data, ETL, Hive, DB?, reports, data scientists]

Our data: Benefits of a column-oriented database
• Optimised for structured data
• Good for dimensional model
• Fast data exploration
• More friendly / productive environment
• Faster queries = happier users!
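To make the "actually well structured" point about the raw TSV events concrete, here is a minimal Python sketch that splits one event line into a timestamp and its remaining fields. It assumes the first field is always the event timestamp in the layout seen in the samples; the slides do not say what the other fields mean, so they are left as raw strings. The name parse_event is hypothetical and is not part of King's pipeline.

```python
from datetime import datetime

# Timestamp layout seen in the sample events, e.g. 20130117T060000.142+0100
TIMESTAMP_FORMAT = "%Y%m%dT%H%M%S.%f%z"


def parse_event(line):
    """Split one tab-separated event line into (timestamp, remaining fields).

    Assumption: the first field is the event timestamp. The meaning of the
    other fields is not given in the slides, so they stay as raw strings.
    """
    first, *rest = line.rstrip("\n").split("\t")
    return datetime.strptime(first, TIMESTAMP_FORMAT), rest


# One of the sample events, with the tab separators restored.
sample = "\t".join(
    ["20130117T060000.277+0100", "23", "10101", "1000524045", "1", "2", "5107"]
)
timestamp, fields = parse_event(sample)
print(timestamp.isoformat(), fields)
```

Once events are parsed into typed rows like this, loading them into a dimensional model or a column-oriented database becomes a matter of mapping field positions to columns.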
Our data: Why ExaSolution?
• Speed
• Efficiency
• Tuning free
• Scalability (170 TB and counting...)
• ExaSol the company

Our data
[Chart: price per TB of usable storage and performance per price, Hadoop-grade servers vs database-grade servers]

Our data: Hybrid architecture: best of both worlds
Hadoop                                 Analytics database
Scalability                            Structured data performance
Resiliency                             Ease of use
Flexibility                            Low maintenance
Low cost accessible storage            Fast data exploration
Unstructured / semi-structured data    JOINs

Our data: Data platform 2.0
[Diagram components: games, event data, ETL, Hive, ExaSolution, reports, data scientists]

Our data: Cool! But… what kind of analysis can I do with that?
• Fairly deep thinking about the players and their motivation, frustration, achievements, persistence, etc
• Carefully designed experiments (AB tests) to run in the games, which combine a hypothesis about player behaviour with a nicely designed game feature
• Continuing to introduce entirely new challenges as the levels unfold (Candy Crush Saga has 1,280 Reality levels and 665 Dreamworld levels)
• The right analysis

Machine learning and predictive analytics
We have >9 petabytes of player data. Mostly of the form:
• “player ‘x’ tried level ‘y’ and succeeded / failed / spent”
A fairly large space of opportunity to predict…
• Is this player going to stop playing?
• Is this player going to start spending?
• What product should I recommend to this player?
• What other game might they enjoy?
• Is it a good time to recommend they play another game?
• But also segmentation, recommendation, etc

Candy Crush Saga has been at the top of the charts since January 2013

Candy Crush Saga: Can a level be too hard?
Super hard level 65
• 120+ attempts on average
• 50% drop out rate
• Very high revenue
• Very high conversion
• Super happy players when they eventually complete it
[Chart labels: Level 35, First Episode Unlock, Level 65]
Should it be easier?
Machine learning and predictive analytics
The long term value of our players is higher if we make it easier
• We get significantly more direct revenue (all those future levels)
• More players stay active in our network (= more players trying out other games, more players helping & competing with their friends)
At King we optimise for the long term!

Pet Rescue Saga. Which of these is better?
Option 1: Clear                Option 2: Complex
Simple                         Choices to make
Obvious button to buy          Varied price points
No confusion                   Chance for more revenue, but does it put people off?
Low price point

Pet Rescue Saga. Which of these is better?
Results of a nice AB test:
Total revenue up significantly, driven almost entirely by our “medium” and “high” spend segments.
No negative impact (zero/low spend segments are unaffected).
And we should think of how to target the zero spend and low spend segments in other ways.

Where next? Challenges
• Upstream and downstream throughput and flexibility
• Greater variety of game genres
• Keep on scaling
• Technology innovation
• Evolving data model
• Microbatch ETL
• Real(er) time…

Where next? Bridging the latency canyon

Where next? Data platform 4.0
[Diagram: latency scale of 0 ms, 200 ms, 15 minutes?, hourly, daily; systems shown: real-time data system, VoltDB?, ExaSolution, Hadoop; microbatch ETL and batch ETL; axis labelled "Increasing latency, quality, context"]

Where next? In detail

Where next? Some numbers
• Hadoop with 330+ nodes, adding 2 racks / month
• 32 billion events per day, more than Twitter
  I. If an event weighed 1 gram, this would weigh as much as 53 fully loaded Airbus A380s.
  II. If an event were a grain of salt, this would mean about 30 bathtubs of salt.
• 64 nodes in the in-memory column-store DB
• Hive, Impala, Spark, Yarn in place
• 9 PB of data in HDFS, 170 TB+ in Exasol
• In 12 months' time, these numbers will double

Where next? In conclusion
• What are your requirements?
• There’s not one tool for the job
• Hybrid architectures give the best of both worlds
• 9 PB of data opens up a new set of challenges:
  - A medium-sized table at King has about 300 billion records;
  - Having that much data on this architecture allows you to do any kind of analysis you want, using the algorithm you want (NLP, AI, machine learning, etc)

A few words about our people
About 1700 employees today
• Many 100s of software engineers
• Lots of graphic designers, artists, musicians, business managers, producers, marketers,…
• In the data area:
  - 60+ data scientists
  - 30 data engineers building and maintaining our data and reporting platforms

Great roles
Data Scientists and Data Engineers working
• in our games
• on our network
• on our systems
• on our testing/optimisation frameworks
• …
And we like people to rotate around over time
Between 6 and 11 interviews before joining
https://www.youtube.com/watch?v=V9y21zPw4MY

Working @King
In the office, we have:
• Unlimited food & drinks, gym, wine & whisky tasting, many different beers
• Boxing, krav maga and yoga classes
• Nap rooms, running clubs
• Movie nights, board games, and football tournaments
• Everyone's ideas matter, no matter the seniority
• You get to travel as often as you like
• You can work from home
• Really cool parties & events
• Freedom to work on what you like
• You keep learning all the time
• And much more...

Thank you.