
Big Data at

Fabio Scanu, Senior Data Warehouse Engineer, [email protected]

A bit about King

King in numbers

• 356 million MAU
• 1.5 billion game plays per day
• 9 game studios
• 1700 employees

Studios in Stockholm, London, Barcelona, Malmo, Berlin, Singapore, and Seattle.

Offices in San Francisco, New York, Malta, Tokyo, Seoul and Shanghai.

And lots and lots of data...

• 32 billion rows per day
• 1.5 TB per day new
• > 9 PB stored

And for fun:

• 100000s of hours played
• Trillions of candies matched

A bit about Blizzard in numbers

• Headquartered in Santa Monica, California
• 9000 employees
• Focused on games for Xbox, PS, computer, etc
• , , , , etc
• Offices pretty much all over the US

Players are different

We have more players than the entire US

356 m (our players) vs 320 m (US)

What is Big Data?

Big data is…

What's your definition of Big Data?

Big data is… We predict player behaviour…

Actionable, Effective, Predictable

Good stuff

Our data

Our data is… growing

Our data is… not that useful raw

20130117T060000.142+0100 23 102 1387107022 1137497977 0 0 fb notif giveGoldToUser
20130117T060000.277+0100 23 10101 1000524045 1 2 5107
20130117T060000.281+0100 23 21 1025951084 0 134 1358388857
20130117T060000.282+0100 23 69 1025951084 0 134 0 1358398800 bookmark_favorites 0 fb_source=bookmark_favorites&ref=bookmarks&count=3&fb_bmpos=9_3
20130117T060000.285+0100 23 38 1025951084 ad1c792b WINDOWS_XP CHROME 24.0.1312.52 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17
20130117T060000.287+0100 23 10101 1140113442 -1 4 5101
20130117T060000.288+0100 23 10005 1140113442 4 3 1358398800288
20130117T060000.305+0100 23 10005 1111576364 5 2 1358398800305
20130117T060000.306+0100 23 10006 1031413225 7 13 0 0 8 1358398598520 -1
20130117T060000.350+0100 23 10101 1151246251 -1 0 5101
20130117T060000.351+0100 23 10005 1151246251 5 7 1358398800351
20130117T060000.358+0100 23 10006 1376461814 4 3 0 0 72 1358398575940 -10001
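To show why the raw lines are hard to use directly, here is a minimal sketch of turning one log line into a typed record. It assumes the columns are tab-separated (the slide text lost the original delimiters) and that the first column is a timestamp with milliseconds and a UTC offset. The names `LogRecord` and `parse_line` are hypothetical, and the meaning of the remaining columns is not documented in the slides, so they are kept as plain strings.

```python
from datetime import datetime
from typing import NamedTuple


class LogRecord(NamedTuple):
    timestamp: datetime
    fields: list  # remaining columns; their meaning is not given in the slides


def parse_line(line: str) -> LogRecord:
    """Split one tab-separated log line and parse its leading timestamp."""
    first, *rest = line.rstrip("\n").split("\t")
    # e.g. 20130117T060000.142+0100 -> date, time, milliseconds, UTC offset
    timestamp = datetime.strptime(first, "%Y%m%dT%H%M%S.%f%z")
    return LogRecord(timestamp, rest)


# One record from the sample, rebuilt with tab separators (an assumption).
sample = "\t".join(
    ["20130117T060000.277+0100", "23", "10101", "1000524045", "1", "2", "5107"]
)
record = parse_line(sample)
print(record.timestamp.isoformat(), record.fields)
```

Even after parsing, the numeric columns mean nothing without a lookup for each event type, which is the gap the ETL and data model fill.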

System architecture

[Diagram: Game servers, TSV log files, Log server, Raw data, ETL, Data Warehouse, Data Mart, Reports, Data scientists]

Why build a dimensional model?

• Ease of use
• Flexible framework
• Huge bag of techniques & tricks
• Structures thinking

Our data is… actually well structured

TSV

20130117T060000.142+0100 23 102 1387107022 1137497977 0 0 fb notif giveGoldToUser
20130117T060000.277+0100 23 10101 1000524045 1 2 5107
20130117T060000.281+0100 23 21 1025951084 0 134 1358388857
20130117T060000.282+0100 23 69 1025951084 0 134 0 1358398800 facebook bookmark_favorites 0 fb_source=bookmark_favorites&ref=bookmarks&count=3&fb_bmpos=9_3
20130117T060000.285+0100 23 38 1025951084 ad1c792b WINDOWS_XP CHROME 24.0.1312.52 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17
20130117T060000.287+0100 23 10101 1140113442 -1 4 5101
20130117T060000.288+0100 23 10005 1140113442 4 3 1358398800288
20130117T060000.305+0100 23 10005 1111576364 5 2 1358398800305
20130117T060000.306+0100 23 10006 1031413225 7 13 0 0 8 1358398598520 -1
20130117T060000.350+0100 23 10101 1151246251 -1 0 5101
20130117T060000.351+0100 23 10005 1151246251 5 7 1358398800351
20130117T060000.358+0100 23 10006 1376461814 4 3 0 0 72 1358398575940 -10001

Hadoop strengths and weaknesses

Strengths:
• Scalability
• Resiliency
• Flexibility
• Low cost accessible storage
• Unstructured / semi-structured data

Weaknesses:
• Structured data performance
• Ease of use
• Maintenance
• Fast data exploration
• JOINs
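To make the dimensional model and the JOINs entry concrete, here is a minimal sketch of a star schema: one fact table of game rounds joined to a player dimension. The table and column names (`dim_player`, `fact_game_round`, `country`, `succeeded`) and all rows are hypothetical, not King's schema. SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical star schema: names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_player (
    player_key INTEGER PRIMARY KEY,
    country    TEXT
);
CREATE TABLE fact_game_round (
    player_key INTEGER REFERENCES dim_player(player_key),
    level      INTEGER,
    succeeded  INTEGER   -- 1 = level completed, 0 = failed
);
""")
conn.executemany("INSERT INTO dim_player VALUES (?, ?)",
                 [(1, "SE"), (2, "UK")])
conn.executemany("INSERT INTO fact_game_round VALUES (?, ?, ?)",
                 [(1, 65, 0), (1, 65, 1), (2, 65, 0), (2, 35, 1)])

# A typical dimensional query: slice a fact measure by a dimension attribute.
rounds_by_country = conn.execute("""
    SELECT d.country, COUNT(*) AS rounds, SUM(f.succeeded) AS wins
    FROM fact_game_round AS f
    JOIN dim_player AS d ON d.player_key = f.player_key
    GROUP BY d.country
    ORDER BY d.country
""").fetchall()
print(rounds_by_country)
```

Queries of this shape (a large fact table joined to small dimensions, then grouped) are the workload where the slides rate Hadoop as weak.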

Data platform 1.0

[Diagram: Games, Event data, Hive, ETL, Reports, Data scientists]

Data platform 1.5

[Diagram: Games, Event data, Hive, DB?, ETL, Reports, Data scientists]

Benefits of a column-oriented database

• Optimised for structured data
• Good for dimensional model
• Fast data exploration
• More friendly / productive environment
• Faster queries = happier users!
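As a rough illustration of why a column-oriented layout suits analytical queries, the toy sketch below stores the same data row by row and column by column, then sums a single field. The field names and values are made up, and a real column store adds compression, vectorised execution and distribution that this sketch does not attempt to show.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# The field names and values are made up for illustration.
rows = [
    {"player_id": 1, "level": 65, "attempts": 120},
    {"player_id": 2, "level": 35, "attempts": 4},
    {"player_id": 3, "level": 65, "attempts": 98},
]

# Column-oriented: one contiguous sequence per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_attempts_row_store(records):
    # Every full record is visited even though only one field is needed.
    return sum(record["attempts"] for record in records)


def total_attempts_column_store(cols):
    # Only the "attempts" column is read; the other columns are untouched.
    return sum(cols["attempts"])


print(total_attempts_row_store(rows), total_attempts_column_store(columns))
```

Both functions return the same total. The difference is how much data has to be read to get it, which grows with the number of columns in the table.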

Why ExaSolution?

• Speed
• Efficiency
• Tuning free
• Scalability (170 TB and counting...)
• ExaSol the company

Price / TB usable storage

[Chart: price per TB of usable storage and performance / price, comparing Hadoop grade servers with database grade servers, on a relative scale from 0x to 7x]

Hybrid architecture: best of both worlds

Hadoop:
• Scalability
• Resiliency
• Flexibility
• Low cost accessible storage
• Unstructured/semi-structured data

Analytics database:
• Structured data performance
• Ease of use
• Low maintenance
• Fast data exploration
• JOINs

Data platform 2.0

[Diagram: Games, Event data, Hive, ExaSolution, ETL, Reports, Data scientists]

Cool! But… what kind of analysis can I do with that?

• Fairly deep thinking about the players and their motivation, frustration, achievements, persistence, etc

• Carefully designed experiments (AB tests) to run in the games, which integrate a hypothesis about player’s behaviour with a nicely designed game feature

• Continuing to introduce entirely new challenges as the levels unfold ( has 1,280 Reality levels and 665 Dreamworld levels)

• The right analysis

Machine learning and predictive analytics

We have >9 petabytes of player data. Mostly of the form:
• “player ‘x’ tried level ‘y’ and succeeded / failed / spent”

A fairly large space of opportunity to predict…
• Is this player going to stop playing?
• Is this player going to start spending?
• What product should I recommend to this player?
• What other game might they enjoy?
• Is it a good time to recommend they play another game?
• But also segmentation, recommendation, etc
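Before any of these questions can go to a model, events of the "player x tried level y" form have to be rolled up into one row of features per player. The sketch below shows one possible aggregation. The event tuples, the feature names, and the choice to count "spent" as a purchase rather than an attempt are all assumptions made for illustration, not a description of King's pipeline.

```python
from collections import defaultdict

# Hypothetical events in the "player x tried level y and
# succeeded / failed / spent" form; the values are invented.
events = [
    ("p1", 65, "failed"),
    ("p1", 65, "failed"),
    ("p1", 65, "spent"),
    ("p1", 65, "succeeded"),
    ("p2", 35, "succeeded"),
    ("p2", 36, "failed"),
]


def player_features(event_stream):
    """Aggregate raw events into one feature row per player."""
    features = defaultdict(
        lambda: {"attempts": 0, "fails": 0, "purchases": 0, "max_level": 0}
    )
    for player, level, outcome in event_stream:
        row = features[player]
        if outcome == "spent":
            # Assumption: a purchase is counted separately from an attempt.
            row["purchases"] += 1
        else:
            row["attempts"] += 1
            if outcome == "failed":
                row["fails"] += 1
        row["max_level"] = max(row["max_level"], level)
    for row in features.values():
        row["fail_rate"] = row["fails"] / row["attempts"] if row["attempts"] else 0.0
    return dict(features)


features = player_features(events)
print(features["p1"])
```

Rows like these would then feed a churn or conversion classifier. Which model to use is a separate choice that the slides do not specify.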

Candy Crush Saga has been at the top of the charts since January 2013

Candy Crush Saga: Can a level be too hard?

Super hard level 65
• 120+ attempts on average
• 50% drop out rate
• Very high revenue
• Very high conversion
• Super happy players when they eventually complete it

[Chart annotations: First Episode Unlock, Level 35, Level 65]

Should it be easier?

Machine learning and predictive analytics

The long term value of our players is higher if we make it easier

• We get significantly more direct revenue (all those future levels)
• More players stay active in our network (= more players trying out other games, more players helping & competing with their friends)

At King we optimise for the long term!

Pet Rescue Saga. Which of these is better?

[Two variants shown side by side]

First variant:
• Clear
• Simple
• Obvious button to buy
• No confusion
• Low price point

Second variant:
• Complex
• Choices to make
• Varied price points
• Chance for more revenue, but does it put people off?

Results of a nice AB test:
• Total revenue up significantly, driven almost entirely by our “medium” and “high” spend segments.
• No negative impact (zero/low spend segments are unaffected).
• We should think of how to target the zero spend and low spend segments in other ways.
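A per-segment readout like the one above can be sketched as follows: group players by spend segment and variant, then compare mean revenue per player. The variant labels "A" and "B", the segment names and every number below are invented to exercise the code; they are not results from the talk.

```python
from collections import defaultdict

# Invented per-player records: (variant, spend_segment, revenue).
# These numbers are NOT from the talk; they only exercise the code.
observations = [
    ("A", "zero", 0.0), ("A", "low", 1.0), ("A", "medium", 5.0), ("A", "high", 20.0),
    ("B", "zero", 0.0), ("B", "low", 1.0), ("B", "medium", 8.0), ("B", "high", 30.0),
]


def revenue_by_segment(records):
    """Return {segment: {variant: mean revenue per player}}."""
    totals = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for variant, segment, revenue in records:
        cell = totals[segment][variant]
        cell[0] += revenue  # running revenue total
        cell[1] += 1        # number of players seen
    return {
        segment: {
            variant: total / count
            for variant, (total, count) in by_variant.items()
        }
        for segment, by_variant in totals.items()
    }


def uplift(means, control="A", treatment="B"):
    """Absolute change in mean revenue per player, per segment."""
    return {segment: m[treatment] - m[control] for segment, m in means.items()}


segment_uplift = uplift(revenue_by_segment(observations))
print(segment_uplift)
```

A real analysis would also need a significance test and enough players per segment. This sketch only shows the segment breakdown.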

Where next? Challenges

• Upstream and downstream throughput and flexibility
• Greater variety of game genres
• Keep on scaling
• Technology innovation
• Evolving data model
• Microbatch ETL
• Real(er) time…

Bridging the latency canyon

Data platform 4.0

[Diagram: latency tiers 0 ms, 200 ms, 15 minutes?, Hourly, Daily; components: Real time data system, VoltDB?, ExaSolution, Hadoop; Microbatch ETL and Batch ETL; arrow labelled "Increasing latency, quality, context"]

In detail

Some numbers

• Hadoop with 330+ nodes, adding 2 racks / month
• 32 billion events per day, more than
  I. If an event weighed 1 gram, this would weigh as much as 53 fully loaded Airbus A380s.
  II. If an event were a grain of salt, this would be about 30 bathtubs of salt.
• 64 nodes in the in-memory column store DB
• Hive, Impala, Spark and YARN in place
• 9 PB of data in HDFS, 170 TB+ in Exasol
• In 12 months' time, these numbers will double
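The headline figures can be turned into per-second and per-event numbers with a little arithmetic. The sketch below uses the 32 billion events per day quoted here and the 1.5 TB of new data per day quoted earlier in the deck. It assumes decimal terabytes and a uniform event rate across the day, neither of which the slides state.

```python
EVENTS_PER_DAY = 32e9          # from the slides
NEW_BYTES_PER_DAY = 1.5e12     # 1.5 TB, assuming decimal terabytes
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate, assuming events are spread evenly over the day.
events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY
bytes_per_event = NEW_BYTES_PER_DAY / EVENTS_PER_DAY

# "1 gram per event" comparison: total mass in tonnes, and the
# per-aircraft mass implied by the "53 Airbus A380s" figure.
total_tonnes = EVENTS_PER_DAY / 1e6      # 1 tonne = 1e6 grams
tonnes_per_aircraft = total_tonnes / 53

print(f"{events_per_second:,.0f} events/s on average")
print(f"{bytes_per_event:.1f} bytes of new storage per event")
print(f"{total_tonnes:,.0f} tonnes, about {tonnes_per_aircraft:.0f} tonnes per aircraft")
```

Peak load will be higher than the average, so the per-second figure is a lower bound for capacity planning.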

In conclusion

• What are your requirements?
• There’s not one tool for the job
• Hybrid architectures give the best of several worlds
• 9 PB of data opens up a new set of challenges:
  • A medium-sized table at King has about 300 billion records;
  • Having that much data on this architecture allows you to do any kind of analysis you want, using the algorithm you want (NLP, AI, machine learning, etc)

A few words about our people

About 1700 employees today
• Many 100s of software engineers
• Lots of graphic designers, artists, musicians, business managers, producers, marketers,…
• In the data area:
  • 60+ data scientists
  • 30 data engineers building and maintaining our data and reporting platforms

Great roles

Data Scientists and Data Engineers working
• in our games
• on our network
• on our systems
• on our testing/optimisation frameworks
• …

And we like people to rotate around over time

Between 6 and 11 interviews before joining https://www.youtube.com/watch?v=V9y21zPw4MY

Working @King

In the office, we have:
• Unlimited food & drinks, gym, wine & whisky tasting, many different beers
• Boxing, krav maga and yoga classes
• Nap rooms, running clubs
• Movie nights, board games, and football tournaments
• Everyone's ideas matter, no matter the seniority
• You get to travel as often as you like
• You can work from home
• Really cool parties & events
• Freedom to work on what you like
• You keep learning all the time
• And much more...

Thank you