Cloud Trends The evolution of culture and technology

Adrian Cockcroft @adrianco VP Cloud Architecture Strategy

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. 1 Get the culture right

2 Migrate to the cloud Culture And Evolution 3 The new de-normal, untangle data tier

4 Monoliths to microservices to functions 1 Get the culture right

2 Migrate to the cloud Cloud Trends 3 The new de-normal, untangle data tier

4 Monoliths to microservices to functions

5 Open source and If you want to build a ship, don’t drum up the people to gather wood, divide the Culture work, and give orders. Instead, teach them to yearn for the vast and endless sea. Antoine de Saint-Exupéry, author of “Le Petit Prince” (“The Little Prince”) Culture

Nordstrom Technology NorDNA Culture Deck 1. Values are what we value 2. High performance 3. Freedom & responsibility Culture 4. Context, not control Seven Aspects of Netflix Culture 5. Highly aligned, loosely coupled 6. Pay top of market 7. Promotions & development • Customer • Bias for action obsession • Frugality • Ownership • Learn and • Invent and be curious simplify • Earn trust Culture • Are right, a lot of others Amazon leadership • Hire and develop • Dive deep the best principles • Have backbone; • Insist on the disagree and highest standards commit • Think big • Deliver results Intentional Culture Appropriate Judgement Migrating to Cloud

Lessons from the Netflix cloud journey, brought up to date IT’s assumption: make systems perfect 2008 so that developers don't have to think about failures Start with High-end IBM P-series hardware, Oracle... a shock Two-day outage caused by SAN hardware failure! 2008 Failure raised questions… Question Availability has to be application concern! Assumption Use low cost cloud infrastructure? 2009 Vast increase in datacenter capacity was needed Unpredictable in advance, how much, where… Add an Why? Online streaming replacing DVD shipping logistics existential threat Nowadays, systems of engagement are dominating IT Shipping plan

Inventory

DATACENTER SHIPPING SITE DVD

Business Personalized Mail DVD Browsing Back A few interactions per week per customer to datacenter

Add choices New DVD Encoded content

DATACENTER CDN QoS Start Play DVDStreaming logging config

Progress heartbeat Business Personalized Binge watching browsing episodes of TV Video data shows every day

Add choices Encoded content

Views per week DATACENTER10x CDN QoS Start Play Streaming logging config

Progress Traffic to Business heartbeat Personalized100x datacenter per view Binge watching Browsing episodes of TV Video data shows every day Per customer that 1000x started streaming

Add choices Crunch Capacity Transformation Digital The

DATACENTER CAPACITY when 0.1% of users switch, the capacity needed is equal. is needed capacity the switch, users of 0.1% when If we say new workload causes 1000x traffic to datacenter, then Point where where Point of customers are are customers of streaming TIME 0.1% 0.1% Streaming DVD Recruit world class datacenter operations build team and guess how much capacity they would need, and build it before it was needed — lots of upfront $$$ spend

Choices OR

Use the Elastic Compute service of AWS, built by one of Netflix biggest competitors, and spend $$$ on video content and developers Competition Understand how AWS was separated from Amazon Prime Capacity Experiments 2009 to see what worked Mitigate Business Sign Up For Enterprise risks License Agreement Publicity NYT story about Netflix and AWS April 2010 Encoding movies 2009 Big backlog, not enough capacity Applications Moved to AWS EC2 Showed that capacity existed on demand

Shut down capacity to save as backlog varied Quality of Service (QoS) logging 2009 Too much traffic to datacenter databases Storage for logs moved to S3 Applications Unlimited space Log analysis moved to EMR - Hadoop Worked with AWS to support Hadoop + Hive in Elastic Map-Reduce service Crunch Time Front end on AWS Start of 2010 AWS Decided not to Web pages and Backend capacity API clients migrate expands to fill Most backend to cloud remaining space build any more DATACENTER still in datacenter datacenter capacity

Need to move to AWS before end of 2010 to survive January 2010 December Front end web page and API migration A picture like this was shown in every management meeting. Hard deadline to move capacity out to make space for what was left.

Running out of runway

January 2010 December Start with the simplest possible API service Migration Next the simplest web page Sequence Then pages and APIs one by one Logins and web page NETFLIX WEB PAGE requests How to Run Both? Gradual Migration

Old web pages, Selective web backend, and page redirects login service

Migrated web pages

DATACENTER AWS CLOUD www.netflix.com movies.netflix.com WEB PAGE Reads of Updates SoR data HowAWS Database to Migration Service RunMove Both? Data? Move from Oracle to scalable low cost Continuous cloud database replication services Amazon Updates DMS

Amazon Aurora DynamoDB Postgres

DATACENTER— SYSTEM OF RECORD AWS CLOUD How to MoveBack upData? Data? For cloud to be used as the system of record an archive backup mechanism is needed

To replace offsite tape backup, create a separate account in a different region

Amazon S3 is extremely secure and durable.

Data can’t be deleted. Automatic time based purge after 90 days

Long term very low cost archive using Amazon Glacier ARCHIVE Separate AWS account, How to different region Back up Data?

Versioned Amazon purged after 90 days Glacier

Compress, encrypt, archive

Expensive offsite Amazon Amazon tape backups DynamoDB S3 NETFLIX WEB PAGE Final Stage “All-In” Netflix migration of billing and corporate IT Corporate IT, billing, last Login functions to Close down datacenter be migrated Systems of record data APIs Pages, etc.

DATACENTER AWS CLOUD The New De-Normal

Monolithic Kitchen Sink De-normalized Databases Analogy Monolith Expensive, Hard to Create and Run ic DatabaseMonolith Expensive, Hard to Create and Run Database Schema Entity Relationship Database Schema Entity Relationship Database Schema Entity Relationship Kitchen Sink Analogy GLASSES

Kitchen Sink CleanupAnalogy GLASSES

Kitchen Sink Cleanup GLASSES

Kitchen Sink Cleanup GLASSES

Kitchen Sink Cleanup GLASSES

Kitchen Sink Cleanup GLASSES

Kitchen Sink Cleanup GLASSES

Kitchen Sink Cleanup Consistency Problem How Many Complete Sets Are There? Consistency Problem How Many Complete Sets Are There? GLASSES

Consistency Problem How Many Complete Sets Are There? GLASSES

Adding a New Use Case SAKE SET

GLASSES

Adding a New BOWLS Use Case Cloud Makes it Easy to Add New Databases

Amazon Redshift Untangle and Migrate Existing “Kitchen Sink” Schemas Untangle and Migrate Existing “Kitchen Sink” Schemas The New De-Normal

Monolithic Kitchen Sink De-normalized Databases Analogy Evolution of Business Logic

Monolith Microservices Functions Splitting Monoliths Ten Years Ago XML & SOAP Splitting Monoliths Ten Years Ago Splitting Monoliths FiveTen Years Ago REST JSON Fast binary Splitting encodings Monoliths Five Years Ago Splitting Monoliths FiveTen Years Ago

Microservices Five Years Ago

Amazon SQS

Amazon API Amazon Microservices Gateway DynamoDB Fiveto Functions Years Ago Standard building brick services provide standardized platform capabilities

Amazon Kinesis Amazon S3

Amazon SNS Amazon SQS Business Logic Glue between the bricks Amazon API Amazon Microservices Gateway DynamoDB to Functions Standard building brick services provide standardized platform capabilities

Amazon Kinesis Amazon S3

Amazon SNS Amazon SQS

Amazon API Amazon Microservices Gateway DynamoDB to Functions

Amazon Kinesis Amazon S3

Amazon SNS Amazon SQS

Amazon API Amazon Microservices Gateway DynamoDB to Functions

Amazon Kinesis Amazon S3

Amazon SNS Amazon SQS

Amazon API Amazon Microservices Gateway DynamoDB to FunctionsEphemeral

Amazon Kinesis Amazon S3

Amazon SNS Microservices to Ephemeral Functions Amazon SQS

Amazon API Microservices Gateway to Ephemeral Functions Amazon API Amazon Microservices Gateway DynamoDB to Ephemeral Functions

Amazon Kinesis Amazon API Microservices Gateway to Ephemeral Functions

Amazon S3

Amazon SNS Amazon SQS

Amazon API Amazon Microservices Gateway When the system is DynamoDB idle, it shuts down and to Ephemeral costs nothing to run Functions

Amazon Kinesis Amazon S3

Amazon SNS Evolution of Business Logic

Monolith Microservices Functions Open Source AI at AWS, and Apache MXNet

What I’m currently working on…

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. • Open Source at AWS

• Applications of

• Apache MXNet Overview

• Apache MXNet API

• Code and DIYrobocars

• Tools and Resources Amazon Open Source Contributions

Linux & Drivers Xen Apache Bigtop Kuromoji PostgreSQL ElasticSearch Docker CBMC Boto Apache Zeppelin Apache MXNet Moses Cloudera HUE Apache Joshua Repositories Owned by AWS & Amazon https://github.com/aws https://github.com/awslabs https://github.com/amznlabs https://blox.github.io/ Blox – Container orchestration for ECS s2n – Secure replacement for openssl, used by S3 Chalice – Python serverless microframework for AWS Sockeye – Neural Machine Translation using MXNet AWS-CLI – Command line interface to AWS AWS-shell – Autocomplete based user interface for AWS-CLI AWS SDKs for Java, Python, PHP, Go, Ruby etc. AWS Mobile SDKs for iOS, Android etc. Cloud9 Ace – Cloud based interactive development editor Cfncluster – Build and manage HPC clusters ION – Amazon data libraries Open Source Made Easy by AWS Services

Amazon EMR – Elastic Hadoop Amazon ElastiCache – Memcached and Redis Amazon RDS – MySQL Amazon RDS – PostgreSQL Amazon Aurora – scalable back-end for MySQL & PostgreSQL AWS OpsWorks – Chef Automate Server Amazon ECS – Docker Orchestration Amazon CloudSearch Amazon Elasticsearch

Database Migration Service – Convert schemas and move data from proprietary databases to MySQL and Postgres. 22,000+ done. Applications of Deep Learning Why is Deep Learning Taking Off Now?

Everything is digital: large data sets are available • Imagenet: 14M+ labeled images - http://www.image-net.org/ • YouTube-8M: 7M+ labeled videos - https://research.google.com/youtube8m/ • AWS public data sets - https://aws.amazon.com/public-datasets/ The parallel computing power of GPUs make training possible • Simard et al (2005), Ciresan et al (2011) • State of the art networks have hundreds of layers • ’s Chinese speech recognition: 4TB of training data, +/- 10 Exaflops Cloud and elasticity make training affordable • Grab a lot of resources for fast training, then release them • Using a DL model is lightweight: you can do it on a Raspberry Pi Autonomous Driving Systems AI on AWS Today

• Zillow • DataXu –Zestimate (using ) –Leverage automated & unattended ML at • Howard Hughes Corp large scale (Amazon EMR + Spark) –Lead scoring for luxury real estate purchase predictions • Mapillary • FINRA –Computer vision for crowd sourced maps –Anomaly detection, sequence matching, regression • Hudl analysis, network/tribe analysis –Predictive analytics on sports plays • Netflix • Upserve –Recommendation engine –Restaurant table mgmt & POS for forecasting • Pinterest customer traffic –Image recognition search • TuSimple • Fraud.net –Computer Vision for Autonomous Driving –Detect online payment fraud • Clarifai – Computer Vision APIs The Challenge For Artificial Intelligence: SCALE

Data Training Prediction

PBs of existing data Tons of GPUs Lots of GPUs and CPUs Aggressive migration Elastic capacity Serverless New data created on AWS Pre-built images At the Edge, On IoT Devices Amazon AI: Democratized Artificial Intelligence

Amazon Amazon Amazon More to come Easy to Use AI Rekognition Polly Lex in 2017 Services

Amazon Amazon Elastic Spark & More to come MapReduce SparkML in 2017 AI Platform

Apache Open Source TensorFlow CNTK MXNet AI Engines

AWS More to Cloud P2 16xGPU G2 & Elastic ECS Lambda Greengrass FPGA Instance GPU come Hardware on IoT in 2017 Demand AWS Deep Learning AMI

Up to~40k CUDA cores on P2 Apache MXNet TensorFlow Theano Caffe & Caffe 2 Torch Keras One-Click GPU & CPU Pre-configured CUDA drivers, Open Source MKL Deep Learning Anaconda, Python3 Installed, Tested, Tuned Ubuntu or Amazon Bootable Machine Image + CloudFormation template

+ Container Image Apache MXNet Overview Apache MXNet

Programmable Portable High Performance Simple syntax, Highly efficient Near linear scaling multiple languages models for mobile across hundreds of GPUs and IoT

Open Governance Tuned On AWS Accepted into the Optimized performance and scalability on AWS GPUs Apache MXNet | Collaborations and Community Diverse Community | Amazon Contributions 35% Bing Su (Apple) Mu Li (AWS) Tianqi Chen (UW) https://wiki.apache.org/incubator/May2017 Eric Xie (AWS) Sergey Kolychev (Whitehat) Sandeep K. (AWS) In the last month, excluding merges, 51 Yizhi Liu (Mediav) authors have pushed 165 commits to master Jian Guo (TuSimple) and 180 commits to all branches. Yao Wang (AWS) Chiyuan Zhang (MIT) Tianjun Xiao (Tesla) Xingjian Shi (HKUST) Liang Depeng (Sun Yat-sen U.) Nan Zhu (MSFT) Yutian Li (Stanford)

*As of 3/30/17 0 20,000 40,000 60,000 **Amazon @35% of Contributions Deep Learning Framework Comparison

Apache MXNet TensorFlow Cognitive Toolkit

N/A – Apache Industry Owner Community

Imperative and Programmability Declarative only Declarative only Declarative

R, Python, Scala, Julia, Python, ++. Language Python, C++, C++. Javascript, Go, Experimental Go and Brainscript. Support Matlab, ... Java

Code Length| 44 sloc 107 sloc using TF.Slim 214 sloc AlexNet (Python)

Memory Footprint 2.6GB 7.2GB N/A (LSTM) Apache MXNet | Amazon Strategy

Integrate with Foundation for Leverage the AWS Services AI Services Community Community brings Bring Scalable Deep Higher Velocity for AI velocity and Learning to EMR, Services, Research innovation with no Lambda, ECS and and Core AI industry ownership many more.. Development Safest for long term investment Deep Learning using MXNet @Amazon

• Applied Research • Q&A Systems • Core Research • Supply Chain Optimization • Alexa • Advertising • Demand Forecasting • Machine Translation • Risk Analytics • Video Content Analysis • Search • Robotics • Recommendations • Lots of Computer Vision.. • AI Services | Rek, Lex, Polly • Lots of NLP/U..

*Teams are either actively evaluating, in development, or transitioning to scale production Apache MXNet API Apache MXNet | The Basics • NDArray: Manipulate multi-dimensional arrays (tensors) in a command line paradigm (imperative).

• Symbol: Symbolic expression for neural network flows (declarative).

• Module: Intermediate-level and high-level interface for neural network training and inference.

• Loading Data: Feeding data into training/inference programs.

• Mixed Programming: Training algorithms developed using NDArrays in concert with Symbols.

https://medium.com/@julsimon/an-introduction-to-the-mxnet-api-part-1-848febdcf8ab Imperative Programming

PROS • Straightforward and flexible. import numpy as np a = np.ones(10) • Take advantage of language b = np.ones(10) * 2 native features (loop, c = b * a condition, debugger). d = c + 1 • E.g. Numpy, Matlab, Torch, …

Easy to tweak CONS in Python, , Perl etc. • Hard to optimize Declarative Programming

PROS A = Variable('A') • More chances for B = Variable('B') A B optimization C = B * A X 1 D = C + 1 • Separates flow structure f = compile(D) + • E.g. TensorFlow, Theano, d = f(A=np.ones(10), Caffe B=np.ones(10)*2) C can share memory with CONS D because C is deleted • Less flexible, hard to debug later AnatomyDeep Learning of a Deep Learning Models Model

mx.model.FeedForward model.fit

Input Output mx.sym.SoftmaxOutput

Input Weights 1 0.2 3 -0.1 2 mx.sym.Activation(data, act_type="xxxx") ...... 4 0.7 Image Image Segmentation mx.sym.FullyConnected(data, num_hidden=128) "sigmoid"

1 1 1 1 0 1 3 Face Search 0 0 0 Video "tanh"

mx.sym.Convolution(data, kernel=(5,5), num_filter=20) "relu" 4 2 4 2 Neural Art Max = 4 Avg = 2 Speech 2 0 2 0 “People Riding Bikes” mx.sym.Pooling(data, pool_type="max", kernel=(2,2), stride=(2,2) "softrelu" Image Caption

Bicycle, People, Events Road, Sport Image Labels

“People Riding Bikes” lstm.lstm_unroll(num_lstm_layer, seq_len, len, num_hidden, num_embed) “Οι άνθρωποι ιππασίας ποδήλατα” Text 0.2 Machine Translation Queen -0.1 ... cos(w, queen) = cos(w, king) - cos(w, man) + cos(w, woman) 0.7

mx.symbol.Embedding(data, input_dim, output_dim = k) Adrian’s @DIYrobocar Project – RC Truck+Raspberry Pi+EC2 Camera feeds Donkey software with Keras or MXNet model, trained on EC2 instance Will Roscoe wrote Donkey, Sunil Mallya added MXNet, Adrian built Donkey Hoté https://github.com/wroscoe/donkey

Donkey Donkey Additional Resources MXNet Resources • MXNet Blog Post | AWS Endorsement • http://www.allthingsdistributed.com/2016/11/mxnet-default-framework-deep-learning-aws.html • Read up on MXNet and Learn More: • mxnet.io https://github.com/dmlc/mxnet/ • Re:Invent MXNet Recommender Systems Talk by Leo Dirac • https://www.portal.reinvent.awsevents.com/connect/sessionDetail.ww?SESSION_ID=8591 AWS Resources: follow Julien Simon @julsimon, Sunil Mallya @sunilmallya • Deep Learning AMI • https://aws.amazon.com/marketplace/pp/B01M0AXXQB | Amazon Linux • https://aws.amazon.com/marketplace/pp/B06VSPXKDX | Ubuntu • CloudFormation Template Instructions • https://github.com/dmlc/mxnet/tree/master/tools/cfn • Deep Learning Benchmark • https://github.com/awslabs/deeplearning-benchmark • MXNet on Lambda • https://github.com/awslabs/mxnet-lambda • MXNet on ECS/Docker • https://github.com/awslabs/ecs-deep-learning-workshop Thank You! Adrian Cockcroft @adrianco

Animations by Silver Fox