ML and Data Analytics with Google Cloud Platform
The power of machine learning on any data, any size
Alex Osterloh, Solution Engineer, Google
[email protected] @BigDataWizard
An Evolving Cloud
1st Wave: Colocation. Your kit, someone else’s building. Yours to manage.
2nd Wave: Virtualized Data Centers. Standard virtual kit, for rent. Still yours to manage.
?
Google Cloud Platform
An Evolving Cloud
1st Wave: Colocation. Your kit, someone else’s building. Yours to manage.
2nd Wave: Virtualized Data Centers. Standard virtual kit, for rent. Still yours to manage.
3rd Wave: Automated Services, Scalable Data. Focus on insight, not infrastructure.
“Google is living a few years in the future and sending the rest of us messages”
Doug Cutting, Chief Architect, Cloudera
Google Research in Data Technologies
[Timeline figure, 2002 to 2013. Technologies shown: F1, Spanner, MapReduce, Dremel, Flume, Millwheel, GFS, BigTable, Colossus, Megastore, PubSub]
Google Research Publications referenced are available here: http://research.google.com/pubs/papers.html
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, 2009: http://research.google.com/pubs/pub35290.html
10+ Years of Tackling Data Problems
Open Source: Apache Beam
Google Papers: GFS, MapReduce, BigTable, Dremel, Flume Java, PubSub, Millwheel, TensorFlow
Google Cloud Products: BigQuery, Pub/Sub, Dataflow, Bigtable, ML
Timeline: 2002, 2004, 2005, 2006, 2008, 2010, 2012, 2014, 2015, 2016
“We don’t really use MapReduce anymore”
Urs Hölzle, SVP Technical Infrastructure, Google
Google Cloud Platform Confidential & Proprietary
[Platform overview diagram. Categories shown: Big Data, ML, Management, Storage, Developer Tools, Mobile, Compute, Services, Networking]
The Big Data Lifecycle
Stages: Capture, Store, Process, Analyze
Products: Pub/Sub, Storage, Dataflow, BigQuery, SQL, Dataflow, Datastore, Cloud ML, BigTable
The Big Data Lifecycle
Stages: Capture, Store, Process, Analyze, Learn
Products: Pub/Sub, Storage, Dataflow, BigQuery, SQL, Dataflow, Datastore, Cloud ML, BigTable
Enterprise Big Data Architecture on Google
[Architecture diagram. Components shown: Your Data, PubSub, Dataflow, Cloud Storage, Bigtable, BigQuery, GCS-Hadoop Connector, unmanaged Hadoop on Compute Engine. Labels: Fast ETL, Regex, JSON, UDFs. Outputs shown: Applications + Reports, BI Tools, Spreadsheets, Coworkers]
Enterprise Big Data Architecture on Google
[Architecture diagram. Components shown: Your Data, PubSub, Dataflow, Cloud Storage, Bigtable, BigQuery, GCS-Hadoop Connector, unmanaged Hadoop on Compute Engine, managed Cloud Dataproc. Labels: Fast ETL, Regex, JSON, UDFs. Outputs shown: Applications + Reports, BI Tools, Spreadsheets, Coworkers]
http://blog.shinetech.com/2015/10/14/google-cloud-dataproc-and-the-17-minute-train-challenge/
Applications that can see, hear & understand
Google confidential | Do not distribute
Examples of applying ML
Input
Neural Networks
Output
Machine Learning Use Cases

Structured Data
Classification/Regression
● Customer Churn Analysis
● Product Diagnostics
● Forecasting
Recommendation
● Content Personalization
● Product X-Sells/Up-sells
Anomaly Detection
● Fraud Detection
● Asset Sensor Diagnostics
● Log Metric Anomalies

Unstructured Data
Image Analytics
● Identify damaged shipments
● Explicit Content Classification
● Identify “styles” in images
Text Analytics
● Call Center log analysis
● Language Identification
● Topic Classification
● Sentiment Analysis

The Spectrum of Machine Learning
Use pretrained models
Cloud Translate API, Cloud Vision API, Cloud Speech API
Or use your own data to train models
The Machine Learning Spectrum
[Diagram: a spectrum from Academic / research to Industry / applications]
TensorFlow (OSS SDK)
Cloud Machine Learning (Managed Infrastructure)
Cloud Datalab (Notebook experience)
Machine Learning APIs: Translate API, Vision API, Speech API
Google Cloud Vision API
● Detect faces, landmarks, logos, text, and more
● Perform sentiment analysis
● Straightforward REST API
● Works on a base64-encoded image
● Connects to Google Cloud Storage
● Returns label, score pair
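As a rough illustration of the "base64-encoded image in, label and score out" flow, here is a minimal sketch that only assembles the JSON body for a label-detection request to the `images:annotate` REST method and reads labels back from a response. The helper names and the sample response are hypothetical and made up for illustration; sending the request and authenticating are left out.

```python
import base64
import json


def build_label_request(image_bytes, max_results=5):
    """Build the JSON body for a label-detection request.

    Hypothetical helper: it assembles the payload only and never
    contacts the service.
    """
    return {
        "requests": [
            {
                # The image travels as base64-encoded text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def extract_labels(response):
    """Return (label, score) pairs from a parsed annotate response."""
    pairs = []
    for result in response.get("responses", []):
        for annotation in result.get("labelAnnotations", []):
            pairs.append((annotation["description"], annotation["score"]))
    return pairs


# Placeholder bytes stand in for a real image file.
body = build_label_request(b"not-a-real-image", max_results=3)
print(json.dumps(body, indent=2))

# Made-up response, used only to show the parsing step.
sample_response = {
    "responses": [
        {"labelAnnotations": [{"description": "cat", "score": 0.98}]}
    ]
}
print(extract_labels(sample_response))
```

In real use the body would be POSTed to the Vision endpoint with credentials attached; that part depends on the client setup and is not shown.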
Google Cloud Speech API
● Pass raw audio data and language
● Returns a transcript of the audio data
● Works across >80 languages
● Receive responses in streaming or non-streaming mode
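To make "pass raw audio data and language, get a transcript back" concrete, here is a minimal sketch of a non-streaming request body and of reading the transcript from a response. Field names follow the v1 REST `speech:recognize` method as an assumption and may differ in other API versions; the helper names, the audio bytes, and the sample response are hypothetical.

```python
import base64


def build_recognize_request(audio_bytes, language_code="en-US", sample_rate_hertz=16000):
    """Assemble a non-streaming recognize request body (no network call)."""
    return {
        "config": {
            "encoding": "LINEAR16",  # raw 16-bit PCM audio
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        # Raw audio is sent as base64-encoded text.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def extract_transcript(response):
    """Join the top alternative of each result into one transcript string."""
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0]["transcript"])
    return " ".join(parts)


# Placeholder bytes stand in for real audio samples.
request = build_recognize_request(b"\x00\x01" * 4, language_code="de-DE")
print(request["config"])

# Made-up response, used only to show the parsing step.
sample_response = {
    "results": [
        {"alternatives": [{"transcript": "hello world", "confidence": 0.92}]}
    ]
}
print(extract_transcript(sample_response))
```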
Speech API
● Enable voice interface to devices and applications
● Transcribe audio from stored media
● Multiple language support
● Access from mobile devices
Speech API Demo
“What are you sinking about?”
Google Cloud Translate API
● Translates text between thousands of language pairs
● Lets websites and programs integrate with Google Translate programmatically
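Programmatic integration can be sketched as building a request URL and reading the translated text out of the JSON reply. This is a minimal sketch assuming the v2 REST endpoint and its `q`, `source`, `target`, and `key` parameters; the helper names, the `YOUR_API_KEY` placeholder, and the sample response are hypothetical, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed v2 REST endpoint for the Translate API.
TRANSLATE_ENDPOINT = "https://translation.googleapis.com/language/translate/v2"


def build_translate_url(text, target, source=None, api_key="YOUR_API_KEY"):
    """Build a translate request URL; the API key is a placeholder."""
    params = [("q", text), ("target", target), ("key", api_key)]
    if source is not None:
        params.append(("source", source))
    return TRANSLATE_ENDPOINT + "?" + urlencode(params)


def extract_translations(response):
    """Return the translated strings from a parsed response."""
    translations = response.get("data", {}).get("translations", [])
    return [item["translatedText"] for item in translations]


url = build_translate_url("What are you thinking about?", target="de", source="en")
print(url)

# Made-up response, used only to show the parsing step.
sample_response = {"data": {"translations": [{"translatedText": "Hallo Welt"}]}}
print(extract_translations(sample_response))
```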
A brief look at TensorFlow
Largest Machine Learning repository on GitHub
Operates over tensors: n-dimensional arrays
Using a flow graph: data flow computation framework
● Train on CPUs, GPUs
● Run wherever you like (local, cloud, mobile)
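The "tensors flowing through a graph" idea can be illustrated without TensorFlow itself. The sketch below is a toy data flow graph in plain Python, not the TensorFlow API: every class and function name is hypothetical, and tensors are simplified to 2-D nested lists. Operations are first wired into a graph, and values are computed only when a node is run.

```python
class Node:
    """One operation in the graph; its inputs are other nodes."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.value = value


def constant(value):
    return Node("const", value=value)


def add(a, b):
    return Node("add", (a, b))


def matmul(a, b):
    return Node("matmul", (a, b))


def run(node, cache=None):
    """Evaluate a node by first evaluating the nodes it depends on."""
    cache = {} if cache is None else cache
    if id(node) in cache:
        return cache[id(node)]
    if node.op == "const":
        result = node.value
    else:
        x, y = (run(n, cache) for n in node.inputs)
        if node.op == "add":
            # Element-wise sum of two equally shaped 2-D arrays.
            result = [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]
        elif node.op == "matmul":
            # Row-by-column products; zip(*y) iterates over the columns of y.
            result = [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
                      for row in x]
        else:
            raise ValueError("unknown op: " + node.op)
    cache[id(node)] = result
    return result


# Building the graph computes nothing yet.
w = constant([[1, 2], [3, 4]])
x = constant([[5], [6]])
b = constant([[1], [1]])
y = add(matmul(w, x), b)

print(run(y))  # [[18], [40]]
```

Separating graph construction from execution is what lets a framework decide where each operation runs, for example on a CPU or a GPU.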
What Cloud Machine Learning Can Do
● Fully managed service
● Train using a custom TensorFlow graph
● Batch and online predictions, at scale
● Integrated Datalab experience
● Regression and classification tasks
Want more? → http://bit.ly/gcp16data
Thank You
Alex Osterloh [email protected]