Serverless, NoOps and DevOps are All Marketing Bunkus! Scott Thomson, Customer Solutions & Innovation DevOps Talks, MEL March 2018
Confidential & Proprietary Serverless = No Ops, Right??
✓ Scalable Ops
Confidential & Proprietary Shrinking The Scaling Point
Your Your Code Code Your Your Code Fns Your Code Code PaaS Containers Virtual Machines Dedicated Physical Servers
Confidential & Proprietary Agenda
● Introduction ● Planning process and Trade-offs ● High Level Architecture ● Architecture deep dives ○ Serving infrastructure ○ Analytics and Recommendation Engine ○ Supporting tooling
Confidential & Proprietary
Introduction Case Study: ProductNow Company profile
● Existing online retailer ● Large product catalog and high volume of transactions ● Looking to move to a new way to sell products
Confidential & Proprietary Initiative Priorities
● Minimize operational complexity ● Prioritize time-to-market over cost optimization ● Continuous Delivery ● Leverage existing product catalog and data where possible
Confidential & Proprietary ProductNow Bot
● Mobile and Chat first shopping experience ● Machine learning algorithms for recommending products and guiding users ● Conversational ● Globally available
Confidential & Proprietary Product Team
● ~20 devs assigned to work towards a v1.0 ● Polyglot developers pulled from existing teams ○ Node.js, Go, Java, Python ● Machine learning embedded in team ● DevOps team members to support software delivery and reliability
Confidential & Proprietary Key Decisions Application Architecture
Confidential & Proprietary Conway’s law
"organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations." — M. Conway[ref]
Confidential & Proprietary Conway’s law
Confidential & Proprietary ProductNow Org Structure
VP Eng or CTO
Web Lead Data Lead DevOps
Monitoring & Serving Chatbot ML Analytics Tooling Logging
Confidential & Proprietary High Level System Architecture
Chat Bot Web Mobile
Serving Infrastructure Tooling
CI/CD Monitoring Logging
Data Processing
Analytics Recommendation Engine
Confidential & Proprietary Choosing Technologies
Serving Data Stores Data Processing Tooling
Compute Engine Bigtable BigQuery Spinnaker Kubernetes Engine Datastore Pub/Sub Jenkins App Engine Cloud Storage Dataproc Docker Firebase CloudSQL Dataflow Stackdriver Cloud Functions Spanner ML Engine Terraform Firebase RTDB Genomics Deployment Manager Firestore Chef/Puppet/Ansible Forseti Vault CloudKMS
Confidential & Proprietary Cloud Functions Desktop & Mobile Web Firebase
Cloud Cloud Pub/Sub Dataflow
App
Serving Infrastructure BigQuery
Cloud Storage
Stackdriver Container Builder
Cloud Machine Cloud IAM Deployment Learning Manager Tooling Data Processing
Confidential & Proprietary Serving Infrastructure AI powered Chat
Chat Bot Web Mobile
Serving Infrastructure Tooling
CI/CD Monitoring Logging
Data Processing
Analytics Recommendation Engine
Confidential & Proprietary Serving web content trade-offs
App Engine App Engine Cloud Functions Firebase Hosting Standard Flex
Pros Pros Pros Pros Works for Simpler to manage Serverless firebase deploy *.googleplex.com Less opinionated Scales to zero Free HTTPS Global CDN Cons Cons Cons zero-cost Lots of infrastructure You're running VMs Opinionated that is not always well Does not scale to 0 Javascript-only Cons hidden Express.js is not easy Only for static content Opinionated
Confidential & Proprietary Recommendation: Firebase Hosting Serving web content trade-offs
App Engine App Engine Cloud Functions Firebase Hosting Standard Flex
Pros Pros Pros Pros Works for Simpler to manage Serverless firebase deploy *.googleplex.com Less opinionated Scales to zero Free HTTPS Global CDN Cons Cons Cons zero-cost Lots of infrastructure You're running VMs Opinionated that is not always well Does not scale to 0 Javascript-only Cons hidden Express.js is not easy Only for static content Opinionated
Confidential & Proprietary Varying content If we have a static site, how do we show users their own data? Exactly like we do on an iOS + Android application, client-side rendering.
● Rich client interactions without reloading the page ● A bit slower to load initially but faster from there ● Works with statically hosted content ● Not great for search engines
Confidential & Proprietary Client-side interaction trade-offs
Traditional API Firebase Realtime Cloud Firestore Database
Pros Pros Pros Works out of the box It can be whatever you want Works out of the box
Scales horizontally You only write app code
Cons Cons You have to scale it, secure it, Cons It's in beta make it fancy Scalability is limited
Confidential & Proprietary Recommendation: Cloud Firestore Client-side interaction trade-offs
Traditional API Firebase Realtime Cloud Firestore Database
Pros Pros Pros Works out of the box It can be whatever you want Works out of the box
Scales horizontally You only write app code
Cons Cons You have to scale it, secure it, Cons It's in beta make it fancy Scalability is limited
Confidential & Proprietary Cross-platform SDKs Firebase SDKs are out of the box for iOS, Android, and many more platforms.
All of these work with your existing Firestore back-end. You don't need to build client-side API infrastructure, you just build your app + let Firebase take care of auth, retries, encoding, and all the complexity.
Now our iOS and web codebases, while in different languages look very similar.
Confidential & Proprietary Real-time
Confidential & Proprietary Effortless offline persistence
Confidential & Proprietary What about our bot... It’s just another client!
● Listen to Firestore in the same manner? ○ Firebase Admin SDK is available for “server-side” use
● Mediate with pub/sub and Cloud Functions ○ SDKs have different capabilities based on language ○ Cloud Functions can listen with “wildcards” in their path
Confidential & Proprietary Confidential & Proprietary To data-processing pipeline WEBHOOK
Confidential & Proprietary Progressive Web Application Web Applications which deliver an app-like experience to their end-users.
● Offline-enabled ● Responsive ● Re-engageable ● Installable ● Linkable
Confidential & Proprietary Firebase Hosting + Cloud Firestore
● Static content, serve with Firebase Hosting ○ Load data client-side, just like native apps ○ Firebase SDKs give us a lot for “free”
● Firebase has server (admin) SDKs for clients like the chatbot ○ The chatbot is just “another client”, depending on language choice and data-structure pub/sub might be best choice
● Cloud Function triggers + pub/sub to hook up to rest of infrastructure
Confidential & Proprietary Data Processing Data Processing Architecture
Batch
Data Processing
Analytics Recommendation Engine
Streaming
Confidential & Proprietary Batch
Input from on-prem Input from on-prem Input from listing data, and data warehouse data warehouse analytics data
Product Catalog Product Catalog Image Recommendation Listing Ingest Ingest Engine Training
Output to cloud Output to cloud Output trained model database object store
Confidential & Proprietary Microbatch and Sync
High-level architecture
On-Prem Systems
Compute Data drop off Processing Online Database
Storage
Confidential & Proprietary GCE Filer
Pros: ● POSIX Compatible ● Integrates with many processing solutions Solution Trade-offs Cons: ● Self-managed ● Scaling complexity ● Always-on and billed Technologies for data drop off
Cloud SQL
Pros: ● Ideal for mirroring compatible SQL engines. Data drop off ● Transactional ● Hosted service Cons: GCE Filer ● Not ideal for data drop off, data management overhead. Compute Engine ● Scaling complexity ● Always-on and billed
Cloud SQL Cloud Storage
Pros: ● Serverless, pay as you go Cloud ● High throughput Storage ● Life cycle policies Cons: ● Object store not POSIX ● Object write consistency
Confidential & Proprietary GCE Filer
Pros: ● POSIX Compatible ● Integrates with many processing solutions Solution Trade-offs Cons: ● Self-managed ● Scaling complexity ● Always-on and billed Technologies for data drop off
Cloud SQL
Pros: ● Ideal for mirroring compatible SQL engines. Data drop off ● Transactional ● Hosted service Cons: GCE Filer ● Not ideal for data drop off, data management overhead. Compute Engine ● Scaling complexity ● Always-on and billed
Cloud SQL Cloud Storage
Pros: ● Serverless, pay as you go Cloud ● High throughput Storage ● Life cycle policies Cons: ● Object store not POSIX ● Object write consistency
Confidential & Proprietary Hadoop on GCE
Pros: ● Easy to move existing pipelines and ETL workloads. ● Integrates with many data sources Solution Trade-offs Cons: ● Self-managed ● Scaling complexity ● Always-on and billed Technologies for Data Processing
Cloud Dataflow Data processing Pros: ● Managed processing pipelines and auto-scaling. Hadoop ● Open source Apache Beam for Batch and Streaming Compute Engine processing models ● Can run locally, hosted or on Spark. Cons: ● New programming model if not familiar with Beam. Cloud ● Only supports Java and Python SDKs Dataflow
Cloud Dataproc
Cloud Pros: Dataproc ● Fully managed Apache Spark and Hadoop. ● Can run dataflow pipelines. ● Integrates with many data sources. Cons: ● Complex streaming model.
Confidential & Proprietary Hadoop on GCE
Pros: ● Easy to move existing pipelines and ETL workloads. ● Integrates with many data sources Solution Trade-offs Cons: ● Self-managed ● Scaling complexity ● Always-on and billed Technologies for Data Processing
Cloud Dataflow Data processing Pros: ● Managed processing pipelines and auto-scaling. Hadoop ● Open source Apache Beam for Batch and Streaming Compute Engine processing models ● Can run locally, hosted or on Spark. Cons: ● New programming model if not familiar with Beam. Cloud ● Only supports Java and Python SDKs Dataflow
Cloud Dataproc Cloud Dataproc Pros: ● Fully managed Apache Spark and Hadoop. ● Can run dataflow pipelines. ● Integrates with many data sources. Cons: ● Complex streaming model.
Confidential & Proprietary Streaming
Clickstream Data
Input from application analytics telemetry
Streaming Data Processing
Output to clickstream database
Confidential & Proprietary Solution Trade-offs Technologies for streaming data
Data Ingestion Data Processing
Kafka on GCE Cloud Cloud Cloud Compute Engine Pub/Sub Dataflow Dataproc
Cloud Dataflow Cloud Dataproc Kafka on GCE Cloud Pub/Sub Pros: Pros: Pros: Pros: ● Handles unordered data ● Familiar Spark SDK ● Simple Producer Interface ● Managed global service ● Managed and scalable ● Managed service ● High throughput ● REST API ● Integrations with Pub/Sub ● Lots of integrations ● Ordering within partitions ● Pay as you go and Kafka Cons: Cons: Cons: Cons: ● Complex streaming model ● No REST API ● No ordering of events ● New model for traditional ● No event time ordering ● Self-managed Spark Streaming devs. ● Scaling complexity
Confidential & Proprietary Solution Trade-offs Technologies for streaming data
Data Ingestion Data Processing
Kafka on GCE Cloud Cloud Cloud Compute Engine Pub/Sub Dataflow Dataproc
Cloud Dataflow Cloud Dataproc Kafka on GCE Cloud Pub/Sub Pros: Pros: Pros: Pros: ● Handles unordered data ● Familiar Spark SDK ● Simple Producer Interface ● Managed global service ● Managed and scalable ● Managed service ● High throughput ● REST API ● Integrations with Pub/Sub ● Lots of integrations ● Ordering within partitions ● Pay as you go and Kafka Cons: Cons: Cons: Cons: ● Complex streaming model ● No REST API ● No ordering of events ● New model for traditional ● No event time ordering ● Self-managed Spark Streaming devs. ● Scaling complexity
Confidential & Proprietary Batch and Streaming Architecture
Listing Data Cloud Firestore Mobile Users
Cloud Cloud Clickstream Data On-Prem Systems Pub/Sub Dataflow Cloud BigQuery
Compute Cloud Storage
Storage
Confidential & Proprietary Recommendation Engine Trade-offs
Content Based Collaborative Knowledge Filtering Based
Select most Behavior and Ask user for popular products pattern driven preferences
Confidential & Proprietary Recommendation Engine Trade-offs
Content Based Collaborative Knowledge Filtering Based
Select most Behavior and Ask user for popular products pattern driven preferences
Confidential & Proprietary Weighted Alternating Least Squares
● Method that uses large matrix factorization to derive similarities between sparse datasets. ● Useful for recommendation systems with very little initial user data like reviews or click data. ● Easy to run in distributed environment and optimize model performance with Tensorflow.
Confidential & Proprietary Solution Trade-offs Technologies for Machine Learning
Model Training Model Serving
Tensorflow/MXNet/Caffe Cloud Machine App Engine Cloud Machine Compute Engine Learning Flexible Learning
Tensorflow/MXNet/Caffe on GCE Cloud Machine Learning Custom Serving on App Engine Cloud Machine Learning
Pros: Pros: Pros: Pros: ● Control over framework. ● Fully managed Tensorflow ● Full control over serving ● Hosted solution, managed ● Good for existing models ● Automatic config of API. API for Tensorflow models. in other SDKs. GPU/CPU. ● Security with Endpoints. ● Integration with Cloud ML Cons: ● Integrated monitoring ● Easy to deploy. Cons: ● Self-hosted, operational Cons: Cons: ● Only supports Tensorflow overhead. ● Only supports Tensorflow ● Different workflow than ● API more opinionated traditional deployments
Confidential & Proprietary Solution Trade-offs Technologies for Machine Learning
Model Training Model Serving
Tensorflow/MXNet/Caffe Cloud Machine App Engine Cloud Machine Compute Engine Learning Flexible Learning
Tensorflow/MXNet/Caffe on GCE Cloud Machine Learning Custom Serving on App Engine Cloud Machine Learning
Pros: Pros: Pros: Pros: ● Control over framework. ● Fully managed Tensorflow ● Full control over serving ● Hosted solution, managed ● Good for existing models ● Automatic config of API. API for Tensorflow models. in other SDKs. GPU/CPU. ● Security with Endpoints. ● Integration with Cloud ML Cons: ● Integrated monitoring ● Easy to deploy. Cons: ● Self-hosted, operational Cons: Cons: ● Only supports Tensorflow overhead. ● Only supports Tensorflow ● Different workflow than ● API more opinionated traditional deployments
Confidential & Proprietary Recommendation Engine Architecture
Batch Processing Machine Learning
Listing Data Tensorflow Training Cloud Firestore Cloud Machine Learning
Cloud Cloud Dataproc Storage
Listing Data Tensorflow Model Serving Cloud Firestore App Engine Flexible
Confidential & Proprietary Data Processing Architecture
Listing/User Data Cloud Cloud Cloud Cloud Firestore Functions Pub/Sub Dataproc Mobile Users
Cloud Clickstream Data Dataflow Cloud BigQuery On-Prem Systems
Compute Cloud Tensorflow Training Storage Cloud Machine Learning
Tensorflow Model Serving App Engine Flexible Storage
Confidential & Proprietary Tooling Some definitions Proprietary + Confidential
Continuous Integration (CI) Merge all developer working copies to a shared mainline several times a day
Continuous Delivery (CD) Ensure that your code can be shipped at any time. Requires CI.
Continuous Deployment (also CD) Every change is automatically deployed to production. Requires Continuous Delivery.
Source: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis non erat sem CI/CD in practice Proprietary + Confidential
Workflows will vary by branch management, staging envs, tests, platform, etc
git build + deploy to trigger int tests CI flow unit test test env Local env, Feature branch/PR CD
build + artifact bake/ deploy to trigger int tests trigger sys tests unit test registry package qa/staging/prod Trunk, Release branch/tag
Source: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis non erat sem CI/CD Tools Proprietary + Confidential
Source: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis non erat sem CI workflow with Container Builder
triggers build and test runner Different stage types, can run any number, in any sequence, and in parallel GCR
GitHub BB CSR
GCS custom bazel non-bazel container jobs CLI scripts builds builds commands manual CI systems current Result storage roadmap + dashboard
cron webhook registry Build status
PR Repo 3rd party dependency cloud Proprietary + Confidential Spinnaker: orchestrate deployment pipelines
Spinnaker Pipeline to promote and release image to production
Find image Deploy to prod Scale down Tag source and Destroy Start from test (red/black) Smoke test old prod manual approval old prod
Wait 30 mins Wait 2 hrs
Source: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis non erat sem Spinnaker integrations
Security
Build/Bake Providers
Triggers Monitoring
Storage Notification
Stages
Confidential & Proprietary Solution Trade-offs
Pro - Hosted Pro - Easy to get started - Intuitive UI, easy learning curve - Integrations with many deployment Con platforms - Only simple deployment flows are - Streamlined for complex deployments possible - Single tool for team to learn Con - No managed version (yet) - Does not support Firebase
Confidential & Proprietary Solution Trade-offs Recommendation: Container Builder
Pro - Hosted Pro - Easy to get started - Intuitive UI, easy learning curve - Integrations with many deployment Con platforms - Only simple deployment flows are - Streamlined for complex deployments possible - Single tool for team to learn Con - No managed version (yet) - Does not support Firebase
Confidential & Proprietary Container Builder build flow
GCS Logging/Monitoring
Key Questions: Which cloud services? How many resources? What key applications? How much log data do you generate? How long do you retain?
Pro Pro Pro - Native to GCP - Intuitive UI, easy learning curve - Familiar, industry standard - Integrated monitoring/logging - More integrations - Advanced log search functionality - Fully managed, works at scale - Cloud-based, managed service - Hybrid: 1 UI for on-prem/cloud - Trace, Error Reporting, Debug - Splunk Add-On for GCE Con Con Con - Monitoring only, need separate - Logging only - Switching costs, unfamiliarity solution for logging - Self-hosted - Higher licensing cost - Higher TCO cost
Confidential & Proprietary Logging/Monitoring Recommendation: Stackdriver
Key Questions: Which cloud services? How many resources? What key applications? How much log data do you generate? How long do you retain?
Pro Pro Pro - Native to GCP - Intuitive UI, easy learning curve - Familiar, industry standard - Integrated monitoring/logging - More integrations - Advanced log search functionality - Fully managed, works at scale - Cloud-based, managed service - Hybrid: 1 UI for on-prem/cloud - Trace, Error Reporting, Debug - Splunk Add-On for GCE Con Con Con - Monitoring only, need separate - Logging only - Switching costs, unfamiliarity solution for logging - Self-hosted - Higher licensing cost - Higher TCO cost
Confidential & Proprietary IAM/Admin
Key questions: What is the source of truth for your user credentials today? Require separate IT and Dev access?
Using On-Prem LDAP Google Authentication Using SSO (e.g. Okta)
Pro Pro Pro - Can provision accounts (one-way) with - No LDAP or sync to manage - Integrates with other tools GCDS - no programming - Google-stored credentials: Google - No additional authentication to set up - Can sync passwords (PasswordSync) manages authentication, stores - Can enforce MFA, custom security the passwords controls Con - Google two-factor authentication - Have to manage own LDAP Con - Requires LDAP admin skills + access Con - Single point of failure Google-only, does not integrate with other - Additional system to manage tools for SSO - No advanced Google security features
Confidential & Proprietary Recommendation: Cloud Identity, IAM/Admin Using on-prem LDAP with Google Auth
Key questions: What is the source of truth for your user credentials today? Require separate IT and Dev access?
Using on-prem LDAP Moving to Google Using SSO (e.g. Okta) Accounts
Pro Pro Pro - Can provision accounts (one-way) with - No LDAP or sync to manage - Integrates with other tools GCDS - no programming - Google-stored credentials: Google - No additional authentication to set up - Can sync passwords (PasswordSync) manages authentication, stores - Can enforce MFA, custom security the passwords controls Con - Google two-factor authentication - Have to manage own LDAP Con - Requires LDAP admin skills + access Con - Single point of failure Google-only, does not integrate with other - Additional system to manage tools for SSO - No advanced Google security features
Confidential & Proprietary Deployment Key Questions: Are you on multiple platforms today? Do you have an Ops team to run a self-hosted service? Which deployment tools do you use today?
Deployment Manual scripting Manager
Pro Pro Pro - Integrated with GCP - UI, IAM, labels - Multi-cloud - Easier to get started - Managed service, No Ops - Familiarity - No new system to learn or install - New GCP functionality arrives - Declarative, immutable immediately Con - Declarative, immutable Con - Imperative, mutable - Non-native to GCP - Error-prone, toil intensive Con - Lag time to get new GCP functionality - Cannot roll back easily - GCP only - Self-hosted -> manage + pay for a server - Hard to automate - Need a separate platform for multiple - Not supported by GCP, no SLA clouds
Confidential & Proprietary Deployment Recommendation: Deployment Manager Key Questions: Are you on multiple platforms today? Do you have an Ops team to run a self-hosted service? Which deployment tools do you use today?
Deployment Manual scripting Manager
Pro Pro Pro - Integrated with GCP - UI, IAM, labels - Multi-cloud - Easier to get started - Managed service, No Ops - Familiarity - No new system to learn or install - New GCP functionality arrives - Declarative, immutable immediately Con - Declarative, immutable Con - Imperative, mutable - Non-native to GCP - Error-prone, toil intensive Con - Lag time to get new GCP functionality - Cannot roll back easily - GCP only - Self-hosted -> manage + pay for a server - Hard to automate - Need a separate platform for multiple - Not supported by GCP, no SLA clouds
Confidential & Proprietary Strategy - Forseti Open Source Toolkit
Google’s own cloud security tools; built and shared with everyone
Inventory Policy Scanning Enforcement
Build and store an inventory Find policy violations Apply changes
- Projects - Organization level - Inbound GCE firewall - Roles - Project level - ... - Members - .... - ….
Proprietary + Confidential © 2017 Google Inc. All rights reserved. Google and the Google logo are trademarks of Google Inc. All other company and product names may be trademarks of the respective companies with which they are associated. Security
Key Questions: Do you have the skill set and resources to write your own security tooling flow? Do you need custom tooling that isn't provided by Forseti?
Forseti Custom Tooling No Tooling
Pro Pro Pro - Open Source - based on Google's - More customizable No work to be done own tools, shared with everyone - Can still use Google managed - Integrated suite services (Audit logs -> Pub/Sub -> Con - Google (and community) investing Cloud Functions) Pretty insecure... heavily - new features on the way Con Con Have to code, configure and deploy Less customization yourself
Confidential & Proprietary Security Recommendation: Forseti
Key Questions: Do you have the skill set and resources to write your own security tooling flow? Do you need custom tooling that isn't provided by Forseti?
Forseti Custom Tooling No Tooling
Pro Pro Pro - Open Source - based on Google's - More customizable No work to be done own tools, shared with everyone - Can still use Google managed - Integrated suite services (Audit logs -> Pub/Sub -> Con - Google (and community) investing Cloud Functions) Pretty insecure... heavily - new features on the way Con Con Have to code, configure and deploy Less customization yourself
Confidential & Proprietary NoDev Google is an AI company
Google internal projects containing Brain Model Used across products:
AI First Unique project directories
Time
Confidential & Proprietary State of the Industry: Lack of Expertise
Very few users today can create a custom ML model. Data Scientists
To democratize AI, Developers we need to make AI accessible to millions more
The Reality of ML: Complex & Time Intensive
DATA TUNE ML MODEL PREPROCESSING ML MODEL DESIGN EVALUATE DEPLOY UPDATE PARAMETERS Cloud AutoML - automatically create Machine Learning Models
DATA TUNE ML MODEL PREPROCESSING ML MODEL DESIGN EVALUATE DEPLOY UPDATE PARAMETERS Google’s Machine Learning Ecosystem
Perception APIs Auto ML Training & Prediction Google-trained models Easy to use, composable, End to End Platform for for your app learn to learn technology your data & models
App Developers Analysts Data Engineers / Data Scientists
Using Machine Learning to Explore Neural Network Architecture - Quoc Le & Barret Zoph, Research Scientists, Google Brain team Cloud AutoML Vision
Cloud AutoML Handbag Shoe Hat
g.co/next18
July 24-27, 2018 San Francisco Serverless = No Ops, Right??
✓ Scalable Ops
Confidential & Proprietary Shrinking The Scaling Point
Your Your Code Code Your Your Code Fns Your Code Code PaaS Containers Virtual Machines Dedicated Physical Servers
Confidential & Proprietary Google Melbourne [email protected]