<<

Serverless, NoOps and DevOps are All Marketing Bunkus! Scott Thomson, Customer Solutions & Innovation DevOps Talks, MEL March 2018

Confidential & Proprietary Serverless = No Ops, Right??

✓ Scalable Ops

Confidential & Proprietary Shrinking The Scaling Point

Your Your Code Code Your Your Code Fns Your Code Code PaaS Containers Virtual Machines Dedicated Physical Servers

Confidential & Proprietary Agenda

● Introduction ● Planning process and Trade-offs ● High Level Architecture ● Architecture deep dives ○ Serving infrastructure ○ Analytics and Recommendation Engine ○ Supporting tooling

Confidential & Proprietary

Introduction Case Study: ProductNow Company profile

● Existing online retailer ● Large product catalog and high volume of transactions ● Looking to move to a new way to sell products

Confidential & Proprietary Initiative Priorities

● Minimize operational complexity ● Prioritize time-to-market over cost optimization ● Continuous Delivery ● Leverage existing product catalog and data where possible

Confidential & Proprietary ProductNow Bot

● Mobile and Chat first shopping experience ● algorithms for recommending products and guiding users ● Conversational ● Globally available

Confidential & Proprietary Product Team

● ~20 devs assigned to work towards a v1.0 ● Polyglot developers pulled from existing teams ○ Node.js, Go, Java, Python ● Machine learning embedded in team ● DevOps team members to support software delivery and reliability

Confidential & Proprietary Key Decisions Application Architecture

Confidential & Proprietary Conway’s law

"organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations." — M. Conway[ref]

Confidential & Proprietary Conway’s law

Confidential & Proprietary ProductNow Org Structure

VP Eng or CTO

Web Lead Data Lead DevOps

Monitoring & Serving Chatbot ML Analytics Tooling Logging

Confidential & Proprietary High Level System Architecture

Chat Bot Web Mobile

Serving Infrastructure Tooling

CI/CD Monitoring Logging

Data Processing

Analytics Recommendation Engine

Confidential & Proprietary Choosing Technologies

Serving Data Stores Data Processing Tooling

Compute Engine BigQuery Spinnaker Engine Datastore Pub/Sub Jenkins App Engine Cloud Storage Dataproc Docker CloudSQL Dataflow Stackdriver Cloud Functions ML Engine Terraform Firebase RTDB Genomics Deployment Manager Firestore Chef/Puppet/Ansible Forseti Vault CloudKMS

Confidential & Proprietary Cloud Functions Desktop & Mobile Web Firebase

Cloud Cloud Pub/Sub Dataflow

App

Serving Infrastructure BigQuery

Cloud Storage

Stackdriver Container Builder

Cloud Machine Cloud IAM Deployment Learning Manager Tooling Data Processing

Confidential & Proprietary Serving Infrastructure AI powered Chat

Chat Bot Web Mobile

Serving Infrastructure Tooling

CI/CD Monitoring Logging

Data Processing

Analytics Recommendation Engine

Confidential & Proprietary Serving web content trade-offs

App Engine App Engine Cloud Functions Firebase Hosting Standard Flex

Pros Pros Pros Pros Works for Simpler to manage Serverless firebase deploy *..com Less opinionated Scales to zero Free HTTPS Global CDN Cons Cons Cons zero-cost Lots of infrastructure You're running VMs Opinionated that is not always well Does not scale to 0 Javascript-only Cons hidden Express.js is not easy Only for static content Opinionated

Confidential & Proprietary Recommendation: Firebase Hosting Serving web content trade-offs

App Engine App Engine Cloud Functions Firebase Hosting Standard Flex

Pros Pros Pros Pros Works for Simpler to manage Serverless firebase deploy *.googleplex.com Less opinionated Scales to zero Free HTTPS Global CDN Cons Cons Cons zero-cost Lots of infrastructure You're running VMs Opinionated that is not always well Does not scale to 0 Javascript-only Cons hidden Express.js is not easy Only for static content Opinionated

Confidential & Proprietary Varying content If we have a static site, how do we show users their own data? Exactly like we do on an iOS + Android application, client-side rendering.

● Rich client interactions without reloading the page ● A bit slower to load initially but faster from there ● Works with statically hosted content ● Not great for search engines

Confidential & Proprietary Client-side interaction trade-offs

Traditional API Firebase Realtime Cloud Firestore Database

Pros Pros Pros Works out of the box It can be whatever you want Works out of the box

Scales horizontally You only write app code

Cons Cons You have to scale it, secure it, Cons It's in beta it fancy Scalability is limited

Confidential & Proprietary Recommendation: Cloud Firestore Client-side interaction trade-offs

Traditional API Firebase Realtime Cloud Firestore Database

Pros Pros Pros Works out of the box It can be whatever you want Works out of the box

Scales horizontally You only write app code

Cons Cons You have to scale it, secure it, Cons It's in beta make it fancy Scalability is limited

Confidential & Proprietary Cross-platform SDKs Firebase SDKs are out of the box for iOS, Android, and many more platforms.

All of these work with your existing Firestore back-end. You don't need to build client-side API infrastructure, you just build your app + let Firebase take care of auth, retries, encoding, and all the complexity.

Now our iOS and web codebases, while in different languages look very similar.

Confidential & Proprietary Real-time

Confidential & Proprietary Effortless offline persistence

Confidential & Proprietary What about our bot... It’s just another client!

● Listen to Firestore in the same manner? ○ Firebase Admin SDK is available for “server-side” use

● Mediate with pub/sub and Cloud Functions ○ SDKs have different capabilities based on language ○ Cloud Functions can listen with “wildcards” in their path

Confidential & Proprietary Confidential & Proprietary To data-processing pipeline WEBHOOK

Confidential & Proprietary Progressive Web Application Web Applications which deliver an app-like experience to their end-users.

● Offline-enabled ● Responsive ● Re-engageable ● Installable ● Linkable

Confidential & Proprietary Firebase Hosting + Cloud Firestore

● Static content, serve with Firebase Hosting ○ Load data client-side, just like native apps ○ Firebase SDKs give us a lot for “free”

● Firebase has server (admin) SDKs for clients like the chatbot ○ The chatbot is just “another client”, depending on language choice and data-structure pub/sub might be best choice

● Cloud Function triggers + pub/sub to hook up to rest of infrastructure

Confidential & Proprietary Data Processing Data Processing Architecture

Batch

Data Processing

Analytics Recommendation Engine

Streaming

Confidential & Proprietary Batch

Input from on-prem Input from on-prem Input from listing data, and data warehouse data warehouse analytics data

Product Catalog Product Catalog Image Recommendation Listing Ingest Ingest Engine Training

Output to cloud Output to cloud Output trained model database object store

Confidential & Proprietary Microbatch and Sync

High-level architecture

On-Prem Systems

Compute Data drop off Processing Online Database

Storage

Confidential & Proprietary GCE Filer

Pros: ● POSIX Compatible ● Integrates with many processing solutions Solution Trade-offs Cons: ● Self-managed ● Scaling complexity ● Always-on and billed Technologies for data drop off

Cloud SQL

Pros: ● Ideal for mirroring compatible SQL engines. Data drop off ● Transactional ● Hosted service Cons: GCE Filer ● Not ideal for data drop off, data management overhead. Compute Engine ● Scaling complexity ● Always-on and billed

Cloud SQL Cloud Storage

Pros: ● Serverless, pay as you go Cloud ● High throughput Storage ● Life cycle policies Cons: ● Object store not POSIX ● Object write consistency

Confidential & Proprietary GCE Filer

Pros: ● POSIX Compatible ● Integrates with many processing solutions Solution Trade-offs Cons: ● Self-managed ● Scaling complexity ● Always-on and billed Technologies for data drop off

Cloud SQL

Pros: ● Ideal for mirroring compatible SQL engines. Data drop off ● Transactional ● Hosted service Cons: GCE Filer ● Not ideal for data drop off, data management overhead. Compute Engine ● Scaling complexity ● Always-on and billed

Cloud SQL Cloud Storage

Pros: ● Serverless, pay as you go Cloud ● High throughput Storage ● Life cycle policies Cons: ● Object store not POSIX ● Object write consistency

Confidential & Proprietary Hadoop on GCE

Pros: ● Easy to move existing pipelines and ETL workloads. ● Integrates with many data sources Solution Trade-offs Cons: ● Self-managed ● Scaling complexity ● Always-on and billed Technologies for Data Processing

Cloud Dataflow Data processing Pros: ● Managed processing pipelines and auto-scaling. Hadoop ● Open source for Batch and Streaming Compute Engine processing models ● Can run locally, hosted or on Spark. Cons: ● New programming model if not familiar with Beam. Cloud ● Only supports Java and Python SDKs Dataflow

Cloud Dataproc

Cloud Pros: Dataproc ● Fully managed Apache Spark and Hadoop. ● Can run dataflow pipelines. ● Integrates with many data sources. Cons: ● Complex streaming model.

Confidential & Proprietary Hadoop on GCE

Pros: ● Easy to move existing pipelines and ETL workloads. ● Integrates with many data sources Solution Trade-offs Cons: ● Self-managed ● Scaling complexity ● Always-on and billed Technologies for Data Processing

Cloud Dataflow Data processing Pros: ● Managed processing pipelines and auto-scaling. Hadoop ● Open source Apache Beam for Batch and Streaming Compute Engine processing models ● Can run locally, hosted or on Spark. Cons: ● New programming model if not familiar with Beam. Cloud ● Only supports Java and Python SDKs Dataflow

Cloud Dataproc Cloud Dataproc Pros: ● Fully managed Apache Spark and Hadoop. ● Can run dataflow pipelines. ● Integrates with many data sources. Cons: ● Complex streaming model.

Confidential & Proprietary Streaming

Clickstream Data

Input from application analytics telemetry

Streaming Data Processing

Output to clickstream database

Confidential & Proprietary Solution Trade-offs Technologies for streaming data

Data Ingestion Data Processing

Kafka on GCE Cloud Cloud Cloud Compute Engine Pub/Sub Dataflow Dataproc

Cloud Dataflow Cloud Dataproc Kafka on GCE Cloud Pub/Sub Pros: Pros: Pros: Pros: ● Handles unordered data ● Familiar Spark SDK ● Simple Producer Interface ● Managed global service ● Managed and scalable ● Managed service ● High throughput ● REST API ● Integrations with Pub/Sub ● Lots of integrations ● Ordering within partitions ● Pay as you go and Kafka Cons: Cons: Cons: Cons: ● Complex streaming model ● No REST API ● No ordering of events ● New model for traditional ● No event time ordering ● Self-managed Spark Streaming devs. ● Scaling complexity

Confidential & Proprietary Solution Trade-offs Technologies for streaming data

Data Ingestion Data Processing

Kafka on GCE Cloud Cloud Cloud Compute Engine Pub/Sub Dataflow Dataproc

Cloud Dataflow Cloud Dataproc Kafka on GCE Cloud Pub/Sub Pros: Pros: Pros: Pros: ● Handles unordered data ● Familiar Spark SDK ● Simple Producer Interface ● Managed global service ● Managed and scalable ● Managed service ● High throughput ● REST API ● Integrations with Pub/Sub ● Lots of integrations ● Ordering within partitions ● Pay as you go and Kafka Cons: Cons: Cons: Cons: ● Complex streaming model ● No REST API ● No ordering of events ● New model for traditional ● No event time ordering ● Self-managed Spark Streaming devs. ● Scaling complexity

Confidential & Proprietary Batch and Streaming Architecture

Listing Data Cloud Firestore Mobile Users

Cloud Cloud Clickstream Data On-Prem Systems Pub/Sub Dataflow Cloud BigQuery

Compute Cloud Storage

Storage

Confidential & Proprietary Recommendation Engine Trade-offs

Content Based Collaborative Knowledge Filtering Based

Select most Behavior and Ask user for popular products pattern driven preferences

Confidential & Proprietary Recommendation Engine Trade-offs

Content Based Collaborative Knowledge Filtering Based

Select most Behavior and Ask user for popular products pattern driven preferences

Confidential & Proprietary Weighted Alternating Least Squares

● Method that uses large matrix factorization to derive similarities between sparse datasets. ● Useful for recommendation systems with very little initial user data like reviews or click data. ● Easy to run in distributed environment and optimize model performance with Tensorflow.

Confidential & Proprietary Solution Trade-offs Technologies for Machine Learning

Model Training Model Serving

Tensorflow/MXNet/Caffe Cloud Machine App Engine Cloud Machine Compute Engine Learning Flexible Learning

Tensorflow/MXNet/Caffe on GCE Cloud Machine Learning Custom Serving on App Engine Cloud Machine Learning

Pros: Pros: Pros: Pros: ● Control over framework. ● Fully managed Tensorflow ● Full control over serving ● Hosted solution, managed ● Good for existing models ● Automatic config of API. API for Tensorflow models. in other SDKs. GPU/CPU. ● Security with Endpoints. ● Integration with Cloud ML Cons: ● Integrated monitoring ● Easy to deploy. Cons: ● Self-hosted, operational Cons: Cons: ● Only supports Tensorflow overhead. ● Only supports Tensorflow ● Different workflow than ● API more opinionated traditional deployments

Confidential & Proprietary Solution Trade-offs Technologies for Machine Learning

Model Training Model Serving

Tensorflow/MXNet/Caffe Cloud Machine App Engine Cloud Machine Compute Engine Learning Flexible Learning

Tensorflow/MXNet/Caffe on GCE Cloud Machine Learning Custom Serving on App Engine Cloud Machine Learning

Pros: Pros: Pros: Pros: ● Control over framework. ● Fully managed Tensorflow ● Full control over serving ● Hosted solution, managed ● Good for existing models ● Automatic config of API. API for Tensorflow models. in other SDKs. GPU/CPU. ● Security with Endpoints. ● Integration with Cloud ML Cons: ● Integrated monitoring ● Easy to deploy. Cons: ● Self-hosted, operational Cons: Cons: ● Only supports Tensorflow overhead. ● Only supports Tensorflow ● Different workflow than ● API more opinionated traditional deployments

Confidential & Proprietary Recommendation Engine Architecture

Batch Processing Machine Learning

Listing Data Tensorflow Training Cloud Firestore Cloud Machine Learning

Cloud Cloud Dataproc Storage

Listing Data Tensorflow Model Serving Cloud Firestore App Engine Flexible

Confidential & Proprietary Data Processing Architecture

Listing/User Data Cloud Cloud Cloud Cloud Firestore Functions Pub/Sub Dataproc Mobile Users

Cloud Clickstream Data Dataflow Cloud BigQuery On-Prem Systems

Compute Cloud Tensorflow Training Storage Cloud Machine Learning

Tensorflow Model Serving App Engine Flexible Storage

Confidential & Proprietary Tooling Some definitions Proprietary + Confidential

Continuous Integration (CI) Merge all developer working copies to a shared mainline several times a day

Continuous Delivery (CD) Ensure that your code can be shipped at any time. Requires CI.

Continuous Deployment (also CD) Every change is automatically deployed to production. Requires Continuous Delivery.

Source: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis non erat sem CI/CD in practice Proprietary + Confidential

Workflows will vary by branch management, staging envs, tests, platform, etc

git build + deploy to trigger int tests CI flow unit test test env Local env, Feature branch/PR CD

build + artifact bake/ deploy to trigger int tests trigger sys tests unit test registry package qa/staging/prod Trunk, Release branch/tag

Source: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis non erat sem CI/CD Tools Proprietary + Confidential

Source: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis non erat sem CI workflow with Container Builder

triggers build and test runner Different stage types, can run any number, in any sequence, and in parallel GCR

GitHub BB CSR

GCS custom non-bazel container jobs CLI scripts builds builds commands manual CI systems current Result storage roadmap + dashboard

cron webhook registry Build status

PR Repo 3rd party dependency cloud Proprietary + Confidential Spinnaker: orchestrate deployment pipelines

Spinnaker Pipeline to promote and release image to production

Find image Deploy to prod Scale down Tag source and Destroy Start from test (red/black) Smoke test old prod manual approval old prod

Wait 30 mins Wait 2 hrs

Source: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis non erat sem Spinnaker integrations

Security

Build/Bake Providers

Triggers Monitoring

Storage Notification

Stages

Confidential & Proprietary Solution Trade-offs

Pro - Hosted Pro - Easy to get started - Intuitive UI, easy learning curve - Integrations with many deployment Con platforms - Only simple deployment flows are - Streamlined for complex deployments possible - Single tool for team to learn Con - No managed version (yet) - Does not support Firebase

Confidential & Proprietary Solution Trade-offs Recommendation: Container Builder

Pro - Hosted Pro - Easy to get started - Intuitive UI, easy learning curve - Integrations with many deployment Con platforms - Only simple deployment flows are - Streamlined for complex deployments possible - Single tool for team to learn Con - No managed version (yet) - Does not support Firebase

Confidential & Proprietary Container Builder build flow

GCS Logging/Monitoring

Key Questions: Which cloud services? How many resources? What key applications? How much log data do you generate? How long do you retain?

Pro Pro Pro - Native to GCP - Intuitive UI, easy learning curve - Familiar, industry standard - Integrated monitoring/logging - More integrations - Advanced log search functionality - Fully managed, works at scale - Cloud-based, managed service - Hybrid: 1 UI for on-prem/cloud - Trace, Error Reporting, Debug - Splunk Add-On for GCE Con Con Con - Monitoring only, need separate - Logging only - Switching costs, unfamiliarity solution for logging - Self-hosted - Higher licensing cost - Higher TCO cost

Confidential & Proprietary Logging/Monitoring Recommendation: Stackdriver

Key Questions: Which cloud services? How many resources? What key applications? How much log data do you generate? How long do you retain?

Pro Pro Pro - Native to GCP - Intuitive UI, easy learning curve - Familiar, industry standard - Integrated monitoring/logging - More integrations - Advanced log search functionality - Fully managed, works at scale - Cloud-based, managed service - Hybrid: 1 UI for on-prem/cloud - Trace, Error Reporting, Debug - Splunk Add-On for GCE Con Con Con - Monitoring only, need separate - Logging only - Switching costs, unfamiliarity solution for logging - Self-hosted - Higher licensing cost - Higher TCO cost

Confidential & Proprietary IAM/Admin

Key questions: What is the source of truth for your user credentials today? Require separate IT and Dev access?

Using On-Prem LDAP Authentication Using SSO (e.g. Okta)

Pro Pro Pro - Can provision accounts (one-way) with - No LDAP or sync to manage - Integrates with other tools GCDS - no programming - Google-stored credentials: Google - No additional authentication to set up - Can sync passwords (PasswordSync) manages authentication, stores - Can enforce MFA, custom security the passwords controls Con - Google two-factor authentication - Have to manage own LDAP Con - Requires LDAP admin skills + access Con - Single point of failure Google-only, does not integrate with other - Additional system to manage tools for SSO - No advanced Google security features

Confidential & Proprietary Recommendation: Cloud Identity, IAM/Admin Using on-prem LDAP with Google Auth

Key questions: What is the source of truth for your user credentials today? Require separate IT and Dev access?

Using on-prem LDAP Moving to Google Using SSO (e.g. Okta) Accounts

Pro Pro Pro - Can provision accounts (one-way) with - No LDAP or sync to manage - Integrates with other tools GCDS - no programming - Google-stored credentials: Google - No additional authentication to set up - Can sync passwords (PasswordSync) manages authentication, stores - Can enforce MFA, custom security the passwords controls Con - Google two-factor authentication - Have to manage own LDAP Con - Requires LDAP admin skills + access Con - Single point of failure Google-only, does not integrate with other - Additional system to manage tools for SSO - No advanced Google security features

Confidential & Proprietary Deployment Key Questions: Are you on multiple platforms today? Do you have an Ops team to run a self-hosted service? Which deployment tools do you use today?

Deployment Manual scripting Manager

Pro Pro Pro - Integrated with GCP - UI, IAM, labels - Multi-cloud - Easier to get started - Managed service, No Ops - Familiarity - No new system to learn or install - New GCP functionality arrives - Declarative, immutable immediately Con - Declarative, immutable Con - Imperative, mutable - Non-native to GCP - Error-prone, toil intensive Con - Lag time to get new GCP functionality - Cannot roll back easily - GCP only - Self-hosted -> manage + pay for a server - Hard to automate - Need a separate platform for multiple - Not supported by GCP, no SLA clouds

Confidential & Proprietary Deployment Recommendation: Deployment Manager Key Questions: Are you on multiple platforms today? Do you have an Ops team to run a self-hosted service? Which deployment tools do you use today?

Deployment Manual scripting Manager

Pro Pro Pro - Integrated with GCP - UI, IAM, labels - Multi-cloud - Easier to get started - Managed service, No Ops - Familiarity - No new system to learn or install - New GCP functionality arrives - Declarative, immutable immediately Con - Declarative, immutable Con - Imperative, mutable - Non-native to GCP - Error-prone, toil intensive Con - Lag time to get new GCP functionality - Cannot roll back easily - GCP only - Self-hosted -> manage + pay for a server - Hard to automate - Need a separate platform for multiple - Not supported by GCP, no SLA clouds

Confidential & Proprietary Strategy - Forseti Open Source Toolkit

Google’s own cloud security tools; built and shared with everyone

Inventory Policy Scanning Enforcement

Build and store an inventory Find policy violations Apply changes

- Projects - Organization level - Inbound GCE firewall - Roles - Project level - ... - Members - .... - ….

Proprietary + Confidential © 2017 Google Inc. All rights reserved. Google and the are trademarks of Google Inc. All other company and product names may be trademarks of the respective companies with which they are associated. Security

Key Questions: Do you have the skill set and resources to write your own security tooling flow? Do you need custom tooling that isn't provided by Forseti?

Forseti Custom Tooling No Tooling

Pro Pro Pro - Open Source - based on Google's - More customizable No work to be done own tools, shared with everyone - Can still use Google managed - Integrated suite services (Audit logs -> Pub/Sub -> Con - Google (and community) investing Cloud Functions) Pretty insecure... heavily - new features on the way Con Con Have to code, configure and deploy Less customization yourself

Confidential & Proprietary Security Recommendation: Forseti

Key Questions: Do you have the skill set and resources to write your own security tooling flow? Do you need custom tooling that isn't provided by Forseti?

Forseti Custom Tooling No Tooling

Pro Pro Pro - Open Source - based on Google's - More customizable No work to be done own tools, shared with everyone - Can still use Google managed - Integrated suite services (Audit logs -> Pub/Sub -> Con - Google (and community) investing Cloud Functions) Pretty insecure... heavily - new features on the way Con Con Have to code, configure and deploy Less customization yourself

Confidential & Proprietary NoDev Google is an AI company

Google internal projects containing Brain Model Used across products:

AI First Unique project directories

Time

Confidential & Proprietary State of the Industry: Lack of Expertise

Very few users today can create a custom ML model. Data Scientists

To democratize AI, Developers we need to make AI accessible to millions more

The Reality of ML: Complex & Time Intensive

DATA TUNE ML MODEL PREPROCESSING ML MODEL DESIGN EVALUATE DEPLOY UPDATE PARAMETERS Cloud AutoML - automatically create Machine Learning Models

DATA TUNE ML MODEL PREPROCESSING ML MODEL DESIGN EVALUATE DEPLOY UPDATE PARAMETERS Google’s Machine Learning Ecosystem

Perception APIs Auto ML Training & Prediction Google-trained models Easy to use, composable, End to End Platform for for your app learn to learn technology your data & models

App Developers Analysts Data Engineers / Data Scientists

Using Machine Learning to Explore Neural Network Architecture - Quoc Le & Barret Zoph, Research Scientists, team Cloud AutoML Vision

Cloud AutoML Handbag Shoe Hat

g.co/next18

July 24-27, 2018 Serverless = No Ops, Right??

✓ Scalable Ops

Confidential & Proprietary Shrinking The Scaling Point

Your Your Code Code Your Your Code Fns Your Code Code PaaS Containers Virtual Machines Dedicated Physical Servers

Confidential & Proprietary Google Melbourne [email protected]