Cloud Applications CS 162 Lecture 23, Will Wang & David Culler

Nov 19th, 2019 Containers

Microservices + The Cloud

Container Orchestration

Summary Containers Review: what is a container?

- Provides lightweight operating system virtualization
- A "lightweight virtual machine" is a good analogy, but technically untrue

- A filesystem of binaries, libraries, and dependencies
- Isolation from other processes and containers
  - via resource isolation (cgroups)
  - via namespace isolation
    - PID namespace
    - Network namespace
    - Mount namespace
    - etc.
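A quick way to see namespace isolation by hand: a sketch, assuming a Linux host with util-linux's unshare available (not part of the slides).

# Start a shell in its own PID and mount namespaces.
# --fork makes the shell PID 1 of the new namespace;
# --mount-proc remounts /proc so ps sees only this namespace.
$ sudo unshare --pid --fork --mount-proc /bin/sh

# Inside the new namespace:
$ ps aux     # shows only this shell and its children
$ echo $$    # prints 1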

Containers Review: what is a container?

• Containers share the host kernel, but have their own system binaries and libraries.

[Figure: Virtual Machines vs. Containers. In a VM, App A and App B each run on their own Bin/Libs above a guest kernel, with a hypervisor between the guest kernels and the host kernel and hardware. In a container, App A and App B each run on their own Bin/Libs directly above the shared host kernel and hardware.]

Containers

- Features on top of the basic OS cgroup containers:
  - Images
  - Portable deployment
  - Versioning and management
  - Image layering
  - Reproducibility
  - Union file system
  - Tools and observability (docker logs, docker exec, etc.)
  - Container lifecycle management (docker run, docker stop, etc.)

Containers Images

- A compressed collection of the libraries, binaries, and applications that make up a container
- Defined by a file called the Dockerfile
  - Very similar to a script; it sets up the container environment
  - Docker then compresses the environment into a binary
- Docker offers tools and repositories (container registries) to build, store, and manage these images.

FROM ubuntu:latest
MAINTAINER Edward Elric
RUN apt-get update -y
RUN apt-get install -y python-pip python-dev build-essential
RUN pip install uwsgi flask
COPY transmute.py ~/transmute.py
WORKDIR ~/
RUN echo "1 soul" > sacrifice.txt
CMD ["python", "transmute.py"]
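To turn the Dockerfile above into an image and run it, roughly (a sketch; the tag name is made up and assumes transmute.py sits next to the Dockerfile):

# Build an image from the Dockerfile in the current directory and tag it.
$ docker build -t transmute:latest .

# Start a container from that image; the CMD line runs python transmute.py.
$ docker run --rm transmute:latest

Pushing the tag to a container registry (docker push) is what makes the image portable across machines.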

Containers Images

- The image is divided into a sequence of layers. Each command creates a new layer on top of the previous layers.
- Images with common commands will have common layers whose binaries they share. This is the union file system.

Containers Docker Architecture

- Dockerd (the Docker daemon) responds to commands from the Docker CLI over an HTTP REST API
  - It handles management of Docker objects (images, containers, volumes, networks, etc.)
- Dockerd uses containerd (over gRPC) to manage the container lifecycle
  - It handles image push / pull, namespace management, etc.
- containerd invokes a low level container runtime (runc) to create the container

[Figure: the Docker stack. Docker CLI (command line tool) -> HTTP REST API -> Dockerd (user level management) -> gRPC -> containerd (high level runtime) -> exec -> runc (low level runtime).]
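Coming back to images: the layers are visible directly from the Docker CLI (a sketch, output not shown):

# Each row is one layer; identical commands produce identical layers that
# the union file system shares between images.
$ docker history ubuntu:latest

# List the layer digests recorded for the image.
$ docker image inspect ubuntu:latest --format '{{.RootFS.Layers}}'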

Containers Container Runtime

• A library that is responsible for starting and managing containers
• Takes in a root file system for the container and a configuration of the isolation settings
  • Creates the cgroup and sets resource limitations
  • Unshares to move into its own namespaces
  • Sets up the root file system for the cgroup with chroot
  • Runs commands in the cgroup
• runc by Docker, rkt by CoreOS, gVisor (runsc) by Google, LXC by IBM
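What driving a low level runtime by hand looks like, based on runc's documented bundle workflow (the directory and container names here are made up):

# An OCI bundle is a root filesystem plus a config.json.
$ mkdir -p mycontainer/rootfs && cd mycontainer
$ docker export $(docker create busybox) | tar -C rootfs -xf -

# Generate a default config.json (namespaces, capabilities, mounts, ...).
$ runc spec

# runc sets up the cgroup, unshares namespaces, enters the rootfs,
# and execs the configured command.
$ sudo runc run mycontainerid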

(Image source: Cameron Lonsdale, runc)

Containers Creating a Container

cgcreate -g <controllers>:<path>

1) Create the cgroup with cpu and memory controllers and path = UUID

$ UUID=$(uuidgen)
$ cgcreate -g cpu,memory:$UUID
$ cgset -r memory.limit_in_bytes=100000000 $UUID
$ cgset -r cpu.cfs_period_us=1000000 $UUID
$ cgset -r cpu.cfs_quota_us=2000000 $UUID
$ cgexec -g cpu,memory:$UUID \
>   unshare -uinpUrf --mount-proc \
>   sh -c "/bin/hostname $UUID && chroot $ROOTFS $CMD"

Containers Creating a Container

2) Set resource limitations for the created cgroup

$ UUID=$(uuidgen)
$ cgcreate -g cpu,memory:$UUID
$ cgset -r memory.limit_in_bytes=100000000 $UUID
$ cgset -r cpu.cfs_period_us=1000000 $UUID
$ cgset -r cpu.cfs_quota_us=2000000 $UUID
$ cgexec -g cpu,memory:$UUID \
>   unshare -uinpUrf --mount-proc \
>   sh -c "/bin/hostname $UUID && chroot $ROOTFS $CMD"

Containers Creating a Container

unshare [options] [program [arguments]]

3) Unshare the indicated namespaces that were inherited from the parent process, then execute the arguments

$ UUID=$(uuidgen)
$ cgcreate -g cpu,memory:$UUID
$ cgset -r memory.limit_in_bytes=100000000 $UUID
$ cgset -r cpu.cfs_period_us=1000000 $UUID
$ cgset -r cpu.cfs_quota_us=2000000 $UUID
$ cgexec -g cpu,memory:$UUID \
>   unshare -uinpUrf --mount-proc \
>   sh -c "/bin/hostname $UUID && chroot $ROOTFS $CMD"

Containers Creating a Container

hostname [-fs] [name-of-host]

4) Change the cgroup's hostname to the UUID

$ UUID=$(uuidgen)
$ cgcreate -g cpu,memory:$UUID
$ cgset -r memory.limit_in_bytes=100000000 $UUID
$ cgset -r cpu.cfs_period_us=1000000 $UUID
$ cgset -r cpu.cfs_quota_us=2000000 $UUID
$ cgexec -g cpu,memory:$UUID \
>   unshare -uinpUrf --mount-proc \
>   sh -c "hostname $UUID && chroot $ROOTFS $CMD"

Containers Creating a Container

chroot new-root [program [arguments]]

5) Change the cgroup's root to be the subdirectory at new-root, then execute the arguments

"chroot /Users/cs162 /bin/ls" will change the root to /Users/cs162 and then execute /bin/ls, which is actually /Users/cs162/bin/ls

$ UUID=$(uuidgen)
$ cgcreate -g cpu,memory:$UUID
$ cgset -r memory.limit_in_bytes=100000000 $UUID
$ cgset -r cpu.cfs_period_us=1000000 $UUID
$ cgset -r cpu.cfs_quota_us=2000000 $UUID
$ cgexec -g cpu,memory:$UUID \
>   unshare -uinpUrf --mount-proc \
>   sh -c "/bin/hostname $UUID && chroot $ROOTFS $CMD"
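If $CMD is an interactive shell, a few quick checks inside it show the isolation at work (a sketch, not from the slides):

$ hostname        # prints the UUID set above
$ ps aux          # sees only processes in the new PID namespace
$ ls /            # lists the contents of $ROOTFS, not the host's root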

Containers gVisor

• A new kind of low level container runtime
• In addition to creating the container, it also sandboxes the container as it runs
• Conceptually similar to a user space microkernel
  • A user space kernel that implements all the syscalls
  • Intercepts all syscalls the container makes and performs them in the secure user space kernel instead of in the actual kernel
• Defends against privilege escalation attacks and provides stronger container isolation
• Written in Go for memory and type safety
• Great for running untrusted applications, i.e. cloud serverless applications

[Figure: the app / container issues system calls into gVisor, which issues only a limited set of system calls down to the host kernel.]

Containers gVisor

• Intercepts syscalls through ptrace
  • A Linux syscall that allows one process to "control" another
  • gdb uses it to step through instructions; gVisor uses it to intercept syscalls
• Can also use kernel based virtual machines (KVM)
  • Experimental feature that requires hardware support
• Pushes the idea of what is considered a container and what is a virtual machine
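You can watch ptrace-style syscall interception yourself with strace, which is built on the same ptrace mechanism gVisor's ptrace platform uses (a sketch, not gVisor itself):

# strace attaches with ptrace(2) and reports each syscall the child makes;
# gVisor intercepts at the same point but services the call in its
# user space kernel instead of passing it to the host kernel.
$ strace -f -e trace=openat,read,write /bin/ls /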

Containers

Microservices + The Cloud

Container Orchestration

Summary Microservices + The Cloud

The Road Ahead

• Applications packed into containers
• Great isolation and resource management
• Dependencies packaged into the container make them easy to deploy
• Now we will see how modern "cloud native" applications are architected and deployed

Microservices + The Cloud Historically…

• One big application did everything:
  • Manage users, accounts, data, etc.
  • Access to the database
  • Manage connections with clients
  • Perform computation
• Using one big application is a monolithic architecture

[Figure: a single Django application containing Users, Accounts, Views, Computation, and Auth, all talking to one DB.]

Microservices + The Cloud Microservices

• As the system itself became more complex, it became harder for one big application to handle everything
• Developers began to break out components of the app into separate services, or microservices
• Microservices are a newer variant of service oriented architecture

[Figure: the monolith broken into services behind an Ingress: ML Model, API Server, Users, Accounts, and Batch Compute, each with its own DB.]

Microservices + The Cloud Service Oriented Architecture

• SOA is a software design paradigm
• Large applications are comprised of individual, loosely coupled components called services
• A service is like an interface
  • Its implementation is a black box to users of that service
  • Users just need to understand the service APIs, not its implementation
  • i.e., a service to manage accounts might have the APIs get_account(), deposit(), withdraw()
• Contact the service through its long lived abstract name and its consistent API (see the example request below)
  • http://accounts-service/deposit?account=12345&amount=5&id=67890
• Complex applications are strung together from a collection of services

Microservices + The Cloud Microservices

• A form of service oriented architecture
  • Loosely coupled
  • Standardized API contract and independent black box implementations
  • Long lived hostname to refer to the service
• Cloud native - designed to be run in the modern cloud
  • Implemented by nearly identical replicas (aside from versioning)
  • The microservice load balances to one of possibly many replicas
• Containerizable
  • Replicas are packaged as containers
  • Resource requirements are reasonably definable
• Independent - services can be independently developed, tested, deployed, and maintained
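For example, a client would call the accounts service from the SOA slide by its long lived name; service discovery and load balancing pick one of the replicas (the URL is the slide's example, not a real API):

$ curl "http://accounts-service/deposit?account=12345&amount=5&id=67890"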

Microservices + The Cloud Scalability

• Gives you the ability to horizontally scale any of the microservices depending on their specific load.
• If one service is under heavy load, you can simply run more replicas of that particular service.

[Figure: the same architecture behind the Ingress, now with additional replicas of the Users and Batch Compute services.]

Microservices + The Cloud Independent Implementations

• Each microservice can be specialized and implemented with the best language, framework, and tools for the job.
• The architecture is pluggable. A company does not even need to write all of its services.
• Third party applications like MySQL, Elasticsearch, and Redis are all microservices that can be easily integrated into your system.

Microservices + The Cloud Business Organization

• Each microservice is independent and can live in its own repository and be owned by a dedicated team of engineers.
• Services maintain an API contract between each other, often done with versioning.
• If the implementation changes, other services do not need to know as long as the API stays consistent.
• Eliminates dependencies between teams.

Microservices + The Cloud Complexity

• Using microservices is hard. More services means more versions, more software to manage, more moving parts, more points of failure, etc.
• Management of the service lifecycle (orchestration), tracing bugs between services (distributed tracing), access control, fault tolerance, and more are all really hard problems
• Often not a good choice for startups and younger companies
  • Airbnb, Dropbox, Lyft, etc. were known for being monolithic until very recently

Microservices + The Cloud The Cloud

• Applications split into microservices
• Each service has independent owners and dependencies
• Services packaged as containers, which provide isolation, reproducibility, and dependency management
• Containers deployed onto the cloud.

What is the cloud? Microservices + The Cloud The Cloud

• Cloud computing is the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user.
• Different kinds:
  • Software as a Service (SaaS): Dropbox, Slack, Zoom
  • Platform as a Service (PaaS): Databricks, Heroku, AWS S3
  • Infrastructure as a Service (IaaS): AWS EC2, Google Compute Engine

Microservices + The Cloud

• SaaS:
  • The product is software a company has developed.
  • Hosting and management is taken care of by the provider.
  • Customers interact with it through APIs or webpages
• PaaS:
  • The product is something you can use to build your own applications
  • Operating systems, runtimes, managed databases, web site hosting
  • Hosting and management often taken care of by the provider
• IaaS:
  • The product is the infrastructure you use to deploy your applications
  • Access to CPUs, VMs, block devices, networking, etc.
  • You do your own hosting and most management of resources

Microservices + The Cloud IaaS

Pros
- Availability - Cloud providers work really hard to meet SLOs like 5 9s availability
- Scalability - In minutes, you can request from 1 VM to 1000s of VMs
- Quality - Providers are extremely good at creating fault tolerant, global scale infrastructure
- Specialization - No need to worry about running and maintaining data centers

Cons
- Cost - Can often be more expensive per unit resource
- Security - Hardware may be untrusted and shared with others. Need to trust the provider.
- Limited Control - Can't make low level optimizations, can't access the actual hardware

Microservices + The Cloud Public Cloud

• AWS, Google, and Azure offer multi tenant public clouds
• Purchase VM time by the second
  • VMs come with a flexible amount of CPUs, memory, and even GPUs
  • Can also purchase block devices to attach to the VM for storage (Elastic Block Store)
  • VMs connected to a private network for your account (Virtual Private Cloud)
• Multi tenant: you share the underlying resources with other users
  • Your VM and others' VMs run on the same hypervisor
  • Your block device is one amongst thousands distributed across the data center
  • Your packets are sent across the same switches as other users' packets

Microservices + The Cloud

Availability Region - where your data center is located

VM Image - like a container image; specifies what OS and what dependencies

Instance Type - allows you to select # of vCPUs, memory, and storage / network specifications

EBS - Amazon's networked and replicated block storage device

Spot / Preemptible instances - more affordable instances for short or fault tolerant workloads

VPC - Select your private network to start the instance on. Instances on separate VPCs cannot communicate.

Select the availability zone - zones are separate facilities and do not share utilities or networking, for fault tolerance.

Security Groups allow you to configure inbound and outbound network rules for your instance.

This instance only allows SSH traffic on port 22 from any source.

Once your instance is ready, make sure you have the correct private key and use a command like the one below to SSH into your instance.
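A typical form of that SSH command (a sketch; the key file, user name, and hostname are placeholders that depend on your key pair and AMI):

# Many Amazon Linux AMIs use the ec2-user account; Ubuntu AMIs use ubuntu.
$ ssh -i my-key.pem ec2-user@ec2-203-0-113-25.compute-1.amazonaws.com

The same launch can also be scripted with the AWS CLI instead of the console (all IDs below are hypothetical):

$ aws ec2 run-instances \
>     --image-id ami-0123456789abcdef0 \
>     --instance-type t3.micro \
>     --key-name my-key \
>     --security-group-ids sg-0123456789abcdef0 \
>     --subnet-id subnet-0123456789abcdef0 \
>     --count 1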

Your console will show you all the EC2 instances you have in your account. From within the instance, you can do anything - including using root privileges.

Microservices + The Cloud Private Cloud

• Similar to a public cloud, but resources are limited to a controlled set of users
  • Useful for really large customers like governments
  • Useful for sensitive or heavily regulated workloads (military, healthcare, etc.)
• Single Tenant Cloud
  • Data center only supports a single customer
• On Premise (on-prem):
  • Customer runs their own data center / hardware
  • Not cloud computing, but the traditional way of doing things
  • Can be more cost effective
• Hybrid Cloud:
  • A mix of on-prem, private, and / or public clouds

Break

Containers

Microservices + The Cloud

Container Orchestration

Summary Container Orchestration

y = 5x + 6
y = 4x - 3

Container Orchestration

x + 2y - z = 4
2x + y + z = -2
x + 2y + z = 2

Container Orchestration

Find y given P, C, F, N

P = C · (1 − 1/(1 + y)^N) / y + F / (1 + y)^N

Container Orchestration Container Orchestration

• About managing complexity
• Applications split into microservices, services packaged as containers, containers deployed onto virtual machines hosted on the cloud
• Deploying containerized microservice applications is like solving a 100 variable non linear equation - it's really complex
  • Containers are scheduled and deployed onto virtual machines
  • Need to make sure VMs have enough available resources for the container
  • Make sure services are networked and can communicate (DNS, iptables, etc.)
  • Provision and mount the correct persistent storage devices
  • Manage the container lifecycle and respond to failures / crashes

Container Orchestration Container Orchestration

• In a way, it provides a clean, easy to use abstraction for your physical data center resources
• It manages the sharing of those resources
  • Resource limits, resource requests, container level isolation, private networks
• It provides common services to the microservices and containers
  • Access to shared persistent storage, networking, authorization (RBAC)
• Needs to do all of this in a way that is easy for developers to interface with

Container Orchestration Container Orchestration

• In a way, it provides a clean, easy to use abstraction for your physical data center resources (Illusionist)
• It manages the sharing of those resources (Referee)
  • Resource limits, resource requests, container level isolation, private networks
• It provides common services to the microservices and containers (Glue)
  • Access to shared persistent storage, networking, authorization (RBAC)
• Needs to do all of this in a way that is easy for developers to interface with

Can be thought of as an "operating system" for your data center!

Container Orchestration Apache Mesos

• Developed here at Berkeley in 2011
• Not quite like a container orchestrator as we know it today, but more of a resource management framework
  • No container or service lifecycle management, not quite concerned with deployment
  • But similar ideas in terms of scheduling containers onto virtual machines
• Before Mesos:
  • Rapid advancement of cloud computing resulted in many different frameworks like Hadoop, HBase, Cassandra, etc.
  • But the frameworks all had their own schedulers, so they were hard to manage and it was common to have 1 cluster per framework
  • This led to difficulties managing shared resources and no statistical multiplexing

Container Orchestration Apache Mesos

• Key Contributions:
  • Abstract ("virtualize") data center resources to frameworks
  • Make it easier to develop and deploy new frameworks
• Hadoop is a map reduce framework and MPI is a message passing interface

Visuals: Ion Stoica Container Orchestration Apache Mesos

• A framework (Hadoop, HBase, etc.) manages one or more jobs
• Each job consists of one or more tasks. Jobs can be a distributed operation.
• A task (map, reduce, etc.) is implemented by one or more processes running on a single machine. Tasks run in LXC containers.
• A Mesos slave runs on each machine and reports a vector of the available resources at that machine.
• A Mesos master coordinates between the slaves it's responsible for and the frameworks it's serving.

Container Orchestration

[Figure sequence: Mesos resource offers. Each slave reports its available resources to the master (s1: <1 CPU, 8 GB>; s2: <4 CPU, 16 GB>). The master forwards these resource offers to the Hadoop framework scheduler, which accepts them and launches Hadoop Task 1 on Slave 1 and Hadoop Task 2 on Slave 2, leaving the remaining resources (s2: <1 CPU, 8 GB>) available for future offers.]

Container Orchestration Apache Mesos

• Not a container orchestrator - but can become one with libraries built on top of it (Aurora, Marathon)
• Has many of the same goals as modern orchestrators today; arguably the first successful open source "data center operating system"
• Enjoyed widespread industry adoption, even today
  • Netflix famously uses it to orchestrate all of their containerized workloads
  • AirBnb and many more used it extensively or solely until recently
  • What did they switch to?

Container Orchestration Modern Container Orchestrators

• Swarm: Open source orchestrator by Docker with tight Docker integration
• Mesos Marathon: relatively user friendly orchestrator created by Mesosphere
• Amazon ECS (Elastic Container Service): proprietary orchestrator for AWS customers to deploy directly onto an AWS cluster
• Tupperware: In-house orchestrator used at Facebook, Instagram, and WhatsApp. Runs essentially all of FB's workloads, but little is known about it.
• Borg: Started in 2003/2004 at Google and is arguably the oldest orchestrator. Runs nearly all of Google's internal workloads. Its architecture and system design was finally published in 2015.

Container Orchestration Kubernetes (k8s)

• Open source orchestrator created at Google and heavily based on the design of Borg.
• Most popular orchestrator today by far, used almost everywhere
  • Lyft, Uber, Stripe, AirBnb, Twitter, Spotify, 61A…
• Can use any containerization technology, but most common with Docker
• I like to think of it as desired state management

Container Orchestration Kubernetes API

• Nodes are the virtual (or physical) machines that Kubernetes can schedule containers onto.
• Pods are the smallest unit of execution. A pod is a set of one or more containers that are always deployed together, onto the same node.
• Deployments are objects that manage a set of identical pods. A deployment is comprised of a specification for a pod and the number of replicas.
• Services expose pods to the network and act as a load balancer for the pods registered under them. (An example Deployment and Service are sketched below.)
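A minimal sketch of these objects in YAML (the names and image are made up, not from the lecture): a Deployment of three identical pods plus a Service that load balances across them.

$ kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                  # hypothetical name
spec:
  replicas: 3                # desired number of identical pods
  selector:
    matchLabels:
      app: web
  template:                  # the pod specification stamped out per replica
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25    # any container image works here
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  selector:
    app: web                 # load balance across pods with this label
  ports:
  - port: 80
    targetPort: 80
EOF

Kubernetes then schedules the three pods onto whichever nodes have room and keeps them reachable behind the stable web-svc name.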

Container Orchestration

[Figure sequence: building up a cluster diagram. Nodes (Node 1, Node 2) are the machines; Pods, each holding one or more containers, run on the nodes; a Deployment manages the identical Pods 1-3 across both nodes; a Service sits in front of the pods and load balances across them; an incoming request to the cs162.eecs.berkeley.edu service is routed to one of the Flask web server pods; a separate internal MySQL service fronts a MySQL pod backed by persistent storage; finally an Ingress receives the incoming request and routes it to the (now internal) webserver service.]

Container Orchestration Ingress

• An endpoint where outside traffic is allowed to enter your cluster, responsible for load balancing and routing
• Usually associated with an IP / hostname and implemented with a software load balancer, used in association with a hardware load balancer (e.g. an AWS Network Load Balancer) from the cloud provider
  • Requests go to the hardware LB, and then to a software LB or gateway for routing
  • Highly performant and scalable architecture for most applications and use cases
• In Kubernetes, programmable through a set of routing rules
  • i.e. if this path, then forward to this service

Container Orchestration Routing with a k8s Ingress

• Use an ingress to route requests to the correct backend service
• Useful for implementing multiple backends under the same hostname
• Example: Berkeleytime uses many routing rules in the ingress for *.berkeleytime.com
• Routing based on domain, path, protocol, and more.

rules:
- host: berkeleytime.com
  http:
    paths:
    - path: /api/
      backend:
        serviceName: berkeleytime-prod-svc
        servicePort: 5000
    - path: /
      backend:
        serviceName: frontend-prod-svc
        servicePort: 3000
- host: "*.berkeleytime.com"
  http:
    paths:
    - path: /
      backend:
        serviceName: jenkins-svc
        servicePort: 9000

Container Orchestration Kubernetes
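With rules like these, one ingress address serves different backends depending on the Host header and path. A sketch (the IP is a placeholder):

# Matched by the /api/ rule, forwarded to berkeleytime-prod-svc:5000.
$ curl -H "Host: berkeleytime.com" http://203.0.113.10/api/

# Matched by the catch-all / rule, forwarded to frontend-prod-svc:3000.
$ curl -H "Host: berkeleytime.com" http://203.0.113.10/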

• Kubernetes manages all those resources, deploys your application, and manages the container lifecycle
  • The container lifecycle is the process of scheduling containers, downloading images, attaching networks and storage devices, starting and monitoring containers, and handling container or node failures / crashes
• It gives the developer a clean abstraction to use the cluster resources
• It does all of this through desired state management
  • Developers provide Kubernetes the desired state declaratively: I want these pods with these resource requirements to be placed under this service
  • Kubernetes brings the actual state to be equal to the desired state through reconciliation (see the kubectl example below)

Container Orchestration Kubernetes

• Kubernetes is divided into a centralized control plane, which runs on the master, and a data plane, which runs on every node
• The control plane handles all the logic of scheduling and resource management
  • One or more API servers, which are stateless servers responsible for handling requests from the developer
  • An etcd key value store, used to store the desired state and other k8s objects
  • Multiple controllers, each responsible for controlling a different k8s object
  • A scheduler, responsible for scheduling pods to nodes according to some policy
• The data plane executes decisions made by the control plane
  • A Kubelet (Kubernetes minion) runs on every Node. The Kubelet is responsible for executing operations on the node so that it matches the desired state for that node.
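Desired state management in practice: a sketch, assuming the hypothetical web Deployment from the earlier example.

# Declare a new desired state: 5 replicas instead of 3.
$ kubectl scale deployment web --replicas=5

# Watch the controllers, scheduler, and kubelets reconcile the actual state:
# new pods appear as Pending, get scheduled onto nodes, then become Running.
$ kubectl get pods -l app=web -w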

Container Orchestration etcd

[Figure sequence: the Kubernetes control loop. The developer submits "Deployment: 3 Pods" to the API servers on the master; the controllers write the desired state into etcd; the scheduler assigns Pod 1 and Pod 2 to Node 1 and Pod 3 to Node 2; the Kubelet on each node picks up its assignments and starts the pods.]

etcd is the single source of truth!

• etcd is an open source key value store by CoreOS running the Raft consensus protocol.
• Raft is by Prof. John Ousterhout, a professor at Stanford and formerly at Berkeley.

Reconciliation: components keep a local version of the objects and watch the database every period for updates, then change the local / actual state to match the desired state.

Containers

Microservices + The Cloud

Container Orchestration

Summary Summary Containers

• A container is comprised of an image (the binaries and dependencies) and some level of resource isolation (usually via cgroups)
• Docker provides high level management and tooling for containers and images
• Images are composed of layers - each layer can build on top of a preexisting layer. This enables Docker's union file system.
• A low level container runtime is responsible for actually starting the container
  • This is comprised of unsharing namespaces, setting resource quotas, changing the root file system, and executing the container's command

Summary Microservices

• Distributed cloud native applications built using service oriented architecture
• Applications broken into loosely coupled microservices
• Each service has independent owners and dependencies, and can choose the best language or framework for its implementation.
• Services packaged as containers, which provide isolation, reproducibility, and dependency management.
• Containers deployed onto a multi-tenant public (or private) cloud.

Summary Container Orchestration

• Can be thought of as an operating system for your data center or for the cloud.
• Provides an abstraction around cloud resources and the interface that allows developers to deploy applications onto the cloud.
• Manages the complexity of the container lifecycle and of allocating / setting up hardware resources for the containers.
• Many closed and open source solutions, but Kubernetes is by far the most popular today.
• K8s provides a rich set of objects (Nodes, Pods, Deployments, Services, Ingress, etc.) and easily supports a microservice architecture.

Questions?