Elastic Microservice Environment Setup

Design and implementation of the automatic setup of an elastic microservice environment.

Julian Hanhart

Table of Contents

Management Summary
Initial Situation
Netstream AG
TV as a Service Platform
Problem Description
Objectives
Approach
Evaluation of Orchestration Solution
Candidates
Unsuitable Solutions
Evaluation Criteria
Weights
Criteria
Risks
Ratings
Evaluation
Docker Swarm
Kubernetes
Apache Mesos + Marathon
HashiCorp Nomad
Decision Matrix
Ranking
Proposal
Decision
Defining the Base Image
Candidate Distributions
Red Hat Project Atomic
The Canonical Distribution of Kubernetes
CoreOS
VMware Photon OS
Tested Distributions
Automated Environment Setup
Target Infrastructure
Kubernetes Basics
Kubernetes Components
Networking
Tools & Technologies
Provisioning
Kubernetes Setup
Implementation
Atomic Host + Ansible Playbook
Kubernetes Anywhere
Photon OS Provisioning
Kubeadm
Manual Kubernetes Setup
Canonical Kubernetes
Next Steps
Service Deployment
Next Steps
Client Side Load Balancing
Next Steps
Conclusion
Next Steps
Glossary
References
Appendix A: Additional Documents
Appendix B: Code
Atomic Host + Ansible Playbook
Kubernetes Anywhere
Kubeadm on vSphere
Kubeadm on Google Compute Engine
Manual Kubernetes Setup
Canonical Kubernetes
Microservice Deployment

Management Summary

The objective of this project was to design the automated setup process for an elastic microservice environment. The resulting environment needed to be able to automatically place, manage and scale service applications running in containers. It would also need to be able to use the container images produced by the existing build process.

To achieve these objectives, several container orchestration solutions were evaluated. Since all of these solutions implement the features described above, the evaluation focused not just on their functionality, but also on commercial considerations such as the available resources, the surrounding ecosystem, the market acceptance and the (commercial) support, as well as several risk factors. The clear winner of the evaluation was Kubernetes, which was rated very well in both the commercial and the technical criteria, with very few weak points. In second place was the combination of Apache Mesos and Marathon (best known in the form of Mesosphere's commercial Datacenter Operating System, or DC/OS), which did well in the technical criteria but was hampered by its limited market adoption and small ecosystem. The last places were occupied by Docker Swarm and HashiCorp's Nomad, neither of which could overcome its weak adoption and small ecosystem despite decent technical ratings.

Once the adoption of Kubernetes was proposed to the development team and its management and subsequently approved, several approaches and tools to automatically set up the necessary virtual machines and software components for a Kubernetes cluster were tested. Multiple viable approaches for the automatic environment setup were found and several ways to set up a working cluster were identified.

With the resulting environment, the deployment of an existing microservice could be verified. However, further work will be needed to refine and finish the setup process. The project was successful as a proof of concept, but there are still some open issues that need to be addressed before the environment can be used in the daily development process.

Initial Situation

First, we shall briefly discuss the environment in which this project is conducted. Starting with an introduction to the company and the relevant product (and its technical architecture), we will then describe the problems the project is supposed to tackle.

Netstream AG

Netstream is an Internet Service Provider that, among other things, provides solutions for connectivity, Voice over IP telephony, hosting and media streaming. It mostly serves small and medium-sized businesses, but also has other telecommunication and internet service providers as customers.

The company is located in Dübendorf, Switzerland and has around 90 employees. It operates an in-house data center, where both internal services and customer services are hosted.

TV as a Service Platform

The TV as a Service platform (TVaaS in short) is a hosted Linear TV and Video on Demand solution. It enables service providers and other customers to provide a branded triple play (internet, voice and TV) solution to their end users without the need to develop, operate, maintain and support the TV part of their solution themselves.

The TVaaS platform offers features such as: channel listings, program guide and broadcast details, streaming of linear TV channels, recording of individual programs, library of past programs on certain channels, Video on Demand store (VoD), streaming of VoD assets, searching for programs and VoD assets.

The platform consists of a host of backend services, accessible through a central web service, and clients for multiple platforms. Those clients are either set-top boxes (dedicated hardware boxes that are connected to the end user’s TV set) or over-the-top software applications that access the backend over the open internet, usually a web application and apps for the iOS and Android mobile operating systems.

The backend services of the TVaaS Platform are implemented as an Event-Driven Microservice Architecture [MF-MS]. Events are distributed through a message queue, with individual microservices only listening to those events that are relevant to them [DZ-EDM]. Clients access the microservices through a common API Gateway that also authenticates and authorizes requests. The events are mostly raised either by the API Gateway in response to client requests, or by a third-party middleware in response to content imports from the content providers.
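The following minimal Python sketch illustrates the general pattern only; it is not the actual TVaaS implementation, and the event types and handlers are purely hypothetical. Services register handlers only for the events that concern them, while producers publish to the shared bus.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-memory stand-in for the message queue used by the platform."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        # Each microservice registers handlers only for the events relevant to it.
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The API Gateway or the import middleware would publish events here.
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()

# Hypothetical services subscribing to hypothetical event types.
bus.subscribe("recording.requested", lambda event: print("recording service:", event))
bus.subscribe("vod.asset.imported", lambda event: print("catalog service:", event))

bus.publish("recording.requested", {"programId": 42})
bus.publish("vod.asset.imported", {"assetId": "movie-123"})
```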

Figure 1. Overview of the TVaaS Platform Microservice Architecture

Problem Description

Our Continuous Integration server already automatically builds Docker images for the microservices and deploys them into the development environment. But currently, the development environment consists of only a single, dedicated Docker host server that has to host all microservice containers simultaneously. Deploying all microservices to a single host worked decently for a while, but with the ever-increasing number of services, more and more issues surfaced over time. On multiple occasions, the whole development environment was brought down by problems with the Docker setup. And once Docker had problems on the "Dockerhost", all microservices hosted there were usually unreachable. Using only a single "Dockerhost" simply does not scale with any significant number of microservices, is prone to errors and is not very stable.

Furthermore, there is currently a discrepancy between how the development team operates their development environment and how the operations team operates the pre-production (and production) environment. While the development environment uses Docker containers to run the individual microservices, the other environments are set up much more traditionally, with the microservices running as standard processes on standard Linux hosts. Setup and replication are done manually (deployment is partially automated through the CI server), as are most other tasks.

All of this makes for a lot of manual effort and leads to considerable differences between the development environment and all other environments. The runtime behaviour of the development environment does not correspond to that of a real distributed system either, which makes detecting and replicating runtime issues even more difficult.

Objectives

The objective of this project shall be to design a new, elastic development environment for the TVaaS platform. The new development environment shall be a distributed system that is able to automatically place applications on one of its member nodes, route requests to those applications automatically and scale them on demand.

The project shall employ the DevOps principles of Infrastructure As Code to ensure that the setup process for the new environment is repeatable, reviewable, versioned and cloneable. Therefore, the provisioning of virtual machines as well as the installation and configuration of the necessary software on those virtual machines shall be done automatically, with minimal user interaction. This provisioning automation shall be as portable as possible and the usage of vendor-specific tools for it shall be avoided. Ideally, it should be possible to execute the setup on both the IaaS-product of a public cloud provider and on the existing on-premise virtualization infrastructure.

The goal of the project shall be to provide a proof of concept for our next generation development environment. It will not be possible to fully replace the existing environment in the scope of this project, but it should be possible to start with the construction of the new environment using the results of this project. Further work will definitely be needed, but this project shall provide a viable basis to move forward on.

The project shall also help acquire the knowledge and skills needed to build an elastic microservice environment for our pre-production and (eventually) production environments. It shall also be used to evaluate and test various tools and products that would be necessary for such an endeavour.

Approach

To achieve our objectives for this project, we shall take the following approach:

1. Evaluation of a Container Orchestration Solution: Since there already is a category of products that implement our requirements for the new development environment, we will first compare the existing solutions to each other and then select the one best suited for our purposes. The scheduling, management, replication and load balancing of application containers in a distributed system is commonly called "Container Orchestration" and is one of the major topics in the cloud world at the moment. Therefore, finding a suitable solution among the existing tool sets should be possible and building a custom solution should not be necessary.

2. Implementation of the Automated Provisioning: Once we have decided on a container orchestration solution, we will implement an automated provisioning and setup process. We will select a base image to use for the virtual machines we want to provision and a set of tools that can be used to implement the provisioning and setup in a portable manner. Then we will implement the provisioning and setup of a cluster for the orchestration tool.

3. Deployment of Microservices (Proof of Concept): The next step will be to implement the deployment of an example microservice or two. Since our goal is not a complete environment, it should be sufficient to show how to deploy one of our existing services for now.

4. Investigation of Client Side Load Balancing: Client Side Load Balancing could be helpful in the future, since it can be used to implement features such as the parallel operation of multiple versions of the same microservice, the gradual rollout of new service features, A/B testing and Feature Toggles. Therefore, we will also investigate how we could implement Client Side Load Balancing in the new environment.

Evaluation of Orchestration Solution

As a first step, we want to evaluate which container orchestration solution would make the most sense for us to use. Therefore, we will select the most promising candidates and define the evaluation criteria and their respective weights, before evaluating each candidate in detail.

Candidates

To be considered an eligible candidate, a container orchestration solution needs to be:

• Freely available, ideally as an open source project

• Operable on-premise in a private cloud setup

• Able to run applications in Docker containers without requiring a specific technology stack

In the end, the following solutions were considered for the evaluation:

Table 1. Orchestration Candidates

Docker Swarm (maintained by Docker Inc.)
Kubernetes (maintained by the Cloud Native Computing Foundation)
Apache Mesos + Marathon (maintained by Mesosphere, Inc.)
Nomad (maintained by HashiCorp)

Unsuitable Solutions

There are numerous other container orchestration solutions and Container as a Service (CaaS) platforms that were not evaluated in more detail because they do not fulfil our eligibility requirements. Some of the more well-known solutions and the reasons why they were not considered any further are listed below.

Amazon EC2 Container Service: Vendor and cloud service specific solution, only available in a public cloud setup.

Azure Container Service: Vendor and cloud service specific solution, only available in a public cloud setup.

Google Container Engine: Vendor and cloud service specific solution, only available in a public cloud setup, based on Kubernetes.

Cloud Foundry Diego: Can orchestrate Docker containers but is more focused on applications based on the Cloud Foundry stack.

Evaluation Criteria

To evaluate the selected solutions, we will define commercial and technical evaluation criteria, as well as potential risks. In order to differentiate the importance of the individual criteria, we shall define a weight for each criterion.

Weights

Weights will have a scale from 1 to 5, with a weight of 1 for the least important criteria and 5 for the most important ones.

Table 2. Definition of Weights

Weight Definition

1 Barely Important

2 Some Importance

3 Important

4 Very Important

5 Extremely Important ("Make or Break")

The weights of the individual evaluation criteria will be defined based on our own needs and challenges and shall reflect our assessment of their importance.

Criteria

First, we shall define evaluation criteria covering commercial considerations and the available support and documentation for the solution. Criteria for the technical implementation of the solution form our second set of criteria.

Commercial Considerations & Support

Table 3. Commercial Considerations & Support

C1 - Community & Available Resources (Weight: 5)
How much information, documentation and community support is freely available? What is the perceived quality of those resources?

C2 - Ecosystem (Weight: 5)
How big is the ecosystem around the solution? How many organizations are there that integrate, host, support, extend or promote the solution? Is the solution extendable, are there available extensions?

C3 - Market Acceptance (Weight: 2)
How many organizations have adopted the solution? Are there commercial service providers using, hosting or incorporating the solution?

C4 - Commercial Support (Weight: 2)
Does the maintainer offer first-party support? Are there (preferably competing) third-party support providers? Is support & training available in Switzerland?

Since we are a rather small company and try to use open source technology wherever we can, freely and readily available resources (documentation & community support) are of the utmost importance to us. The same goes for a healthy ecosystem around the solution, so that there are always multiple ways to solve a problem or to mitigate it. Because of that, the criteria C1 and C2 are considered extremely important.

To ensure the continued support and development of the solution and to have the possibility to enlist external help if major issues should arise, market acceptance and the availability of commercial support are important, but since resources are limited the criteria C3 and C4 can not be weighted too highly.

Solution Implementation

Table 4. Solution Implementation

S1 - Feature Completeness (Weight: 4)
Are all basic feature requirements (Cluster Management, Workload Scheduling, Scaling, Service Discovery, Load Balancing) covered? What advanced features are supported?

S2 - Technology (Weight: 4)
How well is the solution designed and implemented? Is it possible to adapt the solution without forking the project? Is there a form of Spring Cloud integration available [1: Netstream uses Spring Boot with Spring Cloud as the base framework for almost all microservices]?

S3 - Maturity (Weight: 3)
Is the project ready for production use?

S4 - Contributors (Weight: 3)
How many organizations and individuals are contributing to the solution? Is the development dependent on a single or very few key contributors?

S5 - Security (Weight: 3)
How well was security considered in the design and implementation of the solution? Is there a distinct focus on security in the project?

Since all candidates must support the required basic features, any additional features can be very helpful but shall not be the be-all and end-all criterion. A solid technological implementation is important to us, but shall not override everything else either. Therefore, S1 and S2 are considered very important criteria.

Maturity is of course important, but since the system will be used for the development environment at first, there would be some time for the solution to mature before it could enter productive use. A diverse group of contributors would reduce the risk of a vendor lock-in and should increase the quality of the community and the ecosystem around the solution. And security should naturally be an early and major consideration in the solution's design. For these reasons, the criteria S3, S4 and S5 shall be considered important.

Risks

The risks that have a high impact and high likelihood of coming to pass shall be weighted higher than unlikely risks with low impact.

Table 5. Risks

R1 - Abandonment (Weight: 5)
How likely is it that the maintainer of the solution cancels their support for it? Would there be other organizations that could take over the maintenance of the solution?

R2 - Vendor Lock-In (Weight: 4)
How difficult would it be to switch to another support or service provider? Would it be necessary to make changes to the running system if one were to change providers?

R3 - Complexity (Weight: 3)
How steep is the initial learning curve to work with the solution? How many individual components are necessary to operate the solution? How difficult is it to solve issues within the system?

R4 - Development Velocity (Weight: 1)
How often are new major versions of the solution released? How important is it to upgrade to new versions? How difficult are version upgrades?

Since the container orchestration will very likely be an integral part of our company’s future infrastructure, abandonment of the selected solution would be disastrous and result in major expenses for the migration to another solution. Therefore, the risk R1 is considered of very high impact and making sure it is unlikely to occur should be extremely important to us.

Being dependent on a single supplier or partner for such an important system could also result in considerable expenses if the supplier were to suddenly raise prices. Since this risk can be considered of high impact and may have a decent likelihood of occurrence, we should treat risk R2 as very important.

An overly complex system with a steep initial learning curve could provoke up-front rejection and slow down adoption. Risk R3 should therefore be considered of medium impact and to have a medium probability of occurrence.

Very frequent version updates (combined with a difficult update process) could lead to trepidation about updating the system and to a gap between the versions of the upstream project and the system in operation. While this is not particularly unlikely to happen, the impact would be quite low and there is no need to assign too much importance to risk R4 because of that.

Ratings

We shall rate each criterion on a scale from 1 to 5:

Table 6. Definition of Ratings

Rating Definition

1 Sufficient

2 Fair

3 Good

4 Very Good

5 Excellent ("Best of Breed")

Potential risks shall be scored on a reversed scale compared to the regular criteria (meaning grave and probable risks will get a low score, while small risks will score highly).

Table 7. Definition of Risk Ratings

Risk Definition

1 Very High Risk

2 High Risk

3 Medium Risk

4 Low Risk

5 Barely Any Risk

Evaluation

Docker Swarm

• Maintainer: Docker Inc

• Website: https://docs.docker.com/engine/swarm/

• Source: https://github.com/docker

The cluster management and orchestration features embedded in the Docker Engine […]. Docker engines participating in a cluster are running in swarm mode.

— Docker Inc, https://docs.docker.com/engine/swarm/key-concepts/

A swarm is a group of machines that are running Docker and joined into a cluster.

— Docker Inc, https://docs.docker.com/get-started/part4/

Project

While Docker itself has been an open source project since 2013, Docker Swarm was first released as a standalone container orchestration system in 2015. A year later, its functionality was integrated into the Docker Engine itself (starting with version 1.12 of Docker) [CJ-DH].

Therefore, "Docker running in Swarm mode" would be the more accurate name for Docker Inc's orchestration solution, but the older naming is still mostly used when discussing it. Having said that, the standalone version of Docker Swarm is still maintained for users that have not yet upgraded to a newer version of Docker.

Since development of "Docker Swarm" has been split up into SwarmKit (the toolkit used as the foundation of Swarm mode in Docker) and Docker Swarm (standalone), tracking the project’s contributors and velocity can be a bit tricky [SM-DS]. But the development seems to be mostly driven by Docker Inc itself, with 88 percent of contributions to SwarmKit made by Docker Inc employees [SA-SK].

Features & Technology

With the integration of Swarm mode into the Docker Engine, Docker now has the integrated ability to manage clusters and to orchestrate containers within those clusters. Therefore, developers can use a command line interface many of them are already familiar with to deploy and manage their containers.

Swarm mode provides a pretty complete set of features for container management and orchestration [D-SMF]:

• Cluster Management

• Service Discovery

• Load Balancing

• Scaling

• Security (enforces secure communications between nodes)

• Rolling Updates

• Decentralized Design (handling of differentiation between node roles at runtime)

• Declarative Service Model (declarative approach to define desired state of various services in application stack)

• Desired State Reconciliation (manager monitors cluster state and reconciles differences between actual state and desired state)

• Multi-Host Networking

Generally, Docker in Swarm mode seems to focus on providing easy and fast cluster setup and management using existing and widely known tools and interfaces. Very large cluster setups spanning multiple regions or cloud providers do not seem to be supported very well (federation does not seem to be supported) [DF-R].
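To illustrate the declarative service model and scaling features listed above, here is a minimal sketch using the Docker SDK for Python. It assumes a local Docker Engine that is not yet part of a swarm; the advertise address, image and service name are only examples and not part of our setup.

```python
import docker

# Connect to the local Docker Engine (assumes a running engine and the
# Docker SDK for Python being installed).
client = docker.from_env()

# Turn this engine into a single-node swarm manager (an idempotent setup
# would check the current swarm state first). The address is an example.
client.swarm.init(advertise_addr="192.168.1.10")

# Declare the desired state: three replicas of an example image,
# published on port 8080.
service = client.services.create(
    "nginx:alpine",                     # example image
    name="example-web",                 # hypothetical service name
    mode=docker.types.ServiceMode("replicated", replicas=3),
    endpoint_spec=docker.types.EndpointSpec(ports={8080: 80}),
)

# Scaling is a single declarative update; the swarm reconciles the
# actual state with the new desired state.
service.scale(5)
```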

Adoption

Various surveys put market adoption of Docker Swarm at somewhere around 15 percent of their respective respondents (with some outliers going up to 30 percent). This would make it the second or third most widely used orchestration solution, behind Kubernetes and (depending on the survey, either before or behind) custom built solutions [OS-US], [CHQ-S], [T-GO].

At the moment, of the major public cloud providers, only Microsoft's Azure Container Service (ACS) seems to support Docker Swarm as a container orchestrator [ACS-DS]. Currently, however, ACS only supports the old standalone version of Docker Swarm [SO-ADS]. Support for Swarm mode is being made available in certain regions and should be more widely available in time [DG-DSA].

Docker Inc also provides a service called Docker Cloud, which can be used to provision Docker Swarm clusters to other major public cloud providers and supports Swarm mode (although Swarm mode is still in Beta) [D-C].

Assessment

Table 8. Docker Swarm: Commercial Considerations & Support

C1 - Community & Available Resources (Rating: 2)
500+K Google results. 850+ tagged questions on Stack Overflow (750+ for "docker-swarm", 100+ for "docker-swarm-mode") [SO-DS], [SO-SM]. 22 book results on Amazon [A-DS]. A lot of resources are available for normal usage of Docker, but there could be more for Swarm mode. Since Swarm mode mostly uses the same APIs as Docker when running remote containers, there is enough information around, but finding specific information can be tricky. The split between Docker Swarm (standalone) and Docker in Swarm mode further increases the difficulty of finding the right information. Docker Inc generally seems to be in the habit of renaming its services and products every so often, creating quite a bit of confusion and making it hard to find information that is up to date (e.g. what is the difference between "Docker" and "Moby"? Is "Docker" now "Docker Community Edition" and "Docker Datacenter" now "Docker Enterprise Edition"?).

C2 - Ecosystem (Rating: 2)
Currently, Docker Swarm is mostly supported by Docker Inc themselves. The focus on easy setup makes provisioning Docker Swarm clusters to a public cloud provider rather painless, but only Azure currently provides a hosted, turn-key solution. At the moment and for the foreseeable future, only Docker containers are supported and there is very limited extensibility. For instance, it was possible to use a specific Service Discovery backend in Docker Swarm (standalone), but this does not seem to be possible anymore with Swarm mode [D-SD], [DF-SD].

C3 - Market Acceptance (Rating: 3)
With around 15% (mostly), either the second or third most used orchestration solution [OS-US], [CHQ-S], [T-GO]. Only one major provider seems to offer the solution as a hosted service (Azure).

C4 - Commercial Support (Rating: 3)
Commercial first-party support mostly through Docker Inc with Docker Enterprise Edition. There are a few official Docker Partners in Switzerland offering consulting & training (e.g. Puzzle ITC) as well as other consultancies (e.g. Container Solutions, NobleProg), but few of them seem to offer Docker Swarm specific services.

Table 9. Docker Swarm: Solution Implementation

S1 - Feature Completeness (Rating: 3)
Pretty much all basic features necessary are supported, no missing required features [D-SMF]. Some advanced features such as federation are currently not supported [DF-R].

S2 - Technology (Rating: 3)
Built with simplicity and efficiency as the main focus. Adaptations would be difficult to implement without forking the project. Spring Cloud integration does not seem to be available.

S3 - Maturity (Rating: 3)
Some people claim that Swarm mode should be ready for production, but only about half of its users in one survey actually seem to use it in production [OS-US]. Docker Inc is promoting the enterprise use of its platform in the form of the Docker Enterprise Edition, though.

S4 - Contributors (Rating: 1)
SwarmKit as such only has 36 individual contributors, 88% of whom work for Docker Inc. Therefore, the project is very much dependent on the company [SA-SK].

S5 - Security (Rating: 4)
Swarm mode enforces communication between nodes through a secure channel or encrypts that communication [D-PKI], [D-NS]. Security seems to have been a major consideration for the design.

Table 10. Docker Swarm: Risks

R1 - Abandonment (Rating: 2)
Docker itself is very well established, but since the development of its Swarm mode is very dependent on Docker Inc, abandonment could be a real possibility. There seems to be sufficient usage and media interest to make it unlikely at the moment, though.

R2 - Vendor Lock-In (Rating: 1)
Since Docker Inc is solely responsible for the development and maintenance of Swarm mode, one would be very much dependent on the company for support and updates. Switching to another solution would probably be a considerable endeavour.

R3 - Complexity (Rating: 4)
One of the biggest advantages of Docker in Swarm mode would certainly be its focus on simplicity. Therefore, the initial learning curve should be quite manageable. Since Swarm mode is built into Docker itself, there would also not be that many individual components to consider for troubleshooting.

R4 - Development Velocity (Rating: 3)
The development of Swarm mode is still ongoing and Docker itself is also quite an active project. New major versions of Docker are released roughly every 6 months. Since it is included in most major Linux distributions, upgrades should usually be rather painless. Open Hub: Very High Activity; Krihelimeter: 56 (SwarmKit, 30.08.2017).

Kubernetes

• Maintainer: Cloud Native Computing Foundation

• Website: https://kubernetes.io/

• Source: https://github.com/kubernetes/kubernetes

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery. Kubernetes builds upon 15 years of experience of running production workloads at Google, combined with best-of-breed ideas and practices from the community.

— Kubernetes Project, https://kubernetes.io/

Project

The Kubernetes [2: Greek for "helmsman" or "pilot" [WP-K]] project was initiated by Google and first released in 2014. Google’s stated goal in developing and releasing Kubernetes as Open Source was to boost its growing public cloud infrastructure business for external developers looking to run their applications in Linux containers [AQ-BOK].

In 2015 Google transferred stewardship of Kubernetes to the newly formed Cloud Native Computing Foundation (CNCF) [WP-K]. Although they have ceded control of the project, Google’s employees remain the most active contributors to Kubernetes. But other organizations have started to provide significant support to the project and now constitute over half of the contributions to Kubernetes [SA-K], [CNCF-PD]. Kubernetes has also been claimed to be one of the projects with the highest development velocity in the history of open source [CNCF-KP].

In August 2017, Amazon Web Services joined the CNCF, which means that the development of Kubernetes is now supported by all five of the largest public cloud providers (Amazon Web Services, Microsoft Azure, Google Cloud Platform, IBM Bluemix, and Alibaba Cloud) [ZDN-AK].

Features & Technology

Kubernetes employs a rather holistic approach to container management and orchestration. Its goal is to provide a complete set of concepts and tools to facilitate automated application deployment, management and scaling. Therefore, Kubernetes should be considered a platform rather than just a set of tools.

As such, Kubernetes aims to provide a complete set of features for this use case [K-F]:

• Naming and Service Discovery

• Replicating Application Instances

• Managing and Distributing Secrets and Configurations

• Resource Monitoring

• Load Balancing

• Horizontal Auto-Scaling

• Automated Rollouts and Rollbacks (including Rolling Updates)

• Co-Locating Helper Processes (enable composite applications while preserving one-application-per-container model)

• Application Health Checks

• Mounting Storage Systems

• Accessing and Ingesting Logs

• Application Debugging

• Authentication and Authorization

• Batch Execution

To implement its platform, Kubernetes incorporates a few existing projects (such as etcd for distributed data and configuration storage and fluentd for cluster-level logging), but also implements a host of specialized components (e.g. the kube-scheduler for container scheduling or the kubelet as the primary node agent) [K-C].

Kubernetes also allows some components to be configured individually. For the container engine, the default configuration is Docker, but Kubernetes can also be used with rkt [K-R]. With the Container Runtime Interface, which is currently in incubation, support for various container runtimes (including Open Container Initiative (OCI) compatible runtimes) is forthcoming [K-CRI]. Another major component that is configurable in Kubernetes is the networking. Kubernetes uses the Container Network Interface (CNI, [GH-CNI]) to define how the individual containers communicate with each other, and there are multiple third-party plugins implementing specific networking approaches [K-CNI], [LM-CNI].
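To give a feel for the declarative model behind this feature set, the following minimal sketch uses the official Kubernetes client library for Python to create and then scale a Deployment. It assumes an existing cluster reachable through a local kubeconfig; the deployment name, labels and image are purely illustrative.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (assumes a reachable cluster).
config.load_kube_config()
apps = client.AppsV1Api()

# Declare the desired state: two replicas of an example container image.
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="example-service"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "example-service"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "example-service"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(
                    name="example-service",
                    image="nginx:alpine",
                    ports=[client.V1ContainerPort(container_port=80)],
                )
            ]),
        ),
    ),
)
apps.create_namespaced_deployment(namespace="default", body=deployment)

# Scaling is again just a change of the desired state; the control plane
# reconciles the cluster towards it.
apps.patch_namespaced_deployment_scale(
    name="example-service",
    namespace="default",
    body={"spec": {"replicas": 5}},
)
```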

Adoption

At the moment, Kubernetes can certainly be considered the market leader among container orchestration solutions. According to a few different surveys, between 30 and 45 percent of respondents were using Kubernetes for container orchestration, with the second most widely used solution trailing behind by a margin of 15 to 20 percent [CHQ-S], [OS-US], [T-GO].

Numerous major public cloud service providers offer hosted Container as a Service (CaaS) solutions based on Kubernetes [K-HS]. The most high-profile such solutions would probably be:

• Google: Google Container Engine

• Microsoft: Azure Container Service

• Red Hat: OpenShift Online

• IBM: IBM Bluemix Container Service

Furthermore, various minimal and cloud-focused Linux distributions provide first-party support and integration for Kubernetes, among others: CoreOS's Container Linux, Red Hat's Project Atomic (available in CentOS or Fedora flavour), SUSE's CaaS Platform (commercial) and VMware's Photon OS.

Assessment

Table 11. Kubernetes: Commercial Considerations & Support

C1 - Community & Available Resources (Rating: 5)
3.2+M Google results. 4400+ tagged questions on Stack Overflow [SO-K]. 52 book results on Amazon [A-K]. Community seen as very open and inclusive [GS-K]. Numerous articles in the German-language IT press in the last two months alone (3 in Linux Magazin [LM-CNI], [LM-KP], [LM-KAI] and 2 as part of an ongoing series in iX Magazin [IX-K1], [IX-K2]).

C2 - Ecosystem (Rating: 5)
Numerous hosted and turn-key solutions available from major public cloud providers [K-HS]. Support by many cloud-centric, minimal Linux distributions. Platforms & distributions with enhanced management and monitoring tools available (such as Red Hat's OpenShift Origin and Fabric8, The Canonical Distribution of Kubernetes, SUSE's CaaS Platform, CoreOS's Tectonic, etc.). Application packaging possible with Kubernetes Helm, with an application store at KubeApps. Extensible, for example with Istio for dynamic routing between microservices. Nearly all major cloud service providers and equipment vendors are members of the CNCF [CNCF-M]. Several projects implement frameworks for Serverless computing based on Kubernetes [MF-SA] (such as Kubeless, Funktion or Fission).

C3 - Market Acceptance (Rating: 5)
30-45% adoption by users of orchestration solutions [CHQ-S], [OS-US], [T-GO]. Basis of hosted container services by major cloud providers [K-HS]. Integration in various minimal Linux distributions.

C4 - Commercial Support (Rating: 4)
Commercial first-party support mostly through platform providers (such as CoreOS or Canonical). Numerous commercial platforms and distributions available. A handful of Swiss consultancy firms offer Kubernetes consulting & training (e.g. Container Solutions, NobleProg, Cyberlogic Consulting).

Table 12. Kubernetes: Solution Implementation

S1 - Feature Completeness (Rating: 5)
Complete set of features, no missing required features [K-F]. No additional services needed to implement a complete orchestration solution.

S2 - Technology (Rating: 5)
"Builds upon 15 years of experience of running production workloads at Google" [AQ-BOK]. Supports alternative container engines (rkt) [K-R]. Networking implementation can be adapted to individual requirements or preferences [LM-CNI]. Spring Cloud integration available [GH-SCK].

S3 - Maturity (Rating: 4)
Basis for many hosted CaaS solutions by major cloud providers [K-HS]. In productive use at many Fortune 500 companies [K-CS], [WB-K]. A survey at a conference found nearly 69% of respondents using Kubernetes in production [CNCF-S]. Was able to scale enormously during the launch of Pokémon GO [GCP-KP].

S4 - Contributors (Rating: 5)
Over 3000 individual contributors. 43% of commits by Google employees, 27% by independents, 15% by Red Hat [SA-K].

S5 - Security (Rating: 4)
Application-level security policies through Pod Security Policies [K-PSP]. Security Best Practices are available for the project [K-SBP]. Established process for disclosure of security issues [K-S]. Release 1.7 with focus on security enhancements and hardening [K-RN].

Table 13. Kubernetes: Risks

R1 - Abandonment (Rating: 4)
Mission critical for numerous organizations [K-CS], [WB-K]. Google has passed stewardship over to the CNCF (hosted by the Linux Foundation). Integrated and supported by many major cloud providers and integrators [K-HS]. Can probably be considered "too big to fail" by now.

R2 - Vendor Lock-In (Rating: 4)
Open platform, therefore no lock-in to Kubernetes itself. Vast ecosystem of vendors and providers, therefore switching partners should be possible with reasonable effort.

R3 - Complexity (Rating: 2)
High initial learning curve, quite a few concepts to learn in the beginning. Many moving parts to understand; extensibility increases complexity (e.g. container format, networking).

R4 - Development Velocity (Rating: 2)
Usually new major releases every 3 months. GitHub rankings 2016: 2nd by pull requests, 10th by authors, 29th by commits; considered "one of the highest development velocity projects in the history of open source" [CNCF-KP]. Open Hub: Very High Activity; Krihelimeter: 7143 (30.08.2017).

Apache Mesos + Marathon

• Maintainer: Mesosphere, Inc / Apache Foundation (Mesos)

• Website: https://mesosphere.github.io/marathon/

• Source: https://github.com/mesosphere/marathon

Program against your datacenter like it’s a single pool of resources. Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively

— Apache Mesos, https://mesos.apache.org/

Marathon is a production-grade container orchestration platform for […] Apache Mesos.

— Mesosphere, https://mesosphere.github.io/marathon/

Project

There are actually multiple container orchestration solutions that were built on top of Apache Mesos, but we’ll focus on Marathon here, since it is open source and appears to be the most widely used one.

Mesos itself is not a specialized orchestrator, but rather a general solution for distributing workloads in computer clusters. It has been around for quite some time (even though version 1.0 was not released until summer 2016) and is in productive use at many major companies [WP-M]. It is a very popular solution for building the infrastructure for massive Big Data information processing. While Mesosphere, Inc seems to be the main contributor, the development of Apache Mesos is also supported by some major IT companies (such as IBM and Microsoft) [AM-DC].

Mesosphere built Marathon on top of Mesos to implement a dedicated container orchestration solution. Maintenance of Marathon seems to be done mainly by Mesosphere, with few outside contributions [GH-M]. Mesos + Marathon is also the basis for their open-sourced CaaS platform offering, which is called the Datacenter Operating System (DC/OS).

Features & Technology

Marathon is a framework for Apache Mesos that builds upon its cluster management functionality to launch and scale long-running applications and to extend it for container orchestration [GH-M], [DO-M].

The Mesos + Marathon combination supports the following container orchestration features [M-M]:

• High Availability (active/passive cluster with leader election)

• Service Discovery

• Load Balancing

• Multiple Container Runtimes (Mesos containers (using cgroups) and Docker)

• Stateful Applications (can bind persistent storage volumes to application)

• "Beautiful and powerful" UI

• Constraints (e.g. only one instance of an application per rack, node, etc.)

• Health Checks

• Event Subscription (notifications to an HTTP endpoint, e.g. to integrate an external load balancer)

• Metrics

• Complete REST API
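The REST API listed above can be exercised directly. The following minimal sketch (Python with the requests library) registers and then scales a Docker-based application; it assumes a Marathon endpoint at http://localhost:8080, and the illustrative application definition should be checked against the app schema of the deployed Marathon version.

```python
import requests

MARATHON = "http://localhost:8080"  # assumed Marathon endpoint

# Declarative application definition (illustrative values only).
app = {
    "id": "/example-web",
    "cpus": 0.25,
    "mem": 128,
    "instances": 2,
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "nginx:alpine",
            "network": "BRIDGE",
            "portMappings": [{"containerPort": 80, "hostPort": 0}],
        },
    },
}

# Register the application; Marathon schedules it onto the Mesos cluster.
response = requests.post(f"{MARATHON}/v2/apps", json=app)
response.raise_for_status()

# Scaling is a partial update of the same declarative definition.
response = requests.put(f"{MARATHON}/v2/apps/example-web", json={"instances": 5})
response.raise_for_status()
```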

Marathon can also be described as a "meta framework", since it can be used to start other Mesos frameworks (such as Chronos, a scheduler that is designed as a distributed and fault-tolerant replacement for cron) [M-M].

Since it builds upon Mesos, a Mesos + Marathon cluster could also be used for more generalized distributed computational workloads, such as Big Data or Machine Learning applications. But it would also make the acquisition of the required know-how to operate a Mesos cluster a necessity.

In September 2017, Mesosphere announced that they will integrate Kubernetes as an alternative container orchestrator into DC/OS. This could make it a viable base operating system for a Kubernetes based platform that also supports generic Mesos workloads and is commercially supported. The Kubernetes integration is currently still in Beta, however. [NP-MK]

Adoption

Apache Mesos as such is certainly one of the leading cluster management solutions on the market today and is being used by quite a few large organizations [NP-MK]. But market adoption of Mesos specifically for container orchestration does not appear to be as high and seems to be somewhere between 7 and 14 percent (with outliers up to 18 percent, depending on the survey). Most surveys ask about "Mesos" in the abstract and do not specify if or which container orchestration framework is used with it, making informed statements about the market adoption of Marathon (or DC/OS) difficult [CHQ-S], [OS-US], [T-GO].

Usage of Mesos + Marathon does not seem as widespread as with other solutions, but Mesosphere is currently mainly targeting large corporations and does seem to have attracted several major customers for their DC/OS product, such as Verizon and Yelp [M-US], [NP-MK].

Similar to the situation with Docker Swarm, only Microsoft’s Azure Container Service seems to currently offer a Mesos + Marathon based, hosted container orchestration solution (using the open source version of DC/OS) [ACS-M]. Mesosphere does however offer templates to provision DC/OS clusters on the IaaS solutions of Microsoft Azure and Amazon Web Services.

Assessment

Table 14. Mesos + Marathon: Commercial Considerations & Support

C1 - Community & Available Resources (Rating: 3)
180+K Google results. 400+ tagged questions on Stack Overflow [SO-M]. 2 book results on Amazon [A-M]. As an Apache project, the community around Apache Mesos seems to be quite active and engaging, but since the Marathon project seems to be heavily dependent on Mesosphere, there does not appear to be a particularly big community around Marathon itself. The documentation provided by Mesosphere for Marathon and DC/OS does seem to be quite extensive and of good quality.

C2 - Ecosystem (Rating: 3)
At the moment, Marathon is mostly supported by Mesosphere through their DC/OS product. Provisioning DC/OS clusters to certain public cloud providers through their templates appears to be quite easy, but only Azure currently provides a hosted, turn-key solution. Because of the Mesos base, the solution would be quite extensible and adaptable. This seems not to be the case with Marathon itself, however. Support for multiple container runtimes was announced, but still seems to be under development [NS-M], [M-C].

C3 - Market Acceptance (Rating: 3)
With 7-15% of the market, Mesos does not seem to be the most widely used container orchestration solution. It could not manage to be in the top 3 of the answers in any of the surveys [OS-US], [T-GO]. There are, however, several large corporate users of Mesosphere's DC/OS [M-US], [NP-MK]. Only one major provider seems to offer the solution as a hosted service (Azure).

C4 - Commercial Support (Rating: 2)
Commercial support would most likely need to be provided by Mesosphere, Inc. They offer consultancy services and paid online training courses. No consultancy or training providers based in Switzerland could be identified. Mesosphere does have an office in Hamburg, however [M-H].

Table 15. Mesos + Marathon: Solution Implementation

S1 - Feature Completeness (Rating: 4)
All basic features necessary are supported, no missing required features [M-M]. Advanced features outside of container orchestration are supported.

S2 - Technology (Rating: 3)
Apache Mesos is widely adopted and major organizations are supporting its development. Therefore the technology should be sound, but may be rather complex due to the many individual components. Thanks to the Mesos basis, the solution should also be highly adaptable. Mesosphere's DC/OS now also supports Kubernetes (still in Beta, though) [NP-MK]. Spring Cloud integration available [GH-SCM].

S3 - Maturity (Rating: 5)
Apache Mesos is widely adopted, production-proven, and major organizations are supporting its development. Mesos + Marathon is also in productive use at some major corporate entities (in the form of Mesosphere's DC/OS) [M-US].

S4 - Contributors (Rating: 3)
Apache Mesos has many corporate and individual contributors; the bulk of the development is done by Mesosphere employees, though [AM-C]. The development of Marathon is very much dependent on Mesosphere, however (there are only 238 contributors to Marathon) [GH-M].

S5 - Security (Rating: 2)
Security does not seem to be a major focus of the project. Some authentication and access control features appear to be reserved for Mesosphere's DC/OS [M-A].

Table 16. Mesos + Marathon: Risks

R1 - Abandonment (Rating: 3)
Apache Mesos can probably be described as "too big to fail" by now and its development could be taken up by one of its many major corporate users. The same can not be said of Marathon, though. Outside of Mesosphere, there do not seem to be many candidates to take over the project. Mesosphere does appear to be very invested in the project, however.

R2 - Vendor Lock-In (Rating: 2)
Switching to another Mesos based container orchestration solution might be possible, but a full orchestration layer change would probably be very difficult and costly.

R3 - Complexity (Rating: 1)
The initial learning curve for Apache Mesos would be pretty steep. Additionally, due to the many individual components in a Mesos + Marathon setup, using a platform like DC/OS would most likely be a necessity. Even then, operational issues would probably require support by Mesosphere.

R4 - Development Velocity (Rating: 3)
Apache Mesos seems to be pretty stable and mature. Marathon development does not seem to move at an overly rapid pace either; major releases were generally published roughly every 5 months. Open Hub: Very High Activity; Krihelimeter: 333 (30.08.2017).

HashiCorp Nomad

• Maintainer: HashiCorp

• Website: https://www.nomadproject.io

• Source: https://github.com/hashicorp/nomad

Nomad is a single binary that schedules applications and services on Linux, Windows, and Mac. It is an open source scheduler that uses a declarative job file for scheduling virtualized, containerized, and standalone applications.

— HashiCorp, https://www.nomadproject.io/

Project

Nomad was initially released by its maintainer HashiCorp in late 2015 [GH-NCL] as part of the HashiCorp Suite of DevOps and cluster management tools. HashiCorp’s philosophy is based on building simple, modular and composable tools that follow the Unix philosophy of building dedicated tools that do one job and do it well [HC-T]. Nomad's role in HashiCorp’s tool set is to manage and schedule tasks and applications across nodes in a cluster.

Other tools that HashiCorp provides as open source are [WP-HC]:

• Vagrant, for creating and configuring portable, virtualized development environments

• Packer, for building virtual machine images for various platforms

• Terraform, for provisioning and managing infrastructure across multiple cloud providers

• Consul, for distributed service discovery and configuration

• Vault, for distributed secret storage and access control

Features & Technology

As mentioned above, HashiCorp's idea is to combine their tools with others (either their own or third-party) to build a custom solution that integrates well into an existing environment. In the case of Nomad, however, the discovery and configuration service is predetermined to be their own Consul.

Nomad does support various application or task formats, however [N-TD]:

• Docker Containers

• rkt Containers

• LXC Containers

• Qemu Virtual Machines

• Java Applications (packaged into a JAR file)

• Isolated or Raw Fork/Exec (exec with limited/unlimited resource access)

This means that Nomad can not just orchestrate containers, but also schedule (recurring) batch jobs and non-containerized applications. In combination with Consul, Nomad supports the basic features needed for the orchestration of containerized applications [N-H], [C-H]:

• Cluster Management

• Service Discovery

• Configuration Storage & Distribution

• (Manual) Scaling

• Rolling Updates & Deployments

• Failure Detection

• Support for Multi-Datacenter and Multi-Region Aware setups

HashiCorp's focus on simple and modular tools means that Nomad is simpler to set up and configure, and less resource intensive, than many comparable solutions. But it also means that it cannot provide the same degree of integration or as many features out of the box. Some features can be added through additional components (such as secret management through Vault), but others are probably going to remain reserved for more complex platforms [IX-N].

One required feature that would take a bit more effort to implement with Nomad + Consul is Load Balancing. Consul does provide some very limited load balancing through its DNS interface, but more advanced use cases would either have to be solved on the client side or by using Consul Template to configure a dedicated load balancer [C-DNS], [GH-CT].
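As a small illustration of the client-side option, the following sketch resolves the healthy instances of a service through Consul's DNS interface and picks one at random. It is a minimal example using the dnspython library; it assumes a local Consul agent answering DNS queries on the default port 8600, and the service name web is hypothetical.

```python
import random

import dns.resolver  # dnspython

# Point the resolver at the local Consul agent's DNS interface.
resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["127.0.0.1"]
resolver.port = 8600  # Consul's default DNS port

# SRV records list every healthy instance of the service together with its port.
answers = resolver.resolve("web.service.consul", "SRV")

# Naive client-side load balancing: pick a random healthy instance.
instance = random.choice(list(answers))
print(f"sending request to {instance.target}:{instance.port}")
```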

Adoption

Commercially, Nomad seems to have a very hard time achieving any significant market adoption. It was listed as a specific option in only one survey we could find, and only 5 percent of respondents in that survey reported using it, with as few as 1 percent identifying it as their most frequently used orchestration tool [CHQ-S]. This would certainly make it the least used orchestration solution in our evaluation.

Furthermore, there does not seem to be a service provider using Nomad for a CaaS platform or offering Nomad as a hosted service. HashiCorp used to offer their own hosted infrastructure management service called Atlas that combined some of their open source tools (Packer, Terraform and Consul) [H-A]. But it appears that it never provided Nomad support and has since been split up into individual enterprise products [H-VC].

Assessment

Table 17. Nomad: Commercial Considerations & Support

C1 - Community & Available Resources (Rating: 2)
200+K Google results. 10+ tagged questions on Stack Overflow [SO-N]. 1 relevant book result on Amazon [A-N]. The resources provided by HashiCorp appear to be of good quality and some of their other tools seem to have built a decent community, but the community around Nomad does not appear to be very big or active. There recently was an introductory article about Nomad in the German-language iX Magazin, though [IX-N].

C2 - Ecosystem (Rating: 1)
There do not appear to be any hosted Nomad solutions and not much of an ecosystem around Nomad in general. The limited scope of Nomad itself would make it rather easily extendable and adaptable, but there do not seem to be many examples of people doing that.

C3 - Market Acceptance (Rating: 2)
With only 5% of organizations having even used Nomad, let alone using it as their main orchestrator, it does appear to be one of the least used solutions available [CHQ-S]. The lack of service providers using Nomad for their products further solidifies that impression.

C4 - Commercial Support (Rating: 2)
HashiCorp seems to be planning to offer commercial support for Nomad through a Nomad Enterprise product in the future, but at the moment they seem to focus on their other products. They do have a training partner in Munich [H-TP], but there does not appear to be a consultancy firm in Switzerland with much Nomad experience (there seem to be a few freelancers with experience with the HashiCorp tool suite, though).

Table 18. Nomad: Solution Implementation

S1 - Feature Completeness (Rating: 2)
The basic features necessary are supported (some through Consul), no missing required features. Some advanced features such as multi-datacenter and multi-region support are provided, but the feature set is very narrow in general. Load Balancing would require additional effort to implement properly.

S2 - Technology (Rating: 5)
Because of Nomad's focus on simplicity, the setup and operation of clusters should be rather easy and there should be less of a resource-intensive management overhead than with other solutions. Various application and task types are supported (Docker, rkt and LXC containers, Java and native applications) [N-TD]. Adapting Nomad itself would probably require forking the project. Consul integration for Spring Cloud is an official part of the project [SC-SCC].

S3 - Maturity (Rating: 2)
Very little evidence of anyone using Nomad in production (in a survey, 5% of respondents indicated they had used Nomad in their organization, while only 1% stated it was their most frequently used orchestration tool) [CHQ-S]. "Stable" 1.0 release not yet reached (current version is 0.6) [GH-NCL].

S4 - Contributors (Rating: 2)
The Nomad project seems to be entirely dependent on HashiCorp (there are only 192 contributors) [GH-N].

S5 - Security (Rating: 3)
Security seems to be an important consideration for HashiCorp. There is a process to report vulnerabilities in place and they provide a guide on how to secure Nomad [H-S], [N-S].

Table 19. Nomad: Risks

R1 - Abandonment (Rating: 2)
Due to the small size of HashiCorp and the difficulties in attracting market share for Nomad, abandonment is a real possibility. For various reasons, they have also decommissioned projects and broken up products before [H-DO], [H-VC].

R2 - Vendor Lock-In (Rating: 2)
Since HashiCorp appears to be solely responsible for the development and maintenance of Nomad, one would be very dependent on them for support and updates. Switching to another solution would probably require quite some effort.

R3 - Complexity (Rating: 5)
The simple design of Nomad would certainly be one of its major advantages. Since there are only two components (Nomad and Consul) and since only one executable is necessary for each, operating a Nomad based orchestration solution would probably be much easier than with more complex solutions [IX-N]. The initial learning curve should also be much flatter because of that.

R4 - Development Velocity (Rating: 3)
There usually seems to be a gap of around half a year between major releases. Updates should be rather easy due to the simplicity of the tool set. Open Hub: N/A; Krihelimeter: 832 (30.08.2017).

Decision Matrix

Based on the evaluation of the defined criteria (and their respective weights) for the solution candidates, a decision matrix was created to compare the candidates:

Figure 2. Evaluation Decision Matrix

Full Matrix: Appendix: Additional Documents
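To make the calculation behind the matrix explicit: assuming each candidate's final rating is the sum of its ratings multiplied by the corresponding criterion weights (and the unweighted rating the plain sum of its ratings), the totals can be recomputed from the weights and assessment tables above. The following Python sketch reproduces the numbers in the ranking below.

```python
# Criterion weights as defined in Tables 3, 4 and 5.
WEIGHTS = {
    "C1": 5, "C2": 5, "C3": 2, "C4": 2,
    "S1": 4, "S2": 4, "S3": 3, "S4": 3, "S5": 3,
    "R1": 5, "R2": 4, "R3": 3, "R4": 1,
}

# Ratings as given in the assessment tables above.
RATINGS = {
    "Kubernetes":       {"C1": 5, "C2": 5, "C3": 5, "C4": 4, "S1": 5, "S2": 5,
                         "S3": 4, "S4": 5, "S5": 4, "R1": 4, "R2": 4, "R3": 2, "R4": 2},
    "Mesos + Marathon": {"C1": 3, "C2": 3, "C3": 3, "C4": 2, "S1": 4, "S2": 3,
                         "S3": 5, "S4": 3, "S5": 2, "R1": 3, "R2": 2, "R3": 1, "R4": 3},
    "Docker Swarm":     {"C1": 2, "C2": 2, "C3": 3, "C4": 3, "S1": 3, "S2": 3,
                         "S3": 3, "S4": 1, "S5": 4, "R1": 2, "R2": 1, "R3": 4, "R4": 3},
    "Nomad":            {"C1": 2, "C2": 1, "C3": 2, "C4": 2, "S1": 2, "S2": 5,
                         "S3": 2, "S4": 2, "S5": 3, "R1": 2, "R2": 2, "R3": 5, "R4": 3},
}

for solution, ratings in RATINGS.items():
    unweighted = sum(ratings.values())
    weighted = sum(WEIGHTS[criterion] * rating for criterion, rating in ratings.items())
    print(f"{solution}: unweighted {unweighted}, final {weighted}")

# Output:
# Kubernetes: unweighted 54, final 191
# Mesos + Marathon: unweighted 37, final 127
# Docker Swarm: unweighted 34, final 109
# Nomad: unweighted 33, final 108
```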

Ranking

From the decision matrix, we get the following ranking for the orchestration solution candidates:

Table 20. Candidate Ranking

1st: Kubernetes (Cloud Native Computing Foundation) - unweighted rating 54, final rating 191
2nd: Mesos + Marathon (Mesosphere, Inc) - unweighted rating 37, final rating 127
3rd: Docker Swarm (Docker Inc) - unweighted rating 34, final rating 109
4th: Nomad (HashiCorp) - unweighted rating 33, final rating 108

In conclusion, we can say that Kubernetes was ranked first by a fair margin. The Mesos + Marathon combination was ranked second, with Docker Swarm and Nomad coming in last with near identical ratings.

Kubernetes scored very well in the Commercial Considerations & Support and the Solution Implementation criteria, where the wide support and expansive ecosystem were major contributing factors.

Mesos + Marathon also did well in those categories, mostly thanks to the efforts of its maintainer Mesosphere and the commercial adoption of DC/OS. Both Docker Swarm and Nomad could score points in the technical criteria, but their limited commercial adoption and support restricted them to the bottom of the ranking.

Proposal

After the evaluation of the orchestration solution, the results of the evaluation were presented to and discussed with the development team and its management. Since Kubernetes turned out to be the clear winner of the evaluation, it was proposed to base the implementation of the new development environment on Kubernetes.

Figure 3. Evaluation Proposal Presentation

Presentation: Appendix: Additional Documents

Decision

Some reservations about the complexity of Kubernetes and about how to handle the networking in a Kubernetes cluster remain, but the broad ecosystem and the impression that Kubernetes has "won the Container Orchestrator wars" were compelling enough arguments to persuade the team that proceeding with Kubernetes as the orchestrator was the best option. Recent developments like AWS and Oracle joining the CNCF [TC-AWS], [IQ-O], Mesosphere integrating Kubernetes into DC/OS [NS-MK] and VMware's announcement of the Pivotal Container Service [NS-PKE] further solidified that impression.

It was also decided that the focus of the remaining project shall be on getting a cluster running as quickly as possible, so that the functionality and utility of the solution can be demonstrated to the team early on. Therefore, a simple setup shall be a main focus in selecting the distribution for the base image.

Defining the Base Image

The next step in building our new development environment was to define a base image to use for the automated provisioning of our environment.

To minimize the overhead needed for the base system, one of the various minimal, container-oriented Linux distributions shall be used. The distribution should provide pre-built Kubernetes packages through its software package management system to simplify the installation.

For simplicity's sake, only the absolutely necessary modifications (such as password setup) to the officially released images shall be made before provisioning the machines to the virtualization infrastructure. More advanced approaches like pre-building the images specifically for the master and worker nodes (e.g. by installing the necessary software packages) using a tool such as HashiCorp's Packer shall be out of scope for this project, but should be considered for future refinements of the provisioning process.

Candidate Distributions

Most minimal Linux distributions with a focus on container deployments already provide a form of Kubernetes integration. We shall only evaluate the candidates rudimentarily, since the focus of this project shall be on the environment setup itself.

The following distributions or products were considered for the base system [CS-COS], [NS-COS]:

Table 21. Candidate Distributions

Vendor Distribution / Product

Red Hat Project Atomic (as CentOS or Fedora flavour)

Canonical The Canonical Distribution of Kubernetes

CoreOS Container Linux

VMware Photon OS

There are multiple other products that would fit our requirements [GH-ALC], but they were not considered further for various reasons:

Table 22. Other possible Candidates

Vendor Distribution / Product Reason for Exclusion

Docker Inc LinuxKit Complexity / Time

Mesosphere Datacenter Operating System (DC/OS) Kubernetes integration in Beta

Rancher Labs Rancher OS Complexity / Time

SUSE SUSE CaaS Platform Commercial / Non-Free

Red Hat Project Atomic

Red Hat's Project Atomic hosts various projects that are part of their effort to re-design their distributions around the principle of immutable infrastructure, using Docker and Kubernetes.

The container operating system part of the project (called Atomic Host) consists of flavours for their commercial product and the open source variants CentOS and Fedora. They use a tool called rpm-ostree to manage bootable, immutable, versioned filesystem trees from their upstream RPM packaging system and to enable atomic system updates that can be rolled back [PA-I].

According to Red Hat, Project Atomic should provide a "happy medium" in between full enterprise distributions and more extreme approaches to lean OSes: "lighter than a traditional OS, but not as small as some of its competitors" [NS-COS].

The Canonical Distribution of Kubernetes

The Canonical Distribution of Kubernetes combines Canonical's cloud-focused, minimal Ubuntu variant (called Ubuntu Cloud) with their cloud provisioning tool Juju and a pure, upstream version of Kubernetes to achieve fast and simple deployments of Kubernetes clusters to various public cloud providers and also to simplify on-premise deployments [C-CDK].

Both their Kubernetes distribution and Juju are freely available. Canonical does offer support, customization and training commercially, however.

CoreOS Container Linux

CoreOS was one of the pioneers in the container-focused, minimal Linux distribution space. They developed quite a few fundamental components of Kubernetes (such as etcd for the distributed state-storage and flannel for overlay networking) and have since attracted investments from Google's venture capital arm Google Ventures [NS-COS].

Their Container Linux is partially based on the Chromium OS project and employs a dual-partition system to be able to perform cryptographically signed updates as a whole on the inactive secondary partition and to easily roll back updates. It also provides various cluster management tools [WP-CL].

CoreOS also offers Tectonic as a commercial Kubernetes platform.

VMware Photon OS

Photon OS is VMware's offering in the minimal container OS space. It is optimized for VMware's vSphere platform, but freely available as open source. Photon OS's lifecycle management system is yum-compatible and package-based, but it can also support the image-based system versioning provided by Project Atomic's rpm-ostree [VMW-POS].

Because it is VMware's project, Photon OS's kernel is also optimized for their platforms and products. Since our target environment will mostly be running on vSphere, Photon OS might be an attractive option for our specific use case.

Tested Distributions

Originally, the idea was to select a distribution and to use it to implement the provisioning of the development environment. But because of some delays with the availability of the target vSphere environment and because of updates to that environment during the project, multiple different distributions were tested in the end.

At first, the CentOS Atomic Host variant of Red Hat's Project Atomic was used for local tests in VirtualBox and subsequently on the vSphere environment. But since the standard image of CentOS Atomic Host does not include VMware's open-vm-tools, the automatic setup of IP addresses for the provisioned virtual machines did not work out of the box. Therefore, the tests with CentOS Atomic Host were abandoned pretty quickly.

After that, the tests were mostly executed with Photon OS as the base system. Since it already included VMware's open-vm-tools, the automatic provisioning on vSphere was rather easy and simple. The only change to the official release for the VM template used for all subsequent tests was that the default password was changed after the initial machine was created from the official OVA image.

However, during a system migration of the vSphere environment, The Canonical Distribution of Kubernetes was also tested on a public IaaS provider for a short time.

Automated Environment Setup

To set up our new development environment, we shall aim to automate the necessary steps as much as possible. Ideally, no manual interaction should be necessary for the setup. This would enable us to repeat the setup for additional environments and make updates to the environment much easier. It would also enable us to set up and tear down environments whenever necessary and to scale out an environment with minimal adjustments.

Target Infrastructure

At our company, nearly all servers are already realized as virtual machines running on virtualization infrastructure in our own datacenter in the greater Zurich area. We are currently in the process of developing and introducing our own hybrid cloud product, but since the timeline for that project is currently still in flux, it was too risky to depend on this new environment being ready in time for this project. Therefore, the decision was made (together with the operators of the virtualization infrastructure) to use the bare virtualization infrastructure for our setup.

However, we shall try to refrain from using techniques that are specific to the virtualization product. The environment setup shall be as portable as possible, and it should also be possible to run the setup on the IaaS product of a public cloud provider without too many adjustments.

The virtualization product we will be using to implement the automated environment setup will be VMware vSphere 6. A pre-production environment for the coming cloud product was kindly provided for our test by our operations engineers.

Kubernetes Basics

For a better understanding of the components that need to be set up for Kubernetes, we shall also briefly discuss the major components needed for a Kubernetes installation, as well as some of the networking basics for Kubernetes.

One term that will come up frequently when discussing Kubernetes is the "Pod". Pods are the smallest units of computing that can be defined, deployed and managed in Kubernetes [OS-P]. They comprise a single container or a group of containers with shared storage and network, running on the same host. Kubernetes uses the concept of pods to model the common application design pattern of having multiple cooperating processes forming an application or service [K-P]. Depending on the application design, a pod could describe a whole application with back-end, front-end and data storage, but for our microservice architecture, pods will usually only contain one service container.
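To make the concept more tangible, a pod specification for one of our single-container microservices could look roughly like the following sketch (the service name, image, labels and port are hypothetical placeholders and not taken from our actual build process):

Example Pod specification (sketch)

apiVersion: v1
kind: Pod
metadata:
  name: product-catalogue            # hypothetical service name
  labels:
    app: product-catalogue
spec:
  containers:
    - name: product-catalogue
      image: registry.example.com/product-catalogue:1.0.0   # placeholder image
      ports:
        - containerPort: 8080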

Kubernetes Components

To set up Kubernetes, the setup tools will need to configure both master and worker nodes. Therefore, we shall briefly discuss the necessary components for each node type.

Master Node

A Kubernetes master node runs the components necessary for the cluster coordination and the scheduling of pods (they provide the cluster's control plane).

The most important Master Components for us are [K-C]:

• etcd: The cluster's backing store, where all data is stored

• kube-apiserver: Exposes the remote API and acts as the front-end for the control plane

• kube-controller-manager: Runs various controller processes that handle routine tasks in the cluster

• kube-scheduler: Watches for newly created pods and selects a node to run them on

With most minimal installations, only one master node is set up for simplicity, but for production-grade, highly available installations, multiple master nodes and separate etcd instances might be necessary.

Worker Nodes

In most setups, the master nodes will not run application pods themselves and there will be multiple worker nodes in the cluster.

Worker nodes run Kubernetes' Node Components [K-C]:

• kubelet: The primary node agent, which watches for and sets up the pods that have been assigned to the node

• kube-proxy: Maintains network rules and performs connection forwarding.

Networking

There are multiple ways to set up the networking between pods in Kubernetes. Kubernetes implements the Container Networking Interface (CNI) to allow various networking implementations to be used for a cluster setup [K-CNI]. The CNI plugin is usually set up after the cluster's master and worker nodes have been started and have joined the cluster.

The two CNI plugins we will be using in the scope of this project will be:

• Flannel: A simple overlay network that is mostly the default networking plugin used for standard Kubernetes installations.

• Project Calico: Provides networking and network policies and employs the same IP networking principles as the internet to connect Kubernetes pods. Calico provides high-performance data center networking and can be deployed without encapsulation or overlays. It also provides fine-grained, intent-based network security policy for Kubernetes pods [K-CN].

Tools & Technologies

To achieve the required automation, we will need to use tools that help us with the automated provisioning of virtual machines and with the installation and setup of Kubernetes on those machines. We shall try out multiple tools if possible, to get a sense of their individual strengths and weaknesses. But a comprehensive evaluation of tools for provisioning and setup would be too much effort for our purposes and shall be out of scope for this project.

Provisioning

Since our target infrastructure will be a vSphere cluster, the provisioning tool will need to support vSphere's remote API. To be able to set up the environment on public cloud platforms as well, the provisioning tool shall also support the most important providers. Therefore, we shall only consider provisioning tools that support both vSphere and public cloud deployments.

HashiCorp Terraform

HashiCorp offers an open source provisioning tool that provides all the features we require. Terraform is designed to provision and manage infrastructure across multiple cloud providers. It supports the provisioning of virtual machines to vSphere and to virtually all major public cloud providers, as well as numerous other kinds of infrastructure resource types [T-P].

Terraform employs the Infrastructure As Code philosophy, where infrastructure resources are described in a text-based, declarative configuration language. Terraform will then verify the resource description and apply the described changes to the environment.

The open source version of Terraform is already pretty fully featured and sufficient for the scope of this project. But HashiCorp also offers a commercial version called Terraform Enterprise that adds many collaboration features.

Canonical Juju

Juju is an application modelling and provisioning tool specifically for Canonical's Ubuntu Cloud variant. It allows the setup and deployment of applications to be described in so-called "Charms", which can then be bundled in "Bundles" and deployed to various virtualization providers (including vSphere).

Juju is relevant for our project because it is used for the setup of The Canonical Distribution of Kubernetes. The distribution itself is a set of Juju Bundles that combine Ubuntu Cloud with an upstream release of Kubernetes. There are various Bundles, ranging from minimal setups to full-fledged, production-ready Kubernetes clusters.

Kubernetes Setup

For the Kubernetes setup, we will also need to evaluate various approaches and tools. The Kubernetes community documents and maintains quite a few different approaches to setting up Kubernetes [K-PRS], so we will have to select and test the most promising ones for our use case.

Ansible Playbook

The Kubernetes project provides an Ansible Playbook to install and setup a basic Kubernetes cluster on a set of existing machines. The Playbook only needs a list of the designated machines for the master and worker nodes, as well as some basic configuration to facilitate SSH access to the target machines. It will then install and setup the necessary components, create certificates and so on.

Kubernetes Anywhere

Kubernetes Anywhere is currently still the setup option highlighted in the Kubernetes

documentation for installations on vSphere [K-VS]. It uses a Photon OS template that needs to be copied to the target vSphere cluster and a Docker image that contains the necessary tools to start the machines and set up Kubernetes [GH-KAV].

Kubeadm

kubeadm is a tool to bootstrap Kubernetes master nodes and to initialize and join worker nodes to the cluster. It requires the target machines to be provisioned and kubeadm to be installed beforehand. First, one initializes the master node; then the worker nodes are started and joined to the master. kubeadm is maintained as part of the Kubernetes project and therefore released together with Kubernetes releases [K-KA].

Manual Kubernetes Setup

Then, there is of course the possibility to manually install the necessary components on the master and worker nodes (e.g. through the distribution's package manager) and to adjust the corresponding configuration files as necessary.

Implementation

For the implementation of the automated environment setup, the tools and methods discussed above were tested to set up a simple Kubernetes installation. Where not otherwise specified, the goal was to set up a Kubernetes cluster with one master node and (at least) three worker nodes, and all machines were given static IP addresses for simplicity. As discussed in the Base Image chapter, Photon OS was used as the base system for most of the tests because of its optimizations for VMware's platform, but CentOS Atomic Host and Ubuntu Cloud were also used for some of the tests.

Atomic Host + Ansible Playbook

The first attempt was to use CentOS Atomic Host virtual machines and the Ansible Playbook to set up the cluster. The VMs were created by manually stepping through the standard installer, setting up passwords and the static IP addresses, installing the necessary software and copying the worker node VMs. Eventually, the goal would have been to automate the setup process, but since Photon OS already provided all the necessary features, that was not explored much further.

Once the target machines were ready, the Ansible Playbook (which is hosted in Kubernetes' Contrib repository) would theoretically only need the IP addresses of the target master and worker nodes and some configuration.

The target machines are specified in a simple configuration file called inventory, located in a folder of the same name:

Ansible "inventory" file

[masters]
172.26.4.240

[etcd:children]
masters

[nodes]
172.26.4.24[1:3]

All further configuration would be set in the inventory/group_vars/all.yml file, where the SSH user and password needed for access to the VMs, the networking configuration, the addons to be installed (e.g. the dashboard) and more are configured.
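To illustrate what such a configuration could contain, the following is a rough sketch of an inventory/group_vars/all.yml; apart from the standard Ansible SSH variables, the key names shown here are hypothetical and would need to be looked up in the Playbook's documentation:

Example inventory/group_vars/all.yml (sketch)

# Standard Ansible connection settings for the target VMs
ansible_ssh_user: root
ansible_ssh_pass: "..."

# Hypothetical Playbook settings (the exact variable names are defined by the Playbook)
cluster_name: "dev.cluster.local"
networking: "flannel"
dashboard_addon: true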

The Playbook would then be run using the scripts/deploy-cluster.sh script. It would install and configure all the necessary components on the master and the nodes, create all the certificates and keys needed and so forth.

Unfortunately, the Playbook failed in our tests. It was able to install all the software and seemed to create the necessary secrets, but then failed on a dependency for the master's certificate. Upon noticing that an issue for this exact problem had been open and unanswered for weeks, and that the officially supported Kubernetes version for the Playbook was still 1.5.2 (with 1.7.4 being current at the time of the test), it was decided to refrain from any further testing with the Ansible Playbook.

Project Atomic Deployment Automation

Before abandoning further testing with CentOS Atomic Host, some tests for the automated provisioning were conducted. Atomic supports deployment automation through cloud-init, where setup information can be passed to a machine by mounting an ISO image with the necessary configuration as a CD-ROM drive [PA-CI].

Instance configuration would be put into a meta-data file (example), while the user configuration would be set in a user-data file (example). Both of those files would then be put into an ISO image and mounted in the virtual machine. One issue with this approach is that the ISO images need to be built for each machine individually. The building of the images could be automated, but that would be more difficult than with simple, text-based templates. To at least be able to pre-generate the necessary images for a set number of master and worker nodes, a shell script was implemented: generate_init_data_isos.sh
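For illustration, such a file pair could look roughly like the following sketch (hostname, user name and key are hypothetical placeholders; the actual files used for the tests are the ones referenced above):

cloud-init meta-data and user-data (sketch)

# meta-data
instance-id: master-1
local-hostname: master-1.vsphere.local

# user-data
#cloud-config
users:
  - name: k8s-admin
    ssh_authorized_keys:
      - ssh-rsa AAAA... k8s-admin@workstation

# The two files are then packed into an ISO image, e.g. with genisoimage
# (the volume label "cidata" is what the cloud-init NoCloud datasource expects):
# genisoimage -output init-master-1.iso -volid cidata -joliet -rock user-data meta-data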

But the most severe problem with CentOS Atomic Host was that the official images were missing VMware's open-vm-tools. Without these tools, setting up IP addresses while provisioning the VMs was not easily implementable. There would have been the possibility to build an image with the tools included [RH-CAI] or to set up a prototype VM and convert it to a vSphere template, but since Photon OS already provided the necessary OVA machine templates and included the open-vm-tools, no further effort was invested.

Kubernetes Anywhere

Kubernetes Anywhere aims to provide a setup method that is portable across many deployment targets (cloud providers). For deployments on vSphere, the project provides a Photon OS based OVA template that needs to be uploaded to the target vSphere cluster and a Docker image that contains the necessary tools and scripts to provision the necessary machines using the template and to set up Kubernetes on those machines [GH-KAV].

An examination of the Docker image also shows that Kubernetes Anywhere actually uses Terraform to provision the machines on vSphere. To start the deployment, one runs the Docker image interactively and configures the deployment on the command line. The configuration is then saved and the deployment can be started.

Kubernetes Anywhere Deployment

$ sudo docker run -it -v /tmp:/tmp --rm \
>   --env="PS1=[container]:\w> " \
>   --net=host \
>   cnastorage/kubernetes-anywhere:latest /bin/bash
Unable to find image 'cnastorage/kubernetes-anywhere:latest' locally
latest: Pulling from cnastorage/kubernetes-anywhere
89d79dde0755: Pull complete
5008fd85bbd7: Pull complete
456aae8d0d1a: Pull complete
deef27d05d58: Pull complete
e59bf3cae77e: Pull complete
645bae43021d: Pull complete
dc614d052117: Pull complete
43472065c1ac: Pull complete
Digest: sha256:e2ee18aa00b3b6f9ea7a6b8d1ee117384cc8bec11f100bc6f9aaca7941c04e45
Status: Downloaded newer image for cnastorage/kubernetes-anywhere:latest
[container]:/opt/kubernetes-anywhere> make config
CONFIG_="." kconfig-conf Kconfig
*
* Kubernetes Minimal Turnup Configuration
*
*
* Phase 1: Cluster Resource Provisioning
*
number of nodes (phase1.num_nodes) [4] (NEW)
...

[container]:/opt/kubernetes-anywhere> make deploy
util/config_to_json /opt/kubernetes-anywhere/.config > /opt/kubernetes-anywhere/.config.json
make do WHAT=deploy-cluster
make[1]: Entering directory '/opt/kubernetes-anywhere'
...

Unfortunately, Kubernetes Anywhere was not able to complete a deployment in our tests, either.

Terraform crashed with a message consistent with an issue that had been reported to the project, but remained unanswered during our testing period. Therefore, no further attempts were made using Kubernetes Anywhere and the focus was shifted to implementing our own provisioning of Photon OS machines with Kubernetes.

Photon OS Provisioning

VMs created from the official OVA-template for Photon OS require an initial user interaction after startup to change the default password. Therefore, the first step for the provisioning of Photon OS machines with Terraform was to create a VM template from an initial machine after the password change on vSphere. The created template could then be used to provision both master and worker nodes with Terraform.

Terraform also allows simple machine provisioning through SSH once the VMs are created and running. In most cases this would include:

• Adding the common SSH public key for machine access to the authorized_keys file

• Installing the necessary software

• Copying the respective configuration files to the machine or adjusting the default ones

• Adding firewall rules to iptables

• (Re)starting the corresponding systemd services.

Terraform also allows individual resources to be provisioned multiple times, meaning that the worker node resources only needed to be defined once, with the actual number of machines to be created just being a configuration value.

A typical Terraform resource declaration would therefore look like this:

Terraform Resource Declaration for Photon OS Machine

variable "vsphere-config" {
  type = "map"
  default = {
    server     = "172.26.4.53"
    user       = "[email protected]"
    password   = ""
    datacenter = "pre-prod"
    datastore  = "datastore1"
    cluster    = "Cluster01"
  }
}

variable "vm-config" {
  type = "map"
  default = {
    template = "photon-template"
    user     = "root"
    password = ""
    network  = "net-obb-mgmt"
    gateway  = "172.26.4.1"
  }
}

variable "nodes" {
  type = "map"
  default = {
    count       = "3"
    name-prefix = "node-"
  }
}

variable "node-ips" {
  type = "list"
  default = [ "172.26.4.241", "172.26.4.242", "172.26.4.243" ]
}

variable "hosts-file" {
  type = "string"
  default = <<EOF
172.26.4.241 node-1.vsphere.local node-1
172.26.4.242 node-2.vsphere.local node-2
172.26.4.243 node-3.vsphere.local node-3
EOF
}

provider "vsphere" {
  user           = "${var.vsphere-config["user"]}"
  password       = "${var.vsphere-config["password"]}"
  vsphere_server = "${var.vsphere-config["server"]}"

  # if you have a self-signed cert
  allow_unverified_ssl = true
}

data "vsphere_datacenter" "datacenter" {
  name = "${var.vsphere-config["datacenter"]}"
}

resource "vsphere_folder" "kubernetes" {
  path       = "kubernetes"
  datacenter = "${data.vsphere_datacenter.datacenter.name}"
}

resource "vsphere_virtual_machine" "nodes" {
  count      = "${var.nodes["count"]}"
  name       = "${var.nodes["name-prefix"]}${count.index + 1}"
  folder     = "${vsphere_folder.kubernetes.path}"
  vcpu       = 2
  memory     = 8192
  datacenter = "${data.vsphere_datacenter.datacenter.name}"
  cluster    = "${var.vsphere-config["cluster"]}"

  network_interface {
    label              = "${var.vm-config["network"]}"
    ipv4_address       = "${var.node-ips[count.index]}"
    ipv4_prefix_length = "24"
    ipv4_gateway       = "${var.vm-config["gateway"]}"
  }

  disk {
    datastore = "${var.vsphere-config["datastore"]}"
    template  = "${vsphere_folder.kubernetes.path}/${var.vm-config["template"]}"
    bootable  = "true"
    type      = "thin"
  }

  connection {
    type     = "ssh"
    user     = "${var.vm-config["user"]}"
    password = "${var.vm-config["password"]}"
  }

  provisioner "file" {
    source      = "${path.module}/../../resources/photon_rsa.pub"
    destination = "/root/.ssh/authorized_keys"
  }

  provisioner "remote-exec" {
    inline = [
      "tdnf install -y kubernetes",
      ...
    ]
  }

  provisioner "file" {
    content     = "${var.hosts-file}"
    destination = "/etc/hosts"
  }

  ...

}

These resource and variable declarations can either be defined in a single file or in multiple ones. Terraform will create an execution plan from all the resource declarations in the working directory.

To provision the declared machines, Terraform must first be initialized in the working directory (only needed once, to download the necessary modules). Then, the execution plan can be displayed, applied and destroyed again:

Terraform Usage

$ terraform init
$ terraform plan
$ terraform apply
$ terraform destroy

A video demonstration of the usage of Terraform to provision a set of machines to the Google Compute Engine can be found here: https://youtu.be/rTvDZrMcjcQ

Kubeadm

The kubeadm tool is implemented as a single binary. Therefore, the first step is to install the tool. Fortunately, the Photon OS software repository already contains a version of kubeadm and the standard software installation process can be used.
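Since the repository is the same one used by the Terraform provisioner shown earlier to install the kubernetes package, kubeadm can be installed the same way; the package name in the sketch below is an assumption and should be verified against the repository first:

# verify which Kubernetes-related packages the repository actually provides
tdnf list | grep kube

# install Kubernetes and kubeadm (the package name "kubeadm" is an assumption)
tdnf install -y kubernetes kubeadm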

Once installed, the tool is executed on the master to initialize the Kubernetes control plane. After it has executed its pre-flight checks, created certificates and so on, kubeadm actually uses the kubelet to start the Master Components in Docker containers [IL-KA]. When the control plane is set up, kubeadm generates a token for the worker nodes to use to join the cluster. This token can also be pre-generated and passed to the kubeadm master initialization (which we will do to achieve full automation).
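One way to sketch that automation: depending on the kubeadm version, a suitable token can be generated up front with kubeadm itself and written into the configuration file that is later passed via --config. The API version and field name below are assumptions for the kubeadm release in use, not verified configuration:

# pre-generate a join token on the provisioning host
$ kubeadm token generate
d97591.135ba38594a02df1

# kubeadm configuration file passed to "kubeadm init --config" (sketch)
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
token: d97591.135ba38594a02df1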

Initializing master with kubeadm

# kubeadm init --config $KUBEADM_CONFIG
generated token: "d97591.135ba38594a02df1"
created keys and certificates in "/etc/kubernetes/pki"
created "/etc/kubernetes/kubelet.conf"
created "/etc/kubernetes/admin.conf"
created API client configuration
created API client, waiting for the control plane to become ready
all control plane components are healthy after 21.451883 seconds
waiting for at least one node to register and become ready
first node is ready after 0.503915 seconds
created essential addon: kube-discovery, waiting for it to become ready
kube-discovery is ready after 17.003674 seconds
created essential addon: kube-proxy
created essential addon: kube-dns

Kubernetes master initialised successfully!

You can now join any number of machines by running the following on each node:

kubeadm join --token d97591.135ba38594a02df1 10.240.0.2

After the control plane was set up on the master node, the worker nodes can be provisioned and joined to the cluster. To ensure that Terraform only provisions the workers once the master is ready, a dependency between the master and worker node resources should be created in the Terraform declaration. It can either be an implicit dependency (e.g. if a dynamically created master IP address is used by the worker node resource) or an explicit one (depends_on = ["vsphere_virtual_machine.master-node"]). The only thing necessary to join the nodes, after kubeadm was installed on the workers, is to execute the join command with the token and the master's address:

Joining workers with kubeadm

# kubeadm join --token=$KUBEADM_TOKEN $MASTER_IP:$KUBEAPISERVER_PORT
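The explicit Terraform dependency mentioned above could be declared roughly as follows; the resource names master-node and worker-nodes are assumptions and not necessarily the exact names used in our declarations:

resource "vsphere_virtual_machine" "worker-nodes" {
  count = "${var.nodes["count"]}"

  # ... machine, disk and provisioner configuration as in the earlier declaration ...

  # Workers are only provisioned (and joined via their remote-exec provisioner)
  # once the master resource, including its kubeadm init step, has completed.
  depends_on = ["vsphere_virtual_machine.master-node"]
}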

Once all nodes are connected to the cluster, the networking can be set up by applying the CNI plugin (Calico in this case) on the master:

Networking Setup

# kubectl apply -f http://docs.projectcalico.org/v2.4/getting-started/kubernetes/installation/hosted/kubeadm/1.6/calico.yaml

During the course of this project, two kubeadm based setups were implemented, one using vSphere and the other using the Google Compute Engine (GCE). The GCE setup was adapted from the vSphere implementation to be used for the paper's presentation, since the vSphere environment was in maintenance at the time and accessibility could not be guaranteed during the presentation. Only minor adaptations had to be made to the vSphere setup: creating a new template from the official one was not necessary since the SSH public keys could be passed to the machines as GCE user data, and since no static IP addresses could be used, the master's address had to be passed to the workers by Terraform once the master setup was finished.

The configuration files, setup scripts and Terraform resource declarations for both the vSphere and the GCE implementations can be found in the appendix. A video of the demonstration on GCE can be found here: https://youtu.be/rTvDZrMcjcQ

Generally, setting up clusters with kubeadm works pretty well and is relatively simple. However, the disadvantage is the difficulty of customizing the setup outside of the options provided by the kubeadm configuration file. Since the components installed by kubeadm are running in Docker containers, one cannot simply adjust the configuration files and restart the corresponding services. There are manifest files for the installed components at /etc/kubernetes/manifests/ and it should theoretically be possible to just adjust those to customize the installation, but in our tests this did not work reliably. When trying to set up Calico for networking, we noticed that the Calico pods try to access etcd directly on the master. But since etcd is set up to only listen on the loopback interface, Calico cannot connect to it. We tried to use the manifest files to let etcd listen on the public interface, but this did not work properly. If the kubeadm installation should be investigated further, one would probably need to select another CNI plugin or see whether newer kubeadm or Calico versions solve the issue. Furthermore, kubeadm is currently still in beta and not recommended for production installations.

Manual Kubernetes Setup

For the manual setup of Kubernetes, the basic idea was to follow the instructions given in the Photon OS documentation for running Kubernetes on Photon OS [P-KP]. Unfortunately, these instructions turned out to be missing some information and to be a bit lacking overall. This led to a lot of trial and error, but it was also an opportunity to learn more about the inner workings of Kubernetes.

The first step would be to provision the Photon OS machines from the VM template and to install Kubernetes through Photon's packaging system. Then the IP addresses for the other members of the cluster would be distributed by writing them into the /etc/hosts file. After that the common configuration files (which mostly contain the master node's address) would be copied to the target machines.

On the master node, the configuration file for the kube-apiserver would be overwritten with a prepared one. A step that was omitted in the documentation, but turned out to be necessary for the Kubernetes Dashboard to work, was the creation of cryptographic certificates and keys for the cluster itself and for the creation of service accounts. It was also necessary to add firewall rules for all the necessary ports to iptables. To use Calico for networking, it was also necessary to adjust the etcd configuration file to make etcd listen on the public network interface instead of the loopback. After that, the Master Components would be restarted through systemd, the (uninitialized) worker nodes would be reported to the master and the Calico CNI plugin would be added.
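As a rough sketch of the firewall and restart steps on the master (the exact port list and systemd unit names depend on the Photon OS Kubernetes package and on our configuration, so treat them as assumptions):

# open the kube-apiserver port (further ports, e.g. for etcd, are added the same way)
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT

# restart the Master Components through systemd
systemctl restart etcd kube-apiserver kube-controller-manager kube-scheduler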

On each of the workers, the kubelet configuration would be overwritten to set up the master node's IP address, the iptables rules would be added and Node Components would be restarted.

The configuration files, setup scripts and Terraform resource declarations for the manual setup can again be found in the appendix.

The manual setup of Kubernetes did result in a working cluster, but would probably not be ideal for long-term use. Maintaining the setup instructions will probably be rather cumbersome and there are many small problems with the setup that would need to be ironed out over time (e.g. the Kubernetes Dashboard currently won't start because of the self-signed certificates).

Canonical Kubernetes

During a short downtime period of the vSphere environment because of maintenance, The Canonical Distribution of Kubernetes and Juju were briefly tested to set up a cluster on the Google Compute Engine as well.

After being provided with the necessary credentials, the Juju tool will set up a controller machine on the target environment, which will then coordinate the provisioning of the desired application bundles. Starting the deployment of a bundle then only requires an execution of the deploy command.
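On the Google Compute Engine, this boiled down to very few commands; the sketch below assumes the GCE credentials have already been added via juju add-credential and uses the standard canonical-kubernetes bundle:

$ juju bootstrap google            # creates the Juju controller machine on GCE
$ juju deploy canonical-kubernetes # deploys the default production bundle
$ juju status                      # watch the machines and units come up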

A full example of the setup process can be found in the appendix.

With the default bundle, Juju will set up a production-ready cluster with nine machines (one for the public key infrastructure, three for etcd, one for the master, one for an nginx load balancer and three worker nodes) and use fairly expensive machine types. But after roughly 15 minutes, one should have a working Kubernetes cluster. Smaller bundles and different setups (e.g. using Calico for networking instead of Flannel) are also available [K-KU].

Figure 4. Screenshot of Juju's Kubernetes Setup

While the default setup of Canonical's Kubernetes distribution does use quite a few machines, it would seem to be a very effective way to set up a sophisticated cluster quite easily and would also support updating and scaling a running cluster. One question that would require further investigation is how to best add persistent storage capacity to a cluster. The recommended way of creating a separate Ceph cluster in Juju, adding the Ceph nodes to Juju's storage and creating a Persistent Volume in Kubernetes from there seems a bit lacking [K-US]. For one, it would require a lot of additional machines whose only purpose would be to provide storage. And since Kubernetes does support the use of vSphere storage through the vSphere Cloud Provider, there should be a less cumbersome way to add persistent storage than that. Therefore, the question would be how easily Canonical's distribution could be customized to incorporate the vSphere Cloud Provider. Canonical would of course be happy to sell support with setting up custom storage through one of their consulting packages, though [C-EK].

Unfortunately, we were not yet able to test The Canonical Distribution of Kubernetes on vSphere since Juju requires an IP address pool on vSphere to work, which was not available on the vSphere environment used for the tests at that time. But it is planned to try out Canonical's solution once this can be arranged.

Next Steps

During the course of this project, a few viable approaches to the automatic setup of a Kubernetes environment could be found. But there is still the need for some additional work:

• Some effort should be made to try to get the Calico networking to work with the kubeadm setup. kubeadm would be a very promising approach once it matures further.

• The manual setup should be further refined (i.e. improvement of the certificate generation)

• The Canonical Distribution of Kubernetes should be tested on vSphere once the test environment is given an IP pool.

Service Deployment

Since we are not yet moving any infrastructure components like databases or message queues into the new Kubernetes based environment, the only thing we currently need to adapt to deploy our existing microservices to Kubernetes is the configuration of the location of these infrastructure components. Fortunately, the vast majority of our microservices are implemented using Spring Boot, and such configuration values can therefore be passed to the services as system environment variables. Consequently, there is no need for adjustments to the Docker images we already build for our microservices and we can just pass the necessary values as environment variables to the pods.

Each microservice will require specifications for the pod deployment and its service mapping in Kubernetes. These specifications can be combined in a single YAML file (example for the Product Catalogue Service).
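To give an impression of such a combined file, the following is a trimmed-down sketch of a deployment and service specification; the image name, port and environment variable are hypothetical placeholders and not the actual Product Catalogue specification referenced above:

Combined Deployment and Service specification (sketch)

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: product-catalogue
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: product-catalogue
    spec:
      containers:
        - name: product-catalogue
          image: registry.example.com/product-catalogue:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
          env:
            - name: SPRING_DATA_MONGODB_URI                     # hypothetical infrastructure location
              value: "mongodb://mongodb.example.com:27017/catalogue"
---
apiVersion: v1
kind: Service
metadata:
  name: product-catalogue
spec:
  selector:
    app: product-catalogue
  ports:
    - port: 80
      targetPort: 8080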

The deployment can then be executed using the Kubernetes command line client:

Deployment of Microservice

$ kubectl create -f service-spec.yml

Next Steps

Once the Kubernetes environment is stable, there are a few improvements that should be made to the microservice deployment and integration:

• The services should be better integrated into Kubernetes using Spring Cloud Kubernetes

• Configuration should be distributed through Kubernetes ConfigMaps

• Client Side Load Balancing should be implemented for microservice to microservice communication

• Infrastructure components should be deployed to Kubernetes as well

• Databases: MongoDB & PostgreSQL

• Message Queues: Apache Kafka

Client Side Load Balancing

The idea of Client Side Load Balancing (CSLB) is to have all instances of a microservice register themselves at a central service registry. Clients would then query the registry for the currently active instances and decide themselves which instance to address [NS-CSLB]. The resulting process would look like this:

I. Every instance of every microservice registers itself at the service registry upon startup

II. Client services retrieve the list of available instances for the services they use (either periodically or event-based)

III. The client service receives a request

IV. It selects an instance of its backing service from the instance list it maintains and directs its query to that instance

V. The client service receives the response from the backing service instance, processes it and responds to the original request

Since Kubernetes already maintains a list of all the running pod instances in its etcd backing store, we would only need to retrieve the pod list for the needed service from Kubernetes and to select an instance in order to implement CSLB for one of our services (e.g. the API Gateway).

Figure 5. Client Side Load Balancing
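The instance information needed for this is readily available through the Kubernetes API; for illustration, the addresses of the pods backing a service can be listed on the command line (the service name is a hypothetical placeholder):

# list the addresses of the currently running pods backing a service
$ kubectl get endpoints product-catalogue -o yaml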

Fortunately, Red Hat’s Fabric8 project already implements a Kubernetes client and a Kubernetes integration for the popular Ribbon CSLB library. Fabric8's implementation currently only matches instances by namespace and application name, but for an initial implementation this does suffice.

Next Steps

To enable a more fine-grained filtering of the instance list, the CSLB implementation should also consider each pod’s labels (tags). That way, the CSLB can filter instances by metadata like versions, team affiliation or region. Therefore, it will probably be necessary at some point to extend the existing capabilities of Fabric8's implementation to also consider labels.
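Expressed with the command line client, such a filter corresponds to a label selector along these lines (the label names and values are hypothetical placeholders):

$ kubectl get pods -l app=product-catalogue,version=1.2.0,region=zrh -o wide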

Conclusion

With this project, we believe that we have proven the viability of automatically setting up a container orchestration environment as a basis for our new development environment. A good solution for the orchestration was found with Kubernetes, and the necessary tools and solutions for the automation of the provisioning and setup of the environment could be tested as well. There are still various improvements to be made and some issues to investigate, but there should be no blocking impediments to the further development of our new development environment on the basis of this project.

Kubernetes was the clear winner of the container orchestration evaluation and the recent momentum around the project indicates that this was the right decision. Nearly all major public cloud providers now support the development of the project and more and more orchestration products based on Kubernetes are being announced by major players in the cloud market.

HashiCorp's Terraform proved to be a decent tool to automatically provision and set up virtual machines. With it, the Infrastructure As Code principle could successfully be applied and no manual interaction is currently needed for the environment setup. It is also portable with minimal adaptation, which could be shown by implementing nearly identical deployments on-premise and in the public cloud. With Canonical's Juju there is also another strong tool at our disposal.

For the setup of Kubernetes, there are various approaches that have proven to be viable, but further investigation and refinement is needed before a final decision can be made on one of them. Our existing microservices can be deployed without significant adaptations, and once there is a stable Kubernetes installation, more of them can be deployed into the new environment.

The project also allowed us to acquire a lot of know-how about containers, their orchestration and Kubernetes specifically. Together with what we learned about the operational challenges, this knowledge should continue to help us in building a great new development environment.

Next Steps

To build on the foundations laid out in this project, we will need to invest some additional efforts as well. For one, we will need to further refine our provisioning process and to settle on a final approach. This will also mean testing Canonical's Kubernetes distribution on-premise. To better integrate our own microservices we will also have to gradually adapt them once the environment is stable. It will also be necessary to have the Continuous Integration process deploy new application builds directly to Kubernetes. Once the new development environment is complete, we should also complete the integration of Client Side Load Balancing into both the API Gateway and the microservices to have more flexibility in our service routing and to enable more advanced features.

Glossary

TVaaS Netstream's TV as a Service platform. Provides a hosted streaming solution.

Customer Netstream’s Customers are mostly other companies that want to provide a digital TV solution. The term is therefore only used to refer to those customers in this document and never for end users.

User Users are usually customers of Netstream’s customers. They are the end users using the client applications for the TVaaS platform.

API Application Programming Interface. A public or restricted interface through which the functionality of an application can be used by other applications.

REST Representational State Transfer. A type of HTTP based web service. Used by the IB.

VM Virtual Machine. An operating system running on a hardware virtualization platform, acting like a physical machine for all intents and purposes.

IaaS Infrastructure as a Service. A form of cloud computing that provides virtualized computing resources publicly over the Internet or internally in an organization.

CaaS Container as a Service. A hosted solution that allows the deployment and orchestration of container based applications.

CNCF Cloud Native Computing Foundation. An organization founded to develop and promote tools that support container based cloud applications. Hosted by the Linux Foundation.

OVA Open Virtual Appliance. A packaging standard for portable virtualization appliances and applications that can be used in VMware's vSphere for virtual machine templates.

References

Articles ▪ [MF-MS] James Lewis, Martin Fowler. 'Microservices' (https://martinfowler.com/articles/microservices.html), Martin Fowler. 25.03.2014.

▪ [DZ-EDM] Carol McDonald. 'Event Driven Microservices Patterns' (https://dzone.com/articles/event-driven-microservices-patterns), DZone. 04.08.2017.

▪ [CJ-DH] Christopher Tozzi. 'Docker at 4: Milestones in Docker History' (https://containerjournal.com/2017/03/23/docker-4-milestones-docker-history/), Container Journal. 23.03.2017.

▪ [SM-DS] Sreenivas Makam. 'Comparing Swarm, Swarmkit and Swarm Mode' (https://sreeninet.wordpress.com/2016/07/14/comparing-swarm-swarmkit-and-swarm-mode/), Sreenivas Makam’s Blog. 14.06.2016.

▪ [DG-DSA] Dimitris-Ilias Gkanatsios. 'Deploying a Docker Swarm Mode cluster on Azure Container Service' (https://dgkanatsios.com/2017/07/10/deploying-a-docker-swarm-mode-cluster-on-azure-container-service/), Dimitris-Ilias Gkanatsios' Blog. 10.07.2017.

▪ [AQ-BOK] Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer, John Wilkes. 'Borg, Omega, and Kubernetes' (http://queue.acm.org/detail.cfm?id=2898444), acmqueue. 02.03.2016.

▪ [CNCF-KP] Dan Kohn. 'Measuring the Popularity of Kubernetes Using BigQuery' (https://www.cncf.io/blog/2017/02/27/measuring-popularity-kubernetes-using-bigquery/), Cloud Native Computing Foundation. 27.02.2017.

▪ [ZDN-AK] Steven J. Vaughan-Nichols. 'Amazon jumps on Kubernetes bandwagon' (http://www.zdnet.com/article/amazon-jumps-on-kubernetes-bandwagon/), ZDNet. 09.08.2017.

▪ [GS-K] Puja Abbassi. 'Why Kubernetes or How Giant Swarm Builds Infrastructure' (https://blog.giantswarm.io/why-kubernetes-or-how-giant-swarm-builds-infrastructure/), Giant Swarm. 11.01.2017.

▪ [LM-CNI] Thomas Frike. 'Redebedarf - Container richtig vernetzen', Linux Magazin. August 2017.

▪ [LM-KP] Michael Kraus. 'Passgenau - Containercluster mit Prometheus überwachen', Linux Magazin. August 2017.

▪ [LM-KAI] Jonas Schneider. 'Lernfabrik - Kubernetes bei Open AI', Linux Magazin. August 2017.

▪ [IX-R] Erkan Yanar. 'Im Steigflug - Docker-Alternative Rocket', iX Magazin. Juli 2017.

▪ [IX-K1] Martin Gerhard Loschwitz. 'Schlau steuern - Einführung in Kubernetes, Teil 1: Logik und Terminologie', iX Magazin. Juli 2017.

▪ [IX-K2] Martin Gerhard Loschwitz. 'Bauanleitung - Einführung in Kubernetes, Teil 2: Kubernetes installieren', iX Magazin. August 2017.

▪ [MF-SA] Mike Roberts. 'Serverless Architectures' (https://martinfowler.com/articles/serverless.html), Martin Fowler. 04.08.2016.

▪ [WB-K] Vipin Chamakkala. 'Why Kubernetes is Foundational for Fortune 500 Digital Transformation (and the Cloud Infrastructure Management Landscape)' (http://www.work-bench.com/blog/2016/07/12/why-kubernetes-is-foundational-for-fortune-500-digital-transformation), Work-Bench. 21.07.2016.

▪ [CNCF-S] CNCF. 'Survey Shows Kubernetes Leading as Orchestration Platform' (https://www.cncf.io/blog/2017/06/28/survey-shows-kubernetes-leading-orchestration-platform/), Cloud Native Computing Foundation. 28.06.2017.

▪ [GCP-KP] Luke Stone. 'Bringing Pokémon GO to life on Google Cloud' (https://cloudplatform.googleblog.com/2016/09/bringing-Pokemon-GO-to-life-on-Google-Cloud.html), Google Cloud Platform Blog. 29.09.2016.

▪ [K-SBP] Amir Jerbi, Michael Cherny. 'Security Best Practices for Kubernetes Deployment' (http://blog.kubernetes.io/2016/08/security-best-practices-kubernetes-deployment.html), Kubernetes Project. 31.08.2016.

▪ [K-RN] Aparna Sinha, Ihor Dvoretskyi. 'Kubernetes 1.7: Security Hardening, Stateful Application Updates and Extensibility' (http://blog.kubernetes.io/2017/06/kubernetes-1.7-security-hardening-stateful-application-extensibility-updates.html), Kubernetes Project. 29.06.2017.

▪ [AM-DC] Michael Park. 'Mesos Developer Community Status Report' (http://mesos.apache.org/blog/dev-community-status/), Apache Foundation. 21.07.2016.

▪ [DO-M] Mitchell Anicas. 'An Introduction to Mesosphere' (https://www.digitalocean.com/community/tutorials/an-introduction-to-mesosphere), DigitalOcean. 24.09.2014.

▪ [NP-MK] Timothy Prickett Morgan. 'Mesos Borgs Google’s Kubernetes Right Back' (https://www.nextplatform.com/2017/09/07/mesos-borgs-googles-kubernetes-right-back/), The Next Platform. 07.09.2017.

▪ [NS-M] Justin Clayton. 'Mesos will Support Multiple Container Formats with the Unified Containerizer' (https://thenewstack.io/mesos-simplifies-support-container-formats-unified-containerizer/), The New Stack. 14.03.2016.

▪ [M-H] 'Mesosphere opens 2nd office in Hamburg, Germany' (https://mesosphere.com/blog/mesosphere-opens-hamburg-germany-office/), Mesosphere, Inc. 09.05.2014.

▪ [IX-N] Sirk Johannsen. 'Immer in Bewegung - Container- und Microservice-Verwaltung mit Nomad', iX Magazin. Juli 2017.

▪ [H-A] Mitchell Hashimoto. 'Atlas General Availability' (https://www.hashicorp.com/blog/atlas-general-availability/), HashiCorp. 07.07.2015.

▪ [H-VC] Justin Campbell. 'Vagrant Cloud Migration Announcement' (https://www.hashicorp.com/blog/vagrant-cloud-migration-announcement/), HashiCorp. 31.05.2017.

▪ [H-DO] Mitchell Hashimoto. 'Decommissioning Otto' (https://www.hashicorp.com/blog/decommissioning-otto/), HashiCorp. 19.08.2016.

▪ [TC-AWS] Frederic Lardinois. 'AWS joins the Cloud Native Computing Foundation' (https://techcrunch.com/2017/08/09/aws-joins-the-cloud-native-computing-foundation/), TechCrunch. 09.08.2017.

▪ [IQ-O] Daniel Bryant. 'Oracle Joins CNCF, and Releases Kubernetes on Oracle Linux and Terraform Kubernetes Cloud Installer' (https://www.infoq.com/news/2017/09/oracle-joins-cncf), InfoQ. 13.09.2017.

▪ [NS-PKE] Scott M. Fulton III. 'Pivotal Container Service Hardwires Cloud Foundry, Kubo to Google Cloud' (https://thenewstack.io/pivotal-container-service-hard-wires-cloud-foundry-kubo-google-cloud/), The New Stack. 30.08.2017.

▪ [NS-MK] Susan Hall. 'Mesosphere Returns to Kubernetes with a Beta for DC/OS' (https://thenewstack.io/mesosphere-returns-kubernetes-beta-dcos/), The New Stack. 07.09.2017.

▪ [CS-COS] Jonas Rosland. 'Container OS Comparison' (https://blog.codeship.com/container-os-comparison/), Codeship. 16.06.2017.

▪ [NS-COS] Susan Hall. 'Rise of the Container-Focused Operating Systems' (https://thenewstack.io/docker-fuels-rethinking-operating-system/), The New Stack. 27.01.2016.

▪ [PA-CI] Matthew Micene. 'Project Atomic - Getting started with cloud-init' (https://www.projectatomic.io/blog/2014/10/getting-started-with-cloud-init/), Project Atomic. 21.10.2014.

▪ [RH-CAI] Brent Baude. 'Red Hat Developer Blog - Creating custom Atomic trees, images, and installers – Part 1' (https://developers.redhat.com/blog/2015/01/08/creating-custom-atomic-trees-images-and-installers-part-1/), Red Hat Inc. 08.01.2015.

▪ [IL-KA] Ian Lewis. 'How kubeadm Initializes Your Kubernetes Master' (https://www.ianlewis.org/en/how-kubeadm-initializes-your-kubernetes-master), Ian Lewis’s Blog. 12.10.2016.

▪ [NS-CSLB] Richard Li. 'Baker Street: Avoiding Bottlenecks with a Client-Side Load Balancer for Microservices' (https://thenewstack.io/baker-street-avoiding-bottlenecks-with-a-client-side-load-balancer-for-microservices/), The New Stack. 02.10.2015.

Documents

▪ [CHQ-S] 'Container Market Adoption Survey 2016' (https://clusterhq.com/assets/pdfs/state-of-container-usage-june-2016.pdf), ClusterHQ. 16.06.2016.

▪ [OS-US] 'OpenStack User Survey' (https://www.openstack.org/assets/survey/April2017SurveyReport.pdf), OpenStack. April 2017.

▪ [D-SMF] 'Swarm mode overview - Feature highlights' (https://docs.docker.com/engine/swarm/#feature-highlights), Docker Inc. August 2017.

▪ [D-C] 'Swarms in Docker Cloud (Beta)' (https://docs.docker.com/docker-cloud/cloud-swarm/), Docker Inc. August 2017.

▪ [ACS-DS] 'Deploy Docker Swarm cluster' (https://docs.microsoft.com/en-us/azure/container-service/dcos-swarm/container-service-swarm-walkthrough), Microsoft. August 2017.

▪ [D-SD] 'Docker Swarm (standalone) discovery' (https://docs.docker.com/swarm/discovery/), Docker Inc. August 2017.

▪ [D-NS] 'Docker swarm mode overlay network security model' (https://docs.docker.com/engine/userguide/networking/overlay-security-model/), Docker Inc. August 2017.

▪ [D-PKI] 'Manage swarm security with public key infrastructure (PKI)' (https://docs.docker.com/engine/swarm/how-swarm-mode-works/pki/), Docker Inc. August 2017.

▪ [WP-K] 'Wikipedia: Kubernetes' (https://en.wikipedia.org/wiki/Kubernetes), Wikipedia. August 2017.

▪ [K-HS] 'Kubernetes - Picking the Right Solution: Hosted Solutions' (https://kubernetes.io/docs/setup/pick-right-solution/#hosted-solutions), Kubernetes Project. August 2017.

▪ [K-CS] 'Kubernetes User Case Studies' (https://kubernetes.io/case-studies/), Kubernetes Project. August 2017.

▪ [K-F] 'Kubernetes - Why do I need Kubernetes and what can it do?' (https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/#why-do-i-need-kubernetes-and-what-can-it-do), Kubernetes Project. August 2017.

▪ [K-C] 'Kubernetes Components' (https://kubernetes.io/docs/concepts/overview/components/), Kubernetes Project. August 2017.

▪ [K-R] 'Running Kubernetes with rkt' (https://kubernetes.io/docs/getting-started-guides/rkt/), Kubernetes Project. August 2017.

▪ [GH-SCK] 'Spring Cloud Kubernetes' (https://github.com/fabric8io/spring-cloud-kubernetes), GitHub. August 2017.

▪ [K-CRI] 'Introducing Container Runtime Interface (CRI) in Kubernetes' (http://blog.kubernetes.io/2016/12/container-runtime-interface-cri-in-kubernetes.html), Kubernetes Project. August 2017.

▪ [K-CNI] 'Kubernetes - Network Plugins' (https://kubernetes.io/docs/concepts/cluster-administration/network-plugins/), Kubernetes Project. August 2017.

▪ [GH-CNI] 'CNI - the Container Network Interface' (https://github.com/containernetworking/cni), GitHub. August 2017.

▪ [CNCF-M] 'Cloud Native Computing Foundation - Members' (https://www.cncf.io/about/members/), Cloud Native Computing Foundation. August 2017.

▪ [GH-AK] 'Awesome-Kubernetes - A curated list for awesome kubernetes sources' (https://github.com/ramitsurana/awesome-kubernetes), GitHub. August 2017.

▪ [K-PSP] 'Kubernetes - Pod Security Policies' (https://kubernetes.io/docs/concepts/policy/pod-security-policy/), Kubernetes Project. August 2017.

▪ [WP-M] 'Wikipedia: Apache Mesos' (https://en.wikipedia.org/wiki/Apache_Mesos), Wikipedia. August 2017.

▪ [AM-C] 'Apache Mesos - Committers' (http://mesos.apache.org/documentation/latest/committers/), Apache Foundation. August 2017.

▪ [M-US] 'Mesosphere - User Stories' (https://mesosphere.com/case-studies/), Mesosphere, Inc. August 2017.

▪ [GH-M] 'Marathon' (https://github.com/mesosphere/marathon), GitHub. August 2017.

▪ [ACS-M] 'Deploy a DC/OS cluster' (https://docs.microsoft.com/en-us/azure/container-service/dcos-swarm/container-service-dcos-quickstart), Microsoft. August 2017.

▪ [M-DLB] 'Marathon - Service Discovery & Load Balancing' (https://mesosphere.github.io/marathon/docs/service-discovery-load-balancing.html), Mesosphere, Inc. August 2017.

▪ [GH-SCM] 'Spring Cloud Marathon' (https://github.com/aatarasoff/spring-cloud-marathon), GitHub. August 2017.

▪ [M-A] 'Marathon - Authorization and Access Control' (https://mesosphere.github.io/marathon/docs/auth-access-ctrl.html), Mesosphere, Inc. August 2017.

▪ [GH-NCL] 'Nomad - Change Log' (https://github.com/hashicorp/nomad/blob/master/CHANGELOG.md), GitHub. August 2017.

▪ [HC-T] 'The Tao of HashiCorp' (https://www.hashicorp.com/tao-of-hashicorp/), HashiCorp. August 2017.

▪ [WP-HC] 'Wikipedia: HashiCorp' (https://en.wikipedia.org/wiki/HashiCorp), Wikipedia. August 2017.

▪ [N-TD] 'Nomad - Task Drivers' (https://www.nomadproject.io/docs/drivers/index.html), HashiCorp. August 2017.

▪ [C-DNS] 'Consul - DNS Interface' (https://www.consul.io/docs/agent/dns.html), HashiCorp. August 2017.

▪ [GH-CT] 'Consul Template' (https://github.com/hashicorp/consul-template), GitHub. August 2017.

▪ [H-TP] 'HashiCorp - Training Partners' (https://www.hashicorp.com/partners/#trainingpart), HashiCorp. August 2017.

▪ [SC-SCC] 'Spring Cloud Consul' (https://cloud.spring.io/spring-cloud-consul/), Pivotal Software. August 2017.

▪ [GH-N] 'Nomad' (https://github.com/hashicorp/nomad), GitHub. August 2017.

▪ [H-S] 'HashiCorp - Security' (https://www.hashicorp.com/security/), HashiCorp. August 2017.

▪ [N-S] 'Securing Nomad with TLS' (https://www.nomadproject.io/guides/securing-nomad.html), HashiCorp. August 2017.

▪ [GH-ALC] 'Awesome Linux Containers - Operating Systems' (https://github.com/Friz-zy/awesome-linux-containers#operating-systems), GitHub. August 2017.

▪ [PA-I] 'Introduction to Project Atomic' (http://www.projectatomic.io/docs/introduction/), Project Atomic. September 2017.

▪ [C-CDK] 'The Canonical Distribution of Kubernetes' (https://www.ubuntu.com/kubernetes), Canonical Ltd. September 2017.

▪ [WP-CL] 'Wikipedia: Container Linux by CoreOS' (https://en.wikipedia.org/wiki/Container_Linux_by_CoreOS), Wikipedia. September 2017.

▪ [VMW-POS] 'VMware Photon OS' (https://vmware.github.io/photon/), VMware, Inc. September 2017.

▪ [T-P] 'Terraform - Providers' (https://www.terraform.io/docs/providers/index.html), HashiCorp. September 2017.

▪ [K-PRS] 'Kubernetes - Picking the Right Solution' (https://kubernetes.io/docs/setup/pick-right-solution/), Kubernetes Project. September 2017.

▪ [K-VS] 'Kubernetes - VMware vSphere' (https://kubernetes.io/docs/getting-started-guides/vsphere/), Kubernetes Project. September 2017.

▪ [GH-KAV] 'Kubernetes Anywhere - Getting Started on vSphere' (https://github.com/kubernetes/kubernetes-anywhere/blob/master/phase1/vsphere/README.md), GitHub. September 2017.

▪ [K-KA] 'Kubernetes - Using kubeadm to Create a Cluster' (https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/), Kubernetes Project. September 2017.

▪ [OS-P] 'OpenShift Core Concepts - Pods and Services' (https://docs.openshift.com/enterprise/3.0/architecture/core_concepts/pods_and_services.html), Red Hat. September 2017.

▪ [K-P] 'Kubernetes - Pods' (https://kubernetes.io/docs/concepts/workloads/pods/pod/), Kubernetes Project. September 2017.

▪ [K-CN] 'Kubernetes - Cluster Networking' (https://kubernetes.io/docs/concepts/cluster-administration/networking/), Kubernetes Project. September 2017.

▪ [P-KP] 'Running Kubernetes on Photon OS' (https://github.com/vmware/photon/blob/master/docs/kubernetes.md), GitHub. September 2017.

▪ [K-KU] 'Kubernetes - Kubernetes on Ubuntu' (https://kubernetes.io/docs/getting-started-guides/ubuntu/), Kubernetes Project. September 2017.

▪ [K-US] 'Kubernetes - Ubuntu Storage' (https://kubernetes.io/docs/getting-started-guides/ubuntu/storage/), Kubernetes Project. September 2017.

▪ [C-EK] 'Canonical Enterprise Kubernetes Packages' (https://assets.ubuntu.com/v1/Enterprise-Canonical-Kubernetes.pdf), Canonical Ltd. September 2017.

Other Sources

▪ [SA-SK] 'Docker SwarmKit: Contribution by companies' (http://stackalytics.com/?metric=commits&project_type=docker-group&module=swarmkit), Stackalytics. August 2017.

▪ [DF-R] 'Answer to: Docker Swarm on Multi Regions' (https://forums.docker.com/t/docker-swarm-on-multi-regions/28661/2), Docker Inc. August 2017.

▪ [SO-ADS] 'Answer to: Is Swarm Mode going to be supported on Azure Container Service?' (https://stackoverflow.com/a/45613108/447218), Stack Overflow. August 2017.

▪ [SO-DS] 'Stack Overflow - Tags: docker-swarm' (https://stackoverflow.com/questions/tagged/docker-swarm), Stack Overflow. August 2017.

▪ [SO-SM] 'Stack Overflow - Tags: docker-swarm-mode' (https://stackoverflow.com/questions/tagged/docker-swarm-mode), Stack Overflow. August 2017.

▪ [A-DS] 'Amazon results for Books in Computers & Technology: "docker swarm"' (https://www.amazon.com/s/ref=sr_nr_n_0?fst=as%3Aoff&rh=n%3A283155%2Cn%3A5&keywords=docker+swarm), Amazon. August 2017.

▪ [DF-SD] 'Docker Community Forums: Swarm Mode and Service Discovery' (https://forums.docker.com/t/swarm-mode-and-service-discovery/16563/14), Docker Inc. August 2017.

▪ [SA-K] 'Kubernetes: Contribution by companies' (http://stackalytics.com/?project_type=kubernetes-group&release=all&metric=commits), Stackalytics. August 2017.

▪ [CNCF-PD] 'CNCF Projects Dashboard' (https://cncf.biterg.io), Cloud Native Computing Foundation. August 2017.

▪ [SO-K] 'Stack Overflow - Tags: kubernetes' (https://stackoverflow.com/questions/tagged/kubernetes), Stack Overflow. August 2017.

▪ [T-GO] Chris Gaun. 'Tweet about Gartner survey about orchestration solutions usage' (https://twitter.com/Chris_Gaun/status/768891210702266368), Twitter. August 2017.

▪ [A-K] 'Amazon results for Books in Computers & Technology: "kubernetes"' (https://www.amazon.com/s/ref=sr_nr_n_0?fst=as%3Aoff&rh=n%3A283155%2Cn%3A5&keywords=kubernetes), Amazon. August 2017.

▪ [K-S] 'Kubernetes Security and Disclosure Information' (https://kubernetes.io/security/), Kubernetes Project. August 2017.

▪ [M-M] 'Mesosphere Marathon' (https://mesosphere.github.io/marathon/), Mesosphere, Inc. August 2017.

▪ [SO-M] 'Stack Overflow - Tags: marathon' (https://stackoverflow.com/questions/tagged/marathon), Stack Overflow. August 2017.

▪ [A-M] 'Amazon results for Books in Computers & Technology: "mesos marathon"' (https://www.amazon.com/s/ref=sr_nr_n_0?fst=as%3Aoff&rh=n%3A283155%2Cn%3A5&keywords=mesos+marathon), Amazon. August 2017.

▪ [M-C] 'Apache Issues - MesosContainerizer support multiple image provisioners' (https://issues.apache.org/jira/browse/MESOS-2840), Apache Foundation. August 2017.

▪ [N-H] 'Nomad - Easily Deploy Applications at Any Scale' (https://www.nomadproject.io/), HashiCorp. August 2017.

▪ [C-H] 'Consul - Service Discovery and Configuration Made Easy' (https://www.consul.io/), HashiCorp. August 2017.

▪ [SO-N] 'Stack Overflow - Tags: nomad' (https://stackoverflow.com/questions/tagged/nomad), Stack Overflow. August 2017.

▪ [A-N] 'Amazon results for Books in Computers & Technology > Operating Systems: "nomad"' (https://www.amazon.com/s/ref=sr_nr_n_18?fst=as%3Aoff&rh=n%3A283155%2Cn%3A5%2Cn%3A3756&keywords=nomad), Amazon. August 2017.

Appendix A: Additional Documents

• Project Proposal (in German): PDF, Google Slides

• Evaluation - Decision Matrix: PDF, Google Spreadsheets

• Evaluation - Proposal Presentation: PDF, Google Slides

• Presentation of Paper (in German): PDF, Google Slides

• Demonstration Video from the Paper's Presentation: YouTube

Appendix B: Code

Atomic Host + Ansible Playbook

Ansible "inventory" File: inventory/inventory

[masters]
172.26.4.240

[etcd:children]
masters

[nodes]
172.26.4.24[1:3]

Ansible Configuration: inventory/group_vars/all.yml

# This value determines how kubernetes binaries, config files, and service
# files are loaded onto the target machines. The following are the only
# valid options:
#
# localBuild - requires make release to have been run to build local binaries
# packageManager - will install packages from your distribution using yum/dnf/apt
source_type: packageManager

# Will be used as the internal DNS domain name if DNS is enabled. Services
# will be discoverable under <name>.<namespace>.svc.<cluster_name>, e.g.
# myservice.default.svc.cluster.local
cluster_name: cluster.local

# Set if you want to access the kubernetes cluster via a load balancer. The installer
# assumes that a load balancer has been preconfigured or resolves to the
# kubernetes master.
master_cluster_hostname: k8s.dev.netstream.com

# External FQDN used for the cluster (certificate only)
#master_cluster_public_hostname: public-kubernetes

# Port number for the load balanced master hostname.
#master_cluster_port: 443

# Account name of remote user. Ansible will use this user account to ssh into
# the managed machines. The user must be able to use sudo without asking
# for a password unless ansible_sudo_pass is set.
ansible_ssh_user: root

# Password for the ansible_ssh_user. If this is unset you will need to set up
# ssh keys so a password is not needed.
ansible_ssh_pass: 1

# If a password is needed to sudo to root, that password must be set here.
ansible_sudo_pass: 1

# A list of insecure registries you might need to define
insecure_registrys:
  - "dockerhub.dev.netstream.com"
#  - "gcr.io"

# Required for CoreOS. CoreOS does not include a Python interpreter. The
# pre-ansible role installs a python interpreter to /opt/bin/. For more
# information see https://coreos.com/blog/managing-coreos-with-ansible.html
#ansible_python_interpreter: "PATH=/opt/bin:$PATH python"

# If you need a proxy for the docker daemon define these here
#http_proxy: "http://proxy.example.com:3128"
#https_proxy: "http://proxy.example.com:3128"
#no_proxy: "127.0.0.1,localhost,docker-registry.somecorporation.com"

# Kubernetes internal network for services.
# Kubernetes services will get fake IP addresses from this range.
# This range must not conflict with anything in your infrastructure. These
# addresses do not need to be routable and must just be an unused block of space.
kube_service_addresses: 10.254.0.0/16

# Network implementation (flannel|opencontrail|contiv)
networking: flannel

# External network
# opencontrail_public_subnet: 10.1.4.0/24

# Underlay network
# opencontrail_private_subnet: 192.168.1.0/24

# Data interface
# opencontrail_interface: eth1

# Flannel internal network (optional). When flannel is used, it will assign IP
# addresses from this range to individual pods.
# This network must be unused in your network infrastructure!
flannel_subnet: 172.16.0.0

# Flannel internal network total size (optional). This is the prefix of the
# entire flannel overlay network. So the entirety of 172.16.0.0/12 must be
# unused in your environment.
flannel_prefix: 12

# Flannel internal network (optional). This is the size allocation that flannel
# will give to each node on your network. With these defaults you should have
# room for 4096 nodes with 254 pods per node.
flannel_host_prefix: 24

# Create a default Contiv network for providing connectivity among pods
# networking: contiv must be set to use Contiv networking
#contiv_default_network: true
#contiv_default_subnet: 172.16.0.0/16
#contiv_default_gw: 172.16.0.1

# Set to false to disable logging with elasticsearch
cluster_logging: true

# Turn to false to disable cluster monitoring with heapster and influxdb
cluster_monitoring: true

kube_version: 1.7.4

# Turn to false to disable the kube-ui addon for this cluster
kube_ui: true

# Turn to false to disable the kube-dash addon for this cluster
kube_dash: true

# Turn to false to disable the node_problem_detector addon for this cluster
node_problem_detector: false

# Turn this variable to 'false' to disable the whole DNS configuration.
dns_setup: true
# How many replicas in the Replication Controller
dns_replicas: 1

# The certificate authority private key should not be kept on the server,
# but you probably want to keep it to generate user certificates. Set
# this value to "true" to keep the ca.key file in {{ kube_cert_dir }}.
# It is recommended to remove the private key from the server: if you set
# kube_cert_keep_ca to true, please copy the ca.key file somewhere secure
# and remove it from the server.
kube_cert_keep_ca: false

# There are other variables in roles/kubernetes/defaults/main.yml but changing
# them comes with a much higher risk to your cluster. So proceed over there
# with caution.

# See the kube documentation for apiserver runtime config options. The example
# below enables the HPA and deployments features.
#kube_apiserver_additional_options:
#  - --runtime-config=extensions/v1beta1/deployments=true

# To enable etcd auto cert generation set the following *_scheme vars to "https"
etcd_url_scheme: "https"
etcd_peer_url_scheme: "https"
# For etcd client and peer cert authentication set these to true
etcd_client_cert_auth: true
etcd_peer_client_cert_auth: true

etcd_client_port: '2379'
# When the scheme vars above are set to "https" you need to set these to true!
flannel_etcd_use_certs: true
apiserver_etcd_use_certs: true
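With the inventory and variables above in place, the cluster is provisioned with a standard Ansible invocation. The playbook file name below is illustrative (it depends on the playbook checkout actually used); only the inventory path corresponds to the files listed here:

$ ansible-playbook -i inventory/inventory cluster.yml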

Project Atomic cloud-init Instance Meta-Data: meta-data

instance-id: atomic-host001
local-hostname: atomic01.example.org

Project Atomic cloud-init User Meta-Data: user-data

#cloud-config
user: photon
password: kubernetes99
ssh_pwauth: True
chpasswd: { expire: False }

ssh_authorized_keys:
  - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCyfQAErn1gJ6QMsoY/KfQdOIzaFogHpQvvNeutkzZpEmlEE9xz4mKDHViS9nQSnRiAsnPOU8mSwL1v3JuQyyNrOf447enPPfIp34Y1I73R0wvLSd+LOHvNfoAAMIFCxKkMycGqnoa+ek2cKNI8roIPYBql0JrGmRir7HoCTsVEE92WmNZA9EHrGs7kGGjIF0nRWsZ/yNqoz/PeHp4ErxWvKFmfXoU5Ue4WoKWOctUuULj+TJuC/EoYtJzq4HJGJclFZJXJSqglgaO7JjL/VMkpOGUxAfBY4mtIh5IO71a5hxsHj5HNK/0/fI42CKz6mGdV4BuBFSDJFau5wZXtBqO5 root

Project Atomic cloud-init ISO-Generation: generate_init_data_isos.sh

#!/bin/bash

NR_MASTERS=1
NR_WORKERS=3

if (( $1 >= 0 )) 2>/dev/null; then
    NR_MASTERS=$1
fi

if (( $2 >= 0 )) 2>/dev/null; then
    NR_WORKERS=$2
fi

WD="/tmp/atomic-isos"
echo "Preparing generation of ISO-files for $NR_MASTERS Kubernetes master and $NR_WORKERS worker nodes to: $WD"

USER_DATA_YAML="../resources/user_data.yml"

META_DATA="$WD/meta-data"
USER_DATA="$WD/user-data"

mkdir -p $WD
cp $USER_DATA_YAML $USER_DATA

for (( i=1; i<=$NR_MASTERS; i++ )); do
    MASTER_NAME="kubernetes-master-$i"
    echo "##### Master Node: $MASTER_NAME"
    echo -e "instance-id: $MASTER_NAME\nlocal-hostname: $MASTER_NAME.local" \
        > "$WD/meta-data"
    genisoimage -output "$WD/init-$MASTER_NAME.iso" \
        -volid cidata \
        -joliet \
        -rock \
        $META_DATA $USER_DATA
done

for (( i=1; i<=$NR_WORKERS; i++ )); do
    WORKER_NAME="kubernetes-worker-$i"
    echo "##### Worker Node: $WORKER_NAME"
    echo -e "instance-id: $WORKER_NAME\nlocal-hostname: $WORKER_NAME.local" \
        > "$WD/meta-data"
    genisoimage -output "$WD/init-$WORKER_NAME.iso" \
        -volid cidata \
        -joliet \
        -rock \
        $META_DATA $USER_DATA
done
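The script takes the number of master and worker nodes as optional positional arguments; generating the ISOs for one master and three workers (matching the inventory above) would look like this:

$ ./generate_init_data_isos.sh 1 3

Each resulting init-<node-name>.iso can then be attached to the corresponding virtual machine so that cloud-init picks up the instance metadata on first boot.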

Kubernetes Anywhere

Kubernetes Anywhere Configuration File

#
# Automatically generated file; DO NOT EDIT.
# Kubernetes Minimal Turnup Configuration
#

#
# Phase 1: Cluster Resource Provisioning
#
.phase1.num_nodes=4
.phase1.cluster_name="ns-kubernetes"
.phase1.ssh_user=""
.phase1.cloud_provider="vsphere"

#
# vSphere configuration
#
.phase1.vSphere.url="172.26.4.53"
.phase1.vSphere.port=443
.phase1.vSphere.username="[email protected]"
.phase1.vSphere.password="terraform99"
.phase1.vSphere.insecure=y
.phase1.vSphere.datacenter="pre-prod"
.phase1.vSphere.datastore="datastore1"
.phase1.vSphere.placement="cluster"
.phase1.vSphere.cluster="Cluster01"
.phase1.vSphere.useresourcepool="no"
.phase1.vSphere.vmfolderpath="kubernetes"
.phase1.vSphere.vcpu=2
.phase1.vSphere.memory=4096
.phase1.vSphere.network="net-obb-mgmt"
.phase1.vSphere.template="KubernetesAnywhereTemplatePhotonOS"
.phase1.vSphere.flannel_net="172.1.0.0/16"

#
# Phase 2: Node Bootstrapping
#
.phase2.kubernetes_version="v1.6.5"
.phase2.provider="ignition"
.phase2.installer_container="docker.io/cnastorage/k8s-ignition:v2"
.phase2.docker_registry="dockerhub.dev.netstream.ch"

#
# Phase 3: Deploying Addons
#
.phase3.run_addons=y
.phase3.kube_proxy=y
.phase3.dashboard=y
.phase3.heapster=y
.phase3.kube_dns=y
# .phase3.weave_net is not set
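Kubernetes Anywhere consumes this file through its make-based workflow. Assuming the upstream kubernetes-anywhere repository and its deployment container are used as documented there, a run looks roughly like this:

$ make config    # interactive prompts that produce the .config file shown above
$ make deploy    # provisions the vSphere VMs and bootstraps the cluster
$ make destroy   # tears the cluster down again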

Kubeadm on vSphere

Kubeadm Configuration Template: resources/kubeadm_master_config.yml.tpl

apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
token: ${kubeadm-token}
api:
  advertiseAddress: ${master-ip}
  bindPort: ${kube-apiserver-port}
networking:
  podSubnet: 192.168.0.0/16

Kubeadm Master Setup Script: resources/kubeadm_master_setup.sh.tpl

#!/bin/bash

# Set up Kubernetes master node

KUBE_SETUP_DIR="${kube-setup-dir}"

# Expose etcd through the public interface instead of loopback
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
sed --in-place=.bak "s|http://127.0.0.1:2379|$ETCD_LISTEN_CLIENT_URLS|g" /etc/kubernetes/apiserver
echo "ETCD_LISTEN_CLIENT_URLS=$ETCD_LISTEN_CLIENT_URLS" >> $HOME/.bash_rc
source $HOME/.bash_rc

# Enable access to ports 8080 (api-server) & 2379 (etcd)
echo -e "\n#Custom: Enable Kubernetes connections" >> /etc/systemd/scripts/iptables
for PORT in 8080 6443 10250 10251 10252 10255 2379 2780; do
    echo "Enabling access to port $PORT..."
    echo "iptables -A INPUT -p tcp --dport $PORT -j ACCEPT" >> /etc/systemd/scripts/iptables
done
systemctl restart iptables

chmod -R a+r "$KUBE_SETUP_DIR"

# Run kubeadm initialization for the master
kubeadm init --config $KUBE_SETUP_DIR/kubeadm-config.yml | tee "$KUBE_SETUP_DIR/kubeadm-setup.log"

# Set up kubectl
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

# Changing the etcd interface from loopback to the public one:
# sed --in-place=.original 's|--listen-client-urls=http://127.0.0.1:2379|--listen-client-urls=http://0.0.0.0:2379|g' /etc/kubernetes/manifests/etcd.yaml
# sed --in-place=.bak 's|--advertise-client-urls=http://127.0.0.1:2379|--advertise-client-urls=http://${master-ip}:2379|g' /etc/kubernetes/manifests/etcd.yaml
#
# secs=$((1 * 60))
# while [ $secs -gt 0 ]; do
#     echo -ne " Waiting for: $secs\033[0K\r"
#     sleep 1
#     : $((secs--))
# done

echo "Done with Kubernetes setup, rebooting..."
reboot

Kubeadm Worker Setup Script: resources/kubeadm_worker_setup.sh.tpl

#!/bin/bash

# Set up Kubernetes worker nodes

# Enable access to port 8080
echo -e "\n#Custom: Enable Kubernetes connections" >> /etc/systemd/scripts/iptables
for PORT in 8080 10250 10255; do
    echo "Enabling access to port $PORT..."
    echo "iptables -A INPUT -p tcp --dport $PORT -j ACCEPT" >> /etc/systemd/scripts/iptables
done
echo "Enabling access to ports 30000-32767 (for NodePort services)..."
echo "iptables -A INPUT -p tcp --match multiport --dports 30000:32767 -j ACCEPT" >> /etc/systemd/scripts/iptables
systemctl restart iptables

# Run kubeadm to join the cluster
kubeadm join --token=${kubeadm-token} ${master-ip}:${kube-apiserver-port}
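After the join command has completed on all workers, the cluster state can be verified from the master with standard kubectl commands, for example:

$ kubectl get nodes
$ kubectl get pods --all-namespaces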

Terraform Variable Declarations: terraform/vsphere.kubeadm/00_config.tf

variable "vsphere-config" { type = "map" default = { server = "172.26.4.53" user = "[email protected]" password = "terraform99" datacenter = "pre-prod" datastore = "datastore1" cluster = "Cluster01" } }

variable "vm-config" { type = "map" default = { template = "photon-template" user = "root" password = "kubernetes99" network = "net-obb-mgmt" gateway = "172.26.4.1" } }

// Generated with `kubeadm token generate` variable "kubeadm-token" { type = "string" default = "e3721f.9b15df3a1a1118bd"

65 }

variable "master" { type = "map" default = { name = "kubernetes-master" ip = "172.26.4.240" port = 8080 } }

variable "workers" { type = "map" default = { count = "1" name-prefix = "kubernetes-worker-" } }

variable "worker-ips" { type = "list" default = [ "172.26.4.241", "172.26.4.242", "172.26.4.243" ] }

variable "hosts-file" { type = "string" default = <

172.26.4.240 kubernetes-master.vsphere.local kubernetes-master 172.26.4.241 kubernetes-worker-1.vsphere.local kubernetes-worker-1 172.26.4.242 kubernetes-worker-2.vsphere.local kubernetes-worker-2 172.26.4.243 kubernetes-worker-3.vsphere.local kubernetes-worker-3 EOF }

variable "kubernetes-setup-dir" { type = "string" default = "/var/kubernetes" }

data "template_file" "kubeadm-master-config" { template = "${file("${path.module}/../../resources/kubeadm_master_config.yml.tpl")}"

vars { kubeadm-token = "${var.kubeadm-token}" kube-apiserver-port = "${var.master["port"]}"

66 master-ip = "${var.master["ip"]}" } }

data "template_file" "kubeadm-master-setup" { template = "${file("${path.module}/../../resources/kubeadm_master_setup.sh.tpl")}"

vars { kube-setup-dir = "${var.kubernetes-setup-dir}" master-ip = "${var.master["ip"]}" } }

data "template_file" "kubeadm-worker-setup" { template = "${file("${path.module}/../../resources/kubeadm_worker_setup.sh.tpl")}"

vars { kubeadm-token = "${var.kubeadm-token}" kube-apiserver-port = "${var.master["port"]}" master-ip = "${var.master["ip"]}" } }

Terraform Provider Configuration: terraform/vsphere.kubeadm/01_init-vsphere.tf

provider "vsphere" { user = "${var.vsphere-config["user"]}" password = "${var.vsphere-config["password"]}" vsphere_server = "${var.vsphere-config["server"]}"

# if you have a self-signed cert allow_unverified_ssl = true }

data "vsphere_datacenter" "datacenter" { name = "${var.vsphere-config["datacenter"]}" }

resource "vsphere_folder" "kubernetes" { path = "kubernetes" datacenter = "${data.vsphere_datacenter.datacenter.name}" }

Master Networking Initialization: sh/kubeadm_master_init.sh

#!/bin/bash

# Initialize Kubernetes master node

echo "Initializing Calico for networking..." kubectl apply -f http://docs.projectcalico.org/v2.4/getting- started/kubernetes/installation/hosted/kubeadm/1.6/calico.yaml

echo "Done with Kubernetes initialization."

Terraform Resource Declaration: terraform/vsphere.kubeadm/02_nodes.tf

//======
// Kubernetes Master

resource "vsphere_virtual_machine" "master-node" {

  name       = "${var.master["name"]}"
  folder     = "${vsphere_folder.kubernetes.path}"
  vcpu       = 2
  memory     = 8192
  datacenter = "${data.vsphere_datacenter.datacenter.name}"
  cluster    = "${var.vsphere-config["cluster"]}"

  network_interface {
    label              = "${var.vm-config["network"]}"
    ipv4_address       = "${var.master["ip"]}"
    ipv4_prefix_length = "24"
    ipv4_gateway       = "${var.vm-config["gateway"]}"
  }

  disk {
    datastore = "${var.vsphere-config["datastore"]}"
    template  = "${vsphere_folder.kubernetes.path}/${var.vm-config["template"]}"
    bootable  = "true"
    type      = "thin"
  }

  connection {
    type     = "ssh"
    user     = "${var.vm-config["user"]}"
    password = "${var.vm-config["password"]}"
  }

  provisioner "file" {
    source      = "${path.module}/../../resources/photon_rsa.pub"
    destination = "/root/.ssh/authorized_keys"
  }

  provisioner "remote-exec" {
    inline = [
      "tdnf install -y kubernetes-kubeadm",
      "mkdir -m 6755 ${var.kubernetes-setup-dir}"
    ]
  }

  provisioner "file" {
    content     = "${var.hosts-file}"
    destination = "/etc/hosts"
  }

  provisioner "file" {
    content     = "${data.template_file.kubeadm-master-setup.rendered}"
    destination = "${var.kubernetes-setup-dir}/kubeadm-setup.sh"
  }

  provisioner "file" {
    content     = "${data.template_file.kubeadm-master-config.rendered}"
    destination = "${var.kubernetes-setup-dir}/kubeadm-config.yml"
  }

  provisioner "remote-exec" {
    inline = [
      "chmod u+x ${var.kubernetes-setup-dir}/kubeadm-setup.sh",
      "${var.kubernetes-setup-dir}/kubeadm-setup.sh"
    ]
  }

}

//======
// Kubernetes Worker Nodes

resource "vsphere_virtual_machine" "worker-nodes" {

  depends_on = ["vsphere_virtual_machine.master-node"]

  count      = "${var.workers["count"]}"
  name       = "${var.workers["name-prefix"]}${count.index + 1}"
  folder     = "${vsphere_folder.kubernetes.path}"
  vcpu       = 2
  memory     = 8192
  datacenter = "${data.vsphere_datacenter.datacenter.name}"
  cluster    = "${var.vsphere-config["cluster"]}"

  network_interface {
    label              = "${var.vm-config["network"]}"
    ipv4_address       = "${var.worker-ips[count.index]}"
    ipv4_prefix_length = "24"
    ipv4_gateway       = "${var.vm-config["gateway"]}"
  }

  disk {
    datastore = "${var.vsphere-config["datastore"]}"
    template  = "${vsphere_folder.kubernetes.path}/${var.vm-config["template"]}"
    bootable  = "true"
    type      = "thin"
  }

  connection {
    type     = "ssh"
    user     = "${var.vm-config["user"]}"
    password = "${var.vm-config["password"]}"
  }

  provisioner "file" {
    source      = "${path.module}/../../resources/photon_rsa.pub"
    destination = "/root/.ssh/authorized_keys"
  }

  provisioner "remote-exec" {
    inline = [
      "tdnf install -y kubernetes-kubeadm",
      "mkdir -m 6755 ${var.kubernetes-setup-dir}"
    ]
  }

  provisioner "file" {
    content     = "${var.hosts-file}"
    destination = "/etc/hosts"
  }

  provisioner "file" {
    content     = "${data.template_file.kubeadm-worker-setup.rendered}"
    destination = "${var.kubernetes-setup-dir}/kubeadm-setup.sh"
  }

  provisioner "remote-exec" {
    inline = [
      "chmod u+x ${var.kubernetes-setup-dir}/kubeadm-setup.sh",
      "${var.kubernetes-setup-dir}/kubeadm-setup.sh"
    ]
  }

}

//======
// Networking

resource "null_resource" "kubernetes-master-init" {

  depends_on = [
    "vsphere_virtual_machine.master-node",
    "vsphere_virtual_machine.worker-nodes"
  ]

  connection {
    host     = "${var.master["ip"]}"
    type     = "ssh"
    user     = "${var.vm-config["user"]}"
    password = "${var.vm-config["password"]}"
  }

  provisioner "remote-exec" {
    script = "${path.module}/../../sh/kubeadm_master_init.sh"
  }

}
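The declarations above are applied from the terraform/vsphere.kubeadm directory with the usual Terraform workflow, for example:

$ terraform init      # fetches the vsphere, template and null providers
$ terraform plan      # shows the VMs and provisioners that would be created
$ terraform apply     # creates the nodes and runs the kubeadm setup scripts
$ terraform destroy   # removes the cluster VMs again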

Kubeadm on Google Compute Engine

Kubeadm Configuration Template: resources/gce-kubeadm_master_config.yml.tpl

apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
token: ${kubeadm-token}
api:
  bindPort: ${kube-apiserver-port}
networking:
  podSubnet: 192.168.0.0/16

Kubeadm Master Setup Script: resources/gce-kubeadm_master_setup.sh.tpl

#!/bin/bash

# Set up Kubernetes master node

KUBE_SETUP_DIR="${kube-setup-dir}"

# Expose etcd through the public interface instead of loopback
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
sed --in-place=.bak "s|http://127.0.0.1:2379|$ETCD_LISTEN_CLIENT_URLS|g" /etc/kubernetes/apiserver
echo "ETCD_LISTEN_CLIENT_URLS=$ETCD_LISTEN_CLIENT_URLS" >> $HOME/.bash_rc
source $HOME/.bash_rc

# Enable access to ports 8080 (api-server) & 2379 (etcd)
echo -e "\n#Custom: Enable Kubernetes connections" >> /etc/systemd/scripts/iptables
for PORT in 8080 6443 10250 10251 10252 10255 2379 2780; do
    echo "Enabling access to port $PORT..."
    echo "iptables -A INPUT -p tcp --dport $PORT -j ACCEPT" >> /etc/systemd/scripts/iptables
done
systemctl restart iptables

chmod -R a+r "$KUBE_SETUP_DIR"

# Run kubeadm initialization for the master
kubeadm init --config $KUBE_SETUP_DIR/kubeadm-config.yml | tee "$KUBE_SETUP_DIR/kubeadm-setup.log"

# Set up kubectl
mkdir -p /home/${user}/.kube
cp -i /etc/kubernetes/admin.conf /home/${user}/.kube/config
chown -R ${user} /home/${user}/.kube

echo "Done with Kubernetes setup"

Terraform Variable Declarations: terraform/gce/00_config.tf

variable "gce-config" { type = "map" default = { region = "europe-west1-b" project = "jaynestown-for-k8s" credentials = "../../resources/gcp-account.json" } }

variable "vm-config" { type = "map"

72 default = { template = "photon-template" user = "kube-setup" ssh-key = "../../resources/gcp-photon_rsa" } } variable "master" { type = "map" default = { name = "kubernetes-master" port = "8080" machine-type = "n1-standard-4" } } variable "workers" { type = "map" default = { count = "3" name-prefix = "kubernetes-worker" machine-type = "n1-standard-2" } } variable "kubernetes-setup-dir" { type = "string" default = "/var/kubernetes" }

// Generated with `kubeadm token generate` variable "kubeadm-token" { type = "string" default = "e3721f.9b15df3a1a1118bd" } data "template_file" "kubeadm-master-config" { template = "${file("${path.module}/../../resources/gce- kubeadm_master_config.yml.tpl")}"

vars { kubeadm-token = "${var.kubeadm-token}" kube-apiserver-port = "${var.master["port"]}" } } data "template_file" "kubeadm-master-setup" { template = "${file("${path.module}/../../resources/gce- kubeadm_master_setup.sh.tpl")}"

vars {

73 kube-setup-dir = "${var.kubernetes-setup-dir}" user = "${var.vm-config["user"]}" } }

Terraform Provider Configuration: terraform/gce/01_init-gce.tf

provider "google" { credentials = "${file(var.gce-config["credentials"])}" project = "${var.gce-config["project"]}" region = "${var.gce-config["region"]}" }

Kubeadm Worker Setup Script: resources/kubeadm_worker_setup.sh.tpl

#!/bin/bash

# Set up Kubernetes worker nodes

# Enable access to port 8080
echo -e "\n#Custom: Enable Kubernetes connections" >> /etc/systemd/scripts/iptables
for PORT in 8080 10250 10255; do
    echo "Enabling access to port $PORT..."
    echo "iptables -A INPUT -p tcp --dport $PORT -j ACCEPT" >> /etc/systemd/scripts/iptables
done
echo "Enabling access to ports 30000-32767 (for NodePort services)..."
echo "iptables -A INPUT -p tcp --match multiport --dports 30000:32767 -j ACCEPT" >> /etc/systemd/scripts/iptables
systemctl restart iptables

# Run kubeadm to join the cluster
kubeadm join --token=${kubeadm-token} ${master-ip}:${kube-apiserver-port}

Master Networking Initialization: sh/kubeadm_master_init.sh

#!/bin/bash

# Initialize Kubernetes master node

echo "Initializing Calico for networking..." kubectl apply -f http://docs.projectcalico.org/v2.4/getting- started/kubernetes/installation/hosted/kubeadm/1.6/calico.yaml

echo "Done with Kubernetes initialization."

Terraform Resource Declaration: terraform/gce/02_nodes.tf

//======

74 // Kubernetes Master resource "google_compute_instance" "master-node" {

name = "${var.master["name"]}" machine_type = "${var.master["machine-type"]}" zone = "${var.gce-config["region"]}"

tags = ["k8s", "k8s-master"]

boot_disk { initialize_params { type = "pd-standard" image = "${var.vm-config["template"]}" } }

network_interface { network = "default"

access_config { // Ephemeral IP } }

metadata { ssh-keys = "${var.vm-config["user"]}:${file("${var.vm-config["ssh-key"]}.pub")}" }

connection { type = "ssh" user = "${var.vm-config["user"]}" private_key = "${file(var.vm-config["ssh-key"])}" }

provisioner "remote-exec" { inline = [ "sudo tdnf install -y kubernetes-kubeadm", "sudo mkdir -m 6755 ${var.kubernetes-setup-dir}" ] }

provisioner "file" { content = "${data.template_file.kubeadm-master-setup.rendered}" destination = "~/kubeadm-setup.sh" }

provisioner "file" { content = "${data.template_file.kubeadm-master-config.rendered}" destination = "~/kubeadm-config.yml" }

75 provisioner "remote-exec" { inline = [ "sudo mv ~/kubeadm* ${var.kubernetes-setup-dir}/", "sudo chmod u+x ${var.kubernetes-setup-dir}/kubeadm-setup.sh", "sudo ${var.kubernetes-setup-dir}/kubeadm-setup.sh" ] } }

//======// Kubernetes Worker Nodes

data "template_file" "kubeadm-worker-setup" { template = "${file("${path.module}/../../resources/kubeadm_worker_setup.sh.tpl")}"

vars { kubeadm-token = "${var.kubeadm-token}" kube-apiserver-port = "${var.master["port"]}" master-ip = "${google_compute_instance.master- node.network_interface.0.address}" } }

resource "google_compute_instance" "worker-node" {

count = "${var.workers["count"]}" name = "${var.workers["name-prefix"]}-${count.index}" machine_type = "${var.workers["machine-type"]}" zone = "${var.gce-config["region"]}"

tags = ["k8s", "k8s-master"]

boot_disk { initialize_params { type = "pd-standard" image = "${var.vm-config["template"]}" } }

network_interface { network = "default"

access_config { // Ephemeral IP } }

metadata { ssh-keys = "${var.vm-config["user"]}:${file("${var.vm-config["ssh-key"]}.pub")}" }

76 connection { type = "ssh" user = "${var.vm-config["user"]}" private_key = "${file(var.vm-config["ssh-key"])}" }

provisioner "remote-exec" { inline = [ "sudo tdnf install -y kubernetes-kubeadm", "sudo mkdir -m 6755 ${var.kubernetes-setup-dir}" ] }

provisioner "file" { content = "${data.template_file.kubeadm-worker-setup.rendered}" destination = "~/kubeadm-setup.sh" }

provisioner "remote-exec" { inline = [ "sudo mv ~/kubeadm-setup.sh ${var.kubernetes-setup-dir}/", "sudo chmod u+x ${var.kubernetes-setup-dir}/kubeadm-setup.sh", "sudo ${var.kubernetes-setup-dir}/kubeadm-setup.sh" ] }

}

//======// Networking resource "null_resource" "kubernetes-master-init" {

depends_on = [ "google_compute_instance.master-node", "google_compute_instance.worker-nodes" ]

connection { type = "ssh" user = "${var.vm-config["user"]}" private_key = "${file(var.vm-config["ssh-key"])}" }

provisioner "remote-exec" { script = "${path.module}/../../sh/kubeadm_master_init.sh" }

}

Manual Kubernetes Setup

Kubernetes Master Setup Script: resources/kubernetes_master_setup.sh.tpl

#!/bin/bash

# Set up Kubernetes master node

KUBE_SETUP_DIR="${kube-setup-dir}"

echo "Generating Keys & Certificate..." openssl genrsa -out "$KUBE_SETUP_DIR/service-account.key" 2048 openssl genrsa -out "$KUBE_SETUP_DIR/ca.key" 2048 openssl req -x509 -new -nodes -key "$KUBE_SETUP_DIR/ca.key" -days 999999 -out "$KUBE_SETUP_DIR/ca.cert" -subj "/CN=kube-ca"

# Enable access to port 8080 (api-server) & 2379 (ectd) echo -e "\n#Custom: Enable Kubernetes connections" >> /etc/systemd/scripts/iptables for PORT in 8080 6443 10250 10251 10252 10255 2379 2780; do echo "Enabling access to port $PORT..." echo "iptables -A INPUT -p tcp --dport $PORT -j ACCEPT" >> /etc/systemd/scripts/iptables done systemctl restart iptables

chmod -R a+r "$KUBE_SETUP_DIR"

LISTEN_PEERS="s|listen-peer-urls: http://localhost|listen-peer-urls: http://0.0.0.0|g" LISTEN_CLIENTS="s|listen-client-urls: http://localhost|listen-client-urls: http://0.0.0.0|g" INIT_ADVERTISE_PEERS="s|initial-advertise-peer-urls: http://localhost|initial- advertise-peer-urls: http://${master-ip}|g" ADVERTISE_CLIENTS="s|advertise-client-urls: http://localhost|advertise-client-urls: http://${master-ip}|g" INIT_CLUSTER="s|initial-cluster:|initial-cluster: default=http://${master-ip}:2380|g" sed -i.original "$LISTEN_PEERS; $LISTEN_CLIENTS; $INIT_ADVERTISE_PEERS; $ADVERTISE_CLIENTS; $INIT_CLUSTER" /etc/etcd/etcd-default-conf.yml

# Restart all Kubernetes services for SERVICES in etcd kube-apiserver kube-controller-manager kube-scheduler; do systemctl restart $SERVICES systemctl enable $SERVICES systemctl status $SERVICES done

# Added worker nodes echo "Setting up worker nodes..." kubectl create -f "$KUBE_SETUP_DIR/nodes.yml" kubectl get nodes

# Add Calico for networking

78 echo "Setting upd Calico..." kubectl create -f "$KUBE_SETUP_DIR/calico.yml"

echo "Done with Kubernetes setup."

Kubernetes Worker Setup Script: resources/kubernetes_worker_setup.sh.tpl

#!/bin/bash

# Set up Kubernetes worker nodes

KUBE_SETUP_DIR="${kube-setup-dir}"

# Enable access to port 8080
echo -e "\n#Custom: Enable Kubernetes connections" >> /etc/systemd/scripts/iptables
for PORT in 8080 10250 10255; do
    echo "Enabling access to port $PORT..."
    echo "iptables -A INPUT -p tcp --dport $PORT -j ACCEPT" >> /etc/systemd/scripts/iptables
done
systemctl restart iptables

chmod -R a+r "$KUBE_SETUP_DIR"

# Restart all Kubernetes services
for SERVICES in kube-proxy kubelet docker; do
    systemctl restart $SERVICES
    systemctl enable $SERVICES
    systemctl status $SERVICES
done

Terraform Variable Declarations: terraform/vsphere.manual/00_config.tf

variable "vsphere-config" { type = "map" default = { server = "172.26.4.53" user = "[email protected]" password = "terraform99" datacenter = "pre-prod" datastore = "datastore1" cluster = "Cluster01" } }

variable "vm-config" { type = "map" default = { template = "photon-template" user = "root"

79 password = "kubernetes99" network = "net-obb-mgmt" gateway = "172.26.4.1" } }

variable "master" { type = "map" default = { name = "kubernetes-master" ip = "172.26.4.240" } }

variable "workers" { type = "map" default = { count = "1" name-prefix = "kubernetes-worker-" } }

variable "worker-ips" { type = "list" default = [ "172.26.4.241", "172.26.4.242", "172.26.4.243" ] }

variable "hosts-file" { type = "string" default = <

172.26.4.240 kubernetes-master.vsphere.local kubernetes-master 172.26.4.241 kubernetes-worker-1.vsphere.local kubernetes-worker-1 172.26.4.242 kubernetes-worker-2.vsphere.local kubernetes-worker-2 172.26.4.243 kubernetes-worker-3.vsphere.local kubernetes-worker-3 EOF }

variable "kubernetes-setup-dir" { type = "string" default = "/var/kubernetes" }

variable "kubernetes-worker-nodes-yml" { type = "string" default = <

80 apiVersion: v1 kind: Node metadata: name: kubernetes-worker-1 labels: name: node-1 spec: externalID: kubernetes-worker-1 --- apiVersion: v1 kind: Node metadata: name: kubernetes-worker-2 labels: name: node-2 spec: externalID: kubernetes-worker-2 --- apiVersion: v1 kind: Node metadata: name: kubernetes-worker-3 labels: name: node-3 spec: externalID: kubernetes-worker-3 EOF } data "template_file" "kubernetes-master-setup" { template = "${file(" ${path.module}/../../resources/kubernetes_master_setup.sh.tpl")}"

vars { master-ip = "${var.master["ip"]}" kube-setup-dir = "${var.kubernetes-setup-dir}" } } data "template_file" "kubernetes-worker-setup" { template = "${file(" ${path.module}/../../resources/kubernetes_worker_setup.sh.tpl")}"

vars { kube-setup-dir = "${var.kubernetes-setup-dir}" } } data "template_file" "calico-yml" { template = "${file("${path.module}/../../resources/calico.yml.tpl")}"

81 vars { master_node = "${var.master["name"]}" } }

Terraform Provider Configuration: terraform/vsphere.manual/01_init-vsphere.tf

provider "vsphere" { user = "${var.vsphere-config["user"]}" password = "${var.vsphere-config["password"]}" vsphere_server = "${var.vsphere-config["server"]}"

# if you have a self-signed cert allow_unverified_ssl = true }

data "vsphere_datacenter" "datacenter" { name = "${var.vsphere-config["datacenter"]}" }

resource "vsphere_folder" "kubernetes" { path = "kubernetes" datacenter = "${data.vsphere_datacenter.datacenter.name}" }

Kubernetes Configuration: resources/kubernetes.config

# Comma separated list of nodes in the etcd cluster
KUBE_MASTER="--master=http://kubernetes-master:8080"

# logging to stderr routes it to the systemd journal
KUBE_LOGTOSTDERR="--logtostderr=true"

# journal message level, 0 is debug
KUBE_LOG_LEVEL="--v=0"

# Should this cluster be allowed to run privileged docker containers
KUBE_ALLOW_PRIV="--allow_privileged=true"

Kubernetes kube-apiserver Configuration: resources/kube-apiserver.config

# The address on the local server to listen to.
KUBE_API_ADDRESS="--address=0.0.0.0"

# Comma separated list of nodes in the etcd cluster
KUBE_ETCD_SERVERS="--etcd_servers=http://127.0.0.1:2379"

# Address range to use for services
KUBE_SERVICE_ADDRESSES="--service-cluster-ip-range=10.254.0.0/16"

# Add your own
KUBE_API_ARGS="--service-account-key-file=/var/kubernetes/service-account.key --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds"

Kubernetes kube-controller-manager Configuration: resources/kube-controller-manager.config

KUBE_CONTROLLER_MANAGER_ARGS="--root-ca-file=/var/kubernetes/ca.cert --service-account-private-key-file=/var/kubernetes/service-account.key"

Kubernetes kubelet Configuration: resources/kubelet.config

###
# Kubernetes kubelet (node) config

# The address for the info server to serve on (set to 0.0.0.0 or "" for all interfaces)
KUBELET_ADDRESS="--address=0.0.0.0"

# You may leave this blank to use the actual hostname
#KUBELET_HOSTNAME="--hostname_override=photon-node"

# location of the api-server
KUBELET_API_SERVER="--api_servers=http://kubernetes-master:8080"

# Add your own
KUBELET_ARGS="--cluster-dns=10.254.0.0 --cluster-domain=cluster.local"

Terraform Resource Declaration: terraform/vsphere.manual/02_nodes.tf

//======// Kubernetes Master

resource "vsphere_virtual_machine" "master-node" {

name = "${var.master["name"]}" folder = "${vsphere_folder.kubernetes.path}"

83 vcpu = 2 memory = 8192 datacenter = "${data.vsphere_datacenter.datacenter.name}" cluster = "${var.vsphere-config["cluster"]}"

network_interface { label = "${var.vm-config["network"]}" ipv4_address = "${var.master["ip"]}" ipv4_prefix_length = "24" ipv4_gateway = "${var.vm-config["gateway"]}" }

disk { datastore = "${var.vsphere-config["datastore"]}" template = "${vsphere_folder.kubernetes.path}/${var.vm-config["template"]}" bootable = "true" type = "thin" }

connection { type = "ssh" user = "${var.vm-config["user"]}" password = "${var.vm-config["password"]}" }

provisioner "file" { source = "${path.module}/../../resources/photon_rsa.pub" destination = "/root/.ssh/authorized_keys" }

provisioner "remote-exec" { inline = [ "tdnf install -y kubernetes", "mkdir -m 6755 ${var.kubernetes-setup-dir}" ] }

provisioner "file" { content = "${var.hosts-file}" destination = "/etc/hosts" }

provisioner "file" { source = "${path.module}/../../resources/kubernetes.config" destination = "/etc/kubernetes/config" }

provisioner "file" { source = "${path.module}/../../resources/kube-apiserver.config" destination = "/etc/kubernetes/apiserver" }

84 provisioner "file" { source = "${path.module}/../../resources/kube-controller-manager.config" destination = "/etc/kubernetes/controller-manager" }

provisioner "file" { content = "${var.kubernetes-worker-nodes-yml}" destination = "${var.kubernetes-setup-dir}/nodes.yml" }

provisioner "file" { content = "${data.template_file.calico-yml.rendered}" destination = "${var.kubernetes-setup-dir}/calico.yml" }

provisioner "file" { content = "${data.template_file.kubernetes-master-setup.rendered}" destination = "${var.kubernetes-setup-dir}/kube-setup.sh" }

provisioner "remote-exec" { inline = [ "chmod u+x ${var.kubernetes-setup-dir}/kube-setup.sh", "${var.kubernetes-setup-dir}/kube-setup.sh | tee ${var.kubernetes-setup- dir}/kube-setup.log" ] }

}

//======// Kubernetes Worker Nodes resource "vsphere_virtual_machine" "worker-nodes" {

// depends_on = ["vsphere_virtual_machine.master-node"]

count = "${var.workers["count"]}" name = "${var.workers["name-prefix"]}${count.index + 1}" folder = "${vsphere_folder.kubernetes.path}" vcpu = 2 memory = 8192 datacenter = "${data.vsphere_datacenter.datacenter.name}" cluster = "${var.vsphere-config["cluster"]}"

network_interface { label = "${var.vm-config["network"]}" ipv4_address = "${var.worker-ips[count.index]}" ipv4_prefix_length = "24"

85 ipv4_gateway = "${var.vm-config["gateway"]}" }

disk { datastore = "${var.vsphere-config["datastore"]}" template = "${vsphere_folder.kubernetes.path}/${var.vm-config["template"]}" bootable = "true" type = "thin" }

connection { type = "ssh" user = "${var.vm-config["user"]}" password = "${var.vm-config["password"]}" }

provisioner "file" { source = "${path.module}/../../resources/photon_rsa.pub" destination = "/root/.ssh/authorized_keys" }

provisioner "remote-exec" { inline = [ "tdnf install -y kubernetes", "mkdir -m 6755 ${var.kubernetes-setup-dir}" ] }

provisioner "file" { content = "${var.hosts-file}" destination = "/etc/hosts" }

provisioner "file" { source = "${path.module}/../../resources/kubernetes.config" destination = "/etc/kubernetes/config" }

provisioner "file" { source = "${path.module}/../../resources/kubelet.config" destination = "/etc/kubernetes/kubelet" }

provisioner "file" { content = "${data.template_file.kubernetes-worker-setup.rendered}" destination = "${var.kubernetes-setup-dir}/kube-setup.sh" }

provisioner "remote-exec" { inline = [ "chmod u+x ${var.kubernetes-setup-dir}/kube-setup.sh",

86 "${var.kubernetes-setup-dir}/kube-setup.sh | tee ${var.kubernetes-setup- dir}/kube-setup.log" ] }

}

Canonical Kubernetes

Installing The Canonical Distribution of Kubernetes using Juju

$ juju clouds Cloud Regions Default Type Description aws 14 us-east-1 ec2 Amazon Web Services aws-china 1 cn-north-1 ec2 Amazon China aws-gov 1 us-gov-west-1 ec2 Amazon (USA Government) azure 26 centralus azure Microsoft Azure azure-china 2 chinaeast azure Microsoft Azure China cloudsigma 5 hnl cloudsigma CloudSigma Cloud google 8 us-east1 gce Google Cloud Platform joyent 6 eu-ams-1 joyent Joyent Cloud oracle 5 uscom-central-1 oracle rackspace 6 dfw rackspace Rackspace Cloud localhost 1 localhost lxd LXD Container Hypervisor

Try 'list-regions ' to see available regions. 'show-cloud ' or 'regions --format yaml ' can be used to see region endpoints. 'add-cloud' can add private clouds or private infrastructure. Update the known public clouds with 'update-clouds'.

$ juju add-credential google Enter credential name: Auth Types jsonfile* oauth2

Select auth-type: jsonfile Enter file: /media/data/work/src/co/src/main/resources/gcp-account.json Credentials added for cloud google.

$ juju bootstrap google juju-ctl Creating Juju controller "juju-ctl" on google/us-east1 Looking for packaged Juju agent version 2.0.2 for amd64 Launching controller instance(s) on google/us-east1... - juju-edc8bf-0 (arch=amd64 mem=3.5G cores=4) Fetching Juju GUI 2.9.2 Waiting for address Attempting to connect to 35.196.214.120:22 Attempting to connect to 10.142.0.2:22

87 Logging to /var/log/cloud-init-output.log on the bootstrap machine Running apt-get update Running apt-get upgrade Installing curl, cpu-checker, bridge-utils, cloud-utils, tmux Fetching Juju agent version 2.0.2 for amd64 Installing Juju machine agent Starting Juju machine agent (service jujud-machine-0) Bootstrap agent now started Contacting Juju controller at 10.142.0.2 to verify accessibility... Bootstrap complete, "juju-ctl" controller now available. Controller machines are in the "controller" model. Initial model "default" added.

$ juju status Model Controller Cloud/Region Version default juju-ctl google/us-east1 2.0.2

App Version Status Scale Charm Store Rev OS Notes

Unit Workload Agent Machine Public address Ports Message

Machine State DNS Inst id Series AZ

$ juju deploy canonical-kubernetes Located bundle "cs:bundle/canonical-kubernetes-101" Deploying charm "cs:~containers/easyrsa-15" added resource easyrsa Deploying charm "cs:~containers/etcd-48" added resource snapshot added resource etcd Deploying charm "cs:~containers/flannel-26" added resource flannel Deploying charm "cs:~containers/kubeapi-load-balancer-25" application kubeapi-load-balancer exposed Deploying charm "cs:~containers/kubernetes-master-47" added resource kube-scheduler added resource kubectl added resource cdk-addons added resource kube-apiserver added resource kube-controller-manager Deploying charm "cs:~containers/kubernetes-worker-52" added resource kubelet added resource cni added resource kube-proxy added resource kubectl application kubernetes-worker exposed Related "kubernetes-master:kube-api-endpoint" and "kubeapi-load-balancer:apiserver" Related "kubernetes-master:loadbalancer" and "kubeapi-load-balancer:loadbalancer" Related "kubernetes-master:kube-control" and "kubernetes-worker:kube-control" Related "kubernetes-master:certificates" and "easyrsa:client" Related "etcd:certificates" and "easyrsa:client"

88 Related "kubernetes-master:etcd" and "etcd:db" Related "kubernetes-worker:certificates" and "easyrsa:client" Related "kubernetes-worker:kube-api-endpoint" and "kubeapi-load-balancer:website" Related "kubeapi-load-balancer:certificates" and "easyrsa:client" Related "flannel:etcd" and "etcd:db" Related "flannel:cni" and "kubernetes-master:cni" Related "flannel:cni" and "kubernetes-worker:cni" Deploy of bundle completed.

$ juju status Model Controller Cloud/Region Version default juju-ctl google/us-east1 2.0.2

App Version Status Scale Charm Store Rev OS Notes easyrsa waiting 0/1 easyrsa jujucharms 15 ubuntu etcd waiting 0/3 etcd jujucharms 48 ubuntu flannel waiting 0 flannel jujucharms 26 ubuntu kubeapi-load-balancer waiting 0/1 kubeapi-load-balancer jujucharms 25 ubuntu exposed kubernetes-master waiting 0/1 kubernetes-master jujucharms 47 ubuntu kubernetes-worker waiting 0/3 kubernetes-worker jujucharms 52 ubuntu exposed

Unit Workload Agent Machine Public address Ports Message easyrsa/0 waiting allocating 0 35.185.120.204 waiting for machine etcd/0 waiting allocating 1 35.190.142.83 waiting for machine etcd/1 waiting allocating 2 35.196.121.205 waiting for machine etcd/2 waiting allocating 3 35.196.243.75 waiting for machine kubeapi-load-balancer/0 waiting allocating 4 waiting for machine kubernetes-master/0 waiting allocating 5 waiting for machine kubernetes-worker/0 waiting allocating 6 waiting for machine kubernetes-worker/1 waiting allocating 7 waiting for machine kubernetes-worker/2 waiting allocating 8 waiting for machine

Machine State DNS Inst id Series AZ 0 pending 35.185.120.204 juju-192512-0 xenial us-east1-b 1 pending 35.190.142.83 juju-192512-1 xenial us-east1-c

89 2 pending 35.196.121.205 juju-192512-2 xenial us-east1-b 3 pending 35.196.243.75 juju-192512-3 xenial us-east1-d 4 pending pending xenial 5 pending pending xenial 6 pending pending xenial 7 pending pending xenial 8 pending pending xenial

Relation Provides Consumes Type certificates easyrsa etcd regular certificates easyrsa kubeapi-load-balancer regular certificates easyrsa kubernetes-master regular certificates easyrsa kubernetes-worker regular cluster etcd etcd peer etcd etcd flannel regular etcd etcd kubernetes-master regular cni flannel kubernetes-master regular cni flannel kubernetes-worker regular loadbalancer kubeapi-load-balancer kubernetes-master regular kube-api-endpoint kubeapi-load-balancer kubernetes-worker regular cni kubernetes-master flannel subordinate kube-control kubernetes-master kubernetes-worker regular cni kubernetes-worker flannel subordinate

$ watch juju status Every 2.0s: juju status JHan-Desktop: Mon Oct 2 19:59:21 2017

Model Controller Cloud/Region Version default juju-ctl google/us-east1 2.0.2

App Version Status Scale Charm Store Rev OS Notes easyrsa 3.0.1 active 1 easyrsa jujucharms 15 ubuntu etcd 2.3.8 active 3 etcd jujucharms 48 ubuntu flannel 0.7.0 active 4 flannel jujucharms 26 ubuntu kubeapi-load-balancer 1.10.3 active 1 kubeapi-load-balancer jujucharms 25 ubuntu exposed kubernetes-master 1.7.4 active 1 kubernetes-master jujucharms 47 ubuntu kubernetes-worker 1.7.4 active 3 kubernetes-worker jujucharms 52 ubuntu exposed

Unit Workload Agent Machine Public address Ports Message easyrsa/0* active idle 0 35.185.120.204 Certificate Authority connected. etcd/0* active idle 1 35.190.142.83 2379/tcp

90 Healthy with 3 known peers etcd/1 active idle 2 35.196.121.205 2379/tcp Healthy with 3 known peers etcd/2 active idle 3 35.196.243.75 2379/tcp Healthy with 3 known peers kubeapi-load-balancer/0* active idle 4 35.196.93.180 443/tcp Loadbalancer ready. kubernetes-master/0* active idle 5 35.196.147.162 6443/tcp Kubernetes master running. flannel/1 active idle 35.196.147.162 Flannel subnet 10.1.20.1/24 kubernetes-worker/0* active idle 6 35.185.35.115 80/tcp,443/tcp Kubernetes worker running. flannel/2 active idle 35.185.35.115 Flannel subnet 10.1.47.1/24 kubernetes-worker/1 active idle 7 35.196.240.93 80/tcp,443/tcp Kubernetes worker running. flannel/0* active idle 35.196.240.93 Flannel subnet 10.1.81.1/24 kubernetes-worker/2 active idle 8 35.196.102.184 80/tcp,443/tcp Kubernetes worker running. flannel/3 active idle 35.196.102.184 Flannel subnet 10.1.35.1/24

Machine State DNS Inst id Series AZ 0 started 35.185.120.204 juju-192512-0 xenial us-east1-b 1 started 35.190.142.83 juju-192512-1 xenial us-east1-c 2 started 35.196.121.205 juju-192512-2 xenial us-east1-b 3 started 35.196.243.75 juju-192512-3 xenial us-east1-d 4 started 35.196.93.180 juju-192512-4 xenial us-east1-c 5 started 35.196.147.162 juju-192512-5 xenial us-east1-d 6 started 35.185.35.115 juju-192512-6 xenial us-east1-b 7 started 35.196.240.93 juju-192512-7 xenial us-east1-c 8 started 35.196.102.184 juju-192512-8 xenial us-east1-d

Relation Provides Consumes Type certificates easyrsa etcd regular certificates easyrsa kubeapi-load-balancer regular certificates easyrsa kubernetes-master regular certificates easyrsa kubernetes-worker regular cluster etcd etcd peer etcd etcd flannel regular etcd etcd kubernetes-master regular cni flannel kubernetes-master regular cni flannel kubernetes-worker regular loadbalancer kubeapi-load-balancer kubernetes-master regular kube-api-endpoint kubeapi-load-balancer kubernetes-worker regular cni kubernetes-master flannel subordinate kube-control kubernetes-master kubernetes-worker regular cni kubernetes-worker flannel subordinate

91 $ juju scp kubernetes-master/0:config kube-config $ mkdir ~/.kube $ cp kube-config ~/.kube/config $ $ kubectl cluster-info Kubernetes master is running at https://35.196.93.180:443 Heapster is running at https://35.196.93.180:443/api/v1/proxy/namespaces/kube- system/services/heapster KubeDNS is running at https://35.196.93.180:443/api/v1/proxy/namespaces/kube- system/services/kube-dns kubernetes-dashboard is running at https://35.196.93.180:443/api/v1/proxy/namespaces/kube-system/services/kubernetes- dashboard Grafana is running at https://35.196.93.180:443/api/v1/proxy/namespaces/kube- system/services/monitoring-grafana InfluxDB is running at https://35.196.93.180:443/api/v1/proxy/namespaces/kube- system/services/monitoring-influxdb

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'. $ $ kubectl get all --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE default po/default-http-backend-bn5zv 1/1 Running 0 4m default po/nginx-ingress-controller-0zq7n 1/1 Running 0 4m default po/nginx-ingress-controller-j9x5d 1/1 Running 0 4m default po/nginx-ingress-controller-xnqdb 1/1 Running 0 4m kube-system po/heapster-v1.4.1-742169768-hz52f 4/4 Running 0 3m kube-system po/kube-dns-3097350089-6xrmn 3/3 Running 0 7m kube-system po/kubernetes-dashboard-1265873680-jk99q 1/1 Running 0 7m kube-system po/monitoring-influxdb-grafana-v4-7wlpn 2/2 Running 0 7m

NAMESPACE NAME DESIRED CURRENT READY AGE default rc/default-http-backend 1 1 1 4m default rc/nginx-ingress-controller 3 3 3 4m kube-system rc/monitoring-influxdb-grafana-v4 1 1 1 7m

NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE default svc/default-http-backend 10.152.183.83 80/TCP 4m default svc/kubernetes 10.152.183.1 443/TCP 7m

92 kube-system svc/heapster 10.152.183.171 80/TCP 7m kube-system svc/kube-dns 10.152.183.10 53/UDP,53/TCP 7m kube-system svc/kubernetes-dashboard 10.152.183.136 80/TCP 7m kube-system svc/monitoring-grafana 10.152.183.198 80/TCP 7m kube-system svc/monitoring-influxdb 10.152.183.150 8083/TCP,8086/TCP 7m

NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE kube-system deploy/heapster-v1.4.1 1 1 1 1 7m kube-system deploy/kube-dns 1 1 1 1 7m kube-system deploy/kubernetes-dashboard 1 1 1 1 7m

NAMESPACE NAME DESIRED CURRENT READY AGE kube-system rs/heapster-v1.4.1-1432482419 0 0 0 3m kube-system rs/heapster-v1.4.1-1537851546 0 0 0 7m kube-system rs/heapster-v1.4.1-3125553707 0 0 0 3m kube-system rs/heapster-v1.4.1-742169768 1 1 1 3m kube-system rs/heapster-v1.4.1-782453146 0 0 0 4m kube-system rs/kube-dns-3097350089 1 1 1 7m kube-system rs/kubernetes-dashboard-1265873680 1 1 1 7m $ $ kubectl proxy & [1] 27236 $ Starting to serve on 127.0.0.1:8001 $ curl http://127.0.0.1:8001 { "paths": [ "/api", "/api/v1", "/apis", "/apis/", "/apis/apiextensions.k8s.io", "/apis/apiextensions.k8s.io/v1beta1", "/apis/apiregistration.k8s.io", "/apis/apiregistration.k8s.io/v1beta1", "/apis/apps", "/apis/apps/v1beta1", "/apis/authentication.k8s.io", "/apis/authentication.k8s.io/v1", "/apis/authentication.k8s.io/v1beta1", "/apis/authorization.k8s.io", "/apis/authorization.k8s.io/v1", "/apis/authorization.k8s.io/v1beta1",

93 "/apis/autoscaling", "/apis/autoscaling/v1", "/apis/batch", "/apis/batch/v1", "/apis/certificates.k8s.io", "/apis/certificates.k8s.io/v1beta1", "/apis/extensions", "/apis/extensions/v1beta1", "/apis/networking.k8s.io", "/apis/networking.k8s.io/v1", "/apis/policy", "/apis/policy/v1beta1", "/apis/rbac.authorization.k8s.io", "/apis/rbac.authorization.k8s.io/v1alpha1", "/apis/rbac.authorization.k8s.io/v1beta1", "/apis/settings.k8s.io", "/apis/settings.k8s.io/v1alpha1", "/apis/storage.k8s.io", "/apis/storage.k8s.io/v1", "/apis/storage.k8s.io/v1beta1", "/healthz", "/healthz/autoregister-completion", "/healthz/ping", "/healthz/poststarthook/apiservice-registration-controller", "/healthz/poststarthook/apiservice-status-available-controller", "/healthz/poststarthook/bootstrap-controller", "/healthz/poststarthook/ca-registration", "/healthz/poststarthook/extensions/third-party-resources", "/healthz/poststarthook/generic-apiserver-start-informers", "/healthz/poststarthook/kube-apiserver-autoregistration", "/healthz/poststarthook/start-apiextensions-controllers", "/healthz/poststarthook/start-apiextensions-informers", "/healthz/poststarthook/start-kube-aggregator-informers", "/healthz/poststarthook/start-kube-apiserver-informers", "/logs", "/metrics", "/swagger-2.0.0.json", "/swagger-2.0.0.pb-v1", "/swagger-2.0.0.pb-v1.gz", "/swagger.json", "/swaggerapi", "/ui", "/ui/", "/version" ] } $ juju gui Opening the Juju GUI in your browser. If it does not open, open this URL: https://35.196.214.120:17070/gui/ad172816-d43b-4fae-8d05-0056f0192512/ $

94 $ juju show-controller --show-password | grep account -A 5 account: user: admin access: superuser password: 6382c16dcef89a5db8293b5d6856a605 $ $ juju models Controller: juju-ctl

Model Cloud/Region Status Machines Cores Access Last connection controller google/us-east1 available 1 4 admin just now default* google/us-east1 available 9 30 admin 2 minutes ago $ juju destroy-model default WARNING! This command will destroy the "default" model. This includes all machines, applications, data and other resources.

Continue [y/N]? y Destroying model Waiting on model to be removed, 9 machine(s), 6 application(s)...... Waiting on model to be removed, 9 machine(s), 6 application(s)... Waiting on model to be removed, 9 machine(s), 4 application(s)... Waiting on model to be removed, 9 machine(s), 1 application(s)... Waiting on model to be removed, 9 machine(s)...... Waiting on model to be removed, 9 machine(s)... Waiting on model to be removed, 7 machine(s)...... Waiting on model to be removed, 4 machine(s)... Waiting on model to be removed... $ $ juju controllers Use --refresh flag with this command to see the latest information.

Controller Model User Access Cloud/Region Models Machines HA Version juju-ctl* - admin superuser google/us-east1 2 1 none 2.0.2

$ juju destroy-controller juju-ctl WARNING! This command will destroy the "juju-ctl" controller. This includes all machines, applications, data and other resources.

Continue? (y/N):y Destroying controller Waiting for hosted model resources to be reclaimed All hosted models reclaimed, cleaning up controller machines

Adding a Persistent Volume to a Juju-managed Cluster

$ juju deploy cs:ceph-mon -n 3
$ juju deploy cs:ceph-osd -n 3
$ juju add-relation ceph-mon ceph-osd
$ juju add-storage ceph-osd/0 osd-devices=gce,10G,1
$ juju add-storage ceph-osd/1 osd-devices=gce,10G,1
$ juju add-storage ceph-osd/2 osd-devices=gce,10G,1
$ juju add-relation kubernetes-master ceph-mon
$ juju storage
$ juju run-action kubernetes-master/0 create-rbd-pv name=dev-storage size=9168
$ juju show-action-output 46467320-bb7b-453c-898a-10ab4bebaea0
$ watch kubectl get pv
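A workload can then claim the RBD-backed volume created by the create-rbd-pv action with a regular PersistentVolumeClaim. The following is a minimal sketch; the claim name is illustrative and the requested size must fit into the 9168 MB provisioned above:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dev-storage-claim   # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi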

Microservice Deployment

Example Microservice Deployment Specifications

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: product-catalogue
spec:
  replicas: 2 # tells deployment to run 2 pods matching the template
  template: # create pods using pod definition in this template
    metadata:
      labels:
        app: product-catalogue
        env: dev
        team: clubs
    spec:
      containers:
        - name: product-catalogue
          image: /product-catalogue
          ports:
            - containerPort: 8090
          env:
            - name: DB_PORT_27017_TCP_ADDR
              value:
      imagePullSecrets:
        - name: gcr-json-key
---
apiVersion: v1
kind: Service
metadata:
  name: product-catalogue
spec:
  ports:
    - port: 8080
      targetPort: 8090
  selector:
    app: product-catalogue
    env: dev
    team: clubs
  type: NodePort
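Assuming the specification above is saved as product-catalogue.yml (the file name is illustrative), it can be applied and inspected with standard kubectl commands:

$ kubectl apply -f product-catalogue.yml
$ kubectl get deployments,services -l app=product-catalogue
$ kubectl describe service product-catalogue   # shows the allocated NodePort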
