Deploying OpenShift Container Platform 3 on Google Cloud Engine

Chris Murphy, Peter Schiffer

Version 1.6, 2016-11-05

Table of Contents

Comments and Feedback
1. Executive Summary
2. Components and Configuration
   2.1. GCE
      2.1.1. Compute Engine
      2.1.2. Solid-state persistent disks
      2.1.3. Custom Images and Instance Templates
      2.1.4. Load Balancing and Scaling
      2.1.5. Compute Engine Networking
      2.1.6. Regions and Zones
   2.2. Cloud Storage and Container Registry
   2.3. Cloud DNS
      2.3.1. Public Zone
   2.4. Cloud Identity and Access Management
   2.5. Dynamic Inventory
   2.6. Bastion
   2.7. Nodes
      2.7.1. Master nodes
      2.7.2. Infrastructure nodes
      2.7.3. Application nodes
      2.7.4. Node labels
   2.8. OpenShift Pods
   2.9. Routing Layer
   2.10. Authentication
   2.11. Software Version Details
   2.12. Required Channels
   2.13. Tooling Prerequisites
      2.13.1. gcloud Setup
      2.13.2. Ansible Setup
      2.13.3. GitHub Repository
3. Provisioning the Infrastructure
   3.1. Preparing the Environment
      3.1.1. Red Hat Subscription Manager
      3.1.2. Download RHEL Image
   3.2. Preparing the Google Cloud Engine Infrastructure
      3.2.1. Sign in to the Google Cloud Platform Console
      3.2.2. Create a new GCE Project
      3.2.3. Set up Billing
      3.2.4. Add a Cloud DNS Zone
      3.2.5. Create Google OAuth 2.0 Client IDs
      3.2.6. Fill in OAuth Request Form
      3.2.7. Download and Set Up GCE tools
      3.2.8. Configure gcloud default project
      3.2.9. List out available gcloud Zones and Regions
   3.3. Prepare the Installation Script
   3.4. Running the gcloud.sh script
4. Finishing the Installation
   4.1. Validate the Deployment
      4.1.1. Running the Validation playbook again
   4.2. Operational Management
   4.3. Gathering Host Names
   4.4. Running Diagnostics
   4.5. Checking the Health of ETCD
   4.6. Default Node Selector
   4.7. Management of Maximum Pod Size
   4.8. Repositories
   4.9. Console Access
      4.9.1. Log into GUI console and deploy an application
      4.9.2. Log into CLI and Deploy an Application
   4.10. Explore the Environment
      4.10.1. List Nodes and Set Permissions
      4.10.2. List Router and Registry
      4.10.3. Explore the Registry
      4.10.4. Explore Using the oc Client to Work With Docker
      4.10.5. Explore Docker Storage
      4.10.6. Explore Firewall Rules
      4.10.7. Explore the Load Balancers
      4.10.8. Explore the GCE Network
   4.11. Persistent Volumes
      4.11.1. Creating a Persistent Volume
      4.11.2. Creating a Persistent Volume Claim
   4.12. Testing Failure
      4.12.1. Generate a Master Outage
      4.12.2. Observe the Behavior of ETCD with a Failed Master Node
      4.12.3. Generate an Infrastructure Node Outage
5. Conclusion
Appendix A: Revision History
Appendix B: Contributors

100 East Davie Street, Raleigh NC 27601 USA
Phone: +1 919 754 3700
Phone: 888 733 4281
PO Box 13588, Research Triangle Park NC 27709 USA

All trademarks referenced herein are the property of their respective owners.

© 2016 by Red Hat, Inc. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, V1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/).

The information contained herein is subject to change without notice. Red Hat, Inc. shall not be liable for technical or editorial errors or omissions contained herein.

Distribution of modified versions of this document is prohibited without the explicit permission of Red Hat Inc.

Distribution of this work or derivative of this work in any standard (paper) book form for commercial purposes is prohibited unless prior permission is obtained from Red Hat Inc.

The GPG fingerprint of the [email protected] key is: CA 20 86 86 2B D6 9D FC 65 F6 EC C4 21 91 80 CD DB 42 A6 0E

Send feedback to [email protected]

Comments and Feedback

In the spirit of open source, we invite anyone to provide feedback and comments on any reference architecture. Although we review our papers internally, sometimes issues or typographical errors are encountered. Feedback not only allows us to improve the quality of the papers we produce, but also gives readers the opportunity to suggest improvements and topic expansion. Feedback on the papers can be provided by emailing [email protected]. Please include the document title in the email.

1. Executive Summary

Red Hat OpenShift Container Platform 3 is built around a core of application containers powered by Docker, with orchestration and management provided by Kubernetes, on a foundation of Red Hat Atomic Host and Red Hat Enterprise Linux. OpenShift Origin is the upstream community project that brings it all together along with extensions, to accelerate application development and deployment.

This reference environment provides a comprehensive example demonstrating how OpenShift Container Platform 3 can be set up to take advantage of the native high availability capabilities of Kubernetes in order to create a highly available OpenShift Container Platform 3 environment. The configuration consists of three OpenShift masters, two OpenShift infrastructure nodes, and two OpenShift application nodes in a multi-zone environment. In addition to the configuration, operational management tasks are shown to demonstrate functionality.

2. Components and Configuration

This section provides a high-level overview of the components within this reference architecture, which describes the deployment of a highly available OpenShift Container Platform 3 environment on Google Cloud Engine (GCE).

The image below provides a high-level representation of the components within this reference architecture. Utilizing Google Cloud Engine (GCE), the resources detailed in this document are deployed in a highly available configuration consisting of Regions and Zones, Load Balancing and Scaling, Google Cloud DNS, Google OAuth, Custom Images, and Solid-State Persistent disks. Instances deployed are given specific roles to support OpenShift. The bastion host limits external access to internal servers by ensuring that all SSH traffic passes through the bastion host. The master instances host the OpenShift master components such as etcd and the OpenShift API. The application instances are where users deploy their containers, while the infrastructure instances are used for the OpenShift router and registry. Authentication is managed by Google OAuth. Storage for both instances and persistent storage is handled by Solid-State Persistent disks. Network access to all OpenShift resources is handled by three load balancers: one for external-facing OpenShift API and console access originating from outside the cluster, one for internal access to the OpenShift API, and one for all services exposed through OpenShift routing. Finally, all resources are registered with Google's Cloud DNS.

This reference architecture breaks down the deployment into separate phases.

• Phase 1: Provision the infrastructure on GCE

• Phase 2: Provision OpenShift Container Platform on GCE

• Phase 3: Post deployment activities

For Phase 1, the infrastructure is provisioned using the scripts provided in the openshift-ansible-contrib GitHub repository. Once completed, the scripts automatically start Phase 2, which is the deployment of OpenShift Container Platform. Phase 2 is performed via the Ansible playbooks installed by the openshift-ansible-playbooks rpm package. The last phase, Phase 3, concludes the deployment by confirming the environment was deployed properly. This is done by running tools like oadm diagnostics and the Systems Engineering team's validation Ansible playbook.

Note: The scripts provided in the GitHub repository are not supported by Red Hat. They merely provide a mechanism that can be used to build out your own infrastructure.

2.1. GCE

This section will give a brief overview of some of the common Google Cloud Engine products that are used in this reference implementation.

2.1.1. Compute Engine

Per Google; " delivers virtual machines running in Google’s innovative data centers and worldwide fiber network. Compute Engine’s tooling and workflow support enable scaling from single instances to global, load-balanced ."

Within this reference environment, the instances are deployed across multiple Regions and Zones, with the us-central1-a zone as the default. However, the default region and zone can be changed during the deployment portion of the reference architecture to a location closer to the user's geo-location. The master instances for the OpenShift environment are machine type n1-standard-2 and are provided an extra disk which is used for Docker storage. The node instances are also machine type n1-standard-2 and are provided two extra disks which are used for Docker storage and OpenShift ephemeral volumes. The bastion host is deployed as machine type n1-standard-1. Instance sizing can be changed in the variable file for the installer, which is covered in a later chapter.

For more on sizing and choosing zones and regions, see https://cloud.google.com/compute.

2.1.2. Solid-state persistent disks

Google provides persistent block storage for your virtual machines as both standard (HDD) and solid-state (SSD) disks. This reference implementation utilizes Solid-State Persistent disks as they provide better performance for random input/output operations (IOPS).

Google’s persistent storage has the following attributes:

• Snapshotting capabilities

• Redundant, industry-standard data protection

• All data written to disk is encrypted on the fly and stored in encrypted form

Within this reference environment, both master and node instances for the OpenShift environment are provisioned with extra Solid-State Persistent disks. Each master instance is deployed with a single extra 25GB disk which is used for Docker storage. The node instances are deployed with an extra 25GB disk used for Docker storage and a 50GB disk used for OpenShift ephemeral volumes. Instance sizing can be changed in the variable file for the installer which is covered in a later chapter.

For more information see https://cloud.google.com/compute/docs/disks.

2.1.3. Custom Images and Instance Templates

Google provides images of many popular operating systems, including Red Hat Enterprise Linux, CentOS, and Fedora, which can be used to deploy infrastructure. Google also provides the ability to upload and create custom images, which has the added appeal of being pre-configured with the packages and services common to every OpenShift master and node instance. This feature takes care of the initial configuration steps needed for each instance, improving the overall deployment time. For this reference architecture, an image provided by Red Hat will be uploaded and pre-configured for use as the base image for all OpenShift Container Platform instances.

This reference architecture also takes advantage of GCE's instance template feature. This allows all of the required information (such as machine type, disk image, network, and zone) to be turned into templates for each of the OpenShift instance types. These custom instance templates are then used to launch both OpenShift v3 masters and nodes.

For more information see https://cloud.google.com/compute/docs/creating-custom-image and https://cloud.google.com/compute/docs/instance-templates.

2.1.4. Load Balancing and Scaling

A Load Balancing service enables distribution of traffic across multiple instances in the GCE cloud. There are clear advantages to doing this:

• Keeps your applications available

• Supports Compute Engine Autoscaler which allows users to perform autoscaling on groups of instances

GCE provides three kinds of load balancing: HTTP(S), Network, and SSL Proxy. To be able to distribute WebSocket traffic as well as HTTP(S), this implementation guide uses Network Load Balancing and SSL Proxy.

For more information see https://cloud.google.com/compute/docs/load-balancing-and-autoscaling.

GCE LB implements parts of the Routing Layer.

Load Balancer Details

Three load balancers are used in the reference environment. The table below describes each load balancer's Cloud DNS name, the instances to which the load balancer is attached, and the port(s) monitored by the health check attached to each load balancer to determine whether an instance is in or out of service.

Table 1. Load Balancers

Load Balancer                                 Assigned Instances   Port
openshift-master.gce.sysdeseng.com            ocp-master-*         443
internal-openshift-master.gce.sysdeseng.com   ocp-master-*         443
*.apps.gce.sysdeseng.com                      ocp-infra-*          80 and 443

Both the internal-openshift-master and the openshift-master load balancers utilize the OpenShift Master API port for communication. The internal-openshift-master load balancer uses the private subnets for internal cluster communication with the API in order to be more secure. The openshift-master load balancer is used for external traffic to access the OpenShift environment through the API or the web interface. The "*.apps" load balancer allows public traffic to reach the infrastructure nodes. The infrastructure nodes run the router pod which then directs traffic directly from the outside world into OpenShift pods with external routes defined.

2.1.5. Compute Engine Networking

GCE networking allows you to configure custom virtual networks, including IP address ranges, gateways, route tables, and firewalls.

For more information see https://cloud.google.com/compute/docs/networks-and-firewalls.

Firewall Details

According to the specific needs of OpenShift services, only the following ports are open in the network firewall.

Table 2. ICMP and SSH Firewall Rules

Port   Protocol   Service   Source     Target
       ICMP                            All nodes
22     TCP        SSH       Internet   Bastion
22     TCP        SSH       Bastion    All internal nodes

Table 3. Master Nodes Firewall Rules

Port         Protocol   Service            Source               Target
2379, 2380   TCP        etcd               Master nodes         Master nodes
8053         TCP, UDP   SkyDNS             All internal nodes   Master nodes
443          TCP        Web Console, API   Internet             Master nodes

Table 4. App and Infra Nodes Firewall Rules

Port      Protocol   Service    Source                Target
4789      UDP        SDN        App and Infra Nodes   App and Infra Nodes
10250     TCP        Kubelet    Master Nodes          App and Infra Nodes
5000      TCP        Registry   App and Infra Nodes   Infra Nodes
80, 443   TCP        HTTP(S)    Internet              Infra Nodes

2.1.6. Regions and Zones

GCE has a global infrastructure that covers 4 regions and 13 availability zones. Per Google: "Certain Compute Engine resources live in regions or zones. A region is a specific geographical location where you can run your resources. Each region has one or more zones. For example, the us-central1 region denotes a region in the Central United States that has zones us-central1-a, us-central1-b, and us-central1-f."

For this reference architecture, we will default to the "us-central1" region. To check which region and zone your user is set to use, the following command will provide the answer:

$ gcloud config configurations list
NAME     IS_ACTIVE  ACCOUNT               PROJECT      DEFAULT_ZONE   DEFAULT_REGION
default  True       [email protected]  ose-refarch  us-central1-a  us-central1

In the next chapter, we will go deeper into regions and zones and discuss how to set them using the gcloud tool as well as in the variables file for the infrastructure setup.

For more information see https://cloud.google.com/compute/docs/zones.

2.2. Cloud Storage and Container Registry

GCE provides object cloud storage which can be used by OpenShift Container Platform 3.

OpenShift can build Docker images from your source code, deploy them, and manage their life-cycle. To enable this, OpenShift provides an internal, integrated Docker registry that can be deployed in your OpenShift environment to manage images.

The registry stores Docker images and metadata. For production environments, you should use persistent storage for the registry; otherwise, any images anyone has built or pushed into the registry would disappear if the pod were to restart.

Using the installation methods described in this document, the registry is deployed using a Google Cloud Storage (GCS) bucket. The GCS bucket allows multiple pods to be deployed at once for HA while using the same persistent backend storage. GCS is object-based storage, which does not get assigned to nodes in the same way that Persistent Disks are attached and assigned to a node. The bucket does not mount as block-based storage to the node, so commands like fdisk or lsblk will not show information about the GCS bucket. The configuration for the GCS bucket and the credentials to log in to the bucket are stored as OpenShift secrets and applied to the pod. The registry can be scaled to many pods and can even have multiple instances of the registry running on the same host due to the use of GCS.

For more information see https://cloud.google.com/storage.
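As a quick, hedged illustration of how the GCS-backed registry can be inspected, the commands below assume the default REGISTRY_BUCKET naming of "<project>-openshift-docker-registry" shown later in this document; the project name and grep pattern are only illustrative.

# Bucket name is illustrative; substitute your own project name
$ gsutil ls gs://ocp-refarch-openshift-docker-registry/
# Confirm the registry deployment mounts the GCS configuration (run from a master)
$ oc get dc docker-registry -n default -o yaml | grep -A3 registryconfig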

2.3. Cloud DNS

DNS is an integral part of a successful OpenShift Container Platform deployment/environment. GCE offers the Google Cloud DNS web service. Per Google: "Google Cloud DNS is a scalable, reliable and managed authoritative Domain Name System (DNS) service running on the same infrastructure as Google. It has low latency, high availability and is a cost-effective way to make your applications and services available to your users. Cloud DNS translates requests for domain names like www.google.com into IP addresses like 74.125.29.101."

OpenShift Container Platform requires a properly configured wildcard DNS zone that resolves to the IP address of the OpenShift router. For more information, please refer to the OpenShift Container Platform DNS documentation. In this reference architecture, Cloud DNS manages the DNS records for the OpenShift Container Platform environment.

For more information see https://cloud.google.com/dns.

2.3.1. Public Zone

The Public Cloud DNS zone requires a domain name either purchased through Google’s "Domains" service or through one of many 3rd party providers. Once the zone is created in Cloud DNS, the name servers provided by Google will need to be added to the registrar.

For more information see https://domains.google.

2.4. Cloud Identity and Access Management

GCE provides IAM to securely control access to the GCE services and resources for your users. In this guide, IAM provides security for our admins creating the cluster.

The deployment of OpenShift requires a user that has been granted the proper permissions by the GCE IAM administrator. The user must be able to create service accounts, cloud storage, instances, images, templates, and Cloud DNS entries, and to deploy load balancers and health checks. It is also helpful to have delete permissions in order to be able to redeploy the environment while testing.

For more information see https://cloud.google.com/iam.

2.5. Dynamic Inventory

Ansible relies on inventory files and variables to perform playbook runs. As part of this reference architecture, Ansible playbooks are provided which build the inventory automatically using a dynamic inventory script. This script generates an Ansible host file which is kept in memory for the duration of the Ansible run. The dynamic inventory script queries the Google Compute API to gather information about GCE instances. The dynamic inventory script is also referred to as an Ansible inventory script, and the GCE-specific script is written in Python. The script can be executed manually to provide information about the environment, but for this reference architecture it is automatically called to generate the Ansible inventory. For the OpenShift installation, the Python script and the Ansible module add_host are used to group instances based on their purpose, so they can be targeted in later playbooks. The Python script groups the instances based on the hostname structure of the provisioned instances, and tags are applied to each instance. The masters are assigned the master tag, the infrastructure nodes are assigned the infra tag, and the application nodes are assigned the app tag.

For more information see http://docs.ansible.com/ansible/intro_dynamic_inventory.html.
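For illustration only, a GCE dynamic inventory script can be invoked by hand using the standard Ansible dynamic inventory arguments; the script path and the instance name below are hypothetical and depend on where the repository is checked out.

# Dump every discovered instance as JSON (path is illustrative)
$ ./gce.py --list | python -m json.tool
# Show the variables for a single, hypothetical instance
$ ./gce.py --host ocp-master-abcd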

2.6. Bastion

As shown in the Bastion Diagram the bastion server in this reference architecture provides a secure way to limit SSH access to the GCE environment. The master and node firewall rules only allow for SSH connectivity between nodes inside of the local network while the bastion allows SSH access from everywhere. The bastion host is the only ingress point for SSH traffic in the cluster from external entities. When connecting to the OpenShift Container Platform infrastructure, the bastion forwards the request to the appropriate server. Connecting through the bastion server requires specific SSH configuration which is configured by the gcloud.sh installation script.

Figure 1. Bastion Diagram
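Conceptually, the SSH configuration written by gcloud.sh is equivalent to proxying every connection through the bastion. The following is only a sketch; the hostnames follow the gce.sysdeseng.com examples used throughout this document.

# Reach an internal master by hand by tunnelling through the bastion host
$ ssh -o ProxyCommand="ssh -W %h:%p [email protected]" \
      [email protected]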

2.7. Nodes

Nodes are GCE instances that serve a specific purpose for OpenShift. OpenShift masters are also considered nodes. Nodes deployed on GCE can be vertically scaled before or after the OpenShift installation using the GCE console. All OpenShift specific nodes are assigned an IAM role which allows for cloud specific tasks to occur against the environment such as adding persistent volumes or removing a node from the OpenShift Container Platform cluster automatically. There are three types of nodes as described below.
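As a hedged example of vertical scaling, a node's machine type can be changed with gcloud while the instance is stopped; the instance name, zone, and target machine type below are illustrative only.

$ gcloud compute instances stop ocp-node-abcd --zone us-central1-a
$ gcloud compute instances set-machine-type ocp-node-abcd \
      --machine-type n1-standard-4 --zone us-central1-a
$ gcloud compute instances start ocp-node-abcd --zone us-central1-a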

2.7.1. Master nodes

The master nodes contain the master components, including the API server, controller manager server, and ETCD. The master maintains the cluster's configuration and manages the nodes in its OpenShift cluster. The master assigns pods to nodes and synchronizes pod information with service configuration. The master is used to define routes, services, and volume claims for pods deployed within the OpenShift environment.

2.7.2. Infrastructure nodes

The infrastructure nodes are used for the router and registry pods. These nodes are also used if the optional components Kibana and Hawkular Metrics are required. The storage for the Docker registry that is deployed on the infrastructure nodes is backed by a GCS bucket, which allows multiple pods to use the same storage. GCS storage is used because it is synchronized between availability zones, providing data redundancy.

2.7.3. Application nodes

The Application nodes are the instances where non-infrastructure containers run. Depending on the application, GCE-specific storage can be applied, such as block storage, which can be assigned using a Persistent Volume Label for application data that needs to persist between container restarts. A configuration parameter is set on the master which ensures that OpenShift Container Platform user containers will be placed on the application nodes by default.

2.7.4. Node labels

All OpenShift Container Platform nodes are assigned a label. This allows certain pods to be deployed on specific nodes. For example, nodes labeled infra are infrastructure nodes; these nodes run the router and registry pods. Nodes with the label app are used for end-user application pods. The configuration parameter defaultNodeSelector: "role=app" in /etc/origin/master/master-config.yaml ensures all projects are automatically deployed on the application nodes.
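After the installation, the labels and the default node selector can be verified from a master node; a minimal sketch:

# Show the role label on every node
$ oc get nodes --show-labels
# Show the configured default node selector
$ grep defaultNodeSelector /etc/origin/master/master-config.yaml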

2.8. OpenShift Pods

OpenShift Container Platform leverages the Kubernetes concept of a pod, which is one or more containers deployed together on one host, and the smallest compute unit that can be defined, deployed, and managed. For example, a pod could be just a single PHP application connecting to a database outside of the OpenShift environment, or a pod could be a PHP application that has an ephemeral database. OpenShift pods can be scaled at runtime or at the time of launch using the OpenShift console or the oc CLI tool. Any container running in the environment is considered part of a pod. The pods containing the OpenShift router and registry are required to be deployed in the OpenShift environment.
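For example, scaling the pods of an application at runtime is a single oc command; the deployment configuration name below is illustrative.

# Scale a deployment configuration to three replicas and verify the pods
$ oc scale dc/cakephp-example --replicas=3
$ oc get pods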

2.9. Routing Layer

Pods inside of an OpenShift cluster are only reachable via their IP addresses on the cluster network. An edge load balancer can be used to accept traffic from outside networks and proxy the traffic to pods inside the OpenShift cluster.

An OpenShift administrator can deploy routers in an OpenShift cluster. These enable routes created by developers to be used by external clients.

OpenShift routers provide external hostname mapping and load balancing to services over protocols that pass distinguishing information directly to the router; the hostname must be present in the protocol in order for the router to determine where to send it. Routers support the following protocols:

• HTTP

• HTTPS (with SNI)

• WebSockets

• TLS with SNI

The router utilizes the wildcard zone specified during the installation and configuration of OpenShift. This wildcard zone is used by the router to create routes from a service running within the OpenShift environment to a publicly accessible URL. The wildcard zone itself is a wildcard entry in Cloud DNS which is linked using a CNAME to a load balancer, which performs health checks for high availability and forwards traffic to the router pods on ports 80 and 443.
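Tying a service to the wildcard zone is done by creating a route. In the sketch below the service name is hypothetical, and the hostname falls under the apps.gce.sysdeseng.com wildcard zone used in this reference environment.

# Expose a service through the router under the wildcard zone
$ oc expose service cakephp-example --hostname=cakephp.apps.gce.sysdeseng.com
$ oc get route cakephp-example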

2.10. Authentication

There are several options when it comes to authentication of users in OpenShift Container Platform. OpenShift can leverage an existing identity provider within an organization, such as LDAP, or OpenShift can use external identity providers like GitHub, Google, and GitLab. The configuration of identity providers occurs on the OpenShift master instances. OpenShift allows multiple identity providers to be specified. This reference architecture document uses Google as the authentication provider, but any of the other mechanisms would be an acceptable choice. Roles can be added to user accounts to allow for extra privileges, such as the ability to list nodes or assign persistent storage volumes to a project.

For more information on Google OAuth and other authentication methods see the OpenShift documentation.
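As an example of adding roles, a cluster administrator might grant cluster-wide read-only access to an authenticated Google user; the user name below is illustrative.

# Grant cluster-wide read access to a user (run as a cluster administrator)
$ oadm policy add-cluster-role-to-user cluster-reader [email protected]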

2.11. Software Version Details

The following tables provide the installed software versions for the different servers that make up the Red Hat OpenShift highly available reference environment.

Table 5. RHEL OSEv3 Details

Software                                               Version
Red Hat Enterprise Linux 7.2 x86_64                    kernel-3.10.0-327
Atomic-OpenShift {master/clients/node/sdn-ovs/utils}   3.3.x.x
Docker                                                 1.10.x
Ansible                                                2.2.0-0.5.prerelease.el7.noarch

2.12. Required Channels

A subscription to the following channels is required in order to deploy this reference environment’s configuration.

Table 6. Required Channels - OSEv3 Master and Node Instances

Channel                                             Repository Name
Red Hat Enterprise Linux 7 Server (RPMs)            rhel-7-server-rpms
Red Hat OpenShift Enterprise 3.3 (RPMs)             rhel-7-server-ose-3.3-rpms
Red Hat Enterprise Linux 7 Server - Extras (RPMs)   rhel-7-server-extras-rpms

2.13. Tooling Prerequisites

This section describes how the environment should be configured to provision the infrastructure, use Ansible to install OpenShift, and perform post installation tasks.

2.13.1. gcloud Setup

The gcloud SDK needs to be installed on the system that will provision the GCE infrastructure. Installation of the gcloud utility is interactive; you will usually want to answer yes to the questions asked.

$ yum install curl python which
$ curl https://sdk.cloud.google.com | bash
$ exec -l $SHELL
$ gcloud components install beta
$ gcloud init

2.13.2. Ansible Setup

The gcloud.sh install script will provision a custom image with the following repositories and packages, which will be used for all instances that make up the OpenShift infrastructure on GCE. This includes the Ansible package that will be used on the bastion host to deploy the OpenShift Container Platform.

$ rpm -q python-2.7
$ subscription-manager repos --enable rhel-7-server-rpms
$ subscription-manager repos --enable rhel-7-server-extras-rpms
$ subscription-manager repos --enable rhel-7-server-ose-3.3-rpms
$ yum -y install atomic-openshift-utils \
    git \
    ansible-2.2.0-0.50.prerelease.el7.noarch \
    python-netaddr \
    python-httplib2

2.13.3. GitHub Repository

The code in the openshift-ansible-contrib repository referenced below handles the installation of OpenShift and the accompanying infrastructure. The openshift-ansible-contrib repository is not explicitly supported by Red Hat but the Reference Architecture team performs testing to ensure the code operates as defined and is secure.

3. Provisioning the Infrastructure

This chapter focuses on Phase 1 of the process. The prerequisites defined below are required for a successful deployment of infrastructure and the installation of OpenShift.

3.1. Preparing the Environment

The script and playbooks provided within the openshift-ansible-contrib GitHub repository deploy the infrastructure, install and configure OpenShift, and perform post-installation tasks such as scaling the router and registry. The playbooks create the specific roles, policies, and users required for cloud provider configuration in OpenShift, as well as a newly created GCS bucket to manage container images.

Before beginning to provision resources and install OpenShift Container Platform, there are a few prerequisites that need to be taken care of; these are covered in the following sections.

3.1.1. Red Hat Subscription Manager

The installation of OCP requires a valid Red Hat subscription. For the installation of OCP on GCE the following items are required:

Table 7. Red Hat Subscription Manager requirements

Requirement                             Example User, Example Password, and Required Subscription
Red Hat Subscription Manager User       rhsm-se
Red Hat Subscription Manager Password   SecretPass
Subscription Name or Pool ID            Red Hat OpenShift Container Platform, Standard, 2-Core

The items above are examples and should reflect subscriptions relevant to the account performing the installation. There are a few different variants of the OpenShift Subscription Name. It is advised to visit https://access.redhat.com/management/subscriptions to find the specific Pool ID and Subscription Name as the values will be used below during the deployment.
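The Pool ID can also be looked up with subscription-manager from any registered RHEL system; the match string is an example, the output varies by account, and the --matches option can be omitted on older subscription-manager versions.

# List available OpenShift subscriptions and their Pool IDs
$ subscription-manager list --available --matches 'Red Hat OpenShift Container Platform*'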

3.1.2. Download RHEL Image

Download the latest RHEL 7.x KVM image from the Customer Portal.

Figure 2. RHEL 7.x KVM Image Download

Remember where you download the KVM image to; the path to the .qcow2 file that gets downloaded will be used to create the Custom Image.

3.2. Preparing the Google Cloud Engine Infrastructure

Preparing the Google Cloud Platform Environment.

3.2.1. Sign in to the Google Cloud Platform Console

Documentation for the Google Cloud Engine and the Google Cloud Platform Console can be found on the following link: https://cloud.google.com/compute/docs. To access the Google Cloud Platform Console, a Google account is needed.

3.2.2. Create a new GCE Project

This reference architecture assumes that a "Greenfield" deployment is being done in a newly created GCE project. To set that up, log into the GCE console home and select the Project button:

Figure 3. Create New Project

Next, name your new project. The name ocp-refarch was used for this reference architecture, and can be used for your project also:

Figure 4. Name New Project

3.2.3. Set up Billing

The next step will be to set up billing for GCE so you are able to create new resources. Go to the Billing tab in the GCE Console and select Enable Billing. The new project can be linked to an existing project or financial information can be entered:

Figure 5. Enable Billing

Figure 6. Link to existing account

3.2.4. Add a Cloud DNS Zone

In this reference implementation guide, the domain sysdeseng.com was purchased through an external provider and the subdomain gce.sysdeseng.com will be managed by Cloud DNS. For the example below, the domain gce.sysdeseng.com represents the Publicly Hosted Domain Name used for the installation of OpenShift; however, your own Publicly Hosted Domain Name will need to be substituted for the actual installation. Follow the instructions below to add your Publicly Hosted Domain Name to Cloud DNS.

• From the main GCP dashboard, select the Networking section and click Cloud DNS

• Click "Create Zone"

• Enter a name that logically represents your Publicly Hosted Domain Name as the Cloud DNS zone name

• the zone name must use "-" as a separator,

• it is recommended to simply replace any "." in your Publicly Hosted Domain Name with "-" for easier recognition of the zone: gce-sysdeseng-com

• Enter your Publicly Hosted Domain Name: gce.sysdeseng.com

• Enter a Description: Public Zone for RH Reference Architecture

• Click Create

Once the Public Zone is created, select the gce-sysdeseng-com domain and copy the Google name servers to add to your external registrar, if applicable.
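Once the gcloud tools (covered in the following sections) are installed, the same zone can also be created and inspected from the command line; the zone and domain names below follow the console example above.

$ gcloud dns managed-zones create gce-sysdeseng-com \
      --dns-name gce.sysdeseng.com. \
      --description "Public Zone for RH Reference Architecture"
# Describe the zone to see the Google name servers to add at the registrar
$ gcloud dns managed-zones describe gce-sysdeseng-com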

3.2.5. Create Google OAuth 2.0 Client IDs

In order to obtain new OAuth credentials in GCE, the OAuth consent screen must be configured. Browse to the API Manager screen in the GCE console and select Credentials. Select the "OAuth consent screen" tab as shown in the image below:

Figure 7. Google OAuth consent screen

• Choose an email address from the dropdown

• Enter a Product Name

• (Optional, but recommended) Enter in a Homepage URL

• The Homepage URL will be "https://master.<your Publicly Hosted Domain Name>" (this will be the URL used when accessing OpenShift)

• Optionally fill out the remaining fields, but they are not necessary for this reference architecture.

• Click Save

3.2.6. Fill in OAuth Request Form

Clicking save will bring you back to the API Manager → Credentials tab. Next click on the "Create Credentials" dropdown and select "OAuth Client ID"

Figure 8. Select OAuth Client ID

On the "Create Client ID" page:

Figure 9. New Client ID Page

Obtain both "Client ID" and "Client Secret" • Click the "Web Application" radio button:

• Enter a Name

• Enter Authorization callback URL

• The Authorization callback URL will be the Homepage URL + /oauth2callback/google (See example in the New Client ID Page image above)

• Click "Create"

Copy down the Client ID and Client Secret for use in the provisioning script.

3.2.7. Download and Set Up GCE tools

To manage your Google Compute Engine resources, download and install the gcloud command-line tool.

Installation of the gcloud utility is interactive; you will usually want to answer yes to the questions asked.

$ yum install curl python which
$ curl https://sdk.cloud.google.com | bash
$ exec -l $SHELL
$ gcloud components install beta
$ gcloud init

Detailed documentation found here: https://cloud.google.com/sdk.

3.2.8. Configure gcloud default project

$ gcloud config set project PROJECT-ID

3.2.9. List out available gcloud Zones and Regions.

To find all regions and zones available to you, run:

$ gcloud compute zones list
NAME            REGION        STATUS  NEXT_MAINTENANCE  TURNDOWN_DATE
asia-east1-a    asia-east1    UP
asia-east1-b    asia-east1    UP
asia-east1-c    asia-east1    UP
europe-west1-b  europe-west1  UP
europe-west1-d  europe-west1  UP
europe-west1-c  europe-west1  UP
us-central1-a   us-central1   UP
us-central1-f   us-central1   UP
us-central1-c   us-central1   UP
us-central1-b   us-central1   UP
us-east1-b      us-east1      UP
us-east1-c      us-east1      UP
us-east1-d      us-east1      UP

This reference architecture uses us-central1-a as the default zone. If you decide to use another zone, you can select any of the zones within a region to be the default; there is no advantage to choosing one over the other.
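If a different default is preferred, it can be set in the gcloud configuration; the zone below is only an example.

$ gcloud config set compute/zone us-central1-b
$ gcloud config set compute/region us-central1
# Verify the new defaults
$ gcloud config configurations list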

3.3. Prepare the Installation Script

If the openshift-ansible-contrib GitHub repository has not been cloned as instructed in the GitHub Repository section of the previous chapter, clone it now; the file openshift-ansible-contrib/reference-architecture/gce-cli/config.sh.example will be copied to openshift-ansible-contrib/reference-architecture/gce-cli/config.sh in the steps below.

Once the gcloud utility is configured, deployment of the infrastructure and OpenShift can be started. First, clone the openshift-ansible-contrib repository, switch to the gce-cli directory and copy the config.sh.example file to config.sh:

$ git clone https://github.com/openshift/openshift-ansible-contrib.git && cd openshift-ansible-contrib/reference-architecture/gce-cli
$ cp config.sh.example config.sh

Next edit the configuration variables in the config.sh file to match those which you have been creating throughout this reference architecture. Below are the variables that can be set for the OpenShift Container Platform installation on Google Compute Engine. The table is broken into two sections. Section A contains variables that must be changed in order for the installation to complete. The only case where these would not need to be changed would be if the default values were used during the set up sections in previous chapters. Links back to those chapters will be placed where possible. Section B mostly contains variables that should not be changed, as indicated in their descriptions. There are a small number of variables in Section B that can be set based on user preference, including node sizing and the number of node types; however, modifying any of these variables is beyond the scope of this reference architecture and is therefore not recommended.

Table 8. OpenShift config.sh Installation Variables (Section A)

Variable Name              Defaults                                                            Explanation/Purpose
RHEL_IMAGE_PATH            "${HOME}/Downloads/rhel-guest-image-7.2-20160302.0.x86_64.qcow2"   RHEL 7.x KVM Image Download
RH_USERNAME                '[email protected]'                                                 Red Hat Subscription Manager
RH_PASSWORD                'xxx'                                                               Red Hat Subscription Manager
RH_POOL_ID                 'xxx'                                                               Red Hat Subscription Manager
GCLOUD_PROJECT             'project-1'                                                         Create New Project
GCLOUD_ZONE                'us-central1-a'                                                     List out available gcloud Zones and Regions
DNS_DOMAIN                 'ocp.example.com'                                                   Add a Cloud DNS Zone
DNS_DOMAIN_NAME            'ocp-example-com'                                                   Add a Cloud DNS Zone
MASTER_DNS_NAME            'master.ocp.example.com'                                            Replace with master.$DNS_DOMAIN
INTERNAL_MASTER_DNS_NAME   'internal-master.ocp.example.com'                                   Replace with internal-master.$DNS_DOMAIN
OCP_APPS_DNS_NAME          'apps.ocp.example.com'                                              Replace with apps.$DNS_DOMAIN
MASTER_HTTPS_CERT_FILE     "${HOME}/master.${DNS_DOMAIN}.pem"                                  See the note below
MASTER_HTTPS_KEY_FILE      "${HOME}/master.${DNS_DOMAIN}.key"                                  See the note below
OCP_IDENTITY_PROVIDERS     '[ {"name": "google", "kind": "GoogleIdentityProvider", "login": "true", "challenge": "false", "mapping_method": "claim", "client_id": "xxx-yyy.apps.googleusercontent.com", "client_secret": "zzz", "hosted_domain": "example.com"} ]'   Obtain both "Client ID" and "Client Secret"

Note: If SSL certificates signed by a CA are available for the Web Console, place them in the provided default paths on the local file system. If you do not have SSL certificates, comment out or delete the lines that define MASTER_HTTPS_CERT_FILE and MASTER_HTTPS_KEY_FILE, and an SSL key and a self-signed certificate will be generated automatically during the provisioning process.

Table 9. OpenShift config.sh Installation Variables (Section B)

Variable Name                    Defaults                                         Explanation/Purpose
OCP_VERSION                      '3.3'                                            Do Not Modify
CONSOLE_PORT                     '443'                                            Do Not Modify
OCP_NETWORK                      'ocp-network'                                    Name of GCE Network that will be created for the project
MASTER_MACHINE_TYPE              'n1-standard-2'                                  Minimum Machine Size for Master Nodes
NODE_MACHINE_TYPE                'n1-standard-2'                                  Minimum Machine Size for Base Nodes
INFRA_NODE_MACHINE_TYPE          'n1-standard-2'                                  Minimum Machine Size for Infrastructure Nodes
BASTION_MACHINE_TYPE             'n1-standard-1'                                  Minimum Machine Size for the Bastion Node
MASTER_INSTANCE_TEMPLATE         'master-template'                                Do Not Modify
NODE_INSTANCE_TEMPLATE           'node-template'                                  Do Not Modify
INFRA_NODE_INSTANCE_TEMPLATE     'infra-node-template'                            Do Not Modify
BASTION_INSTANCE                 'bastion'                                        Do Not Modify
MASTER_INSTANCE_GROUP            'ocp-master'                                     Do Not Modify
MASTER_INSTANCE_GROUP_SIZE       '3'                                              Minimum Number of Master Nodes
MASTER_NAMED_PORT_NAME           'web-console'                                    Do Not Modify
INFRA_NODE_INSTANCE_GROUP        'ocp-infra'                                      Do Not Modify
INFRA_NODE_INSTANCE_GROUP_SIZE   '2'                                              Minimum Number of Infrastructure Nodes
NODE_INSTANCE_GROUP              'ocp-node'                                       Do Not Modify
NODE_INSTANCE_GROUP_SIZE         '2'                                              Minimum Number of Base Nodes
NODE_DOCKER_DISK_SIZE            '25'                                             Minimum Size of Docker Disks
NODE_DOCKER_DISK_POSTFIX         '-docker'                                        Do Not Modify
NODE_OPENSHIFT_DISK_SIZE         '50'                                             Minimum Size of OpenShift Disks
NODE_OPENSHIFT_DISK_POSTFIX      '-openshift'                                     Do Not Modify
MASTER_NETWORK_LB_HEALTH_CHECK   'master-network-lb-health-check'                 Do Not Modify
MASTER_NETWORK_LB_POOL           'master-network-lb-pool'                         Do Not Modify
MASTER_NETWORK_LB_IP             'master-network-lb-ip'                           Do Not Modify
MASTER_NETWORK_LB_RULE           'master-network-lb-rule'                         Do Not Modify
MASTER_SSL_LB_HEALTH_CHECK       'master-ssl-lb-health-check'                     Do Not Modify
MASTER_SSL_LB_BACKEND            'master-ssl-lb-backend'                          Do Not Modify
MASTER_SSL_LB_IP                 'master-ssl-lb-ip'                               Do Not Modify
MASTER_SSL_LB_CERT               'master-ssl-lb-cert'                             Do Not Modify
MASTER_SSL_LB_TARGET             'master-ssl-lb-target'                           Do Not Modify
MASTER_SSL_LB_RULE               'master-ssl-lb-rule'                             Do Not Modify
ROUTER_NETWORK_LB_HEALTH_CHECK   'router-network-lb-health-check'                 Do Not Modify
ROUTER_NETWORK_LB_POOL           'router-network-lb-pool'                         Do Not Modify
ROUTER_NETWORK_LB_IP             'router-network-lb-ip'                           Do Not Modify
ROUTER_NETWORK_LB_RULE           'router-network-lb-rule'                         Do Not Modify
IMAGE_BUCKET                     "${GCLOUD_PROJECT}-rhel-guest-raw-image"         Do Not Modify
REGISTRY_BUCKET                  "${GCLOUD_PROJECT}-openshift-docker-registry"    Do Not Modify
TEMP_INSTANCE                    'ocp-rhel-temp'                                  Do Not Modify
GOOGLE_CLOUD_SDK_VERSION         '133.0.0'                                        Currently Supported Google Cloud SDK Version
BASTION_SSH_FW_RULE              'bastion-ssh-to-external-ip'                     Do Not Modify
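To make the variable tables concrete, the following is a hedged sketch of what an edited config.sh might contain for the gce.sysdeseng.com example environment. Every value shown is illustrative and must be replaced with your own project, domain, certificate, and subscription details; refer to config.sh.example for the authoritative list and syntax.

# Section A excerpt of config.sh (illustrative values only)
RHEL_IMAGE_PATH="${HOME}/Downloads/rhel-guest-image-7.2-20160302.0.x86_64.qcow2"
RH_USERNAME='rhsm-se'
RH_PASSWORD='SecretPass'
RH_POOL_ID='xxx'
GCLOUD_PROJECT='ocp-refarch'
GCLOUD_ZONE='us-central1-a'
DNS_DOMAIN='gce.sysdeseng.com'
DNS_DOMAIN_NAME='gce-sysdeseng-com'
MASTER_DNS_NAME="master.${DNS_DOMAIN}"
INTERNAL_MASTER_DNS_NAME="internal-master.${DNS_DOMAIN}"
OCP_APPS_DNS_NAME="apps.${DNS_DOMAIN}"
# Comment these two out if no CA-signed certificate is available;
# a self-signed certificate is then generated during provisioning
MASTER_HTTPS_CERT_FILE="${HOME}/master.${DNS_DOMAIN}.pem"
MASTER_HTTPS_KEY_FILE="${HOME}/master.${DNS_DOMAIN}.key"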

3.4. Running the gcloud.sh script

Listing 1. OpenShift gcloud.sh Installation Script

$ ./gcloud.sh
$MASTER_HTTPS_CERT_FILE or $MASTER_HTTPS_KEY_FILE variable is empty - self-signed certificate will be generated
Converting gcow2 image to raw image: (100.00/100%)
Creating archive of raw image: disk.raw
Creating gs://ocp-on-gce-rhel-guest-raw-image/...
Copying file://rhel-guest-image-7.2-20160302.0.x86_64.tar.gz [Content-Type=application/x-tar]...
| [1 files][439.4 MiB/439.4 MiB]  277.3 KiB/s
Operation completed over 1 objects/439.4 MiB.
Created [https://www.googleapis.com/compute/v1/projects/ocp-on-gce/global/images/rhel-guest-image-7-2-20160302-0-x86-64].
NAME                                    PROJECT     FAMILY  DEPRECATED  STATUS
rhel-guest-image-7-2-20160302-0-x86-64  ocp-on-gce                      READY
Removing gs://ocp-on-gce-rhel-guest-raw-image/rhel-guest-image-7.2-20160302.0.x86_64.tar.gz#1478124628607057...
/ [1/1 objects] 100% Done
Operation completed over 1 objects.
......

4. Finishing the Installation

Listing 2. Installation Cont’d From Chapter 3 - OpenShift gcloud.sh Installation Script

......
TASK [validate-etcd : Validate etcd] *******************************************
changed: [ocp-master-ln1n]
changed: [ocp-master-y3kz]
changed: [ocp-master-0d2x]

TASK [validate-etcd : ETCD Cluster is healthy] *********************************
ok: [ocp-master-ln1n] => {
    "msg": "Cluster is healthy"
}
ok: [ocp-master-y3kz] => {
    "msg": "Cluster is healthy"
}
ok: [ocp-master-0d2x] => {
    "msg": "Cluster is healthy"
}

TASK [validate-etcd : ETCD Cluster is NOT healthy] *****************************
skipping: [ocp-master-ln1n]
skipping: [ocp-master-0d2x]
skipping: [ocp-master-y3kz]

PLAY [primary_master] **********************************************************

TASK [validate-app : Gather facts] *********************************************
ok: [ocp-master-0d2x]

TASK [validate-app : Create the validation project] ****************************
changed: [ocp-master-0d2x]

TASK [validate-app : Create Hello world app] ***********************************
changed: [ocp-master-0d2x]

TASK [validate-app : Wait for build to complete] *******************************
changed: [ocp-master-0d2x]

TASK [validate-app : Wait for App to be running] *******************************
changed: [ocp-master-0d2x]

TASK [validate-app : Sleep to allow for route propegation] *********************
Pausing for 10 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [ocp-master-0d2x]

TASK [validate-app : check the status of the page] *****************************
ok: [ocp-master-0d2x]

TASK [validate-app : Delete the Project] ***************************************
changed: [ocp-master-0d2x]

PLAY RECAP *********************************************************************
localhost        : ok=23   changed=16   unreachable=0   failed=0
ocp-infra-b5ni   : ok=155  changed=50   unreachable=0   failed=0
ocp-infra-hvqv   : ok=155  changed=50   unreachable=0   failed=0
ocp-master-0d2x  : ok=503  changed=163  unreachable=0   failed=0
ocp-master-ln1n  : ok=335  changed=113  unreachable=0   failed=0
ocp-master-y3kz  : ok=335  changed=113  unreachable=0   failed=0
ocp-node-1pks    : ok=148  changed=53   unreachable=0   failed=0
ocp-node-o2br    : ok=148  changed=53   unreachable=0   failed=0

Connection to 104.199.62.228 closed.

Deployment is complete. OpenShift Console can be found at https://master.gce.sysdeseng.com

Note: A successful deployment is shown above, with the two important columns being "unreachable=0" and "failed=0".

Note: The last line of the installation script output shows the address of the configured OpenShift Console. In this example, the OpenShift Console can now be reached at https://master.gce.sysdeseng.com.

With the deployment successful, the following section demonstrates how to confirm proper functionality of the Red Hat OpenShift Container Platform.

4.1. Validate the Deployment

The final part of the deployment script runs the OpenShift Ansible validation playbook. The purpose of the playbook run is to confirm a successful installation of the newly deployed OpenShift environment. The playbook run deploys the "php/cakebook" demo app, which tests the functionality of the master, nodes, registry, and router. The playbook tests the deployment and cleans up any projects and pods created during the validation run.

The playbook will perform the following steps:

Environment Validation

• Validate the public OpenShift Load Balancer "master-https-lb-map" address from the installation system

• Validate the public OpenShift Load Balancer "router-network-lb-pool" address from the master nodes

• Validate the internal OpenShift Load Balancer "master-network-lb-pool" address from the master nodes

• Validate the local master address

• Validate the health of the ETCD cluster to ensure all ETCD nodes are healthy

• Create a project in OpenShift called validate

• Create an OpenShift Application

• Add a route for the Application

• Validate the URL returns a status code of 200 or healthy

• Delete the validation project

4.1.1. Running the Validation playbook again

Note: Ensure the URLs below and the tag variables match the variables used during deployment.

$ ssh [email protected]
$ cd ~/openshift-ansible-contrib/reference-architecture/gce-ansible
$ ansible-playbook -e 'openshift_master_cluster_public_hostname=master.gce.example.com \
  openshift_master_cluster_hostname=internal-master.gce.example.com \
  wildcard_zone=apps.gce.example.com \
  console_port=443' \
  playbooks/validation.yaml

4.2. Operational Management

With the successful deployment of OpenShift, the following section demonstrates how to confirm proper functionality of the Red Hat OpenShift Container Platform.

4.3. Gathering Host Names

Because GCE automatically appends a random four-character string to instances created from an Instance Template, it is impossible to know the final hostnames of the instances until after they are created. To view the final hostnames, browse to the GCE "Compute Engine" → "VM instances" dashboard. The following image shows an example of a successful installation.

Figure 10. Compute Engine - VM Instances
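The generated hostnames can also be listed from the command line; the output will mirror the VM instances dashboard shown above.

# List the instances and their randomly generated name suffixes
$ gcloud compute instances list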

To help facilitate the Operational Management chapter, the following hostnames, based on the above image, will be used, with numbers replacing the random four-character strings. The numbers are not meant to suggest ordering or importance; they are simply used to make it easier to distinguish the hostnames.

• ocp-master-01.gce.sysdeseng.com

• ocp-master-02.gce.sysdeseng.com

• ocp-master-03.gce.sysdeseng.com

• ocp-infra-01.gce.sysdeseng.com

• ocp-infra-02.gce.sysdeseng.com

• ocp-node-01.gce.sysdeseng.com

• ocp-node-02.gce.sysdeseng.com

• bastion.gce.sysdeseng.com

4.4. Running Diagnostics

Perform the following steps from the first master node.

To run diagnostics, SSH into the first master node (ocp-master-01.gce.sysdeseng.com).

$ ssh [email protected]
$ sudo -i

Connectivity to the first master node (ocp-master-01.gce.sysdeseng.com) as the root user should have been established. Run the diagnostics that are included as part of the install.

# oadm diagnostics [root@ocp-master-0v02 ~]# oadm diagnostics [Note] Determining if client configuration exists for client/cluster diagnostics Info: Successfully read a client config file at '/root/.kube/config' Info: Using context for cluster-admin access: 'validate/internal-master-gce-sysdeseng- com:443/system:admin' [Note] Performing systemd discovery

[Note] Running diagnostic: ConfigContexts[validate/internal-master-gce-sysdeseng- com:443/system:admin] Description: Validate client config context is complete and has connectivity

Info: The current client config context is 'validate/internal-master-gce-sysdeseng- com:443/system:admin': The server URL is 'https://internal-master.gce.sysdeseng.com' The user authentication is 'system:admin/internal-master-gce-sysdeseng-com:443' The current project is 'validate' Successfully requested project list; has access to project(s): [logging management-infra openshift openshift-infra default kube-system]

[Note] Running diagnostic: ConfigContexts[default/master-gce-sysdeseng- com:443/system:admin] Description: Validate client config context is complete and has connectivity

ERROR: [DCli0006 from diagnostic ConfigContexts@openshift/origin/pkg/diagnostics/client/config_contexts.go:285] For client config context 'default/master-gce-sysdeseng-com:443/system:admin': The server URL is 'https://master.gce.sysdeseng.com' The user authentication is 'system:admin/internal-master-gce-sysdeseng-com:443' The current project is 'default' (*url.Error) Get https://master.gce.sysdeseng.com/api: x509: certificate signed by unknown authority

This means that we cannot validate the certificate in use by the master API server, so we cannot securely communicate with it. Connections could be intercepted and your credentials stolen.

Since the server certificate we see when connecting is not validated

www.redhat.com 35 [email protected] by public certificate authorities (CAs), you probably need to specify a certificate from a private CA to validate the connection.

Your config may be specifying the wrong CA cert, or none, or there could actually be a man-in-the-middle attempting to intercept your connection. If you are unconcerned about any of this, you can add the --insecure-skip-tls-verify flag to bypass secure (TLS) verification, but this is risky and should not be necessary. Connections could be intercepted and your credentials stolen.

[Note] Running diagnostic: DiagnosticPod Description: Create a pod to run diagnostics from the application standpoint

[Note] Running diagnostic: ClusterRegistry Description: Check that there is a working Docker registry

Info: The "docker-registry" service has multiple associated pods each mounted with ephemeral storage, but also has a custom config /etc/registryconfig/config.yml mounted; assuming storage config is as desired.

WARN: [DClu1012 from diagnostic ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:300] The pod logs for the "docker-registry-4-0x2le" pod belonging to the "docker-registry" service indicated unknown errors. This could result in problems with builds or deployments. Please examine the log entries to determine if there might be any related problems:

time="2016-10-11T17:12:51-04:00" level=error msg="obsolete configuration detected, please add openshift registry middleware into registry config file" time="2016-10-11T17:12:51-04:00" level=error msg="obsolete configuration detected, please add openshift storage middleware into registry config file" time="2016-10-11T17:14:39.063687325-04:00" level=error msg="error authorizing context: authorization header required" go.version=go1.6.2 http.request.host="172.30.40.230:5000" http.request.id=4e5031f9-c62b-406c-9c2f- e6083e349218 http.request.method=GET http.request.remoteaddr="172.16.1.1:36832" http.request.uri="/v2/" http.request.useragent="docker/1.10.3 go/go1.6.2 git- commit/5206701-unsupported kernel/3.10.0-327.36.2.el7.x86_64 os/linux arch/amd64" instance.id=1ac16b7a-874c-419f-a402-949f37e7aacb time="2016-10-11T17:14:54.845088078-04:00" level=error msg="error authorizing context: authorization header required" go.version=go1.6.2 http.request.host="172.30.40.230:5000" http.request.id=fe7753f3-cc62-415f-a9e6- d3b566f71adc http.request.method=GET http.request.remoteaddr="172.16.1.1:36876" http.request.uri="/v2/" http.request.useragent="docker/1.10.3 go/go1.6.2 git- commit/5206701-unsupported kernel/3.10.0-327.36.2.el7.x86_64 os/linux arch/amd64" instance.id=1ac16b7a-874c-419f-a402-949f37e7aacb

WARN: [DClu1012 from diagnostic

[email protected] 36 www.redhat.com ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:300] The pod logs for the "docker-registry-4-pa319" pod belonging to the "docker-registry" service indicated unknown errors. This could result in problems with builds or deployments. Please examine the log entries to determine if there might be any related problems:

time="2016-10-11T17:12:59-04:00" level=error msg="obsolete configuration detected, please add openshift registry middleware into registry config file" time="2016-10-11T17:12:59-04:00" level=error msg="obsolete configuration detected, please add openshift storage middleware into registry config file"

[Note] Running diagnostic: ClusterRoleBindings Description: Check that the default ClusterRoleBindings are present and contain the expected subjects

Info: clusterrolebinding/cluster-readers has more subjects than expected.

Use the oadm policy reconcile-cluster-role-bindings command to update the role binding to remove extra subjects.

Info: clusterrolebinding/cluster-readers has extra subject {ServiceAccount management- infra management-admin }.

[Note] Running diagnostic: ClusterRoles Description: Check that the default ClusterRoles are present and contain the expected permissions

[Note] Running diagnostic: ClusterRouterName Description: Check there is a working router

[Note] Running diagnostic: MasterNode Description: Check if master is also running node (for Open vSwitch)

WARN: [DClu3004 from diagnostic MasterNode@openshift/origin/pkg/diagnostics/cluster/master_node.go:175] Unable to find a node matching the cluster server IP. This may indicate the master is not also running a node, and is unable to proxy to pods over the Open vSwitch SDN.

[Note] Skipping diagnostic: MetricsApiProxy Description: Check the integrated heapster metrics can be reached via the API proxy Because: The heapster service does not exist in the openshift-infra project at this time, so it is not available for the Horizontal Pod Autoscaler to use as a source of metrics.

www.redhat.com 37 [email protected] [Note] Running diagnostic: NodeDefinitions Description: Check node records on master

WARN: [DClu0003 from diagnostic NodeDefinition@openshift/origin/pkg/diagnostics/cluster/node_definitions.go:112] Node ocp-master-0v02.c.ose-refarch.internal is ready but is marked Unschedulable. This is usually set manually for administrative reasons. An administrator can mark the node schedulable with: oadm manage-node ocp-master-0v02.c.ose-refarch.internal --schedulable=true

While in this state, pods should not be scheduled to deploy on the node. Existing pods will continue to run until completed or evacuated (see other options for 'oadm manage-node').

WARN: [DClu0003 from diagnostic NodeDefinition@openshift/origin/pkg/diagnostics/cluster/node_definitions.go:112] Node ocp-master-2nue.c.ose-refarch.internal is ready but is marked Unschedulable. This is usually set manually for administrative reasons. An administrator can mark the node schedulable with: oadm manage-node ocp-master-2nue.c.ose-refarch.internal --schedulable=true

While in this state, pods should not be scheduled to deploy on the node. Existing pods will continue to run until completed or evacuated (see other options for 'oadm manage-node').

WARN: [DClu0003 from diagnostic NodeDefinition@openshift/origin/pkg/diagnostics/cluster/node_definitions.go:112] Node ocp-master-q0jy.c.ose-refarch.internal is ready but is marked Unschedulable. This is usually set manually for administrative reasons. An administrator can mark the node schedulable with: oadm manage-node ocp-master-q0jy.c.ose-refarch.internal --schedulable=true

While in this state, pods should not be scheduled to deploy on the node. Existing pods will continue to run until completed or evacuated (see other options for 'oadm manage-node').

[Note] Running diagnostic: ServiceExternalIPs Description: Check for existing services with ExternalIPs that are disallowed by master config

[Note] Running diagnostic: AnalyzeLogs Description: Check for recent problems in systemd service logs

Info: Checking journalctl logs for 'atomic-openshift-node' service Info: Checking journalctl logs for 'docker' service

[Note] Running diagnostic: MasterConfigCheck Description: Check the master config file

WARN: [DH0005 from diagnostic MasterConfigCheck@openshift/origin/pkg/diagnostics/host/check_master_config.go:52] Validation of master config file '/etc/origin/master/master-config.yaml' warned:
assetConfig.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console
assetConfig.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console

[Note] Running diagnostic: NodeConfigCheck Description: Check the node config file

Info: Found a node config file: /etc/origin/node/node-config.yaml

[Note] Running diagnostic: UnitStatus Description: Check status for related systemd units

[Note] Summary of diagnostics execution (version v3.3.0.34):
[Note] Warnings seen: 7
[Note] Errors seen: 1

Note: The warnings will not cause issues in the environment.

Note: If using a self-signed SSL certificate, one error noting this will be seen, but it will not affect the performance of the application.

Based on the results of the diagnostics, actions can be taken to alleviate any issues.

4.5. Checking the Health of ETCD

This section focuses on the ETCD cluster. It describes the different commands to ensure the cluster is healthy. The internal Cloud DNS names of the nodes running ETCD must be used.

SSH into the first master node (ocp-master-01.gce.sysdeseng.com). Using the output of the hostname command, issue the etcdctl command to confirm that the cluster is healthy.

$ ssh [email protected]
$ sudo -i

$ ETCD_ENDPOINT=$(hostname -f) && etcdctl -C https://$ETCD_ENDPOINT:2379 --ca-file /etc/etcd/ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key cluster-health
member 3f67344b00548177 is healthy: got healthy result from https://10.132.0.9:2379
member cc889bf04a07ac7c is healthy: got healthy result from https://10.132.0.7:2379
member cf713f874e4e4f31 is healthy: got healthy result from https://10.132.0.8:2379

Note: In this configuration the ETCD services are distributed among the OpenShift master nodes.

4.6. Default Node Selector

As explained in section 2.7.4, node labels are an important part of the OpenShift environment. By default in the reference architecture installation, the default node selector is set to "role=app" in /etc/origin/master/master-config.yaml on all of the master nodes. This configuration parameter is set by the Ansible role openshift-default-selector on all masters, and the master API service is restarted, which is required when making any changes to the master configuration.

SSH into the first master node (ocp-master-01.gce.sysdeseng.com) to verify the defaultNodeSelector is defined.

$ vi /etc/origin/master/master-config.yaml
...omitted...
projectConfig:
  defaultNodeSelector: "role=app"
  projectRequestMessage: ""
  projectRequestTemplate: ""
...omitted...

Note: If making any changes to the master configuration, the master API service must be restarted or the configuration change will not take effect. Any changes and the subsequent restart must be done on all masters.
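
In an HA deployment such as this one, the master API and controllers typically run as separate systemd units on each master. A minimal sketch of restarting them after a configuration change, assuming the standard atomic-openshift unit names, would be:

# Assumes the native HA master unit names used by OpenShift Container Platform 3.x
$ sudo systemctl restart atomic-openshift-master-api
$ sudo systemctl restart atomic-openshift-master-controllers

Repeat the restart on each master node so that every API instance picks up the change.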

4.7. Management of Maximum Pod Size

Quotas are set on ephemeral volumes within pods to prohibit a pod from becoming too large and impacting the node. There are three places where sizing restrictions should be set. When persistent volume claims are not used, a pod has the ability to grow as large as the underlying filesystem will allow. The required modifications are set by Ansible. The sections below list the specific Ansible role that defines each parameter along with the location on the nodes in which the parameter is set.

OpenShift Volume Quota

At launch time, user-data creates an xfs partition on the /dev/sdc block device, adds an entry to fstab, and mounts the volume with the gquota option. If gquota is not set, the OpenShift node will not be able to start with the "perFSGroup" parameter defined below. This disk and configuration are applied on the infrastructure and application nodes. The configuration is not done on the masters because the master nodes are unschedulable.

SSH into the first infrastructure node (ocp-infra-01.gce.sysdeseng.com) to verify the entry exists within fstab.

# vi /etc/fstab
/dev/sdc /var/lib/origin/openshift.local.volumes xfs gquota 0 0
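
As an additional check, the live mount options and the XFS group quota state can be inspected directly. This is a hedged example; the mount point is the one configured in fstab above:

$ mount | grep openshift.local.volumes
# Requires root and assumes xfs_quota (xfsprogs) is installed
$ sudo xfs_quota -x -c 'state -g' /var/lib/origin/openshift.local.volumes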

Docker Storage Setup

The docker-storage-setup file is created at launch time by user-data. This file tells the Docker service to use /dev/sdb and create the volume group docker-vol. The extra Docker storage options ensure that a container can grow no larger than 3G. Docker storage setup is performed on all master, infrastructure, and application nodes.

SSH into the first infrastructure node (ocp-infra-01.gce.sysdeseng.com) to verify /etc/sysconfig/docker-storage-setup matches the information below.

$ vi /etc/sysconfig/docker-storage-setup
DEVS=/dev/sdb
VG=docker-vol
DATA_SIZE=95%VG
EXTRA_DOCKER_STORAGE_OPTIONS="--storage-opt dm.basesize=3G"

OpenShift Emptydir Quota

The role openshift-emptydir-quota sets a parameter within the node configuration. The perFSGroup setting restricts the ephemeral emptyDir volume from growing larger than 512Mi. This emptyDir quota is applied on the infrastructure and application nodes. The configuration is not done on the masters because the master nodes are unschedulable.

SSH into the first infrastructure node (ocp-infra-01.gce.sysdeseng.com) to verify /etc/origin/node/node-config.yaml matches the information below.

$ vi /etc/origin/node/node-config.yaml
...omitted...
volumeConfig:
  localQuota:
    perFSGroup: 512Mi

4.8. Yum Repositories

In section 2.12, Required Channels, the specific repositories for a successful OpenShift installation were defined. All systems except for the bastion host should have the same subscriptions. To verify that subscriptions match those defined in Required Channels, perform the following. The repositories below are enabled by the rhsm-repos playbook during the installation. The installation will be unsuccessful if the repositories are missing from the system.

$ yum repolist
repo id                                repo name                                            status
google-cloud-compute                   Google Cloud Compute                                      4
rhel-7-server-extras-rpms/x86_64       Red Hat Enterprise Linux 7 Server - Extras (RPMs)       319
rhel-7-server-ose-3.3-rpms/x86_64      Red Hat OpenShift Container Platform 3.3 (RPMs)         590
rhel-7-server-rpms/7Server/x86_64      Red Hat Enterprise Linux 7 Server (RPMs)             13,352
repolist: 14,265
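
If one of the Red Hat repositories is missing, it can usually be re-enabled with subscription-manager. This is a hedged sketch using the repository IDs from the listing above; the google-cloud-compute repository is configured separately by the provisioning scripts:

$ sudo subscription-manager repos --enable=rhel-7-server-rpms \
    --enable=rhel-7-server-extras-rpms --enable=rhel-7-server-ose-3.3-rpms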

4.9. Console Access

This section will cover logging into the OpenShift Container Platform management console via the GUI and the CLI. After logging in via one of these methods applications can then be deployed and managed.

4.9.1. Log into GUI console and deploy an application

Perform the following steps from the local workstation.

Open a browser and access https://master.gce.sysdeseng.com/console. When logging into the OpenShift web interface for the first time, the page will redirect and prompt for Google OAuth credentials. Log into Google using an account that is a member of the account specified during the install. Next, Google will prompt to grant access to authorize the login. If Google access is not granted, the account will not be able to log in to the OpenShift web console.

To deploy an application, click on the New Project button. Provide a Name and click Create. Next, deploy one of the ephemeral instant app templates by clicking the corresponding box. Accept the defaults and click Create. Instructions along with a URL will be provided for how to access the application on the next screen. Click Continue to Overview to bring up the management page for the application. Click on the link provided and access the application to confirm functionality.

4.9.2. Log into CLI and Deploy an Application

Perform the following steps from your local workstation.

Install the oc client by visiting the public URL of the OpenShift deployment. For example, https://master.gce.sysdeseng.com/console/command-line, and click latest release. When directed to https://access.redhat.com, log in with valid Red Hat customer credentials and download the client relevant to the current workstation. Follow the instructions located on the production documentation site for getting started with the CLI.

A token is required to log in using Google OAuth and OpenShift. The token is presented on the https://master.gce.sysdeseng.com/console/command-line page. Click the click to show token hyperlink and perform the following on the workstation on which the oc client was installed.

$ oc login https://master.gce.sysdeseng.com --token=fEAjn7LnZE6v5SOocCSRVmUWGBNIIEKbjD9h-Fv7p09

After the oc client is configured, create a new project and deploy an application.

$ oc new-project test-app

$ oc new-app https://github.com/openshift/cakephp-ex.git --name=php
--> Found image 2997627 (7 days old) in image stream "php" in project "openshift" under tag "5.6" for "php"

Apache 2.4 with PHP 5.6
-----------------------
Platform for building and running PHP 5.6 applications

Tags: builder, php, php56, rh-php56

* The source repository appears to match: php
* A source build using source code from https://github.com/openshift/cakephp-ex.git will be created
* The resulting image will be pushed to image stream "php:latest"
* This image will be deployed in deployment config "php"
* Port 8080/tcp will be load balanced by service "php"
* Other containers can access this service through the hostname "php"

--> Creating resources with label app=php ...
    imagestream "php" created
    buildconfig "php" created
    deploymentconfig "php" created
    service "php" created
--> Success
    Build scheduled, use 'oc logs -f bc/php' to track its progress.
    Run 'oc status' to view your app.

$ oc expose service php
route "php" exposed

Display the status of the application.

$ oc status
In project test-app on server https://master.gce.sysdeseng.com

http://test-app.apps.gce.sysdeseng.com to pod port 8080-tcp (svc/php)
  dc/php deploys istag/php:latest <-
    bc/php builds https://github.com/openshift/cakephp-ex.git with openshift/php:5.6
  deployment #1 deployed about a minute ago - 1 pod

1 warning identified, use 'oc status -v' to see details.

Access the application via the URL provided by oc status. The CakePHP application should now be visible.
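
As a quick check from the workstation, the route reported by oc status can also be probed from the command line. This is a hedged example using the hostname from the output above; an HTTP 200 response indicates the router and application pod are serving traffic:

$ curl -sI http://test-app.apps.gce.sysdeseng.com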

4.10. Explore the Environment

4.10.1. List Nodes and Set Permissions

If you try to run the following command, it should fail.

# oc get nodes --show-labels
Error from server: User "[email protected]" cannot list all nodes in the cluster

The reason it is failing is because the permissions for that user are incorrect. Get the username and configure the permissions.

$ oc whoami

Once the username has been established, log back into a master node and enable the appropriate permissions for your user. Perform the following step from the first master (ocp-master-01.gce.sysdeseng.com).

# oadm policy add-cluster-role-to-user cluster-admin [email protected]

Attempt to list the nodes again and show the labels.

# oc get nodes --show-labels
NAME                                     STATUS                     AGE       LABELS
ocp-infra-sj2p.c.ose-refarch.internal    Ready                      9h        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=europe-west1,failure-domain.beta.kubernetes.io/zone=europe-west1-b,kubernetes.io/hostname=ocp-infra-sj2p.c.ose-refarch.internal,role=infra
ocp-infra-tuu9.c.ose-refarch.internal    Ready                      9h        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=europe-west1,failure-domain.beta.kubernetes.io/zone=europe-west1-b,kubernetes.io/hostname=ocp-infra-tuu9.c.ose-refarch.internal,role=infra
ocp-master-0v02.c.ose-refarch.internal   Ready,SchedulingDisabled   9h        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=europe-west1,failure-domain.beta.kubernetes.io/zone=europe-west1-b,kubernetes.io/hostname=ocp-master-0v02.c.ose-refarch.internal,role=master
ocp-master-2nue.c.ose-refarch.internal   Ready,SchedulingDisabled   9h        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=europe-west1,failure-domain.beta.kubernetes.io/zone=europe-west1-b,kubernetes.io/hostname=ocp-master-2nue.c.ose-refarch.internal,role=master
ocp-master-q0jy.c.ose-refarch.internal   Ready,SchedulingDisabled   9h        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=europe-west1,failure-domain.beta.kubernetes.io/zone=europe-west1-b,kubernetes.io/hostname=ocp-master-q0jy.c.ose-refarch.internal,role=master
ocp-node-1rug.c.ose-refarch.internal     Ready                      9h        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=europe-west1,failure-domain.beta.kubernetes.io/zone=europe-west1-b,kubernetes.io/hostname=ocp-node-1rug.c.ose-refarch.internal,role=app
ocp-node-9enz.c.ose-refarch.internal     Ready                      9h        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=europe-west1,failure-domain.beta.kubernetes.io/zone=europe-west1-b,kubernetes.io/hostname=ocp-node-9enz.c.ose-refarch.internal,role=app

4.10.2. List Router and Registry

List the router and registry by changing to the default project.

Note: Perform the following steps from the workstation.

# oc project default
# oc get all
NAME                 REVISION   DESIRED   CURRENT   TRIGGERED BY
dc/docker-registry   4          2         2         config
dc/router            1          2         2         config
NAME                   DESIRED   CURRENT   AGE
rc/docker-registry-1   0         0         9h
rc/docker-registry-2   0         0         9h
rc/docker-registry-3   0         0         9h
rc/docker-registry-4   2         2         8h
rc/router-1            2         2         9h
NAME                  CLUSTER-IP       EXTERNAL-IP   PORT(S)                   AGE
svc/docker-registry   172.30.40.230                  5000/TCP                  9h
svc/kubernetes        172.30.0.1                     443/TCP,53/UDP,53/TCP     10h
svc/router            172.30.225.162                 80/TCP,443/TCP,1936/TCP   9h
NAME                         READY     STATUS    RESTARTS   AGE
po/docker-registry-4-0x2le   1/1       Running   0          8h
po/docker-registry-4-pa319   1/1       Running   0          8h
po/router-1-gmb8e            1/1       Running   0          9h
po/router-1-gzeew            1/1       Running   0          9h

Observe the output of oc get all.

4.10.3. Explore the Docker Registry

The OpenShift Ansible playbooks configure two infrastructure nodes that have two registries running. In order to understand the configuration and mapping process of the registry pods, the command oc describe is used. oc describe details how registries are configured and mapped to Google Cloud Storage (GCS) for storage. Using oc describe should help explain how HA works in this environment.

Note: Perform the following steps from the workstation.

$ oc describe svc/docker-registry
Name:              docker-registry
Namespace:         default
Labels:            docker-registry=default
Selector:          docker-registry=default
Type:              ClusterIP
IP:                172.30.40.230
Port:              5000-tcp  5000/TCP
Endpoints:         172.16.0.4:5000,172.16.2.3:5000
Session Affinity:  ClientIP
No events.

Notice that the registry has two endpoints listed. Each of those endpoints represents a Docker container. The ClusterIP listed is the actual ingress point for the registries.
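
The same endpoint-to-pod mapping can be confirmed by listing the endpoints object and the registry pods directly. This is a hedged example; the IP addresses will differ per environment:

$ oc get endpoints docker-registry -n default
$ oc get pods -o wide -n default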

4.10.4. Explore Using the oc Client to Work With Docker

The oc client allows similar functionality to the docker command. To find out more information about the registry storage perform the following.

# oc get pods
NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-2-8b7c6   1/1       Running   0          2h
docker-registry-2-drhgz   1/1       Running   0          2h

# oc exec docker-registry-2-8b7c6 cat /etc/registryconfig/config.yml
version: 0.1
log:
  level: debug
http:
  addr: :5000
storage:
  cache:
    layerinfo: inmemory
  gcs:
    bucket: "ose-refarch-openshift-docker-registry"
    rootdirectory: /registry
auth:
  openshift:
    realm: openshift
middleware:
  repository:
    - name: openshift

Observe the gcs stanza. Confirm the bucket name is listed, and access the GCE console. Click on the "Storage" tab and locate the bucket. The bucket should contain the "/registry" content, as seen in the image below.

Figure 11. Registry Bucket in GCS
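
The bucket contents can also be listed from the workstation with gsutil. This is a hedged example using the bucket name shown in the registry configuration above:

$ gsutil ls gs://ose-refarch-openshift-docker-registry/registry/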

Confirm that the same bucket is mounted to the other registry via the same steps.

4.10.5. Explore Docker Storage

This section will explore the Docker storage on an infrastructure node.

The example below can be performed on any node, but for this example the infrastructure node (ocp-infra-01.gce.sysdeseng.com) is used.

The output below verifies Docker storage is not using a loopback device.

$ docker info
Containers: 4
 Running: 4
 Paused: 0
 Stopped: 0
Images: 4
Server Version: 1.10.3
Storage Driver: devicemapper
 Pool Name: docker--vol-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 3.221 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 1.303 GB
 Data Space Total: 25.5 GB
 Data Space Available: 24.19 GB
 Metadata Space Used: 372.7 kB
 Metadata Space Total: 29.36 MB
 Metadata Space Available: 28.99 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: host bridge null
 Authorization: rhel-push-plugin
Kernel Version: 3.10.0-327.36.2.el7.x86_64
Operating System: Employee SKU
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 2
Total Memory: 7.148 GiB
Name: ocp-infra-tuu9
ID: WQG7:MTQD:IF3G:GBOA:5XRO:552Q:EF6J:7IE6:AW6O:3R7J:5AZM:OVSO
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Registries: registry.access.redhat.com (secure), docker.io (secure)

Verify 3 disks are attached to the instance. The disk /dev/sda is used for the OS, /dev/sdb is used for Docker storage, and /dev/sdc is used for emptyDir storage for containers that do not use a persistent volume.

$ fdisk -l

Disk /dev/sda: 26.8 GB, 26843545600 bytes, 52428800 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: dos
Disk identifier: 0x0000b3fd

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048    52428305    26213129   83  Linux

Disk /dev/sdb: 26.8 GB, 26843545600 bytes, 52428800 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: dos
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048    52428799    26213376   8e  Linux LVM

Disk /dev/sdc: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/mapper/docker--vol-docker--pool_tmeta: 29 MB, 29360128 bytes, 57344 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/mapper/docker--vol-docker--pool_tdata: 25.5 GB, 25497174016 bytes, 49799168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/mapper/docker--vol-docker--pool: 25.5 GB, 25497174016 bytes, 49799168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 262144 bytes / 524288 bytes

Disk /dev/mapper/docker-8:1-25244720-f9a43128f37b6c5efe261755269b2d4f4d7356ffc7ba8cce2eeaaee24bbb2766: 3221 MB, 3221225472 bytes, 6291456 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 262144 bytes / 524288 bytes

Disk /dev/mapper/docker-8:1-25244720-9528140ae9917d125197b318b3be45453077732861fc556861bac90269bdc991: 3221 MB, 3221225472 bytes, 6291456 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 262144 bytes / 524288 bytes

Disk /dev/mapper/docker-8:1-25244720-a98b399f5c1dcc5bde86c9cd1cf4ce0439d97d2fb806522766905e65fcbab74a: 3221 MB, 3221225472 bytes, 6291456 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 262144 bytes / 524288 bytes

Disk /dev/mapper/docker-8:1-25244720-a58e7bde8f30799bfa6e191a66470258a1f98214707505a9e98fa5c356776b71: 3221 MB, 3221225472 bytes, 6291456 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 262144 bytes / 524288 bytes

4.10.6. Explore Firewall Rules

As mentioned earlier in the document, several firewall rules have been created. The purpose of this section is to encourage exploration of the firewall rules that were created.

Note: Perform the following steps from the GCE web console.

On the main GCE console, in the left hand navigation panel, select Networking. Select "Firewall Rules" and check the rules that were created as part of the infrastructure provisioning. For example, notice how the ssh-external firewall rule only allows SSH traffic inbound. That can be further restricted to a specific network or host if required.
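
The same rules can also be reviewed from the CLI with gcloud. This is a hedged example, assuming the default project was configured earlier; ssh-external is the rule referenced above:

$ gcloud compute firewall-rules list
$ gcloud compute firewall-rules describe ssh-external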

4.10.7. Explore the Load Balancers

As mentioned earlier in the document several Load Balancers have been created. The purpose of this section is to encourage exploration of the Load Balancers that were created.

Note: Perform the following steps from the GCE web console.

On the main GCE console, in the left hand navigation panel, select Networking → Load Balancing. Select the master-https-lb-map load balancer and note the Frontend and how it is configured for port 443. That is for the OpenShift web console traffic. On the same tab, notice that the master-https-lb-cert is offloaded at the load balancer level and therefore not terminated on the masters. Under "Backend Services", there should be three master instances running with a Status of Healthy. Next, check the master-network-lb-pool load balancer and see that it contains the same three masters that back the master-https-lb-map load balancer. Further details of the configuration can be viewed by exploring the Ansible playbooks to see exactly what was configured.
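
A hedged CLI equivalent for reviewing the load balancing objects created by the playbooks:

$ gcloud compute forwarding-rules list
$ gcloud compute target-pools list
$ gcloud compute backend-services list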

4.10.8. Explore the GCE Network

As mentioned earlier in the document, a network was created especially for OCP. The purpose of this section is to encourage exploration of the network that was created.

Note: Perform the following steps from the GCE web console.

On the main GCE console, in the left hand navigation panel, select Networking → Networks. Select the recently created ocp-network and explore its Subnetworks, Firewall Rules, and Routes. More configuration detail can be found by exploring the Ansible playbooks to see exactly what was configured.
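
A hedged CLI equivalent for reviewing the network and its routes:

$ gcloud compute networks list
$ gcloud compute routes list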

4.11. Persistent Volumes

Persistent volumes (pv) are OpenShift objects that allow for storage to be defined and then claimed by pods to allow for data persistence. The most common persistent volume source on GCE is a GCE persistent disk. A GCE persistent disk volume can only be mounted for writing by one pod at a time. Mounting of persistent volumes is done by using a persistent volume claim (pvc). This claim will mount the persistent storage to a specific directory within a pod. This directory is referred to as the mountPath.

4.11.1. Creating a Persistent Volume

Log in to the first OpenShift master to define the persistent volume.

$ oc project persistent
$ vi pv.yaml
apiVersion: "v1"
kind: "PersistentVolume"
metadata:
  name: "persistent"
spec:
  capacity:
    storage: "10Gi"
  accessModes:
    - "ReadWriteOnce"
  gcePersistentDisk:
    fsType: "ext4"
    pdName: "pd-disk-1"
$ oc create -f pv.yaml
persistentvolume "persistent" created
$ oc get pv
NAME         CAPACITY   ACCESSMODES   STATUS      CLAIM     REASON    AGE
persistent   10Gi       RWO           Available                       47s
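
The PV definition above references a GCE persistent disk named pd-disk-1, which must already exist in the same zone as the nodes before the volume can be attached. Assuming it has not been created yet, a hedged example of creating it with gcloud (the size and zone are illustrative and should match the PV capacity and the environment):

$ gcloud compute disks create pd-disk-1 --size=10GB --zone=europe-west1-b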

4.11.2. Creating a Persistent Volume Claim

The persistent volume claim will change the pod from using EmptyDir non-persistent storage to storage backed by a GCE persistent disk volume. To claim space from the persistent volume, a database server will be used to demonstrate a persistent volume claim.

$ oc new-app --docker-image registry.access.redhat.com/openshift3/mysql-55-rhel7 --name=db -e 'MYSQL_USER=chmurphy,MYSQL_PASSWORD=password,MYSQL_DATABASE=persistent'

... omitted ...

$ oc get pods
NAME         READY     STATUS    RESTARTS   AGE
db-1-dwa7o   1/1       Running   0          5m

$ oc describe pod db-1-dwa7o

... omitted ...

Volumes:
  db-volume-1:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:

... omitted ...

$ oc volume dc/db --add --overwrite --name=db-volume-1 --type=persistentVolumeClaim --claim-size=10Gi
persistentvolumeclaims/pvc-ic0mu
deploymentconfigs/db

$ oc get pvc
NAME        STATUS    VOLUME       CAPACITY   ACCESSMODES   AGE
pvc-ic0mu   Bound     persistent   10Gi       RWO           4s

$ oc get pods
NAME         READY     STATUS    RESTARTS   AGE
db-2-0srls   1/1       Running   0          23s

$ oc describe pod db-2-0srls

.... omitted ....

Volumes:
  db-volume-1:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-ic0mu
    ReadOnly:   false

.... omitted ....

The above has created a database pod with a persistent volume claim (pvc-ic0mu) and has attached the claim to the previously EmptyDir-backed volume.
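
As a hedged verification, the claim and the mount inside the new pod can be inspected. The mount path below assumes the default data directory of the mysql-55-rhel7 image:

$ oc describe pvc pvc-ic0mu
$ oc exec db-2-0srls -- df -h /var/lib/mysql/data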

4.12. Testing Failure

In this section, reactions to failure are explored. After a successful install and some of the smoke tests noted above have been completed, failure testing is executed.

4.12.1. Generate a Master Outage

Note: Perform the following steps from the GCE web console and the OpenShift public URL.

On the main GCE console, in the left hand navigation panel, select Instances. Locate the running ocp-master-02.gce.sysdeseng.com instance, select it, and change its state to stopped.
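
Alternatively, the instance can be stopped, and later restarted, from the CLI. This is a hedged example; the instance name and zone are illustrative and should be taken from the output of gcloud compute instances list:

$ gcloud compute instances list
$ gcloud compute instances stop ocp-master-2nue --zone europe-west1-b
$ gcloud compute instances start ocp-master-2nue --zone europe-west1-b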

Ensure the console can still be accessed by opening a browser and accessing master.gce.sysdeseng.com. At this point, the cluster is in a degraded state because only two of the three master nodes are running, but complete functionality remains.

4.12.2. Observe the Behavior of ETCD with a Failed Master Node

SSH into the first master node (ocp-master-01.gce.sysdeseng.com). Using the output of the hostname command, issue the etcdctl command to confirm that the cluster is healthy.

$ ssh [email protected]
$ sudo -i

# hostname
ip-10-20-1-106.ec2.internal
# ETCD_ENDPOINT=$(hostname -f) && etcdctl -C https://$ETCD_ENDPOINT:2379 --ca-file /etc/etcd/ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key cluster-health
failed to check the health of member 82c895b7b0de4330 on https://10.20.2.251:2379: Get https://10.20.1.251:2379/health: dial tcp 10.20.1.251:2379: i/o timeout
member 82c895b7b0de4330 is unreachable: [https://10.20.1.251:2379] are all unreachable
member c8e7ac98bb93fe8c is healthy: got healthy result from https://10.20.3.74:2379
member f7bbfc4285f239ba is healthy: got healthy result from https://10.20.1.106:2379
cluster is healthy

Notice how one member of the ETCD cluster is now unreachable. Restart ocp-master-02.gce.sysdeseng.com by following the same steps in the GCE web console as noted above.

4.12.3. Generate an Infrastructure Node Outage

This section shows what to expect when an infrastructure node fails or is brought down intentionally.

Confirm Application Accessibility

Note: Perform the following steps from the browser on a local workstation.

Before bringing down an infrastructure node, check behavior and ensure things are working as expected. The goal of testing an infrastructure node outage is to see how the OpenShift routers and registries behave. Confirm the simple application deployed earlier is still functional. If it is not, deploy a new version. Access the application to confirm connectivity. As a reminder, to find the information required to ensure the application is still running: list the projects, change to the project that the application is deployed in, get the status of the application (which includes the URL), and access the application via that URL.

$ oc get projects
NAME               DISPLAY NAME   STATUS
openshift                         Active
openshift-infra                   Active
ttester                           Active
test-app1                         Active
default                           Active
management-infra                  Active

$ oc project test-app1
Now using project "test-app1" on server "https://master.gce.sysdeseng.com".

$ oc status
In project test-app1 on server https://master.gce.sysdeseng.com

http://php-test-app1.apps.sysdeseng.com to pod port 8080-tcp (svc/php-prod)
  dc/php-prod deploys istag/php-prod:latest <-
    bc/php-prod builds https://github.com/openshift/cakephp-ex.git with openshift/php:5.6
  deployment #1 deployed 27 minutes ago - 1 pod

1 warning identified, use 'oc status -v' to see details.

Open a browser and ensure the application is still accessible.

Confirm Registry Functionality

This section is another step to take before initiating the outage of the infrastructure node to ensure that the registry is functioning properly. The goal is to push to the OpenShift registry.

Note: Perform the following steps from a CLI on a local workstation and ensure that the oc client has been configured.

A token is needed so that the Docker registry can be logged into.

# oc whoami -t
feAeAgL139uFFF_72bcJlboTv7gi_bo373kf1byaAT8

Pull a new docker image for the purposes of test pushing.

# docker pull fedora/apache
# docker images

Capture the registry endpoint. The svc/docker-registry shows the endpoint.

# oc status
In project default on server https://master.gce.sysdeseng.com

svc/docker-registry - 172.30.237.147:5000
  dc/docker-registry deploys docker.io/openshift3/ose-docker-registry:v3.3.0.32
    deployment #2 deployed 51 minutes ago - 2 pods
    deployment #1 deployed 53 minutes ago

svc/kubernetes - 172.30.0.1 ports 443, 53->8053, 53->8053

svc/router - 172.30.144.227 ports 80, 443, 1936
  dc/router deploys docker.io/openshift3/ose-haproxy-router:v3.3.0.32
    deployment #1 deployed 55 minutes ago - 2 pods

View details with 'oc describe <resource>/<name>' or list everything with 'oc get all'.

Tag the docker image with the endpoint from the previous step.

# docker tag docker.io/fedora/apache 172.30.110.31:5000/openshift/prodapache

Check the images and ensure the newly tagged image is available.

# docker images

Issue a Docker login.

# docker login -u [email protected] -e [email protected] -p _7yJcnXfeRtAbJVEaQwPwXreEhlV56TkgDwZ6UEUDWw 172.30.110.31:5000

# oadm policy add-role-to-user admin [email protected] -n openshift
# oadm policy add-role-to-user system:registry [email protected]
# oadm policy add-role-to-user system:image-builder [email protected]

Push the image to the OpenShift registry now.

# docker push 172.30.110.222:5000/openshift/prodapache
The push refers to a repository [172.30.110.222:5000/openshift/prodapache]
389eb3601e55: Layer already exists
c56d9d429ea9: Layer already exists
2a6c028a91ff: Layer already exists
11284f349477: Layer already exists
6c992a0e818a: Layer already exists
latest: digest: sha256:ca66f8321243cce9c5dbab48dc79b7c31cf0e1d7e94984de61d37dfdac4e381f size: 6186

Get Location of Router and Registry.

Note: Perform the following steps from the CLI of a local workstation.

Change to the default OpenShift project and check the router and registry pod locations.

$ oc project default
Now using project "default" on server "https://master.gce.sysdeseng.com".

$ oc get pods
NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-2-8b7c6   1/1       Running   1          21h
docker-registry-2-drhgz   1/1       Running   0          7h
router-1-6y5td            1/1       Running   1          21h
router-1-rlcwj            1/1       Running   1          21h

$ oc describe pod docker-registry-2-8b7c6 | grep -i node
Node:        ocp-infra-01.gce.sysdeseng.com/10.132.0.4
$ oc describe pod docker-registry-2-gmvdr | grep -i node
Node:        ocp-infra-02.gce.sysdeseng.com/10.132.0.5
$ oc describe pod router-1-6y5td | grep -i node
Node:        ocp-infra-01.gce.sysdeseng.com/10.132.0.4
$ oc describe pod router-1-rlcwj | grep -i node
Node:        ocp-infra-02.gce.sysdeseng.com/10.132.0.5

Initiate the Failure and Confirm Functionality

Note: Perform the following steps from the GCE web console and a browser.

On the main GCE console, in the left hand navigation panel, select Instances. Locate the running infra01 instance, select it, and change its state to stopped. Wait a minute or two for the registry and router pods to migrate over to the remaining infrastructure node. Check the registry locations and confirm that they are on the same node.

$ oc describe pod docker-registry-2-8b7c6 | grep -i node
Node:        ocp-infra-01.gce.sysdeseng.com/10.132.0.4
$ oc describe pod docker-registry-2-drhgz | grep -i node
Node:        ocp-infra-02.gce.sysdeseng.com/10.132.0.5

Follow the procedures above to ensure a Docker image can still be pushed to the registry now that infra01 is down.

5. Conclusion

Red Hat solutions involving the OpenShift PaaS are created to deliver a production-ready foundation that simplifies the deployment process, shares the latest best practices, and provides a stable highly available environment on which to run your production applications.

A successful deployment consists of the following:

• Properly configured servers that meet the OpenShift prerequisites.

• Properly configured DNS.

• Access to a host with Ansible support.

• Proper subscription entitlement on all servers.

Once these requirements are met, the highly available OpenShift environment can be deployed.

For any questions or concerns, please email [email protected] and be sure to visit the Red Hat Reference Architecture page to find out about all of our Red Hat solution offerings.

Appendix A: Revision History

Revision Release Date Author(s)

1.0 Tuesday September 8, 2015 Scott Collier

1.1 Friday April 4, 2016 Chris Murphy / Peter Schiffer

1.2 Tuesday September 16, 2016 Chris Murphy / Peter Schiffer

1.3 Monday Oct 10, 2016 Chris Murphy / Peter Schiffer

1.4 Friday Oct 28, 2016 Chris Murphy / Peter Schiffer

1.5 Tuesday Nov 2, 2016 Chris Murphy / Peter Schiffer

1.6 Saturday Nov 5, 2016 Chris Murphy / Peter Schiffer

PDF generated by Asciidoctor PDF

Reference Architecture Theme version 1.0

Appendix B: Contributors

1. Jason DeTiberus, content provider

2. Matthew Farrellee, content provider

3. Tim St. Clair, content provider

4. Rob Rati, content provider

5. Aaron Weitekamp, review

6. James Shubin, review

7. Dirk Herrmann, review

8. John Ruemker, review

9. Andrew Beekhof, review

10. David Critch, review

11. Eric Jacobs, review

12. Scott Collier, review
