Deploy to Kubernetes Documentation, Release 1.0.0

Jay Johnson

Mar 12, 2019

Contents

1 Deploying a Distributed AI Stack to Kubernetes on CentOS
2 Getting Started
    2.1 Overview
    2.2 Install
3 Validate
4 Deploy Redis and Postgres and the Nginx Ingress
5 Start Applications
    5.1 Confirm Pods are Running
6 Run a Database Migration
7 Add Ingress Locations to /etc/hosts
8 Using the Minio S3 Object Store
    8.1 View the Verification Tests on the Minio Dashboard
    8.2 Test Minio S3 with Bucket Creation and File Upload and Download
9 Using the Rook Ceph Cluster
10 Create a User
11 Deployed Web Applications
12 View Django REST Framework
13 View Swagger
14 View Jupyter
15 View pgAdmin
16 View Minio S3 Object Storage
17 View Ceph
18 View Splunk
19 Training AI with the Django REST API
20 Train a Deep Neural Network on Kubernetes
21 Get the AI Job Record
22 Get the AI Training Job Results
23 Standalone Deployments
24 Deploy Redis
    24.1 Confirm Connectivity
    24.2 Debug Redis Cluster
    24.3 Possible Errors
    24.4 Delete Redis
    24.5 Delete Persistent Volume and Claim
25 Deploy Postgres
    25.1 Install Go
    25.2 Start
    25.3 Debug Postgres
26 Deploy pgAdmin
    26.1 Start
    26.2 Get Logs
    26.3 SSH into pgAdmin
27 Deploy Django REST API
    27.1 Start
    27.2 Run a Database Migration
    27.3 Get Logs
    27.4 SSH into the API
28 Deploy Django Celery Workers
    28.1 Start
    28.2 Get Logs
    28.3 SSH into the Worker
29 Deploy AntiNex Core
    29.1 Start
    29.2 Get Logs
    29.3 SSH into the API
30 Deploy Jupyter
    30.1 Start
    30.2 Login to Jupyter
    30.3 Get Logs
    30.4 SSH into Jupyter
31 Deploy Splunk
    31.1 Start
    31.2 Login to Splunk
32 Searching in Splunk
33 Search using Spylunking
34 Find Django REST API Logs in Splunk
35 Find Django Celery Worker Logs in Splunk
36 Find Core Logs in Splunk
37 Find Jupyter Logs in Splunk
    37.1 Get Logs
    37.2 SSH into Splunk
38 Deploy Nginx Ingress
    38.1 Start
    38.2 Get Logs
    38.3 SSH into the Ingress
39 View Ingress Nginx Config
40 View a Specific Ingress Configuration
41 Deploy Splunk
    41.1 Start
42 Deploy Splunk-Ready Applications
    42.1 Get Logs
    42.2 SSH into Splunk
    42.3 View Ingress Config
43 Create your own self-signed x509 TLS Keys, Certs and Certificate Authority with Ansible
44 Deploying Your Own x509 TLS Encryption files as Kubernetes Secrets
    44.1 Deploy Secrets
    44.2 List Secrets
    44.3 Reload Secrets
45 Deploy Cert Manager with Let's Encrypt
    45.1 Start with Let's Encrypt x509 SSL Certificates
    45.2 View Logs
46 Stop the Cert Manager
    46.1 Debugging
47 Troubleshooting
48 Customize Minio and How to Troubleshoot
    48.1 Change the Minio Access and Secret Keys
    48.2 View the Minio Dashboard
    48.3 Get S3 Internal Endpoint
    48.4 Get S3 External Endpoint
    48.5 Debugging Steps
    48.6 Describe Pod
    48.7 Describe Service
    48.8 Describe Ingress
    48.9 Uninstall Minio
49 Ceph Troubleshooting
    49.1 Validate Ceph System Pods are Running
    49.2 Validate Ceph Pods are Running
    49.3 Validate Persistent Volumes are Bound
    49.4 Validate Persistent Volume Claims are Bound
    49.5 Create a Persistent Volume Claim
    49.6 Verify the Persistent Volume is Bound
    49.7 Verify the Persistent Volume Claim is Bound
    49.8 Describe Persistent Volumes
    49.9 Show Ceph Cluster Status
    49.10 Show Ceph OSD Status
    49.11 Show Ceph Free Space
    49.12 Show Ceph RADOS Free Space
    49.13 Out of IP Addresses
50 AntiNex Stack Status
51 Reset Cluster
52 Development
53 Testing
54 License
55 Running a Distributed Ceph Cluster on a Kubernetes Cluster
    55.1 Overview
    55.2 Background
56 Add the Ceph Mon Cluster Service FQDN to /etc/hosts
57 Build KVM HDD Images
58 Attach KVM Images to VMs
59 Format Disks in VM
60 Install Ceph on All Kubernetes Nodes
61 Deploy Ceph Cluster
62 Watch all Ceph Logs with Kubetail
63 Show Pods
64 Check Cluster Status
65 Validate a Pod can Mount a Persistent Volume on the Ceph Cluster in Kubernetes
    65.1 Create PVC
    65.2 Verify PVC is Bound
    65.3 Create Pod using PVC as a mounted volume
    65.4 Verify Pod has Mounted Volume inside Container
    65.5 Verify Ceph is Handling Data
    65.6 Delete Ceph Tester Pod
    65.7 Recreate Ceph Tester Pod
    65.8 View Logs from Previous Pod
    65.9 Cleanup Ceph Tester Pod
66 Kubernetes Ceph Cluster Debugging Guide
    66.1 Confirm Ceph OSD pods are using the KVM Mounted Disks
    66.2 The ceph-tester failed to start
    66.3 Orphaned fdisk Processes
    66.4 Check osd pods
    66.5 Watch the Ceph Mon Logs with Kubetail
    66.6 Attach Successful but Mounting a Ceph PVC fails
    66.7 Previous Cluster Cleanup Failed
67 OSD Issues
    67.1 OSD Pool Failed to Initialize
    67.2 OSD Pod Prepare is Unable to Zap
    67.3 OSD unable to find IP Address
68 Cluster Status Tools
    68.1 Show All
    68.2 Show Cluster Status
    68.3 Show Ceph DF
    68.4 Show Ceph OSD Status
    68.5 Show Ceph Rados DF
69 Uninstall
    69.1 Uninstall and Reformat KVM Images
70 Managing a Multi-Host Kubernetes Cluster with an External DNS Server
    70.1 Overview
    70.2 Background
    70.3 Allocate VM Resources
    70.4 Install CentOS 7
    70.5 Prepare VMs
    70.6 Install Kubernetes
71 Start All Kubernetes Cluster VMs
    71.1 Verify the Cluster has 3 Ready Nodes
72 Deploy a Distributed AI Stack to a Multi-Host Kubernetes Cluster
    72.1 Deploy Cluster Resources
    72.2 Start the AI Stack
73 Set up an External DNS Server for a Multi-Host Kubernetes Cluster
74 Start using the Stack
    74.1 Run a Database Migration
    74.2 Create a User
75 Deployed Web Applications
76 View Django REST Framework
77 View Swagger
78 View Jupyter
79 View pgAdmin
80 View Minio S3 Object Storage
81 View Ceph
82 View Splunk
83 Train AI with Django REST API
84 Next Steps
    84.1 More Information
85 AntiNex Stack Status


CHAPTER 1

Deploying a Distributed AI Stack to Kubernetes on CentOS

Install and manage a Kubernetes cluster (version 1.13.4) with helm on a single CentOS 7 VM, or in multi-host mode running the cluster on 3 CentOS 7 VMs. Once running, you can deploy a distributed, scalable python stack capable of delivering a resilient REST service with JWT for authentication and Swagger for development. This service uses a decoupled REST API with two distinct worker backends for routing simple database read and write tasks vs long-running tasks that can use a Redis cache and do not need a persistent database connection. This is handy not only for simple CRUD applications and use cases, but also for serving a secure multi-tenant environment where multiple users manage long-running tasks like training deep neural networks that are capable of making near-realtime predictions.

This guide was built for deploying the AntiNex stack of docker containers and the Stock Analysis Engine on a Kubernetes single host or multi-host cluster.

• Managing a Multi-Host Kubernetes Cluster with an External DNS Server
• Cert Manager with Let's Encrypt SSL support
• A Native Ceph Cluster for Persistent Volume Management with KVM
• A Third-party Rook Ceph Cluster for Persistent Volumes
• Minio S3 Object Store
• Redis
• Postgres
• Django REST API with JWT and Swagger
• Django REST API Celery Workers
• Jupyter
• Core Celery Workers
• pgAdmin4
• (Optional) Splunk with TCP and HEC Service Endpoints

CHAPTER 2

Getting Started

Note: Please ensure for single-VM hosting that the CentOS machine has at least 4 CPU cores and more than 8 GB of RAM. (The rendered docs include a screenshot of the CPU utilization during AI training with only 3 cores.)

2.1 Overview

This guide installs the following systems, plus a storage solution (Rook with a Ceph cluster by default, or NFS volumes), to prepare the host for running containers and automatically starting them on host startup:

• Kubernetes
• Helm and Tiller
• Minio S3 Storage
• Persistent Storage Volumes using Rook with a Ceph cluster, or optional NFS Volumes mounted at: /data/k8/redis, /data/k8/postgres, /data/k8/pgadmin
• Flannel CNI

2.2 Install

Here is a video showing how to prepare the host to run a local Kubernetes cluster:

Preparing the host to run Kubernetes requires running this as root:

sudo su
./prepare.sh

Note: This has only been tested on CentOS 7 and Ubuntu 18.04 and requires commenting out all swap entries in /etc/fstab to work
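As a minimal sketch of the swap change the note describes (assuming a standard /etc/fstab layout; review the file before editing it), you can disable swap for the current boot and comment out the swap entries before running ./prepare.sh:

# run as root: turn swap off now and comment out swap entries in /etc/fstab
swapoff -a
sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab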

Warning: This guide used to install the cluster on Ubuntu 18.04, but after seeing high CPU utilization after a few days of operation this guide was moved to CentOS 7. The specific issues on Ubuntu were logged in journalctl -xe and appeared to be related to "volumes not being found" and "networking disconnects".

CHAPTER 3

Validate

1. Install the Kubernetes Config

Run as your user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Or use the script:

./user-install-kubeconfig.sh

2. Check the Kubernetes Version

kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:53:20Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?

3. Confirm the Kubernetes Pods Are Running

kubectl get pods -n kube-system

NAME                            READY   STATUS    RESTARTS   AGE
coredns-78fcdf6894-k8srv        1/1     Running   0          4m
coredns-78fcdf6894-xx8bt        1/1     Running   0          4m
etcd-dev                        1/1     Running   0          3m
kube-apiserver-dev              1/1     Running   0          3m
kube-controller-manager-dev     1/1     Running   0          3m
kube-flannel-ds-m8k9w           1/1     Running   0          4m
kube-proxy-p4blg                1/1     Running   0          4m
kube-scheduler-dev              1/1     Running   0          3m
tiller-deploy-759cb9df9-wxvp8   1/1     Running   0          4m


CHAPTER 4

Deploy Redis and Postgres and the Nginx Ingress

Here is a video showing how to deploy Postgres, Redis, Nginx Ingress, and the pgAdmin4 as pods in the cluster:


Note: Postgres, pgAdmin4 and Redis use Rook Ceph to persist data

Here are the commands to deploy Postgres, Redis, Nginx Ingress, and pgAdmin4 in the cluster:

Note: Please ensure helm is installed and the tiller pod in the kube-system namespace is in the Running state, or Redis will encounter deployment issues

Install Go using the ./tools/install-go.sh script or with the commands:

# note: go install has only been tested on CentOS 7 and Ubuntu 18.04
sudo su
GO_VERSION="1.11"
GO_OS="linux"
GO_ARCH="amd64"
go_file="go${GO_VERSION}.${GO_OS}-${GO_ARCH}.tar.gz"
curl https://dl.google.com/go/${go_file} --output /tmp/${go_file}
export GOPATH=$HOME/go/bin
export PATH=$PATH:$GOPATH:$GOPATH/bin
tar -C $HOME -xzf /tmp/${go_file}
$GOPATH/go get github.com/blang/expenv
# make sure to add GOPATH and PATH to ~/.bashrc

./user-install-kubeconfig.sh
./deploy-resources.sh

If you want to deploy splunk you can add it as an argument:

./deploy-resources.sh splunk

If you want to deploy splunk with Let’s Encrypt make sure to add prod as an argument:

./deploy-resources.sh splunk prod

CHAPTER 5

Start Applications

Here is a video showing how to start the Django REST Framework, Celery Workers, Jupyter, and the AntiNex Core as pods in the cluster:


Start all applications as your user with the command:

./start.sh

If you want to deploy the splunk-ready application builds, you can add it as an argument:

./start.sh splunk

If you want to deploy the splunk-ready application builds integrated with Let’s Encrypt TLS encryption, just add prod as an argument:

./start.sh splunk prod

Note: The Cert Manager is set to staging mode by default and requires the prod argument to prevent accidentally getting blocked due to Let's Encrypt rate limits

5.1 Confirm Pods are Running

Depending on how fast your network connection is, the initial container downloads can take a few minutes. Please wait until all pods are Running before continuing:

kubectl get pods


CHAPTER 6

Run a Database Migration

Here is a video showing how to apply database schema migrations in the cluster:


To apply new Django database migrations, run the following command:

./api/migrate-db.sh

CHAPTER 7

Add Ingress Locations to /etc/hosts

When running locally (also known in these docs as dev mode), all ingress URLs need to resolve on the network. Please append the following entries to your local /etc/hosts file on the 127.0.0.1 line:

sudo vi /etc/hosts

Append the entries to the existing 127.0.0.1 line:

127.0.0.1 api.example.com jupyter.example.com pgadmin.example.com splunk.example.com s3.example.com ceph.example.com minio.example.com
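If you prefer a scriptable version of the same edit, here is a minimal sketch (assuming the standard 127.0.0.1 line already exists) that appends each FQDN only when it is missing:

# append each dev ingress FQDN to the 127.0.0.1 line only if it is not already present
for fqdn in api.example.com jupyter.example.com pgadmin.example.com splunk.example.com s3.example.com ceph.example.com minio.example.com; do
    grep -q " ${fqdn}" /etc/hosts || sudo sed -i "/^127.0.0.1/ s/$/ ${fqdn}/" /etc/hosts
done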

CHAPTER 8

Using the Minio S3 Object Store

By default, the Kubernetes cluster has a Minio S3 object store running on a Ceph Persistent Volume. S3 is a great solution for distributing files, datasets, configurations, static assets, build artifacts and more across components, regions, and datacenters using an S3 distributed backend. Minio can also replicate some of the AWS Lambda event-based workflows with Minio bucket event listeners. For reference, Minio was deployed using this script:

./minio/run.sh

8.1 View the Verification Tests on the Minio Dashboard

Login with:

• access key: trexaccesskey
• secret key: trex123321

https://minio.example.com/minio/s3-verification-tests/

8.2 Test Minio S3 with Bucket Creation and File Upload and Download

1. Run from inside the API container

./api/ssh.sh
source /opt/venv/bin/activate && run_s3_test.py

Example logs:

creating test file: run-s3-test.txt
connecting: http://minio-service:9000
checking bucket=s3-verification-tests exists
upload_file(run-s3-test.txt, s3-verification-tests, s3-worked-on-2018-08-12-15-21-02)
upload_file(s3-verification-tests, s3-worked-on-2018-08-12-15-21-02, download-run-s3-test.txt)
download_filename=download-run-s3-test.txt contents: tested on: 2018-08-12 15:21:02
exit

2. Run from outside the Kubernetes cluster

Note: This tool requires the python boto3 pip package to be installed

source ./minio/envs/ext.env
./minio/run_s3_test.py

3. Verify the files were uploaded to Minio https://minio.example.com/minio/s3-verification-tests/
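As an optional extra check (not part of the project's scripts), you can also browse the same bucket with the AWS CLI pointed at the Minio endpoint; this sketch assumes the aws CLI is installed and the default trexaccesskey/trex123321 keys are still in place:

# list the objects written by the verification tests (self-signed certs need --no-verify-ssl)
export AWS_ACCESS_KEY_ID=trexaccesskey
export AWS_SECRET_ACCESS_KEY=trex123321
aws --endpoint-url https://minio.example.com --no-verify-ssl s3 ls s3://s3-verification-tests/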

CHAPTER 9

Using the Rook Ceph Cluster

By default, the Kubernetes cluster is running a Rook Ceph cluster for storage which provides HA persistent volumes and claims. You can review the persistent volumes and claims using the Ceph Dashboard: https://ceph.example.com

CHAPTER 10

Create a User

Create the user trex with password 123321 on the REST API.

./api/create-user.sh
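Once the user exists, you can sanity-check the login from the command line. The token endpoint path below is an assumption based on common Django REST Framework JWT setups, so adjust it if the API exposes a different route:

# log in as trex and print the JWT response (endpoint path is assumed, not confirmed by this guide)
curl -s -k https://api.example.com/api-token-auth/ \
    -H "Content-Type: application/json" \
    -d '{"username": "trex", "password": "123321"}'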

CHAPTER 11

Deployed Web Applications

Here are the hosted web application URLs. These URLs are made accessible by the included nginx-ingress.

CHAPTER 12

View Django REST Framework

Login with:

• user: trex
• password: 123321

https://api.example.com

CHAPTER 13

View Swagger

Login with:

• user: trex
• password: 123321

https://api.example.com/swagger

CHAPTER 14

View Jupyter

Login with:

• password: admin

https://jupyter.example.com

CHAPTER 15

View pgAdmin

Login with:

• user: [email protected]
• password: 123321

https://pgadmin.example.com

CHAPTER 16

View Minio S3 Object Storage

Login with:

• access key: trexaccesskey
• secret key: trex123321

https://minio.example.com

CHAPTER 17

View Ceph

https://ceph.example.com

CHAPTER 18

View Splunk

Login with:

• user: trex
• password: 123321

https://splunk.example.com

CHAPTER 19

Training AI with the Django REST API

These steps install the AntiNex python client for training a deep neural network to predict attack packets from recorded network data (all of which is already included in the docker containers).

1. Create a virtual environment and install the client

virtualenv -p python3 /opt/venv && source /opt/venv/bin/activate
pip install antinex-client

2. Watch the application logs

From a separate terminal, you can tail the Django REST API logs with the command:

./api/logs.sh

From a separate terminal, you can tail the Django Celery Worker logs with the command:

./worker/logs.sh

From a separate terminal, you can tail the AntiNex Core Worker logs with the command:

./core/logs.sh

Note: Use ctrl + c to stop these log tailing commands

CHAPTER 20

Train a Deep Neural Network on Kubernetes

With the virtual environment set up, we can use the client to train a deep neural network with the included datasets:

Note: this can take a few minutes to finish depending on your hosting resources

ai -a https://api.example.com -u trex -p 123321 -s -f ./tests/scaler-full-django-antinex-simple.json

While you wait, here is a video showing the training and how to get the results:

CHAPTER 21

Get the AI Job Record

ai_get_job.py -a https://api.example.com -u trex -p 123321 -i 1

CHAPTER 22

Get the AI Training Job Results

ai_get_results.py -a https://api.example.com -u trex -p 123321 -i 1 -s

CHAPTER 23

Standalone Deployments

Below are steps to manually deploy each component in the stack with Kubernetes.

CHAPTER 24

Deploy Redis

./redis/run.sh

Or manually with the commands:

echo "deploying persistent volume for redis"
kubectl apply -f ./redis/pv.yml
echo "deploying Bitnami redis stable with helm"
helm install \
    --name redis stable/redis \
    --set rbac.create=true \
    --values ./redis/redis.yml

24.1 Confirm Connectivity

The following commands assume you have redis-tools installed (sudo apt-get install redis-tools).

redis-cli -h $(kubectl describe pod redis-master-0 | grep IP | awk '{print $NF}') -p 6379
10.244.0.81:6379> info
10.244.0.81:6379> exit

24.2 Debug Redis Cluster

1. Examine Redis Master

kubectl describe pod redis-master-0

2. Examine Persistent Volume Claim

kubectl get pvc
NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
redis-ceph-data   Bound    pvc-1a88e3a6-9df8-11e8-8047-0800270864a8   8Gi        RWO            rook-ceph-block   46m

3. Examine Persistent Volume

kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS      REASON   AGE
pvc-1a88e3a6-9df8-11e8-8047-0800270864a8   8Gi        RWO            Delete           Bound    default/redis-ceph-data   rook-ceph-block            46m

24.3 Possible Errors

1. Create the Persistent Volumes

Warning  FailedMount  2m  kubelet, dev  MountVolume.SetUp failed for volume "redis-pv" : mount failed: exit status 32

./pvs/create-pvs.sh

24.4 Delete Redis

helm del --purge redis
release "redis" deleted

24.5 Delete Persistent Volume and Claim

1. Delete Claim

kubectl delete pvc redis-data-redis-master-0

2. Delete Volume

kubectl delete pv redis-pv
persistentvolume "redis-pv" deleted

CHAPTER 25

Deploy Postgres

25.1 Install Go

Using Crunchy Data's postgres containers requires having go installed. Go can be installed using the ./tools/install-go.sh script or with the commands:

# note: go install has only been tested on CentOS 7 and Ubuntu 18.04
sudo su
GO_VERSION="1.11"
GO_OS="linux"
GO_ARCH="amd64"
go_file="go${GO_VERSION}.${GO_OS}-${GO_ARCH}.tar.gz"
curl https://dl.google.com/go/${go_file} --output /tmp/${go_file}
export GOPATH=$HOME/go/bin
export PATH=$PATH:$GOPATH:$GOPATH/bin
tar -C $HOME -xzf /tmp/${go_file}
$GOPATH/go get github.com/blang/expenv
# make sure to add GOPATH and PATH to ~/.bashrc

25.2 Start

Start the Postgres container within Kubernetes:

./postgres/run.sh

25.3 Debug Postgres

1. Examine Postgres


kubectl describe pod primary

Type    Reason     Age   From               Message
----    ------     ----  ----               -------
Normal  Scheduled  2m    default-scheduler  Successfully assigned default/primary to dev
Normal  Pulling    2m    kubelet, dev       pulling image "crunchydata/crunchy-postgres:centos7-10.4-1.8.3"
Normal  Pulled     2m    kubelet, dev       Successfully pulled image "crunchydata/crunchy-postgres:centos7-10.4-1.8.3"
Normal  Created    2m    kubelet, dev       Created container
Normal  Started    2m    kubelet, dev       Started container

2. Examine Persistent Volume Claim

kubectl get pvc
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
pgadmin4-http-data   Bound    pvc-19031825-9df8-11e8-8047-0800270864a8   400M       RWX            rook-ceph-block   46m
primary-pgdata       Bound    pvc-17652595-9df8-11e8-8047-0800270864a8   400M       RWX            rook-ceph-block   46m

3. Examine Persistent Volume

kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS      REASON   AGE
pvc-17652595-9df8-11e8-8047-0800270864a8   400M       RWX            Delete           Bound    default/primary-pgdata       rook-ceph-block            47m
pvc-19031825-9df8-11e8-8047-0800270864a8   400M       RWX            Delete           Bound    default/pgadmin4-http-data   rook-ceph-block            47m

CHAPTER 26

Deploy pgAdmin

Please confirm go is installed with the Install Go section.

26.1 Start

Start the pgAdmin4 container within Kubernetes:

./pgadmin/run.sh

26.2 Get Logs

./pgadmin/logs.sh

26.3 SSH into pgAdmin

./pgadmin/ssh.sh

CHAPTER 27

Deploy Django REST API

Use these commands to manage the Django REST Framework pods within Kubernetes.

27.1 Start

./api/run.sh

27.2 Run a Database Migration

To apply a Django database migration, run the following command:

./api/migrate-db.sh

27.3 Get Logs

./api/logs.sh

27.4 SSH into the API

./api/ssh.sh

CHAPTER 28

Deploy Django Celery Workers

Use these commands to manage the Django Celery Worker pods within Kubernetes.

28.1 Start

./worker/run.sh

28.2 Get Logs

./worker/logs.sh

28.3 SSH into the Worker

./worker/ssh.sh

CHAPTER 29

Deploy AntiNex Core

Use these commands to manage the Backend AntiNex Core pods within Kubernetes.

29.1 Start

./core/run.sh

29.2 Get Logs

./core/logs.sh

29.3 SSH into the API

./core/ssh.sh

CHAPTER 30

Deploy Jupyter

Use these commands to manage the Jupyter pods within Kubernetes.

30.1 Start

./jupyter/run.sh

30.2 Login to Jupyter

Login with:

• password: admin

https://jupyter.example.com

30.3 Get Logs

./jupyter/logs.sh

30.4 SSH into Jupyter

./jupyter/ssh.sh

CHAPTER 31

Deploy Splunk

Use these commands to manage the Splunk container within Kubernetes.

31.1 Start

./splunk/run.sh

31.2 Login to Splunk

Login with:

• user: trex
• password: 123321

https://splunk.example.com

CHAPTER 32

Searching in Splunk

Here is the splunk searching command line tool I use with these included applications:

https://github.com/jay-johnson/spylunking

With search example documentation:

https://spylunking.readthedocs.io/en/latest/scripts.html#examples

CHAPTER 33

Search using Spylunking

Find logs in splunk using the sp command line tool:

sp -q 'index="antinex" | reverse' -u trex -p 123321 -a $(./splunk/get-api-fqdn.sh) -i antinex

CHAPTER 34

Find Django REST API Logs in Splunk

sp -q 'index="antinex" AND name=api | head 20 | reverse' -u trex -p 123321 -a $(./

˓→splunk/get-api-fqdn.sh) -i antinex

CHAPTER 35

Find Django Celery Worker Logs in Splunk

sp -q 'index="antinex" AND name=worker | head 20 | reverse' -u trex -p 123321 -a $(./

˓→splunk/get-api-fqdn.sh) -i antinex

CHAPTER 36

Find Core Logs in Splunk

sp -q 'index="antinex" AND name=core | head 20 | reverse' -u trex -p 123321 -a $(./

˓→splunk/get-api-fqdn.sh) -i antinex

CHAPTER 37

Find Jupyter Logs in Splunk

sp -q 'index="antinex" AND name=jupyter | head 20 | reverse' -u trex -p 123321 -a $(./

˓→splunk/get-api-fqdn.sh) -i antinex

Example for debugging sp splunk connectivity from inside an API Pod: kubectl exec-it api-59496ccb5f-2wp5t-n default echo'starting search'&&/bin/bash-

˓→c"source /opt/venv/bin/activate && sp -q'index="antinex" AND hostname=local' -u

˓→trex -p 123321 -a 10.101.107.205:8089 -i antinex"

37.1 Get Logs

./splunk/logs.sh

37.2 SSH into Splunk

./splunk/ssh.sh

CHAPTER 38

Deploy Nginx Ingress

This project is currently using the nginx-ingress instead of the Kubernetes Ingress using nginx. Use these commands to manage and debug the nginx ingress within Kubernetes.

Note: The default Yaml file annotations only work with the nginx-ingress customizations

38.1 Start

./ingress/run.sh

38.2 Get Logs

./ingress/logs.sh

38.3 SSH into the Ingress

./ingress/ssh.sh

CHAPTER 39

View Ingress Nginx Config

When troubleshooting the nginx ingress, it is helpful to view the nginx configs inside the container. Here is how to view the configs:

./ingress/view-configs.sh

CHAPTER 40

View a Specific Ingress Configuration

If you know the pod name and the namespace for the nginx-ingress, then you can view the configs from the command line with:

app_name="jupyter"
app_name="pgadmin"
app_name="api"
use_namespace="default"
pod_name=$(kubectl get pods -n ${use_namespace} | awk '{print $1}' | grep nginx | head -1)
kubectl exec -it ${pod_name} -n ${use_namespace} cat /etc/nginx/conf.d/${use_namespace}-${app_name}-ingress.conf

CHAPTER 41

Deploy Splunk

41.1 Start

To deploy splunk you can add the splunk argument to the ./deploy-resources.sh script, or you can manually run it with the command:

./splunk/run.sh

Or if you want to use Let’s Encrypt for SSL:

./splunk/run.sh prod

CHAPTER 42

Deploy Splunk-Ready Applications

After deploying the splunk pod, you can deploy the splunk-ready applications with the command:

./start.sh splunk

42.1 Get Logs

./splunk/logs.sh

42.2 SSH into Splunk

./splunk/ssh.sh

42.3 View Ingress Config

./splunk/view-ingress-config.sh

CHAPTER 43

Create your own self-signed x509 TLS Keys, Certs and Certificate Authority with Ansible

If you have openssl installed you can use this ansible playbook to create your own certificate authority (CA), keys and certs.

1. Create the CA, Keys and Certificates

cd ansible
ansible-playbook -i inventory_dev create-x509s.yml

2. Check the CA, x509, keys and certificates for the client and server were created

ls -l ./ssl
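As an optional follow-up (the file names under ./ssl below are illustrative; use whatever names the playbook actually wrote), you can inspect a generated certificate and verify it against the CA with openssl:

# show the subject, issuer and validity window of one generated cert (example file names)
openssl x509 -in ./ssl/api.example.com.crt -noout -subject -issuer -dates
# verify the cert chains back to the generated CA (example file names)
openssl verify -CAfile ./ssl/ca.pem ./ssl/api.example.com.crt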

CHAPTER 44

Deploying Your Own x509 TLS Encryption files as Kubernetes Secrets

This is a work in progress, but in dev mode the cert-manager is not in use. Instead the cluster utilizes pre-generated x509s TLS SSL files created with the included ansible playbook create-x509s.yml. Once created, you can deploy them as Kubernetes secrets using the deploy-secrets.sh script and reload them at any time in the future.

44.1 Deploy Secrets

Run this to create the TLS secrets:

./ansible/deploy-secrets.sh
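For reference, each TLS secret the script deploys can also be created by hand with kubectl; this is only a sketch of the equivalent command for one secret, and the ./ssl file names shown are illustrative:

# roughly what the deploy script does for each entry, shown for the nginx pair (illustrative paths)
kubectl create secret tls tls-nginx \
    --cert=./ssl/nginx.crt \
    --key=./ssl/nginx.key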

44.2 List Secrets

kubectl get secrets | grep tls
tls-ceph         kubernetes.io/tls   2   36m
tls-client       kubernetes.io/tls   2   36m
tls-database     kubernetes.io/tls   2   36m
tls-docker       kubernetes.io/tls   2   36m
tls-jenkins      kubernetes.io/tls   2   36m
tls-jupyter      kubernetes.io/tls   2   36m
tls-k8           kubernetes.io/tls   2   36m
tls-kafka        kubernetes.io/tls   2   36m
tls-kibana       kubernetes.io/tls   2   36m
tls-minio        kubernetes.io/tls   2   36m
tls-nginx        kubernetes.io/tls   2   36m
tls-pgadmin      kubernetes.io/tls   2   36m
tls-phpmyadmin   kubernetes.io/tls   2   36m
tls-rabbitmq     kubernetes.io/tls   2   36m
tls-redis        kubernetes.io/tls   2   36m
tls-restapi      kubernetes.io/tls   2   36m
tls-s3           kubernetes.io/tls   2   36m
tls-splunk       kubernetes.io/tls   2   36m
tls-webserver    kubernetes.io/tls   2   36m

44.3 Reload Secrets

If you want to deploy new TLS secrets at any time, use the reload argument with the deploy-secrets.sh script. Doing so will delete the original secrets and recreate all of them using the new TLS values:

./ansible/deploy-secrets.sh -r

CHAPTER 45

Deploy Cert Manager with Let’s Encrypt

Use these commands to manage the Cert Manager with Let’s Encrypt SSL support within Kubernetes. By default, the cert manager is deployed only in prod mode. If you run it in production mode, then it will install real, valid x509 certificates from Let’s Encrypt into the nginx-ingress automatically.

45.1 Start with Let’s Encrypt x509 SSL Certificates

Start the cert manager in prod mode to enable Let’s Encrypt TLS Encryption with the command:

./start.sh prod

Or manually with the command:

./cert-manager/run.sh prod

If you have splunk you can just add it to the arguments:

./start.sh splunk prod

45.2 View Logs

When using the production mode, make sure to view the logs to ensure you are not being blocked due to rate limiting:

./cert-manager/logs.sh

CHAPTER 46

Stop the Cert Manager

If you notice things are not working correctly, you can quickly prevent yourself from getting blocked by stopping the cert manager with the command:

./cert-manager/_uninstall.sh

Note: If you get blocked due to rate-limits it will show up in the cert-manager logs like:

I0731 07:53:43.313709  1 sync.go:273] Error issuing certificate for default/api.antinex.com-tls: error getting certificate from acme server: acme: urn:ietf:params:acme:error:rateLimited: Error finalizing order :: too many certificates already issued for exact set of domains: api.antinex.com: see https://letsencrypt.org/docs/rate-limits/
E0731 07:53:43.313738  1 sync.go:182] [default/api.antinex.com-tls] Error getting certificate 'api.antinex.com-tls': secret "api.antinex.com-tls" not found

46.1 Debugging

To reduce debugging issues, the cert manager ClusterIssuer objects use the same name for staging and production mode. This is nice because you do not have to update all the annotations to deploy on production vs staging. The cert manager starts and defines the issuer name for both production and staging as:

--set ingressShim.defaultIssuerName=letsencrypt-issuer

Make sure to set any nginx ingress annotations that need Let's Encrypt SSL encryption to these values:

annotations:
  kubernetes.io/tls-acme: "true"
  kubernetes.io/ingress.class: "nginx"
  certmanager.k8s.io/cluster-issuer: "letsencrypt-issuer"

CHAPTER 47

Troubleshooting

CHAPTER 48

Customize Minio and How to Troubleshoot

48.1 Change the Minio Access and Secret Keys

1. Change the secrets file: minio/secrets/default_access_keys.yml

Change the access_key and secret_key values after generating the new base64 string values for the secrets file (a generic Secret sketch follows these steps):

echo-n"NewAccessKey"| base64 TmV3QWNjZXNzS2V5 # now you can replace the access_key's value in the secrets file with the string:

˓→TmV3QWNjZXNzS2V5

echo-n"NewSecretKey"| base64 TmV3U2VjcmV0S2V5 # now you can replace the secret_key's value in the secrets file with the string:

˓→TmV3QWNjZXNzS2V5

2. Deploy the secrets file

kubectl apply -f ./minio/secrets/default_access_keys.yml

3. Restart the Minio Pod

kubectl delete pod -l app=minio
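For orientation, here is a generic sketch of what a Minio credentials Secret can look like; the real metadata name and data keys live in minio/secrets/default_access_keys.yml and may differ, so treat this only as an illustration that reuses the base64 strings generated in step 1:

# illustrative only - the secret name and data keys may not match the repo's file
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: minio-s3-access
type: Opaque
data:
  access_key: TmV3QWNjZXNzS2V5
  secret_key: TmV3U2VjcmV0S2V5
EOF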

If you have changed the default access and secret keys, then you will need to export the following environment variables as needed to make sure the ./minio/run_s3_test.py test script works:

export S3_ACCESS_KEY=
export S3_SECRET_KEY=
export S3_REGION_NAME=
export S3_ADDRESS=<the external endpoint from ./minio/get-s3-endpoint.sh; the internal cluster uses the service: minio-service:9000>
# examples of setting up a minio env file are in: ./minio/envs

48.2 View the Minio Dashboard

Login with:

• access key: trexaccesskey
• secret key: trex123321

https://minio.example.com

48.3 Get S3 Internal Endpoint

If you want to use the Minio S3 service within the cluster please use the endpoint:

minio-service:9000

or source the internal environment file:

source ./minio/envs/int.env

48.4 Get S3 External Endpoint

If you want to use the Minio S3 service from outside the cluster please use the endpoint provided by the script:

./minio/get-s3-endpoint.sh
# which for this documentation was the minio service's Endpoints:
# 10.244.0.103:9000

or source the external environment file:

source ./minio/envs/ext.env

48.5 Debugging Steps

1. Load the Minio S3 external environment variables:

source ./minio/envs/ext.env

2. Run the S3 Verification test script

./minio/run_s3_test.py

3. Confirm the verification keys are showing up in this Minio S3 bucket: https://minio.example.com/minio/s3-verification-tests/

If not, please use the describe tools in ./minio/describe-*.sh to grab the logs and please file a GitHub issue

48.6 Describe Pod

./minio/describe-service.sh

48.7 Describe Service

./minio/describe-service.sh

48.8 Describe Ingress

./minio/describe-ingress.sh

48.9 Uninstall Minio

./minio/_uninstall.sh

CHAPTER 49

Ceph Troubleshooting

Please refer to the Rook Common Issues for the latest updates on how to use your Rook Ceph cluster.

Note: By default Ceph is not hosting the S3 solution unless cephs3 is passed in as an argument to deploy-resources.sh.

There are included troubleshooting tools in the ./rook directory with an overview of each below:

49.1 Validate Ceph System Pods are Running

./rook/view-system-pods.sh

----------------------------------------
Getting the Rook Ceph System Pods:
kubectl -n rook-ceph-system get pod
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-agent-g9vzm                 1/1     Running   0          7m
rook-ceph-operator-78d498c68c-tbsdf   1/1     Running   0          7m
rook-discover-h9wj9                   1/1     Running   0          7m

49.2 Validate Ceph Pods are Running

./rook/view-ceph-pods.sh

----------------------------------------
Getting the Rook Ceph Pods:
kubectl -n rook-ceph get pod
NAME                                  READY   STATUS      RESTARTS   AGE
rook-ceph-mgr-a-9c44495df-7jksz       1/1     Running     0          6m
rook-ceph-mon0-rxxsl                  1/1     Running     0          6m
rook-ceph-mon1-gqblg                  1/1     Running     0          6m
rook-ceph-mon2-7xfsq                  1/1     Running     0          6m
rook-ceph-osd-id-0-7d4d4c8794-kgr2d   1/1     Running     0          6m
rook-ceph-osd-prepare-dev-kmsn9       0/1     Completed   0          6m
rook-ceph-tools                       1/1     Running     0          6m

49.3 Validate Persistent Volumes are Bound

kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                             STORAGECLASS      REASON   AGE
pvc-03e6e4ef-9df8-11e8-8047-0800270864a8   1Gi        RWO            Delete           Bound    default/certs-pv-claim            rook-ceph-block            46m
pvc-0415de24-9df8-11e8-8047-0800270864a8   1Gi        RWO            Delete           Bound    default/configs-pv-claim          rook-ceph-block            46m
pvc-0441307f-9df8-11e8-8047-0800270864a8   1Gi        RWO            Delete           Bound    default/datascience-pv-claim      rook-ceph-block            46m
pvc-0468ef73-9df8-11e8-8047-0800270864a8   1Gi        RWO            Delete           Bound    default/frontendshared-pv-claim   rook-ceph-block            46m
pvc-04888222-9df8-11e8-8047-0800270864a8   1Gi        RWO            Delete           Bound    default/staticfiles-pv-claim      rook-ceph-block            46m
pvc-1c3e359d-9df8-11e8-8047-0800270864a8   10Gi       RWO            Delete           Bound    default/minio-pv-claim            rook-ceph-block            46m

49.4 Validate Persistent Volume Claims are Bound

kubectl get pvc
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
certs-pv-claim            Bound    pvc-03e6e4ef-9df8-11e8-8047-0800270864a8   1Gi        RWO            rook-ceph-block   47m
configs-pv-claim          Bound    pvc-0415de24-9df8-11e8-8047-0800270864a8   1Gi        RWO            rook-ceph-block   47m
datascience-pv-claim      Bound    pvc-0441307f-9df8-11e8-8047-0800270864a8   1Gi        RWO            rook-ceph-block   47m
frontendshared-pv-claim   Bound    pvc-0468ef73-9df8-11e8-8047-0800270864a8   1Gi        RWO            rook-ceph-block   47m
minio-pv-claim            Bound    pvc-1c3e359d-9df8-11e8-8047-0800270864a8   10Gi       RWO            rook-ceph-block   46m

49.5 Create a Persistent Volume Claim

Going forward, Ceph will automatically create a persistent volume if one is not available for binding to an available Persistent Volume Claim. To create a new persistent volume, just create a claim and verify the Rook Ceph cluster created the persistent volume and both are bound to each other.

kubectl apply -f pvs/pv-staticfiles-ceph.yml
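If you want to experiment beyond the included file, here is a minimal PVC sketch that the rook-ceph-block storage class can bind automatically (the claim name and size are illustrative; pvs/pv-staticfiles-ceph.yml in the repo remains the reference definition):

# illustrative claim - adjust the name and storage request as needed
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-ceph-pv-claim
spec:
  storageClassName: rook-ceph-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF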

49.6 Verify the Persistent Volume is Bound

kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                          STORAGECLASS      REASON   AGE
pvc-77afbc7a-9ade-11e8-b293-0800270864a8   20Gi       RWO            Delete           Bound    default/staticfiles-pv-claim   rook-ceph-block            2s

49.7 Verify the Persistent Volume Claim is Bound

kubectl get pvc
NAME                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
staticfiles-pv-claim   Bound    pvc-77afbc7a-9ade-11e8-b293-0800270864a8   20Gi       RWO            rook-ceph-block   11s

49.8 Describe Persistent Volumes

kubectl describe pv pvc-c88fc37b-9adf-11e8-9fae-0800270864a8
Name:            pvc-c88fc37b-9adf-11e8-9fae-0800270864a8
Labels:
Annotations:     pv.kubernetes.io/provisioned-by=ceph.rook.io/block
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    rook-ceph-block
Status:          Bound
Claim:           default/certs-pv-claim
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        20Gi
Node Affinity:
Message:
Source:
    Type:       FlexVolume (a generic volume resource that is provisioned/attached using an exec based plugin)
    Driver:     ceph.rook.io/rook-ceph-system
    FSType:     xfs
    SecretRef:
    ReadOnly:   false
    Options:    map[clusterNamespace:rook-ceph image:pvc-c88fc37b-9adf-11e8-9fae-0800270864a8 pool:replicapool storageClass:rook-ceph-block]
Events:

49.9 Show Ceph Cluster Status

./rook/show-ceph-status.sh

----------------------------------------
Getting the Rook Ceph Status with Toolbox:
kubectl -n rook-ceph exec -it rook-ceph-tools ceph status
  cluster:
    id:     7de1988c-03ea-41f3-9930-0bde39540552
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum rook-ceph-mon2,rook-ceph-mon0,rook-ceph-mon1
    mgr: a(active)
    osd: 1 osds: 1 up, 1 in

  data:
    pools:   1 pools, 100 pgs
    objects: 12 objects, 99 bytes
    usage:   35443 MB used, 54756 MB / 90199 MB avail
    pgs:     100 active+clean

49.10 Show Ceph OSD Status

./rook/show-ceph-osd-status.sh

----------------------------------------
Getting the Rook Ceph OSD Status with Toolbox:
kubectl -n rook-ceph exec -it rook-ceph-tools ceph osd status
+----+-------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
| id | host                                |  used | avail | wr ops | wr data | rd ops | rd data | state     |
+----+-------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | rook-ceph-osd-id-0-7d4d4c8794-kgr2d | 34.6G | 53.4G |    0   |    0    |    0   |    0    | exists,up |
+----+-------------------------------------+-------+-------+--------+---------+--------+---------+-----------+

49.11 Show Ceph Free Space

./rook/show-ceph-df.sh

----------------------------------------
Getting the Rook Ceph df with Toolbox:
kubectl -n rook-ceph exec -it rook-ceph-tools ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    90199M     54756M       35443M         39.29
POOLS:
    NAME            ID     USED     %USED     MAX AVAIL     OBJECTS
    replicapool     1        99         0        50246M          12

49.12 Show Ceph RADOS Free Space

./rook/show-ceph-rados-df.sh

----------------------------------------
Getting the Rook Ceph rados df with Toolbox:
kubectl -n rook-ceph exec -it rook-ceph-tools rados df
POOL_NAME     USED   OBJECTS   CLONES   COPIES   MISSING_ON_PRIMARY   UNFOUND   DEGRADED   RD_OPS     RD   WR_OPS     WR
replicapool     99        12        0       12                    0         0          0      484   381k       17   7168

total_objects    12
total_used       35443M
total_avail      54756M
total_space      90199M

49.13 Out of IP Addresses

Flannel can exhaust all available ip addresses in the CIDR network range. When this happens please run the following command to clean up the local cni network files:

./tools/reset-flannel-cni-networks.sh

CHAPTER 50

AntiNex Stack Status

Here are the AntiNex repositories, documentation and build reports:

Component          Build     Docs Link   Docs Build
REST API           (badge)   Docs        (badge)
Core Worker        (badge)   Docs        (badge)
Network Pipeline   (badge)   Docs        (badge)
AI Utils           (badge)   Docs        (badge)
Client             (badge)   Docs        (badge)


CHAPTER 51

Reset Cluster

Here is a video showing how to reset the local Kubernetes cluster. Please be careful as these commands will shut down all containers and reset the Kubernetes cluster.

Run as root:

sudo su
kubeadm reset -f
./prepare.sh

Or use the file:

sudo su
./tools/cluster-reset.sh

Or the full reset and deploy once ready:

sudo su
cert_env=dev; ./tools/reset-flannel-cni-networks.sh; ./tools/cluster-reset.sh; ./user-install-kubeconfig.sh; sleep 30; ./deploy-resources.sh splunk ${cert_env}
exit
# as your user
./user-install-kubeconfig.sh
# depending on testing vs prod:
# ./start.sh splunk
# ./start.sh splunk prod

CHAPTER 52

Development

Right now, the python virtual environment is only used to bring in ansible for running playbooks, but it will be used in the future with the kubernetes python client as I start using it more and more.

virtualenv -p python3 /opt/venv && source /opt/venv/bin/activate && pip install -e .

CHAPTER 53

Testing

py.test or python setup.py test

CHAPTER 54

License

Apache 2.0 - Please refer to the LICENSE for more details

CHAPTER 55

Running a Distributed Ceph Cluster on a Kubernetes Cluster

55.1 Overview

This guide automates installing a native ceph cluster inside a running kubernetes cluster. It requires creating and attaching 3 additional hard drive disk images to 3 kubernetes cluster VMs (tested on CentOS 7). This guide assumes your kubernetes cluster is using kvm with virsh for running the attach-disk commands (it was tested with kubernetes version 1.13.3). By default, the disk images will be installed at: /cephdata/m[123]/k8-centos-m[123]. These disks will be automatically partitioned and formatted using ceph zap, and zap will format each disk using the recommended XFS filesystem.

Note: This is a work in progress and things will likely change. This guide will be updated as progress proceeds.

55.2 Background

This installer was built to replace Rook-Ceph after encountering cluster stability issues after ~30 days of uptime in 2019. The steps are taken from the Ceph Helm installer: http://docs.ceph.com/docs/mimic/start/kube-helm/

CHAPTER 56

Add the Ceph Mon Cluster Service FQDN to /etc/hosts

Before starting, please ensure each kubernetes vm has the following entries in /etc/hosts:

m1:

sudo echo "192.168.0.101 ceph-mon.ceph.svc.cluster.local" >> /etc/hosts

m2:

sudo echo "192.168.0.102 ceph-mon.ceph.svc.cluster.local" >> /etc/hosts

m3:

sudo echo "192.168.0.103 ceph-mon.ceph.svc.cluster.local" >> /etc/hosts

Note: Missing this step can result in some debugging

CHAPTER 57

Build KVM HDD Images

Change to the ceph directory:

cd ceph

Generate 100 GB hdd images for the ceph cluster, with 1 qcow2 image for each of the three VMs:

./kvm-build-images.sh

The files are saved here:

/cephdata/
    m1/k8-centos-m1
    m2/k8-centos-m2
    m3/k8-centos-m3
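As a rough sketch of what the build script does per VM (illustrative only; ./kvm-build-images.sh is the source of truth), each qcow2 image can be created with qemu-img:

# create a 100 GB qcow2 image for each vm (illustrative equivalent of the build script)
for vm in m1 m2 m3; do
    mkdir -p /cephdata/${vm}
    qemu-img create -f qcow2 /cephdata/${vm}/k8-centos-${vm} 100G
done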

CHAPTER 58

Attach KVM Images to VMs

This will attach each 100 GB image to the correct vm: m1, m2 or m3

./kvm-attach-images.sh
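For reference, the attach step can also be done manually with virsh; this is only a sketch that assumes the VM domains are named m1, m2 and m3 and that vdb is the next free target device:

# illustrative equivalent of ./kvm-attach-images.sh
for vm in m1 m2 m3; do
    virsh attach-disk ${vm} /cephdata/${vm}/k8-centos-${vm} vdb --persistent --subdriver qcow2
done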

CHAPTER 59

Format Disks in VM

With automatic ssh root login access, you can run this to partition, mount and format each of the new images:

Warning: Please be careful running this as it can delete any previously saved data.

Warning: Please be aware that fdisk can also hang and requires hard rebooting the cluster if orphaned fdisk processes get stuck. Please let me know if you have a way to get around this. There are many discussions like the process that would not die about this issue on the internet.

./_kvm-format-images.sh

CHAPTER 60

Install Ceph on All Kubernetes Nodes

Please add ceph-common, centos-release-ceph-luminous and lsof to all kubernetes node VMs before deploying ceph. For additional setup please refer to the official ceph docs: http://docs.ceph.com/docs/master/install/get-packages/

For CentOS 7 you can run the ./ceph/install-ceph-tools.sh script or the commands:

sudo rpm --import "https://download.ceph.com/keys/release.asc"
sudo yum install -y ceph-common centos-release-ceph-luminous lsof

CHAPTER 61

Deploy Ceph Cluster

Ceph requires running a local Helm repo server (just like the Redis cluster does) and building and then installing the chart to get the cluster pods running.

./run.sh

CHAPTER 62

Watch all Ceph Logs with Kubetail

With kubetail installed you can watch all the ceph pods at once with:

./logs-kt-ceph.sh

or manually with:

kubetail ceph -c cluster-log-tailer -n ceph

CHAPTER 63

Show Pods

View the ceph cluster pods with:

./show-pods.sh
----------------------------------------
Getting Ceph pods with:
kubectl get pods -n ceph

NAME                                        READY   STATUS      RESTARTS   AGE
ceph-mds-85b4fbb478-wjmxb                   1/1     Running     1          4m38s
ceph-mds-keyring-generator-pvh4l            0/1     Completed   0          4m38s
ceph-mgr-588577d89f-w8p8v                   1/1     Running     1          4m38s
ceph-mgr-keyring-generator-76l5r            0/1     Completed   0          4m38s
ceph-mon-429mk                              3/3     Running     0          4m39s
ceph-mon-6fvv6                              3/3     Running     0          4m39s
ceph-mon-75n4t                              3/3     Running     0          4m39s
ceph-mon-check-549b886885-cb64q             1/1     Running     0          4m38s
ceph-mon-keyring-generator-q26p2            0/1     Completed   0          4m38s
ceph-namespace-client-key-generator-bbvt2   0/1     Completed   0          4m38s
ceph-osd-dev-vdb-96v7h                      1/1     Running     0          4m39s
ceph-osd-dev-vdb-g9zkg                      1/1     Running     0          4m39s
ceph-osd-dev-vdb-r5fxr                      1/1     Running     0          4m39s
ceph-osd-keyring-generator-6pg77            0/1     Completed   0          4m38s
ceph-rbd-provisioner-5cf47cf8d5-kbfvt       1/1     Running     0          4m38s
ceph-rbd-provisioner-5cf47cf8d5-pwj4s       1/1     Running     0          4m38s
ceph-rgw-7b9677854f-8d7s5                   1/1     Running     1          4m38s
ceph-rgw-keyring-generator-284kp            0/1     Completed   0          4m38s
ceph-storage-keys-generator-bc6dq           0/1     Completed   0          4m38s

CHAPTER 64

Check Cluster Status

With the cluster running you can quickly check the cluster status with:

./cluster-status.sh
----------------------------------------
Getting Ceph cluster status:
kubectl -n ceph exec -ti ceph-mon-check-549b886885-cb64q -c ceph-mon -- ceph -s
  cluster:
    id:     aa06915f-3cf6-4f74-af69-9afb41bf464d
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum master1.example.com,master2.example.com,master3.example.com
    mgr: master2.example.com(active)
    mds: cephfs-1/1/1 up {0=mds-ceph-mds-85b4fbb478-wjmxb=up:active}
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   7 pools, 148 pgs
    objects: 208 objects, 3359 bytes
    usage:   325 MB used, 284 GB / 284 GB avail
    pgs:     148 active+clean

CHAPTER 65

Validate a Pod can Mount a Persistent Volume on the Ceph Cluster in Kubernetes

Run these steps to walk through integration testing that your kubernetes cluster can host persistent volumes for pods running on a ceph cluster inside kubernetes. This means your data is backed by an attached storage disk on the host VM in:

Note: If any of these steps fail please refer to the Kubernetes Ceph Cluster Debugging Guide

ls /cephdata/*/*
/cephdata/m1/k8-centos-m1  /cephdata/m2/k8-centos-m2  /cephdata/m3/k8-centos-m3

65.1 Create PVC

kubectl apply -f test/pvc.yml

65.2 Verify PVC is Bound

kubectl get pvc | grep test-ceph
test-ceph-pv-claim   Bound   pvc-a715256d-38c3-11e9-8e7c-525400275ad4   1Gi   RWO   ceph-rbd   46s

65.3 Create Pod using PVC as a mounted volume

kubectl apply -f test/mount-pv-in-pod.yml
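As a hedged sketch of roughly what test/mount-pv-in-pod.yml does (the image and exact spec below are illustrative; the repo file is the source of truth), the pod simply mounts the test-ceph-pv-claim PVC at /testing:

# illustrative pod spec - mounts the PVC created above at /testing
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: ceph-tester
spec:
  containers:
  - name: ceph-tester
    image: busybox
    command: ["/bin/sh", "-c", "date >> /testing/updated && sleep 3600"]
    volumeMounts:
    - name: testing-vol
      mountPath: /testing
  volumes:
  - name: testing-vol
    persistentVolumeClaim:
      claimName: test-ceph-pv-claim
EOF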

65.4 Verify Pod has Mounted Volume inside Container

kubectl describe pod ceph-tester

65.5 Verify Ceph is Handling Data

./cluster-status.sh

./show-ceph-osd-status.sh

----------------------------------------
Getting Ceph osd status:
kubectl -n ceph exec -it ceph-rgw-7b9677854f-lcr77 -- ceph osd status
+----+---------------------+-------+-------+--------+---------+--------+---------+-----------+
| id | host                |  used | avail | wr ops | wr data | rd ops | rd data | state     |
+----+---------------------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | master2.example.com |  141M | 94.8G |    0   |    0    |    1   |    16   | exists,up |
| 1  | master1.example.com |  141M | 94.8G |    0   |    0    |    0   |    0    | exists,up |
| 2  | master3.example.com |  141M | 94.8G |    0   |    0    |    0   |    0    | exists,up |
+----+---------------------+-------+-------+--------+---------+--------+---------+-----------+

65.6 Delete Ceph Tester Pod

kubectl delete -f test/mount-pv-in-pod.yml

65.7 Recreate Ceph Tester Pod

kubectl apply -f test/mount-pv-in-pod.yml

65.8 View Logs from Previous Pod

kubectl logs -f $(kubectl get po | grep ceph-tester | awk '{print $1}')

Notice the last entries in the log show that the timestamp changed, like:

kubectl logs -f $(kubectl get po | grep ceph-tester | awk '{print $1}')
total 20
drwx------    2 root     root         16384 Feb 25 07:31 lost+found
-rw-r--r--    1 root     root            29 Feb 25 07:33 updated
Filesystem                Size      Used Available Use% Mounted on
/dev/rbd0               975.9M      2.5M    957.4M   0% /testing
last update:
Mon Feb 25 07:33:34 UTC 2019
Mon Feb 25 08:29:27 UTC 2019

65.9 Cleanup Ceph Tester Pod

kubectl delete -f test/mount-pv-in-pod.yml
kubectl delete -f test/pvc.yml

CHAPTER 66

Kubernetes Ceph Cluster Debugging Guide

66.1 Confirm Ceph OSD pods are using the KVM Mounted Disks

If the cluster is in a HEALTH_WARN state with a message about low on available space:

./cluster-status.sh

----------------------------------------
Getting Ceph cluster status:
kubectl -n ceph exec -ti ceph-mon-kjcqq -c ceph-mon -- ceph -s
  cluster:
    id:     747d4fc1-2d18-423a-96fe-43419f8fe9cd
    health: HEALTH_WARN
            mons master2.example.com,master3.example.com are low on available space

Then confirm that all of the vms mounted the correct storage disks for Ceph. This can happen when /etc/fstab entries fail to mount (say, after a cluster reboot), which you can quickly check with:

./check-kvm-disk-mounts.sh

If you see something like:

----------------------------------------
Checking Ceph OSD Pod Mountpoints for /dev/vdb1:
checking: ceph-osd-dev-vdb-5dv8l
kubectl -n ceph exec -it ceph-osd-dev-vdb-5dv8l -- df -h /var/lib/ceph/
failed: ceph-osd-dev-vdb-5dv8l is using /dev/mapper/centos-root
checking: ceph-osd-dev-vdb-s77lh
kubectl -n ceph exec -it ceph-osd-dev-vdb-s77lh -- df -h /var/lib/ceph/
failed: ceph-osd-dev-vdb-s77lh is using /dev/mapper/centos-root
checking: ceph-osd-dev-vdb-vxvd7
kubectl -n ceph exec -it ceph-osd-dev-vdb-vxvd7 -- df -h /var/lib/ceph/
failed: ceph-osd-dev-vdb-vxvd7 is using /dev/mapper/centos-root


detected at least one Ceph OSD mount failure
Please review the Ceph debugging guide:
https://deploy-to-kubernetes.readthedocs.io/en/latest/ceph.html#confirm-ceph-osd-pods-are-using-the-kvm-mounted-disks
for more details on how to fix this issue

Then the correct storage disk(s) failed to mount, and Ceph is using the wrong disk for extended, persistent storage on the vm. This can put your Ceph cluster into a HEALTH_WARN state, as seen above in the cluster status script. To fix this error, either use ./_kvm-format-images.sh (if you are ok with reformatting all previous Ceph data on the disks) or follow these manual steps:

1. Fix /etc/fstab on all vms

Warning: Only run these steps when the cluster can be taken down as it will interrupt services

Confirm the /etc/fstab entry has the correct value:

cat /etc/fstab | grep vdb1
/dev/vdb1 /var/lib/ceph xfs defaults 0 0

For any vm that does not have the /etc/fstab entry, please run these commands as root to set them up manually:

2. Delete the bad mountpoint: /var/lib/ceph

rm -rf /var/lib/ceph

3. Add the new /dev/vdb entry to /etc/fstab

sudo echo"/dev/vdb1 /var/lib/ceph xfs defaults 0 0" >>/etc/fstab

4. Mount the disk

mount /dev/vdb1 /var/lib/ceph

5. Uninstall Ceph

Warning: Running ./_uninstall.sh will impact any pods using the ceph-rbd storageClass

./_uninstall.sh

6. Reinstall Ceph or Reboot all impacted vms

./run.sh

7. Confirm the Mounts Worked

./check-kvm-disk-mounts.sh

66.2 The ceph-tester failed to start

If your integration test fails to mount the test persistent volume, follow these steps to debug the issue:


Check if the ceph-mon service is missing a ClusterIP:

kubectl get svc -n ceph
NAME       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
ceph-mon   ClusterIP   None            <none>        6789/TCP   11m
ceph-rgw   ClusterIP   10.102.90.139   <none>        8088/TCP   11m

See if there is a log in the ceph-tester showing the error.

kubectl describe po ceph-tester

May show something similar to this for why it failed:

server name not found: ceph-mon.ceph.svc.cluster.local

If ceph-mon.ceph.svc.cluster.local is not found, manually add it to /etc/hosts on all nodes.

m1 node:

# on m1 /etc/hosts add:
192.168.0.101 ceph-mon.ceph.svc.cluster.local

Confirm connectivity

telnet ceph-mon.ceph.svc.cluster.local 6789

m2 node:

# on m2 /etc/hosts add:
192.168.0.102 ceph-mon.ceph.svc.cluster.local

Confirm connectivity

telnet ceph-mon.ceph.svc.cluster.local 6789

m3 node:

# on m3 /etc/hosts add:
192.168.0.103 ceph-mon.ceph.svc.cluster.local

Confirm connectivity

telnet ceph-mon.ceph.svc.cluster.local 6789

If connectivity was fixed on all the kubernetes nodes, please uninstall with ./_uninstall.sh and then reinstall with ./run.sh. If not, please continue to the next debugging section below.

66.3 Orphaned fdisk Processes

If you have to use ./_uninstall.sh -f to uninstall and re-partition the disk images, there is a chance the partition tool fdisk can hang. If this happens, it should hang ./_uninstall.sh -f and be detected by the user or (hopefully) the script. When my cluster hits this issue, I have to reboot the server.


Note: This guide does not handle single kubernetes vm outages at the moment.

For the record, here are some attempts to kill this process:

root@master3:~# ps auwwx | grep fdisk
root     18516  0.0  0.0 112508   976 ?        D    06:33   0:00 fdisk /dev/vdb
root     21957  0.0  0.0 112704   952 pts/1    S+   06:37   0:00 grep --color fdisk
root@master3:~# kill -9 18516
root@master3:~# ps auwwx | grep fdisk
root     18516  0.0  0.0 112508   976 ?        D    06:33   0:00 fdisk /dev/vdb
root     22031  0.0  0.0 112704   952 pts/1    S+   06:37   0:00 grep --color fdisk
root@master3:~# strace -p 18516
strace: Process 18516 attached
# no more logs after waiting +60 seconds
strace: Process 18516 attached
^C
^C
^C
^C^Z
[1]+  Stopped                 strace -p 18516
# so did strace just die by touching that pid?

What is fdisk using on the filesystem? Notice multiple ssh pipe resources are in use below. Speculation here: are those pipes the fdisk wait prompt over a closed ssh session (I am guessing, but who knows)?

root@master3:~# lsof -p 18516
COMMAND   PID USER   FD   TYPE DEVICE  SIZE/OFF      NODE NAME
fdisk   18516 root  cwd    DIR  253,0       271 100663361 /root
fdisk   18516 root  rtd    DIR  253,0       285        64 /
fdisk   18516 root  txt    REG  253,0    200456  33746609 /usr/sbin/fdisk
fdisk   18516 root  mem    REG  253,0 106070960      1831 /usr/lib/locale/locale-archive
fdisk   18516 root  mem    REG  253,0   2173512  33556298 /usr/lib64/libc-2.17.so
fdisk   18516 root  mem    REG  253,0     20112  33556845 /usr/lib64/libuuid.so.1.3.0
fdisk   18516 root  mem    REG  253,0    261488  33556849 /usr/lib64/libblkid.so.1.1.0
fdisk   18516 root  mem    REG  253,0    164240  33556291 /usr/lib64/ld-2.17.so
fdisk   18516 root    0r  FIFO    0,9       0t0    847143 pipe
fdisk   18516 root    1w  FIFO    0,9       0t0    845563 pipe
fdisk   18516 root    2w  FIFO    0,9       0t0    845564 pipe
fdisk   18516 root    3u   BLK 252,16     0t512      1301 /dev/vdb
root@master3:~#

Stop strace since it will prevent the gdb tracing next:

root@master3:~# ps auwwx | grep 26177
root     14082  0.0  0.0 112704   952 pts/0    S+   07:02   0:00 grep --color 26177
root     26177  0.0  0.0   7188   600 ?        S    06:41   0:00 strace -p 18516
root@master3:~# kill -9 26177

gdb also hangs when trying this stackoverflow approach:

gdb -p 18516
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7


Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see: .
Attaching to process 18516

If a vm gets to this point, the server gets rebooted. Below are other operational debugging tools that were used during cluster start up:

66.4 Check osd pods

When setting up new devices with kubernetes, you may see the osd pods failing; here is a tool to quickly describe one of the pods:

./describe-osd.sh

66.5 Watch the Ceph Mon Logs with Kubetail

kubetail ceph-mon -c cluster-log-tailer -n ceph

66.6 Attach Successful but Mounting a Ceph PVC fails

Even if the cluster is stable, your pv's can attach but then fail to mount with events like:

Events:
  Type     Reason                  Age                 From                           Message
  ----     ------                  ----                ----                           -------
  Normal   Scheduled               3m25s               default-scheduler              Successfully assigned default/busybox-mount to master3.example.com
  Normal   SuccessfulAttachVolume  3m25s               attachdetach-controller        AttachVolume.Attach succeeded for volume "pvc-907ae639-3880-11e9-85a5-525400275ad4"
  Warning  FailedMount             82s                 kubelet, master3.example.com   Unable to mount volumes for pod "busybox-mount_default(24ac4333-3881-11e9-85a5-525400275ad4)": timeout expired waiting for volumes to attach or mount for pod "default"/"busybox-mount". list of unmounted volumes=[storage]. list of unattached volumes=[storage default-token-6f9vj]
  Warning  FailedMount             45s (x8 over 109s)  kubelet, master3.example.com   MountVolume.WaitForAttach failed for volume "pvc-907ae639-3880-11e9-85a5-525400275ad4" : fail to check rbd image status with: (executable file not found in $PATH), rbd output: ()

To fix this please:


1. Install ceph-common on each kubernetes node (see the example command after this list).

2. Uninstall the ceph cluster with:

./_uninstall.sh -f

3. Delete Remaining pv’s

kubectl delete --ignore-not-found pv $(kubectl get pv | grep ceph-rbd | grep -v rook | awk '{print $1}')
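
A minimal sketch for step 1 above, assuming CentOS 7 nodes; depending on your setup, the ceph-common package may require the appropriate Ceph yum repository to be enabled first:

# run on each kubernetes node; ceph-common provides the rbd binary the kubelet
# needs for MountVolume.WaitForAttach
sudo yum install -y ceph-common

# confirm the rbd client is now on the PATH
which rbd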

66.7 Previous Cluster Cleanup Failed

Please run ./_uninstall.sh if you see this kind of error when running ./cluster-status.sh:

./cluster-status.sh

----------------------------------------
Getting Ceph cluster status:
kubectl -n ceph exec -ti ceph-mon-p9tvw -c ceph-mon -- ceph -s
2019-02-24 06:02:12.468777 7f90f6509700  0 librados: client.admin authentication error (1) Operation not permitted
[errno 1] error connecting to the cluster
command terminated with exit code 1

CHAPTER 67

OSD Issues

When debugging ceph osd issues, please start by reviewing the pod logs with:

./logs-osd-prepare-pod.sh

67.1 OSD Pool Failed to Initialize

Depending on how many disks you have and the capacity of the ceph cluster, your first time creating the osd pool may hit an error during this command:

kubectl -n ceph exec -ti ${pod_name} -c ceph-mon -- ceph osd pool create rbd 256

With an error like:

creating osd pool
Error ERANGE:  pg_num 256 size 3 would mean 840 total pgs, which exceeds max 600 (mon_max_pg_per_osd 200 * num_in_osds 3)
command terminated with exit code 34
initializing osd
rbd: error opening default pool 'rbd'
Ensure that the default pool has been created or specify an alternate pool name.
command terminated with exit code 2

Please reduce the number at the end of the ceph osd pool create rbd 256 command to:

kubectl -n ceph exec -ti ${pod_name} -c ceph-mon -- ceph osd pool create rbd 100
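
To see why: the monitor limit is mon_max_pg_per_osd (200) × num_in_osds (3) = 600 placement-group copies. A new pool created with pg_num 256 at replication size 3 would add 256 × 3 = 768 copies on top of the existing pools, which is how the error arrives at the 840 total; creating the pool with pg_num 100 only adds 100 × 3 = 300 copies and stays under the limit.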

67.2 OSD Pod Prepare is Unable to Zap

To fix this error below, make sure the ceph-overrides.yaml is using the correct /dev/vdb path:


Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 9, in <module>
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5717, in run
    main(sys.argv[1:])
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5668, in main
    args.func(args)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4737, in main_zap
    zap(dev)
  File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1681, in zap
    raise Error('not full block device; cannot zap', dev)
ceph_disk.main.Error: Error: not full block device; cannot zap: /dev/vdb1
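
For reference, here is a hypothetical osd_devices snippet for the ceph-overrides.yaml; the field names follow the ceph-helm chart conventions suggested by the osd pod names above (ceph-osd-dev-vdb-...), so confirm them against your chart version. The key point is that device must be the whole /dev/vdb block device, not the /dev/vdb1 partition:

# print a sketch of the relevant ceph-overrides.yaml section for comparison
cat <<'EOF'
osd_devices:
  - name: dev-vdb
    device: /dev/vdb
    zap: "1"
EOF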

67.3 OSD unable to find IP Address

To fix the error below, remove the network definitions in the ceph-overrides.yaml (or correct them so they match your cluster's networks).

+ exec /usr/bin/ceph-osd --cluster ceph -f -i 2 --setuser ceph --setgroup disk
2019-02-24 08:53:40.592021 7f4313687e00 -1 unable to find any IP address in networks '172.21.0.0/20' interfaces ''

CHAPTER 68

Cluster Status Tools

68.1 Show All

./show-ceph-all.sh

68.2 Show Cluster Status

./show-ceph-status.sh

----------------------------------------
Getting Ceph status:
kubectl -n ceph exec -it ceph-rgw-7b9677854f-k6hj7 -- ceph status
  cluster:
    id:     384880f1-23f3-4a83-bff8-93624120a4cf
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum master1.example.com,master2.example.com,master3.example.com
    mgr: master3.example.com(active)
    mds: cephfs-1/1/1 up {0=mds-ceph-mds-85b4fbb478-9fhf4=up:active}
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   6 pools, 48 pgs
    objects: 208 objects, 3359 bytes
    usage:   324 MB used, 284 GB / 284 GB avail
    pgs:     48 active+clean


68.3 Show Ceph DF

./show-ceph-df.sh

----------------------------------------
Getting Ceph df:
kubectl -n ceph exec -it ceph-rgw-7b9677854f-k6hj7 -- ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    284G     284G      323M         0.11
POOLS:
    NAME                    ID     USED     %USED     MAX AVAIL     OBJECTS
    .rgw.root               1      1113     0         92261M        4
    cephfs_data             2      0        0         92261M        0
    cephfs_metadata         3      2246     0         92261M        21
    default.rgw.control     4      0        0         92261M        8
    default.rgw.meta        5      0        0         92261M        0
    default.rgw.log         6      0        0         92261M        0

68.4 Show Ceph OSD Status

./show-ceph-osd-status.sh

Getting Ceph osd status:
kubectl -n ceph exec -it ceph-rgw-7b9677854f-k6hj7 -- ceph osd status
+----+---------------------+-------+-------+--------+---------+--------+---------+-----------+
| id |         host        |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+---------------------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | master2.example.com |  107M | 94.8G |    1   |    18   |    0   |    13   | exists,up |
| 1  | master1.example.com |  107M | 94.8G |    3   |   337   |    0   |    0    | exists,up |
| 2  | master3.example.com |  108M | 94.8G |    5   |   315   |    1   |   353   | exists,up |
+----+---------------------+-------+-------+--------+---------+--------+---------+-----------+

68.5 Show Ceph Rados DF

./show-ceph-rados-df.sh

Getting Ceph rados df:
kubectl -n ceph exec -it ceph-rgw-7b9677854f-k6hj7 -- rados df
POOL_NAME           USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS   RD WR_OPS   WR
.rgw.root           1113       4      0     12                  0       0        0     12 8192      4 4096
cephfs_data            0       0      0      0                  0       0        0      0    0      0    0
cephfs_metadata     2246      21      0     63                  0       0        0      0    0     42 8192
default.rgw.control    0       8      0     24                  0       0        0      0    0      0    0

total_objects    33
total_used       323M
total_avail      284G
total_space      284G


CHAPTER 69

Uninstall

To uninstall the ceph cluster and leave the mounted KVM disks /dev/vdb untouched:

./_uninstall.sh

69.1 Uninstall and Reformat KVM Images

To uninstall the ceph cluster and reformat the mounted KVM disks /dev/vdb:

Warning: Running this will destroy all data across the cluster by reformatting the /dev/vdb block devices in each vm

./_uninstall.sh -f


CHAPTER 70

Managing a Multi-Host Kubernetes Cluster with an External DNS Server

This guide is for managing a multi-host Kubernetes cluster deployed across 3 CentOS 7 vms. Once running, you can access the sample applications from outside the cluster with the included DNS nameserver (bind9).

70.1 Overview

Set up 3 CentOS 7 vms and run an external DNS (using bind9) for a distributed, multi-host Kubernetes cluster that is accessible on the domain: example.com

70.2 Background

Why did you make this? Before using DNS, I was stuck managing and supporting many DHCP IP addresses in /etc/hosts like below. This ended up being way more time consuming than necessary. So I made this guide for adding a DNS server over a multi-host Kubernetes cluster.

##############################################################
#
# find the MAC using: ifconfig | grep -A 3 enp | grep ether | awk '{print $2}'
#
# MAC address: 08:00:27:37:80:e1
192.168.0.101   m1 master1 master1.example.com api.example.com ceph.example.com mail.example.com minio.example.com pgadmin.example.com s3.example.com www.example.com
#
# MAC address: 08:00:27:21:80:19
192.168.0.102   m2 master2 master2.example.com jupyter.example.com
#
# MAC address: 08:00:27:21:80:29
192.168.0.103   m3 master3 master3.example.com splunk.example.com


70.3 Allocate VM Resources

1. Each vm should have at least 70 GB of hard drive space
2. Each vm should have at least 2 CPU cores and 4 GB of memory
3. Each vm should have a bridge network adapter that is routeable
4. Take note of each vm's bridge network adapter's MAC address (this will help find the vm's IP address in a router's web app or with network detection tools)

70.4 Install CentOS 7

Install CentOS 7 on each vm (here is the CentOS 7 DVD download page).

1. Additional notes

• I use the multihost/_reset-cluster-using-ssh.sh script to reset the cluster using ssh.
• I recently moved from running on Virtualbox with Ubuntu 18.04 to KVM with CentOS 7, and am tracking the changes in the multihost directory. This includes how each vm's bridge network adapter uses an ifcfg-eth0 interface and starter scripts to make this process repeatable. Please note, it will continue to be a work in progress.
  – create a vm with kvm
  – start m1 vm
  – start m2 vm
  – start m3 vm
  – m1 directory
  – m2 directory
  – m3 directory

70.5 Prepare VMs

1. This command needs to run as root and will prepare each CentOS 7 vm for running Kubernetes in this cluster:

./centos/prepare.sh

2. Confirm Kube Proxy Kernel Modules are Loaded

The vanilla CentOS 7 installer does not install the required kernel modules. After running the centos/prepare-vm.sh script, each vm's kernel should support the required kube proxy kernel modules:

lsmod | grep ip_vs
ip_vs_sh               12688  0
ip_vs_wrr              12697  0
ip_vs_rr               12600  0
ip_vs                 141473  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack          133053  9 ip_vs,nf_nat,nf_nat_ipv4,nf_nat_ipv6,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_netlink,nf_conntrack_ipv4,nf_conntrack_ipv6
libcrc32c              12644  5 xfs,ip_vs,libceph,nf_nat,nf_conntrack
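
If any of the ip_vs modules are missing, here is a minimal sketch for loading them manually on CentOS 7 (the prepare script likely handles this for you; the module list mirrors the lsmod output above):

for m in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4; do
    # load each kube proxy ipvs kernel module
    sudo modprobe ${m}
done
# re-check that the modules are loaded
lsmod | grep ip_vs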

70.6 Install Kubernetes

Install Kubernetes on each vm using your own tool(s) of choice or the deploy to kubernetes tool I wrote. This repository builds each vm in the cluster as a master node, and will use kubeadm join to add master2.example.com and master3.example.com to the initial, primary node master1.example.com.


CHAPTER 71

Start All Kubernetes Cluster VMs

1. Start Kubernetes on the Master 1 VM

Once Kubernetes is running on your initial, primary master vm (mine is on master1.example.com), you can prepare the cluster with the commands:

# ssh into your initial, primary vm
ssh 192.168.0.101

If you're using the deploy-to-kubernetes repository to run an AI stack on Kubernetes, then the following commands will start the master 1 vm for preparing the cluster to run the stack:

# make sure this repository is cloned on all cluster nodes to: /opt/deploy-to-kubernetes
# git clone https://github.com/jay-johnson/deploy-to-kubernetes.git /opt/deploy-to-kubernetes
sudo su
# for preparing to run the example.com cluster use:
cert_env=dev; cd /opt/deploy-to-kubernetes; ./tools/reset-flannel-cni-networks.sh; ./tools/cluster-reset.sh; ./user-install-kubeconfig.sh

2. Confirm only 1 Cluster Node is in the Ready State

kubectl get nodes -o wide --show-labels

3. Print the Cluster Join Command on Master 1

kubeadm token create --print-join-command

4. Join Master 2 to Master 1

ssh 192.168.0.102
sudo su
kubeadm join 192.168.0.101:6443 --token <token> --discovery-token-ca-cert-hash <hash>
exit


5. Join Master 3 to Master 1

ssh 192.168.0.103
sudo su
kubeadm join 192.168.0.101:6443 --token <token> --discovery-token-ca-cert-hash <hash>
exit

71.1 Verify the Cluster has 3 Ready Nodes

1. Set up your host for using kubectl

sudo apt-get install -y kubectl

2. Copy the Kubernetes Config from Master 1 to your host

mkdir -p -m 775 ~/.kube >> /dev/null
scp 192.168.0.101:/root/.kube/config ~/.kube/config

3. Verify the 3 nodes (vms) are in a Status of Ready in the Kubernetes cluster

kubectl get nodes -o wide --show-labels
NAME                  STATUS    ROLES     AGE       VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME   LABELS
master1.example.com   Ready     master    7h        v1.11.2   192.168.0.101   <none>        CentOS Linux 7 (Core)   3.10.0-862.11.6.el7.x86_64   docker://18.6.1     backend=disabled,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,ceph=enabled,datascience=disabled,frontend=enabled,kubernetes.io/hostname=master1.example.com,minio=enabled,node-role.kubernetes.io/master=,splunk=disabled
master2.example.com   Ready     <none>    7h        v1.11.2   192.168.0.102   <none>        CentOS Linux 7 (Core)   3.10.0-862.11.6.el7.x86_64   docker://18.6.1     backend=enabled,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,ceph=enabled,datascience=enabled,frontend=enabled,kubernetes.io/hostname=master2.example.com,minio=disabled,splunk=disabled
master3.example.com   Ready     <none>    7h        v1.11.2   192.168.0.103   <none>        CentOS Linux 7 (Core)   3.10.0-862.11.6.el7.x86_64   docker://18.6.1     backend=enabled,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,ceph=enabled,datascience=disabled,frontend=disabled,kubernetes.io/hostname=master3.example.com,minio=disabled,splunk=enabled

CHAPTER 72

Deploy a Distributed AI Stack to a Multi-Host Kubernetes Cluster

This will deploy the AntiNex AI stack to the new multi-host Kubernetes cluster.

72.1 Deploy Cluster Resources

1. ssh into the master 1 host:

ssh 192.168.0.101

2. Install Go

The Postgres and pgAdmin containers require running as root with Go installed on the master 1 host:

# note this has only been tested on CentOS 7:
sudo su
GO_VERSION="1.11"
GO_OS="linux"
GO_ARCH="amd64"
go_file="go${GO_VERSION}.${GO_OS}-${GO_ARCH}.tar.gz"
curl https://dl.google.com/go/${go_file} --output /tmp/${go_file}
export GOPATH=$HOME/go/bin
export PATH=$PATH:$GOPATH:$GOPATH/bin
tar -C $HOME -xzf /tmp/${go_file}
$GOPATH/go get github.com/blang/expenv
# make sure to add GOPATH and PATH to ~/.bashrc

3. Deploy the stack’s resources:

cert_env=dev
cd /opt/deploy-to-kubernetes
./deploy-resources.sh splunk ceph ${cert_env}
exit


72.2 Start the AI Stack

1. Run the Start command

cert_env=dev
./start.sh splunk ceph ${cert_env}

2. Verify the Stack is Running

Note: This may take a few minutes to download all images and sync files across the cluster.

NAME                                READY     STATUS    RESTARTS   AGE
api-774765b455-nlx8z                1/1       Running   0          4m
api-774765b455-rfrcw                1/1       Running   0          4m
core-66994c9f4d-nq4sh               1/1       Running   0          4m
jupyter-577696f945-cx5gr            1/1       Running   0          4m
minio-deployment-7fdcfd6775-pmdww   1/1       Running   0          5m
nginx-5pp8n                         1/1       Running   0          5m
nginx-dltv8                         1/1       Running   0          5m
nginx-kxn7l                         1/1       Running   0          5m
pgadmin4-http                       1/1       Running   0          5m
primary                             1/1       Running   0          5m
redis-master-0                      1/1       Running   0          5m
redis-metrics-79cfcb86b7-k9584      1/1       Running   0          5m
redis-slave-7cd9cdc695-jgcsk        1/1       Running   2          5m
redis-slave-7cd9cdc695-qd5pl        1/1       Running   2          5m
redis-slave-7cd9cdc695-wxnqh        1/1       Running   2          5m
splunk-5f487cbdbf-dtv8f             1/1       Running   4          4m
worker-59bbcd44c6-sd6t5             1/1       Running   0          4m

3. Verify Minio is Deployed

kubectl describe po minio | grep "Node:"
Node:               master1/192.168.0.101

4. Verify Ceph is Deployed

kubectl describe -n rook-ceph-system po rook-ceph-agent | grep "Node:"
Node:               master3/192.168.0.103
Node:               master1/192.168.0.101
Node:               master2/192.168.0.102

5. Verify the API is Deployed

kubectl describe po api | grep "Node:"
Node:               master2/192.168.0.102
Node:               master1/192.168.0.101

6. Verify Jupyter is Deployed

kubectl describe po jupyter | grep "Node:"
Node:               master2/192.168.0.102

7. Verify Splunk is Deployed


kubectl describe po splunk | grep "Node:"
Node:               master3/192.168.0.103


CHAPTER 73

Set up an External DNS Server for a Multi-Host Kubernetes Cluster

Now that you have a local, 3 node Kubernetes cluster, you can set up a bind9 DNS server for making the public-facing frontend nginx ingresses accessible to browsers or other clients on an internal network (like a home lab).

1. Determine the Networking IP Addresses for VMs

For this guide the 3 vms use the included netplan yaml files for statically setting their IPs:

• m1 with static ip: 192.168.0.101
• m2 with static ip: 192.168.0.102
• m3 with static ip: 192.168.0.103

Warning: If you do not know each vm's IP address, and you are ok with having a network sniffing tool installed on your host like arp-scan, then you can use this command to find each vm's IP address from the vm's bridge network adapter's MAC address:

arp-scan -q -l --interface <interface> | sort | uniq | grep -i "<mac address>" | awk '{print $1}'

2. Install DNS

Pick a vm to be the primary DNS server. For this guide, I am using master1.example.com with IP: 192.168.0.101. For DNS this guide uses the ISC BIND server. Here is how to install BIND on CentOS 7:

sudo yum install -y bind bind-utils
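
If firewalld is enabled on this DNS vm, DNS traffic also needs to be allowed through it so the other nodes can query the server (an optional step; skip it if you run without a firewall in your lab):

sudo firewall-cmd --permanent --add-service=dns
sudo firewall-cmd --reload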

3. Build the Forward Zone File

Depending on how you want your Kubernetes affinity (the decision logic for determining where applications are deployed), the forward zone will need to have the correct IP addresses configured to help maximize your available hosting resources. For example, I gave my master1.example.com vm 3 CPU cores after noticing how much the original 2 cores were being 100% utilized.


The included forward zone file uses the example.com domain outlined below and needs to be saved as the root user to the location:

/etc/bind/fwd.example.com.db

Based off the original /etc/hosts file from above, my forward zone file looks like:

;
; BIND data file for example.com
;
$TTL    604800
@       IN      SOA     example.com. root.example.com. (
                             20         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                         604800 )       ; Negative Cache TTL
;
;@      IN      NS      localhost.
;@      IN      A       127.0.0.1
;@      IN      AAAA    ::1

;Name Server Information
        IN      NS      ns1.example.com.
;IP address of Name Server
ns1     IN      A       192.168.0.101

;Mail Exchanger
example.com.    IN      MX      10      mail.example.com.

;A - Record HostName To Ip Address
@         IN      A       192.168.0.101
api       IN      A       192.168.0.101
ceph      IN      A       192.168.0.101
master1   IN      A       192.168.0.101
mail      IN      A       192.168.0.101
minio     IN      A       192.168.0.101
pgadmin   IN      A       192.168.0.101
www       IN      A       192.168.0.101
api       IN      A       192.168.0.102
jenkins   IN      A       192.168.0.102
jupyter   IN      A       192.168.0.102
aejupyter IN      A       192.168.0.102
master2   IN      A       192.168.0.102
master3   IN      A       192.168.0.103
splunk    IN      A       192.168.0.103

Note: The API has two A records for placement on two of the vms, 192.168.0.101 and 192.168.0.102

4. Verify the Forward Zone File

named-checkzone example.com /etc/bind/fwd.example.com.db
zone example.com/IN: loaded serial 20
OK

5. Build the Reverse Zone File


Depending on how you want your Kubernetes affinity (decision logic for determining where applications are deployed) the reverse zone will need to have the correct IP addresses configured to help maximize your available hosting resources. The included reverse zone file uses the example.com domain outlined below and needs to be saved as the root user to the location:

/etc/bind/rev.example.com.db

Based off the original /etc/hosts file from above, my reverse zone file looks like:

;
; BIND reverse zone data file for example.com
;
$TTL    604800
@       IN      SOA     example.com. root.example.com. (
                             20         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                         604800 )       ; Negative Cache TTL
;
;@      IN      NS      localhost.
;1.0.0  IN      PTR     localhost.

;Name Server Information
        IN      NS      ns1.example.com.
;Reverse lookup for Name Server
101     IN      PTR     ns1.example.com.
;PTR Record IP address to HostName
101     IN      PTR     api.example.com.
101     IN      PTR     example.com
101     IN      PTR     ceph.example.com.
101     IN      PTR     mail.example.com.
101     IN      PTR     master1.example.com.
101     IN      PTR     minio.example.com.
101     IN      PTR     pgadmin.example.com.
101     IN      PTR     www.example.com.
102     IN      PTR     api.example.com.
102     IN      PTR     jupyter.example.com.
102     IN      PTR     aejupyter.example.com.
102     IN      PTR     jenkins.example.com.
102     IN      PTR     master2.example.com.
103     IN      PTR     master3.example.com.
103     IN      PTR     splunk.example.com.

Note: The API has two PTR records for placement on two of the vms, 101 and 102

6. Verify the Reverse Zone File

named-checkzone 0.168.192.in-addr.arpa /etc/bind/rev.example.com.db
zone 0.168.192.in-addr.arpa/IN: loaded serial 20
OK
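
Before restarting BIND, both zones also need to be declared in the server's main configuration so named actually loads the files above. This is a hedged sketch assuming the default /etc/named.conf location on CentOS 7 and the /etc/bind file paths used in this guide; adjust both to match your install:

# run as root: append the zone declarations to the BIND server configuration
cat <<'EOF' >> /etc/named.conf
zone "example.com" IN {
    type master;
    file "/etc/bind/fwd.example.com.db";
};

zone "0.168.192.in-addr.arpa" IN {
    type master;
    file "/etc/bind/rev.example.com.db";
};
EOF

# confirm the named configuration still parses
named-checkconf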

7. Restart and Enable Bind9 to Run on VM Restart


# on CentOS 7 the BIND service unit is named
systemctl restart named
systemctl enable named

8. Check the Bind9 status

systemctl status named

9. From another host set up the Network Interface File

Here is the 192.168.0.101 vm's /etc/sysconfig/network-scripts/ifcfg-eth0 network interface file that uses the external BIND server for DNS. Please edit this file as root, according to your vm's networking IP address and static vs dhcp requirements.

/etc/sysconfig/network-scripts/ifcfg-eth0

TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="none"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="eth0"
UUID="747d880d-0c18-5a9f-c0a5-e9e80cd6be46"
DEVICE="eth0"
ONBOOT="yes"
IPADDR="192.168.0.101"
PREFIX="24"
GATEWAY="192.168.0.1"
DNS1="192.168.0.100"
DNS2="8.8.8.8"
DNS3="8.8.4.4"
IPV6_PRIVACY="no"
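
After editing an interface file, restarting networking on the vm picks up the new DNS entries (a minimal sketch for CentOS 7; this briefly interrupts connectivity on that interface):

sudo systemctl restart network
# confirm the resolvers were applied
cat /etc/resolv.conf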

10. Verify the Cluster DNS Alias Records

The Django REST API web application has two alias records:

dig api.example.com | grep IN | tail -2
api.example.com.        7193    IN      A       192.168.0.101
api.example.com.        7193    IN      A       192.168.0.102

Rook Ceph dashboard has one alias record:

dig ceph.example.com | grep IN | tail -1
ceph.example.com.       604800  IN      A       192.168.0.101

Minio S3 has one alias record:

dig minio.example.com | grep IN | tail -1
minio.example.com.      604800  IN      A       192.168.0.101

Jupyter has one alias record:


dig jupyter.example.com | grep IN | tail -1
jupyter.example.com.    604800  IN      A       192.168.0.102

pgAdmin has one alias record:

dig pgadmin.example.com | grep IN | tail -1
pgadmin.example.com.    604800  IN      A       192.168.0.101

The Kubernetes master 1 vm has one alias record:

dig master1.example.com | grep IN | tail -1
master1.example.com.    7177    IN      A       192.168.0.101

The Kubernetes master 2 vm has one alias record:

dig master2.example.com | grep IN | tail -1
master2.example.com.    604800  IN      A       192.168.0.102

The Kubernetes master 3 vm has one alias record:

dig master3.example.com | grep IN | tail -1
master3.example.com.    604800  IN      A       192.168.0.103
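
If the host you are testing from has not been pointed at the new nameserver yet, you can query the BIND server directly (assuming it is the one installed on 192.168.0.101 in step 2); it should return the A records defined in the forward zone above:

dig @192.168.0.101 api.example.com +short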


CHAPTER 74

Start using the Stack

With the DNS server ready, you can now migrate the database and create the first user trex to start using the stack.

74.1 Run a Database Migration

Here is a video showing how to apply database schema migrations in the cluster. To apply new Django database migrations, run the following command:

# from /opt/deploy-to-kubernetes
./api/migrate-db.sh

74.2 Create a User

Create the user trex with password 123321 on the REST API.

./api/create-user.sh


CHAPTER 75

Deployed Web Applications

Once the stack is deployed, here are the hosted web application urls. These urls are made accessible by the included nginx-ingress.


CHAPTER 76

View Django REST Framework

Login with:

• user: trex
• password: 123321

https://api.example.com


CHAPTER 77

View Swagger

Login with:

• user: trex
• password: 123321

https://api.example.com/swagger


CHAPTER 78

View Jupyter

Login with:

• password: admin

https://jupyter.example.com


CHAPTER 79

View pgAdmin

Login with:

• user: [email protected]
• password: 123321

https://pgadmin.example.com


CHAPTER 80

View Minio S3 Object Storage

Login with:

• access key: trexaccesskey
• secret key: trex123321

https://minio.example.com


CHAPTER 81

View Ceph

https://ceph.example.com


CHAPTER 82

View Splunk

Login with:

• user: trex
• password: 123321

https://splunk.example.com


CHAPTER 83

Train AI with Django REST API

Please refer to Training AI with the Django REST API to continue examining how to run a distributed AI stack on Kubernetes.


CHAPTER 84

Next Steps

• Add Heptio's Ark for disaster recovery
• Add Jenkins into the stack using Helm

84.1 More Information

After seeing high CPU utilization across the cluster, this guide was moved from Ubuntu 18.04 vms to CentOS 7.


CHAPTER 85

AntiNex Stack Status

Here are the AntiNex repositories, documentation and build reports:

Component status table (the build and docs badges are not rendered in this text version): REST API, Core Worker, Network Pipeline, AI Utils, and Client, each with Build, Docs Link, and Docs Build columns.
