Urika®-GX Analytic Applications Guide

(2.2.UP00)


Contents

1 About the Urika®-GX Analytic Applications Guide
2 Analytic Software Stack Components
3 Urika-GX Service Modes
4 Access Urika-GX Applications
  4.1 Disable Framing on Urika Applications Interface (UAI)
5 Authentication Mechanisms
6 Apache Hadoop Support
  6.1 Load Data into the Hadoop Distributed File System (HDFS)
  6.2 Run a Simple Hadoop Job
  6.3 Run a Simple Word Count Application Using Hadoop
  6.4 Monitor Hadoop Applications
  6.5 Use Tiered Storage on Urika-GX
  6.6 Assign the HDFS/ptmp Directory to Use SSDs for Block Storage
  6.7 Change the Default HDFS Storage Policy
7 Apache Spark Support
  7.1 Monitor Spark Applications
  7.2 Remove Temporary Spark Files from SSDs
  7.3 Obtain Additional Temporary Space for Running Spark Jobs
  7.4 Enable Anaconda Python and the Conda Environment Manager
  7.5 Provide Kerberos Credentials to Spark
  7.6 Redirect a Spark Job to a Specific Directory
  7.7 Modify the Default Number of Maximum Spark Cores
  7.8 Execute Spark Jobs on Kubernetes
  7.9 Multi-tenant Spark Thrift Server on Urika-GX
8 Use Apache Mesos on Urika-GX
  8.1 Access the Apache Mesos Web UI
  8.2 Use mrun to Retrieve Information About Marathon and Mesos Frameworks
  8.3 Clean Up Residual mrun Jobs
  8.4 Launch an HPC Job Using mrun
  8.5 Manage Resources on Urika-GX
  8.6 Manage Long Running Services Using Marathon
  8.7 Flex up a YARN sub-cluster on Urika-GX
9 Access the Jupyter Notebook UI
  9.1 Create a Jupyter Notebook
  9.2 Share or Upload a Jupyter Notebook


  9.3 Create a Custom Python Based Kernel for JupyterHub
10 Get Started with Using Grafana
  10.1 Urika-GX Performance Analysis Tools
  10.2 Update the InfluxDB Data Retention Policy
11 Use Docker on Urika-GX
  11.1 Image Management with Docker and Kubernetes
  11.2 Run the Native Docker Engine on Marathon
12 Start Individual Kafka Brokers
13 Overview of the Cray Application Management UI
14 Update the InfluxDB Data Retention Policy
15 Manage the Spark Thrift Server as a Non-Admin User
16 Use Tableau® with Urika-GX
  16.1 Connect Tableau to HiveServer2 Using LDAP
  16.2 Connect Tableau to HiveServer2 Securely
  16.3 Connect Tableau to the Spark Thrift Server
  16.4 Connect Tableau to the Spark Thrift Server Securely
  16.5 Connect Tableau to Apache Spark Thrift Server on a VM
  16.6 Enable SSL for Spark Thrift Server of a Tenant
17 File Systems
18 Check the Current Service Mode
19 Fault Tolerance on Urika-GX
20 Default Urika-GX Configurations
  20.1 Default Grafana Dashboards
  20.2 Performance Metrics Collected on Urika-GX
  20.3 Default Log Settings
  20.4 Tunable Hadoop and Spark Configuration Parameters
  20.5 Node Types
  20.6 Service to Node Mapping
  20.7 Port Assignments
  20.8 Major Software Components Versions
21 Troubleshooting
  21.1 Diagnose and Troubleshoot Orphaned Mesos Tasks
  21.2 Analytic Applications Log File Locations
  21.3 Clean Up Log Data
  21.4 Ensure Long Running Spark Jobs Finish Executing
  21.5 Troubleshoot Common Analytic and System Management Issues


1 About the Urika®-GX Analytic Applications Guide

This publication describes the analytic components, workload management and performance analysis tools of the Cray® Urika®-GX system.

Typographic Conventions

Monospace
    Indicates program code, reserved words, library functions, command-line prompts, screen output, file/path names, key strokes (e.g., Enter and Alt-Ctrl-F), and other software constructs.
Monospaced Bold
    Indicates commands that must be entered on a command line or in response to an interactive prompt.
Oblique or Italics
    Indicates user-supplied values in commands or syntax definitions.
Proportional Bold
    Indicates a graphical user interface window or element.
\ (backslash)
    At the end of a command line, indicates the shell line continuation character (lines joined by a backslash are parsed as a single line). Do not type anything after the backslash or the continuation feature will not work correctly.

Scope and Audience

This publication is intended for users and system administrators of the Urika®-GX system. It is not intended to provide detailed information about open source and third party tools installed on the system. References to online documentation are included wherever applicable.

Record of Revision

Date                Release
September, 2018     2.2.UP00
May, 2018           2.1.UP00
December, 2017      2.0.UP00
April, 2017         1.2.UP00
December, 2016      1.1.UP00
August, 2016        1.0.UP00
March, 2016         0.5.UP00

● New Information:
  ○ Information about multi-tenant Spark.
● Updated Content:
  ○ Updates to Tableau related information.

Trademarks The following are trademarks of Cray Inc. and are registered in the United States and other countries: CRAY and design, SONEXION, Urika-GX, Urika-XA, Urika-GD, and YARCDATA. The following are trademarks of Cray Inc.: APPRENTICE2, CHAPEL, CLUSTER CONNECT, CRAYDOC, CRAYPAT, CRAYPORT, DATAWARP, ECOPHLEX, LIBSCI, NODEKARE. The following system family marks, and associated model number marks, are trademarks of Cray Inc.: CS, CX, XC, XE, XK, XMT, and XT. The registered trademark LINUX is used pursuant to a sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis. Other trademarks used in this document are the property of their respective owners.


2 Analytic Software Stack Components

Cray Urika-GX is a high performance big data analytics platform, which is optimized for multiple workflows. It combines a highly advanced hardware platform with a comprehensive analytics software stack to help derive optimal business value from data. The system's analytics platform provides an optimized set of tools for capturing and organizing a wide variety of data types from different sources and executing a variety of analytic jobs.

Major components of the analytic software stack include:
● Apache Hadoop - Hadoop is a software framework that provides the means for distributed storing and processing of large data sets. In addition to the core Hadoop components, Urika-GX ships with a number of Hadoop ecosystem components for increased productivity.
● Apache Spark - Spark is a general data processing framework that simplifies developing fast, end-to-end big data applications. It provides the means for executing batch, streaming, and interactive analytics jobs. In addition to the core Spark components, Urika-GX ships with a number of Spark ecosystem components.
● Cray Graph Engine (CGE) - CGE is a highly optimized and scalable graph analytics application, which is designed for high-speed processing of interconnected data. It features an advanced platform for searching very large, graph-oriented databases and querying for complex relationships between data items in the database.


3 Urika-GX Service Modes

Urika-GX features application and data security for a number of applications. Many security mechanisms play a role in the overall security architecture to ensure the system is protected against unauthorized access. The system supports two service modes:

● Default Mode - All the installed analytic applications are available under the default service mode. Applications are configured with basic security levels wherever possible, though certain applications may provide no security features. More specifically, HDFS runs using its simple authentication mode under the default service mode.
● Secure Mode - Urika-GX uses Kerberos authentication while running under the secure mode. Only a limited number of analytic applications are available under this mode, and those applications are configured to run in a secure configuration, the exact details of which vary by application. Typically, this means that applications that interact with HDFS require valid Kerberos credentials. Moreover, user interfaces are either disabled or require authentication under this mode.

CAUTION: If the Urika-GX system is running in the secure mode, Cray does not recommend toggling back to the default mode while in production. In the default service mode, the security assurances provided by the secure service mode are not in place, and the security of data that was protected by secure mode may be compromised while running in the default mode. Cray cannot extend the secure mode security assurances to any system that has run in a production state in the default mode until that system has been fully re-deployed.

Table 1. List of Services Available Under the Default and Secure Service Modes

Service                                       Available in Default Mode    Available in Secure Mode

Cray Programming Environment                  Yes                          No
SELinux                                       Yes                          No

Analytic Applications and Resource Management Tools
ZooKeeper                                     Yes                          Yes
Spark                                         Yes                          Yes
Mesos Master                                  Yes                          No
Mesos Slave                                   Yes                          No
Kubernetes                                    No                           Yes
HDFS NameNode                                 Yes                          Yes
HDFS Secondary NameNode                       Yes                          Yes
HDFS DataNode                                 Yes                          Yes
CGE                                           Yes                          No
Spark History Server                          Yes                          No
Spark Thrift Server                           Yes                          No
YARN                                          Yes                          No
YARN Resource Manager                         Yes                          No
YARN Node Managers                            Yes                          No
Hadoop Job History Server                     Yes                          No
Hadoop Application Timeline Server            Yes                          No
Hive MetaStore                                Yes                          No
HiveServer2                                   Yes                          No
Hive WebHCat                                  Yes                          No
HUE                                           Yes                          No
Oozie Server                                  Yes                          No
Marathon                                      Yes                          No
Grafana                                       Yes                          No
InfluxDB                                      Yes                          No
JupyterHub                                    Yes                          No

System Management Tools
HSS                                           Yes                          Yes
Nagios                                        Yes                          Yes
Cobbler                                       Yes                          Yes
Secret Manager                                Yes                          Yes
Tenant management and proxy tools             Yes                          Yes
Analytic programming environment components   Yes                          Yes
(R, Numpy, Scipy, GIT, Environment modules,
glibc-devel, gcc, Python 3.4, Python 2.7,
Scala, Anaconda Python)

Miscellaneous Tools and UIs
YAM                                           Yes                          No
mrun                                          Yes                          No
Urika Application Management (UAM)            Yes                          No
Urika-GX Application Interface (UAI)          Yes                          No
HAProxy                                       Yes                          Yes
Connectivity to Tableau                       Yes                          No
Docker                                        No                           Yes

Any additional services installed on the system will use their own security mechanisms and will not be affected by Urika-GX's default and secure modes.

Table 2. Relationship Between Access Levels and Service Modes

Secure mode:
● Restricted access, as a member of a tenant - Has proxied access to Spark and HDFS commands.
● Restricted access, on a physical node - No access.
● Relaxed access, as a member of a tenant - Has proxied access to Spark and HDFS commands.
● Relaxed access, on a physical node - Has direct access to all the services supported in the secure mode.

Default mode:
● Restricted access (as a member of a tenant or on a physical node) - Unsupported.
● Relaxed access, as a member of a tenant - Unsupported.
● Relaxed access, on a physical node - Has direct access to all the services supported in the default mode.


4 Access Urika-GX Applications

Access Urika-GX applications using:
● Urika-GX Applications Interface (UAI). UAI is the primary entry point to a number of web applications running on Urika-GX. Access this UI by pointing a browser at http://hostname-login1. This UI can also be used for viewing system health information and accessing learning resources, such as pre-installed Jupyter notebooks and Urika-GX PDF publications.
Figure 1. Urika-GX Applications Interface

● HTTP port numbers that the applications are configured to run on. Use the ports configured in the HAProxy file if SSL is enabled.
TIP: If applications are accessed using ports/URLs, the UI of the accessed application will not display the top banner, which contains additional links, including System Health, Learning Resources, and Help. Therefore, it is recommended to use the Urika-GX Applications Interface to access these applications.


Table 3. Accessing User Interfaces Using URLs and Ports (when SSL is disabled)

Application URL(s)

Urika-GX Applications Interface ○ http://hostname-login1:80 ○ http://hostname-login1

YARN Resource Manager ○ http://hostname-login1:8088 ○ http://hostname-login2:8088

HUE ○ http://hostname-login1:8888 ○ http://hostname-login2:8888

Oozie Server ○ http://hostname-login1:11000/oozie ○ http://hostname-login2:11000/oozie

Hadoop Job History Server ○ http://hostname-login1:19888 ○ http://hostname-login2:19888

Hadoop Application Timeline Server ○ http://hostname-login1:8188 ○ http://hostname-login2:8188

Marathon ○ http://hostname-login1:8080 ○ http://hostname-login2:8080

Mesos Master ○ http://hostname-login1:5050 ○ http://hostname-login2:5050

Spark History Server ○ http://hostname-login1:18080 ○ http://hostname-login2:18080

Grafana http://hostname-login2:3000

HDFS NameNode ○ http://hostname-login1:50070 ○ http://hostname-login2:50070

HDFS Secondary NameNode ○ http://hostname-login1:50090 ○ http://hostname-login2:50090

Jupyter Notebook http://hostname-login1:7800

Urika Application Management http://hostname-login1/applications


4.1 Disable Framing on Urika Applications Interface (UAI)

Prerequisites

This procedure requires root privileges.

About this task

The default behaviour of UAI is to display the user interfaces for applications inside a frame, which allows for providing a consistent navigation experience across the user interface. However, for some sites this may not be desirable. For example, in the case where SSL has been enabled, some applications may not display correctly due to browser security policies. It is possible to override this behaviour by changing the values of the FRAME_APPLICATIONS and FRAME_SECURE_APPLICATIONS variables found in the /opt/cray/ui-application-management/default/application_management/settings.py file.

Procedure

1. Log on to login node 1 as root.

2. Switch to the /opt/cray/ui-application-management/default/application_management directory.

3. Edit the settings.py file and set the values of the FRAME_APPLICATIONS and FRAME_SECURE_APPLICATIONS variables according to the desired behaviour.
● If the first variable is set to true, all applications will always be framed, regardless of the setting of the second variable.
● If the first variable is set to false but the second variable is set to true, only the applications for which SSL is enabled will be framed, whereas all the other applications will be displayed without frames.
● If both the variables are set to false, all the applications will be displayed without frames.
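For illustration, a minimal sketch of what the relevant lines in settings.py might look like after editing is shown below. The values chosen here (frame only the SSL-enabled applications) are an example of one possible configuration, not a recommendation, and the exact layout of the shipped file may differ:

# Frame all applications, regardless of FRAME_SECURE_APPLICATIONS (example value).
FRAME_APPLICATIONS = False
# Frame only the applications for which SSL is enabled (example value).
FRAME_SECURE_APPLICATIONS = True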

4. Save and quit the settings.py file.

5. Restart the httpd service.

# service httpd restart


5 Authentication Mechanisms

Table 4. Authentication Mechanisms

Cray Application Management UI - LDAP. Users can also log in with the default account shipped with the system. This account has the following credentials: username: admin, password: admin.

Urika-GX Applications Interface - Not available.

Documentation and Learning Resources UI - Not available.

Grafana UI - LDAP. The system also ships with a default account that can be used to log on to Grafana. The credentials of this account are: username: admin, password: admin.

Jupyter Notebook UI - LDAP. The system also ships with a default account that can be used to log on to Jupyter. The credentials of this account are: username: crayadm, password: initial0.

HUE UI - LDAP. The system also ships with a default admin account that can be used to log on to HUE. The credentials of this account are: username: admin, password: initial0.

Hadoop/Spark related UIs, such as the Spark History Server, Hadoop History Server, YARN Resource Manager, etc. - Not available.

Marathon UI - Not available.

Mesos UI - User name: LDAP user name. Password: Mesos secret, found in the /security/secrets/userName.mesos file, located on all the nodes.

Spark Thrift Server - Urika-GX ships with LDAP authentication enabled for the Spark Thrift Server. SSL authentication can be set up if required; for instructions, refer to the Urika®-GX System Administration Guide. Storage based authorization is supported for the Spark Thrift Server.

HiveServer2 - LDAP authentication for HiveServer2 can be enabled if required. Storage based and SQL standard based authorizations are supported for HiveServer2. For more information, refer to the Urika®-GX System Administration Guide.

Nagios UI - Default credentials: user name: crayadm, password: initial0.


6 Apache Hadoop Support

Urika-GX ships with Hortonworks® Data Platform (HDP), which is a distribution package comprising a number of Apache Hadoop projects. In addition to the core Hadoop elements, the following Hadoop ecosystem components are pre-installed and pre-configured on Urika-GX:
● Apache™ Avro™ - Data serialization system.
● Apache™ DataFu™ - Collection of libraries for working with large-scale data in Hadoop.
● Apache™ Flume™ - Distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
● Apache™ Hive™ - Data warehouse framework that facilitates querying and management of large datasets residing in a distributed store/file system like the Hadoop Distributed File System (HDFS). Subcomponents of Hive include:
  ○ Apache™ HCatalog™ - Table and storage management layer for Hadoop that enables users with different data processing tools to more easily read and write data on the grid.
  ○ Apache™ WebHCat™ - REST API for HCatalog, which acts as the storage management layer for Hadoop.
● Apache™ Hue™ - Set of web applications that enable interacting with a Hadoop cluster and browsing HDFS.
● Apache™ Kafka™ - Publish-subscribe messaging service.
● Apache™ Mahout™ - Scalable machine learning and data mining library.
● Apache™ Oozie™ - Job work-flow scheduling and coordination manager for managing the jobs executed on Hadoop.
● Apache™ Parquet™ - Columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model, or programming language.
● Apache™ Pig™ - Software framework which offers a run-time environment for execution of MapReduce jobs on a Hadoop cluster via a high-level scripting language called Pig Latin.
● Apache™ Sqoop™ - Tool designed for efficiently transferring data between Hadoop and relational databases (RDBMS).
● Apache™ HiveServer2™/Hive™ Thrift Server - Apache HiveServer2 is a server interface that enables remote clients to execute queries against Hive and retrieve the results. The current implementation, based on Thrift RPC, is an improved version of HiveServer and supports multi-client concurrency and authentication. It is designed to provide better support for open API clients like JDBC and ODBC. HiveServer2 and the Spark Thrift Server are used on Urika-GX for enabling connections from ODBC/JDBC clients, such as Tableau.
● Apache™ ZooKeeper™ - Centralized coordination service that is responsible for maintaining configuration information, offering coordination in a distributed fashion, and a host of other capabilities.

None of the Urika-GX components are currently configured to send data to the Hadoop Application Timeline Server. Users must configure their own applications to send data to the Hadoop Application Timeline Server. For more information, visit http://hortonworks.com/ and http://www.apache.org.


HDFS Data Storage

HDFS data is stored on the SSDs and HDDs of Urika-GX's compute nodes, which results in faster data transfer and lower latency. Designated HDFS DataNodes:
● 3 sub-rack/48 node system - nid[00001-00015], nid[00017-00029], and nid[00033-00045]
● 2 sub-rack/32 node system - nid[00001-00007, 00009-00013, 00017-00029]
● Single sub-rack/16 node system - nid[00001-00007], nid[00009-00013], nid[00017-00029]

Executing Hadoop Commands on Urika-GX

Flex up the YARN sub-cluster using the urika-yam-flexup command before executing Hadoop commands. For more information, see the urika-yam-flexup man page.

6.1 Load Data into the Hadoop Distributed File System (HDFS)

Prerequisites

This procedure requires the HDFS service to be up and running.

About this task

The first step in using Hadoop is loading data into HDFS, which provides storage across the cluster nodes. The following steps copy the content of books by Mark Twain and James Fenimore Cooper into files and then load those files into HDFS.

Procedure

1. Log on to a login node.

2. Use the wget Linux utility, which retrieves content from web servers, to download Twain's and Cooper's works.

$ wget -U firefox http://www.gutenberg.org/cache/epub/76/pg76.txt
$ wget -U firefox http://www.gutenberg.org/cache/epub/3285/pg3285.txt

3. Rename the newly created files to names of choice.

$ mv pg3285.txt DS.txt
$ mv pg76.txt HF.txt

4. Load the content of both Twain and Cooper's books into the HDFS file system.
CAUTION: The following command will fail if the /user/userID directory does not exist. This is because users do not have write access to HDFS unless an administrator provides them a designated folder under hdfs:///user.


$ hdfs dfs -mkdir /user/userID

5. Move the DS.txt and HF.txt files to HDFS.

$ hdfs dfs -put HF.txt /user/userID
$ hdfs dfs -put DS.txt /user/userID

6. Compress the text files using the gzip utility.

$ gzip DS.txt
$ gzip HF.txt

7. Load the compressed files into HDFS using the hdfs dfs command.

$ hdfs dfs -put DS.txt.gz /user/userID
$ hdfs dfs -put HF.txt.gz /user/userID

8. Verify that the files have been loaded into HDFS.

$ hdfs dfs -ls /user/userID
Found 2 items
-rw-r--r--   1 crayadm supergroup     459386 2012-08-08 19:34 /user/crayadm/DS.txt.gz
-rw-r--r--   1 crayadm supergroup     597587 2012-08-08 19:35 /user/crayadm/HF.txt

6.2 Run a Simple Hadoop Job

About this task

The instructions documented in this procedure can be used to run a TeraSort benchmark, which is a simple Hadoop job. TeraSort uses the regular MapReduce sort, except for a custom partitioner that uses a sorted list of N-1 sampled keys to ensure that each of the N reducers receives records with keys K such that sample[i-1] <= K < sample[i], where i is the reducer instance number. This procedure can be considered the equivalent of running a "Hello World" program. The goal of the TeraSort benchmark is to sort data as fast as possible. In the following instructions, data is generated using TeraGen, which is a MapReduce program for generating data. Furthermore, the results are validated via TeraValidate, which is a MapReduce program for validating that the output is sorted.

CAUTION: This example will fail on a 16 node Urika-GX, as 16 node machines only have 9 HDFS data nodes and TeraSort expects to make 10 replications. To still run TeraSort on a 16 node Urika-GX, increase the value of dfs.replication.max in hdfs-site.xml, but be aware that there may be adverse effects of attempting to create more replications than there are HDFS DataNodes in the system.

Procedure

1. Log on to a login node.

2. Flex up a YARN sub-cluster using the urika-yam-flexup command. For more information on using this command, see the urika-yam-flexup man page.


CAUTION: If YARN node managers are not flexed up when a Hadoop job is started, the Hadoop job will hang indefinitely until resources are provided by flexing up YARN node managers.

3. Generate the input data via TeraGen, using default MapReduce options. In the following example, the command hdfs dfs -rm -R /tmp/10gsort is only needed if the /tmp/10gsort directory already exists.

$ hdfs dfs -rm -R /tmp/10gsort
$ yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen 100 /tmp/10gsort/input

4. Execute the TeraSort benchmark on the input data.

$ yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar terasort /tmp/10gsort/input /tmp/10gsort/output

5. Validate the sorted output data via TeraValidate.

$ yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teravalidate /tmp/10gsort/output /tmp/10gsort/validate

6. Verify the success of the validation.

$ hdfs dfs -ls /tmp/10gsort/validate

Found 2 items
-rw-r--r--   3 user hdfs          0 2015-08-12 17:15 /tmp/10gsort/validate/_SUCCESS
-rw-r--r--   3 user hdfs         20 2015-08-12 17:15 /tmp/10gsort/validate/part-r-00000

6.3 Run a Simple Word Count Application Using Hadoop

About this task

The following code shows how to invoke the word counter program, which is included in the Hadoop example JAR file.

Procedure

1. Flex up a YARN sub-cluster using the urika-yam-flexup command. For more information on using this command, see the urika-yam-flexup man page.
CAUTION: If YARN Node Managers are not flexed up when a Hadoop job is started, the Hadoop job will hang indefinitely until resources are provided by flexing up YARN Node Managers.

2. Remove the output directory if it already exists.

$ hdfs dfs -rm -r /tmp/word_out

3. Execute the following command:

$ yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar wordcount \
/tmp/word_in /tmp/word_out

where /tmp/word_in is the directory containing the files whose words are to be counted and /tmp/word_out is the output directory.

4. Verify the results.

$ hdfs dfs -cat /tmp/word_out/part*


5. Flex down the YARN sub-cluster using the urika-yam-flexdown command. For more information on using this command, see the urika-yam-flexdown man page.

6.4 Monitor Hadoop Applications

Hadoop jobs can be monitored using the following tools:
● Hadoop Job History Server UI, which can be accessed via the Urika-GX Applications Interface or at http://hostname-login1:19888 and http://hostname-login2:19888. Accessing via the Urika-GX Applications Interface is recommended.
Figure 2. Hadoop Job History Server UI

● YARN Resource Manager UI, which can be accessed via the Urika-GX Applications Interface or at http://hostname-login1:8088 and http://hostname-login2:8088. Accessing via the Urika-GX Applications Interface is recommended. Figure 3. YARN Resource Manager UI

● Cray Application Management UI, which can be accessed via the Urika-GX Applications Interface or at http://hostname-login1/applications


Figure 4. Cray Application Management UI

6.5 Use Tiered Storage on Urika-GX

Storage policies define the policy HDFS uses to persist block replicas of a file to storage types, as well as the desired storage type(s) for each replica of the file blocks being persisted. They allow for fallback strategies, whereby if the desired storage type is out of space, a fallback storage type is utilized to store the file blocks. The scope of these policies extends and applies to directories, and all files within them. Storage policies can be enforced during file creation or at any point during the lifetime of the file. For storage policies that change during the lifetime of the file, HDFS introduces a new tool called Mover that can be run periodically to migrate all files across the cluster to the correct storage types based on their storage policies.

Urika-GX implements the Hadoop 2.7.3 HDFS tiered storage, combining both the SSDs and hard drives into a single storage paradigm. The HDFS NameNode considers each DataNode to be a single storage unit with uniform characteristics. By adding awareness of storage media, HDFS can make better decisions about the placement of block data with input from applications. An application can choose the distribution of replicas based on its performance and durability requirements. The storage policy will be DISK by default on Urika-GX if no storage policy is assigned. Changes to this default value can be made based on requirements and site policies.

Storage Types and Storage Policies

Each DataNode in HDFS is configured with a set of specific disk volumes as storage mount points, on which HDFS files persist.

CAUTION: It is recommended not to modify how volumes are tagged in hdfs-site.xml.

With Urika-GX, users can tag each volume with a storage type to identify the characteristic of storage media that represents the volume. For example, a mounted volume may be designated as an archival storage and another one as flash storage.

S3015 20 Apache Hadoop Support

An example of the hdfs-site.xml file is shown below:

<property>
  <name>dfs.datanode.data.dir</name>
  <value>[SSD]file:///mnt/ssd/hdfs/dd,[DISK]file:///mnt/hdd-2/hdfs/dd</value>
</property>

The available space reported by HDFS commands represents the combined space of the HDDs and SSDs. Even though Urika-GX has a heterogeneous file system, the default storage type is DISK, unless explicitly set to use SSD.
TIP: If the system indicates that the disks have reached full capacity while HDFS commands indicate that there is still some space available, find out the actual space consumed by each storage type by adding up the available space on the individual directories on each node (see the example below):
● For HDD: /mnt/hdd-2/hdfs/dd
● For SSD: /mnt/ssd/hdfs/dd
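As a sketch of this kind of check, the standard Linux df utility can be run against the mount points listed above on each DataNode; the exact mount points may vary with the site configuration:

$ df -h /mnt/hdd-2/hdfs/dd /mnt/ssd/hdfs/dd

Summing the available space reported for each storage type across the DataNodes gives the actual space remaining on that media type.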

Available Policies

Urika-GX ships with the following pre-defined HDFS policies, which can be assigned to different HDFS directories and files.

$ hdfs storagepolicies -listPolicies
Block Storage Policies:
    BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
    BlockStoragePolicy{ONE_SSD:10, storageTypes=[SSD, DISK], creationFallbacks=[SSD, DISK], replicationFallbacks=[SSD, DISK]}
    BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
    BlockStoragePolicy{LAZY_PERSIST:15, storageTypes=[RAM_DISK, DISK], creationFallbacks=[DISK], replicationFallbacks=[DISK]}

ONE_SSD attempts to place one replica of each block on SSD, whereas ALL_SSD attempts to place all replicas on SSD. Warm and cold storage policies are not supported on Urika-GX.

Use Cases
1. An application creates a file and requests that block replicas be stored on a particular storage type. Variations include:
   a. All replicas on the same storage type.
   b. Replicas on different storage types, for example, two replicas on HDD, one on SSD, etc.
   c. Application requests that the storage type setting is mandatory. For example, the operator places hot files, such as the latest partitions of a table, on SSDs.
2. An application changes the storage media of a file. Variations include:
   a. For all replicas.
   b. For some of the replicas.
   c. Application requests that the new setting is mandatory.
3. The user creates a quota for a particular storage media type at a directory (see the example after this list).
4. Upon request, the user can move hot data to faster storage media based on access patterns.
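For example, a space quota can be limited to a particular storage type with the hdfs dfsadmin command; the quota size and directory shown here are hypothetical, and the exact option syntax may vary with the Hadoop version in use:

$ hdfs dfsadmin -setSpaceQuota 10g -storageType SSD /user/username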

Performance Considerations ● The impact of storage policies is minimal for smaller datasets, due to the presence of the OS buffer cache, which caches files in memory.


● For the TeraSort test, most of the benefit of the SSDs can be obtained with the ONE_SSD policy, since only one replica of each block needs to be read.

6.6 Assign the HDFS/ptmp Directory to Use SSDs for Block Storage

About this task

Follow this procedure to use SSDs as the block storage for the HDFS /ptmp directory.

Procedure

1. Create the /ptmp directory.

$ hdfs dfs -mkdir /ptmp

2. Change the permissions of the /ptmp directory to 1777.

$ hdfs dfs -chmod 1777 /ptmp

3. Set the storage policy of the /ptmp directory.

$ hdfs storagepolicies -setStoragePolicy -path /ptmp -policy ALL_SSD
Set storage policy ALL_SSD on /ptmp

4. Verify that the storage policy has been changed.

$ hdfs storagepolicies -getStoragePolicy -path /ptmp
The storage policy of /ptmp:
BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}

6.7 Change the Default HDFS Storage Policy

About this task

HDFS is configured to utilize both DISK and SSD in the hdfs-site.xml file. The default storage policy is HOT, therefore only DISK is used on Urika-GX. It should be noted that the HDFS UI reports the combined (DISK and SSD) space available to HDFS. By default, Urika-GX ships pre-configured with an /ALL_SSD/HDFS directory and a /ONE_SSD/HDFS directory, with the associated storage policies. This procedure describes how to use SSDs for HDFS storage.

Procedure

1. Log on to a login node.


2. Become the user hdfs.

$ su - hdfs

3. Set the storage policy manually to ONE_SSD or ALL_SSD for that HDFS directory using the hdfs storagepolicies command.
WARNING: Spark scratch space (spark.local.dir) is also located on the SSDs. Setting the HDFS policy to ONE_SSD or ALL_SSD will reduce the scratch space available to Spark applications.

$ hdfs storagepolicies -setStoragePolicy -path path -policy policy

4. Verify that the storage policy has been changed.

$ hdfs storagepolicies -getStoragePolicy -path /ptmp


7 Apache Spark Support

Apache™ Spark™ is a fast and general engine for data processing. It provides high-level APIs in Java, R, Scala, and Python, and an optimized engine. Spark core and ecosystem components currently supported on the Urika-GX system include:
● Spark Core and Resilient Distributed Datasets (RDDs) - Spark Core provides distributed task dispatching, scheduling, and basic I/O functionalities.
● Spark SQL, DataSets, and DataFrames - The Spark SQL component is a layer on top of Spark Core for processing structured data.
● Spark Streaming - The Spark Streaming component leverages Spark Core's fast scheduling capabilities to perform streaming analytics.
● MLlib Machine Learning Library - MLlib is a distributed machine learning framework on top of Spark.
● GraphX - GraphX is a distributed graph processing framework on top of Spark. It provides an API for expressing graph computations.

This section provides a quick guide to using Apache Spark on Urika-GX. Please refer to the official Apache Spark documentation for detailed information about Spark, as well as documentation of the Spark APIs, programming model, and configuration parameters.

Run Spark Applications

The Urika-GX software stack includes Spark configured and deployed to run under Mesos. Mesos on Urika-GX is configured with three Mesos masters using ZooKeeper. To connect to Mesos, Spark's master is set to:

mesos://zk://zoo1:2181,zoo2:2181,zoo3:2181/mesos

This is the default setting on Urika-GX and is configured via the Spark start up scripts installed on the system. Spark on Urika-GX uses coarse-grained mode by default, but fine-grained mode can be enabled by setting spark.mesos.coarse to false in SparkConf.

To launch Spark applications or interactive shells, use the Spark launch wrapper scripts in /opt/cray/spark2/default/bin on the login nodes. These scripts will be located in the user's path as long as the appropriate Spark module is loaded (it will be spark/2.3.0 by default when users log in to the login nodes). These wrapper scripts include:
● spark-shell
● spark-submit
● spark-sql
● pyspark


● sparkR
● run-example

By default, spark-shell will start a small, 32-core interactive Spark instance to allow small-scale experimentation and debugging. To request a smaller or larger instance, pass the --total-executor-cores No_of_Desired_cores command-line flag to spark-shell. Memory allocated to Spark executors and drivers can be controlled with the --driver-memory and --executor-memory flags. By default, 16 gigabytes are allocated to the driver, and 96 gigabytes are allocated to each executor, but this will be overridden if a different value is specified via the command-line, or if a property file is used. Further details about starting and running Spark applications are available at http://spark.apache.org
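For example, a larger interactive instance with non-default memory settings could be requested as follows, using the flags described above; the values shown are arbitrary and should be sized to the job at hand:

$ spark-shell --total-executor-cores 64 --executor-memory 32g --driver-memory 8g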

Build Spark Applications

Urika-GX ships with Maven installed for building Java applications (including applications utilizing Spark's Java APIs), and the Scala Build Tool (sbt) for building Scala applications (including applications using Spark's Scala APIs). To build a Spark application with these tools, add a dependency on Spark to the build file. For Scala applications built with sbt, add this dependency to the .sbt file, as in the following example:

scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"

For Java applications built with Maven, add the necessary dependency to the pom.xml file, as in the following example:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.2.0</version>
</dependency>

For detailed information on building Spark applications, please refer to the current version of Spark's programming guide at http://spark.apache.org.

Conda Environments

When the system is running in the default mode, PySpark on Urika-GX is aware of Conda environments. If there is an active Conda environment (the name of the environment is prepended to the Unix shell prompt), the PySpark shell will detect and utilize the environment's Python. To override this behavior, manually set the PYSPARK_PYTHON environment variable to point to the preferred Python. For more information, see Enable Anaconda Python and the Conda Environment Manager on page 29.

When the system is running in the secure mode, Spark jobs (running on Kubernetes) are not aware of Conda environments or user Python versions.
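As a sketch, the override can be set from the shell before launching PySpark; the Python path shown here is a placeholder for the preferred installation:

$ export PYSPARK_PYTHON=/path/to/preferred/python
$ pyspark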

Spark Configuration Differences

Spark's default configurations on Urika-GX have a few differences from the standard Spark configuration:
● Changes to improve execution over a high-speed interconnect - The presence of the high-speed network on the system changes some of the tradeoffs between compute time and communication time. Because of this, the default setting of spark.shuffle.compress has been changed to false and that of spark.locality.wait has been changed to 1. This results in improved execution times for some applications. If an application is running out of memory or temporary space, try changing spark.shuffle.compress back to true.
● Increases to default memory allocation - Spark's standard default memory allocation is 1 gigabyte to each executor, and 1 gigabyte to the driver. Due to the large memory nodes, these defaults were changed to 96 gigabytes for each executor and 16 gigabytes for the driver.
● Mesos coarse-grained mode - Urika-GX ships with this mode enabled, as coarse-grained mode significantly lowers startup overheads.

Limitations

Spark shells using Kubernetes (i.e., those launched under the secure service mode) are limited to 16 cores and 60 GiB of memory, and this cannot be overridden at the command line. This is due to the lack of native Spark shell support in the Spark on Kubernetes project, for which Cray has provided a workaround in this release.

7.1 Monitor Spark Applications

Urika-GX enables monitoring individual Spark applications as well as the list of completed Spark applications via Spark web UIs. Use the urika-service-mode command to check the service mode the system is currently running in, as that dictates the availability of Spark web UIs. For more information, refer to the urika-service-mode man page and Urika-GX Service Modes on page 7.

Monitoring Individual Spark Applications

Every Spark application launches a web UI at port 4040. This UI displays useful information about the Spark application, such as:
● a list of scheduler stages and tasks
● a summary of Spark Resilient Distributed Dataset (RDD) sizes and memory usage
● environmental information
● information about the active/running executors

This UI can be accessed at http://hostname-login1:4040 or http://hostname-login2:4040. If multiple applications are launched, subsequent applications will use ports 4041, 4042, 4043, and so on.

View Completed Spark Applications

The Spark History Server displays information about completed Spark applications. Access the Spark History Server UI via the Urika-GX Applications Interface. Though this is the recommended method of accessing the Spark History Server UI, it can also be accessed via the port it runs on, i.e. at http://hostname-login1:18080 or http://hostname-login2:18080


Figure 5. Urika-GX Applications Interface

Figure 6. Spark History Server UI

The preceding web UIs contain custom Cray enhancements that link Spark tasks in the UIs to Grafana dashboards that display compute node system metrics during the tasks' executions. These can be accessed by clicking links in the executor ID/host column in the tasks table on the Stage tab, or by selecting the Compare check boxes of multiple tasks in the task table and clicking the Compare link at the top of the table.

7.2 Remove Temporary Spark Files from SSDs

Prerequisites

This procedure requires root privileges.

About this task

Spark writes temporary files to the SSDs of the compute nodes that the Spark executors run on. Ordinarily, these temporary files are cleaned up by Spark when its execution completes. However, sometimes Spark may fail to fully clean up its temporary files, such as when the Spark executors are not shut down correctly. If this happens too many times, or with very large temporary files, the SSDs may begin to fill up. This can cause Spark jobs to fail or slow down. Urika-GX checks for any idle nodes once per hour, and cleans up any left over temporary files. This is handled by a cron job running on one of the login nodes that executes the /usr/sbin/cleanupssds.sh script once per hour. Follow the instructions in this procedure if this automated clean up ever proves to be insufficient.

Procedure

1. Log on to one of the login nodes as root.

2. Kill all the processes of running Spark jobs.

3. Execute the /usr/sbin/cleanupssds.sh script.

# /usr/sbin/cleanupssds.sh

7.3 Obtain Additional Temporary Space for Running Spark Jobs

Prerequisites

Ensure that the login node is accessible.

About this task

Temporary directories can be configured to use both the SSDs and HDDs to provide additional temporary space for running large Spark jobs. Although using a combination of SSDs and HDDs provides additional flexibility for Spark jobs that require large amounts of shuffle space, it is important to note that using just SSDs for running Spark jobs provides optimal performance.


Procedure

1. Log on to a login node, such as login1.

$ ssh login1

2. Create a file named spark_local_dirs.hdd in the home directory, as shown in the following example:

$ echo true >> /home/users/username/spark_local_dirs.hdd

The preceding example creates a file named spark_local_dirs.hdd and adds true to its contents. If the spark_local_dirs.hdd file exists in the home directory, additional temporary scratch space will be used while running Spark jobs. On the other hand, if this file does not exist in the home directory, the default option for temporary space will be used, i.e., just the SSDs will be used for running Spark jobs. Delete the spark_local_dirs.hdd file if it is required to revert to the default settings, using the following command:

$ rm /home/users/username/spark_local_dirs.hdd

7.4 Enable Anaconda Python and the Conda Environment Manager

About this task

In addition to the default system Python, Urika-GX also ships with the Anaconda Python distribution version 4.1.1, including the Conda package and environment manager. Users can enable and/or disable Anaconda for their current shell session by using environment modules.

Procedure

1. Log on to a login node. nid00030 is used as an example for a login node in this procedure.

2. Load the analytics module

$ module load analytics

3. Allocate resources, using workload management specific commands.
Example for allocating resources using Slurm:

$ salloc -N numberOfResources

Example for allocating resources using PBS Pro:

$ qsub -I -lnodes=numberOfResources
$ module load analytics
$ module load openmpi/gcc/64/3.0.0

The path shown in the preceding example for loading the openMPI module depends on the system.

4. Load the anaconda3 module.


[user@nid00030 ~]$ module load anaconda3

Loading the anaconda3 module will make Anaconda Python the default Python, and enable Conda environment management.

5. Create a Conda environment. The following example creates a Conda environment with scipy and all of its dependencies loaded:

[user@nid00030 ~]$ conda create --name scipyEnv scipy

IMPORTANT: Use the conda config --add envs_dirs path_to_directory command if it is required to set an alternate environments directory for Conda. path_to_directory must be a directory that is mounted within the container. This is particularly useful when the home directory space is limited.

6. Activate the Conda environment.

[user@nid00030 ~]$ source activate scipyEnv

7. Verify that the name of the environment is prepended to the shell prompt to ensure that the Conda environment has been activated. In the following example, (scipyEnv) has been prepended in the prompt, which indicates that the Conda environment has been activated.

(scipyEnv) [user@nid00030 ~]$

Once the Conda environment has been activated, Python and PySpark will both utilize the selected environment. If it is not required to have PySpark utilize the environment, manually set PYSPARK_PYTHON to point to a different Python installation.
● To deactivate a Conda environment, use source deactivate:

(scipyEnv) [user@nid00030 ~]$ source deactivate

● To disable Anaconda and Conda, and switch back to the default system Python, unload the module:

(scipyEnv) [user@nid00030 ~]$ module unload anaconda3

For more information about Anaconda, refer to https://docs.anaconda.com. For additional information about the Conda environment manager, please refer to http://conda.pydata.org/docs/

7.5 Provide Kerberos Credentials to Spark

Prerequisites

Ensure that the Kerberos and Spark services are running.


About this task

Under the secure mode, Spark uses Kerberos to authenticate users with HDFS, thus allowing users to securely access their data on HDFS. While this is managed automatically for users on Urika-GX, it can also be done manually if required.

Procedure

1. Manually specify either of the following pairs of arguments to the spark-shell or spark-submit commands when launching a Spark shell or submitting a Spark job.
● The --principal PRINCIPAL and --keytab KEYTAB_FILE arguments
● The --conf NAME=VALUE arguments with --conf spark.yarn.principal=PRINCIPAL and --conf spark.yarn.keytab=KEYTAB_FILE
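For illustration, a job submission using the first pair of arguments might look like the following sketch; the application class, JAR file, principal, and keytab path are placeholders:

$ spark-submit --class com.example.MyApp \
  --principal user@REALM --keytab /path/to/user.keytab \
  myapp.jar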

2. Ensure that the log output contains the values that were supplied.

INFO security.UserGroupInformation: Login successful for user user@REALM using keytab file /path/to/keytab

An error message similar to the following will be returned if the credentials are incorrect:

Exception in thread "main" java.io.IOException: Login failure for user@REALM from key tab /path/to/keytab: \ javax.security.auth.login.LoginException: Unable to obtain password from user

7.6 Redirect a Spark Job to a Specific Directory

About this task

This procedure provides instructions for redirecting Spark event logs to a directory of choice.

Procedure

1. Log on to a login node.

2. Execute the spark-shell command with the --conf spark.eventLog.dir argument, specifying the directory of choice.

$ spark-shell --conf spark.eventLog.dir=hdfs:///user/userName/sparkHistory

The preceding Spark job will not be displayed on the Spark History Server.


7.7 Modify the Default Number of Maximum Spark Cores

Prerequisites

Ensure that Spark is not running, using the urika-state command. For more information, see the urika-state man page.

About this task

Defaults for spark-shell and spark-submit can be modified via the --total-executor-cores NUM_CORES flag.

Procedure

Change the default number of maximum Spark cores using either of the following mechanisms:

● Modify the default Spark maximum cores via Jupyter Notebook
1. Access the Jupyter Notebook UI at http://hostname-login1:7800 or via the Urika-GX Applications Interface at http://hostname.
2. Create a new notebook with the following content, replacing NUM_CORES with the desired number of maximum Spark cores.
○ Example for PySpark:

from pyspark import SparkContext, SparkConf

sc.stop()
sc = SparkContext(conf=SparkConf().set("spark.cores.max", "NUM_CORES"))

○ Example for Scala:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

sc.stop()
val conf = new SparkConf().set("spark.cores.max", "NUM_CORES")
val sc = new SparkContext(conf)

○ Example for SparkR:

sparkR.session(spark.cores.max = "NUM_CORES")

● Change the default number of maximum Spark cores via the command line
1. Log on to the login node.

$ ssh login1

2. Set spark.cores.max to the desired value by executing the following commands, depending on the type of shell used.
○ Example for Scala:

spark-shell
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
sc.stop()
val conf = new SparkConf().set("spark.cores.max", "NUM_CORES")
val sc = new SparkContext(conf)

○ Example for PySpark:

pyspark
from pyspark import SparkContext, SparkConf
sc.stop()
sc = SparkContext(conf=SparkConf().set("spark.cores.max", "NUM_CORES"))

○ Example for SparkR:

sparkR
sparkR.session(spark.cores.max = "NUM_CORES")

7.8 Execute Spark Jobs on Kubernetes

Spark jobs run inside containers, which are managed via Kubernetes on the Urika-GX system. This section provides some examples of executing Spark jobs, retrieving output, viewing logs, etc. The system needs to be running in the secure mode and the user needs to be logged on to a login node to run the examples shown in this section.

Running a Spark Pi Example Job

The following example shows how to run a simple Spark Pi job inside a container. It uses spark.app.name as the Spark job's name.

$ spark-submit --class org.apache.spark.examples.SparkPi \
  --conf spark.app.name=spark-pi \
  local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar

The path to the JAR file must be relative to the path inside the container, not the path that exists on the system. Inside the container, the Spark home directory is /opt/spark instead of /opt/cray/spark2/default. The preceding command produces output similar to the following (only a portion of the output is shown below for brevity):

2018-02-26 16:16:47 INFO HadoopStepsOrchestrator:54 - Hadoop Conf directory: /etc/hadoop/conf
2018-02-26 16:16:47 INFO HadoopConfBootstrapImpl:54 - HADOOP_CONF_DIR defined. Mounting Hadoop specific files
2018-02-26 16:16:48 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-02-26 16:16:48 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
    pod name: spark-pi-1519683406605-driver
    namespace: username
    labels: spark-app-selector -> spark-027d506894bd4b2ca86692f03f9fab5a, spark-role -> driver
    pod uid: bceaf8b2-1b42-11e8-8b39-001e67d33475
    creation time: 2018-02-26T22:16:48Z
    service account name: spark
    volumes: spark-local-dir-0-tmp, hadoop-properties, spark-token-pxz79
    node name: N/A
    start time: N/A
    container images: N/A


    phase: Pending
    status: []
......

The output of the Spark Driver can be viewed by executing kubectl logs pod_name and looking at the pod's logs. The pod's name is displayed near the top of the console output, as shown in the preceding example. Execute the kubectl logs pod_name command and grep the output, as shown below:

$ kubectl logs spark-pi-1519683406605-driver | grep "is roughly"
Pi is roughly 3.1351356756783786

Spark Executor pods are cleaned up and deleted after they finish running. Therefore, their output is not accessible.

Running a PySpark Pi Example Job

A PySpark Pi example job is very similar to the Scala Spark Pi job, but information is specified slightly differently. If there are any JAR files, they should be provided via the --jars flag.

$ bin/spark-submit --conf spark.app.name=pyspark-pi \
  --jars local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar \
  local:///opt/spark/examples/src/main/python/pi.py

Execute the kubectl logs pod_name command and grep the output, as shown below:

# kubectl logs pyspark-pi-1519684161476-driver | grep "is roughly"
Pi is roughly 3.141600

Using HDFS

The HADOOP_CONF_DIR parameter will automatically be set to the appropriate value for the current user during Spark start up.

How to Run Jobs and Use the Resource Staging Server

Simply provide the location of the Spark JAR and files on the local file system and they will be loaded into the Spark Resource Staging Server, so that the resources will be available inside the Spark container.

$ bin/spark-submit --class TriangleCounts --conf spark.app.name=spark-triangles \
  /home/users/builder/nid00006/workspace/socrates-cactus-spark-tests/target/scala-2.11/spark-tests_2.11-1.0.jar \
  /user/builder/datasets/cactus-spark-triangles/small-triangles.txt

Check the results using the pod name.

$ kubectl logs spark-triangles-1520878896538-driver | grep "riangles:"
numTriangles: 10624230
Number of triangles: 3541410


Resource Configuration for Spark Jobs The following Spark configuration settings may be used to control the amount of resources that Spark will request from Kubernetes for any job launched using spark-submit under the secure service mode, i.e., under Kubernetes: Table 5. Spark Configuration Settings and their Default Values

● spark.executor.instances (default: 5) - Number of Spark executor containers that will be launched to provide job execution under Kubernetes.
● spark.executor.cores (default: 1) - Number of cores requested per executor.
● spark.executor.memory (default: 96g) - Amount of memory requested per executor.
● spark.driver.cores (default: 1) - Number of cores requested for the driver. This should be increased if a job does a lot of work in the driver, e.g., aggregations or result collection.
● spark.driver.memory (default: 16g) - Amount of memory requested per driver.
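These settings can be overridden per job on the spark-submit command line; the values below are arbitrary examples, and the example JAR path is the Spark Pi example shipped inside the container (shown earlier in this chapter):

$ spark-submit --conf spark.executor.instances=8 \
  --conf spark.executor.cores=2 \
  --conf spark.executor.memory=48g \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar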

CAUTION: Please be aware that due to Kubernetes's service scheduling mechanism, there are always some services running on all the nodes, using fractional CPU cycles. This may block any requests that attempt to use the maximum number of cores on the system because a small fraction of those cores is already allocated. If a job is not showing any progress, find out the current status of the associated Kubernetes pod by running kubectl describe pod pod-name. If there are insufficient resources to launch a job, the system will return a message similar to the following at the end of the output:

Warning FailedScheduling 55s (x8 over 1m) default-scheduler 0/16 nodes are available: 1 PodToleratesNodeTaints, 16 Insufficient cpu.

7.9 Multi-tenant Spark Thrift Server on Urika-GX

Urika-GX enables tenants to have a tenant-isolated Hive MetaStore, in addition to the Spark Thrift Server. The metastore allows schema to be shared between spark-sql shells and Spark Thrift clients. The Spark Thrift Server is an 'on demand' service, meaning that users launch it as required and stop it when it is not needed. The tenant metastore is a pod comprised of two containers running within the tenant's namespace. One of these containers runs the Hive metastore service, while the other runs a Postgres database.

Hive Metastore Management

Tenant specific metastores are created as part of the tenant creation process. Tenants can use the following commands within their VM to interact with the metastore:
● check-metastore - Checks the status of the metastore, indicates to the user whether or not it is running, and reports the IP address of the metastore.
● start-metastore - Starts the tenant metastore, if there is not one already running.
● stop-metastore - Stops the tenant metastore service.

CAUTION: The metastore must be running in order for the Thrift Server to launch successfully.

Spark Thrift Server Management

Tenants can use the following commands within their VM to interact with the Spark Thrift Server:
● start-thriftserver - Configures the tenant specific Spark Thrift Server and launches the Spark Thrift Server Spark job in the tenant's Kubernetes namespace.
● stop-thriftserver - Stops the tenant specific Spark Thrift Server job and does the requisite cleanup.

The Spark Thrift Server is a shared service within a tenant, therefore only one Spark Thrift Server instance per tenant should be running.
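A typical session from within the tenant VM, using only the commands listed above, might look like the following sketch; because the Thrift Server requires a running metastore, the metastore is checked and started first:

$ check-metastore
$ start-metastore
$ start-thriftserver
(run queries against the Spark Thrift Server)
$ stop-thriftserver
$ stop-metastore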

Limitations
The current version of Spark on Urika-GX lacks support for long-running Spark jobs with secure HDFS. As a result, a Spark Thrift Server instance can only run for as long as its HDFS delegation tokens remain valid, which is currently 1 day. After this time, the Spark Thrift Server instance can no longer query data in secure HDFS and must be restarted using the stop/start commands above.

Authentication/Authorization
Tenant users authenticate to the Spark Thrift Server using their LDAP credentials. The Spark Thrift Server relies on tenant membership for authorizing users. Therefore, if a user is added as a member to a tenant via the ux-tenant-add-user command, that user is automatically granted access to the Spark Thrift Server. No separate authorization mechanisms are necessary.

Administering Tenant Spark Thrift Server Components
It may prove useful for a system administrator to be able to check the status of a tenant's Spark Thrift Server and/or metastore. This can be done using kubectl commands from the login node.
To see the state of the Kubernetes pod containing the Spark Thrift Server driver:

root@login1# kubectl get pods -n TENANT-VM thrift-server-TENANT-VM --show-all


To see if the Spark Thrift Server has fully initialized, assuming the previous command showed the pod was "Running":

root@login1# kubectl logs -n TENANT-VM thrift-server-TENANT-VM | grep 'Started ThriftHttpCLIService'

To see the state of the Kubernetes pod containing the Metastore server:

root@login1# kubectl get pods -n metastore-TENANT-VM --show-all

The status should be 'Running' and the ready count should show '2/2' (indicating both containers were successfully started).
To see if the Metastore server has fully initialized, assuming the previous command showed the pod was "Running":

root@login1# kubectl logs -c hive -n metastore-TENANT-VM | grep 'schemaTool completed'


8 Use Apache Mesos on Urika-GX

Apache™ Mesos™ acts as the primary resource manager on the Urika-GX platform. It is a cluster manager that provides efficient resource isolation and sharing across distributed applications and/or frameworks. It lies between the application layer and the operating system and simplifies the process of managing applications in large-scale cluster environments, while optimizing resource utilization.

Architecture
Major components of a Mesos cluster include:
● Mesos agents/slaves - Agents/slaves are the worker instances of Mesos that denote the resources of the cluster.
● Mesos masters - The master manages agent/slave daemons running on each cluster node and implements fine-grained sharing across frameworks using resource offers. Each resource offer is a list of free resources on multiple agents/slaves. The master decides how many resources to offer to each framework according to an organizational policy, such as fair sharing or priority.
By default, Urika-GX ships with three Mesos masters and a quorum size of two. At least two Mesos masters must be running at any given time to ensure that the Mesos cluster functions properly. Administrators can use the urika-state and urika-inventory commands to check the status of Mesos masters and slaves. Administrators can also check the status of Mesos by pointing a browser at http://hostname-login1:5050 and ensuring that it is up. In addition, executing the ps -ef | grep mesos command on the login nodes displays the running Mesos processes.

Components that Interact with Mesos
● Frameworks - Frameworks run tasks on agents/slaves. The Mesos master offers resources to frameworks that are registered with it. A framework decides either to accept or reject an offer. If a framework accepts the offer, Mesos allocates the resources and the framework scheduler then schedules the respective tasks on those resources. Each framework running on Mesos consists of two components:
○ A scheduler that is responsible for scheduling the tasks of a framework's job within the accepted resources.
○ An executor process that is launched on agent/slave nodes to run the framework's tasks.
● In addition to the aforementioned components, Urika-GX also supports Marathon and mrun (the Cray-developed application launcher) as ecosystem components of Mesos. mrun is built upon Marathon commands for ease of use and for running data-secure distributed applications. The mrun command sets up resources for CGE and HPC jobs.
On Urika-GX, all tasks launched directly from Marathon need to be run as user marathon and cannot be run as any other user ID. If a user tries to launch applications/tasks as a non-Marathon user, the application will fail with the error "Not authorized to launch as userID". This behavior has no impact on Hadoop, Spark, mrun and/or CGE jobs.


Role of HAProxy
Requests received on the login nodes for the following services are proxied via HAProxy to the Urika-GX compute nodes:
● YARN Resource Manager
● HDFS NameNode
● Secondary HDFS NameNode
● Hadoop Application Timeline Server
● Hadoop Job History Server
● Spark History Server
● Oozie
For services like the Mesos masters and Marathon, three instances run but only one of them is the active leader. Requests received by the login node are proxied to the currently active leader. If the leader runs into issues, one of the backup leaders takes over and requests are proxied to the new leader. HAProxy can be configured to provide SSL. Some possible solutions are documented in the security section of the "Urika®-GX System Administration Guide".

Viewing Mesos Metrics from the CLI
Use the Cray-developed urika-mesos_metrics script to view Mesos-related details. This script is located in the /root/urika-tools/urika-cli directory on the SMW and needs to be run as root. Following is a sample output of the urika-mesos_metrics script:

# urika-mesos_metrics
HTTP/1.1 200 OK
Proceeding further...
******* MESOS CLUSTER METRICS **********
Total cpus : 984
Used cpus : 0
Master elected : 1
******* MESOS FRAMEWORK METRICS **********
Frameworks active : 1
Frameworks connected : 1
Frameworks disconnected: 0
Frameworks inactive: 0
******* MESOS SLAVE METRICS **********
Slaves active : 41
Slaves connected : 41
Slaves disconnected: 0
Slaves inactive: 0

Troubleshooting information
● If the system indicates that the mesos-slave process is running, but it is not possible to schedule jobs, execute the following commands as root on each of the agent/slave nodes:

# systemctl stop urika-mesos-slave
# rm -vf /var/log/mesos/agent/meta/slaves/latest
# systemctl start urika-mesos-slave


● If the Mesos UI displays orphaned Mesos tasks, refer to Diagnose and Troubleshoot Orphaned Mesos Tasks on page 126 to debug the issue.
● Mesos logs are located at /var/log/mesos, whereas log files of Mesos frameworks are located under /var/log/mesos/agent/slaves/ on the node(s) the service runs on.
For more information about Mesos, visit http://mesos.apache.org.

8.1 Access the Apache Mesos Web UI

Prerequisites
● Before performing this procedure, use the urika-state script to verify that the Mesos service is running.
● Check the service mode by executing urika-service-mode to ensure that the system is running in the service mode required for Mesos to run.

About this task
The Mesos web UI can be used to monitor components of the Mesos cluster, such as the Mesos slaves, aggregated resources, and frameworks. Do not launch applications through Mesos directly, because all the frameworks are pre-configured on Urika-GX. Only a few frameworks (Spark and Marathon) are pre-configured to authenticate with Mesos.

Procedure

1. Access the Mesos UI using one of the following mechanisms:
● Point a browser at the Urika-GX Applications Interface at http://hostname-login1 and then select the Mesos icon. This is the recommended approach.
● Point a browser at http://hostname-login1:5050 or http://hostname-login2:5050

2. Enter a username and the Mesos secret in the Authentication Required pop-up's UserName and Password fields, respectively. The Mesos secret can be retrieved from the /security/secrets/userName.mesos file, which is located on the SMW. After logging on to the Mesos web UI, users can view tasks in the summary page as well as resources reserved for that particular user. crayadm and root are global Mesos users that can view all the running frameworks and resource usage.

8.2 Use mrun to Retrieve Information About Marathon and Mesos Frameworks
Cray has developed the mrun command for launching applications. mrun enables running parallel jobs on Urika-GX using resources managed by Mesos/Marathon. In addition, this command enables viewing the currently active Mesos frameworks and Marathon applications and enables specifying how mrun should redirect STDIN. It provides extensive details on running Marathon applications and also enables cancelling/stopping currently active Marathon applications. The Cray Graph Engine (CGE) uses mrun to launch jobs under the Marathon framework on the Urika®-GX system.
CAUTION: The mrun command cannot be executed within a tenant VM or while the system is operating in the secure service mode. Both the munge and ncmd system services must be running for mrun/CGE to work. If either service is stopped or disabled, mrun will no longer be able to function.

The mrun command needs to be executed from a login node. Some examples of using mrun are listed below:

Launch a job with mrun

$ mrun /bin/date
Wed Aug 10 13:31:51 CDT 2016

Display information about frameworks, applications and resources
Use the --info option of the mrun command to retrieve a quick snapshot view of Mesos frameworks, Marathon applications, and available compute resources.

$ mrun --info
Active Frameworks:
  IBM Spark Shell  : Nodes[10] CPUs[ 240] : User[builder]
  Jupyter Notebook : Nodes[ 0] CPUs[   0] : User[urika-user]
  marathon         : Nodes[20] CPUs[ 480] : User[root]
Active Marathon Jobs:
  /mrun/cge/user.dbport-2016-133-03-50-28.235572 : Nodes[20] CPUs[320/480] : user:user cmd:cge-server
Available Resources:
  : Nodes[14] CPUs[336] idle nid000[00-13]
  : Nodes[30] CPUs[480] busy nid000[14-29,32-45]
  : Nodes[ 2] CPUs[???] down nid000[30-31]

In the example output above, notice that CPUs[320/480] indicates that while the user only specified mrun --ntasks-per-node=16 -N 20 (meaning the application is running on 320 CPUs), mrun intends ALL applications to have exclusive access to each node they run on, and thus asks Mesos for ALL the CPUs on each node, not just the number of CPUs per node the user requested.

Retrieve a summary of running Marathon applications
Use the --brief option of the mrun command to obtain a more concise report on just the running Marathon applications and node availability.

$ mrun --brief
N:20 CPU:320/480 /mrun/cge/user.dbport-2016-133-03-50-28.235572 cge-server -d /mn...
Status: Idle:14 Busy:30 Unavail:2


Retrieve information on a specific Marathon application
Use the --detail option of the mrun command to obtain additional information on a specific Marathon application. The application ID needs to be specified with this option.

$ mrun --detail /mrun/cge/user.dbport-2016-133-03-50-28.235572
Active Frameworks:
  IBM Spark Shell  : Nodes[10] CPUs[ 240] : User[builder]
  Jupyter Notebook : Nodes[ 0] CPUs[   0] : User[urika-user]
  marathon         : Nodes[20] CPUs[ 480] : User[root]
Active Marathon Jobs:
  /mrun/cge/user.dbport-2016-133-03-50-28.235572 : Nodes[20] CPUs[320/480] : user: cmd:cge-server
  : [nid00032.urika.com]: startedAt: 2016-05-12T17:05:53.573Z
  : [nid00028.urika.com]: startedAt: 2016-05-12T17:05:53.360Z
  : [nid00010.urika.com]: startedAt: 2016-05-12T17:05:53.359Z
  : [nid00007.urika.com]: startedAt: 2016-05-12T17:05:53.397Z
  : [nid00001.urika.com]: startedAt: 2016-05-12T17:05:53.384Z
  : [nid00019.urika.com]: startedAt: 2016-05-12T17:05:53.383Z
  ......

The additional information provided by the --detail option includes a list of all the node names the application is running on, and the time at which the application was launched on each of those nodes.

Stop, cancel or abort a running Marathon application
Use the --cancel option of the mrun command to stop, cancel or abort a running Marathon application. The application ID needs to be specified with this option.

$ mrun --cancel /mrun/cge/user.3750-2016-133-20-01-07.394582
Thu May 12 2016 15:01:21.284876 CDT[][mrun]:INFO:App /mrun/cge/user.3750-2016-133-20-01-07.394582 has been cancelled

If the application has already been cancelled, has completed, or does not exist, the following message is displayed:

$ mrun --cancel /mrun/cge/user.3750-2016-133-20-01-07.394582
App '/mrun/cge/user.3750-2016-133-20-01-07.394582' does not exist

CAUTION: The root user is allowed to use mrun --cancel to kill any Marathon-started job. All other users can only kill the Marathon jobs they launched using the mrun command. If a non-root user tries to use mrun --cancel to cancel a Marathon job that was not launched by that user using mrun, the system returns the following message:

mrun: error: Users may only cancel their own mrun jobs


Retrieve a list of nodes, CPU counts and available memory
Use the --resources option of the mrun command to obtain a complete list of nodes, CPU counts, and available memory.

$ mrun --resources
 NID  HEX   NODENAME  CPUs     MEM  STAT
   0  0x00  nid00000    32  515758  idle
   1  0x01  nid00001    32  515758  busy
   2  0x02  nid00002    32  515758  idle
   3  0x03  nid00003    32  515758  busy
   4  0x04  nid00004    32  515758  busy
   5  0x05  nid00005    32  515758  idle
   6  0x06  nid00006    32  515758  idle
   7  0x07  nid00007    32  515758  busy
   8  0x08  nid00008    32  515758  busy
   9  0x09  nid00009    32  515758  busy
  10  0x0a  nid00010    32  515758  busy
  11  0x0b  nid00011    32  515758  busy
  12  0x0c  nid00012    32  515758  idle
  13  0x0d  nid00013    32  515758  idle
  14  0x0e  nid00014    32  515758  busy
  15  0x0f  nid00015    32  515758  busy
  16  0x10  nid00016    36  515756  idle
  17  0x11  nid00017    36  515756  idle
  18  0x12  nid00018    36  515756  busy
  19  0x13  nid00019    36  515756  busy
  20  0x14  nid00020    36  515756  idle
  21  0x15  nid00021    36  515756  idle
  22  0x16  nid00022    36  515756  busy
  23  0x17  nid00023    36  515756  idle
  24  0x18  nid00024    36  515756  idle
  25  0x19  nid00025    36  515756  busy
  26  0x1a  nid00026    36  515756  idle
  27  0x1b  nid00027    36  515756  busy
  28  0x1c  nid00028    36  515756  busy
  29  0x1d  nid00029    36  515756  idle
  30  0x1e  nid00030     0       0  unavail
  31  0x1f  nid00031     0       0  unavail
  32  0x20  nid00032    24  515758  busy
  33  0x21  nid00033    24  515758  idle
  34  0x22  nid00034    24  515758  idle
  35  0x23  nid00035    24  515758  idle
  36  0x24  nid00036    24  515758  idle
  37  0x25  nid00037    24  515758  idle
  38  0x26  nid00038    24  515758  busy
  39  0x27  nid00039    24  515758  idle
  40  0x28  nid00040    24  515758  idle
  41  0x29  nid00041    24  515758  idle
  42  0x2a  nid00042    24  515758  busy
  43  0x2b  nid00043    24  515758  busy
  44  0x2c  nid00044    24  515758  idle
  45  0x2d  nid00045    24  515758  idle

Node names that are marked as unavail are hidden from Mesos as available compute resources, such as the login node (nid00030). In the example above, some nodes have 24 CPUs/node, some have 32 CPUs/node, and some have 36 CPUs/node. While not a typical situation, mrun does support this configuration, and a command such as mrun -n 144 -N 4 ... would in fact be allowed to proceed, and would be limited to using 4 nodes on nid000[16-29], as they are the only ones with (144/4 = 36) CPUs per node.

Configuration Files
When mrun is invoked, it sets up internal default values for required settings. mrun then checks whether any system defaults have been configured in the /etc/mrun/mrun.conf file. An example mrun.conf file is shown below:

#
# (c) Copyright 2016 Cray Inc. All Rights Reserved.
#
# Anything after an initial hashtag '#' is ignored
# Blank lines are ignored.
#
#NCMDServer=nid00000
#MesosServer=localhost          # same as --host
#MesosPort=5050
#MarathonServer=localhost
#MarathonPort=8080
#DebugFLAG=False                # same as --debug
#VerboseFLAG=False              # same as --verbose
#JobTimeout=0-0:10:0            # ten minutes, same as --time
#StartupTimeout=30              # 30 seconds, same as --immediate
#HealthCheckEnabled=True        # Run with Marathon Health Checks enabled
#HCGracePeriodSeconds=5         # Seconds at startup to delay Health Checks
#HCIntervalSeconds=10           # Seconds between Health Check pings
#HCTimeoutSeconds=10            # Seconds to answer Health Check successfully
#HCMaxConsecutiveFailures=3     # How many missed health checks before app killed

If any of the lines above are uncommented, those values become the new defaults for every mrun invocation. Additionally, after the system /etc/mrun/mrun.conf file is loaded (if it exists), the user's private $HOME/.mrun.conf file is loaded (if it exists). The following items should be noted:
● Any settings in the user's $HOME/.mrun.conf file take precedence over any conflicting settings in the system /etc/mrun/mrun.conf file.
● Any command-line arguments that conflict with any pre-loaded configuration file settings take precedence for the current invocation of mrun.
For more information, see the mrun(1) man page.
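For illustration only, a user's $HOME/.mrun.conf could override a few of the defaults shown above; the values here are arbitrary examples rather than recommended settings:

# $HOME/.mrun.conf -- per-user overrides (example values only)
JobTimeout=0-1:00:0      # one hour, same as --time
StartupTimeout=60        # 60 seconds, same as --immediate
VerboseFLAG=True         # same as --verbose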

8.3 Clean Up Residual mrun Jobs

About this task
When the mrun --cancel command does not completely clean up running mrun-based jobs, the residual jobs need to be manually terminated, as described in this procedure.


Procedure

1. Launch an mrun job

$ mrun sleep 100&
[1] 1883

2. Verify the launched mrun job is running

$ mrun --brief
N: 1 CPU:   1/24  user1 /mrun/2016-215-16-07-43.702792 sleep 100...
N:32 CPU: 512/768 user2 /mrun/cge/user2.1025-2016-215-15-29-07.248656 -v /opt/cray/cge...
Status: Idle:8 Busy:33 Unavail:5

3. Locate the process ID of the mrun job

$ ps -fu $LOGNAME | grep mrun
user1  1883 19311  0 11:07 pts/8  00:00:00 /usr/bin/python /opt/cray/cge/2.0.1045_re5a05d9_fe2.0.3_2016072716_jenkins/bin/mrun sleep 100
user1  1961 19311  0 11:08 pts/8  00:00:00 grep --color=auto mrun

4. Send a signal to terminate the mrun job

$ kill 1883

5. Verify that the mrun job is no longer running, and the idle node count has increased

$ mrun --brief
N:32 CPU: 512/768 user2 /mrun/cge/user2.1025-2016-215-15-29-07.248656 -v /opt/cray/cge...
Status: Idle:9 Busy:32 Unavail:5

8.4 Launch an HPC Job Using mrun

Prerequisites
Use the urika-state command to ensure that both the Mesos and Marathon services are running and to check that the system is operating in a service mode that supports using mrun. For more information, see the urika-state man page and refer to Urika-GX Service Modes on page 7.

About this task
The mrun command can be used to launch HPC jobs on the Urika-GX system. The mrun command cannot be executed within a tenant VM.

Procedure

Launch the HPC job using the mrun command, specifying the number of nodes to allocate. In the following example, my_hpc_app.exe is used as an example for the name of the application to run.

$ mrun -n 32 -N 8 my_hpc_app.exe arg1 arg2 arg3


Refer to the mrun man page for more information, further examples, environment variables, configuration files and command-line option descriptions of the mrun command.

8.5 Manage Resources on Urika-GX
Mesos is used as the resource manager in the default service mode, whereas Kubernetes acts as the resource manager in the secure service mode. The resource management model of Mesos is different from that of traditional HPC schedulers. With traditional HPC schedulers, jobs request resources and the scheduler decides when to schedule and execute the job. Mesos, on the other hand, offers resources to frameworks that are registered with it. It is up to the framework scheduler to decide whether to accept or reject an offer. If a framework rejects an offer, Mesos continues to make new offers based on resource availability. A framework is an implementation of a computing paradigm, such as Spark, Hadoop, or CGE. For example, suppose a user submits a Spark job that requests 1000 cores to run. Spark registers as a framework with Mesos, and Mesos offers its available resources to Spark. If Mesos offers 800 cores, Spark can either accept or reject the offer. If Spark accepts the offer, the job is scheduled on the cluster. If it rejects the offer, Mesos continues to make new offers.

Mesos Frameworks on Urika-GX
When users submit jobs to a Mesos cluster, frameworks register with Mesos, and Mesos offers resources to the registered frameworks. Frameworks can either accept or reject a resource offer from Mesos. If the resource offer satisfies the resource requirements for a job, the framework accepts the resources and schedules the job on the slaves. If the resource offer does not satisfy the resource requirements, the framework can reject it but remains registered with Mesos. Mesos updates the resources it has at regular intervals (when an existing framework finishes running its job and releases the resources, or when some other framework rejects an offer) and continues to offer resources to registered frameworks.
Each Spark job is registered as a separate framework with Mesos, and Mesos makes separate resource offers for each submitted Spark job. Marathon is registered as a single framework with Mesos. Marathon provides a mechanism to launch non-framework applications to run under Mesos, enabling long-running services under Mesos such as databases and streaming engines. Cray has developed:
● the mrun command, which sets up resources for CGE and HPC jobs. For more information, see the mrun man page.
● scripts for setting up resources for YARN
These are submitted as applications to Marathon. Marathon negotiates for resources from Mesos, and they get their resources from Marathon.
Mesos tracks the frameworks that have registered with it. If jobs are submitted and there are no resources available, frameworks are not dropped. Instead, frameworks remain registered (unless manually killed by the user) and continue to receive updated resource offers from Mesos at regular intervals. As an example, consider a scenario where four Spark jobs are running and using all of the resources on the cluster, and a user submits a job with framework Y, which then waits for resources. As each Spark job completes its execution, it releases its resources and Mesos updates its resource availability. Mesos continues to give resource offers to framework Y, which can choose either to accept or reject them. If Y accepts the resources, it schedules its tasks. If Y rejects the resources, it remains registered with Mesos and continues to receive resource offers from Mesos.

Allocation of resources to Spark by Mesos
Suppose the spark-submit command is executed with the parameters --total-executor-cores 100 --executor-memory 80G, and each node has 32 cores. Mesos tries to use as few nodes as possible for the 100 cores requested, so in this case it starts Spark executors on 4 nodes (roundup(100 / 32)). Each executor has been requested to have 80G of memory. The default value for spark.mesos.executor.memoryOverhead is 10%, so 88G is allocated to each executor. In Mesos it can therefore be seen that 88 * 4 = 352 GB is allocated to the 4 Spark executors. For more information, see the latest Spark documentation at http://spark.apache.org/docs
Additional points to note:
● On Urika-GX, the Mesos cluster runs in High Availability mode, with 3 Mesos masters and Marathon instances configured with ZooKeeper.
● Unlike Marathon, Mesos does not offer any queue. Urika-GX scripts for flexing clusters and the mrun command do not submit their jobs unless they know the resource requirement is satisfied. Once a flex up request is successful, YARN uses its own queue for all the Hadoop workloads.
● Spark accepts resource offers with fewer resources than it requested. For example, if a Spark job wants 1000 cores but only 800 cores are available, Mesos will offer those 800 to the Spark job, and Spark will then choose to accept or reject the offer. If Spark accepts the offer, the job will be executed on the cluster. By default, Spark will accept any resource offer, even if the number of resources in the offer is much less than what the job requested. However, Spark users can control this behavior by specifying a minimum acceptable resource ratio; for example, they could require that Spark only accept offers of at least 90% of the requested cores. The configuration parameter that sets the ratio is spark.scheduler.minRegisteredResourcesRatio. It can be set on the command line with --conf spark.scheduler.minRegisteredResourcesRatio=N, where N is between 0.0 and 1.0 (see the sketch after this list).
● mrun and the flex scripts do not behave the way Spark behaves (as described in the previous bullet). mrun accepts two command-line options:
○ --immediate=XXX (default 30 seconds)
○ --wait (default False)
When a user submits an mrun job, if more resources are needed than Mesos currently has available, the command returns immediately, showing system usage and how many nodes are available vs. how many nodes are needed. If the user supplies the --wait option, mrun does not return, but instead continues to poll Mesos until enough nodes are available. mrun continues to poll Mesos for up to --immediate seconds before timing out. Finally, once Mesos informs mrun that there are enough resources available, mrun posts the job to Marathon. When the requested resources are not available, the flex scripts display the current resource availability and exit.
● With mrun, the exact need must be met. If the user asks for 8 nodes, all CPU and memory on 8 nodes must be free for Marathon to accept the offer on behalf of mrun.
● The Marathon API does not offer a way to ask whether the needs of a job can be fully satisfied before a request is submitted. Therefore, Mesos is queried for its resource availability.
● Users request resources from Mesos to give to YARN via the Cray-developed scripts for starting NodeManagers. The request is submitted to Marathon. This is called flex up. Once users get the requested resources, they can run their Hadoop jobs, Hive queries, or Oozie work-flows. Once these complete, users release the resources back to Mesos via the Cray flex scripts. Flex scripts require the exact number of nodes requested and cannot run with fewer resources. When the number of resources requested in a flex up request does not match the current number of resources available with Mesos, an error message is displayed indicating that the number of resources available is less than the number of requested resources and that the user can submit a new flex up request.
● If the system is loaded and other frameworks (e.g. Spark) keep submitting smaller jobs, the flex scripts may keep exiting if they do not receive the required number of nodes. This could lead to starvation of Hadoop jobs.
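The following sketch shows how the minimum acceptable resource ratio could be supplied when submitting a job; the JAR name is a hypothetical placeholder, the values are examples only, and other required options (such as the master URL) are omitted:

# Hypothetical job: request 1000 cores, but only start once at least
# 90% of the requested resources have been registered.
$ spark-submit \
    --total-executor-cores 1000 \
    --conf spark.scheduler.minRegisteredResourcesRatio=0.9 \
    my-analysis.jar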

8.6 Manage Long Running Services Using Marathon
Marathon is used by the Cray-developed command named mrun to allocate node resources from Mesos and launch application instances as needed. In addition, the Cray-developed scripts for starting a cluster of YARN NodeManagers are also launched through Marathon. Before using Marathon, ensure that the system is running in a service mode that allows use of this service. Execute the urika-state or urika-service-mode commands to check the service mode. For more information, refer to the urika-state or urika-service-mode man pages and see Urika-GX Service Modes on page 7.
CAUTION: Unless it is required to shut down YARN nodes, analytic applications that use the Cray-developed scripts for flexing a cluster should not be deleted through the Marathon UI, as doing so will lead to loss of YARN nodes.

On the Urika-GX system, there are always three Mesos masters and three Marathon instances running, one of which is the active leader. Requests received by the login node are proxied to the currently active leader. If the leader runs into issues, one of the backup leaders takes over and requests are proxied to the new leader. Access the Marathon web UI by selecting Marathon on the Urika-GX Applications Interface, located at http://hostname-login1. Though this is the recommended method of accessing Marathon, it can also be accessed at the port it runs on, i.e., at http://hostname-login1:8080 or http://hostname-login2:8080


Figure 7. Urika-GX Applications Interface

Figure 8. Marathon UI


Marathon also enables creating applications from the UI via the Create Application button, which displays the New Application pop-up:
Figure 9. Create an Application Using the Marathon UI

Figure 10. Marathon UI

For additional information about using Marathon, select the help icon (?) and then select Documentation.


8.7 Flex up a YARN sub-cluster on Urika-GX
Mesos is used as the main resource broker on Urika-GX, whereas Hadoop's Yet Another Resource Negotiator (YARN) is the default resource manager for launching Hadoop workloads. To have YARN and Mesos co-exist on the system, Urika-GX features a number of scripts that are designed for scaling a YARN cluster on Mesos. These scripts can expand or shrink a YARN cluster in response to events, as per configured rules and policies. The cluster remains under the control of Mesos even when the cluster under Mesos management runs other cluster managers. These scripts allow Mesos and YARN to co-exist and share resources, with Mesos as the resource manager for Urika-GX. Sharing resources between these two resource allocation systems improves overall cluster utilization and avoids statically partitioning resources. When a cluster is statically partitioned, a part of the cluster is reserved for running certain jobs. For example, a cluster may be statically partitioned to reserve X nodes out of a total of N nodes to run only Hadoop jobs at any point in time. Under this configuration, if no Hadoop jobs are running at a given time, the reserved nodes sit idle, which reflects inefficient resource utilization. Mesos helps avoid static partitioning and also helps ensure proper resource utilization at any given point in time.
Once a Hadoop job has been launched and completed, the reserved resources need to be released back to Mesos. This is when the Cray-developed urika-yam-flexdown script needs to be called. This script stops all the node managers of the named application and then deletes the Marathon application. If more than one application managed by the Cray-developed flex scripts is running at a time, this script does not stop the node managers of every application.

Cray-developed Scripts for Flexing Up/Flexing Down YARN Subcluster
The urika-yam-flexup and urika-yam-flexdown scripts are located in the /opt/cray/urika-yam/default/bin directory on the login nodes and can be executed on either of the two login nodes.
● To flex up, execute the urika-yam-flexup script, passing the number of nodes to be used as well as a unique name/identifier for the flex up request.

$ urika-yam-flexup --nodes Number_of_Nodes --identifier identifier_name --timeout timeoutInMinutes

For more information, see the urika-yam-flexup man page.
● To display the list of existing applications and the resources allocated to each application, execute the urika-yam-status script.

$ urika-yam-status

For more information, see the urika-yam-status man page.
● To flex down, execute the urika-yam-flexdown script.
○ Executing the urika-yam-flexdown script as a root user:

# urika-yam-flexdown --exact fullName

○ Executing the urika-yam-flexdown script as a non-root user:

$ urika-yam-flexdown --identifier name


CAUTION: The urika-yam-flexdown script needs to be passed the name of the Marathon application that was used earlier to flex up. If the name of the application is not provided, the script throws an error message and exits. If the application requested to flex down does not exist, the urika-yam-flexdown script throws an error message saying no such application exists. If executing this script as a root user, provide the complete/full name, as returned by the urika-yam-status command. If executing this script as a non-root user, specify the same identifier as that used when the flex up request was issued.
● To flex down all the nodes, execute the urika-yam-flexdown-all script as root.

# urika-yam-flexdown-all

For more information, see the urika-yam-flexdown-all man page.
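Putting these commands together, a sketch of a typical non-root flex session follows; the node count, identifier, and timeout are arbitrary example values:

$ urika-yam-flexup --nodes 4 --identifier myHadoopWork --timeout 60
$ urika-yam-status                                # confirm the allocation
                                                  # ... run Hadoop jobs / Hive queries ...
$ urika-yam-flexdown --identifier myHadoopWork    # release the nodes back to Mesos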

Timeout Intervals for Flex Up Requests
To release resources from YARN back to Mesos and to ensure better resource utilization, a default timeout interval for flex up requests is defined in the /etc/urika-yam.conf file. Users can provide a timeout value as a command-line argument to the flex up request to override this default. The timeout interval can be configured either to a value in minutes or to zero. If set to zero, the application will never time out and will need to be manually flexed down. The default timeout value is 15 minutes, as specified in the /etc/urika-yam.conf file. The minimum acceptable timeout interval is 5 minutes.

Log locations
Logs related to the flex scripts are written to /var/log/urika-yam.log on the login nodes.


9 Access the Jupyter Notebook UI

Prerequisites
Before following this procedure, use the urika-state command to ensure that the Jupyter Notebook service is running.

About this task
Urika-GX comes pre-installed with the Jupyter Notebook, which is a web application that enables creating and sharing documents that contain live code, equations, visualizations, and explanatory text. Urika-GX currently supports the following kernels:
● Bash
● R
● Python2
● Python3
● Spark 2.2.0
○ PySpark
○ Scala
○ sparkR
CAUTION: When using the Jupyter Notebook, if the Spark version changes, the predefined Jupyter Spark kernels will need to be updated to reflect those changes. Jupyter kernel configurations are stored under /usr/local/share/jupyter/kernels on login node 1.
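As a quick, optional check, the configured kernel specifications can be listed on login node 1; the jupyter kernelspec command is an alternative if the Jupyter client is on the PATH:

$ ls /usr/local/share/jupyter/kernels
$ jupyter kernelspec list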

TIP: If JupyterHub processes owned by the user remain running after the user has logged out from Jupyter, these processes can be manually killed using the Linux kill command.
The Jupyter Notebook UI contains 3 tabs:
1. Files - Displays a list of existing notebooks on the left side of the screen, and enables creating new notebooks, folders, terminals, and text files via the New drop down on the right side of the screen. The Upload button can be used to upload an existing notebook.
2. Running - Displays a list of all the running elements, such as notebooks, terminals, etc.
3. Clusters - Enables assigning a group of nodes to a configured cluster. Urika-GX does not ship with any pre-configured Python clusters.
Follow the steps in this procedure to access the Jupyter Notebook web UI.


Procedure

1. Point a browser at http://hostname-login1 and then select the Jupyter icon. This presents the Jupyter Notebook's login screen.

2. Enter LDAP credentials to log on to the Jupyter Notebook UI. The system also ships with a default admin account with crayadm and initial0 as the username and password respectively. If a new user needs to be assigned as an admin, add that user to c.Authenticator.admin_users in /etc/jupyterhub/jupyterhub_config.py. Users logging in with this default account will have the ability to control (start and stop) single notebook servers launched by end users.

For more information about the Jupyter Notebook web UI, select the Help > User Interface Tour menu.

9.1 Create a Jupyter Notebook

Prerequisites
Before following this procedure, use the urika-state command to ensure that the Jupyter Notebook service is running.

About this task
This procedure can be considered a Hello World example for using the Jupyter Notebook.
CAUTION: When using the Jupyter Notebook, the Spark kernel configuration PYTHONPATH will have to be changed if the Spark version changes. Python clusters are currently not supported on Urika-GX.

Procedure

1. Access the Jupyter Notebook using the Urika-GX Applications Interface or via http://hostname-login1:7800. Access via the Urika-GX Applications Interface is recommended.
Figure 11. Jupyter Notebook Landing Page

2. Select the desired type of notebook from the New drop down. In this example, Python2 is used.


Figure 12. Jupyter Notebook UI

3. Specify a name for the new notebook by clicking on the default assigned name displayed at the top of the UI. For this example, 'Hello World' is used as the name for the notebook.

4. Enter Python code in the cell provided on the UI to display the text "Hello World".
Figure 13. Jupyter Notebook Enter Code UI

5. Select the Run Cells option from the Cell drop down


Figure 14. Select Run Cells

This displays the results of executing the code, as shown in the following figure:
Figure 15. Results

Add additional cells using the Cell drop down as needed. For additional information, select the Help drop down.

9.2 Share or Upload a Jupyter Notebook

Prerequisites
Before following this procedure, use the urika-state command to ensure that the Jupyter Notebook service is running.


About this task
Currently, sharing URL links of Jupyter Notebooks is not directly supported. This procedure describes how to export a notebook and then share/upload it.

Procedure

1. Select the Learning Resources link from the Urika-GX Applications Interface.

2. Select the notebook of choice from the list of Jupyter Notebooks listed in the Notebooks section of the Documentation & Learning Resources page.
Figure 16. Documentation & Learning Resources Page

3. Select the blue download icon to download the notebook and save it to a location of choice.


Figure 17. Save Notebook

4. Select the home icon on the left of the UI to go back to the Urika-GX Applications Interface page and then select the icon for Jupyter Notebooks.

5. Upload the saved notebook to Jupyter using the Upload button.
Figure 18. Select Jupyter Notebook from the Urika-GX Applications Interface

The uploaded notebook will appear in the list of notebooks on the Running tab of the interface.

Stop any running notebooks before logging off. CAUTION: If running notebooks are not stopped before logging off, they will continue running in the background, resulting in unnecessary resource utilization.


9.3 Create a Custom Python Based Kernel for JupyterHub

About this task
The following procedure can be used to create a custom Python-based kernel for JupyterHub using any Anaconda environment a user wishes.

Procedure

1. Set up the environment on the command line.

a. Load the anaconda module.

   $ module load anaconda3

b. Create a Conda environment with the desired packages plus the ipykernel package.

   $ conda create -n env ipykernel package-a package-b ...

   Alternatively, if the user has a pre-existing environment, they can simply add ipykernel to it.

   $ conda install -n env ipykernel

c. Activate the environment.

   $ source activate env

d. Create a Jupyter kernel from the environment. The name specified via the --name flag must be unique amongst kernels created by the user, otherwise any pre-existing kernel of the same name will be overwritten.

   $ python -m ipykernel install --user --name env --display-name "Python (env)"

2. Go to JupyterHub and either create a new notebook using the newly created kernel, or go to Kernel > Change Kernel within an existing notebook to select the newly created kernel. The kernel will be listed under the value passed to --display-name in the above steps. If the newly created kernel is not immediately visible, refresh the browser window. If additional packages are found to be required during the course of notebook development, add those packages by using conda install to update the environment at the command line, as sketched below. After doing so, go to Kernel > Restart Kernel in the notebook to pick up the updated environment. For proper display of notebook widgets, users may also need to add additional packages to their Conda environments, such as ipywidgets.
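For example, to add the ipywidgets package to the environment created above (the package choice is illustrative):

$ conda install -n env ipywidgets     # then restart the kernel in the notebook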


10 Get Started with Using Grafana

Grafana is a feature-rich metrics, dashboard, and graph editor. It provides information about the utilization of system resources, such as CPU, memory, I/O, etc. Major Grafana components and UI elements include:
● Dashboard - The Dashboard consolidates all the visualizations into a single interface. Urika-GX ships with pre-defined dashboards for resource utilization information. For information about these dashboards, see Default Grafana Dashboards on page 99.
● Panels - The Panel is the basic visualization building block of the GUI. Panels support a wide variety of formatting options and can be dragged, dropped, resized, and rearranged on the Dashboard.
● Query Editor - Each Panel provides a Query Editor for the data source, which is InfluxDB on the Urika-GX system. Information retrieved from the Query Editor is displayed on the Panel.
● Data Source - Grafana supports many different storage back-ends for retrieving time series data. Each Data Source has a specific Query Editor that is customized for the features and capabilities that the particular Data Source exposes. Urika-GX currently supports InfluxDB as its data source.
● Organization - Grafana supports multiple organizations in order to support a wide variety of deployment models. Each Organization can have one or more Data Sources. All Dashboards are owned by a particular Organization.
● User - A User is a named account in Grafana. A user can belong to one or more Organizations, and can be assigned different levels of privileges via roles.
● Row - A Row is a logical divider within a Dashboard, and is used to group Panels together.

Roles
Users and Organizations can be assigned roles to specify permissions/privileges. These roles and their privileges are listed in the following table:
Table 6. Grafana Roles and Permissions

Role                    Edit roles   View graphs   Edit/create copy of   Add new graphs to      Create new/import
                                                   existing graphs       existing dashboards    existing dashboards
Admin                   Yes          Yes           Yes                   Yes                    Yes (these dashboards persist between sessions)
Viewer (default role)   No           Yes           No                    No                     No
Editor                  No           Yes           Yes                   Yes                    Yes (these dashboards get deleted when the editor logs out)
Read only editor        No           Yes           Yes                   Yes                    No

Each role type can also export performance data via the dashboard. When a new user signs up for the first time, the default role assigned to that user is Viewer. Administrators can then change roles assigned to users as needed using the Admin menu.
Figure 19. Change User Roles Using the Admin Menu

Since the time displayed on the Grafana UI uses the browser's timezone and that displayed on the Spark History server's UI uses the Urika-GX system's timezone, the timezones displayed on the two UIs may not be the same. By default, Grafana's refresh rate is turned off on the Urika-GX system. Sometimes the Hadoop and Spark Grafana dashboards take longer to load than expected. The usual cause of this issue is the time it takes InfluxDB to return all of the requested data. To reduce the time for the Hadoop and Spark dashboards to display data, refer to Update the InfluxDB Data Retention Policy on page 62. Reducing the amount of data retained makes Grafana dashboards display faster.


10.1 Urika-GX Performance Analysis Tools
Performance analysis tools installed on the Urika-GX system can be broadly categorized as:
● Performance analysis tools for monitoring system resources - Urika-GX uses Grafana for monitoring system resource utilization.
● Debugging tools - The Spark Shell can be used for debugging Spark applications.
● Profiling tools - The Spark History Server can be used for profiling Spark applications. The Spark History Server contains custom Cray enhancements that link Spark tasks in the UIs to Grafana dashboards that display compute node system metrics during the tasks' executions. These can be accessed by clicking links in the executor ID/host column in the tasks table of the stage pages, or by selecting the compare checkboxes of multiple tasks in the task table and clicking the compare link at the top of the table.

10.2 Update the InfluxDB Data Retention Policy

Prerequisites
This procedure requires root privileges. Before performing this procedure, use the urika-state command to ensure that the system is operating in the service mode that supports using InfluxDB. For more information, see the urika-state man page and refer to Urika-GX Service Modes on page 7.

About this task
The data retention policy for InfluxDB defaults to infinite, i.e., data is never deleted. As a result, Spark and Hadoop Grafana dashboards may take longer to load and InfluxDB may take up more space than necessary. To reduce the amount of space being used by InfluxDB, the data retention policy for each database needs to be reduced, as described in this procedure. Reducing the data retention policy for the Spark and Hadoop databases can reduce the load time of the Spark and Hadoop Grafana dashboards.

Procedure

1. Log on to login2 and become root.

2. Switch to the /var/lib/influxdb/data directory.

# cd /var/lib/influxdb/data

3. Use the du command to show how much space is being used. The sizes below are shown as examples. Actual sizes on the system may vary.

$ du -sh *
14G     Cray Urika GX
1.5G    CrayUrikaGXHadoop
906M    CrayUrikaGXSpark
21M     _internal


4. Connect to InfluxDB to view the current data retention policy.

# /bin/influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 0.12.2
InfluxDB shell 0.12.2
> show retention policies on "Cray Urika GX"
name     duration   shardGroupDuration   replicaN   default
default  0          168h0m0s             1          true

5. Update the data retention policy according to requirements. In this example the data retention duration is changed from 0 (forever) to 2 weeks (504 hours).

> alter retention policy default on "Cray Urika GX" duration 2w
> show retention policies on "Cray Urika GX"
name     duration   shardGroupDuration   replicaN   default
default  504h0m0s   24h0m0s              1          true
> exit

The change will take a while to be applied. The default is 30 minutes.
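The same change can be applied to the Hadoop and Spark databases reported by du above. Assuming their retention policies are also named default (an assumption; verify first with show retention policies), the statements would look like the following:

> alter retention policy default on "CrayUrikaGXHadoop" duration 2w
> alter retention policy default on "CrayUrikaGXSpark" duration 2w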

6. Verify that the data retention change has taken effect

# du -sh *
3G      Cray Urika GX
1.5G    CrayUrikaGXHadoop
906M    CrayUrikaGXSpark
21M     _internal


11 Use Docker on Urika-GX

Docker enables packaging applications with their dependencies without having to worry about system configurations, thus making applications more portable. Urika-GX features the infrastructure for running Docker containers (orchestrated by Marathon and Mesos) pulled from the public Docker Hub. Docker is disabled on Urika-GX by default. Use the urika-state command to check if the Docker service is running. Launching of Docker containers using Docker commands is not supported on Urika-GX. Use the Marathon Web UI for launching containers. For more information, visit https://www.docker.com.

Examples
Control Docker
Request an administrator to execute the urika-start -s docker, urika-stop -s docker, and urika-state commands to start, stop, and view the status of Docker, respectively.

Launch Containers
Containers are started like any typical Marathon application by entering the configuration into the Marathon web UI, or by using the curl command:

curl -X POST "http://hostname-login1:8080/v2/apps" -d @"$file" -H "Content-type: application/json"

Considerations
● Resource constraints passed to Docker from Marathon/Mesos:
○ The cpus parameter is not a direct limitation on the number of CPUs available to a Docker container, nor is it a limit on the speed of the CPUs. Docker containers retain access to all host CPUs.
○ Mesos builds a Docker run command, converting the cpus value into a value for Docker's --cpu-shares setting, which sets a priority weight for that process relative to all others on the machine. An application run with cpus=2 should receive twice the priority of one using cpus=1.
○ Mesos keeps track of total CPU resources and subtracts the specified CPUs in the container definition from the total number available on the node, despite the container not being strictly limited to specific CPUs.
● Commands vs arguments - Marathon supports an args field in the JSON application definition. It is invalid to supply both cmd and args for the same application. When cmd is used, the value is wrapped by Mesos via /bin/sh -c '${app.cmd}'. The args mode of specifying a command allows for safe usage of containerizer features like custom Docker entry points.
● Constraints - "constraints": [["hostname", "CLUSTER", "machine-nid00024.us.cray.com"]]. Refer to https://github.com/mesosphere/marathon/blob/master/docs/docs/constraints.md#cluster-operator for details about Marathon constraints.
● Privileged Mode and Arbitrary Docker Options - Marathon supports two keys for Docker containers: privileged and parameters. The privileged flag allows users to run containers in privileged mode. This flag is set to false by default. The parameters object allows users to supply arbitrary command-line options for the docker run command executed by the Mesos containerizer.
CAUTION: Any parameters passed in this manner are not guaranteed to be supported in the future, as Mesos may not always interact with Docker via the CLI.

"privileged": true, "parameters": [ { "key": "hostname", "value": "a.corp.org" }, { "key": "volumes-from", "value": "another-container" }, { "key": "-conf", "value": "..." } ]

11.1 Image Management with Docker and Kubernetes

About Docker
Docker provides the ability to package and run an application in a loosely isolated environment called a container. The isolation and security enable running many containers simultaneously on a given host. Because of the lightweight nature of containers, users can run more containers on a given hardware combination than if using virtual machines. Images shipped with the system are managed by Docker when Urika-GX is operating under the default service mode. The SMW hosts Urika-GX's container repository, which is a container itself and can currently host only Cray-developed containers.
NOTE: Building and managing new Docker images is currently not supported on Urika-GX.
For more information, visit https://www.docker.com.

About Kubernetes
Kubernetes is used for automating deployment, scaling, and management of containerized applications. It groups the containers that make up an application into logical units for easy management and discovery. It performs container management tasks such as running containers across many different machines, scaling up or down by adding or removing containers when demand changes, keeping storage consistent with multiple instances of an application, distributing load between the containers, and launching new containers on different machines if something fails. On Urika-GX, Kubernetes is used to manage containers in the secure service mode.


NOTE: Currently, Kubernetes supports only Spark images on Urika-GX. For more information, visit https://kubernetes.io/.

About the Cray Spark Image
In order to run Spark on Kubernetes, Urika-GX ships with customized Spark images, which are based on the Spark version used on the system.

11.2 Run the Native Docker Engine on Marathon

Prerequisites
● The Docker engine needs to be running on the login and compute nodes. Use the urika-state command to check the status of Docker.
● mesos-agent needs to be configured to recognize the Docker container type.

About this task
This procedure describes how to enable the running and orchestration of Docker containers via Marathon and Mesos to facilitate rapid dynamic application development and service provisioning across the cluster.

Procedure

Define a container for use with Marathon. Defining a container for use with Marathon is very similar to provisioning an application. The following information is provided to the Marathon UI via the Web UI or curl command.

{ "id": "docker-demo", "cmd": "python3 -m http.server 8080", "cpus": 0.5, "mem": 1024.0, "instances": 1, "constraints": [["hostname", "CLUSTER", "socrates-nid00024.us.cray.com"]], "container": { "type": "DOCKER", "docker": { "image": "python:3", "network": "BRIDGE", "portMappings": [ { "containerPort": 8080, "hostPort": 0, "servicePort": 9000, "protocol": "tcp" }, { "containerPort": 161, "hostPort": 0, "protocol": "udp" } ] }, "volumes": [ { "containerPath": "/mnt/lustre", "hostPath": "/mnt/lustre/", "mode": "RW"

S3015 66 Use Docker on Urika-GX

} ] }, "healthChecks": [ { "protocol": "HTTP", "portIndex": 0, "path": "/", "gracePeriodSeconds": 5, "intervalSeconds": 20, "maxConsecutiveFailures": 3 } ] }

In the preceding code block:
● servicePort is a helper port intended for performing service discovery using a well-known port per service. The assigned servicePort value is not used/interpreted by Marathon itself, but is supposed to be used by the load balancer infrastructure.
● The servicePort parameter is optional and defaults to 0. A random port will be assigned if the value of servicePort is 0. If Marathon assigns a servicePort value, that value will be unique across the cluster. The values for random service ports fall in the [local_port_min, local_port_max] range, where local_port_min and local_port_max are command-line options with default values of 10000 and 20000, respectively.
● The protocol parameter is optional and defaults to tcp. Possible values for this parameter include tcp and udp.
The preceding code block performs the following:
● Creates a container named docker-demo.
● Pulls the python image with a version specification of 3 from https://hub.docker.com/_/python/
● Sets constraints on the CPU and memory allocated for use by the container
● Runs a command in the container to start a simple HTTP server on port 8080
● Creates a network bridge for the container. This involves mapping:
○ the container port 8080 to a Marathon-assigned port on the host interfaces, with 9000 as its service port
○ the container port 161:udp to a Marathon-assigned port on the host interfaces
● Mounts the host filesystem location /mnt/lustre/ to the container filesystem location /mnt/lustre, so it can be accessed by the container.
● Creates a basic Marathon HTTP health check.
For more information, visit https://www.docker.com.
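As a usage sketch, assuming the definition above is saved to a hypothetical file named docker-demo.json on a login node, it could be submitted to Marathon using the curl form shown earlier:

$ curl -X POST "http://hostname-login1:8080/v2/apps" \
       -d @docker-demo.json \
       -H "Content-type: application/json"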


12 Start Individual Kafka Brokers

About this task
Use the following instructions to start Kafka brokers.

Procedure

1. Log on to a login node.

2. Copy the default Kafka configuration file located at /usr/hdp/current/kafka-broker/config/server.properties to a local directory.

3. In the copied configuration file, update the log.dirs parameter from log.dirs=/tmp/kafka-logs to a directory under the home directory. To prevent the log files from being locked down by other users, the log.dirs parameter needs to be updated as described in this step.

4. Redirect the Kafka server logs away from the /var/log/kafka directory. Only the kafka user has permission to write to that directory, so starting the broker as a non-kafka, non-root user results in none of the server logs being recorded. Either accept that the server logs will not be recorded, or override LOG_DIR before starting the Kafka broker:

$ export LOG_DIR=~/kafka

5. Start the Kafka broker.

$ /usr/hdp/current/kafka-broker/bin/kafka-server-start.sh ~/server.properties
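Putting the steps together, a sketch of the full sequence for a non-root user follows; the edited log.dirs value and the LOG_DIR location are example choices:

$ cp /usr/hdp/current/kafka-broker/config/server.properties ~/server.properties
$ vi ~/server.properties      # change log.dirs=/tmp/kafka-logs to a directory under $HOME
$ export LOG_DIR=~/kafka      # redirect the Kafka server logs
$ /usr/hdp/current/kafka-broker/bin/kafka-server-start.sh ~/server.properties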


13 Overview of the Cray Application Management UI

The Cray Application Management UI is shown in the following figure:
Figure 20. Cray Application Management UI

The Search field and Quick Filters drop down facilitate searching and filtering submitted jobs based on the specified criteria. When these UI elements are used, the results are displayed in a table and the specified search/filter criteria are displayed at the top of these UI elements. A list of active jobs is displayed when no search/filtering criteria have been specified. By default, this shows jobs that were started in the last 24 hours, jobs that ended in the last 24 hours, and jobs that are still running. Click on the text active to see a detailed description of this functionality. If both the Search and Quick Filters UI elements are used at the same time, only jobs that match the selected quick filter will be displayed.
The table displayed on the UI contains information about submitted jobs and contains the following columns:
● Job Id - Displays the job ID for each submitted job. Selecting a job ID opens another tab, which displays additional details of the job. For example, if a job ID for a Spark job is selected, a separate tab will be opened, displaying the Spark History Server.
ATTENTION: Selecting a job ID for a job having "OTHER", "MRUN" or "CGE" as the job type will open the Mesos UI in a separate tab.
● Metrics - Displays links that can be used for displaying a graphical representation of the job's metrics on the Grafana UI, which opens in a separate browser tab.
● Job Name - Displays the name of the submitted job.
● Type - Displays the type of each submitted job. Jobs can be filtered based on type using the filter icon provided on the UI.


Figure 21. Filtering by Type

● User - Displays the name of the user who submitted the job.
● Start Time - Displays the time the job started executing. Jobs can be filtered based on starting time using the filter icon provided on the UI.
● End Time - Displays the time the job finished executing. This column will be empty for jobs that have not finished executing yet.
● Status - Displays the current status of the job, which depends on the job's underlying framework. To filter jobs based on status, click the filter icon in this column's header, make selections as needed, and then select the Filter button on the pop-up. Click on the text Failed in the Status column to view logs for debugging failed jobs. For Spark jobs, this column contains a link titled Finished, which can be used to view and download logs that help identify whether or not the Spark job succeeded. Selecting this link will present a login screen, where users will need to enter their LDAP credentials. IMPORTANT: If the user logged in with the default user account (having admin/admin as the username and password), the system will require the user to log in again with their LDAP or system credentials to view Spark executor logs.
● Action - The kill button displayed in this column enables killing a running job. This column will be empty for jobs that have finished executing. Users can only kill jobs submitted by themselves; however, the system administrator can delete any job. NOTE: If the user logged in with the default user account (having admin/admin as the username and password), the system will require the user to log in again with their LDAP or system credentials to kill jobs of type "CGE" or "MRUN".


14 Update the InfluxDB Data Retention Policy

Prerequisites This procedure requires root privileges. Before performing this procedure, use the urika-state command to ensure that the system is operating in the service mode that supports using InfluxDB. For more information, see the urika-state man page and refer to Urika-GX Service Modes on page 7.

About this task The data retention policy for InfluxDB defaults to infinite, i.e. data is never deleted. As a result, Spark and Hadoop Grafana dashboards may take longer to load and InfluxDB may take up more space than necessary. To reduce the amount of space being used by InfluxDB, the data retention policy for each database needs to be reduced, as described in this procedure. Reducing the data retention policy for Spark and Hadoop databases can reduce the load time of the Spark and Hadoop Grafana dashboards.

Procedure

1. Log on to login2 and become root.

2. Switch to the /var/lib/influxdb/data directory.

# cd /var/lib/influxdb/data

3. Use the du command to show how much space is being used. The sizes below are shown as examples. Actual sizes on the system may vary.

# du -sh *
14G     Cray Urika GX
1.5G    CrayUrikaGXHadoop
906M    CrayUrikaGXSpark
21M     _internal

4. Connect to InfluxDB to view the current data retention policy.

# /bin/influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 0.12.2
InfluxDB shell 0.12.2
> show retention policies on "Cray Urika GX"
name     duration   shardGroupDuration   replicaN   default
default  0          168h0m0s             1          true

5. Update the data retention policy according to requirements.


In this example the data retention duration is changed from 0 (forever) to 2 weeks (504 hours).

> alter retention policy default on "Cray Urika GX" duration 2w
> show retention policies on "Cray Urika GX"
name     duration   shardGroupDuration   replicaN   default
default  504h0m0s   24h0m0s              1          true
> exit

The change will take a while to be applied. The default is 30 minutes.
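The same statement can also be applied to the other databases reported by du in order to reduce their retention. This is a minimal sketch, assuming the retention policies on those databases are also named default:

> alter retention policy default on "CrayUrikaGXHadoop" duration 2w
> alter retention policy default on "CrayUrikaGXSpark" duration 2w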

6. Verify that the data retention change has taken effect.

# du -sh *
3G      Cray Urika GX
1.5G    CrayUrikaGXHadoop
906M    CrayUrikaGXSpark
21M     _internal


15 Manage the Spark Thrift Server as a Non-Admin User

Prerequisites
● Ensure that the system is running in the service mode that allows use of the Spark Thrift Server. Execute the urika-state or urika-service-mode commands to check the service mode. For more information, refer to the urika-state or urika-service-mode man pages and see Urika-GX Service Modes on page 7.
● This procedure requires Tableau Desktop (version 10.2) and the Simba Spark ODBC driver to be installed on the client machine.
● If using a Mac, the following procedure requires version 10.11 of the operating system.

About this task Cray recommends that the Spark Thrift server be started by administrators; however, users can use the following instructions if they need to start their own Spark Thrift server. CAUTION: It is recommended for multiple users (admin and non-admin) to use the same Spark Thrift server (that has been started by an administrator) instead of spinning up their own individual servers, as doing so could result in resource lockdown. In addition, though it is possible for multiple users to connect to each other's Spark Thrift server, doing so can result in loss of connectivity for those users if the server is brought down by the user who started it.

Procedure

1. Copy the spark-env.sh, spark-defaults.conf and hive-site.xml files to the local $HOME directory.

$ cp /opt/cray/spark2/default/conf/spark-env.sh $HOME
$ cp /opt/cray/spark2/default/conf/spark-defaults.conf $HOME
$ cp /opt/cray/spark2/default/conf/hive-site.xml $HOME

2. Modify the hive-site.xml configuration file to set hive.server2.thrift.port to a non-conflicting port.
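For example, the property could be set as follows in the copied hive-site.xml; the port value 10016 is only an illustrative choice of a non-conflicting port:

<property>
  <name>hive.server2.thrift.port</name>
  <value>10016</value>
</property>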

3. Increase the maximum number of Spark cores if needed by editing the spark-defaults.conf configuration file and changing the default value of spark.cores.max from 32 to the desired value.
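As an illustration, the edited line in the copied spark-defaults.conf might look like the following; the value 64 is only an example, not a recommended setting:

spark.cores.max    64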

4. Execute the start-thriftserver.sh script.

$ /opt/cray/spark2/default/sbin/start-thriftserver.sh

5. Stop the Spark Thrift server when finished using it.

$ /opt/cray/spark2/default/sbin/stop-thriftserver.sh


16 Use Tableau® with Urika-GX

Tableau is a desktop application that runs on either Windows or Mac. Urika-GX features connectivity to Tableau® for data visualization. Tableau connects easily to Urika-GX, as well as to other data sources. It enables quickly and easily viewing data as visually appealing visualizations called dashboards. The following figure depicts how Tableau connects to Urika-GX via the Hive and SparkSQL Thrift servers: Figure 22. Urika-GX connectivity to Tableau

16.1 Connect Tableau to HiveServer2 Using LDAP

Prerequisites
● Ensure that the system is running in the service mode that allows use of Tableau and HiveServer2. Execute the urika-state or urika-service-mode commands to check the service mode. For more information, refer to the urika-state or urika-service-mode man pages and see Urika-GX Service Modes on page 7.
● Ensure that LDAP authentication for Tableau to HiveServer2 is enabled. For more information, refer to the section 'Enable LDAP for Connecting Tableau to HiveServer2' of the 'Urika®-GX System Administration Guide'.
● This procedure requires the following software to be installed on the external client machine connected to Urika-GX:
○ Tableau Desktop (version 10.2)
○ Hortonworks Hive ODBC driver.


● If using a Mac, the following procedure requires version 10.11 or later of the OS X operating system.
● Request an administrator to ensure that the Hive service is up and running.

About this task CAUTION: It is recommended for multiple users to use the same Hive server (that has been started by an administrator) instead of spinning up their own individual servers, as doing so could result in resource lockdown.

Procedure

1. Log on to a Urika-GX system's login node.

2. Request an administrator to flex up the YARN cluster. NOTE: Cray recommends that YARN clusters for Tableau connectivity be flexed up only by administrators on Urika-GX. Administrators should use the urika-yam-flexup command and specify a timeout value of 0 when using this procedure. For more information, administrators should see the urika-yam-flexup man page or refer to the section titled 'Flex up a YARN sub-cluster on Urika-GX' of the 'Urika®-GX Analytic Applications Guide'.

3. Launch the Tableau application on a client machine. This will start the Tableau application and bring up the Tableau UI on the user's desktop.


Figure 23. Tableau UI

4. Navigate to Connect > To a Server > More

5. Select Hortonworks Hadoop Hive from the list of servers.


Figure 24. Selecting Hortonworks Hadoop Hive Server

6. Populate the server connection pop up.

a. Enter hostname-login1 in the Server field, where hostname is used as an example for the name of the machine and should be replaced with the actual machine name when following this step.

b. Enter 10000 in the Port field.
c. Select HiveServer2 from the Type drop down.
d. Select User Name from the Authentication drop down.
e. Enter values in the Username and Password fields.
f. Select the Sign In button.


Figure 25. Connect HiveServer2 to Tableau

7. Perform data visualization/exploration tasks as needed using Tableau.

8. Request an administrator to flex down the YARN cluster. NOTE: Cray recommends that YARN clusters for Tableau connectivity be flexed down only by administrators on Urika-GX. For more information, administrators should see the urika-yam-flexdown man page or refer to the section titled 'Flex up a YARN sub-cluster on Urika-GX' of the 'Urika®-GX Analytic Applications Guide'.

16.2 Connect Tableau to HiveServer2 Securely

Prerequisites
● Ensure that the system is running in the service mode that allows use of Tableau and HiveServer2. Execute the urika-state or urika-service-mode commands to check the service mode. For more information, refer to the urika-state or urika-service-mode man pages and see Urika-GX Service Modes on page 7.
● This procedure requires the following software to be installed on the client machine:
○ Tableau Desktop (version 10.2)
○ Hortonworks Hive ODBC driver.
● If using a Mac, the following procedure requires version 10.11 of the OS X operating system.
● Request an administrator to ensure that the Hive service is up and running.


About this task CAUTION: It is recommended for multiple users to use the same Hive server (that has been started by an administrator) instead of spinning up their own individual servers, as doing so could result in resource lockdown.

Procedure

1. Request an administrator to enable SSL. Administrators should follow instructions listed in section 'Enable SSL on Urika-GX' of the 'Urika-GX System Administration Guide' to enable SSL. To connect to the HiveServer2 from Tableau, the ServerCertificate.crt SSL certificate must be present on the machine running Tableau and needs to be added to Tableau.

2. Log on to a Urika-GX system's login node.

3. Request an administrator to flex up the YARN cluster. NOTE: Cray recommends that YARN clusters for Tableau connectivity be flexed up only by administrators on Urika-GX. Administrators should use the urika-yam-flexup command and specify a timeout value of 0 when using this procedure. For more information, administrators should see the urika-yam-flexup man page or refer to the section titled 'Flex up a YARN sub-cluster on Urika-GX' of the 'Urika®-GX Analytic Applications Guide'.

4. Launch the Tableau application on a client machine. This will present the Tableau UI. Figure 26. Tableau UI

5. Navigate to Connect > To a Server > More

6. Select Hortonworks Hadoop Hive from the list of servers.


Figure 27. Selecting Hortonworks Hadoop Hive Server

7. Populate the server connection pop up.

a. Enter machine-login1 in the Server field, using the FQDN to ensure that it matches the domain name for the SSL certificate. machine is used as an example for the name of the machine and should be replaced with the actual machine name when following this step.

b. Enter 10000 (which is the port number configured in HA Proxy) in the Port field.
c. Select HiveServer2 from the Type drop down.
d. Select User Name and Password (SSL) from the Authentication drop down.
e. Enter values in the Username and Password fields.
f. Select the Require SSL check-box.
g. Click on the No custom configuration file specified (click to change)... link.


Figure 28. Connect HiveServer2 to Tableau Securely

h. Select Use the following custom SSL certificate option on the Configure SSL certificate pop up. Figure 29. Tableau Configure SSL Pop up

i. Select the Browse button to select the SSL certificate file.
j. Select the OK button.
k. Select the Sign In button.

8. Perform data visualization/exploration tasks as needed using Tableau.

9. Request an administrator to flex down the YARN cluster. NOTE: Cray recommends that YARN clusters for Tableau connectivity be flexed down only by administrators on Urika-GX. For more information, administrators should see the urika-yam-flexdown man page or refer to the section titled 'Flex up a YARN sub-cluster on Urika-GX' of the 'Urika®-GX Analytic Applications Guide'.


16.3 Connect Tableau to the Spark Thrift Server

Prerequisites
● Ensure that the system is running in the service mode that allows use of Tableau and Spark Thrift Server. Execute the urika-state or urika-service-mode commands to check the service mode. For more information, refer to the urika-state or urika-service-mode man pages and see Urika-GX Service Modes on page 7.
● This procedure requires the following software to be installed on the client machine, which is an external machine connected to Urika-GX:
○ Tableau Desktop (version 10.2)
○ Simba Spark ODBC driver.
● If using a Mac, the following procedure requires version 10.11 of the OS X operating system.

About this task The Spark Thrift Server provides access to SparkSQL via JDBC or ODBC. It supports almost the same API and many of the same features as HiveServer2. On Urika-GX, HiveServer2/Hive Thrift Server and the Spark Thrift server are used for enabling connections from ODBC/JDBC clients, such as Tableau. The Spark Thrift server ships pre-configured with LDAP authentication. CAUTION: The recommended approach for using Tableau with the Spark Thrift server on Urika-GX is for multiple users (admin and non-admin) to use the same Spark Thrift server (that has been started by an administrator) instead of spinning up their own individual servers, as doing so could result in resource lockdown. In addition, though it is possible for multiple users to connect to each other's Spark Thrift server, doing so can result in loss of connectivity if the server is brought down by the user who brought it up.

Procedure

1. Request the administrator to verify that the Spark Thrift server is running and to start it if it is not already up. Administrators should refer to the section titled 'Control the Spark Thrift Server' in the 'Urika®-GX System Administration Guide' to start the Spark Thrift server. Cray recommends that the Spark Thrift server be started by administrators. To start a Spark Thrift server without administrative privileges, non-admins should refer to Manage the Spark Thrift Server as a Non-Admin User on page 73.

2. Launch the Tableau application on a client machine. This will present the Tableau UI.


Figure 30. Tableau UI

3. Navigate to Connect > To a Server > More

4. Select Spark SQL server from the list of servers.


Figure 31. Selecting Spark SQL Server

5. Populate the server connection pop up.

a. Enter hostname-login1 in the Server field, where hostname is used as an example for the name of the machine and should be replaced with the actual machine name when following this step.

b. Enter 10015 in the Port field.
c. Select SparkThriftServer (Spark 1.1 and later) from the Type drop down.
d. Select User Name and Password from the Authentication drop down.
e. Enter values in the Username and Password fields.
f. Select the Sign In button.


Figure 32. Tableau's Spark Connection Pop up

6. Perform data visualization/exploration tasks as needed.

7. Request an administrator to stop the Spark Thrift server service. Administrators should refer to the section titled 'Control the Spark Thrift Server' in the 'Urika®-GX System Administration Guide' to stop the Spark Thrift server. Cray recommends that the Spark Thrift server be stopped by administrators. To stop a Spark Thrift server that they have started without administrative privileges, non-admins should refer to 'Manage the Spark Thrift Server as a Non-Admin User' of the 'Urika-GX Analytic Applications Guide'.

16.4 Connect Tableau to the Spark Thrift Server Securely

Prerequisites
● Ensure that the system is running in the service mode that allows use of Tableau and Spark Thrift Server. Execute the urika-state or urika-service-mode commands to check the service mode. For more information, refer to the urika-state or urika-service-mode man pages and see Urika-GX Service Modes on page 7.
● This procedure requires the following software to be installed on the client machine:
○ Tableau Desktop (version 10.2)
○ Simba Spark ODBC driver.
● If using a Mac, the following procedure requires version 10.11 of the OS X operating system.

About this task CAUTION: The recommended approach for using Tableau with the Spark Thrift server on Urika-GX is for multiple users (admin and non-admin) to use the same Spark Thrift server (that has been started by an administrator) instead of spinning up their own individual servers, as doing so could result in resource lockdown. In addition, though it is possible for multiple users to connect to each other's Spark Thrift server, doing so can result in loss of connectivity if the server is brought down by the user who brings it up.

Procedure

1. Request an administrator to verify that SSL is enabled. Administrators should follow instructions listed in section 'Enable SSL on Urika-GX' of the 'Urika-GX System Administration Guide' to enable SSL.

2. Request the administrator to verify that the Spark Thrift server is running and to start it if it is not already up. Administrators should refer to the section titled 'Control the Spark Thrift Server' in the 'Urika-GX System Administration Guide' to start the Spark Thrift server. Cray recommends that the Spark Thrift server be started by administrators. To start a Spark Thrift server without administrative privileges, non-admins should refer to 'Manage the Spark Thrift Server as a Non-Admin User' of the 'Urika-GX Analytic Applications Guide'.

3. Launch the Tableau application. This will present the Tableau UI.


Figure 33. Tableau UI

4. Navigate to Connect > To a Server > More

5. Select Spark SQL server from the list of servers.


Figure 34. Selecting Spark SQL Server

6. Populate the server connection pop up.

a. Enter machine-login1 in the Server field, using the FQDN to ensure that it matches the domain name for the SSL certificate. machine is used as an example for the name of the machine and should be replaced with the actual machine name when following this step.

b. Enter 10015 in the Port field.
c. Select SparkThriftServer (Spark 1.1 and later) from the Type drop down.
d. Select User Name from the Authentication drop down.
e. Enter values in the Username field.


f. Select the Require SSL check box.
g. Select the Sign In button.

Figure 35. Tableau's Spark Connection Pop up

7. Perform data visualization/exploration tasks as needed.

8. Request an administrator to stop the Spark Thrift server. Administrators should refer to the section titled 'Control the Spark Thrift Server' in the 'Urika-GX System Administration Guide' to stop the Spark Thrift server. Cray recommends that the Spark Thrift server be stopped by administrators. To stop a Spark Thrift server that they have started without administrative privileges, non-admins should refer to 'Manage the Spark Thrift Server as a Non-Admin User' of the 'Urika-GX Analytic Applications Guide'.

16.5 Connect Tableau to Apache Spark Thrift Server on a VM

Prerequisites
● Ensure that the system is running in the service mode that allows use of Tableau and Spark Thrift Server. Execute the urika-state or urika-service-mode commands to check the service mode. For more information, refer to the urika-state or urika-service-mode man pages and see Urika-GX Service Modes on page 7.
● This procedure requires the following software to be installed on the client machine:
○ Tableau Desktop (version 10.2)
○ Simba Spark ODBC driver. Since Tableau Desktop version 10.2, the Mac driver for Spark SQL is installed by default with Tableau Desktop.
● If using a Mac, the following procedure requires version 10.11 of the OS X operating system.


About this task CAUTION: The recommended approach for using Tableau with the Spark Thrift server on Urika-GX is to have multiple users (admin and non-admin) to use the same Spark Thrift server (that has been started by an administrator) instead of spinning up their own individual servers, as doing so could result in resource lockdown. In addition, though it is possible for multiple users to connect to each other's Spark Thrift server, doing so can result in loss of connectivity if the server is brought down by the user who brings it up.

Procedure

1. Launch the Tableau application. This will present the Tableau UI. Figure 36. Tableau UI

2. Navigate to Connect > To a Server > More

3. Select Spark SQL server from the list of servers.


Figure 37. Selecting Spark SQL Server

4. Populate the server connection pop up.


a. Enter the external IP address or hostname in the Server field, where hostname is used as an example for the name of the machine and should be replaced with the actual machine name when following this step.

b. Enter 10000 in the Port field.
c. Select SparkThriftServer (Spark 1.1 and later) from the Type drop down.
d. Select User Name and Password from the Authentication drop down.
e. Enter values in the Username and Password fields.
f. Select HTTP in transport mode and enter HTTP Path as sparkthrift.
g. Select the Sign In button.

Figure 38. Connect Tableau to the Spark Thrift Server Using SSL

5. Perform data visualization/exploration tasks as needed.

6. Request an administrator to stop the Spark Thrift server. Administrators should refer to the section titled 'Control the Spark Thrift Server' in the 'Urika-GX System Administration Guide' to stop the Spark Thrift server. Cray recommends that the Spark Thrift server be stopped by administrators. To stop a Spark Thrift server that they have started without administrative privileges, non-admins should refer to 'Manage the Spark Thrift Server as a Non-Admin User' of the 'Urika-GX Analytic Applications Guide'.


16.6 Enable SSL for Spark Thrift Server of a Tenant

Prerequisites This procedure requires root privileges.

About this task This procedure provides instructions for enabling SSL for the Spark Thrift Server of a tenant.

Procedure

1. Convert the CA certificate file to Java keystore format, then make the keystore available by using one of the following methods:
● Update the keystore file:
1. Rename the Java keystore file to the filename keystore.
2. Place the keystore file in the /global/tenants/TENANT_NAME/sts/ssl/ directory.
● Create a symbolic link to the keystore path:

# ln -s /path/to/real/keystore/file /global/tenants/TENANT_NAME/sts/ssl/keystore
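If the certificate is not already in keystore format, one way to import it into a Java keystore is the JDK keytool utility. The following is only a hedged sketch; the certificate path, alias, and keystore password are hypothetical placeholders:

# keytool -importcert -alias tenant-sts -file /path/to/ServerCertificate.crt \
    -keystore /global/tenants/TENANT_NAME/sts/ssl/keystore -storepass KEYSTORE_PASSWORD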

2. Verify that the keystore file exists:

# /bin/ls -l /global/tenants/TENANT_NAME/sts/ssl/keystore

3. Modify the /global/tenants/TENANT_NAME/hive/conf/hive-site.xml file to make the following changes:

a. Change the value of hive.server2.use.SSL to true.

b. Change the value of hive.server2.keystore.password to the keystore's password.
c. Stop and then start the Spark Thrift Server if it is currently running.

4. Enable or disable the SSL mode for this Spark Thrift Server, depending on requirements.
● To enable the SSL mode:
1. Modify the hive-site.xml file to change the value of hive.server2.use.SSL to true.
2. Stop and then start the Spark Thrift Server if it is currently running.
● To disable the SSL mode:
1. Modify the hive-site.xml file to change the value of hive.server2.use.SSL to false.
2. Stop and then start the Spark Thrift Server if it is currently running.
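For reference, the relevant hive-site.xml properties might look like the following when SSL is enabled; KEYSTORE_PASSWORD is a placeholder for the actual keystore password:

<property>
  <name>hive.server2.use.SSL</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.keystore.password</name>
  <value>KEYSTORE_PASSWORD</value>
</property>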


17 File Systems

Supported file system types on Urika-GX include:
● Internal file systems
○ Hadoop Distributed File System (HDFS) - Hadoop uses HDFS for storing data. HDFS is highly fault-tolerant, provides high throughput access to application data, and is suitable for applications that have large data sets. Urika-GX also features tiered HDFS storage. HDFS data is transferred over the Aries network.
○ Network File System (NFS) - The Urika-GX SMW hosts NFS, which is made available to every node via the management network.
○ /mnt/lustre - This is a directory that hosts Lustre file system data if DAL/Sonexion is used.
CAUTION: Avoid using NFS for high data transfers and/or large writes as this will cause the network to operate much slower or time out. NFS, as configured for Urika-GX home directories, is not capable of handling large parallel writes from multiple nodes without data loss. Though it is possible to configure NFS to handle parallel writes, it would require a hard mount, which would have undesired consequences.

File Locations
● Home directories are mounted on (internal) NFS, with limited space.
● The distributed file system (Lustre), if provisioned, is mounted at /mnt/lustre and is suitable for larger files. Lustre mounts are isolated, with individual tenants having their own mount point.


18 Check the Current Service Mode

Prerequisites This procedure requires root privileges on the SMW.

About this task Urika-GX supports two service modes, which dictate the list of services available. These modes include:
● Default
● Secure
Use the following instructions to determine the service mode the system is currently running in.

Procedure

1. Log on to the SMW as root.

# ssh root@hostname-smw

2. Display the current service mode by using one of the following options: ● Execute the urika-state command. This displays the current service mode as well as the status of all the services that are supported in that mode. ● Execute the urika-service-mode command.

# urika-service-mode
Current mode is: default

For more information, refer to the urika-service-mode and urika-state man pages.


19 Fault Tolerance on Urika-GX

Fault tolerance refers to the ability of a system to continue functioning with minimal intervention in spite of failures and its ability to cope with various kinds of failures. Urika-GX features fault tolerance to ensure resiliency against system failures. Failed jobs are rescheduled automatically for optimized performance.
● Zookeeper - Zookeeper enables highly reliable distributed coordination. On Urika-GX, there are 3 Zookeeper instances running with a quorum of 2. On Urika-GX, Mesos and Marathon use Zookeeper to help provide fault tolerance.
● Hadoop - Hadoop is highly fault-tolerant. Whenever there is a failure in the execution of a Hadoop/MapReduce job, the corresponding process is reported to the master and is rescheduled.
○ HDFS - HDFS is the data store for all the Hadoop components on Urika-GX. The Secondary HDFS NameNode periodically stores the edits information and the FS image. The HDFS NameNode is a single point of failure. In case of an HDFS NameNode failure, the Hadoop administrator can start the Hadoop cluster with the help of these FS images and edits. Information in the file system is replicated, so if one data node goes down, the data is still available.
● Spark - Spark uses a directed acyclic lineage graph to track transformations and actions. Whenever there is a failure, Spark checkpoints the failure in the graph and reschedules the next set of computations from that checkpoint on another node.
● Mesos - Mesos is the main resource broker for Urika-GX and runs in high availability mode. There are 3 masters running at all times, so that even if one fails, one of the remaining two masters is elected as the leader and there is no disturbance in the process of resource brokering. Furthermore, the Mesos UI is configured using HA Proxy to detect the active Mesos master and direct incoming requests to it.
● Marathon - Marathon is a Mesos framework/scheduler that is used to launch and manage long-running services on a cluster. There are three Marathon instances running at all times on the Urika-GX system. If an active Marathon instance goes down, one of the backup Marathon instances is assigned as the leader. Services defined within Marathon are launched and managed anywhere on the cluster where there are available resources. If a Mesos task fails, Marathon will accept more resources from Mesos and another task will be launched, usually on a different node. Re-scheduling of the tasks is usually a framework-related decision.


20 Default Urika-GX Configurations

The following table documents some basic configuration settings that Urika-GX ships with. This is not an exhaustive list. Table 8. Urika-GX Default Configurations

Component Configuration parameter and file location

Spark
● spark.shuffle.compress: false
● spark.locality.wait: 1 second
● Event logging: enabled
● Default cores: 8 for spark-shell, 128 for spark-submit and everything else
● Default memory allocation:
○ 96 Gigabytes for each executor
○ 16 Gigabytes for the driver

Mesos Name of the cluster is configured in an Ansible file located at: /etc/mesos-master/cluster. If this configuration has to be changed specific to the site, it should be done at the time of configuration. The change should be made in the Ansible file's cluster_name variable.

Cray Graph Engine (CGE)
● Default logging level: INFO
● NVP settings - See the 'Cray® Graph Engine User Guide'.
● Configuration files are located under:
○ CGE_CONFIG_FILE_NAME
○ CGE_CONFIG_DIR_NAME/cge.properties
○ Current_Working_Directory/cge.properties
○ Data Directory/cge.properties
○ Home_Directory/.cge/cge.properties

mrun
● NCMDServer=nid00000
● MesosServer=localhost # same as --host
● MesosPort=5050
● MarathonServer=localhost
● MarathonPort=8080
● DebugFLAG=False # same as --debug



● VerboseFLAG=False # same as --verbose
● JobTimeout=0-0:10:0 # ten minutes, same as --time
● StartupTimeout=30 # 30 seconds, same as --immediate
● HealthCheckEnabled=True # Run with Marathon Health Checks enabled
● HCGracePeriodSeconds=5 # Seconds at startup to delay Health Checks
● HCIntervalSeconds=10 # Seconds between Health Check pings
● HCTimeoutSeconds=10 # Seconds to answer Health Check successfully
● HCMaxConsecutiveFailures=3 # How many missed health checks before app killed

Flex scripts (urika-yam-status, urika-yam-flexup, urika-yam-flexdown, urika-yam-flexdown-all): The default timeout interval is 15 minutes. The configuration file is located at /etc/urika-yam.conf. The recommended default is 15; the unit for timeout is minutes. This can be changed per site requirements.

Grafana
● When a new user signs up for the first time, the default role assigned to the user is that of a Viewer. The admin can then change the new user's role if needed.
● By default, Grafana's refresh rate is turned off on the Urika-GX system.
● The default timezone displayed on Grafana is UTC.
● Default Grafana roles and permissions are depicted in the following table: Table 9. Default Grafana Roles and Permissions

● Admin - Edit role; view graphs; edit/create copies of existing graphs; add new graphs to existing dashboards; create new/import existing dashboards (these dashboards persist between sessions).
● Viewer (default role) - View graphs; edit/create copies of existing graphs.
● Editor - View graphs; edit/create copies of existing graphs; add new graphs to existing dashboards; create new/import existing dashboards (these dashboards get deleted when the editor logs out).
● Read only editor - View graphs; edit/create copies of existing graphs; add new graphs to existing dashboards.

20.1 Default Grafana Dashboards The default set of Grafana dashboards shipped with Urika-GX include: ● Aggregate Compute Node Performance Statistics - Displays graphs representing statistical data related to network, CPU, and I/O utilization for all Urika-GX nodes. This dashboard contains the following graphs: ○ CPU AND MEMORY ▪ CPU utilization - Displays the overall CPU utilization for all nodes. ▪ Memory utilization - Displays the overall memory used by all nodes. ○ FILE SYSTEM DATA RATES ▪ SSD Reads/Writes Bytes/Second - Displays SSD utilization for all nodes. ▪ Second Hard Drive (/dev/sdb) Reads/Writes Bytes/Sec - Displays utilization of secondary hard disk space for all nodes. ▪ Root FS HDD (/dev/sda) Reads/Writes Bytes/Sec - Displays utilization of root files system on the hard drive for all nodes. ▪ Lustre Reads/Writes Bytes/Second - Displays the aggregate Lustre I/O for all the nodes. ○ NETWORK READS AND WRITES ▪ Aries HSN Bytes/Sec In/Out - Displays the overall Aries network TCP traffic information for nodes. Note that non-TCP Aries traffic, including most traffic generated by CGE, is not shown here.


▪ Operational network Bytes/sec In/Out - Displays the overall operational network traffic information for all nodes. ▪ Management network Bytes/sec In/Out - Displays the overall management network traffic information for all nodes. Figure 39. Aggregate Compute Node Performance Statistics

● Basic Metrics - Displays graphs representing statistical data related to network, CPU, and I/O utilization for the Urika-GX system, as a whole. TIP: It is recommended that administrators use the Basic Metrics dashboard before using other dashboards to retrieve a summary of the system's health. This dashboard contains the following graphs: ○ CPU AND MEMORY ▪ Used and File Cached Memory - Displays the used and file cached memory for each node. ▪ CPU Utilization User + System - Displays the CPU utilization by the user and system for each node ○ FILE SYSTEM DATA RATES ▪ Root File System Hard Drive (/dev/sda) Reads/Writes Bytes/Sec (200MB/sec max) - Displays information about the usage of memory on the root file system for each node.


▪ 2nd Hard Drive (/dev/sdb) Read/Writes - Displays information about the usage of memory on the secondary hard drive of each node. ▪ Lustre Read/Writes Bytes/Second - Displays the number of Lustre reads/writes for each node. ▪ SSD Read/Writes Bytes/Second - Displays the number of reads/writes of the SSDs installed on the system. ○ NETWORK READS/WRITES ▪ Aries HSN Bytes/Sec In/Out - Displays the Aries network TCP traffic information for each node. Note that non-TCP Aries traffic, including most traffic generated by CGE, is not shown here. ▪ Operational network Bytes/sec In/Out - Displays the overall operational network traffic information for each node. ▪ Management network Bytes/sec In/Out - Displays the overall management network traffic information for each node. ○ FILE SYSTEM UTILIZATION ▪ Root File System Hard Drive (/dev/sda) Reads/Writes Bytes/Sec (200MB/sec max) - Displays information about the usage of memory on the root file system for each node. ▪ 2nd Hard Drive (/dev/sdb) Read/Writes - Displays information about the usage of memory on the secondary hard drive of each node. ▪ Lustre Read/Writes Per/Second - Displays the number of Lustre reads/writes for each node. ▪ SSD Read/Writes Per/Second - Displays the number of reads/writes of each node's SSDs. ○ NETWORK ERRORS AND DROPED PACKETS ▪ Aries HSN Dropped Packets and Errors Per Sec - Displays the number of dropped packets/errors per second for the Aries network TCP traffic information for each node. Note that non-TCP Aries traffic, including most traffic generated by CGE, is not shown here. ▪ Operational network Dropped Packets and Errors Per Sec - Displays the number of dropped packets/errors per second for the operational network TCP traffic information for each node. ▪ Management network Dropped Packets and Errors Per Sec - Displays the number of dropped packets/errors per second for the management network TCP traffic information for each node.


Figure 40. Basic Metrics Dashboard

● Compute Node Performance Statistics - Displays graphs representing statistical data related to network, CPU, and I/O utilization for all Urika-GX compute nodes. This dashboard contains the following graphs: ○ ▪ CPU MEMORY UTILIZATION ▪ CPU utilization - Displays the overall CPU utilization for all the compute nodes. ▪ Memory utilization - Displays the overall memory (in KB or GB) used by all the compute nodes. ○ FILE SYSTEM READS/WRITE BYTES/SEC ▪ Root File System Hard Drive (/dev/sda) Reads/Writes Bytes/Sec (200MB/sec max) - Displays information about the usage of memory on the root file system by compute nodes.. ▪ 2nd Hard Drive (/dev/sdb) Read/Writes - Displays information about the usage of memory by compute nodes on the secondary hard drive. ▪ Lustre Read/Writes Per/Second - Displays the number of Lustre reads/writes by compute nodes. ▪ SSD Read/Writes Per/Second - Displays the number of reads/writes of the compute node SSDs installed on the system. ○ NETWORK READS/WRITES


▪ Aries HSN Bytes/Sec In/Out - Displays the Aries network TCP traffic information for compute nodes. Note that non-TCP Aries traffic, including most traffic generated by CGE, is not shown here. ▪ Operational network Bytes/sec In/Out - Displays the overall operational network traffic information for compute nodes. ▪ Management network Bytes/sec In/Out - Displays the overall management network traffic information for compute nodes. Figure 41. Compute Node Performance Statistics

● Hadoop Application Metrics - This section contains the following graphs: ○ Node Manager Memory Usage - Displays the average memory usage per application in megabytes per second (MBPS). The Y-axis is in MBPS. ○ Node Manager Core Usage - Displays the average CPU usage per application in MilliVcores. The Y-axis refers to MilliVcores.


Figure 42. Hadoop Applications Metrics Dashboard

● Hadoop Cluster Metrics - Displays graphs representing statistical data related to Hadoop components, such as HDFS Data Nodes and HDFS Name Nodes. This dashboard contains the following sections: ○ Cluster BlockReceivedAndDeletedAvgTime - Displays the average time in milliseconds for the hdfs cluster to send and receive blocks. The Y-axis represents time in milliseconds. ○ NumActive Node Manager - Displays the number of Node Managers up and running in the Hadoop cluster at a given time. The Y-axis represents a linear number. ○ AllocatedContainers - Displays the number of allocated YARN containers by all the jobs in the hadoop cluster. The Y-axis represents a linear number. ○ DataNode Bytes Read/Write - Displays the number of bytes read or write per node from local client in the hadoop cluster. The Y-axis refers to a linear number. Read is shown on the positive scale. Write is shown on the negative scale. ○ Data Node Remote bytes Read/Written - Displays the number of bytes read or written per node from remote clients in the Hadoop cluster. The Y-axis represents a linear number. The number of reads are shown on the positive scale. The number of writes are shown on the negative scale.


○ Figure 43. Hadoop Cluster Metrics Dashboard

● Non-compute Node Performance Statistics - Displays graphs representing statistical data related to network, CPU, and I/O utilization for all the non-compute (I/O and login) nodes of Urika-GX. This dashboard contains the following graphs: ○ CPU MEMORY UTILIZATION ▪ CPU utilization - Displays the overall CPU utilization for I/O and login (non-compute) nodes. ▪ Memory utilization - Displays the overall memory (in KB or GB) used by all the I/O and login nodes. ○ FILE SYSTEM READS/WRITE BYTES/SEC ▪ Root File System Hard Drive (/dev/sda) Reads/Writes Bytes/Sec (200MB/sec max) - Displays information about the usage of memory on the root file system by I/O and login nodes. ▪ 2nd Hard Drive (/dev/sdb) Read/Writes - For I/O and login nodes, displays information about the usage of memory by compute nodes on the secondary hard drive. ▪ Lustre Read/Writes Bytes/Second - Displays the number of Lustre reads/writes by I/O and login nodes. ▪ SSD Read/Writes Bytes/Second - Displays the number of SSD reads/writes of the I/O and login nodes. ○ NETWORK READS/WRITES


▪ Aries HSN Bytes/Sec In/Out - Displays the Aries network TCP traffic information for I/O and login nodes. Note that non-TCP Aries traffic, including most traffic generated by CGE, is not shown here. ▪ Operational network Bytes/sec In/Out - Displays the overall operational network traffic information for I/O and login nodes. ▪ Management network Bytes/sec In/Out - Displays the overall management network traffic information for I/O and login nodes. Figure 44. Non-compute Node Performance Statistics

● Per Node Performance Statistics - Displays graphs representing statistical data related to network, CPU, and I/O utilization for individual Urika-GX nodes. The node's hostname can be selected using the hostname drop down provided on the UI. This dashboard contains the following graphs: ○ CPU utilization - Displays the CPU utilization for the selected node. ○ Memory utilization - Displays the memory (in KB or GB) used by the selected node. ○ Root File System Hard Drive (/dev/sda) Reads/Writes Bytes/Sec- Displays the HDD SDA utilization for the selected node. ○ 2nd Hard Drive (/dev/sdb) Read/Writes - Displays the HDD SDB utilization for the selected node. ○ SSD Read/Writes Bytes/Second - Displays the number of SSD reads/writes per second for each node.


○ Lustre Read/Writes Bytes/Second - Displays the number of Lustre reads/writes by the selected node. ○ Aries HSN Bytes/Sec In/Out - Displays the Aries network TCP traffic information for the selected node. Note that non-TCP Aries traffic, including most traffic generated by CGE, is not shown here. ○ Operational network Bytes/sec In/Out - Displays the overall operational network traffic information for the selected node. ○ Management network Bytes/sec In/Out - Displays the overall management network traffic information for the selected node. ○ NFS HDFS Lustre Percentage Used - Displays the percentage of NFS, HDFS and Lustre used by the selected node. ○ File System Percent Used - Displays the percentage of file system used by the selected node. Figure 45. Per Node Performance Statistics

● SMW Metrics/ - Displays graphs representing statistical data related to SMW's resources, such as the CPU's memory utilization, root file system, etc. This dashboard contains the following sections: ○ CPU and MEMORY UTILIZATION - Displays the memory consumption by the SMW's CPU. ▪ User/System CPU Utilization MACHINE_NAME ▪ Memory utilization ○ Root File System Data Rates and Utilization - Displays memory usage and data rate of the SMW's root file system.


▪ Root File System Hard Drive (/dev/sda) Reads/Writes Bytes/Sec - Displays information about the usage of memory on the root file system of the SMW ▪ Root File System Percent Used - Displays the percentage for used SMW root file system space. ○ NETWORK DATA RATES ▪ Operational Network Traffic Bytes/sec - Displays the operational network's data rate. ▪ Management Network Traffic Bytes/sec - Displays the management network's data rate. ○ NETWORK PACKET DROPS/SEC AND ERRORS/SEC ▪ Operational Network Dropped and Errors Per Sec - Displays the number of dropped packets and errors per second for the operational network. ▪ Management Network Dropped and Errors Per Sec - Displays the number of dropped packets and errors per second for the management network. Figure 46. SMW Metrics Dashboard

● Spark Metrics - Displays graphs representing statistical data related to Spark jobs. This dashboard also contains links for viewing the Spark Web UI and Spark History Server.


Figure 47. Grafana Spark Metrics Dashboard

Graphs displayed on this dashboard are grouped into the following sections: ○ READ/WRITE: Displays statistics related to the file system statistics of a Spark executor. Results in the graphs of this section are displayed per node for a particular Spark Job. The Y-axis displays the number in bytes, whereas the X-axis displays the start/stop time of the task for a particular Spark Job. This section contains the following graphs: ▪ Executor HDFS Read/Write Per Job (in bytes): Reading and writing from HDFS. ▪ Executor File System Read/Write Per Job (in bytes): Reading and writing from a File System. ○ SPARK JOBS: Displays statistics related to the list of executors per node for a particular Spark Job. The Y-axis displays the number of tasks and X-axis displays the start/stop time of the task for a particular Spark Job. This section contains the following graphs: ▪ Completed Tasks Per Job: The approximate total number of tasks that have completed execution. ▪ Active Tasks Per Job: The approximate number of threads that are actively executing tasks. ▪ Current Pool Size Per Job: The current number of threads in the pool. ▪ Max Pool Size Per Job: The maximum allowed number of threads that have ever simultaneously been in the pool. ○ DAG Scheduler - Displays statistics related to Spark's Directed Acyclic Graphs. This section contains the following graphs: ▪ DAG Schedule Stages - This graph displays the following types of DAG stages: ▪ Waiting Stages: Stages with parents to be computed. ▪ Running Stages: Stages currently being run. ▪ Failed Stages: Stages that failed due to fetch failures (as reported by CompletionEvents for FetchFailed end reasons) and are going to be resubmitted.


▪ DAG Scheduler Jobs - This graph displays the following types of DAG scheduler jobs: ▪ All Jobs - The number of all jobs ▪ Active Jobs - The number of active jobs ▪ DAG Scheduler Message Processing Time - This graph displays the processing time of the DAG Scheduler. ○ JVM Memory Usage - The memory usage dashboards represent a snapshot of used memory. ▪ JVM Memory Usage - init: Represents the initial amount of memory (in bytes) that the Java virtual machine requests from the operating system for memory management during start up. The Java virtual machine may request additional memory from the operating system and may also release memory to the system over time. The value of init may be undefined. ▪ JVM Memory Usage - Used: Represents the amount of memory currently used (in bytes). ▪ JVM Memory Usage - Committed: Represents the amount of memory (in bytes) that is guaranteed to be available for use by the Java virtual machine. The amount of committed memory may change over time (increase or decrease). The Java virtual machine may release memory to the system and committed could be less than init. Committed will always be greater than or equal to used. ▪ JVM Memory Usage - Max: Represents the maximum amount of memory (in bytes) that can be used for memory management. Its value may be undefined. The maximum amount of memory may change over time if defined. The amount of used and committed memory will always be less than or equal to max if max is defined. A memory allocation may fail if it attempts to increase the used memory such that used > committed even if used <= max is still true, for example, when the system is low on virtual memory. ▪ JVM Heap Usage: Represents the maximum amount of memory used by the JVM heap space ▪ JVM Non-Heap Usage: Represents the maximum amount of memory used by the JVM non-heap space

20.2 Performance Metrics Collected on Urika-GX

Table 10. CPU Metrics

cputotals.user - Percentage of node usage utilized in user mode
cputotals.nice - Percentage of CPU time spent executing a process with a "nice" value
cputotals.sys - Percentage of CPU memory used in system mode
cputotals.wait - Percentage of CPU time spent in wait state
cputotals.idle - Percentage of CPU time spent in idle state
cputotals.irq - Percentage of CPU time spent processing interrupts
cputotals.soft - Percentage of CPU time spent processing soft interrupts
cputotals.steal - Percentage of CPU time spent running virtualized (always 0)
ctxint.ctx - Number of context switches
ctxint.int - Number of interrupts
ctxint.proc - Number of process creations/sec
ctxint.runq - Number of processes in the Run queue
cpuload.avg1 - Average CPU load over the last minute
cpuload.avg5 - Average CPU load over the last 5 minutes
cpuload.avg15 - Average CPU load over the last 15 minutes

Table 11. Disk Metrics

disktotals.reads - Combined number of reads for all hard drives and SSD on this node
disktotals.readkbs - Combined number of KB/sec read for all hard drives and SSD on this node
disktotals.writes - Combined number of writes for all hard drives and SSD on this node
disktotals.writekbs - Combined number of KB/sec written for all hard drives and SSD on this node
diskinfo.reads.sda - Number of memory reads on system hard drive
diskinfo.readkbs.sda - KB/seconds read on system hard drive
diskinfo.writes.sda - Number of memory writes on system hard drive
diskinfo.writekbs.sda - KB/seconds written on system hard drive
diskinfo.rqst.sda - Number of IO requests (readkbs + writekbs)/(reads + writes) on system hard drive
diskinfo.qlen.sda - Average number of IO requests queued on system hard drive
diskinfo.wait.sda - Average time in msec that a request has been waiting in the queue on system hard drive
diskinfo.util.sda - Percentage of CPU time during which I/O requests were issued on system hard drive
diskinfo.time.sda - Average time in msec for a request to be serviced by the system hard drive
diskinfo.reads.sdb - Number of memory reads on second hard drive
diskinfo.readkbs.sdb - KB/seconds read on second hard drive
diskinfo.writes.sdb - Number of memory writes on second hard drive
diskinfo.writekbs.sdb - KB/seconds written on second hard drive
diskinfo.rqst.sdb - Number of IO requests (readkbs + writekbs)/(reads + writes) on second hard drive
diskinfo.qlen.sdb - Average number of IO requests queued on second hard drive
diskinfo.wait.sdb - Average time in milliseconds that a request has been waiting in the queue on second hard drive
diskinfo.time.sdb - Average time in msec for a request to be serviced by the second hard drive
diskinfo.util.sdb - Percentage of CPU time during which I/O requests were issued on second hard drive
diskinfo.reads.nvme0n1 - Number of memory reads on SSD
diskinfo.readkbs.nvme0n1 - KB/seconds read on SSD
diskinfo.writes.nvme0n1 - Number of memory writes on SSD
diskinfo.writekbs.nvme0n1 - KB/seconds written on SSD
diskinfo.rqst.nvme0n1 - Number of IO requests (readkbs + writekbs)/(reads + writes) on SSD
diskinfo.qlen.nvme0n1 - Average number of IO requests queued on SSD
diskinfo.wait.nvme0n1 - Average time in msec that a request has been waiting in the queue on SSD
diskinfo.util.nvme0n1 - Percentage of CPU time during which I/O requests were issued on SSD
diskinfo.time.nvme0n1 - Average time in msec for a request to be serviced by the SSD

Table 12. Memory Metrics

meminfo.tot - Total node memory
meminfo.free - Unallocated node memory
meminfo.shared - Unused memory
meminfo.buf - Memory used for system buffers
meminfo.cached - Memory used for caching data between the kernel and disk; direct I/O does not use the cache
meminfo.used - Amount of used physical memory, not including kernel memory
meminfo.slab - Amount of memory used for slab allocations in the kernel
meminfo.map - Amount of memory mapped by processes
meminfo.hugetot - Amount of memory allocated through huge pages
meminfo.hugefree - Amount of available memory via huge pages
meminfo.hugersvd - Amount of memory reserved via huge pages

Swap memory metrics
swapinfo.total - Total swap memory
swapinfo.free - Total available swap memory
swapinfo.used - Amount of swap memory used
swapinfo.in - Kilobytes of swapped memory coming in per second
swapinfo.out - Kilobytes of swapped memory going out per second

Page memory metrics
pageinfo.fault - Page faults/sec resolved by not going to disk
pageinfo.majfault - Page faults resolved by going to disk
pageinfo.in - Total number of pages read by block devices
pageinfo.out - Total number of pages written by block devices

Table 13. Socket Metrics

sockinfo.used - Total number of sockets allocated, which can include additional types such as domain
sockinfo.tcp - Total TCP sockets currently in use
sockinfo.orphan - Number of TCP orphaned connections
sockinfo.alloc - TCP sockets allocated
sockinfo.mem - Number of pages allocated by TCP sockets
sockinfo.udp - Total UDP sockets currently in use
sockinfo.tw - Number of connections in TIME_WAIT
sockinfo.raw - Number of RAW connections in use
sockinfo.frag - Number of fragment connections
sockinfo.fragm - Memory in bytes

Table 14. Lustre Metrics

lusclt.reads - Number of reads by Lustre clients
lusclt.readkbs - Number of reads in KB by Lustre clients
lusclt.writes - Number of writes by Lustre clients
lusclt.writekbs - Number of writes in KB by Lustre clients
lusclt.numfs - Number of Lustre file systems


20.3 Default Log Settings

The following table lists the default log levels of various Urika-GX analytic components. If a restart of the service is needed, please first stop services using the urika-stop command, change the log level, and then restart services using the urika-start command. Table 15. Default Log Levels

● Spark - Default log levels are controlled by the /opt/cray/spark/default/conf/.properties file. Default Spark settings are used when the system is installed, but can be customized by creating a new log4j.properties file. A template for this can be found in the log4j.properties.template file. Restart required after changing the log level: No.
● Hadoop - Default log levels are controlled by the log4j.properties file. Default Hadoop settings are used when the system is installed, but can be customized by editing the log4j.properties file. Restart required: Yes.
● Mesos - INFO. Restart required: Yes.
● Marathon - INFO. Log levels can be modified by editing the log4j.properties file. Restart required: Yes.
● Grafana - INFO. Log properties can be modified by editing the /etc/grafana/grafana.ini file. Restart required: Yes.
● Jupyter Notebook - Log levels are controlled by the Application.log_level configuration parameter in /etc/jupyterhub/jupyterhub_config.py. It is set to 30 by default. Restart required: Yes.
● Cray Graph Engine (CGE) - INFO. The log-reconfigure --log-level number command can be used to modify the log level. Use the drop down on the CGE UI to set the log level for the specific action being performed, i.e. query, update or checkpoint. Use the drop down on the Edit Server Configuration page to set the log level. Changing the log level in this manner persists until CGE is shut down. Restart required: No; restarting CGE reverts the log level to INFO.
● Flex scripts (urika-yam-status, urika-yam-flexup, urika-yam-flexdown, urika-yam-flexdown-all) - INFO. Changing the log level for these scripts is not supported. Restart required: NA.
● Spark Thrift server - INFO. Restart required: Yes.
● HiveServer2 - INFO. Restart required: Yes.
● Tenant proxy logs - DEBUG. Restart required: No.
● Urika-GX security manager logs - INFO. Restart required: No.

20.4 Tunable Hadoop and Spark Configuration Parameters

This section lists the supported Hadoop and Spark configuration parameters on the system.

Hadoop

Before tuning any Hadoop configuration parameters, stop services via the urika-stop command. After the parameters have been changed, start services using the urika-start command. A hedged example of this workflow follows the parameter lists below.

● MapReduce Configuration Parameters - Common configuration parameters (along with their default values on Urika-GX) that can be tuned in the mapred-site.xml file include:
  ○ mapreduce.am.max-attempts (defaults to 2)
  ○ mapreduce.job.counters.max (defaults to 130)
  ○ mapreduce.job.reduce.slowstart.completedmaps (defaults to 0.05)
  ○ mapreduce.map.java.opts (defaults to -Xmx8192m)
  ○ mapreduce.map.log.level (defaults to INFO)
  ○ mapreduce.map.memory.mb (defaults to 10240)
  ○ mapreduce.map.output.compress (defaults to false)
  ○ mapreduce.map.sort.spill.percent (defaults to 0.7)
  ○ mapreduce.output.fileoutputformat.compress (defaults to false)
  ○ mapreduce.output.fileoutputformat.compress.type (defaults to BLOCK)
  ○ mapreduce.reduce.input.buffer.percent (defaults to 0.0)
  ○ mapreduce.reduce.java.opts (defaults to -Xmx8192m)
  ○ mapreduce.reduce.log.level (defaults to INFO)
  ○ mapreduce.reduce.shuffle.fetch.retry.enabled (defaults to 1)
  ○ mapreduce.reduce.shuffle.fetch.retry.interval-ms (defaults to 1000)
  ○ mapreduce.reduce.shuffle.fetch.retry.timeout-ms (defaults to 30000)


  ○ mapreduce.reduce.shuffle.input.buffer.percent (defaults to 0.7)
  ○ mapreduce.reduce.shuffle.merge.percent (defaults to 0.66)
  ○ mapreduce.reduce.shuffle.parallelcopies (defaults to 30)
  ○ mapreduce.task.io.sort.factor (defaults to 100)
  ○ mapreduce.task.io.sort.mb (defaults to 1792)
  ○ mapreduce.task.timeout (defaults to 300000)
  ○ yarn.app.mapreduce.am.log.level (defaults to INFO)
  ○ yarn.app.mapreduce.am.resource.mb (defaults to 10240)
● YARN Configuration Parameters - Common configuration parameters (along with their default values on Urika-GX) that can be tuned in the yarn-site.xml file include:
  ○ yarn.nodemanager.container-monitor.interval-ms (defaults to 3000)
  ○ yarn.resourcemanager.am.max-attempts (defaults to 2)
  ○ yarn.nodemanager.health-checker.script.timeout-ms (defaults to 60000)
  ○ yarn.scheduler.maximum-allocation-mb (defaults to 225280)
  ○ yarn.nodemanager.vmem-pmem-ratio (defaults to 2.1)
  ○ yarn.nodemanager.delete.debug-delay-sec (defaults to 0)
  ○ yarn.nodemanager.health-checker.interval-ms (defaults to 135000)
  ○ yarn.nodemanager.resource.memory-mb (defaults to 225280)
  ○ yarn.nodemanager.resource.percentage-physical-cpu-limit (defaults to 80)
  ○ yarn.nodemanager.log.retain-seconds (defaults to 604800)
  ○ yarn.resourcemanager.zk-num-retries (defaults to 1000)
  ○ yarn.resourcemanager.zk-retry-interval-ms (defaults to 1000)
  ○ yarn.resourcemanager.zk-timeout-ms (defaults to 10000)
  ○ yarn.scheduler.minimum-allocation-mb (defaults to 10240)
  ○ yarn.scheduler.maximum-allocation-vcores (defaults to 19)
  ○ yarn.scheduler.minimum-allocation-vcores (defaults to 1)
  ○ yarn.log-aggregation.retain-seconds (defaults to 2592000)
  ○ yarn.nodemanager.disk-health-checker.min-healthy-disks (defaults to 0.25)
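The following is a hedged sketch of that stop/edit/start workflow, using a single MapReduce parameter as an example. The location of mapred-site.xml under the Hadoop configuration directory and the value shown are illustrative and should be confirmed on the system.

# urika-stop
# vi /etc/hadoop/conf/mapred-site.xml

Within the file, a tuned parameter takes the usual Hadoop property form, for example:

<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>2048</value>
</property>

After saving the change, restart services:

# urika-start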

Spark

Cray Spark configuration defaults are chosen based on internal verification and benchmarking to provide strong performance across a range of workloads. These defaults may not be ideal for all workloads, so there may be a need to change configuration settings. This can be done on a per-job basis to optimize the performance of individual workloads: users can change Spark configuration parameters when they launch jobs using the --conf command line flag (see the example following the list below). Common Spark configuration parameters (along with their default values on Urika-GX) that can be tuned in the spark-defaults.conf file include:


● spark.cores.max (defaults to 32 cores). For more information about changing this default, see Modify the Default Number of Maximum Spark Cores on page 31
● spark.executor.memory (defaults to 96g)
● spark.shuffle.compress (defaults to false)
● spark.locality.wait (defaults to 1)
● spark.eventLog.enabled (defaults to true)
● spark.driver.memory (defaults to 16g)
● spark.driver.maxResultSize (defaults to 1g)
● spark.serializer (defaults to org.apache.spark.serializer.JavaSerializer)
● spark.storage.memoryFraction (defaults to 0.6)
● spark.shuffle.memoryFraction (defaults to 0.2)
● spark.speculation (defaults to false)
● spark.speculation.multiplier (defaults to 1.5)
● spark.speculation.quantile (defaults to 0.75)
● spark.task.maxFailures (defaults to 4)
● spark.app.name (default value: none)

Refer to online documentation for descriptions of these parameters.
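As a concrete illustration of per-job tuning with the --conf flag, the following hedged example overrides two of the parameters listed above at submit time; the application class and JAR names are placeholders.

$ spark-submit --class com.example.MyApp \
    --conf spark.executor.memory=48g \
    --conf spark.shuffle.compress=true \
    my-app.jar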

20.5 Node Types

Urika-GX contains the following types of nodes:
● 1 sub-rack system:
  ○ I/O nodes: nid00007 and nid00015
  ○ Login nodes: nid00006 (login node 1) and nid00014 (login node 2)
  ○ Service nodes: nid00000, nid00001 and nid00002
  ○ Compute nodes: all the remaining nodes
● 2 sub-rack system:
  ○ I/O nodes: nid00015 and nid00031
  ○ Login nodes: nid00014 (login node 1) and nid00030 (login node 2)
  ○ Service nodes: nid00000, nid00001 and nid00002
  ○ Compute nodes: all the remaining nodes
● 3 sub-rack system:
  ○ I/O nodes: nid00031 and nid00047
  ○ Login nodes: nid00030 (login node 1) and nid00046 (login node 2)
  ○ Service nodes: nid00000, nid00016 and nid00032
  ○ Compute nodes: all the remaining nodes


20.6 Service to Node Mapping

The list of services available for use depends on the security mode the system is running under. For more information, refer to Urika-GX Service Modes on page 7.

Table 16. Urika-GX Service to Node Mapping (2 Sub-rack System)

nid00000
● ZooKeeper
● ncmd
● Mesos Master
● Marathon
● Primary HDFS NameNode
● Hadoop Application Timeline Server
● Collectl
● nrpe
● kubelet

nid000[01-07, 09-13, 17-29]
● Collectl
● Mesos Slave
● Data Node
● YARN Node Manager (if running)
● nrpe
● kubelet

nid00008
● ZooKeeper
● Secondary HDFS NameNode
● Mesos Master
● Oozie
● Hive Server2
● Spark Thrift Server
● Hive Metastore
● WebHCat
● Postgres database
● Marathon
● YARN Resource Manager
● Collectl
● nrpe
● kubelet

nid00014 (Login node 1)
● HUE
● HAProxy
● Collectl
● Urika-GX Applications Interface UI
● Cray Application Management UI
● Jupyter Notebook
● Service for flexing a YARN cluster
● Documentation and Learning Resources UI
● nrpe
● kubelet
● Kubernetes Controller

nid00015, nid00031 (I/O nodes)
● Lustre clients
● nrpe

nid00016
● ZooKeeper
● Mesos Master
● Marathon
● Hadoop Job History Server
● Spark History Server
● Collectl
● nrpe
● kubelet

nid00030 (Login node 2)
● HUE
● HAProxy
● Collectl
● Service for flexing a YARN cluster
● Grafana
● InfluxDB
● nrpe
● kubelet

Table 17. Urika-GX Service to Node Mapping (3 Sub-rack System)

nid00000
● ZooKeeper
● ncmd
● Mesos Master
● Marathon
● Primary HDFS NameNode
● Hadoop Application Timeline Server
● Collectl
● nrpe
● kubelet

nid00001-nid00015, nid00017-nid00029, nid00033-nid00045
● Collectl
● Mesos Slave
● Data Node
● YARN Node Manager (if running)
● nrpe
● kubelet

nid00016
● ZooKeeper
● Mesos Master
● Marathon
● Hadoop Job History Server
● Spark History Server
● Collectl
● nrpe
● kubelet

nid00030 (Login node 1)
● HUE
● HAProxy
● Collectl
● Urika-GX Applications Interface UI
● Jupyter Notebook
● Service for flexing a YARN cluster
● Documentation and Learning Resources UI
● nrpe
● kubelet
● Kubernetes Controller

nid00031, nid00047 (I/O nodes)
● Lustre clients
● kubelet

nid00032
● ZooKeeper
● Secondary NameNode
● Mesos Master
● Oozie
● Hive Server2
● Hive Metastore
● WebHCat
● Postgres database
● Marathon
● YARN Resource Manager
● Collectl
● Spark Thrift Server
● nrpe
● kubelet

nid00046 (Login node 2)
● HUE
● HAProxy
● Collectl
● Grafana
● InfluxDB
● Service for flexing a YARN cluster
● nrpe
● kubelet

For additional information, use the urika-inventory command as root on the SMW to view the list of services running on each node, as shown in the following example:

# urika-inventory

For more information, see the urika-inventory man page.

20.7 Port Assignments

Table 18. Services Running on the System Management Workstation (SMW)

● SSH: 22


Table 19. Services Running on the I/O Nodes

● SSH: 22

Table 20. Services Running on the Compute Nodes

● SSH: 22
● YARN Node Managers: 8040, 8042, 45454, and 13562
● Mesos slaves on all compute nodes: 5051
● DataNode web UI for accessing status, logs, and other information: 50075
● DataNode port used for data transfers: 50010
● DataNode port used for metadata operations: 8010

Table 21. Services Accessible via the Login Node Hostnames

● Mesos Master UI: 5050. This UI is user-visible.
● Spark History Server web UI: 18080. This UI is user-visible.
● HDFS NameNode UI for viewing health information: 50070. This UI is user-visible.
● Secondary NameNode web UI: 50090. This UI is user-visible.
● Web UI for Hadoop Application Timeline Server: 8188. This UI is user-visible.
● YARN Resource Manager web UI: 8088. This UI is user-visible.
● Marathon web UI: 8080. This UI is user-visible.
● Hive Server2: SSL - 29207; non-SSL - 10000
● Hive Metastore: 9083
● Hive WebHCat: 50111
● Oozie server: 11000. The Oozie dashboard UI runs on this port and is user-visible.
● Hadoop Job History Server: 19888 on nid00016. This is a user-visible web UI.


● HUE server: 8888 on login1 and login2. The web UI for the HUE dashboard runs on this port and is user-visible.
● CGE cge-launch command: 3750. See S-3010, "Cray® Graph Engine Users Guide", for more information about the cge-launch command, or see the cge-launch man page.
● CGE web UI and SPARQL endpoints: 3756
● Spark web UI: 4040. This port is valid only when a Spark job is running. If the port is already in use, the port number is incremented until an open port is found. The Spark web UI runs on whichever login node (1 or 2) the user executes spark-submit, spark-shell, spark-sql, or pyspark on. This UI is user-visible.
● InfluxDB: 8086 on login2. InfluxDB runs on nid00046 on a three sub-rack system, and on nid00030 on a two sub-rack system.
● InfluxDB port listening for collectl daemons on compute nodes: 2003. InfluxDB runs on login node 2 on the Urika-GX system.
● InfluxDB cluster communication: 8084
● Grafana: 3000 on login2 (login node 2). The Grafana UI is user-visible.
● Web UI for Jupyter Notebook: 7800. Jupyter Notebook internally uses an HTTP proxy, which listens on ports 7881 and 7882.
● Urika-GX Applications Interface: 80 on login1 (login node 1).
● Urika-GX Application Management: 80 on login1 (login node 1).
● Spark SQL Thrift Server: SSL - 29208; non-SSL - 10015
● Kubernetes Master: 6443 on login1 (login node 1).

Additional Ports and Services

Table 22. Additional Services and Ports They Run on

● Apache ZooKeeper: 2181
● Kafka (not configured by default): 9092
● Flume (not configured by default): 41414
● SSH: 22


20.8 Major Software Components Versions

Table 23. Software Component Versions

● Ant: 1.9.2
● Cray Graph Engine: 3.2UP03
● collectl: 3.7.4
● Cobbler: 2.6
● Docker: 1.12.3
● emacs editor: 24.3.1
● Environment modules (software for switching versions of other packages): 3.2.10
● gcc (C/C++ compiler): 4.8.5
● glibc (GNU C libraries): 2.17
● Git (version control tool): 1.8.3.1
● Grafana: 3.0.1
● HAProxy: 1.5.18
● InfluxDB: 0.12.2
● Java runtime execution environment: OpenJDK 1.8
● Java development environment: OpenJDK 1.8
● JupyterHub: 0.6.1
● Jupyter: 4.2.0
● Jupyter Notebook: 4.2.3
● kdump: 3.10.0
● Kubernetes: 1.9.2
● Lustre server (on I/O nodes) and Lustre client (on all nodes) software: 2.7.1
● Marathon: 1.1.1
● Maven: 3.3.9
● Apache Mesos: 1.1.0
● Nagios: 4.3.4
● mrun: 2.7


● Python language interpreter: 2.7.5, 3.4.3, and Anaconda Python 3.5.2 (via Anaconda distribution version 4.1.1)
● R language interpreter: 3.4.3
● SBT: 0.13.9
● Scala compiler: 2.11.8
● Apache Spark: Cray customized version of the Spark on Kubernetes project, which is based on Spark 2.2.0
● vi editor: VIM 7.4

Operating System
● Operating system on system nodes and SMW: CentOS 7.4

Hadoop and Hadoop Ecosystem Components
In the following list, items marked with a * are installed but not configured on the Urika-GX system.

● Apache Hadoop: 2.7.3
● HUE: 3.10.0
● HDFS: 2.7.3
● Hortonworks Data Platform (HDP): 2.6.1.0-129
● *Flume: 1.5.2
● Hive: 1.2.1
● *Kafka: 0.9.0
● *Mahout: 0.9.0
● Oozie: 4.2.0
● *Sqoop: 1.4.6
● *Pig: 0.16.0
● ZooKeeper: 3.4.6

gdb (debugger for C/C++ programs) is not installed on the system by default, but can be installed by administrators via YUM if required, as shown in the following example:

# yum install gdb

For additional information, execute the urika-rev and urika-inventory commands as root from the SMW, as shown in the following examples:
● # urika-rev
● # urika-inventory


21 Troubleshooting

21.1 Diagnose and Troubleshoot Orphaned Mesos Tasks

Prerequisites

This procedure requires root access and the system to be running in the default service mode. Execute the urika-state command to ensure that Mesos is running and that the system is operating in the default service mode.

About this task

The metrics displayed in the Mesos UI can also be retrieved using curl calls. Cray-developed scripts (for flexing up a YARN sub-cluster) and mrun use these curl calls as they interoperate with Mesos for resource brokering. If the metrics displayed by the Mesos UI and the metrics returned by the curl calls differ, Mesos may not be working correctly and all Mesos frameworks will be affected. As such, the aforementioned Cray-developed scripts and mrun will not be able to retrieve the needed resources. This behavior can be identified when:
● there is a disconnect between the curl calls and the Mesos UI. Specifically, there will be an indication of orphaned Mesos tasks if the curl call returns a higher number of CPUs used than that returned by the UI. Cray-developed scripts for flexing YARN sub-clusters use curl calls, and hence do not allow flexing up if there are not enough resources reported.
● there are orphaned Mesos tasks, as indicated in the Mesos Master and Mesos Slave logs at /var/log/mesos. Mesos Master will reject task status updates because it will not recognize the framework those tasks are being sent from.

If this behavior is encountered, follow the instructions listed in this procedure.
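Before clearing any state, the two views can be compared manually. The following hedged example queries the standard Mesos /metrics/snapshot endpoint on port 5050 (listed under Port Assignments); the node name is illustrative and should be replaced with an active Mesos master on the system.

# curl -s http://nid00000:5050/metrics/snapshot | python -m json.tool | grep -i cpus

If the CPU counts reported here differ noticeably from those shown in the Mesos UI, proceed with the steps below.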

Procedure

1. Log on to the System Management Workstation (SMW) as root.

2. Clear the slave metadata on all the nodes with Mesos slave processes running. The following example can be used on a 3 sub-rack system:

# pdsh -w nid000[00-47] -x nid000[00,16,30,31,32,46,47] \
  'rm -vf /var/log/mesos/agent/meta/slaves/latest'

3. Stop the cluster

# urika-stop


4. Start the cluster

# urika-start

After following the aforementioned steps, the system should be restored to its original state. For additional information, contact Cray Support.

21.2 Analytic Applications Log File Locations

Log files for a given service are located on the node(s) the respective service is running on, which can be identified using the urika-inventory command. For more information, see the urika-inventory man page.

Table 24. Analytics Applications Log File Locations

Mesos: /var/log/mesos

Marathon: /var/log/messages

HA Proxy: /var/log/haproxy.log

Mesos frameworks (Marathon and Spark): /var/log/mesos/agent/slaves/. Within this directory, a framework's output is placed in files called stdout and stderr, in a directory of the form slave-X/fw-Y/Z, where X is the slave ID, Y is the framework ID, and multiple subdirectories Z are created for each attempt to run an executor for the framework. These files can also be accessed via the web UI of the slave daemon (a sketch of navigating this layout follows the table). The location of the Spark logs is determined by the cluster resource manager Spark runs under, which is Mesos on Urika-GX.

Grafana: /var/log/grafana/grafana.log

InfluxDB: /var/log/influxdb/influxd.log

collectl: collectl does not produce any logging information; it uses logging as a mechanism for storing metrics, which are exported to InfluxDB. If collectl fails at service start time, the cause can be identified by executing the collectl command on the command line and observing what gets printed. It will not complain if the InfluxDB socket is not available.

Hadoop: The following daemon logs appear on the node they are running on:
● /var/log/hadoop/hdfs/hadoop-hdfs-namenode-nid.log
● /var/log/hadoop/hdfs/hadoop-hdfs-datanode-nid.log
● /var/log/hadoop/yarn/yarn-yarn-nodemanager-nid.log
● /var/log/hadoop/yarn/yarn-yarn-resourcemanager-nid.log
In the above locations, nid is used as an example for the node name. Application-specific logs reside in HDFS at /app-logs.

Spark:
● Spark event logs (used by the Spark History Server) reside at hdfs://user/spark/applicationHistory
● Spark executor logs (useful for debugging Spark applications) reside with the other Mesos framework logs on the individual compute nodes (see above) at /var/log/mesos/agent/slaves/

Jupyter Notebook: /var/log/jupyterhub.log

Flex scripts (urika-yam-status, urika-yam-flexup, urika-yam-flexdown, urika-yam-flexdown-all): /var/log/urika-yam.log

ZooKeeper: /var/log/zookeeper

Hive Metastore: /var/log/hive

HiveServer2: /var/log/hive

HUE: /var/log/hue

Spark Thrift Server: /var/log/spark
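As a hedged sketch of navigating the Mesos framework log layout described in the table above, the following lists the executor stderr files present on a compute node and then inspects one of them; the slave-X, fw-Y, and Z directory names vary per run, as described earlier.

# find /var/log/mesos/agent/slaves -type f -name stderr
# tail -n 50 /var/log/mesos/agent/slaves/slave-X/fw-Y/Z/stderr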

Spark Audit Logs

A per-user Spark audit log that details the start and stop of applications is located at /var/log/spark/k8s/username.log, with entries of the following form:

Tue Apr 03 07:54:05 CDT 2018 username spark-test-1522760043061-driver START \
Application Started with 1 driver plus 5.0 executors using 6.0 cores and 496.0GB memory
Tue Apr 03 07:54:38 CDT 2018 username spark-test-1522760043061-driver STOP \
Application Stopped
Tue Apr 3 08:15:51 CDT 2018 username username-shell-159738-5d5b87c8b-82td6 \
START spark-shell Shell Started with 1 driver using 16.0 cores and 60GB memory
Tue Apr 3 08:16:36 CDT 2018 username username-shell-159738-5d5b87c8b-82td6 \
STOP spark-shell Shell Stopped
Wed Apr 04 04:28:45 CDT 2018 username spark-test-1522834122688-driver START \
Application Started with 1 driver plus 7.0 executors using 8.0 cores and 688.0GB memory
Wed Apr 04 04:29:38 CDT 2018 username spark-test-1522834122688-driver STOP \
Application Stopped
Wed Apr 04 04:30:09 CDT 2018 username spark-test-1522834207396-driver START \
Application Started with 1 driver plus 2.0 executors using 3.0 cores and 208.0GB memory
Wed Apr 04 04:30:42 CDT 2018 username spark-test-1522834207396-driver STOP \
Application Stopped
Wed Apr 04 04:32:05 CDT 2018 username spark-test-1522834323513-driver START \
Application Started with 1 driver plus 2.0 executors using 17.0 cores and 208.0GB memory
Wed Apr 04 04:32:28 CDT 2018 username spark-test-1522834323513-driver STOP \
Application Stopped

These log files will be located on whatever node an application is submitted from, typically login1, though possibly elsewhere depending on how the system enables users to access it. This log has the general format date username driver-pod-name action message, where:
● driver-pod-name is the Kubernetes pod name of the application's driver, which can be used to link this information to more detailed information from Kubernetes.
● action is either START or STOP.
● message contains informational content, such as the resources requested by the user's application.
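Given that general format, simple text tools can summarize the audit log. The following hedged example counts how many START events a user recorded on a given day; the per-user file name follows the pattern described above and username is a placeholder.

$ awk '/^Wed Apr 04/ && / START /' /var/log/spark/k8s/username.log | wc -l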


CAUTION: In the event of a failed or cancelled job (i.e., the user executes Ctrl+C or kill on the job), there may be no corresponding STOP event registered for a job reported as started.

21.3 Clean Up Log Data

As jobs are executed on the system, a number of logs are generated, which need to be cleaned up, otherwise they may consume unnecessary space. Log data is useful for debugging issues, but if it is certain that this data is no longer needed, it can be deleted (a hedged cleanup sketch follows this list).
● Mesos logs - Mesos logs are stored under /var/log/mesos, whereas the Mesos framework logs are stored under /var/log/mesos/agent/slaves. These logs need to be deleted manually.
● Marathon logs - Marathon logs are stored under /var/log/messages and need to be deleted manually.
● HA Proxy logs - HA Proxy logs are stored under /var/log/messages and need to be deleted manually.
● Jupyter logs - Jupyter log files are located at /var/log/jupyterhub/jupyterhub.log and need to be deleted manually.
● Grafana and InfluxDB logs - Grafana logs are stored under /var/log/grafana, whereas InfluxDB logs are stored under /var/log/influxdb. InfluxDB log files are compressed. Both Grafana and InfluxDB use the logrotate utility to keep log files from using too much space. Log files are rolled daily by default, but if space is critical, logs can be deleted manually.
● Spark logs - Shuffle data files on the SSDs are automatically deleted on Urika-GX. Spark logs need to be deleted manually and are located at the following locations:
  ○ Spark event logs - Located at hdfs://user/spark/applicationHistory
  ○ Spark executor logs - Located on individual compute nodes at /var/log/mesos/agent/slaves/
● Hadoop logs - Hadoop log files are located in the following locations and need to be deleted manually:
  ○ Core Hadoop - Log files are generated under the following locations:
    ▪ /var/log/hadoop/hdfs
    ▪ /var/log/hadoop/yarn
    ▪ /var/log/hadoop/mapreduce
  ○ ZooKeeper - ZooKeeper logs are generated under /var/log/zookeeper
  ○ Hive (Metastore and HiveServer2) - These logs are generated under /var/log/hive
  ○ Hive WebHCat - These logs are generated under /var/log/webhcat
  ○ Oozie - Oozie logs are stored under /var/log/oozie
  ○ HUE - HUE logs are generated under /var/log/hue
● Flex scripts (urika-yam-status, urika-yam-flexup, urika-yam-flexdown, urika-yam-flexdown-all) - These scripts generate log files under /var/log/urika-yam.log and need to be deleted manually.
● mrun - mrun does not generate logs.
● Cray Graph Engine (CGE) logs - The path where CGE log files are located is specified via the -l parameter of the cge-launch command. Use the cge-cli log-reconfigure command to change the location after CGE is started with cge-launch. CGE logs need to be deleted manually. Users can also use the --log-level


argument to CGE CLI commands to set the log level on a per-request basis. In addition, the cge.server.DefaultLogLevel parameter in the cge.properties file can be used to set the log level to the desired default.
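Where manual deletion is called for, a conservative hedged approach is to remove only files older than a cutoff and to list HDFS candidates before deleting anything; the 30-day threshold below is illustrative, and logs should only be removed once it is certain they are no longer needed. Run the HDFS listing as a user with access to the Spark event log directory.

# find /var/log/mesos -type f -mtime +30 -print -delete
$ hdfs dfs -ls /user/spark/applicationHistory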

21.4 Ensure Long Running Spark Jobs Finish Executing

Prerequisites

This procedure requires root privileges and needs to be carried out on the SMW node.

About this task

Delegation tokens are acquired from the HDFS NameNode when a job starts executing. These tokens need to be renewed after they have expired. The default expiration period/lifetime of delegation tokens is 1 day. Spark jobs that interact with HDFS can complete successfully only if their execution time is less than the delegation token's lifetime. The lifetimes of the tokens can be modified by changing certain properties in the hdfs-site.xml file, as described in this procedure.

Procedure

1. Log on to the SMW as root.

2. Stop the HDFS service.

# urika-stop -s hdfs

3. Edit the hdfs-site.xml file to set the following properties to values greater than or equal to the Spark job execution time.
● dfs.namenode.delegation.key.update-interval
● dfs.namenode.delegation.token.max-lifetime
● dfs.namenode.delegation.token.renew-interval
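A hedged sketch of the corresponding hdfs-site.xml entries is shown below; the seven-day values (in milliseconds) are illustrative and should be chosen to exceed the longest expected Spark job.

<property>
  <name>dfs.namenode.delegation.key.update-interval</name>
  <value>604800000</value>
</property>
<property>
  <name>dfs.namenode.delegation.token.max-lifetime</name>
  <value>604800000</value>
</property>
<property>
  <name>dfs.namenode.delegation.token.renew-interval</name>
  <value>604800000</value>
</property>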

4. Copy the modified hdfs-site.xml file to all the nodes.

5. Restart the HDFS service.

# urika-start -s hdfs

21.5 Troubleshoot Common Analytic and System Management Issues

The following tables contain a list of some common error messages and their descriptions. Please note that this is not an exhaustive list. Online documentation and logs should be referenced for additional debugging/troubleshooting. For a list of Cray Graph Engine error messages and troubleshooting information, please refer to the Cray® Graph Engine User Guide.


Table 25. System Management Error Messages

Error message: ERROR: unauthorized command 'cat' requested by client
Description: This message is returned when a restricted user, logged in to a tenant VM, attempts to execute a command that is not part of the set of white-listed commands.
Resolution: Only white-listed commands can be executed by restricted users who are logged into tenant VMs. For more information, refer to the 'Urika®-GX System Administration Guide'.

Error message: ERROR: tag(s) not found in playbook: non_existent_service. possible values: collectl, grafana, hdfs, hdfs_dn, hdfs_nn, hdfs_sn, hdp_app_timeline, hdp_hist, hive, hive2, hive_met, hive_web, hue, influxdb, jupyterhub, marathon, mesos, nodemanager, oozie, spark_hist, yarn, yarn_rm, zookeeper
Description: The user has specified a service that does not exist to the urika-stop or urika-start command.
Resolution: Use the correct service name by selecting one of the options listed in the error message.

Error message: You are not authorized to log into this system -- to obtain access please contact your system administrator. su: Permission denied
Description: This message may be returned by the urika-state command on a freshly deployed system if the hdfs user has not yet been created.
Resolution: Wait until the main_run/main_update playbooks have finished creating the hdfs user.

Table 26. Spark Error Messages

Error message: INFO mesos.CoarseMesosSchedulerBackend: Blacklisting Mesos slave 20151120-121737-1560611850-5050-20795-S0 due to too many failures; is Spark installed on it? INFO mesos.CoarseMesosSchedulerBackend: Mesos task 30 is now TASK_FAILED
Description: There may be something preventing a Mesos slave from starting the Spark executor. Common causes include:
● The SSD is too full
● The user does not have permission to write to Spark temporary files under /var/spark/tmp/
Resolution: Refer to Spark logs.

Error message: ERROR mesos.MesosCoarseGrainedSchedulerBackend: Mesos error: Master refused authentication Exiting due to error from cluster scheduler: Master refused authentication
Description: This message appears when Spark is not able to authenticate with Mesos.
Resolution: The user may not have Mesos credentials, in which case the user would need to be added via the usm-sync-users script. In other cases, this error may be a result of the user's credentials being in a bad state, in which case the admin would need to run the usm-recreate-secret script to create a new secret for the user. For more information, refer to the 'Urika®-GX System Administration Guide'.

Error message: Lost executor # on host
Description: Something has caused the Spark executor to die. One of the reasons may be that there is not enough memory allocated to the executors.
Resolution: Increase the memory allocated to executors via one of the following parameters:
● --executor-memory
● the spark.executor.memory configuration parameter
Refer to Spark logs for additional information.

Table 27. Flex Scripts Error Messages

Error message: The number of nodes requested to flex up is greater than the total number of resources available. Please enter a valid number of nodes
Description: The user is attempting to flex up more nodes than are available using the urika-yam-flexup command.
Resolution: Enter a lower number of nodes for the flex up request.

Error message: No time out specified by user through commandline argument, setting the timeout from /etc/urika-yam.conf file. val: 15 minutes
Description: The user has not specified a timeout while using the urika-yam-flexup command.
Resolution: This error message can safely be ignored if it is acceptable to use the default timeout value, which is 15 minutes in /etc/urika-yam.conf. Otherwise, specify the desired value when using the urika-yam-flexup command.

Error message: ID names can only contain alphanumeric, dot '.' and dash '-' characters. '@' not allowed in jhoole@#$. Usage: urika-yam-flexup --nodes #nodes --identifier name --timeout timeoutInMinutes
Description: The user has specified an incorrect identifier/application name when using the urika-yam-flexup command.
Resolution: Re-enter the command with the correct identifier.

Error message: Minimum timeout is 5 minutes. A timeout less than the minimum timeout cannot be requested, with an exception of zero timeout. Please note that you can request a zero timeout (set value of timeout to 0) by which you do not call timeout, you chose to flex down the nodes manually using urika-yam-flexdown. Please submit a new flex up request with valid timeout.
Description: A timeout below the minimum was specified.
Resolution: Submit a new flex up request with a valid timeout (request a timeout greater than the minimum timeout).


Error message: Currently only "x" nodes are available in the cluster. Please wait till the number of nodes you require are available Or submit a new flex up request with nodes less than "x"
Description: This error is seen when the number of nodes requested to flex up is not available.
Resolution: Either wait until the number of nodes required are available, or submit a new flex up request with fewer than "x" nodes.

Error message: Invalid app name. Your app name can consist of a series of names separated by slashes. Each name must be at least 1 character. The name may only contain digits (0-9), dashes (-), dots (.), and lowercase letters (a-z). The name may not begin or end with a dash.
Description: This error is seen when the identifier provided by the user for the flex up request is invalid.
Resolution: Follow the naming rules in the message and re-submit the flex up request.

Error message: Total number of resources not set in the /etc/urika-yam.conf file, please re-check the configuration file
Description: In the /etc/urika-yam.conf file, the number of resources is set by default. The total number of resources may not have been set.
Resolution: Re-check the status of the Mesos cluster.

Error message: Hostname is not set in the /etc/urika-yam.conf file, please re-check the configuration file.
Description: In the /etc/urika-yam.conf file, the parameter hostname is set by default. The value may not be correct or may not have been set.
Resolution: Ensure that this parameter is set, and that the value is the same as the default value.

Error message: Mesos port is not set in the /etc/urika-yam.conf file, please re-check the configuration file
Description: In the /etc/urika-yam.conf file, the parameter marathon_port is set by default. This parameter may not have been set, or the value may not be the same as the default value.
Resolution: Ensure that this parameter is set, and that the value is the same as the default value.

Error message: Marathon port is not set in the /etc/urika-yam.conf file, please re-check the configuration file.
Description: In the /etc/urika-yam.conf file, the parameter marathon_port is set by default. This parameter may not have been set, or the value may not be the same as the default value.
Resolution: Ensure that this parameter is set, and that the value is the same as the default value.

Error message: The number of nodes you requested to flex up is greater than the total number of resources available. Please enter a valid number of nodes
Description: This error is seen when the number of nodes requested to flex up is more than the total number of nodes available in the cluster.
Resolution: Submit a new flex up request with a number of nodes less than or equal to the number of nodes available in the cluster.

Error message: App '/$marathon_app_name' does not exist. Please re-check the identifier corresponding nodes you flex up, that you would like to flex down
Description: The identifier provided for flex down does not exist.
Resolution: Re-check the usage: if operating as the root user, provide the complete name as seen in urika-yam-status; as a non-root user, ensure that the same identifier used at the time of flex up is provided. In addition, check whether /var/log/urika-yam.log contains any log messages indicating that the timeout criteria were met and the app was already flexed down.

Error message: Looks like there is some problem with flex up. Please try urika-yam-status or look at the logs to find the problem
Description: The job failed to launch.
Resolution: Review logs (stored at /var/log/urika-yam.log on login nodes) or execute the urika-yam-status command to identify if there is any problem. Check if there are any issues related to Mesos and/or Marathon. If the Mesos and/or Marathon web UI cannot be accessed, contact the administrator, who should verify that the Mesos and Marathon daemons are up and running. If any of these daemons are not running for some reason, report the logs to Cray Support and restart the Mesos cluster using the urika-start command. For more information, see the urika-start man page.

Error message: Could not find the script urika-yam-start-nodemanager in hdfs. Looks like there is an error with your urika-yam installation Please contact your sysadmin
Description: The urika-yam-start-nodemanager script is a component of the Cray-developed scripts for flexing up a YARN cluster and is installed as part of the installation of these flex scripts.
Resolution: If this issue is encountered, the administrator should verify that:
● HDFS is in a healthy state
● Marathon and Mesos services are up and running
The status of the aforementioned services can be checked using the urika-state command. For more information, see the urika-state man page. Contact support for additional information about resolving this issue.

Error message: INFO: can not flexup YAM in secure mode
Description: This message indicates that the user is attempting to execute the urika-yam-flexup script while the system is in the secure service mode, instead of in the default mode.
Resolution: The urika-yam-flexup script can only be used in the default service mode. For more information, refer to the Urika-GX System Administration Guide.

Error message: INFO: can not flexdown YAM in secure mode
Description: This message indicates that the user is attempting to execute the urika-yam-flexdown script while the system is in the secure service mode, instead of in the default mode.
Resolution: The urika-yam-flexdown script can only be used in the default service mode. For more information, refer to the Urika-GX System Administration Guide.


Error message: INFO: can not flexdown-all YAM in secure mode
Description: This message indicates that the user is attempting to execute the urika-yam-flexdown-all script while the system is in the secure service mode, instead of in the default mode.
Resolution: The urika-yam-flexdown-all script can only be used in the default service mode. For more information, refer to the Urika-GX System Administration Guide.

Error message: INFO: can not get YAM status in secure mode
Description: This message indicates that the user is attempting to execute the urika-yam-status script while the system is in the secure service mode, instead of in the default mode.
Resolution: The urika-yam-status script can only be used in the default service mode. For more information, refer to the Urika-GX System Administration Guide.

Table 28. Marathon/Mesos/mrun Error Messages

Error message: ERROR:mrun: Force Terminated job /mrun/2016-193-12-13-03.174056 Cancelled due to Timeout
Examples:
● error("mrun: --immediate timed out while waiting")
● error("mrun: Timed out waiting for mrund : %s" % appID)
● error("mrun: Force Terminated job %s Cancelled due to Timeout" %
Description: These errors indicate timeout and resource contention issues, such as the job timed out, the machine is busy, too many users are running too many jobs, a user is waiting for their job to start but previous jobs have not freed up nodes, etc. Additionally, if a user set a job's timeout to 1 hour and the job lasted longer than 1 hour, the user would get a Job Cancelled timeout error.
Resolution: Ensure that there are enough resources available and that the timeout interval is set correctly.

Error message: HWERR[r0s1c0n3][64]:0x4b14:The SSID received an unexpected response:Info1=0x191000000000003:Info2=0x7
Description: Mesos is not able to talk to the ZooKeeper cluster and is attempting to shut itself down.
Resolution: Restart Mesos using the urika-start command.

Error message: WARN component.AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use
Description: The user is attempting to execute a job on a port that is already in use.
Resolution: This message can be safely ignored.

Error message: ERROR:Unexpected 'frameworks' data from Mesos
Examples:
○ error("Mesos Response: %s" % ret)
○ error("Unexpected 'frameworks' data from Mesos")
○ error("mrun: Getting mrund state threw exception - %s" %
○ error("getting marathon controller state threw exception - %s" %
○ error("Unexpected 'apps' data from Marathon")
○ error("mrun: Launching mrund threw exception - %s" % (str(e)))
○ error("mrun: unexpected 'app' data from Marathon: exception - %s" % (str(e)))
○ error("mrun: startMrund failed")
○ error("mrun: Exception received while waiting for "
Description: These errors occur when mrun is not able to connect/communicate with Mesos and/or Marathon.
Resolution: Refer to the online Mesos/Marathon documentation.

Error message examples:
● error("mrun: select(): Exception %s" % str(e) )
● error("mrun: error socket")
● error("mrund: error %r:%s died\n" % (err,args[0]))
● error("mrund: select(): Exception %s\n" % str(e) )
Description: These errors may be encountered in situations where an admin physically unplugs an Ethernet cable while a CGE job is running, a node dies, etc.
Resolution: Ensure that the Ethernet cable remains plugged in while jobs are running.

Error message examples:
● NCMD: Error leasing cookies
● MUNGE: Munge authentication failure [%s] (%s).\n
Description: These errors only occur if the specific system services have failed.
Resolution: Refer to log messages under /var/log/messages on the node the message was encountered on.

Error message: ERROR:Not enough CPUs for exclusive access. Available: 0 Needed: 1
Examples:
● parser.error("Only --mem_bind=local supported")
● parser.error("Only --cpu-freq=high supported")
● parser.error("Only --kill-on-bad-exit=1 supported")
● parser.error("-n should equal (-N * --ntasks-per-node)")
● parser.error("-N nodes must be >= 1")
● parser.error("-n images must be >= -N nodes")
● parser.error("No command specified to launch"); error("Not enough CPUs. "
● error("Not enough CPUs for exclusive access. "
● error("Not enough nodes. "
● parser.error("name [%s] must only contain 'a-z','0-9','-' and '.'"
● parser.error("[%s] is not executable file" % args[0])
Description: These errors are typically caused by user errors, typos, and situations where not enough nodes are available to run a job.
Resolution: Ensure that there are enough nodes available and that there are no typos in the issued command.

Error message: ERROR: Zero read: Scaling DOWN not supported
Description: This error message is returned when mrun is not expecting one of the remote nodes to abruptly close its socket. When mrun goes to read the socket, it detects the error, generates the message, and the application dies (or is killed).
Resolution: Using the Marathon UI to scale down mrun jobs is not supported. This message can occur in the following conditions:
● The user used the --kill-on-scaledown option of the mrun command.
● The first remote compute node did not exit successfully.
Thus, this error message can only occur if the application is killed or if someone used the Marathon REST API in an attempt to scale the application down. No action needs to be taken in either case.

Error message: mrun: error: Users may only cancel their own mrun jobs
Description: This message is returned when a non-root user tries to use mrun --cancel to cancel a Marathon job that was not launched by that user using mrun.
Resolution: User root is allowed to use "mrun --cancel" to kill any Marathon-started job. All other users should only kill the Marathon jobs they launched using the mrun command.

Error message: INFO: Can not run this application in secure mode, cancelling app
Description: This message indicates that the user is attempting to use mrun while the system is in the secure service mode, instead of in the default mode.
Resolution: mrun can only be used in the default service mode. For more information, refer to the Urika-GX System Administration Guide.


Table 29. Hadoop Error Messages

Error message: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create or delete a file. Name node is in safe mode.
Description: During start-up, the NameNode goes into a safe mode to check for under-replicated and corrupted blocks. Safe mode for the NameNode is essentially a read-only mode for the HDFS cluster, where it does not allow any modifications to the file system or blocks. Normally, the NameNode disables safe mode automatically; however, if there are too many corrupted blocks, it may not be able to get out of safe mode by itself.
Resolution: Force the NameNode out of safe mode by running the following command as the HDFS user:

$ hdfs dfsadmin -safemode leave

Error message: Too many underreplicated blocks in the NameNode UI
Description: A couple of DataNodes may be down. Check the availability of all the DataNodes.
Resolution: If all the DataNodes are up and there are still under-replicated blocks, run the following two commands in order as the HDFS user:

$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> \
  /tmp/under_replicated_files
$ for hdfsfile in `cat /tmp/under_replicated_files`; \
  do echo "Fixing $hdfsfile :" ; \
  hadoop fs -setrep 3 $hdfsfile; \
  done

Error message: Too many corrupt blocks in the NameNode UI
Description: The NameNode might not have access to at least one replica of the block.
Resolution: Check if any of the DataNodes are down. If all the DataNodes are up and the files are no longer needed, execute the following command:

$ hdfs fsck / -delete

Error message: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/test could only be replicated to 0 nodes instead of minReplication (=1).
Description: HDFS space may have reached full capacity. Even though Urika-GX has a heterogeneous file system, the default storage type is DISK unless explicitly set to use SSD. The user might have filled up the default storage, which is why HDFS is not able to write more data to DISK.
Resolution: To identify the used capacity by storage type, use the following commands. For both DISK and SSD, calculate the sum of usage on all the DataNodes.
For DISK:

$ df /mnt/hdd-2/hdfs/dd | awk 'NR==2{print $3}'

For SSD:

$ df /mnt/ssd/hdfs/dd | awk 'NR==2{print $3}'

Error message: YARN job is not running. You can see the status of the job as ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
Description: The NodeManagers may not be running to launch the containers.
Resolution: Check the number of available NodeManagers by executing the following command:

$ yarn node -list


Additional Tips and Troubleshooting Information
● If JupyterHub processes owned by a user remain running after the user has logged out from Jupyter, these processes can be manually killed using the Linux kill command (a sketch follows this list).
● The system will return the message "Service 'serviceName' is not supported in the current security mode" if an attempt is made to start a service that is not supported in the current service mode. Use the urika-state or urika-service-mode commands to check which service mode the system is running in. For more information, refer to Urika-GX Service Modes on page 7.
● If, for any reason, Marathon does not start after a system crash as a result of the queue reaching full capacity, use the urika-stop command, followed by the urika-start command, to resolve the issue.
● In Urika-GX's multi-tenant environment, individual tenant members are restricted from overriding the global Hadoop configuration directory and from specifying a specific NameNode on the CLI. As such, certain arguments passed to HDFS commands on the CLI are ignored to ensure the security of tenant data. If these arguments are passed to the CLI, the system will return a warning indicating that it detected an argument that is not allowed for restricted users and that the argument is being removed.
● Use one of the following mechanisms if it is required to kill Spark jobs:
  ○ Kill the job using the Spark UI - Click the (kill) text in the Description column of the Stages tab.
  ○ Kill the job using the Linux kill command.
  ○ Kill the job using the Ctrl+C keyboard keys.
● The system will return the following error if a user attempts to view help information for an unsupported Lustre lfs sub-command: The sub-command command is either unknown or not supported for tenant users. For more information on tenant user rules try 'lfs help tenant-rules'.
● When modifying the number of CPUs or memory for a tenant VM, the system will return an error if an attempt is made to allocate more than the acceptable value of CPU or memory to a tenant VM via the ux-tenant-alter-vm command. For more information, refer to the ux-tenant-alter-vm man page.
● In rare cases, switching from the secure to the default mode may result in some Romana network policy information not being translated into the appropriate IP table rules. This allows a recently created pod to ping a pod in a different Kubernetes namespace. Contact Cray support if this problem is encountered.
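For the JupyterHub tip above, a hedged sketch of finding and terminating leftover processes is shown below; username is a placeholder and the match pattern should be reviewed before killing anything.

# pgrep -u username -fl jupyter
# pkill -u username -f jupyter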
