Revolution R Enterprise™ 7 Microsoft HPC Administrator’s Guide

The correct bibliographic citation for this manual is as follows: Revolution Analytics, Inc. 2015. Revolution R Enterprise 7 Microsoft HPC Administrator’s Guide. Revolution Analytics, Inc., Redmond, WA.

Revolution R Enterprise 7 Microsoft HPC Administrator’s Guide Copyright © 2015 Revolution Analytics, Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of Revolution Analytics.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of The Rights in Technical Data and Computer Software clause at 52.227-7013.

Revolution R, Revolution R Enterprise, RPE, RevoScaleR, RevoDeployR, RevoTreeView, and Revolution Analytics are trademarks of Revolution Analytics.

Other product names mentioned herein are used for identification purposes only and may be trademarks of their respective owners.

Revolution Analytics, One Microsoft Way, Redmond, WA 98052

Revised on August 11, 2015

We want our documentation to be useful, and we want it to address your needs. If you have comments on this or any Revolution document, write to [email protected].

Table of Contents

1 Introduction
   1.1 Terminology
   1.2 System Requirements
2 Quick Deployment
   2.1 Creating Firewall Exceptions
   2.2 Creating Shared and Working Directories
   2.3 Providing Access to Jobs and Large Data Directories
   2.4 Installing the Package on All Nodes
   2.5 Removing the Package from All Nodes
   2.6 Sharing Setup Information with your R Users
3 Managing Data
   3.1 Copying Data with ClusterCopy
4 Additional Install Options
   4.1 Performing a Full Install of Revolution R Enterprise
   4.2 Installing to a Non-Default Location

1 Introduction

Revolution R Enterprise for Windows is an enhanced, supported version of the open-source R language. It includes RevoScaleR, Revolution’s package for statistical analysis of large data sets. RevoScaleR turns R into a clustered high performance computing (HPC) application when run via HPC Server. In the usual configuration, users will access the HPC Server cluster while running Revolution R Enterprise from their client laptop or desktop workstation. To support this configuration, Revolution Analytics has developed a lightweight node installer that allows you to quickly deploy Revolution R Enterprise to all the nodes of your HPC cluster. This manual provides detailed instructions for installing and uninstalling Revolution R Enterprise for Windows on your HPC cluster.

1.1 Terminology

In this manual, we use the following terminology for various computers (the first three terms are Microsoft’s; see Microsoft’s HPC Pack documentation for more information):

• Head node: The HPC Server node that serves as the head node for the HPC cluster.
• Compute node: An HPC Server node configured as a compute node on the HPC cluster.
• Workstation node: A Windows 7 or 8 computer configured as a workstation node on the HPC cluster.
• Client workstation: A Windows computer that is not part of the HPC cluster but is joined to the same domain, is equipped with Revolution R Enterprise, and is capable of establishing a cluster connection to the HPC cluster.

1.2 System Requirements

Revolution R Enterprise runs on clusters running HPC Pack 2012 and those running HPC Server 2008R2, but HPC Server 2008R2 support is now deprecated.

To run Revolution R Enterprise for Windows on clusters running HPC Pack 2012, the cluster must meet the following minimum requirements:

• Head Node
   o x64-architecture computer
   o x64 version of Windows Server 2012
   o HPC Pack 2012
   o 4GB RAM
   o 200MB disk space for installation
   o Must be joined to an Active Directory domain
• Compute Nodes
   o x64-architecture computer
   o x64 version of Windows Server 2012 or Windows Server 2008 R2
   o HPC Pack 2012
   o 4GB RAM
   o 200MB disk space for installation
   o Must be joined to same Active Directory domain as head node
• Workstation Nodes (optional)
   o x64-architecture computer
   o Windows 7 or Windows 8
   o HPC Pack 2012
   o 4GB RAM
   o 200MB disk space for installation
   o Must be joined to same Active Directory domain as head node

To run Revolution R Enterprise for Windows on HPC Server 2008R2, the cluster must meet the following minimum requirements:

• Head Node
   o x64-architecture computer
   o x64 version of Windows Server 2008 R2 Standard, Enterprise, Datacenter, or HPC edition
   o HPC Pack 2008 R2
   o 4GB RAM
   o 200MB disk space for installation
   o Must be joined to an Active Directory domain
• Compute Nodes
   o x64-architecture computer
   o x64 version of Windows Server 2008 or Windows Server 2008 R2 Standard, Enterprise, Datacenter, or HPC edition
   o HPC Pack 2008 R2
   o 4GB RAM
   o 200MB disk space for installation
   o Must be joined to same Active Directory domain as head node
• Workstation Nodes (optional)
   o x86- or x64-architecture computer
   o Windows 7 Professional or Enterprise
   o HPC Pack 2008 R2
   o 4GB RAM
   o 200MB disk space for installation
   o Must be joined to same Active Directory domain as head node

2 Quick Deployment

Installing Revolution R Enterprise on your cluster involves four main steps:

1. Ensuring that your cluster firewall allows the two main Revolution R Enterprise processes to communicate.
2. Creating a network share directory on the head node that all Revolution R Enterprise users can access, together with user-specific working directories on all compute nodes.
3. Ensuring that all Revolution R Enterprise users have the necessary permissions to run HPC Server jobs together with read-access to “large data” directories on all compute nodes.
4. Running a parametric sweep job to install the software on all the nodes.

2.1 Creating Firewall Exceptions

Revolution R Enterprise requires two distinct processes on each node of your cluster, and the results of these processes need to be communicated between nodes. You must therefore create firewall exceptions for these processes. To do this, go to the Allowed Apps page in your Windows Firewall and add the following programs to your Allowed Apps list:

C:\Program Files\RRO\R-3.1.3\library\RevoScaleR\rxLibs\x64\BxlServer.exe
C:\Program Files\RRO\R-3.1.3\bin\x64\Rterm.exe
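If you would rather script these exceptions than add them through the Allowed Apps page, the same two programs can be allowed with the Windows netsh advfirewall command. The following is a minimal Python sketch that only prints the commands for review. The rule names and the helper function are illustrative assumptions, not part of Revolution R Enterprise, and the paths assume the default install location.

```python
# Sketch: print netsh commands that allow the two Revolution R Enterprise
# processes through Windows Firewall. Rule names are hypothetical; paths
# assume the default install location used in this guide.

R_HOME = r"C:\Program Files\RRO\R-3.1.3"

PROGRAMS = {
    "RRE BxlServer": R_HOME + r"\library\RevoScaleR\rxLibs\x64\BxlServer.exe",
    "RRE Rterm": R_HOME + r"\bin\x64\Rterm.exe",
}


def firewall_command(rule_name, program_path):
    """Return one netsh command that adds an inbound allow rule."""
    return (
        "netsh advfirewall firewall add rule "
        f'name="{rule_name}" dir=in action=allow '
        f'program="{program_path}" enable=yes'
    )


for name, path in PROGRAMS.items():
    # Run the printed commands from an elevated Command Prompt on each node.
    print(firewall_command(name, path))
```

Printing rather than executing keeps the sketch safe to run anywhere; review the output before applying it to your nodes.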

2.2 Creating Shared and Working Directories

Revolution R Enterprise uses a network share directory to store its job and task information. You can give this share any name, but it must be read and write accessible by all Revolution R Enterprise users. The main share directory will contain folders for each user’s individual use. Thus, a typical structure might be to create a network share directory named \\AllShare, and then to populate that share with individual directories for each unique user: \\AllShare\fred, \\AllShare\gloria, etc. Your R users will refer to their particular subdirectory as their shareDir.

Each user also needs to have a private working directory on each node. Typically these will be standard user directories, e.g., C:\Users\fred, C:\Users\gloria, etc. Your R users will refer to this working directory as their workingDir.
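The per-user layout described above can be sketched in a few lines of Python. This is only an illustration of the naming scheme: the function name and the example share path \\HEADNODE\AllShare are assumptions, and the sketch computes the paths without creating any directories.

```python
# Sketch: derive each user's shareDir and workingDir from a share root and
# a working-directory root. Names and example paths are hypothetical.
from pathlib import PureWindowsPath


def user_directories(users, share_root, working_root=r"C:\Users"):
    """Map each user to a (shareDir, workingDir) pair of Windows paths."""
    return {
        user: (
            str(PureWindowsPath(share_root) / user),
            str(PureWindowsPath(working_root) / user),
        )
        for user in users
    }


layout = user_directories(["fred", "gloria"], r"\\HEADNODE\AllShare")
for user, (share_dir, working_dir) in layout.items():
    print(user, share_dir, working_dir)
```

PureWindowsPath is used so that the paths are formatted with Windows rules even when the script is run on another operating system.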

2.3 Providing Access to Jobs and Large Data Directories

RevoScaleR is a package for large data analysis, and it is expected that within a given organization, numerous R users will want to analyze shared data sets. To do this, all users must have an account on the cluster that allows them to run distributed jobs, and each user must have access to the shared data sets. These files will normally be in the RevoScaleR .xdf format, and will either be complete copies or distributed copies (that is, each node might have just the portion of the data needed for its own computations). In either case, all Revolution R Enterprise users need to have read access to the data. Write access must be available to some individual (who may be an R user or a cluster administrator) to allow distribution of the data. These data directories must not be network shares; they must be local to the individual nodes. For simplicity, we suggest “C:\data” as the data path on each node.

In most cases, if users can access the cluster with their standard domain credentials, they can run jobs. However, in some configurations, each user must be an HPC administrator to run jobs. Check with your local system administrator to see what configuration your cluster is running in, and determine if your R users need to be HPC administrators.

2.4 Installing the Package on All Nodes

Once you’ve set up the network share and user accounts, and ensured access to the data directories, you’re ready to install Revolution R Enterprise on all of the nodes. The install consists of three parts: one to install R itself, one to install additional components distributed under the GNU General Public License or GNU Lesser General Public License, and one to install the Enterprise components. We recommend performing the installation with a set of parametric sweep tasks. To do this, make sure that the three installers are unpacked and accessible on the head node. For the Enterprise components, be sure to use the Revolution R Enterprise node installer (Revolution-R-Enterprise-Node-7.4.1-Windows.exe) rather than the full installer.

To create the parametric sweep jobs:

1. Launch the HPC 2012 Job Manager. (You must be an administrator on the cluster.)
2. Under Job Submission, click New Job…. The New Job dialog appears open to the Job Details page.
3. Enter a name for the job in the text field labeled Job name.
4. Select Node from the dropdown menu labeled Select the type of resource to request for this job.
5. Under Minimum, select the radio button with the numeric slider, and enter the number of nodes in your cluster. You want to install Revolution R Enterprise on all nodes in the cluster. Do not use Auto-calculate.
6. Under Maximum, select the radio button with the numeric slider, and enter the number of nodes in your cluster. You want to install Revolution R Enterprise on all nodes in the cluster. Do not use Auto-calculate.
7. Click Resource Selection in the left navigation pane.
8. Select the checkbox labeled Run this job only on nodes in the following list.
9. Select all the nodes listed.
10. Click Edit Tasks in the left navigation pane.
11. Click the small arrow at the right of the Add button and select Parametric Sweep Task from the dropdown menu that appears. The Parametric Sweep Task dialog appears.
12. Make sure that the Start value field is set to 1 and set the End value field to the number of nodes you specified in Steps 5 and 6.
13. Set Increment value to 1.
14. In the Command line text field, enter the following (where HEADNODE is the name of your head node and \\AllShare is the network share created for Revolution R Enterprise):

start /wait \\HEADNODE\\AllShare\RRO-8.0.3-win.exe /SILENT

15. Set Working directory to C:\.
16. Set both Standard output and Standard error to \\HEADNODE\\AllShare\MergedOut.txt, where again HEADNODE is the name of your head node and \\AllShare is the network share created for Revolution R Enterprise.
17. Click OK.
18. (optional) In the New Job dialog, click Save Job XML File. (This will make it easy to install a later version of the Revolution R Enterprise software or to install to additional compute nodes.)
19. In the New Job dialog, click Submit.
20. Repeat steps 2 through 19, but at Step 14, do the following: In the Command line text field, enter the following (where HEADNODE is the name of your head node and \\AllShare is the network share created for Revolution R Enterprise):

start /wait \\HEADNODE\\AllShare\Revolution-R-Connector-7.4.1-Windows.exe /exenoui /q /Lvoicewarmup %TEMP%\Revolution-R-Connector-7.4.1-Windows_install.log

21. Repeat steps 2 through 19, but at Step 14, do the following: In the Command line text field, enter the following (where HEADNODE is the name of your head node, \\AllShare is the network share created for Revolution R Enterprise, and XXXX-XXXX-XXXX is your Revolution R Enterprise serial number, sent in your welcome e-mail):

start /wait \\HEADNODE\\AllShare\Revolution-R-Enterprise-Node-7.4.1-Windows.exe /exenoui PIDKEY=XXXX-XXXX-XXXX /q /Lvoicewarmup %TEMP%\Revolution-R-Enterprise-Node-7.4.1-Windows_install.log
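Because the three sweep jobs differ only in the Step 14 command line, it can help to generate those lines from a single place and paste them into the Command line field. The following Python sketch does that, assuming the installer file names and switches shown in this section. The function and parameter names are illustrative and not part of the product.

```python
# Sketch: assemble the three silent-install command lines from this section.
# Function and parameter names are hypothetical; installer file names and
# switches are copied from the steps above.


def install_commands(installer_dir, serial, version="7.4.1"):
    """Return the Step 14 command lines in installation order."""

    def enterprise_installer(name, properties=""):
        exe = installer_dir + "\\" + f"{name}-{version}-Windows.exe"
        log = "%TEMP%\\" + f"{name}-{version}-Windows_install.log"
        return f"start /wait {exe} /exenoui {properties}/q /Lvoicewarmup {log}"

    return [
        # 1. Revolution R Open (R itself)
        f"start /wait {installer_dir}\\RRO-8.0.3-win.exe /SILENT",
        # 2. Revolution R Connector components
        enterprise_installer("Revolution-R-Connector"),
        # 3. Enterprise node components, which need the serial number
        enterprise_installer("Revolution-R-Enterprise-Node", f"PIDKEY={serial} "),
    ]


for command in install_commands(r"\\HEADNODE\\AllShare", "XXXX-XXXX-XXXX"):
    print(command)
```

Replace the placeholder serial number and installer location with your own values before using the printed lines.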

2.5 Removing the Package from All Nodes

You can remove the packages via parametric sweep tasks in exactly the same way as for installation; simply replace the Command line text from Step 14 with the following to remove the Revolution R Enterprise components:

start /wait msiexec /x {0BD0EE2B-24A8-41D6-8C6B-C9BDA36C821C} /qn /Lvoicewarmup %TEMP%\Revolution-R-Enterprise-Node-7.4.1-Windows_uninstall.log

To remove the Revolution R Connector components:

start /wait msiexec /x {876ECAD1-23E2-462F-9B6C-C3069307FFF7} /qn /Lvoicewarmup %TEMP%\Revolution-R-Connector-7.4.1-Windows_uninstall.log

To remove the Revolution R Open components:

start "" /wait "C:\Program Files\RRO\R-3.1.3\unins000.exe" /silent

2.6 Sharing Setup Information with your R Users

Once you have completed the installation, you need to share the following pieces of information with your R users:

• headNode: The network name or IP address of the cluster’s head node.
• revoPath: The path to the \bin\x64 subdirectory of your node installation; this is by default C:\Program Files\RRO\R-3.1.3\bin\x64.
• shareDir: The user’s subdirectory of the shared directory, as described in the section Creating Shared and Working Directories.
• workingDir: The user’s working directory, as described in the section Creating Shared and Working Directories.
• dataPath: The path to any data directories created for storing .xdf files, as described in the section Providing Access to Jobs and Large Data Directories.
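One way to hand these five values to each user is as a short text record. The Python sketch below shows the idea; the class name and its defaults are assumptions made for illustration (the defaults mirror the default revoPath and the suggested C:\data path from this guide), and the example values are placeholders.

```python
# Sketch: bundle the five per-user settings and render them as plain text.
# The class itself is hypothetical; field names follow this guide.
from dataclasses import asdict, dataclass


@dataclass
class ClusterSettings:
    headNode: str
    shareDir: str
    workingDir: str
    revoPath: str = r"C:\Program Files\RRO\R-3.1.3\bin\x64"
    dataPath: str = r"C:\data"

    def to_text(self):
        """Return one 'name: value' line per setting."""
        return "\n".join(f"{key}: {value}" for key, value in asdict(self).items())


settings = ClusterSettings(
    headNode="cluster-head",
    shareDir=r"\\AllShare\fred",
    workingDir=r"C:\Users\fred",
)
print(settings.to_text())
```

Keeping the field names identical to the terms above lets users match each line to the setting it describes.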

3 Managing Data

As mentioned in the previous chapter, Revolution R Enterprise, and in particular its RevoScaleR package, is intended to help users analyze large data sets. Within a given enterprise, it is likely that more than one user will need to access enterprise-wide data files on the cluster. Also as mentioned in the previous chapter, there are two ways to manage data: one is to copy complete files to all the nodes, which you can do with the ClusterCopy tool; the other is to create distributed copies on the nodes, each containing only the subset of the data needed for the computation on that node. If you will be managing data on the cluster for your Revolution R users, you will need the instructions in this chapter; similar instructions for your Revolution R users can be found in the RevoScaleR User’s Guide.

3.1 Copying Data with ClusterCopy

ClusterCopy is a tool available as a free download from Microsoft that makes copying data from node to node quick and simple. Once you’ve downloaded and installed ClusterCopy, using it is easy:

1. Navigate to the location where you installed ClusterCopy, and then double-click ClusterCopy.exe.
2. In the Head node text field, enter the name of your cluster’s head node, for example, cluster-head.
3. In the Source folder text field, enter the path of the original data file, for example, c:\data.
4. In the Destination folder text field, enter the path for the data on the compute nodes, for example, c:\data.
5. In the Target file(s) text field, enter the name of the file (or files) you want to copy. If you leave this blank, all the files in the source folder are copied.
6. Specify the Number of simultaneous copies from source; this defaults to 3.
7. Click Copy to Nodes (Distribute).

4 Additional Install Options

The Quick Deployment outlined in Chapter 2 is adequate for most users, but there may be situations where other configurations are needed. This chapter addresses some of the possibilities.

4.1 Performing a Full Install of Revolution R Enterprise

The Node install described in Chapter 2 installs the complete R language and most of the Revolution R Enterprise extension packages, but it does not include the Revolution R Enterprise R Productivity Environment, an integrated development environment for R. Normally, this is not needed on the cluster machines because users will be accessing the cluster from their client workstations, which will have the full Revolution R Enterprise installed. However, you may want a full install on the cluster’s head node for testing purposes. You can do this either by running the full installer, Revolution-R-Enterprise-7.4.1-Windows.exe, on the head node in interactive mode, or by using the following command in a Command Prompt on the head node:

start /wait \\HEADNODE\\AllShare\Revolution-R-Enterprise-7.4.1-Windows.exe /exenoui PIDKEY=XXXX-XXXX-XXXX /q /Lvoicewarmup %TEMP%\Revolution-R-Enterprise-7.4.1-Windows_install.log

To ensure automated installation of any missing prerequisites, set the following environment variables in the Environment Variable pane:

Variable        Value
VCRUIMODE       /q
DOTNET_MODE     /passive /norestart
VSIS_MODE       /passive /norestart

The full and node installs can both exist on the same computer, so your head node can act as both a client workstation and a compute node.

4.2 Installing to a Non-Default Location

By default, R is installed to C:\Program Files\RRO\R-3.1.3, and additional Revolution R Enterprise files are installed to either C:\Revolution\R-Enterprise-Node-7.4 (for Node installs) or C:\Revolution\R-Enterprise-7.4 (full installs).

You can override these defaults for the Revolution R Enterprise files either by running the installer in interactive mode, in which case you are offered a screen for choosing the installation directory, or by using the APPDIR property in the silent install. For example, the following command installs the full Revolution R Enterprise in the C:\Revo directory:

start /wait \\HEADNODE\\AllShare\Revolution-R-Enterprise-7.4.1-Windows.exe /exenoui APPDIR=C:\Revo PIDKEY=XXXX-XXXX-XXXX /q /Lvoicewarmup %TEMP%\Revolution-R-Enterprise-7.4.1-Windows_install.log
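The only difference between the default and non-default silent installs is the optional APPDIR property. The Python sketch below shows one way to build either command line. The function and parameter names are hypothetical, and quoting an APPDIR value that contains spaces is an assumption based on the usual Windows Installer property syntax rather than something this guide documents.

```python
# Sketch: build the full-install command line with an optional APPDIR.
# Function and parameter names are hypothetical; switches follow this guide.


def full_install_command(installer_dir, serial, appdir=None, version="7.4.1"):
    """Return the silent full-install command, adding APPDIR when given."""
    name = f"Revolution-R-Enterprise-{version}-Windows"
    parts = ["start /wait", installer_dir + "\\" + name + ".exe", "/exenoui"]
    if appdir is not None:
        # Assumption: a directory containing spaces must be quoted.
        value = f'"{appdir}"' if " " in appdir else appdir
        parts.append(f"APPDIR={value}")
    parts += [f"PIDKEY={serial}", "/q", "/Lvoicewarmup",
              "%TEMP%\\" + name + "_install.log"]
    return " ".join(parts)


print(full_install_command(r"\\HEADNODE\\AllShare", "XXXX-XXXX-XXXX",
                           appdir=r"C:\Revo"))
```

Calling the function without appdir yields the default-location command from section 4.1.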