Revolution R Enterprise™ 7 Microsoft HPC Administrator's Guide

The correct bibliographic citation for this manual is as follows: Revolution Analytics, Inc. 2015. Revolution R Enterprise 7 Microsoft HPC Administrator’s Guide. Revolution Analytics, Inc., Redmond, WA.

Revolution R Enterprise 7 Microsoft HPC Administrator’s Guide
Copyright © 2015 Revolution Analytics, Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of Revolution Analytics.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of The Rights in Technical Data and Computer Software clause at 52.227-7013.

Revolution R, Revolution R Enterprise, RPE, RevoScaleR, RevoDeployR, RevoTreeView, and Revolution Analytics are trademarks of Revolution Analytics. Other product names mentioned herein are used for identification purposes only and may be trademarks of their respective owners.

Revolution Analytics
One Microsoft Way
Redmond, WA 98052

Revised on August 11, 2015

We want our documentation to be useful, and we want it to address your needs. If you have comments on this or any Revolution document, write to [email protected].

Table of Contents
1 Introduction
  1.1 Terminology
  1.2 System Requirements
2 Quick Deployment
  2.1 Creating Firewall Exceptions
  2.2 Creating Shared and Working Directories
  2.3 Providing Access to Jobs and Large Data Directories
  2.4 Installing the Package on All Nodes
  2.5 Removing the Package from All Nodes
  2.6 Sharing Setup Information with your R Users
3 Managing Data
  3.1 Copying Data with ClusterCopy
4 Additional Install Options
  4.1 Performing a Full Install of Revolution R Enterprise
  4.2 Installing to a Non-Default Location

1 Introduction

Revolution R Enterprise for Windows is an enhanced, supported version of the open-source R language. It includes RevoScaleR, Revolution’s package for statistical analysis of large data sets. When run via HPC Server, RevoScaleR turns R into a clustered high performance computing (HPC) application. In the usual configuration, users access the HPC Server cluster while running Revolution R Enterprise on their client laptop or desktop workstation.
To support this configuration, Revolution Analytics has developed a lightweight node installer that allows you to quickly deploy Revolution R Enterprise to all the nodes of your HPC cluster. This manual provides detailed instructions for installing and uninstalling Revolution R Enterprise for Windows on your HPC cluster.

1.1 Terminology

In this manual, we use the following terminology for the various computers involved (the first three terms are Microsoft’s and are described further in Microsoft’s HPC documentation):

Head node: The HPC Server node that serves as the head node for the HPC cluster.
Compute node: An HPC Server node configured as a compute node on the HPC cluster.
Workstation node: A Windows 7 or Windows 8 computer configured as a workstation node on the HPC cluster.
Client workstation: A Windows computer that is not part of the HPC cluster but is joined to the same Active Directory domain, has Revolution R Enterprise installed, and can establish a cluster connection to the HPC cluster.

1.2 System Requirements

Revolution R Enterprise runs on clusters running HPC Pack 2012 and on clusters running HPC Server 2008 R2; support for HPC Server 2008 R2 is deprecated.

To run Revolution R Enterprise for Windows on a cluster running HPC Pack 2012, the cluster must meet the following minimum requirements:

Head Node
o x64-architecture computer
o x64 version of Windows Server 2012
o HPC Pack 2012
o 4 GB RAM
o 200 MB disk space for installation
o Must be joined to an Active Directory domain.

Compute Nodes
o x64-architecture computer
o x64 version of Windows Server 2012 or Windows Server 2008 R2
o HPC Pack 2012
o 4 GB RAM
o 200 MB disk space for installation
o Must be joined to the same Active Directory domain as the head node.

Workstation Nodes (optional)
o x64-architecture computer
o Windows 7 or Windows 8
o HPC Pack 2012
o 4 GB RAM
o 200 MB disk space for installation
o Must be joined to the same Active Directory domain as the head node.
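If you keep an inventory of your nodes, the HPC Pack 2012 minimums above can be encoded as data and checked mechanically before you deploy. The following Python sketch does this under stated assumptions: the `NodeSpec` fields, the OS name strings, and the domain name in the example are hypothetical labels, and the check only evaluates values you supply; it does not query the nodes itself.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical encoding of the HPC Pack 2012 minimums listed above.
# Field names and OS strings are illustrative labels, not output of any HPC tool.
MIN_RAM_GB = 4
MIN_DISK_MB = 200
REQUIRED_HPC_PACK = "HPC Pack 2012"
ALLOWED_OS = {
    "head": {"Windows Server 2012"},
    "compute": {"Windows Server 2012", "Windows Server 2008 R2"},
    "workstation": {"Windows 7", "Windows 8"},
}


@dataclass
class NodeSpec:
    role: str              # "head", "compute", or "workstation"
    arch: str              # processor architecture, e.g. "x64"
    os_name: str           # e.g. "Windows Server 2012"
    hpc_pack: str          # e.g. "HPC Pack 2012"
    ram_gb: float
    free_disk_mb: float
    domain: Optional[str]  # Active Directory domain, or None if not joined


def unmet_requirements(node: NodeSpec, head_domain: str) -> List[str]:
    """Return a description of each HPC Pack 2012 minimum that `node` misses."""
    problems = []
    if node.arch != "x64":
        problems.append("requires an x64-architecture computer")
    if node.os_name not in ALLOWED_OS[node.role]:
        problems.append(f"unsupported OS for a {node.role} node: {node.os_name}")
    if node.hpc_pack != REQUIRED_HPC_PACK:
        problems.append(f"requires {REQUIRED_HPC_PACK}")
    if node.ram_gb < MIN_RAM_GB:
        problems.append(f"needs at least {MIN_RAM_GB} GB RAM")
    if node.free_disk_mb < MIN_DISK_MB:
        problems.append(f"needs at least {MIN_DISK_MB} MB disk space for installation")
    if node.domain is None:
        problems.append("must be joined to an Active Directory domain")
    elif node.role != "head" and node.domain != head_domain:
        problems.append("must be joined to the same Active Directory domain as the head node")
    return problems
```

For example, `unmet_requirements(NodeSpec("compute", "x64", "Windows Server 2008 R2", "HPC Pack 2012", 8, 500, "CORP"), head_domain="CORP")` returns an empty list. Keeping the limits in module-level constants makes it easy to adapt the same check to the HPC Server 2008 R2 requirements.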
To run Revolution R Enterprise for Windows on a cluster running HPC Server 2008 R2, the cluster must meet the following minimum requirements:

Head Node
o x64-architecture computer
o x64 version of Windows Server 2008 R2 Standard, Enterprise, Datacenter, or HPC edition
o HPC Pack 2008 R2
o 4 GB RAM
o 200 MB disk space for installation
o Must be joined to an Active Directory domain.

Compute Nodes
o x64-architecture computer
o x64 version of Windows Server 2008 or Windows Server 2008 R2 Standard, Enterprise, Datacenter, or HPC edition
o HPC Pack 2008 R2
o 4 GB RAM
o 200 MB disk space for installation
o Must be joined to the same Active Directory domain as the head node.

Workstation Nodes (optional)
o x86- or x64-architecture computer
o Windows 7 Professional or Enterprise
o HPC Pack 2008 R2
o 4 GB RAM
o 200 MB disk space for installation
o Must be joined to the same Active Directory domain as the head node.

2 Quick Deployment

Installing Revolution R Enterprise on your cluster involves four main steps:

1. Ensuring that your cluster firewall allows the two main Revolution R Enterprise processes to communicate.
2. Creating a network share directory on the head node that all Revolution R Enterprise users can access, together with user-specific working directories on all compute nodes.
3. Ensuring that all Revolution R Enterprise users have the necessary permissions to run HPC Server jobs, together with read access to “large data” directories on all compute nodes.
4. Running a parametric sweep job to install the software on all the nodes.

2.1 Creating Firewall Exceptions

Revolution R Enterprise runs two distinct processes on each node of your cluster, and the results of these processes must be communicated between nodes. You must therefore create firewall exceptions for these processes.
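The Allowed Apps page in Windows Firewall is the route this guide describes. If you would rather script the exceptions so they are applied identically on every node, the Python sketch below prints one `netsh advfirewall firewall add rule` command per program. It assumes the default install paths listed in this section and that an inbound program rule is an acceptable substitute for an Allowed Apps entry under your firewall policy; the rule names are hypothetical labels. Review the printed commands before running them from an elevated command prompt on each node.

```python
# Default install locations of the two processes, as listed in this section.
# The dictionary keys are hypothetical rule names; choose your own.
PROGRAMS = {
    "RevoScaleR BxlServer": r"C:\Program Files\RRO\R-3.1.3\library\RevoScaleR\rxLibs\x64\BxlServer.exe",
    "Revolution R Rterm": r"C:\Program Files\RRO\R-3.1.3\bin\x64\Rterm.exe",
}


def firewall_rule_command(rule_name: str, program_path: str) -> str:
    """Build a netsh command that adds an inbound allow rule for one program."""
    return (
        "netsh advfirewall firewall add rule "
        f'name="{rule_name}" dir=in action=allow '
        f'program="{program_path}" enable=yes'
    )


if __name__ == "__main__":
    # Only print the commands; nothing is executed, so they can be reviewed first.
    for name, path in PROGRAMS.items():
        print(firewall_rule_command(name, path))
```

Printing rather than executing keeps the sketch safe to run anywhere and leaves the decision of how to push the commands to each node to your existing cluster administration practices.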
To create these exceptions, open the Allowed Apps page in Windows Firewall and add the following programs to the Allowed Apps list:

C:\Program Files\RRO\R-3.1.3\library\RevoScaleR\rxLibs\x64\BxlServer.exe
C:\Program Files\RRO\R-3.1.3\bin\x64\Rterm.exe

2.2 Creating Shared and Working Directories

Revolution R Enterprise uses a network share directory to store its job and task information. You can give this share any name, but all Revolution R Enterprise users must be able to read from and write to it. The main share directory will contain a folder for each user. A typical structure, for example, is a network share directory named \\AllShare populated with one directory per user: \\AllShare\fred, \\AllShare\gloria, and so on. Your R users will refer to their particular subdirectory as their shareDir.

Each user also needs a private working directory on each node. Typically these are standard user directories, e.g., C:\Users\fred, C:\Users\gloria, and so on. Your R users will refer to this working directory as their workingDir.

2.3 Providing Access to Jobs and Large Data Directories

RevoScaleR is a package for large data analysis, and within a given organization many R users will typically want to analyze shared data sets. To do this, all users must have an account on the cluster that allows them to run distributed jobs, and each user must have access to the shared data sets.

These files will normally be in the RevoScaleR .xdf format and will be either complete copies or distributed copies (that is, each node might hold only the portion of the data needed for its own computations). In either case, all Revolution R Enterprise users need read access to the data. At least one individual (who may be an R user or a cluster administrator) must also have write access so that the data can be distributed. These data directories must not be network shares; they must be local to the individual nodes.
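The directory rules in Sections 2.2 and 2.3 can be summarized as a quick sanity check: each shareDir lives on the network share, while workingDir and the large-data directories are local to each node. The Python sketch below builds a user's shareDir and workingDir following the example layout (\\AllShare\<user> and C:\Users\<user>) and flags violations. The function names are hypothetical, and the check is purely textual: it recognizes UNC paths but cannot tell that a drive letter is mapped to a network share, so treat it as a first-pass check only.

```python
from typing import Dict, List


def is_unc_path(path: str) -> bool:
    """Return True if `path` is a UNC (network share) path, i.e. starts with two slashes."""
    return path.startswith("\\\\") or path.startswith("//")


def user_directories(user: str, share_root: str = r"\\AllShare",
                     users_root: str = r"C:\Users") -> Dict[str, str]:
    """Build a user's shareDir (on the network share) and workingDir (local to each node)."""
    return {
        "shareDir": f"{share_root}\\{user}",
        "workingDir": f"{users_root}\\{user}",
    }


def check_layout(dirs: Dict[str, str], data_dir: str) -> List[str]:
    """List violations of the directory rules; an empty list means none were found."""
    problems = []
    if not is_unc_path(dirs["shareDir"]):
        problems.append("shareDir should be on the network share")
    if is_unc_path(dirs["workingDir"]):
        problems.append("workingDir should be local to each node")
    if is_unc_path(data_dir):
        problems.append("large-data directories must be local, not network shares")
    return problems
```

With the defaults, `user_directories("gloria")` yields a shareDir of \\AllShare\gloria and a workingDir of C:\Users\gloria, and `check_layout` on that layout with a local data directory returns an empty list.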
For simplicity, we suggest “C:\data” as the data
