Revolution R Enterprise DeployR™ 7.1 Overview Guide

The correct bibliographic citation for this manual is as follows: Revolution Analytics, Inc. 2014. Revolution R Enterprise DeployR Overview Guide. Revolution Analytics, Inc., Mountain View, CA.

Revolution R Enterprise DeployR Overview Guide Copyright © 2014 Revolution Analytics, Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of Revolution Analytics.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of The Rights in Technical Data and Computer Software clause at 52.227-7013.

Revolution R, Revolution R Enterprise, RPE, RevoScaleR, DeployR, RevoTreeView, and Revolution Analytics are trademarks of Revolution Analytics.

Other product names mentioned herein are used for identification purposes only and may be trademarks of their respective owners.

Revolution Analytics 2570 West El Camino Real Suite 222 Mountain View, CA, 94040 USA

We want our documentation to be useful, and we want it to address your needs. If you have comments on this or any Revolution document, write to [email protected].

Contents

Chapter 1: Introduction
• What’s New in Revolution R Enterprise DeployR
• Client-Server Model
• Installing DeployR
• DeployR Web Services API (RESTful)
• DeployR Landing Page

Chapter 2: Key Features in DeployR
• Key Technologies
• Key API Services

Chapter 3: The Administration Console

Chapter 4: Java, JavaScript, and .NET Client Libraries
• Java Client: jDeployR
• JavaScript Client: JSDeployR
• .NET Client: DeployR

Chapter 5: Examples

Chapter 1: Introduction

Revolution R Enterprise DeployR is a server framework that exposes the R platform as a service allowing the integration of R statistics, analytics and visualizations inside Web, desktop and mobile applications. DeployR does not replace R. Instead, DeployR provides a layer on top of R to handle resource management, security, R session management, load balancing, XML/JSON encoding, and more.

Using DeployR, data analysts who work in R can publish R scripts to a server-based installation of Revolution R Enterprise. Web, desktop, and mobile application developers can then use the DeployR Web Services API to securely integrate the results of these scripts into any application without needing to learn the R language.

With DeployR, authorized users can combine interactive Web-based applications and widgets, desktop applications such as Excel™, business intelligence dashboards, and mobile applications with on-demand analytics, predictions, and advanced visualizations from R. Behind the scenes, these applications can dynamically call a DeployR server hosted on-premise, or consolidate and schedule computations in real-time to available processors on a cluster or in the cloud. Furthermore, DeployR provides convenient functionality such as its repository, which stores and manages R projects, R scripts, R objects, data files, and plots.

DeployR is designed to be completely standardized, so you can integrate R functionality into any type of application without having to deal with the complexities of R or of high-performance statistical computing.

Note to Existing Users!

If you plan to upgrade from RevoDeployR 2.x, 6.2, or 7.0, carefully read the Revolution R Enterprise DeployR Migration Guide before uninstalling RevoDeployR.

What’s New in Revolution R Enterprise DeployR

Revolution R Enterprise DeployR 7.1

• With this release, RevoDeployR is now called Revolution R Enterprise DeployR, or DeployR for short.
• We’ve also introduced the DeployR Repository Manager, which is a tool that simplifies the task of managing repository files. Authenticated DeployR users can use the Repository Manager to manage repository files (R scripts, data files, and so on) as well as interact with their R scripts in a live debugging environment. Note that the script management functionality formerly found in the RevoDeployR Management Console is now contained in the Repository Manager. This tool can be accessed through the DeployR landing page.
• The Management Console has been renamed to the Administration Console. Other changes to that console include:
   o The Administration Console can only be accessed by the admin user.
   o All R script management functionality, other than import and export of R scripts, has been moved to the Repository Manager.
   o The user testmanager and the role SCRIPT_MANAGER were obsoleted and removed when script management functionality was moved to the DeployR Repository Manager.
• There is new and updated API support, including new directory support on the Repository APIs. Refer to the 7.1 Change History section in the API Reference Guide for details.
• The default installation port numbers have changed for this version.

RevoDeployR 7.0

This release contains new features and improvements, including:
• New and updated API support including:
   o Support for user blackbox projects, which are a new type of secure temporary project for authenticated users
   o Support for HTTP blackbox projects, which are a new type of secure, stateful project for anonymous users
   o Support for creating pools of temporary projects on the /r/project/pool API
   o New standardized set of parameters across all execution APIs
   o New R script execution chaining support on all script execution APIs
   o New role-based restricted access control for files on the Repository APIs
   o For JavaScript developers, server-side events pushed on new /r/event/stream API
   o For an overview of additions and updates to the API for this release please refer to the section API Change History on the documentation landing page (http://SERVER:PORT/revolution/docs/documentation/).
• New Event Stream Console, which is a browser-based console window for viewing /r/event/stream events. This console is integrated into the management console, the API Explorer tool, and the JavaScript sample applications delivered with RevoDeployR.
• The management console was updated to include:
   o The creation and use of custom roles to restrict access to R scripts and event streams
   o Private, Restricted, Shared and Public access controls for R Scripts
   o Validation of grid node configurations upon creation, update, and import
   o New event stream access policies under Server Policies

RevoDeployR 6.2

This release contains new features and improvements, including:
• New API support for:
   o Priority scheduling for asynchronous jobs
   o Executing scripts found on external URLs or file paths including scripts in GIT and SVN repositories
   o The lifecycle management of repository-managed files, including scripts
   o Enhanced file versioning for repository-managed files, including scripts
• New pre-authentication support for users who have been reliably authenticated by an external system such as CA Siteminder®
• A new high-performance, scalable persistence infrastructure, which is built on top of the MongoDB NoSQL database, to manage the reliable persistence of all user, project and repository data
• Updated documentation and accompanying sample applications
• Updated JavaScript, Java, and .NET client libraries
• A new RevoDeployR Deployment Planning Guide is now available to help administrators plan the provisioning of server and grid capacity
• Removed dependency on Apache HTTP Server

Client-Server Model

DeployR uses a client-server computing model. Web, desktop, and mobile applications can connect to the DeployR server and consume the services it exposes. Client applications do not have direct memory access to the R objects or processes on the server. Instead, all functionality is accessed via the standardized DeployR Web services API over HTTP(S). This model is intended to keep the integration of R into your applications clean, reusable, standardized, scalable, and secure.

Installing DeployR

The server framework is supported on Linux and Windows. The installation process involves installing Revolution R, a database, and the DeployR software.

For step-by-step instructions, refer to the Revolution R Enterprise DeployR Installation Guide for your operating system, which can be downloaded using the link you received from Revolution Analytics.

DeployR Web Services API (RESTful)

The DeployR server is accessed through the DeployR Web services REST API, which supports both JSON and XML. Any software capable of establishing a network connection to the server and parsing XML or JSON can be a client. All server functionality is invoked by calling a set of HTTP(S) endpoints on the server. The server itself is passive, which means the server can only respond to HTTP(S) requests and does not initiate connections itself.

Each API call requires the use of the GET or POST HTTP(S) method. Each call also takes additional required and optional parameters, many of which are call-specific. However, there are two parameters common to every call. They are:

• format, which is a required parameter that specifies the encoding of the server response as either xml or json.

• jsessionid, which specifies the HTTP(S) session token. Its value is returned in the httpcookie property of the response markup after a successful login. (Exception: this parameter does not apply to /r/user/login.)

The other parameters depend on the call at hand. For example, the /r/user/login call takes parameters username, password, and optionally disableautosave.
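As a rough illustration of the parameter rules above, the following Python sketch prepares (without sending) a /r/user/login request and shows one way to pick the httpcookie value out of a parsed JSON response. The server address, the /deployr context path, the helper names, the sample credentials, and the shape of the sample response are all assumptions made for this example. In particular, how the session token is transmitted on later calls should be confirmed against the API Reference Guide.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder address and context path; substitute your own server.
BASE_URL = "http://localhost:8000/deployr"


def build_call(endpoint, params, session_token=None, fmt="json"):
    """Prepare (but do not send) a POST request for one API call."""
    form = {"format": fmt}  # required on every call: "json" or "xml"
    if endpoint != "/r/user/login" and session_token is not None:
        # jsessionid applies to every call except /r/user/login.
        # Assumption: it is sent like any other call parameter.
        form["jsessionid"] = session_token
    form.update(params)
    body = urlencode(form).encode("utf-8")
    return Request(BASE_URL + endpoint, data=body, method="POST")


def find_property(markup, name):
    """Search parsed JSON depth-first for the first property called `name`."""
    if isinstance(markup, dict):
        if name in markup:
            return markup[name]
        children = list(markup.values())
    elif isinstance(markup, list):
        children = markup
    else:
        return None
    for child in children:
        found = find_property(child, name)
        if found is not None:
            return found
    return None


# Hypothetical credentials, for illustration only.
login = build_call("/r/user/login", {"username": "alice", "password": "secret"})
print(login.get_method(), login.full_url)
print(login.data.decode("utf-8"))

# Hypothetical response markup; the real structure is defined by the server.
sample = json.loads('{"response": {"success": true, "httpcookie": "ABC123"}}')
token = find_property(sample, "httpcookie")
print(token)
```

Searching for the property by name, rather than indexing into a fixed path, keeps the sketch independent of the exact nesting of the response markup.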

For an overview of all available API calls and associated parameters, refer to the DeployR API Reference Guide, which can be accessed from the link provided to you by Revolution Analytics.

When integrating with DeployR using one of the client libraries described later, the libraries handle all server communication, and you interact with the API of the client library rather than interacting directly with the DeployR Web services API.

DeployR Landing Page

After installing DeployR, you have immediate access to the DeployR landing page at: http://SERVER:PORT/revolution/ where SERVER is the IP address of the DeployR machine and PORT is the port number used during installation. The landing page presents all documentation, sample applications, and links to the Repository Manager, the Administration Console, and the API Explorer tool.

Figure 1: DeployR Home Page

Chapter 2: Key Features in DeployR

Key Technologies

DeployR includes the following key technologies:
• The DeployR Web services API offers support for JSON and XML data exchange on a RESTful interface. For more information, refer to Key API Services in this chapter as well as the complete API Reference Guide, delivered electronically with this product.
• The Java, JavaScript, and .NET client libraries simplify the usage of the Web services API. For more information, refer to Client Libraries in this document.
• The API Explorer tool offers developers a web-based interface to familiarize themselves with and explore the Web services API in an interactive manner. It is accessed through the DeployR home page at http://SERVER:PORT/revolution, where SERVER is the IP address of the DeployR machine and PORT is the port number used during installation.
• The Administration Console for DeployR offers a web-based interface for the administrator to manage deployment and runtime configuration of the server and Web services API. For more information, refer to the chapter on the Administration Console in this document. You can also download the complete Revolution R Enterprise DeployR Administration Console Guide delivered for this product.

The DeployR API supports the integration of R-based analytics into Web, desktop and mobile applications. The API is exposed by the DeployR server, a standards-based server technology capable of scaling to meet the needs of the enterprise.

Key API Services

There are several key API services, including the User, Project, Job, and Repository API services. The following sections briefly describe each of these services. For more detail, see the online API Reference Guide delivered with this product.

User API Services

User API services facilitate user sign-in, sign-out and project auto-save preferences.

One of the first steps for most typical applications using this API is to provide a mechanism for users to authenticate with the DeployR server by signing in to and signing out of the application.

To sign in, a user must provide username and password credentials. These credentials are then verified by the DeployR server using the /r/user/login call. Credentials are matched against user account data created and managed in the DeployR Administration Console or against user account data stored in an LDAP or directory service.

A user whose credentials have been verified by the DeployR server is called an authenticated user. Authenticated users are granted access to the full API, which allows them to work on projects, submit or schedule jobs, and work with repository-managed files and scripts.

There are situations where a user may have been reliably authenticated by some external system such as CA Siteminder® prior to accessing a DeployR-enabled Web application. These types of pre-authentication scenarios are also supported by DeployR.

While many applications require the security and controls associated with authenticated users, there are cases when an application may want to offer specific services to users without ever establishing a formal verification of the user's identity. In such situations, we say that the user is an anonymous user. Typically, an anonymous user is an unauthenticated visitor to a DeployR-enabled Web application. Anonymous users are granted access only to the /r/repository/script/execute API call.
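The access rule described above can be sketched as a small check that a client application might run before issuing a call. This is only a toy model of the rule as stated in this guide, not the server's implementation; the function and constant names are hypothetical.

```python
# The only call this guide lists as available to anonymous users.
ANONYMOUS_CALLS = frozenset({"/r/repository/script/execute"})


def is_call_permitted(endpoint, authenticated):
    """Return True if a user of the given kind may invoke `endpoint`.

    Authenticated users are granted access to the full API. Anonymous
    users are limited to the calls in ANONYMOUS_CALLS. The login call
    itself is not modeled here, since it is how a user becomes
    authenticated in the first place.
    """
    if authenticated:
        return True
    return endpoint in ANONYMOUS_CALLS


for call in ("/r/repository/script/execute", "/r/project/execute/code"):
    print(call, "anonymous allowed:", is_call_permitted(call, authenticated=False))
```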

Project API Services

Project API services facilitate working with project code execution, workspaces, directories and packages.

Most R users are accustomed to working with R interactively in an R console window. In this environment users can input commands to manipulate, analyze, visualize, and interpret object and file data in the R session. The set of objects in the R session are collectively known as the workspace. The set of files in the R session are collectively known as the working directory. The R session environment also supports libraries, allowing the functionality found in R packages to be loaded by the user on-demand.

The DeployR environment supports this same functionality by introducing the concept of projects on the API. As with working in an R console window, all operations on project APIs are synchronous: requests are processed serially, and each request blocks until it completes.

Authenticated users create and own projects. Each project is allocated its own workspace and working directory on the server and maintains its own set of dependencies along with a full R command history. A user can execute R code on a project using the /r/project/execute/code call and retrieve the R command history for the project using the /r/project/execute/history call. The lifecycle for any given project depends on whether the user indicates a desire for temporary or persistent storage. The following sections discuss these project storage options.

A temporary project exists only for the duration of the current user's HTTP(S) session. Once the user makes an explicit call on /r/project/close or simply signs out of the application, the temporary project and all associated state are permanently deleted by the server. As such, temporary projects are most useful for quick experimentation and for testing simple proofs of concept.

A persistent project is stored indefinitely on the server unless it is explicitly deleted by the user. Each time users return to the server, their own persistent projects are available, allowing them to pick up where they left off. In this way, persistent projects can be developed over days, weeks, months, or even years.
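To make the project calls named in this section more concrete, here is a minimal Python sketch that builds the URL-encoded request bodies for /r/project/execute/code, /r/project/execute/history, and /r/project/close, in the order a client might issue them. The parameter names project and code, the project identifier, and the session token are assumptions for illustration; consult the API Reference Guide for the actual parameter lists.

```python
from urllib.parse import urlencode


def project_call(endpoint, project_id, session_token, **extra):
    """Return (endpoint, body) for one project API call, without sending it."""
    params = {
        "format": "json",             # required on every call
        "jsessionid": session_token,  # token obtained at login
        "project": project_id,        # assumption: hypothetical parameter name
    }
    params.update(extra)
    return endpoint, urlencode(params)


# Project operations are synchronous, so a client issues them one at a time.
calls = [
    project_call("/r/project/execute/code", "PROJECT-1", "TOKEN",
                 code="x <- rnorm(100); summary(x)"),  # assumption: 'code' parameter
    project_call("/r/project/execute/history", "PROJECT-1", "TOKEN"),
    project_call("/r/project/close", "PROJECT-1", "TOKEN"),
]

for endpoint, body in calls:
    print(endpoint)
    print("   ", body)
```

Note that the R code travels as an ordinary form value, so it must be URL-encoded; urlencode takes care of characters such as "<" and ";".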

Job API Services

Job API services facilitate working with code execution scheduled for background execution.

Working with R interactively in an R console window or working with projects on the API are both examples of synchronous working environments in which a user can make a request and that request will be blocked until processing completes and an appropriate response is generated and returned. When working with R interactively, the response is displayed as output in the R console window. When working with projects on the API, the response is well-formed markup on the response stream.

However, it can be advantageous in some cases to permit users to make requests without requiring that they wait for responses. For example, consider long-running operations that could take hours or even days to complete.

The DeployR environment supports these types of long-running operations with the concept of jobs on the API. DeployR managed jobs support the execution of commands in the background on behalf of users.
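The difference between a blocking request and a background job can be illustrated with a local analogy in Python. The sketch below uses only the standard library and does not call DeployR; the function long_running_analysis is a hypothetical stand-in for an R computation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def long_running_analysis(n):
    """Stand-in for a slow computation."""
    time.sleep(0.1)
    return sum(i * i for i in range(n))


# Synchronous style (like a project call): the caller waits for the answer.
result_now = long_running_analysis(1000)

# Job style: the work is submitted and runs in the background, so the
# caller gets control back immediately and collects the result later.
with ThreadPoolExecutor(max_workers=1) as pool:
    job = pool.submit(long_running_analysis, 1000)
    caller_was_free = True        # the caller can do other work here
    result_later = job.result()   # blocks only when the result is needed

print(result_now == result_later)
```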

Repository API Services

Repository API services facilitate working with a persistent store of managed files and scripts.

The DeployR environment offers versioned file storage to authenticated users with the concept of a repository on the API. The repository provides a persistent store for user files of any type such as binary object files, plot image files, data files, simple text files and files containing blocks of R code, called repository-managed scripts.

Repository-managed scripts are a special type of repository-managed file. Any file with an .r or an .R extension is identified by the server as a repository-managed script. These scripts are blocks of R code with well-defined inputs and outputs. While scripts are technically also repository-managed files, they are designed to be exposed as an executable on the API.

Because the repository supports a versioned file system, a full version history for each file is maintained and any version can be retrieved on request. Each user has access to a private repository store. Each file placed in that store is maintained indefinitely by the server unless it is explicitly deleted by the user.
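The repository behavior described in this section can be pictured with a toy in-memory model: files are kept with a full version history, any version can be retrieved, files persist until explicitly deleted, and files ending in .r or .R are treated as scripts. The class and function names are hypothetical, and this sketch says nothing about how the DeployR server actually stores files.

```python
import os


def is_repository_script(filename):
    """A file with an .r or .R extension is identified as a script."""
    return os.path.splitext(filename)[1] in {".r", ".R"}


class VersionedStore:
    """Toy private store that keeps every version of every file."""

    def __init__(self):
        self._versions = {}  # filename -> list of contents, oldest first

    def put(self, filename, content):
        """Store a new version and return its 1-based version number."""
        history = self._versions.setdefault(filename, [])
        history.append(content)
        return len(history)

    def get(self, filename, version=None):
        """Return the requested version, or the latest when none is given."""
        history = self._versions[filename]
        return history[-1] if version is None else history[version - 1]

    def delete(self, filename):
        """Files persist until they are explicitly deleted."""
        del self._versions[filename]


store = VersionedStore()
store.put("forecast.R", "x <- 1")
store.put("forecast.R", "x <- 2")
print(store.get("forecast.R", version=1))
print(store.get("forecast.R"))
print(is_repository_script("forecast.R"), is_repository_script("sales.csv"))
```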

Chapter 3: The Administration Console

The DeployR Administration Console, which is delivered with DeployR, is an easy-to-use web interface that facilitates the proper management and administration of your DeployR deployment by the DeployR admin. Accordingly, the following functions are supported in the console:
• The creation and management of user accounts
• The creation and management of roles to grant users permissions and to restrict access to R scripts
• The import and export of R scripts for migration purposes or for backing up
• The creation and management of R boundaries for constraining runtime resource usage
• The creation and management of IP filters
• The management of node resources on the DeployR grid
• The management of DeployR server policies
• The deployment of replica sets for MongoDB to increase data availability
• The monitoring of events on the grid and database backups

Figure 2: DeployR Administration Console

To access and log into the Administration Console after installation:

1. In a browser window, enter the Administration Console’s URL:

http://SERVER:PORT/deployr/administration

where SERVER is the IP address of the DeployR machine and PORT is the port number used during installation.

2. Click Administration Console Log In in the upper right to log in if you are not already logged in. For the list of preconfigured usernames and passwords, see the section User Account Properties in the Revolution R Enterprise DeployR Administration Console Guide.

3. Click Log In.

Important! You cannot log into DeployR with multiple accounts using a single brand of browser program. To use two or more accounts concurrently, you'll need to log into each one in a separate brand of browser. For example, to log into the DeployR Administration Console with the admin account and into the API Explorer tool with another user account, you could open one in Google Chrome™ and the other in Mozilla® Firefox®.

Chapter 4: Java, JavaScript, and .NET Client Libraries

The DeployR Web services API is completely standardized and can be accessed from an application by opening a connection and sending HTTP(S) requests. As an added convenience, you can choose from several client libraries that are already equipped to handle the HTTP(S) processing, XML/JSON encoding/decoding, and more.

By using one of these client libraries, you can integrate Web-service API calls into popular programming languages with only a few lines of code.

Java Client: jDeployR

The Java client library, jDeployR, provides a simple API for integrating DeployR services into Java-based Web and desktop applications.

JavaScript Client: JSDeployR

The JavaScript client library, JSDeployR, serves as a wrapper for the DeployR Web services API calls and is used to interact with DeployR directly in the browser via AJAX. Consequently, R functionality can easily be embedded into a website using JSDeployR.

Since JSDeployR uses XMLHttpRequest, which strictly adheres to the same-origin policy, any retrieval of DeployR data hosted on a different domain via JSDeployR requires that cross-origin resource sharing (CORS) first be enabled in DeployR. For more information, refer to the section on Cross-Domain Transactions in the JavaScript Reference Guide, which is accessible from the DeployR landing page.
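Whether CORS is needed comes down to whether the page and the DeployR server share an origin, that is, the same scheme, host, and port. The following Python sketch shows that comparison under the standard definition of a web origin; the host names and port numbers are hypothetical, and the helper names are not part of any DeployR library.

```python
from urllib.parse import urlsplit

# Ports implied when a URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies an origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, (parts.hostname or "").lower(), port


def needs_cors(page_url, server_url):
    """True when a browser page at page_url calls a server at another origin."""
    return origin(page_url) != origin(server_url)


# Hypothetical addresses, for illustration only.
print(needs_cors("http://example.com/app", "http://example.com:80/deployr"))
print(needs_cors("http://example.com/app", "http://example.com:8000/deployr"))
print(needs_cors("https://example.com/app", "http://analytics.example.com/deployr"))
```

A different port on the same host already counts as a different origin, which is the usual situation when a web application and the DeployR server run on the same machine.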

.NET Client: DeployR

The .NET client library, DeployR, provides a simple API for integrating DeployR services into .NET-based Web and desktop applications.

Chapter 5: Examples

DeployR is delivered with several samples to help you better understand how to use and/or integrate the product.

Sample JavaScript Applications

You can access these samples from the DeployR home page: http://SERVER:PORT/revolution

where SERVER is the IP address of the DeployR machine and PORT is the port number used during installation.

Integration Examples

DeployR is delivered with several examples to help you integrate with other products such as Qlikview, Excel, or Jaspersoft. You can access the integration guides and example files at under /3rdParty_Examples/.