What's New in Microsoft R Services

The correct bibliographic citation for this manual is as follows: Microsoft Corporation. 2016. What’s New in Microsoft R Services. Microsoft Corporation, Redmond, WA.

Copyright © 2016 Microsoft Corporation. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of Microsoft Corporation.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of The Rights in Technical Data and Computer Software clause at 52.227-7013.

Revolution R, Revolution R Enterprise, RPE, RevoScaleR, DeployR, RevoPemaR, RevoTreeView, and Revolution Analytics are trademarks of Microsoft Corporation. Revolution R Enterprise/Microsoft R Server includes the Intel® Math Kernel Library (https://software.intel.com/en-us/intel-mkl). RevoScaleR includes Stat/Transfer software under license from Circle Systems, Inc. Stat/Transfer is a trademark of Circle Systems, Inc. Other product names mentioned herein are used for identification purposes only and may be trademarks of their respective owners.

Microsoft
One Microsoft Way
Redmond WA 98052 U.S.A.

Revised on November 18, 2015

We want our documentation to be useful, and we want it to address your needs. If you have comments on this or any Revolution document, write to [email protected].

Contents

1 Overview
    1.1 Features New in Revolution R Enterprise 7.5
    1.2 Features New in Revolution R Enterprise 7.4
    1.3 Features New in Revolution R Enterprise 7.3
    1.4 Features New in Revolution R Enterprise 7.2
    1.5 Features New in Revolution R Enterprise 7.1
    1.6 Features New in Revolution R Enterprise 7.0
2 Running the Examples in This Guide
3 Fuzzy Matching with rxGetFuzzyKeys and rxGetFuzzyDist
4 Specifying Low and High Values
5 Naïve Bayes Classifiers
    5.1 A Simple Naïve Bayes Classifier
    5.2 Missing Data with Naïve Bayes
6 Stepwise Variable Selection
    6.1 Stepwise Variable Selection with rxLinMod
    6.2 Stepwise Variable Selection with rxLogit
    6.3 Stepwise Variable Selection with rxGlm
    6.4 Plotting Model Coefficients
    6.5 Variable Selection with Wide Data
7 Performance Improvements
    7.1 Wide Dataset Import Improvements
    7.2 Modeling Performance for Tree Based Algorithms
    7.3 Prediction Performance for rxDForest and rxBTrees
8 Stochastic Gradient Boosting
    8.1 A Simple Binary Classification Forest
    8.2 A Simple Regression Forest
    8.3 A Simple Multinomial Forest Model
9 Decision Forests
    9.1 A Simple Classification Forest
    9.2 A Simple Regression Forest
10 Removing Duplicates While Sorting
11 Using rxDataStep and rxPredict to Create Delimited Text Files
12 Converting RevoScaleR Objects for Use in PMML
13 The RevoTreeView Package

1 Overview

This guide is an introduction to using the new features of Microsoft R Services.
The current release is Microsoft R Services 2016, which includes the following new features and bug fixes:

New Features
o It is now Microsoft R Services! With the acquisition of Revolution Analytics by Microsoft, we are quickly moving to integrate the product line. With this release you will see Microsoft R Server 8.0 on Linux and Revolution R Enterprise 8.0 on Windows, both bringing you the full feature set of Revolution R Enterprise 7, and more!
o Revolution R Open has become Microsoft R Open.
o RevoScaleR now includes two new experimental functions for cleaning and analyzing text data: rxGetFuzzyKeys and rxGetFuzzyDist.

Bug Fixes
o When using rxDataStep, new variables created in a transformation no longer inherit the rxLowHigh attribute of the variable used to create them.

1.1 Features New in Revolution R Enterprise 7.5

Revolution R Enterprise 7.5 included the following new features and bug fixes:

New Features
o Revolution R Open for Revolution R Enterprise, a single installer, replaces the previous RRO and Revolution R Connector installers.
o RevoScaleR now includes an in-database compute context, RxInSqlServer, for use with SQL Server 2016 CTP 3. See the RevoScaleR SQL Server Getting Started Guide for a tutorial introduction.
o RevoScaleR now includes a SQL Server data source, RxSqlServerData.
o The RxOdbcData data source now has write capability.

Bug Fixes
o rxGetInfo no longer fails when an extended class of RxXdfData is used.

1.2 Features New in Revolution R Enterprise 7.4

Revolution R Enterprise 7.4 included the following new features and bug fixes:

New Features
o R is no longer built into Revolution R Enterprise. Instead, Revolution R Enterprise is installed into an existing RRO 8.0.3 installation, after you have also installed the Revolution R Connector.
o The Revobase and RevobaseEnt packages have been removed.
o The new RevoUtils package contains basic utility functions such as ‘readNews’.
o The new RevoUtilsMath package contains functions to get and set MKL threads.
o The RxHadoopMR compute context now supports additional distributions of Hadoop, including Cloudera CDH 5.2 and CDH 5.3 (with or without Cloudera Manager 5), Hortonworks HDP 2.2, and MapR 4.0.2.
o The new RevoIOQ package contains the RevoIOQ function for running an installation and operational qualification test suite on a single machine.
o RevoScaleR now features a two-process architecture: R itself runs in one process, while RevoScaleR’s underlying computational engine runs in a separate process. For RRE 7.4, this is primarily an infrastructure investment, but we expect to take advantage of the new architecture in future releases by providing new interfaces to the compute engine.
o A new function, rxNaiveBayes, performs classification using Bayes’ theorem to determine the probability that an observation belongs to a certain class.
o A new argument, keepStepCoefs, has been added to the rxStepControl function. If TRUE, the fitted model includes a data frame, stepCoefs, whose rows correspond to the coefficients and whose columns correspond to the iterations. These stepwise components can be visualized by plotting the fitted model with the new rxStepPlot function.

1.3 Features New in Revolution R Enterprise 7.3

Revolution R Enterprise 7.3 included the following new features and bug fixes:

New Features
o The included version of R has been updated to R 3.1.1.
o The RxHadoopMR compute context now supports additional distributions of Hadoop, including Cloudera CDH 5.1 (with or without Cloudera Manager 5) and MapR 4.0.1.
o A new function, rxBTrees, fits decision forests using a stochastic gradient boosting algorithm. Regression models as well as binary and multi-class classification models are supported.
o New argument scheduleOnce
