RevoScaleR User's Guide

The correct bibliographic citation for this manual is as follows: Microsoft Corporation. 2016. RevoScaleR User’s Guide. Microsoft Corporation, Redmond, WA.

RevoScaleR User’s Guide
Copyright © 2016 Microsoft Corporation. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of Microsoft Corporation.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of The Rights in Technical Data and Computer Software clause at 52.227-7013.

Revolution R, Revolution R Enterprise, RPE, RevoScaleR, DeployR, RevoPemaR, RevoTreeView, and Revolution Analytics are trademarks of Microsoft Corporation.

Revolution R Enterprise/Microsoft R Server includes the Intel® Math Kernel Library (https://software.intel.com/en-us/intel-mkl). RevoScaleR includes Stat/Transfer software under license from Circle Systems, Inc. Stat/Transfer is a trademark of Circle Systems, Inc.

Other product names mentioned herein are used for identification purposes only and may be trademarks of their respective owners.

Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
U.S.A.

Revised on November 9, 2015

We want our documentation to be useful, and we want it to address your needs. If you have comments on this or any Microsoft R Services document, send e-mail to [email protected]. We’d love to hear from you.

Contents

Chapter 1. Introduction
  1.1 Why RevoScaleR?
    1.1.1 Accessing External Data Sets
    1.1.2 Efficiently Storing and Retrieving Data
    1.1.3 Data Cleaning, Exploration, and Manipulation
    1.1.4 Statistical Analysis
    1.1.5 Writing Your Own Analyses for Large Data Sets
  1.2 Getting Started
    1.2.1 Accessing External Data Sets
    1.2.2 Data Cleaning, Exploration, and Transformations
    1.2.3 Statistical Analysis
    1.2.4 Writing Your Own Analyses for Large Data Sets
  1.3 Sample Data for Use with RevoScaleR
  1.4 Managing Threads
  1.5 Generating Random Numbers
  1.6 Using RevoScaleR with Rscript
  1.7 Getting Help

Chapter 2. Importing Data
  2.1 Data Compression in .xdf Files
  2.2 Importing Delimited Text Data
    2.2.1 Specifying a Missing Value String
  2.3 Importing Fixed-Format Data
  2.4 Importing SAS Data
  2.5 Importing SPSS Data
  2.6 Specifying Variable Data Types
  2.7 Specifying Additional Variable Information
  2.8 Appending to an Existing File
  2.9 Transforming Data on Import
  2.10 Converting Dates Stored As Character Strings
  2.11 Importing Wide Data
  2.12 Reading Data from an .xdf File into a Data Frame
  2.13 Splitting Data Files
  2.14 Importing Data as Composite Xdf Files
  2.15 Using Data from the Hadoop Distributed File System
    2.15.1 Note on Using RevoScaleR with rhdfs

Chapter 3. Data Sources
  3.1 Data Source Constructors
  3.2 Specifying Delimiters
  3.3 Compute Contexts and Data Sources
  3.4 Methods for Looking at Data Sources
  3.5 Using Data Sources
  3.6 Working with an Xdf Data Source
  3.7 Using an Xdf Data Source with biglm

Chapter 4. Transforming and Subsetting Data
  4.1 Creating a Subset of Rows and Columns
  4.2 Transforming Data with rxDataStep
    4.2.1 Creating and Transforming Variables
    4.2.2 Subsetting and Transforming Variables
  4.3 Using the Data Step to Create an .xdf File from a Data Frame
  4.4 Converting .xdf Files to Text
  4.5 Re-Blocking an .xdf File
  4.6 Modifying Variable Information
  4.7 Sorting Data
    4.7.1 Removing Duplicates While Sorting
    4.7.2 The rxQuantile Function and the Five-Number Summary
  4.8 Merging Data
    4.8.1 Inner Merge
    4.8.2 Outer Merge
    4.8.3 One-to-one Merge
    4.8.4 Union Merge
    4.8.5 Using rxMerge with .xdf files
  4.9 Creating and Recoding Factors
    4.9.1 Recoding Factors to Ensure Variable Compatibility

Chapter 5. Models in RevoScaleR
  5.1 External Memory Algorithms
