Introduction to R in IBM SPSS Modeler

Introduction to R in IBM SPSS Modeler: A guide for SPSS Users

Wannes Rosius/Belgium/IBM

Goal of this guide

Although there are several very good articles and blogs about IBM SPSS Modeler, in my role as a technical professional for IBM Analytical solutions I still see many people struggling both with R and with the integration between IBM SPSS Modeler and R. This document is certainly not meant to replace the very useful links listed below, but to complement them so that people who know IBM SPSS Modeler, but have only a very limited knowledge of R, can use this integration.

After going through sections 2, 3 and 4, the reader should understand the R integration within SPSS at a high level and be able to (re)create some very basic R models within SPSS, even with only a basic knowledge of R. Section 5 contains more detailed tips and tricks. This part is for the experienced user and can be read as a loose collection of items that may help you get up to speed with some more detailed functionality of the integration and understand some pitfalls.

Throughout the document, we try to include R examples that can easily be copied into the appropriate R node in IBM SPSS Modeler. Unless specified otherwise, these code snippets are always based on the telco.sav dataset, which can be found in the demo folder of your SPSS Modeler installation. After the source node, attach a type node, and after that the appropriate R node. Sometimes, however, only an extract of code is given to show the idea; it will be clearly mentioned when the code is incomplete. You will find this code in several code frames throughout this document. Furthermore, all the SPSS streams and assets are embedded in the PDF and marked by an icon. You can access them by right-clicking within the PDF document.
Some useful links

• Essentials for R - Installation Instructions
• User Guide: IBM SPSS Modeler 18 R Nodes
• Modeler essentials for R Downloads
• SPSS Modeler and R integration - Getting started

IBM SPSS Modeler and R

Contents

1 System Setup
  1.1 Installing R
  1.2 Enabling the R nodes
2 R basics
3 The basics of R nodes in IBM SPSS Modeler
  3.1 The R nodes
  3.2 Simple R code example
    3.2.1 modelerData
    3.2.2 modelerDataModel
    3.2.3 modelerModel
  3.3 Some general remarks
  3.4 Read data options
4 Custom Dialog builder
  4.1 Tools
  4.2 Custom dialog
  4.3 Simple example
5 Tips & tricks: Some more detailed
  5.1 R code
    5.1.1 ibmspsscf70 library
    5.1.2 Some useful parts of R code
  5.2 Custom Dialog builder
    5.2.1 How to save and share a custom dialog?
    5.2.2 Link to dialog and script
  5.3 What about SQL Pushback? Hadoop pushback?
  5.4 What about real-time scoring? and Solution Publisher?
  5.5 Something more about the metadata in modeler and the consequences on R integration

1 System Setup

Let us start with the setup of your system. For now, we assume that you have a valid installation of IBM SPSS Modeler on your machine. For more installation topics we refer to the Installation Instructions.
1.1 Installing R

Depending on the version of your IBM SPSS Modeler, you will have to install a different version of R:

SPSS Version   R version   R download link
16.0           2.15.2      https://cran.r-project.org/bin/windows/base/old/2.15.2/
17.0           3.1         https://cran.r-project.org/bin/windows/base/old/3.1.0/
17.1           3.1         https://cran.r-project.org/bin/windows/base/old/3.1.0/
18.0           3.2         https://cran.r-project.org/bin/windows/base/old/3.2.0/

Once you have downloaded and installed it, you will have a working R instance on your machine/server. As with SPSS Modeler, you can have several versions of R installed on the same machine without any problem.

1.2 Enabling the R nodes

You will need to install the IBM SPSS Modeler Essentials for R. You can find these here, on the SPSS Community Downloads page. Click 2. "Get Essentials for SPSS" and then click the button "Get R Essentials for SPSS Modeler". This will take you to GitHub, where you can select and download the Modeler 18 Essentials for R for a variety of platforms. If you require Essentials for R for earlier Modeler versions, there is also a link to legacy versions. Run this executable file. The installer will ask for the path of your R installation and the path to the bin files of your SPSS Modeler installation. (Note that the prefilled path is the default path to a Modeler Server; you will need to change it if you want to configure your client.) This installation will place the R nodes in your SPSS Modeler node palette and will also add the necessary R libraries to your R installation folder.

2 R basics

There is already an overflow of R courses (publicly) available through several channels, so we certainly do not want to replace these. It is also not very important that you are an R expert to follow this document. However, there are still some basics of R code and R terminology that users need to understand in order to exploit the integration of R and IBM SPSS Modeler. For this section, let us open R in its original GUI.
To do so, go to the R installation folder and open \bin\x64\RGUI.exe. A window will open. This is the R console, ready for commands to run. You might often hear the term RStudio, which is nothing more than a development environment on top of this R GUI. Installing RStudio is not required for this introduction, but might be handy for further use.

We will start the R introduction by stating that R is a powerful programming language and environment for statistical computing and graphics. An important part of that last phrase is that R is a programming language, unlike IBM SPSS Modeler. That means it is built on objects that are defined by the user. As an example, consider the following R code (feel free to type it in the R console to see the R output):

    x <- 1 + 1
    y <- 2 * x
    xyVector <- c(x, y)
    z <- mean(xyVector)
    print(z)

Here x is an object. The first statement fills the object x with the value of the evaluated formula 1 + 1, being 2. So whenever the program refers to x, it will be interpreted as 2. In the second line we define y as twice the value of x. In the third line, we create a vector containing the contents of x and y. The fourth line calculates the mean of these 2 objects and places it in an object z, which the fifth line prints (here 3).

The operator "<-" could also be replaced by "=", but for various reasons many R users prefer "<-" (the two are actually not exactly the same, but the difference can be ignored for the purpose of this document). If you feel more comfortable using "=", please do so.

Just as we filled x, y and z with numbers, any R object can be filled with a variety of types. Here is a list of the most important ones for our purposes:

Vector is a sequence of data elements of the same type (e.g. numeric or character). This includes vectors of length 1, which can be interpreted as just being numbers. You can create a vector with the R function c().
So in the example code above, all the values of x, y and z are vectors of length one. xyVector is a vector of length 2, containing the value of (the vector) x followed by that of (the vector) y. Linking this back to SPSS, you can interpret a vector as the values of a single data column.

Data frame is a list of vectors of equal length. If you look at a vector as the values of a variable, a data frame can be interpreted as a 2-dimensional dataset with columns (the number of vectors) and lines (the length of each vector).

    n <- c(2, 3, 5, 3, 9)                      # A first vector of 5 numeric values
    n2 <- c(1, 3, 2, 5, 4)                     # A second vector of 5 numeric values
    s <- c("aa", "bb", "cc", "aa", "zz")       # A third vector of 5 string values
    b <- c(TRUE, FALSE, TRUE, TRUE, TRUE)      # A fourth vector of 5 flag values
    Data <- data.frame(n, s, b, New = n + n2)  # A data frame containing 4 vectors
    # Note: n + n2 will be a new vector called "New" with the sum of n and n2: c(3, 6, 7, 8, 13)

    dim(Data)       # Will show you it is a 5x4 dataset
    Data[2, 4]      # Will give back the value on the 2nd line, the 4th column
    colnames(Data)  # Will give the column names as a vector ("n", "s", "b", "New")
    Data$n[1]       # Will give back the first value of the vector n within the data frame
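
The statement that a data frame is a list of equal-length vectors can be checked directly in the R console. The following is only a minimal sketch: it rebuilds the small data frame from the example above and inspects it. The names colTypes and firstRow are hypothetical and introduced here purely for illustration. stringsAsFactors = FALSE is set explicitly as an assumption about what you want, because R versions before 4.0 (including those listed in section 1.1) convert string columns to factors by default.

```r
# Rebuild the example data frame (same values as in the example above)
n  <- c(2, 3, 5, 3, 9)
n2 <- c(1, 3, 2, 5, 4)
s  <- c("aa", "bb", "cc", "aa", "zz")
b  <- c(TRUE, FALSE, TRUE, TRUE, TRUE)
# stringsAsFactors = FALSE keeps s as plain strings on every R version
Data <- data.frame(n, s, b, New = n + n2, stringsAsFactors = FALSE)

# Each column of a data frame is itself a vector ...
print(is.vector(Data$n))
# ... and every column has as many elements as the data frame has lines
print(length(Data$n) == nrow(Data))

# Storage type of every column (hypothetical name: colTypes)
colTypes <- sapply(Data, class)
print(colTypes)

# A column can be selected by name or by position
print(identical(Data$New, Data[, 4]))

# Selecting a full line gives back a data frame with a single line
# (hypothetical name: firstRow)
firstRow <- Data[1, ]
print(firstRow)

# Compact overview of the structure: dimensions, column names and types
str(Data)
```

Looking at str(Data) is a quick way to see how R stores each column, which is useful to keep in mind when comparing with the storage types shown in a type node in SPSS Modeler.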
