Introduction to R

Dave Armstrong
University of Wisconsin-Milwaukee
Department of Political Science
e: [email protected]
w: http://www.quantoid.net/teachuw/uwmpsych

Contents

1 Introduction

2 What You Need

3 Reading in Data from Other Programs
  3.1 SAS
  3.2 Stata
  3.3 SPSS
  3.4 Excel (csv)

4 Graphical User Interfaces
  4.1 Rcmdr
  4.2 Deducer

5 Saving & Writing
  5.1 Where does R store things?
  5.2 Writing
  5.3 Saving

6 Help!
  6.1 Books
  6.2 Web

1 Introduction

Rather than slides, I have decided to distribute handouts that contain more prose than slides would permit. The idea is to give you a somewhat more comprehensive reference than slides would for when you return home. If you’re reading this, you want to learn R, either of your own accord or under duress. Here are some of the reasons that I use R:

• It’s open source (that means FREE!).

• Rapid development in statistical routines/capabilities.

• Great graphs (including interactive and 3D displays) without (as much) hassle.

• Multiple datasets open at once (I know, SAS users will wonder why this is such a big deal).

• Save entire workspace, including multiple datasets, all models, etc.

• Easily programmable/customizable; easily see the contents (guts) of any function.

• Easy integration with LaTeX (jump on the reproducible research bandwagon).

2 What You Need

Things you’ll need to do what we’re doing in the workshop:

• R (http://cran.r-project.org). On Windows, choose to install both the 32-bit and 64-bit architectures if you have the option.

• Java Runtime Environment: http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html

– If you have a choice between the 64-bit version and the 32-bit version, get the one that matches the version of R you are using. To check, open R, type “version” at the command prompt, and hit enter; you should see one of the two outputs shown in Figure 1.

• You may also want to have RStudio, an integrated development environment (IDE) for R (https://www.rstudio.com/products/rstudio/download/). The free RStudio desktop should be sufficient.
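If you would rather not read through the full “version” listing, the architecture check can be sketched in a few lines. This is a minimal sketch that relies only on objects built into base R (R.version and .Machine); the variable names are just illustrative.

```r
# Report the R version string and the architecture R was built for.
r_version <- R.version.string
r_arch <- R.version$arch

# Pointer size in bytes: 8 for a 64-bit build, 4 for a 32-bit build.
pointer_bits <- 8 * .Machine$sizeof.pointer

cat(r_version, "\n")
cat("Architecture:", r_arch, "\n")
cat("Build:", pointer_bits, "bit\n")
```

Whichever number is reported for the build is the Java version you would want to match.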

3 Reading in Data from Other Programs

R makes it easy to read in data from other software, particularly Excel, SPSS, SAS and Stata. To do this, you’ll need to use a different “package”. Packages are collections of functions, bundled together, that you can download. In R, you can install packages from the Comprehensive R Archive Network (CRAN) by doing:

install.packages('package.name')

Figure 1: R Version Architecture. (a) 32-bit (b) 64-bit

3.1 SAS

For example, if you wanted to download the package that would allow you to read .sas7bdat files, you would do:

install.packages('sas7bdat')

You only have to install packages once. Once you’re in R and you want to use the functions in a package, you have to use the library() function.

library(sas7bdat)

If you want to see what functions are in the package, you can type:

help(package='sas7bdat')
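The install-once, load-every-session pattern can be sketched as follows. This is only a sketch: the vector of package names is taken from this handout, the variable names are made up, and the install step is guarded so that it runs only in an interactive session. The foreign package is used for the loading step because it normally ships with R.

```r
# Packages used in this handout (names taken from the text).
needed <- c("sas7bdat", "foreign", "readstata13")

# Which of them are not yet installed in this R library?
not_installed <- setdiff(needed, rownames(installed.packages()))

# Install only what is missing, and only in an interactive session,
# so that sourcing this script never triggers a surprise download.
if (length(not_installed) > 0 && interactive()) {
  install.packages(not_installed)
}

# Loading is still needed in every new session.
library(foreign)

# List the functions the attached package makes available.
foreign_functions <- ls("package:foreign")
head(foreign_functions)
```

Listing the contents with ls() is a quick alternative to the help index when you only need the function names.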

To read in a dataset, you could do:

dat <- read.sas7bdat('mvreg.sas7bdat')
summary(dat)

##  LOCUS_OF_CONTROL    SELF_CONCEPT             READ            WRITE
##  Min.   :-1.99596   Min.   :-2.532750   Min.   :24.62   Min.   :20.07
##  1st Qu.:-0.34959   1st Qu.:-0.482128   1st Qu.:44.13   1st Qu.:45.60
##  Median : 0.08099   Median : 0.031470   Median :51.86   Median :52.57
##  Mean   : 0.09653   Mean   : 0.004917   Mean   :51.90   Mean   :52.38
##  3rd Qu.: 0.55300   3rd Qu.: 0.480672   3rd Qu.:58.99   3rd Qu.:59.20
##  Max.   : 2.20551   Max.   : 2.093563   Max.   :80.59   Max.   :83.93
##     SCIENCE         MOTIVATION              PROG
##  Min.   :21.99   Min.   :-2.746669   Min.   :1.000
##  1st Qu.:45.32   1st Qu.:-0.551408   1st Qu.:2.000
##  Median :51.37   Median :-0.007099   Median :2.000
##  Mean   :51.76   Mean   : 0.003898   Mean   :2.088
##  3rd Qu.:58.06   3rd Qu.: 0.493787   3rd Qu.:3.000
##  Max.   :80.37   Max.   : 2.583752   Max.   :3.000

Now the object dat holds the data you just read in.
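Once the data are in, a few quick checks are usually worthwhile. The sketch below uses a small made-up data frame (dat_sas, with invented values) in place of the real file. It also shows one way to lower-case the SAS variable names and to label the numeric program codes. The code-to-label mapping is an assumption for illustration and should be checked against the codebook.

```r
# A small stand-in for the imported data; the values are made up.
dat_sas <- data.frame(
  LOCUS_OF_CONTROL = c(-0.35, 0.08, 0.55),
  READ = c(44.1, 51.9, 59.0),
  PROG = c(1, 2, 3)
)

# Basic checks after any import: dimensions, structure, first rows.
dim(dat_sas)
str(dat_sas)
head(dat_sas, 2)

# Lower-case the names so they match the other imports.
names(dat_sas) <- tolower(names(dat_sas))

# Attach labels to the numeric program codes (assumed mapping).
dat_sas$prog <- factor(dat_sas$prog, levels = 1:3,
                       labels = c("general", "academic", "vocational"))
summary(dat_sas)
```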

3.2 Stata

To read Stata files, there are a couple of different options. The read.dta function in the foreign package reads in Stata datasets saved in formats earlier than Stata 13.

library(foreign)
dat <- read.dta('mvreg_12.dta')
summary(dat)

##  locus_of_control    self_concept         motivation
##  Min.   :-1.99596   Min.   :-2.532750   Min.   :-2.746669
##  1st Qu.:-0.34959   1st Qu.:-0.482128   1st Qu.:-0.551408
##  Median : 0.08099   Median : 0.031470   Median :-0.007099
##  Mean   : 0.09653   Mean   : 0.004917   Mean   : 0.003898
##  3rd Qu.: 0.55300   3rd Qu.: 0.480672   3rd Qu.: 0.493787
##  Max.   : 2.20551   Max.   : 2.093563   Max.   : 2.583752
##       read           write          science              prog
##  Min.   :24.62   Min.   :20.07   Min.   :21.99   general   :138
##  1st Qu.:44.13   1st Qu.:45.60   1st Qu.:45.32   academic  :271
##  Median :51.86   Median :52.57   Median :51.37   vocational:191
##  Mean   :51.90   Mean   :52.38   Mean   :51.76
##  3rd Qu.:58.99   3rd Qu.:59.20   3rd Qu.:58.06
##  Max.   :80.59   Max.   :83.93   Max.   :80.37

To read Stata files from version 13 or later, you can use the read.dta13 function in the readstata13 package. First, you have to install the package:

install.packages('readstata13')

Then, you can load the package and use it to read in the data.

library(readstata13)
dat <- read.dta13('mvreg_14.dta', nonint.factors=T)
summary(dat)

##  locus_of_control    self_concept         motivation
##  Min.   :-1.99596   Min.   :-2.532750   Min.   :-2.746669
##  1st Qu.:-0.34959   1st Qu.:-0.482128   1st Qu.:-0.551408
##  Median : 0.08099   Median : 0.031470   Median :-0.007099
##  Mean   : 0.09653   Mean   : 0.004917   Mean   : 0.003898
##  3rd Qu.: 0.55300   3rd Qu.: 0.480672   3rd Qu.: 0.493787
##  Max.   : 2.20551   Max.   : 2.093563   Max.   : 2.583752
##       read           write          science              prog
##  Min.   :24.62   Min.   :20.07   Min.   :21.99   general   :138
##  1st Qu.:44.13   1st Qu.:45.60   1st Qu.:45.32   academic  :271
##  Median :51.86   Median :52.57   Median :51.37   vocational:191
##  Mean   :51.90   Mean   :52.38   Mean   :51.76
##  3rd Qu.:58.99   3rd Qu.:59.20   3rd Qu.:58.06
##  Max.   :80.59   Max.   :83.93   Max.   :80.37

3.3 SPSS

The read.spss function in the foreign package reads all versions of SPSS files, both .sav and .por types.

library(foreign)
dat <- read.spss('mvreg.sav', to.data.frame=T, use.value.labels=T)

## Warning in read.spss("mvreg.sav", to.data.frame = T, use.value.labels = T):
## mvreg.sav: Unrecognized record type 7, subtype 18 encountered in system file

summary(dat)

##  locus_of_control    self_concept         motivation
##  Min.   :-1.99596   Min.   :-2.532750   Min.   :-2.746669
##  1st Qu.:-0.34959   1st Qu.:-0.482128   1st Qu.:-0.551408
##  Median : 0.08099   Median : 0.031470   Median :-0.007099
##  Mean   : 0.09653   Mean   : 0.004917   Mean   : 0.003898
##  3rd Qu.: 0.55300   3rd Qu.: 0.480672   3rd Qu.: 0.493787
##  Max.   : 2.20551   Max.   : 2.093563   Max.   : 2.583752
##       read           write          science              prog
##  Min.   :24.62   Min.   :20.07   Min.   :21.99   general   :138
##  1st Qu.:44.13   1st Qu.:45.60   1st Qu.:45.32   academic  :271
##  Median :51.86   Median :52.57   Median :51.37   vocational:191
##  Mean   :51.90   Mean   :52.38   Mean   :51.76
##  3rd Qu.:58.99   3rd Qu.:59.20   3rd Qu.:58.06
##  Max.   :80.59   Max.   :83.93   Max.   :80.37

3.4 Excel (csv)

There is a read.csv function that is always available (not in a package you have to load) that will read .csv files.

dat <- read.csv('mvreg.csv', header=T)
summary(dat)

##  locus_of_control    self_concept         motivation
##  Min.   :-1.99596   Min.   :-2.532750   Min.   :-2.746669
##  1st Qu.:-0.34959   1st Qu.:-0.482128   1st Qu.:-0.551408
##  Median : 0.08099   Median : 0.031470   Median :-0.007099
##  Mean   : 0.09653   Mean   : 0.004917   Mean   : 0.003898
##  3rd Qu.: 0.55300   3rd Qu.: 0.480672   3rd Qu.: 0.493787
##  Max.   : 2.20551   Max.   : 2.093563   Max.   : 2.583752
##       read           write          science              prog
##  Min.   :24.62   Min.   :20.07   Min.   :21.99   academic  :271
##  1st Qu.:44.13   1st Qu.:45.60   1st Qu.:45.32   general   :138
##  Median :51.86   Median :52.57   Median :51.37   vocational:191
##  Mean   :51.90   Mean   :52.38   Mean   :51.76
##  3rd Qu.:58.99   3rd Qu.:59.20   3rd Qu.:58.06
##  Max.   :80.59   Max.   :83.93   Max.   :80.37
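One thing to watch for: the summary above counts the levels of prog, which happens only when the text column was read in as a factor. Since R 4.0.0, read.csv() leaves text columns as character by default, so you may need to ask for factors explicitly. Here is a minimal sketch using a tiny invented CSV written to a temporary file; the file contents and object names are made up for illustration.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The rows are invented; only the column names echo the handout's data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("read,write,prog",
             "44.1,45.6,general",
             "51.9,52.6,academic",
             "59.0,59.2,vocational"), csv_path)

# Default call: in R >= 4.0.0 the prog column stays character.
dat_chr <- read.csv(csv_path, header = TRUE)

# Ask for factors explicitly to get level counts from summary().
dat_fct <- read.csv(csv_path, header = TRUE, stringsAsFactors = TRUE)

class(dat_chr$prog)
class(dat_fct$prog)
summary(dat_fct)
```

Note that the factor levels come out in alphabetical order, which is why academic is listed before general in the csv summary.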

4 Graphical User Interfaces

Instead of showing you a bunch of code, I want to show you two graphical interfaces that might be useful as you start modeling: Rcmdr and JGR. Both are free, and they offer different functionality. Let’s start with Rcmdr (pronounced “R Commander”). To learn more about R Commander and the plugins available, see http://www.rcommander.com/.

4.1 Rcmdr

To use the R Commander, you need to install the package and all of its dependencies.

install.packages('Rcmdr', dependencies=T)

For this to work on the Mac, you’ll also need a version of Quartz; I use XQuartz (https://www.xquartz.org/). Once that is installed, you can load the package with library(Rcmdr). That will activate a window that gives you menus to estimate statistical models and make graphs. It should look something like this:

Any data set that was active in R when you invoked Rcmdr will be available for you. You can choose the data set by clicking in the “Data Set” field in the upper left of the window near the R logo. That will generate a dialog box that will allow you to pick which data set you want to be active. Here are a couple of things to note:

• Models are specified with a formula: y ~ x + z would regress y on x and z additively. The formula y ~ x*z would regress y on x, z and the product of x and z.

• Generally, you have to specify the data file being used to estimate a model or make a graph.

• Once you’ve clicked through the menus and submitted the command, the code used to generate the result will be printed in the “R Script” window and the results will be presented in the “Output” window. You can type code directly into the “R Script” window and evaluate it using the “Submit” button.
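To see what the two formula styles do, here is a short sketch you could type into the “R Script” window or the plain R console. It uses the mtcars data that ships with R, with mpg, wt and hp standing in for y, x and z. The model object names are made up.

```r
# Additive model: mpg regressed on wt and hp.
fit_add <- lm(mpg ~ wt + hp, data = mtcars)

# Interaction model: wt * hp expands to wt + hp + wt:hp.
fit_int <- lm(mpg ~ wt * hp, data = mtcars)

# Compare which terms each formula produced.
names(coef(fit_add))
names(coef(fit_int))
```

The second model has one extra coefficient, labelled wt:hp, for the product term.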

To show you how the plugins work, suppose we wanted to install the KMggplot2 plugin. We would do the following in R:

install.packages('RcmdrPlugin.KMggplot2')
library(RcmdrPlugin.KMggplot2)

This will load the Rcmdr package and the KMggplot2 plugin. You can then work through some of the menus to see what the new options are.

4.2 Deducer

Deducer is a different graphical user interface for R. It has lots of options, but its biggest asset is how it allows you to make graphs. You can install JGR (the Java GUI for R) and Deducer with the following functions in R:

install.packages(c('JGR', 'Deducer', 'DeducerExtras'))
library(JGR)
JGR()

• If you’re a Mac person and you’re having trouble getting the Java VM to load, you can do the following in Terminal:

sudo ln -s $(/usr/libexec/java_home)/jre/lib/server/libjvm.dylib /usr/local/lib
sudo R CMD javareconf

• Then, do the following in R:

install.packages('rJava', type='source')

Running JGR() should bring up a window that looks like this:

Once it is open, click on the “Packages & Data” menu, then the “Package Manager”. That will bring up a dialog; make sure that the “loaded” box is checked for Deducer and DeducerExtras.

5 Saving & Writing

5.1 Where does R store things?

• Files you ask R to save are stored in R’s working directory. By default, this is your home directory (on the Mac mine is /Users/armstrod and on Windows it is C:\Users\armstrod\documents).

• If you invoke R from a different directory, that will be the default working directory.

• You can find out what R’s working directory is with:

getwd()

## [1] "/Users/armstrod/Dropbox/IntroR/UWMPsych"

• You can change the working directory with:

– Mac:

setwd("/Users/armstrod/Dropbox/IntroR")

– Windows:

setwd("C:/Users/armstrod/Dropbox/IntroR")

Note the forward slashes even in the Windows path. You could also use doubled backslashes: C:\\users\\armstrod\\Dropbox\\IntroR. For those of you who would prefer to browse to a directory, you could do that with:

– Mac:

library(tcltk) setwd(tk_choose.dir())

– Windows:

setwd(choose.dir())
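If you change the working directory inside a script, it is good practice to remember where you started and go back when you are done. The sketch below does that, using tempdir() as a stand-in for your own project folder. The object names and the data/mvreg.csv path are made up for illustration.

```r
# Remember where we started so the change can be undone.
old_wd <- getwd()

# tempdir() is used here because it is guaranteed to exist;
# substitute your own project folder.
setwd(tempdir())
moved <- normalizePath(getwd()) == normalizePath(tempdir())

# file.path() joins path pieces with forward slashes on every platform,
# which sidesteps the backslash issue on Windows.
data_file <- file.path("data", "mvreg.csv")
data_file

# Go back to the original directory.
setwd(old_wd)
```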

There are a number of different ways to save data from R. You can write it out to its own file readable by other software (e.g., .dta, .csv, .dbf), save a single dataset as an R dataset, or save the entire workspace (i.e., all the objects) so everything is available to you when you load the workspace again (.RData or .rda).

5.2 Writing

You can use the write functions to write data out of R.

• write.csv() will write out a comma-separated text file that can easily be read back into Excel, Stata, SPSS, etc.

• write.table() writes a text file that has whatever separator you like, but otherwise has similar options and functionality to write.csv().

• write.dta() writes a Stata .dta file of the dataset. The benefit here is that factors remain defined as variables with labels in Stata. Those attributes go away in the text files.

• There is no canned function to write out a completed SPSS dataset, but there are two auxiliary functions in the foreign package that allow users to write out a text data file and then an input syntax file that will read the data in and make the “right” variable and value labels.

– writeForeignStata() takes three arguments: the first is the R data frame you want to write out, the second is the name of a data file to which the data will be written, and the third is the name of a code file to which the code to input the data will be written.

– writeForeignSPSS() has the same arguments as the Stata version.

– Both are normally reached through write.foreign(), which takes the same three arguments plus package='Stata' or package='SPSS'.
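The write functions above can be sketched together on a small made-up data frame. Everything is written to a temporary directory, and the file names and the toy data are invented for illustration. The SPSS step goes through write.foreign() from the foreign package with package = "SPSS", which produces the text data file and the syntax file described above.

```r
library(foreign)  # write.dta() and write.foreign() live here

# A made-up data frame with one factor, for illustration only.
toy <- data.frame(
  score = c(44.1, 51.9, 59.0),
  prog = factor(c("general", "academic", "vocational"))
)

out_dir <- tempdir()  # write to a scratch directory

# Comma-separated text; row.names = FALSE avoids an extra index column.
csv_file <- file.path(out_dir, "toy.csv")
write.csv(toy, csv_file, row.names = FALSE)

# Same idea with a tab as the separator.
tab_file <- file.path(out_dir, "toy.txt")
write.table(toy, tab_file, sep = "\t", row.names = FALSE)

# Stata file: the factor is written with its value labels.
dta_file <- file.path(out_dir, "toy.dta")
write.dta(toy, dta_file)

# SPSS: a plain data file plus a syntax file that reads it in and
# applies the variable and value labels.
sps_data <- file.path(out_dir, "toy_spss.txt")
sps_code <- file.path(out_dir, "toy_spss.sps")
write.foreign(toy, datafile = sps_data, codefile = sps_code,
              package = "SPSS")

# Check the CSV round trip.
back <- read.csv(csv_file)
```

Opening the .sps file in a text editor shows the syntax SPSS would run to import the data.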

5.3 Saving

• You can save the entire R workspace with save.image() where the only argument needed is a filename (e.g., save.image('myWorkspace.RData')). This will allow you to load all objects in your workspace whenever you want. You can do this with load('myWorkspace.RData').

• You can save a single object or a small set of objects with save(). For example, save(spss.dat, stata.dat, file='myStuff.rda') would save just those two data frames in a file called myStuff.rda, which you could also get back into R with load().
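Here is a minimal sketch of the save() and load() round trip. The two data frames are made-up stand-ins for spss.dat and stata.dat, and the file goes to a temporary directory. Loading into a fresh environment is a design choice for this example, so that nothing in your workspace gets overwritten. A plain load() call would put the objects straight into the workspace.

```r
# Two made-up data frames standing in for spss.dat and stata.dat.
spss.dat <- data.frame(id = 1:3, read = c(44.1, 51.9, 59.0))
stata.dat <- data.frame(id = 1:3, write = c(45.6, 52.6, 59.2))

# Save just these two objects to one file (a temporary path here).
rda_file <- file.path(tempdir(), "myStuff.rda")
save(spss.dat, stata.dat, file = rda_file)

# load() restores objects under their original names and returns
# those names as a character vector.
restored <- new.env()
loaded_names <- load(rda_file, envir = restored)
loaded_names

# The restored copies live inside the new environment.
head(restored$spss.dat)
```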

6 Help!

There are lots of ways to get help for R. First, let me suggest a couple of books.

6.1 Books

• Kabacoff, Robert. 2014. R In Action, 2nd ed. Manning.

• Fox, John and Sanford Weisberg. 2011. An R Companion to Applied Regression, 2nd ed. Sage.

Both are wonderful books. Kabacoff’s is more of a “from scratch” book, providing some detail about the basics that John’s book doesn’t. Kabacoff also has a website called Quick-R (http://www.statmethods.net/) that has some nice examples and code that could prove useful. John’s has some introductory material, but is a bit more focused on regression than Kabacoff’s.

6.2 Web

There are also lots of internet resources.

Stack Overflow: The r tag at Stack Overflow marks questions relating to R (http://stackoverflow.com/questions/tagged/r). In the top-right corner of the page, there is a search bar that will allow you to search through “r”-tagged questions. So, here you can see if your question has already been asked and answered, or post a new question if it hasn’t.

Rseek: We talked about http://www.rseek.org for finding packages, but it is also useful for getting help if you need it.

UCLA IDRE: UCLA’s Institute for Digital Research and Education (http://www.ats.ucla.edu/stat/r/) has some nice tools for learning R, too.

R Mailing List: R has a number of targeted mailing lists (along with a general help list) that are intended to allow users to ask questions (http://www.r-project.org/mail.html). There are links to instructions and a posting guide, which you should follow to the best of your ability. Failure to follow these guidelines will likely result in you being excoriated by the people on the list. People who answer the questions are doing so in their free time, so make it as easy as possible for them to do that. In general, it is good practice, if you’re asking someone to help you, to provide them with the means to reproduce the problem.
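One common way to give people the means to reproduce a problem is to include a small piece of your data as R code, along with details of your R session. The sketch below shows one way to do that with base R functions. The data frame and object names are made up.

```r
# A small made-up data frame that shows the (hypothetical) problem.
small <- data.frame(
  x = c(1, 2, 3),
  g = factor(c("a", "b", "a"))
)

# dput() prints R code that recreates the object; paste that text
# into your question so others can rebuild the data.
dput(small)

# The same text can be captured and checked for a clean round trip.
recipe <- paste(deparse(small), collapse = "\n")
rebuilt <- eval(parse(text = recipe))
all.equal(small, rebuilt)

# Readers usually also want your R version and loaded packages.
info <- sessionInfo()
info
```

Keeping the example data as small as possible makes it much more likely that someone will take the time to run it.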
