Introduction to R
Dave Armstrong
University of Wisconsin-Milwaukee
Department of Political Science
e: [email protected]
w: http://www.quantoid.net/teachuw/uwmpsych

Contents
1 Introduction
2 What You Need
3 Reading in Data from Other Programs
    3.1 SAS
    3.2 Stata
    3.3 SPSS
    3.4 Excel (csv)
4 Graphical User Interfaces
    4.1 Rcmdr
    4.2 Deducer
5 Saving & Writing
    5.1 Where does R store things?
    5.2 Writing
    5.3 Saving
6 Help!
    6.1 Books
    6.2 Web

1 Introduction

Rather than slides, I have decided to distribute handouts, which have room for more prose than slides would permit. The idea is to give you a slightly more comprehensive reference than slides would provide when you return home.

If you're reading this, you want to learn R, either of your own accord or under duress. Here are some of the reasons that I use R:

• It's open source (that means FREE!).
• Rapid development in statistical routines/capabilities.
• Great graphs (including interactive and 3D displays) without (as much) hassle.
• Multiple datasets open at once (I know, SAS users will wonder why this is such a big deal).
• Save the entire workspace, including multiple datasets, all models, etc.
• Easily programmable/customizable; easily see the contents (guts) of any function.
• Easy integration with LaTeX (jump on the reproducible research bandwagon).

2 What You Need

Things you'll need to do what we're doing in the workshop:

• R (http://cran.r-project.org). On Windows, choose to install both the 32-bit and 64-bit architectures if you have the option.
• Java Runtime Environment: http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html
    - If you have a choice between the 64-bit version and the 32-bit version, you need to get the one that matches the version of R you are using.
      You can open R, type "version" at the command prompt, and hit Enter; you should see one of the two outputs shown in Figure 1.
• You may also want to have RStudio, an integrated development environment (IDE) for R (https://www.rstudio.com/products/rstudio/download/). The free RStudio desktop should be sufficient.

Figure 1: R Version Architecture. (a) 32-bit (b) 64-bit

3 Reading in Data from Other Programs

R makes it easy to read in data from other software, particularly Excel, SPSS, SAS and Stata. To do this, you'll need to use a different "package". Packages are collections of functions that are bundled together and that you can download. In R, you can install packages from the Comprehensive R Archive Network by doing:

    install.packages('package.name')

3.1 SAS

For example, if you wanted to download the package that allows you to read .sas7bdat files, you would do:

    install.packages('sas7bdat')

You only have to install a package once. After that, whenever you are in R and want to use the functions in a package, you have to load it with the library() function.

    library(sas7bdat)

If you want to see what functions are in the package, you can type:

    help(package='sas7bdat')

To read in a dataset, you could do:

    dat <- read.sas7bdat('mvreg.sas7bdat')
    summary(dat)

    ##  LOCUS_OF_CONTROL    SELF_CONCEPT            READ           WRITE
    ##  Min.   :-1.99596   Min.   :-2.532750   Min.   :24.62   Min.   :20.07
    ##  1st Qu.:-0.34959   1st Qu.:-0.482128   1st Qu.:44.13   1st Qu.:45.60
    ##  Median : 0.08099   Median : 0.031470   Median :51.86   Median :52.57
    ##  Mean   : 0.09653   Mean   : 0.004917   Mean   :51.90   Mean   :52.38
    ##  3rd Qu.: 0.55300   3rd Qu.: 0.480672   3rd Qu.:58.99   3rd Qu.:59.20
    ##  Max.   : 2.20551   Max.   : 2.093563   Max.   :80.59   Max.   :83.93
    ##     SCIENCE        MOTIVATION             PROG
    ##  Min.   :21.99   Min.   :-2.746669   Min.   :1.000
    ##  1st Qu.:45.32   1st Qu.:-0.551408   1st Qu.:2.000
    ##  Median :51.37   Median :-0.007099   Median :2.000
    ##  Mean   :51.76   Mean   : 0.003898   Mean   :2.088
    ##  3rd Qu.:58.06   3rd Qu.: 0.493787   3rd Qu.:3.000
    ##  Max.   :80.37   Max.   : 2.583752   Max.   :3.000

Now the object dat holds the data you just read in.

3.2 Stata

To read Stata files, there are a couple of different options. The read.dta function in the foreign package reads Stata datasets saved in formats earlier than Stata 13.

    library(foreign)
    dat <- read.dta('mvreg_12.dta')
    summary(dat)

    ##  locus_of_control    self_concept         motivation
    ##  Min.   :-1.99596   Min.   :-2.532750   Min.   :-2.746669
    ##  1st Qu.:-0.34959   1st Qu.:-0.482128   1st Qu.:-0.551408
    ##  Median : 0.08099   Median : 0.031470   Median :-0.007099
    ##  Mean   : 0.09653   Mean   : 0.004917   Mean   : 0.003898
    ##  3rd Qu.: 0.55300   3rd Qu.: 0.480672   3rd Qu.: 0.493787
    ##  Max.   : 2.20551   Max.   : 2.093563   Max.   : 2.583752
    ##       read           write          science              prog
    ##  Min.   :24.62   Min.   :20.07   Min.   :21.99   general   :138
    ##  1st Qu.:44.13   1st Qu.:45.60   1st Qu.:45.32   academic  :271
    ##  Median :51.86   Median :52.57   Median :51.37   vocational:191
    ##  Mean   :51.90   Mean   :52.38   Mean   :51.76
    ##  3rd Qu.:58.99   3rd Qu.:59.20   3rd Qu.:58.06
    ##  Max.   :80.59   Max.   :83.93   Max.   :80.37

To read Stata files from version 13 or later, you can use the read.dta13 function in the readstata13 package. First, you have to install the package:

    install.packages('readstata13')

Then, you can load the package and use it to read in the data.

    library(readstata13)
    dat <- read.dta13('mvreg_14.dta', nonint.factors=T)
    summary(dat)

    ##  locus_of_control    self_concept         motivation
    ##  Min.   :-1.99596   Min.   :-2.532750   Min.   :-2.746669
    ##  1st Qu.:-0.34959   1st Qu.:-0.482128   1st Qu.:-0.551408
    ##  Median : 0.08099   Median : 0.031470   Median :-0.007099
    ##  Mean   : 0.09653   Mean   : 0.004917   Mean   : 0.003898
    ##  3rd Qu.: 0.55300   3rd Qu.: 0.480672   3rd Qu.: 0.493787
    ##  Max.   : 2.20551   Max.   : 2.093563   Max.   : 2.583752
    ##       read           write          science              prog
    ##  Min.   :24.62   Min.   :20.07   Min.   :21.99   general   :138
    ##  1st Qu.:44.13   1st Qu.:45.60   1st Qu.:45.32   academic  :271
    ##  Median :51.86   Median :52.57   Median :51.37   vocational:191
    ##  Mean   :51.90   Mean   :52.38   Mean   :51.76
    ##  3rd Qu.:58.99   3rd Qu.:59.20   3rd Qu.:58.06
    ##  Max.   :80.59   Max.   :83.93   Max.   :80.37

3.3 SPSS

The read.spss function in the foreign package reads all versions of SPSS files, both .sav and .por types.

    library(foreign)
    dat <- read.spss('mvreg.sav', to.data.frame=T, use.value.labels=T)

    ## Warning in read.spss("mvreg.sav", to.data.frame = T, use.value.labels = T):
    ## mvreg.sav: Unrecognized record type 7, subtype 18 encountered in system file

    summary(dat)

    ##  locus_of_control    self_concept         motivation
    ##  Min.   :-1.99596   Min.   :-2.532750   Min.   :-2.746669
    ##  1st Qu.:-0.34959   1st Qu.:-0.482128   1st Qu.:-0.551408
    ##  Median : 0.08099   Median : 0.031470   Median :-0.007099
    ##  Mean   : 0.09653   Mean   : 0.004917   Mean   : 0.003898
    ##  3rd Qu.: 0.55300   3rd Qu.: 0.480672   3rd Qu.: 0.493787
    ##  Max.   : 2.20551   Max.   : 2.093563   Max.   : 2.583752
    ##       read           write          science              prog
    ##  Min.   :24.62   Min.   :20.07   Min.   :21.99   general   :138
    ##  1st Qu.:44.13   1st Qu.:45.60   1st Qu.:45.32   academic  :271
    ##  Median :51.86   Median :52.57   Median :51.37   vocational:191
    ##  Mean   :51.90   Mean   :52.38   Mean   :51.76
    ##  3rd Qu.:58.99   3rd Qu.:59.20   3rd Qu.:58.06
    ##  Max.   :80.59   Max.   :83.93   Max.   :80.37

3.4 Excel (csv)

There is a read.csv function that is always available (it is not in a package you have to load) that will read .csv files.

    dat <- read.csv('mvreg.csv', header=T)
    summary(dat)

    ##  locus_of_control    self_concept         motivation
    ##  Min.   :-1.99596   Min.   :-2.532750   Min.   :-2.746669
    ##  1st Qu.:-0.34959   1st Qu.:-0.482128   1st Qu.:-0.551408
    ##  Median : 0.08099   Median : 0.031470   Median :-0.007099
    ##  Mean   : 0.09653   Mean   : 0.004917   Mean   : 0.003898
    ##  3rd Qu.: 0.55300   3rd Qu.: 0.480672   3rd Qu.: 0.493787
    ##  Max.   : 2.20551   Max.   : 2.093563   Max.   : 2.583752
    ##       read           write          science              prog
    ##  Min.   :24.62   Min.   :20.07   Min.   :21.99   academic  :271
    ##  1st Qu.:44.13   1st Qu.:45.60   1st Qu.:45.32   general   :138
    ##  Median :51.86   Median :52.57   Median :51.37   vocational:191
    ##  Mean   :51.90   Mean   :52.38   Mean   :51.76
    ##  3rd Qu.:58.99   3rd Qu.:59.20   3rd Qu.:58.06
    ##  Max.   :80.59   Max.   :83.93   Max.   :80.37

4 Graphical User Interfaces

Instead of showing you a bunch of code, I want to show you two graphical interfaces that might be useful as you start modeling: Rcmdr and JGR. Both of these are free and have different functionality. Let's start with Rcmdr (pronounced R Commander). To learn more about R Commander and the plugins available, see http://www.rcommander.com/.

4.1 Rcmdr

To use R Commander, you need to install the package and all of its dependencies.

    install.packages('Rcmdr', dependencies=T)

For this to work on the Mac, you'll also need a version of Quartz; I use XQuartz (https://www.xquartz.org/). Once that is installed, you can load the package. That will open a window that gives you menus to estimate statistical models and make graphs. It should look something like this:

[Figure: the R Commander window]

Any data set that was active in R when you invoked Rcmdr will be available for you.
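
The load step described above can be sketched as follows. This is a minimal illustration, assuming Rcmdr was installed as shown; the helper name rcmdr_available is hypothetical and is not part of Rcmdr or of this handout. It checks that the package can be found before attaching it, so that a missing installation produces a hint rather than an error.

```r
# Hypothetical helper (not part of Rcmdr): TRUE if the Rcmdr package
# is installed and its namespace can be loaded, FALSE otherwise.
rcmdr_available <- function() {
  requireNamespace("Rcmdr", quietly = TRUE)
}

if (rcmdr_available()) {
  # Attaching the package is what opens the R Commander window
  # when R is running as an interactive session.
  library(Rcmdr)
} else {
  message("Rcmdr is not installed; run ",
          "install.packages('Rcmdr', dependencies = TRUE) first.")
}
```

If you read a dataset into R before running this, it should then be available to choose as the active data set inside R Commander.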