Introduction to R

Tuan V. Nguyen
Genetics Epidemiology of Osteoporosis Lab
Garvan Institute of Medical Research

Garvan Institute Biostatistical Workshop
17 April 2014
© Tuan V. Nguyen

Introduction to R
• A brief history
• Installation
• Packages
• Essential grammar
• A session with R

Previously …
• Many statistical packages were/are available
• Popular packages include Systat, Minitab, Statistica, BMDP, S+, Gauss, Spida, JMP, SPSS, Stata, SAS and now R

R is gaining popularity
[Figure: Number of scholarly articles that reference each software by year. Source: Muenchen R. The popularity of data analysis software, r4stat.com/articles/popularity]
[Figure: Number of scholarly articles that reference each software by year, after removing the top two, SPSS and SAS. Source: Muenchen R. The popularity of data analysis software, r4stat.com/articles/popularity]

A brief history
• R is a “statistical and graphical programming language”
• Originated from S
– 1988 - S2: RA Becker, JM Chambers, A Wilks
– 1992 - S3: JM Chambers, TJ Hastie
– 1998 - S4: JM Chambers
• R was initially written by Ross Ihaka and Robert Gentleman (Univ of Auckland, New Zealand) in the 1990s
• From 1997: international “R-core”, 15 people

What can R do?
• It is a statistical language
• All models of statistical analysis
• Great for simulation work
• Programming (do you want to take a challenge?)

Why R?
• Open source – totally free!
• Developed by professional and academic statisticians
• Runs on Windows, Unix, MacOS
• Keeps up to date with methodological developments
• Speaks the language of experts (bioinformatics and statistics)
• Large user community

Installation
cran.r-project.org

Installation of R on Windows
• Select Windows
• Select “base”
• Run → OK → Next
• Then Finish – R icon on your desktop

[Figure: A screenshot of R]

RStudio
An “add-on” of R
http://rstudio.org

Introduction to RStudio
• An IDE (Integrated Development Environment) for R
• Provides some convenient functions for running R
• R also has a number of other IDEs:
– TinnR
– R Commander

R and RStudio
Can run R within RStudio (you don’t need to start R)
[Figure: RStudio window, with panes labelled Workspace: Variables; R console; Files; Packages]

“R is a real demonstration of the power of collaboration” (Ihaka)

Packages
• R = Base + Packages
• Base R includes basic functions for simple operations and analyses
• Packages are modules for specific analyses
• More than 6000 packages in R!

Common packages
Hmisc: Miscellaneous for data manipulation
car: Companion to regression analysis
foreign: For reading data from other software
tables: For tabulation of data
gmodels: Programming tools
ggplot2: Advanced graphics
sciplot: Scientific graphs
Zelig: “Everyone’s statistical software”
rms: Regression modeling strategies
survival: Survival analyses
epiR: Epidemiological analyses
epicalc: Epidemiological analyses
boot: Bootstrap analyses
cluster: Cluster analysis
psych: Psychometrics and descriptive statistics

Basic management of packages
• Installing new packages (try now!)
  install.packages(c("Hmisc", "rms", "tables", "foreign", "gmodels", "ggplot2", "sciplot", "Zelig", "car", "survival", "epiR", "epicalc", "boot", "cluster", "psych", "binom", "BMA", "ExactCIdiff", "lattice", "mgcv", "gam", "nlme", "quantreg"))
• To find out which packages you have installed
  library()

R Grammar: a quick introduction

Interacting with R
• Start up R
• Can use up/down arrow keys to retrieve command history
• Can use left/right keys to edit a command line
• Can use TAB to complete a command (very useful!)
• Multiple commands can be written on one line by using the “;” separator

Variable names
• Use letters, numbers, and the signs . and _ (a hyphen is not allowed in a name: R reads it as a minus sign)
• Assignment symbol: <- or =
• Names are case-sensitive: upper and lower case letters are distinct
  Genotype = 5; genotype <- 7; Geno.type = Genotype + genotype

Object-oriented language
R is an object-oriented language
• Function
• Vector
• Matrix
• Dataframe

Function
• R “commands” = functions
• A function has arguments
• Arguments include variables (names), parameters, options, etc.
• Example: fitting a linear regression model y = a + bx
  m1 = lm(y ~ x, data=test)
– m1: object name
– lm: function (lm = linear model)
– y, x: variables (arguments)
– test: dataset name (argument)

Vector
• Vectors are the basic building block in R
• Vector = a series of values
• Values can be numeric or character
  score = c(4,2,1,5)
  gender = c('F','M','F','M')
• c (concatenation) is used for direct data entry

Matrix
• Rectangular data → rows, columns
• A matrix can be a collection of vectors
  1 3 6 7
  3 4 7 9
  5 7 8 0

  v1 = c(1,3,5)
  v2 = c(3,4,7)
  v3 = c(6,7,8)
  v4 = c(7,9,0)
  m = cbind(v1,v2,v3,v4)
  m

Reference to matrix
• Row first, column later
• Flexible in R
  > m
       v1 v2 v3 v4
  [1,]  1  3  6  7
  [2,]  3  4  7  9
  [3,]  5  7  8  0
  > m[2,3]
  v3
   7
  > m[1,]
  v1 v2 v3 v4
   1  3  6  7
  > m[1:2,]
       v1 v2 v3 v4
  [1,]  1  3  6  7
  [2,]  3  4  7  9
  > m[,2:3]
       v2 v3
  [1,]  3  6
  [2,]  4  7
  [3,]  7  8
  > m[,3:4]*m[1,2]
       v3 v4
  [1,] 18 21
  [2,] 21 27
  [3,] 24  0

Dataframe
Dataset in R = “dataframe”: laid out like a matrix, but each column can have its own type
• Columns = fields = variables
• Rows = records = observations
  ID        Gender      Math      Reading
  1         F           5         8
  2         M           5         2
  3         F           7         3
  4         F           8         6
  (numeric) (character) (numeric) (numeric)

Reference to field/column in a dataframe
• Dataframe should be attached prior to analysis
• Reference to field: (dataframe name)$(field name)
• Example:
  v1 = c(1,3,5)
  v2 = c(3,4,7)
  v3 = c(6,7,8)
  v4 = c(7,9,0)
  dat = data.frame(v1, v2, v3, v4)
  attach(dat)
  dat$sum = dat$v1 + dat$v3
  sum1 = v1 + v3
  dat

The effect of $
Running the example above and printing dat gives:
  > dat
    v1 v2 v3 v4 sum
  1  1  3  6  7   7
  2  3  4  7  9  10
  3  5  7  8  0  13
There is NO sum1 in dat! The column sum was created with dat$, so it belongs to the dataframe; sum1 was created without it, so it is only a separate vector in the workspace.

Data coding in R
  id = c(1, 2, 3, 4, 5)
  gender = c("male", "female", "male", "female", "female")
  dat = data.frame(id, gender)
We want to create a new variable called sex with numeric values (1, 2)
  dat$sex[gender=="male"] <- 1
  dat$sex[gender=="female"] <- 2

Character and numeric coding
Character to numeric
  X = c("1", "2", "3", "4", "5")
We want to create a new variable called Y with numeric values (for calculation)
  Y = as.numeric(X)
  mean(Y)
Numeric to character
  Y = 1:10
We want to create a new variable called X with character values
  X = as.character(Y)

Sorting data: sort()
  X = rnorm(10); X
   [1]  1.5651300 -0.5382971 -0.1995302  1.0111098  0.3590144 -1.5245237
   [7] -0.3192534  0.1323256 -0.7916954 -0.0664167
  sort(X)
   [1] -1.5245237 -0.7916954 -0.5382971 -0.3192534 -0.1995302 -0.0664167
   [7]  0.1323256  0.3590144  1.0111098  1.5651300

Merging datasets
  id = c(1,2,3,4)
  sex = c("M","F","M","F")
  dat1 = data.frame(id,sex)

  id = c(1,2,3,4,5)
  age = c(21,34,45,32,18)
  dat2 = data.frame(id,age)

  dat = merge(dat1, dat2, by="id")
  dat = merge(dat1, dat2, by="id", all.x=T, all.y=T)

An R Session (demo)

To work with R …
• R, like most statistical programs, works on observations (rows) and variables
• You should keep in mind
– Name of dataframe
– Name of variables

Allison and Cicchetti’s study
Truett Allison; Domenic V. Cicchetti. Sleep in Mammals: Ecological and Constitutional Correlates. Science 1976; 194:732-734.

R Session
• Reading a file into R for analysis
  Filename: allison.csv
• Some graphical analyses
• Some descriptive (and not so descriptive) analyses
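Looking back at the merging slide, the two merge() calls differ only in which rows they keep. The following is a minimal sketch using the same toy data; the object names inner and outer are introduced here for illustration and are not part of the original slides.

```r
# Same toy data as on the merging slide
dat1 <- data.frame(id = c(1, 2, 3, 4), sex = c("M", "F", "M", "F"))
dat2 <- data.frame(id = c(1, 2, 3, 4, 5), age = c(21, 34, 45, 32, 18))

# Default merge: keeps only the ids that appear in both dataframes
inner <- merge(dat1, dat2, by = "id")

# all.x and all.y set to TRUE: keeps every id found in either dataframe
outer <- merge(dat1, dat2, by = "id", all.x = TRUE, all.y = TRUE)

nrow(inner)              # 4: id 5 is missing from dat1, so it is dropped
nrow(outer)              # 5: id 5 is kept
outer[outer$id == 5, ]   # sex is NA for id 5, age is 18
```

When a record exists in only one dataset, the second form fills the missing fields with NA rather than silently dropping the record, which is usually easier to notice during checking.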
Species                 BodyWt  BrainWt  NonDreaming  Dreaming  TotalSleep  LifeSpan  Gestation  Predation  Exposure  Danger
Africanelephant         6654    5712     NA           NA        3.3         38.6      645        3          5         3
Africangiantpouchedrat  1       6.6      6.3          2         8.3         4.5       42         3          1         3
ArcticFox               3.385   44.5     NA           NA        12.5        14        60         1          1         1
Arcticgroundsquirrel    0.92    5.7      NA           NA        16.5        NA        25         5          2         3
Asianelephant           2547    4603     2.1          1.8       3.9         69        624        3          5         4
Baboon                  10.55   179.5    9.1          0.7       9.8         27        180        4          4         4
Bigbrownbat             0.023   0.3      15.8         3.9       19.7        19        35         1          1         1
Braziliantapir          160     169      5.2          1         6.2         30.4      392        4          5         4
Cat                     3.3     25.6     10.9         3.6       14.5        28        63         1          2         1
Chimpanzee              52.16   440      8.3          1.4       9.7         50        230        1          1         1
Chinchilla              0.425   6.4      11           1.5       12.5        7         112        5          4         4
Cow                     465     423      3.2          0.7       3.9         30        281        5          5         5
Deserthedgehog          0.55    2.4      7.6          2.7       10.3        NA        NA         2          1         2
Donkey                  187.1   419      NA           NA        3.1         40        365        5          5         5
EasternAmericanmole     0.075   1.2      6.3          2.1       8.4         3.5       42         1          1         1

Reading file csv
• Locate your folder and filename
• Use the function read.csv
• On a Mac, you can simply drag the file onto the R command line
  dat = read.csv("~/Dropbox/Garvan Lectures 2014/Datasets and Teaching Materials/allison.csv", header=T, na.strings="NA")

Reading file through file.choose()
  f = file.choose()                             # find the file
  dat = read.csv(f, header=T, na.strings="NA")
  attach(dat)                                   # attach the data before analysis
  names(dat)                                    # want to know variable names
  dim(dat)                                      # how many rows and columns?
  summary(dat)                                  # summarize data

Summary: an overall “picture”
  > summary(dat)
  Species BodyWt BrainWt
  Africanelephant : 1 Min.
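The reading steps above can be tried without the workshop file. This is a minimal sketch, assuming a small temporary CSV that stands in for allison.csv: it holds four rows and five columns copied from the data table, and the names tmp and mammals are hypothetical. It also uses the mammals$LifeSpan form rather than attach().

```r
# Write a few rows of the sleep data to a temporary file so the example
# is self-contained (this file is a stand-in for allison.csv)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Species,BodyWt,BrainWt,TotalSleep,LifeSpan",
  "Africanelephant,6654,5712,3.3,38.6",
  "Arcticgroundsquirrel,0.92,5.7,16.5,NA",
  "Baboon,10.55,179.5,9.8,27",
  "Deserthedgehog,0.55,2.4,10.3,NA"
), tmp)

# na.strings tells read.csv which text marks a missing value
mammals <- read.csv(tmp, header = TRUE, na.strings = "NA")

names(mammals)                          # variable names
dim(mammals)                            # 4 rows, 5 columns
sum(is.na(mammals$LifeSpan))            # 2 missing life spans
mean(mammals$LifeSpan)                  # NA, because of the missing values
mean(mammals$LifeSpan, na.rm = TRUE)    # 32.8, the mean of 38.6 and 27
```

Most summary functions return NA when a variable contains missing values, so it is worth counting the NAs in each variable before analysis and passing na.rm = TRUE where dropping them is appropriate.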
