R Course for the NSOs in the Arab countries
Part I: Introduction
Valentin Todorov
United Nations Industrial Development Organization, Vienna
18-20 May 2015
Outline
1. About R
2. The advantages and disadvantages of R
3. What can I do with R?
4. Where do I get R?
5. Installing R
6. Navigating the system
7. R packages
8. R learning resources
9. Summary
About R

What is R?
• R is a language and environment for statistical computing and graphics.
• R is based on the S language, originally developed by John Chambers and colleagues at AT&T Bell Labs in the late 1970s and early 1980s.
• R (sometimes called "GNU S") is free, open-source software licensed under the GNU General Public License (GPL, version 2 or 3).
• R was created by Robert Gentleman and Ross Ihaka at the University of Auckland as a test bed for trying out some ideas in statistical computing.
• The official name of the project is The R Project for Statistical Computing: http://www.r-project.org
The R project
• The R Project is an international collaboration of researchers in statistical computing.
• There are roughly 20 members of the "R Core Team", who maintain and enhance R.
• Releases of the R environment are made through CRAN (the Comprehensive R Archive Network) twice per year.
• The software is released under a "free software" license, which makes it possible for anyone to download and use it.
• Over 6000 extension packages have been contributed to CRAN.
Popularity of R
• The number of R users has been increasing exponentially since 1996.
• Google, Facebook, Pfizer, Merck, Bank of America, ... (a long list) use R in production.
• Newly developed methods and algorithms are almost always available only in R.
• Most universities teach statistics with R.
• R leads in many indexes and rankings (e.g. TIOBE, online help, downloads, number of projects).
[Figure: popularity of R; source: http://r4stats.com/articles/popularity/]
But what about official statistics?
The advantages and disadvantages of R

The advantages of R
R is a powerful and free statistical environment and programming language for data management, statistical computing and graphics, with the following features:
- R is the most comprehensive statistical analysis package, incorporating all of the standard statistical tests, models, and analyses.
- Outstanding graphical capabilities: R provides a fully programmable graphics language that surpasses most other statistical and graphical packages.
- A comprehensive and efficient programming language (although with some flaws and weird features).
- Efficient matrix manipulation (implemented in C or Fortran) built into the language.
- An object-oriented programming language.
The advantages of R (continued)
- High-performance computing, with interfaces for native code and support for parallel and grid computing.
- R plays well with many other tools: it can import and export data from and to CSV files, SAS, SPSS, Microsoft Excel, Microsoft Access, Oracle, MySQL, and SQLite.
- R can produce graphics output in PDF, JPG, PNG, and SVG formats, and table output for LaTeX and HTML.
- The extensive feature set of R can be extended by installing additional packages; a great variety of packages is available for statistical analysis in finance, the environment, and different areas of the life sciences.
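The matrix and import/export features listed above can be tried with base R alone. Below is a minimal sketch using a small made-up 2 x 2 system and a made-up data frame. It solves a linear system, writes a CSV file and reads it back, and writes a plot to a PDF file in a temporary directory. Other formats (SAS, SPSS, Excel, databases) need extra packages and are not shown.

```r
# Matrix manipulation: solve the linear system A x = b
A <- matrix(c(2, 1, 1, 3), nrow = 2)  # filled by column: rows are (2, 1) and (1, 3)
b <- c(3, 5)
x <- solve(A, b)
x                  # [1] 0.8 1.4
A %*% x            # reproduces b, as a 2 x 1 matrix
crossprod(A)       # t(A) %*% A

# Export a data frame to CSV and read it back
df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.7))
csv_file <- tempfile(fileext = ".csv")
write.csv(df, csv_file, row.names = FALSE)
back <- read.csv(csv_file)

# Graphics output to a PDF file; png(), jpeg() and svg() are used the same way,
# depending on the graphics capabilities of your R build
pdf_file <- file.path(tempdir(), "scatter.pdf")
pdf(pdf_file)
plot(df$id, df$value, type = "b", xlab = "id", ylab = "value")
dev.off()
```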
Why is R so popular?
According to Chambers (2009):
1. R provides an interface to computational procedures of many kinds;
2. R is interactive, hands-on in real time;
3. R is functional in its model of programming;
4. R is object-oriented: "everything is an object";
5. R is modular, built from standardized pieces; and
6. R is collaborative, a world-wide, open-source effort.
Venables (2013) adds three more items:
7. R is extensible: it may be augmented by compiled code in other languages;
8. R is cross-platform; and
9. R is international.

The disadvantages of R
• Steep learning curve (so people say)
• Documentation is patchy and terse
• No commercial support
• Well, let us leave this for another tutorial...
What can I do with R?
1. Data manipulation
  - Data input from the keyboard,
  - ... from text files, XML, SDMX,
  - ... from spreadsheets,
  - ... from other statistical software,
  - ... from relational databases,
  - ... from the web.
  - Merging, combining, and sub-setting data sets
  - Character, date and time processing
2. Statistical analysis
  - Descriptive statistics: mean, variance, density, etc.
  - Linear models
  - Multivariate analysis
  - Time series analysis
  - Robust statistics
  - Many more ...
3. Statistical graphics
  - Graphics for exploratory data analysis
  - ... line plot, bar plot, scatter plot, box plot, density plot
  - Predefined plots for particular models
  - Flexible and powerful options
  - Publication-quality graphics
  - Save as image files
  - Include in dynamically generated reports
4. Writing your own functions
  - Change existing functions (all sources are available)
  - Create a new function for your needs
  - Collect all your functions in a package
  - Contribute a package to the community
5. Official statistics
  - All of the previous items 1) to 4)
  - Specify complex survey designs
  - Various sampling algorithms
  - Editing and visual inspection of microdata
  - Imputation of missing values
  - Statistical disclosure control
  - Seasonal adjustment
  - Statistical matching and record linkage
  - Small area simulation
  - Compilation of indices and indicators
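Items 1 to 4 above can be tried with base R alone, before turning to specialised packages. Below is a minimal sketch. The sales figures and the manager names are invented for illustration, and the function `cv()` is a hypothetical example of a user-defined function. The sketch reads CSV text, then merges, subsets and processes dates. It also computes descriptive statistics, saves a box plot to a file and defines one small function.

```r
# 1. Data input: a small CSV given as text (read.csv("file.csv") works the same way)
sales <- read.csv(text = "region,date,value
North,2015-01-15,120
South,2015-01-20,95
North,2015-02-10,130
South,2015-02-18,105")
managers <- data.frame(region = c("North", "South"),
                       manager = c("Ahmed", "Sara"))  # invented data

# Merging, sub-setting, date processing
d <- merge(sales, managers, by = "region")
d$date <- as.Date(d$date)
d$month <- format(d$date, "%Y-%m")
north <- subset(d, region == "North")

# 2. Descriptive statistics
mean(d$value)                                  # [1] 112.5
var(d$value)
agg <- aggregate(value ~ region, data = d, FUN = mean)
agg                                            # North 125, South 100

# 3. Statistical graphics, saved to a PDF file
plot_file <- file.path(tempdir(), "boxplot.pdf")
pdf(plot_file)
boxplot(value ~ region, data = d, main = "Value by region")
dev.off()

# 4. Writing your own function: coefficient of variation (hypothetical helper)
cv <- function(x) sd(x) / mean(x)
cv(d$value)
```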
Where do I get R?
• There are versions for Unix, Windows, and Mac OS; all of them are free.
• Download R from http://cran.r-project.org/ and follow the instructions (Enter, Enter, Enter, ...).
• Start R and install packages with
  install.packages(c("XLConnect", "RSQLite"))
  and keep the installed packages up to date with
  update.packages()
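When the same script runs on several machines, it helps to install only the packages that are still missing. The helper `install_if_missing()` below is a hypothetical name and is not part of R. It is a minimal sketch that compares the requested names with `installed.packages()` and calls `install.packages()` only for the rest.

```r
# Hypothetical helper (not part of base R): install only the missing packages
install_if_missing <- function(pkgs) {
  # Names of all packages found in the current library paths
  have <- rownames(installed.packages())
  missing <- setdiff(pkgs, have)
  if (length(missing) > 0) {
    install.packages(missing)  # downloads from the repository set in options("repos")
  }
  invisible(missing)           # names of the packages that had to be installed
}

# 'stats' and 'utils' ship with R, so nothing is downloaded here; for the course
# packages one would call install_if_missing(c("XLConnect", "RSQLite"))
done <- install_if_missing(c("stats", "utils"))
done   # character(0)
```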
Additional tools
There are good editors (IDEs) for R which offer:
• Syntax highlighting and code completion
• SVN integration
• Interfaces to and interaction with other software
• Automatic connection to servers
See, for example, Eclipse + StatET, or RStudio.
RStudio
• A complete open-source IDE for R
• Object explorer
• Graphics window in the main IDE
• Java based
• Platform independent
http://support.rstudio.org
Highly recommended
R-WinEdt
• Based on WinEdt, an excellent shareware editor
• Support for LaTeX, Sweave and knitr
• Syntax highlighting for R
• Windows only
http://www.winedt.com
This is what I use.
Installing R

Navigating the system

Navigating the R system
• The standard R interface
• Simply type commands in the console:
  > pi
  [1] 3.141593
  > 2 + 5^2
  [1] 27
  > cos(3/4*pi)
  [1] -0.7071068
• Is it easier to do regression in R or in Excel?
• To repeat a command, press the Up Arrow key.
• To interrupt a command, press ESC (on Windows).
• Save your commands in a script file; use the built-in editor.
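To give one possible answer to the regression question, below is a minimal sketch. It collects the console examples in a script and fits a simple linear regression on the `cars` data set that ships with R (stopping distance against speed). The choice of data set is only for illustration.

```r
# The console examples, collected in a script
pi              # [1] 3.141593
2 + 5^2         # [1] 27
cos(3/4 * pi)   # [1] -0.7071068

# Linear regression of stopping distance on speed ('cars' ships with R)
fit <- lm(dist ~ speed, data = cars)
coef(fit)       # intercept about -17.58, slope about 3.93
summary(fit)    # coefficients, standard errors, R-squared, ...
```

One line of model formula replaces the manual set-up of a regression in a spreadsheet. `summary()` prints the usual inference table.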
• The built-in editor
• Start the built-in editor with the menu File - New script.
• Type any commands you would like, or copy and paste them from somewhere else.
• To run a line, select the line and press Ctrl-R.
• To run a block of code, select the block and press Ctrl-R.
• To run the whole script, press Ctrl-A to select everything and then press Ctrl-R.
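A script saved from the editor can also be run as a whole, without the menus, with `source()`. Below is a minimal sketch. It writes a throwaway script to a temporary file, which stands in for a file you would save from the editor, for example "myscript.R" (a hypothetical name). It then runs the script with each command echoed, much like selecting everything and pressing Ctrl-R.

```r
# Write a small script to a temporary file (stands in for a saved script)
script <- tempfile(fileext = ".R")
writeLines(c("x <- 2 + 5^2",
             "y <- sqrt(x - 2)",
             "y"), script)

# Run the whole script, echoing each command and its output
res <- source(script, echo = TRUE)
res$value   # value of the last expression in the script: [1] 5
```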
• Exiting from R:
  - To interrupt a running command, press ESC
  - To exit from R, type q()
  - or use the menu File - Exit,
  - or simply close the window.
R packages

What are R packages?
• Packages are standardized units for extending R (Hornik, 2004).
• They provide an easy, transparent and cross-platform extension of the R base system.
• The R distribution itself contains about 30 packages.
• As a minimum, a package must provide the following information to the core R system:
  - name and version;
  - license, description and title;
  - author and maintainer.
• A package must be installed, for example with the R command install.packages().
• Before a package can be used, it has to be loaded into the system with the library() command.
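The minimum information a package must provide is stored in its DESCRIPTION file, and `packageDescription()` reads it. Below is a minimal sketch that uses the `stats` package. That package ships with R, so nothing has to be installed first. For a contributed package, one would first call `install.packages("packagename")`, where `packagename` is a placeholder.

```r
# Read selected DESCRIPTION fields of an installed package
desc <- packageDescription("stats",
                           fields = c("Package", "Version", "License", "Title"))
desc

# Attach the package (stats is already attached in a default session)
library(stats)
"package:stats" %in% search()   # TRUE once the package is attached
```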
Why R packages?
• Accessible functions and data
  - Convenient means for code storage and version control
  - Functions, data and other objects can easily be made available for use (loaded) with a single library(myPackage) command
  - Facilitates access to native code (C/C++/FORTRAN)
  - Sharing code with others
  - Using a package makes sense even for personal use
• Reliable and maintainable code
  - Facilitates code development (more disciplined software development), particularly in collaborative projects
  - Better design of the functions
  - Fewer bugs, and they are easier to fix
  - More reliable code
  - Maintainable code
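Putting personal functions into a package takes less work than it may seem. `package.skeleton()` from the utils package writes the standard directory layout. Below is a minimal sketch. The function `cv()` and the package name `myPackage` are hypothetical, and the skeleton goes into a temporary directory. The generated DESCRIPTION and help pages still contain placeholders that must be edited before the package passes `R CMD check`.

```r
# A hypothetical personal function: coefficient of variation
cv <- function(x) sd(x) / mean(x)

# Create the skeleton of a package called "myPackage" in a temporary directory
pkg_path <- tempdir()
utils::package.skeleton(name = "myPackage", list = "cv",
                        environment = environment(),
                        path = pkg_path, force = TRUE)

# DESCRIPTION, NAMESPACE, R/ (code) and man/ (help pages)
list.files(file.path(pkg_path, "myPackage"), recursive = TRUE)

# After editing, the package is built and installed from a shell with
#   R CMD build myPackage
#   R CMD INSTALL myPackage_1.0.tar.gz
```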
Basic terms related to R packages
• Package: A set of code, example data and documentation in a standard form, used to extend base R
• Library: A directory containing installed packages
• Repository: A formalized web site providing packages for installation
• Source: The source version of a package, containing the R source code, data, documentation and other components in their original form
• Binary: A compiled version of a package, suitable for use only on a particular platform (e.g. Windows, Mac OS)
• Base packages: Packages maintained by the R core development team, distributed and installed as part of the R software
• Recommended packages: Packages distributed with the main R software, but not necessarily maintained by the R core development team
• Contributed packages: All other packages; most of them can be downloaded and installed from the CRAN repository.

CRAN
[Figure: packages on CRAN, as of 11.03.2015]
CRAN: Top 10 packages by popularity
[Figure: top 10 CRAN packages by popularity; source: http://www.r-statistics.com/2013/06/top-100-r-packages-for-2013-jan-may/]
CRAN: How often are my packages downloaded?
[Figure: weekly downloads (0 to 3000) of the packages robustbase, rrcov and spls, weeks 2014-35 to 2015-08; see http://www.nicebread.de/finally-tracking-cran-packages-downloads/]
Using R packages
• Which packages are currently loaded (the search path): use the function search()
• Which packages are currently installed:
  - library() without arguments
  - installed.packages() returns a matrix with one row per package. Its Priority column gives the type of package: base, recommended or NA.
• Information about a package:
  - packageDescription("MASS")
  - library(help = "MASS")
  - help(package = "MASS")
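The Priority column mentioned above gives a quick overview of a local installation. Below is a minimal sketch. It shows the search path, counts the installed packages by type, where NA corresponds to contributed packages, and lists the base packages through the `priority` argument of `installed.packages()`. The counts depend on the machine.

```r
# Attached packages and environments (the search path)
search()

# One row per installed package; count them by type
ip <- installed.packages()
table(ip[, "Priority"], useNA = "ifany")   # base / recommended / <NA> (contributed)

# Only the base packages that come with R
base_pkgs <- rownames(installed.packages(priority = "base"))
base_pkgs
```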
Using R packages II
• Use the functions in a package:
  - library(packagename) or
  - require(packagename)
• List the available packages in a repository:
  - available.packages()
• Install and update packages:
  - install.packages("packagename")
  - old.packages() (installed packages for which a newer version is available)
  - update.packages()
• Package vignettes: use the function vignette() to list all available vignettes or to view a particular vignette.
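`library()` and `require()` both attach a package, but they react differently when the package is missing. `library()` stops with an error. `require()` returns `FALSE` with a warning, so it can be used inside `if ()`. Below is a minimal sketch. `noSuchPackage123` is a deliberately non-existent, hypothetical package name.

```r
pkg <- "noSuchPackage123"   # hypothetical, deliberately not installed

# require() returns TRUE/FALSE instead of stopping
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
ok   # FALSE

# library() signals an error, caught here so that the script continues
msg <- tryCatch(library(pkg, character.only = TRUE),
                error = function(e) conditionMessage(e))
msg  # message saying that there is no such package

if (!ok) message("Package ", pkg, " is not installed")
```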
Using R packages III
When do we search for packages?
• We need certain functionality for our work, or
• We have developed a method or algorithm, implemented it in R, and want to know whether it is worth publishing on CRAN.
Using R packages IV
How do we find packages?
• Ask Google, but do not expect a precise answer.
• Ask a question on R-Help (do not expect a polite answer), or on Stack Overflow.
• Go to the CRAN Task Views:
  > library(ctv)
  > views <- available.views()
  > unlist(lapply(views, function(x) x[[1]]))
• Use the new R package sos, for example:
  > library(sos)
  > findFn("robust+multivariate")
  The results are shown in the web browser.
• What else?

R learning resources
R home page
http://www.r-project.org
R home page:
• List of CRAN mirror sites
• Manuals
• FAQs
• Mailing lists
• Links
CRAN - Comprehensive R Archive Network
http://cran.r-project.org/
• CRAN mirrors
  - About 75 sites worldwide
• R binaries
• R packages
• R sources
• Task Views
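Functions such as `install.packages()` download from the mirror stored in the `repos` option. Below is a minimal sketch that sets the mirror once per session, for example at the top of a script. The URL is one example mirror and an assumption of this sketch. `chooseCRANmirror()` offers an interactive list of all mirrors instead.

```r
# Use one CRAN mirror for this session (the URL is an example choice)
options(repos = c(CRAN = "https://cloud.r-project.org"))
getOption("repos")

# Interactive alternative: pick a mirror from the list
# chooseCRANmirror()
```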
Quick-R
http://www.statmethods.net
Introductory R lessons:
• R Interface
• Data Input
• Data Management
• Basic Statistics
• Advanced Statistics
• Basic Graphs
• Advanced Graphs
Useful R sites
• R Seek: R-specific search site: http://www.rseek.org/
• R Bloggers: aggregation of about 100 R blogs: http://www.r-bloggers.com
• Stack Overflow: excellent developer Q&A forum: http://stackoverflow.com
• R Graph Gallery: examples of many possible R graphs: http://addictedtor.free.fr/graphiques
• Revolution Blog: blog by David Smith of Revolution Analytics: http://blog.revolutionanalytics.com
• Inside-R: R community site by Revolution Analytics: http://www.inside-r.org
Summary

Summary of Part I
In this session we discussed the following R concepts:
• What is R, and why is it worth learning?
• What can I do with R?
• How to obtain and install R
• How to navigate the system
• What packages are and how to use them
• Where to find more information on R