R Course for the NSOs in the Arab countries
Part I: Introduction

Valentin Todorov

United Nations Industrial Development Organization (UNIDO), Vienna

18-20 May 2015

Todorov (UNIDO), R Course for the NSOs in the Arab countries, Part I: Introduction, 18-20 May 2015

About R

What is R

• R is a language and environment for statistical computing and graphics.
• R is based on the S language, originally developed by John Chambers and colleagues at AT&T Bell Labs in the late 1970s and early 1980s.
• R (sometimes called "GNU S") is free, open-source software licensed under the GNU General Public License (GPL 2).
• R was created by Robert Gentleman and Ross Ihaka at the University of Auckland as a test bed for trying out some ideas in statistical computing.
• R is formally known as The R Project for Statistical Computing: http://www.r-project.org

The R project

• The R Project is an international collaboration of researchers in statistical computing.
• There are roughly 20 members of the "R Core Team" who maintain and enhance R.
• Releases of the R environment are made through CRAN (the Comprehensive R Archive Network) twice per year.
• The software is released under a "free software" license, which makes it possible for anyone to download and use it.
• There are over 6000 extension packages that have been contributed to CRAN.

Popularity of R

• The number of R users has been increasing exponentially since 1996.
• Google, Facebook, Pfizer, Merck, Bank of America, ... (a long list) use R in production.
• Newly developed methods and algorithms are almost always available only in R.
• Most universities teach statistics with R.
• R leads in indexes and rankings (e.g. TIOBE, online help, downloads, number of projects).

[Figure: popularity of R, from http://r4stats.com/articles/popularity/]

But what about official statistics?

The advantages and disadvantages of R

The advantages of R

R is a powerful and free statistical environment and programming language for data management, statistical computing and graphics, with the following features:
- R is the most comprehensive statistical analysis package, incorporating all of the standard statistical tests, models, and analyses.
- Outstanding graphical capabilities. R provides a fully programmable graphics language that surpasses most other statistical and graphical packages.
- A comprehensive and efficient programming language (although with some flaws and weird features).
- Efficient matrix manipulation (implemented in C or Fortran) built into the language.
- Object-oriented programming language.


- High-performance computing, with interfaces for native code and support for parallel and grid computing.
- R plays well with many other tools: it can import and export data from and to CSV files, SAS, SPSS, Microsoft Excel, Microsoft Access, Oracle, MySQL, and SQLite.
- R can produce graphics output in PDF, JPG, PNG, and SVG formats, and table output for LaTeX and HTML.
- The extensive feature set of R can be extended by installing additional packages: CRAN provides a great variety of packages for statistical analysis in finance, environment, and different life science areas.
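The import/export and graphics-output points above can be illustrated with a minimal sketch in base R. It assumes the built-in `cars` data set and uses temporary files; the file names are generated by `tempfile()` and are not part of the course material.

```r
# Export a built-in data set to CSV and read it back.
csv_file <- tempfile(fileext = ".csv")   # temporary file, chosen only for this demo
write.csv(cars, csv_file, row.names = FALSE)
cars2 <- read.csv(csv_file)

# Write a scatter plot to a PDF file instead of the screen.
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
plot(cars2$speed, cars2$dist, xlab = "Speed", ylab = "Stopping distance")
invisible(dev.off())                     # close the device so the file is written
```

Other formats work the same way by swapping the device function (for example `png()` or `svg()`), provided the R installation supports them.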


Why is R so popular?

Chambers (2009):
1. R provides an interface to computational procedures of many kinds;
2. R is interactive, hands-on in real time;
3. R is functional in its model of programming;
4. R is object-oriented, "everything is an object";
5. R is modular, built from standardized pieces; and,
6. R is collaborative, a world-wide, open-source effort.

Three additional items by Venables (2013):
7. R is extensible, may be augmented by compiled code in other languages;
8. R is cross-platform; and
9. R is international.

The disadvantages of R

• Steep learning curve (people say)
• Documentation is patchy and terse
• No commercial support
• Well, let us leave this for another tutorial ...

What can I do with R?

1. Data manipulation
   - Data input from keyboard,
   - ... from text files, XML, SDMX,
   - ... from spreadsheets,
   - ... from other statistical software,
   - ... from relational databases,
   - ... from the web.
   - Merging, combining, and sub-setting data sets
   - Character, date and time processing
2. Statistical analysis
   - Descriptive statistics: mean, variance, density, etc.
   - Linear models
   - Multivariate analysis
   - Time series analysis
   - Robust statistics
   - Many more ...

3. Statistical graphics
   - Graphics for exploratory data analysis
   - ... line plot, bar plot, scatter plot, box plot, density plot
   - Predefined plots for particular models
   - Flexible and powerful options
   - Publication-quality graphics
   - Save as image files
   - Include in dynamically generated reports
4. Writing your own functions
   - Change existing functions (all sources are available)
   - Create a new function for your needs
   - Include all your functions in a package
   - Contribute a package to the community

5. Official statistics
   - All of the previous 1) to 4)
   - Specify complex survey designs
   - Various sampling algorithms
   - Editing and visual inspection of micro data
   - Imputation of missing values
   - Statistical disclosure control
   - Seasonal adjustment
   - Statistical matching and record linkage
   - Small area simulation
   - Compilation of indices and indicators
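Items 1, 2 and 4 of the list above (merging and sub-setting, descriptive statistics, writing your own function) can be sketched in a few lines of base R. The data frames, column names and the `productivity()` helper are made up for illustration and do not come from the course.

```r
# Two small data frames keyed by a code (made-up values).
output <- data.frame(code = c("A", "B", "C"), value_added = c(10, 25, 40))
labour <- data.frame(code = c("A", "B", "D"), employees = c(2, 5, 7))

# Merging: keep only the codes present in both data sets.
both <- merge(output, labour, by = "code")

# Sub-setting: rows with value added above 15.
large <- subset(both, value_added > 15)

# A user-defined function (hypothetical helper): value added per employee.
productivity <- function(va, emp) va / emp
both$va_per_emp <- productivity(both$value_added, both$employees)

# Descriptive statistics.
mean(both$va_per_emp)
var(both$value_added)
summary(both)
```

By default `merge()` performs an inner join, so codes "C" and "D" are dropped; passing `all = TRUE` would keep them with missing values instead.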

Where do I get R?

• There are versions for Unix, Windows, and Mac OS. All of them are free.
• Download from http://cran.r-project.org/ and follow the instructions (Enter, Enter, Enter, ...).
• Start R and install packages with:
   - install.packages(c("XLConnect", "RSQLite"))
   - update.packages()
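When a script is run repeatedly, it can be convenient to install only the packages that are not yet present. The following is a minimal sketch under that assumption; `missing_packages()` is a hypothetical helper, not a function provided by R, and the actual installation is left commented out because it needs an internet connection.

```r
# Return the names in 'pkgs' that are not installed in any library.
missing_packages <- function(pkgs) {
  pkgs[!(pkgs %in% rownames(installed.packages()))]
}

wanted <- c("XLConnect", "RSQLite")
to_install <- missing_packages(wanted)
to_install

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```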

Additional tools

There exist good editors (IDEs) for R which allow:
• Syntax highlighting and code completion
• SVN integration
• Interfaces and interaction with other software
• Automatic connection to servers

See also: Eclipse + StatET or RStudio.

RStudio

• A complete open-source IDE for R
• Object explorer
• Graphics window in main IDE
• Java based
• Platform independent

http://support.rstudio.org

Highly recommended

R-WinEdt

• Based on WinEdt: excellent shareware editor
• Support for LaTeX, Sweave and knitr
• Syntax highlighting for R
• Only Windows

http://www.winedt.com

This is what I use

Installing R

Navigating the system

Navigating the R system

• The standard R interface

• Simply type commands in the console:
    > pi
    [1] 3.141593
    > 2 + 5^2
    [1] 27
    > cos(3/4*pi)
    [1] -0.7071068
• Is it easier to do regression in R or in Excel?
• To repeat a command, press the Up Arrow key.
• To interrupt a command, press ESC (on Windows).
• Save your commands in a script file; use the built-in editor.
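As a hedged answer to the regression question above, here is a minimal sketch of a simple linear regression typed at the console. It assumes the built-in `cars` data set (speed and stopping distance of 50 cars), which is not mentioned in the slides.

```r
# Fit stopping distance as a linear function of speed.
fit <- lm(dist ~ speed, data = cars)

coef(fit)      # intercept and slope
summary(fit)   # standard errors, t-tests, R-squared
```

The fitted object can be passed on to other functions, for example `plot(fit)` for diagnostic plots or `predict(fit, newdata = data.frame(speed = 20))` for a prediction.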


• The built-in editor

• Start the built-in editor with the menu File-New script.
• Type any commands you would like, or copy and paste from somewhere else.
• To run a line, select the line and press Ctrl-R.
• To run a block of code, select the block and press Ctrl-R.
• To run the whole script, press Ctrl-A to select everything and then press Ctrl-R.

• Exiting from R:
   - To interrupt a running command, press ESC
   - To exit from R, type q()
   - To exit from R, use the menu File-Exit or
   - simply close the window.

R packages

What are R packages

• Packages are standardized units for extending R (Hornik, 2004).
• Easy, transparent and cross-platform extension of the R base system.
• The R distribution itself contains 30 packages.
• Packages must provide as a minimum the following information to the core R system:
   - name and version;
   - license, description, title,
   - author and maintainer.
• A package must be installed, using for example the R command install.packages().
• Before using a package it has to be loaded into the system by the library() command.

Why R packages

• Accessible functions and data
   - Convenient means for code storage and version control
   - Functions, data and other objects can be easily made available for use (loaded) by a single library(myPackage) command
   - Facilitates access to native code (C/C++/Fortran)
   - Sharing code with others
   - Using a package makes sense even for personal use
• Reliable and maintainable code
   - Facilitates code development (more disciplined software development), particularly in collaborative projects
   - Better design of the functions
   - Fewer bugs, and they are easier to fix
   - More reliable code
   - Maintainable code

Basic terms related to R packages

• Package: A set of code, example data and documentation in a standard form used for extension of the base R
• Library: A directory containing installed packages
• Repository: A formalized web site providing packages for installation
• Source: The source version of the package containing the R source code, data, documentation and other components in its original form
• Binary: A compiled version of the package suitable for use only on a particular platform (e.g. Windows, Mac OS)
• Base packages: Packages maintained by the R core development team, distributed and installed as a part of the R software
• Recommended packages: Packages distributed with the main R software but not necessarily maintained by the R core development team
• Contributed packages: All other packages; most of them can be downloaded and installed from the CRAN repository.

CRAN

[Figure: CRAN, as of 11.03.2015]

CRAN: Top 10 packages by popularity

http://www.r-statistics.com/2013/06/top-100-r-packages-for-2013-jan-may/

CRAN: How often are my packages downloaded?

http://www.nicebread.de/finally-tracking-cran-packages-downloads/

[Figure: weekly downloads of the packages robustbase, rrcov and spls; x-axis: week (2014-35 to 2015-08), y-axis: Downloads (0 to 3000)]

Using R packages

• Which packages are currently loaded (the search path): use the function search()
• Which packages are currently installed:
   - library() without arguments
   - installed.packages() returns a matrix with one row per package. The Priority column gives the type of package: base, recommended or NA.
• Information about a package:
   - packageDescription("MASS")
   - library(help="MASS")
   - help(package="MASS")
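The inspection functions listed above can be combined as in the following sketch. It uses the base package `stats` instead of MASS only so that it runs on any installation; the variable names are illustrative.

```r
search()                       # attached packages and environments
.libPaths()                    # directories (libraries) holding installed packages

ip <- installed.packages()     # character matrix, one row per package
# Names of the packages whose Priority is "base".
base_pkgs <- rownames(ip)[which(ip[, "Priority"] == "base")]
base_pkgs

desc <- packageDescription("stats")
desc$Version                   # version string of the package
```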

Using R packages II

• Use the functions in a package:
   - library(packagename) or
   - require(packagename)
• List the available packages in a repository:
   - available.packages()
• Installing and updating packages:
   - install.packages("packagename")
   - old.packages()
   - update.packages()
• Package vignettes: Use the function vignette() to list all available vignettes or to view a particular vignette.
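The two loading functions above differ in how they report a missing package: library() stops with an error, while require() returns FALSE with a warning. A minimal sketch of this difference, assuming a package name that is not installed (`noSuchPackage123` is a made-up name):

```r
pkg <- "noSuchPackage123"  # hypothetical name, assumed not to be installed

# require() returns FALSE (and warns) when the package cannot be loaded.
loaded <- suppressWarnings(
  require(pkg, character.only = TRUE, quietly = TRUE)
)

# library() signals an error instead, caught here with tryCatch().
outcome <- tryCatch({
  library(pkg, character.only = TRUE)
  "attached"
}, error = function(e) "error")

loaded
outcome
```

In scripts, library() is usually the safer choice because a missing package fails immediately rather than later, when one of its functions is first called.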

Using R packages III

When are we searching for packages?
• We need certain functionality for our work, or
• We have developed a methodology or an algorithm, implemented it in R, and want to know whether it is worth publishing on CRAN.

Using R packages IV

How to find packages?
• Ask Google, but do not expect a precise answer.
• Ask a question at R-Help (do not expect a polite answer); ask at Stack Overflow.
• Go to CRAN Task Views:
    > library(ctv)
    > views <- available.views()
    > unlist(lapply(views, function(x) x[[1]]))
• Use the new R package sos. For example, try
    > library(sos)
    > findFn("robust+multivariate")
  The results will be shown in the web browser.
• What else?

R learning resources

R home page

http://www.r-project.org

R Home page
• List of CRAN mirror sites
• Manuals
• FAQs
• Mailing Lists
• Links

CRAN - Comprehensive R Archive Network

http://cran.r-project.org/

• CRAN Mirrors
   - About 75 sites worldwide
• R Binaries
• R Packages
• R Sources
• Task Views

Quick-R

http://www.statmethods.net

Introductory R Lessons
• R Interface
• Data Input
• Data Management
• Basic Statistics
• Advanced Statistics
• Basic Graphs
• Advanced Graphs

Useful R sites

• R Seek: R-specific search site, http://www.rseek.org/
• R Bloggers: aggregation of about 100 R blogs, http://www.r-bloggers.com
• Stack Overflow: excellent developer Q&A forum, http://stackoverflow.com
• R Graph Gallery: examples of many possible R graphs, http://addictedtor.free.fr/graphiques
• Revolution Blog: blog from David Smith of Revolution, http://blog.revolutionanalytics.com
• Inside-R: R community site by Revolution Analytics, http://www.inside-r.org

Summary

Summary of Part I

In this session we discussed the following R concepts:
• What is R and why is it good to learn R?
• What can I do with R?
• How to obtain and install R
• How to navigate the system
• What are packages and how to use them
• Where to find more information on R
