R / Python: Why and How to Get Started

What do you use?
• Use SPSS, Stata, or SAS
    • with the GUI/menus
    • with syntax
• Use Excel
    • for data management
    • for data analysis
• Use Matlab, R, or Python

Python
• General-purpose programming language
    • Create applications, run websites, interface with systems
    • Has the features found in other general-purpose languages
• Created by groups of computer scientists
    • Fast and stable for production workflows
    • Designed to be simple to read, with one obvious way to do each task

R
• Statistical language
    • Built to do math and work with datasets
    • Can use some tools from other languages
• Created by statisticians
    • Fast and intuitive for analysis, but slower for general processing
    • Many statisticians have extended its capabilities

Both
• Use scripting
    • The code/syntax is meant to be saved in a script file
    • The code can be re-run to reproduce the output
• Open source & extensible
    • Anybody can create new add-ins ("packages")
    • Nobody can change the core language without its maintainers' approval
• Free to use

As originally built, you:
Type instructions at a prompt…

[Screenshots: Python shell and R console]

…and get output like this:

[Screenshots: R regression output; Python frequency table]

IDEs

[Screenshots: Spyder (Python) and RStudio (R), with panes labeled Script window, Console, Help, History, Files, Plots, and Environment / Variable explorer, plus the "Run current selection" button]

So, why all the buzz?
• Free software that can do everything SPSS, Stata, SAS, and Excel can do.
• Massive improvements in ease of use through packages with convenience functions.

What to Install
• R
    • Install R from CRAN
    • Install RStudio
• Python
    • Install Anaconda

All of these are cross-platform, with standard installers.

No Installation Needed
• RStudio Cloud (beta) • https://rstudio.cloud/
• Python Anywhere • https://www.pythonanywhere.com/
• Both require free accounts.

Old and New
• R is an implementation of S (created in 1976) and was first released in 1995 with v1.0 in 2000. Hadley Wickham's dplyr package was introduced in 2014. The RStudio IDE was released in 2011 with v1.0 in 2016.
• Python was first released in 1991. The IPython interactive shell was released in 2001. Wes McKinney began developing the data management package pandas in 2008; v1.0 was released in 2020. The Spyder IDE was released in 2009.

Misconceptions
• "Python is better than R"
    • Python can do a wider variety of computing tasks than R: Python has breadth, R has depth in data and statistics.
    • People are not really judging the languages themselves; they are judging each language's entire ecosystem.
• "Python is easier than R"
    • Python is one of the simplest general-purpose programming languages. R is a specialized statistical language rather than a general-purpose one.
    • Because R was made by statisticians, it does some things differently from general-purpose programming languages.

Demonstration
• Python
• R

Functions & Packages: The building blocks of computer languages

Have you used functions?
word(stuff)
Excel:  =AVERAGE(V1:V5)
SPSS:   COMPUTE average = MEAN(v1, v2, v3, v4, v5).
Stata:  egen average = rowmean(v1 v2 v3 v4 v5)
SAS:    average = mean(of v1-v5);

Creating a Scale Index

ncc_score = (ncc1 + ncc2 + ncc3 + ncc4 + ncc5) / 5
ncc_score = SUM(ncc1, ncc2, ncc3, ncc4, ncc5) / 5
ncc_score = MEAN(ncc1, ncc2, ncc3, ncc4, ncc5)
ncc_score = SCALE(ncc)           "Convenience function"
ncc_score = SCALE(ncc, "sum")    Added "argument"

Function Names and Arguments

Functions & Objects

Packages
• Package: A group of functions installed together
    • Different packages may have functions with the same name!
• Install: Copy instructions to your computer
• Load/Attach/Import: Put instructions in memory

Media Literacy -> Package Literacy
• Who wrote it?
• How long has it been around?
• How many other people use it?
• Where is the code?
• How good is the documentation?
• What kind of testing has been done?
• Does it give the same results as other software?
• What do other people say about it?

Is R / Python for You?

Check in with yourself:
• Do functions and arguments make sense?
• Can you be detail oriented?
• Can you keep track of things that change?
• Are you good at thinking systematically?
It's okay if you answered "no." You can still use R or Python.

If Not Yet
• Practice with functions in software you know.
• Use Jamovi (built on R) with Syntax Mode turned on, so it shows the R code.
• Practice reading syntax and identifying functions and objects.

Which to Pick
• Start with whichever one…
    • … the people around you use
    • … has the functions you need
    • … looks easier for you to read
• Use R if you mostly work with data tables and do statistics
    • Tends to get new statistical procedures first
    • Easier to read and understand
• Use Python if you often do non-statistical programming
    • More and better tools for processing non-tabular text
    • Integrates better with other applications

R + Python
• Use R in Python (rpy2)
• Use Python in R (reticulate)
• Use R or Python in SPSS, Stata, and SAS
• Some features in R get "ported" to Python • Some features in Python get "ported" to R
• Use SQL in R or Python

Where to Start?
• Data management
    • Many, many functions
    • Python: pandas
    • R: tidyverse, data.table, or sqldf
• Statistical analysis
    • Formula notation
    • Python: statsmodels
    • R: base R, afex/car (ANOVA), lme4 (mixed models), etc.
• Graphing
    • Python: seaborn (built on matplotlib)
    • R: ggplot2, ggformula, or lattice

Interpreting Tutorials

Recognizing Packages
In R:
• library(package)
• require(package)
• package::function()
In Python:
• import package as nickname
• nickname.stuff

What to Look For
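A small Python sketch of what to look for, using only the standard library: the `statistics` module stands in for a data-analysis package, and the scores are made-up illustration values. Here `st` is the nickname, `st.mean` and `st.quantiles` are functions called through it, `n=4` is a named argument, `scores.append` is a method belonging to an object, and `scores`, `avg`, and `quartiles` are objects.

```python
# "import package as nickname": the statistics module is imported as st
import statistics as st

# An object: a list of (made-up) scores
scores = [3, 5, 4, 4, 2]

# A method: a function that belongs to an object, called as object.method()
scores.append(5)

# Functions from the package, called as nickname.function(arguments)
avg = st.mean(scores)                  # one positional argument
quartiles = st.quantiles(scores, n=4)  # plus a named (keyword) argument

print(avg)
print(quartiles)
```

The same reading habit carries over to R: in `jmv::descriptives(data = data)`, `jmv` is the package, `descriptives` is the function, and `data = data` is a named argument.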
[Annotated code example highlighting the functions & methods and the objects]

Learning Packages & Functions

Built-In Datasets
R
• data()  # see all the included datasets
• data(name)  # make one available

Python
• Some options, but none great
• Just use R's: https://vincentarelbundock.github.io/Rdatasets/
• In both languages, read_csv (readr in R, pandas in Python) accepts a URL

Creating Data
Use Vectors

It is very common to use vectors as variables without putting them into a data frame.

    A <- c(1, 2, 3, 4, 5)
    B <- c(7:20, 200)
    t.test(A, B)
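For comparison, the same "just use vectors" idea in Python uses plain lists. With SciPy installed, `scipy.stats.ttest_ind(A, B, equal_var=False)` would be the usual call; the dependency-free sketch below instead computes by hand the Welch t statistic and its approximate degrees of freedom, which is the unequal-variance test R's `t.test(A, B)` runs by default. It is only a sketch: the helper `welch_t` is hypothetical, and it omits the p-value, which needs the t distribution.

```python
from statistics import mean, variance

# Plain lists play the role of R vectors; no data frame needed
A = [1, 2, 3, 4, 5]
B = list(range(7, 21)) + [200]  # like R's c(7:20, 200)

def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite df."""
    # Squared standard error of each mean: sample variance / n
    se2_x = variance(x) / len(x)
    se2_y = variance(y) / len(y)
    t = (mean(x) - mean(y)) / (se2_x + se2_y) ** 0.5
    df = (se2_x + se2_y) ** 2 / (
        se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1)
    )
    return t, df

t, df = welch_t(A, B)
print(f"t = {t:.3f}, df = {df:.2f}")
```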
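The same kind of test with a data frame and formula notation:

    group <- 1:2                        # recycled to 1, 2, 1, 2, ... by data.frame()
    value <- rnorm(20)                  # 20 random values from a normal distribution
    data <- data.frame(group, value)
    t.test(value ~ group, data = data)

[Screenshot: Jupyter Notebook; notebook files use the .ipynb extension]

Our InfoGuides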
• https://infoguides.gmu.edu/learn_r
• https://infoguides.gmu.edu/learn_python
• DataCamp
• The Carpentries
• CodeSchool
• Coursera

Jamovi (jamovi.org): Syntax Mode
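[Annotated example of Jamovi's R syntax, labeling the comment, package, function, arguments, data specification, and values]

    # Descriptives              (comment)
    jmv::descriptives(          (package jmv, function descriptives)
        data = data,            (argument data: the data specification)
        vars = "fate")          (argument vars: the value "fate")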
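For comparison, a rough Python counterpart to a descriptives call, again using only the standard library. The column name `score`, its values, and the `describe` helper are all hypothetical illustrations (pandas users would more typically call `DataFrame.describe()`); the point is the same anatomy of function name, data specification, and a named argument choosing the variable.

```python
import statistics

# Hypothetical data: one numeric column, stored as a dict of lists
data = {"score": [12, 15, 9, 20, 15, 11]}

def describe(data, var):
    """Hypothetical helper: basic descriptives for one column."""
    values = data[var]
    return {
        "N": len(values),
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        "SD": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "Minimum": min(values),
        "Maximum": max(values),
    }

# Function name, then arguments: the data specification and the variable
for stat, value in describe(data, var="score").items():
    print(f"{stat:<8}{value:.2f}")
```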