R / Python
Why and How to Get Started

What do you use?

• Use SPSS, Stata, or SAS
  • with the GUI/menus
  • with syntax

• Use Excel
  • for data management
  • for data analysis

• Use Matlab, R, or Python

Python

• General
  • Create applications, run websites, interface with systems
  • Has all the elements of other languages

• Created by groups of computer scientists
  • Runs fast and stable for production workflows
  • Simplest of languages, one best way to do any action

R

• Statistical Language
  • Built to do math and work with datasets
  • Can utilize some tools from other languages

• Created by statisticians
  • Fast and intuitive to do analysis, slower to process
  • Many statisticians have increased its capabilities

Both

• Use Scripting
  • The code/syntax is intended to be saved in a script file
  • The code can be re-played to reproduce the output

• Open Source & Extensible
  • Anybody can create new add-ins ("packages")
  • People can NOT change the original without permission

• Free to use

As originally built, you:

Type instructions at a prompt…

[Screenshots: a Python prompt and the R Console]

…and get output like this

[Screenshots: R - Regression; Python - Frequency Table]

IDEs

[Screenshots: Spyder (Python) and RStudio (R), each with its Script and Console panes labeled]

• Script Window
• Console
• Help
• History
• Files
• Plots
• Environment / Variable explorer
• Run current selection

So, why all the buzz?

• Free software that can do everything SPSS, Stata, SAS, and Excel can do.

• Massive improvements in ease of use through packages with convenience functions.

What to Install

• R
  • Install R from CRAN
  • Install RStudio

• Python
  • Install Anaconda

All of these are cross-platform with regular installers.

No Installation Needed

• RStudio Cloud (beta)
  • https://rstudio.cloud/

• PythonAnywhere
  • https://www.pythonanywhere.com/

• Both require free accounts.

Old and New

• R is an implementation of S (created in 1976). R was first released in 1995, with v1.0 in 2000. The dplyr package was introduced in 2014. The RStudio IDE was released in 2011, with v1.0 in 2016.

• Python was created in 1990. The IPython interactive shell was released in 2001. The pandas data management package was first released by Wes McKinney in 2008; v1.0 was released in 2020. The Spyder IDE was released in 2009.

Misconceptions

• Python is better than R
  • Python can do a wider variety of computer tasks than R. Python has breadth; R has depth in data and statistics.
  • People are not judging the languages themselves; they are judging the entire ecosystem.

• Python is easier than R
  • Python is the simplest of the programming languages. R is not a general-purpose programming language.
  • Since R was made by statisticians, it does some things differently than general programming languages.

Demonstration

• Python

• R

Functions & Packages
The building blocks of computer languages

Have you used Functions?

word(stuff)

Excel:  =AVERAGE(V1:V5)
SPSS:   COMPUTE average = MEAN(v1, v2, v3, v4, v5).
Stata:  egen average = rowmean(v1 v2 v3 v4 v5)
SAS:    average = mean(of v1-v5);

Creating a Scale Index

ncc_score = (ncc1 + ncc2 + ncc3 + ncc4 + ncc5) / 5
ncc_score = SUM(ncc1, ncc2, ncc3, ncc4, ncc5) / 5
ncc_score = MEAN(ncc1, ncc2, ncc3, ncc4, ncc5)
ncc_score = SCALE(ncc)            ("Convenience Function")
ncc_score = SCALE(ncc, "sum")
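The same progression can be sketched in Python. This is a minimal illustration, not code from the talk: `scale` is a hypothetical helper standing in for the SCALE function above, and the item values are made up.

```python
from statistics import mean


def scale(items, method="mean"):
    """Hypothetical convenience function: combine item responses into one score."""
    if method == "mean":
        return mean(items)
    if method == "sum":
        return sum(items)
    raise ValueError(f"unknown method: {method!r}")


# One respondent's answers to ncc1..ncc5 (made-up values)
ncc = [4, 5, 3, 4, 4]

by_hand = (ncc[0] + ncc[1] + ncc[2] + ncc[3] + ncc[4]) / 5  # arithmetic only
with_sum = sum(ncc) / 5                                     # one function, still dividing by hand
with_mean = mean(ncc)                                       # one function does it all
with_scale = scale(ncc)                                     # convenience function, default method
with_scale_sum = scale(ncc, "sum")                          # same function, extra argument changes the result
```

The first four lines all give the same score; only the last one differs, because the added argument asks for a sum instead of a mean.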

In SCALE(ncc, "sum"), "sum" is an added "Argument".

Function Names and Arguments

Functions & Objects

Packages

• Package: A group of functions installed together
  • Packages may have functions with the same name!

• Install: Copy instructions to your computer

• Load/Attach/Import: Put instructions in memory

Media Literacy -> Package Literacy

• Who wrote it?
• How long has it been around?
• How many other people use it?
• Where is the code?
• How good is the documentation?
• What kind of testing has been done?
• Does it give the same results?
• What do other people say about it?

Is R / Python for You?

Check in with yourself:

• Do functions and arguments make sense?

• Can you be detail oriented?

• Can you keep track of things that change?

• Are you good at thinking systematically?

It's okay if you answered "no". You can still use R.

If Not Yet

• Practice with functions in software you know.

• Use Jamovi (R) and have it show the syntax.

• Practice reading syntax and identifying functions and objects.

Which to Pick

• Start with whichever one…
  • … the people around you use
  • … has the functions you need
  • … looks easier to read for you
• Use R if you mostly work with data tables and do statistics
  • Tends to get new statistical procedures first
  • Easier to read and
• Use Python if you often do non-statistical programming
  • More and better non-tabular text-processing tools
  • Better integrates with applications

R + Python

• Use R in Python (rpy2)
• Use Python in R (reticulate)
• Use R or Python in SPSS, Stata, and SAS

• Some features in R get "ported" to Python
• Some features in Python get "ported" to R

• Use SQL in R or Python

Where to Start?

• Data Management
  • Many, many functions
  • Python: pandas
  • R: dplyr, data.table, or sqldf
• Statistical Analysis
  • Formula Notation
  • Python: statsmodels
  • R: base R, afex/car (ANOVA), lme4 (Mixed Models), etc.
• Graphing
  • Python: seaborn (uses matplotlib)
  • R: ggplot2, ggformula, or lattice

Interpreting Tutorials

Recognizing Packages

In R:
• library(package)
• require(package)
• package::function()

In Python:
• import package as nickname
• nickname.stuff

What to Look For
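When reading a Python tutorial, it can help to label each piece of a line as a package, a function, a method, or an object. A small sketch using only the standard library; the variable names and values are made up for illustration.

```python
import statistics as st           # package loaded under a nickname
from collections import Counter   # a single name pulled out of a package

scores = [3, 5, 4, 5, 2]          # object: a list stored under the name "scores"

average = st.mean(scores)         # nickname.function(object)
counts = Counter(scores)          # called like a function; the result is a new object
top = counts.most_common(1)       # object.method(argument)
scores.sort()                     # a method that changes the object in place
```

A word followed by parentheses is a function or method. A word before a dot is either a package nickname (`st`) or an object (`counts`, `scores`), and the import lines at the top of the script show which is which.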

[Figure: example code with Functions & Methods and Objects labeled]

Learning Packages & Functions

Built-In Datasets

R
• data() # see all the datasets included
• data(name) # make it available

Python
• Some options, but none great
• Just use R's
  • https://vincentarelbundock.github.io/Rdatasets/

• Both allow URLs in read_csv

Creating Data

Use Vectors

It is very common to use vectors as variables without making them into a dataframe.

A <- c(1,2,3,4,5)
B <- c(7:20, 200)
t.test(A,B)
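For comparison, a rough Python version of the vector example. R's t.test(A, B) runs Welch's unequal-variance test by default. The sketch below computes only the Welch t statistic and its approximate degrees of freedom using the standard library, and `welch_t` is a hypothetical helper name. In practice, `scipy.stats.ttest_ind(A, B, equal_var=False)` would also report the p-value, assuming SciPy is installed.

```python
from math import sqrt
from statistics import mean, variance

A = [1, 2, 3, 4, 5]
B = list(range(7, 21)) + [200]    # like R's c(7:20, 200): 7 through 20, then 200


def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    vx = variance(x) / len(x)     # squared standard error of each sample mean
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


t_stat, df = welch_t(A, B)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

As in R, plain lists of numbers are enough here; no data frame is needed.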

group <- 1:2                       # recycled to fill all 20 rows
value <- rnorm(20)                 # 20 random values
data <- data.frame(group, value)
t.test(value ~ group, data=data)   # formula notation: value by group

Jupyter Notebook

• The file extension is .ipynb

Our InfoGuides

• https://infoguides.gmu.edu/learn_r
• https://infoguides.gmu.edu/learn_python

• DataCamp
• Carpentries
• CodeSchool
• Coursera

Jamovi
jamovi.org

Syntax Mode

# Descriptives
jmv::descriptives(
    data = data,
    vars = "fate")

Comments: # Descriptives
Packages: jmv
Functions: descriptives()
Data Specification: data = data
Arguments: vars
Values: "fate"