Scientific Computing and Visualization

Introduction to R Data Analysis and Calculations

Katia Oleinik

Boston University Scientific Computing and Visualization Introduction to R

R arithmetic operations

Operation    Description
x + y        addition
x - y        subtraction
x * y        multiplication
x / y        division
x ^ y        exponentiation
x %% y       x mod y
x %/% y      integer division
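The less familiar operators in the table above, %% and %/%, can be tried out with a short sketch. The variable names are chosen only for illustration.

```r
# Integer division and remainder satisfy: x == y * (x %/% y) + (x %% y)
x <- 17
y <- 5

quotient  <- x %/% y   # integer division: 3
remainder <- x %% y    # modulo: 2
power     <- 2 ^ 10    # exponentiation: 1024

# For a negative left operand the remainder takes the sign of the divisor
neg_remainder <- (-7) %% 3   # 2

print(c(quotient, remainder, power, neg_remainder))
```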

Variable Name rules

- Case sensitive: Party ≠ party

- Letters, digits, underscores and dots can be used: DNA.data.2012

- Cannot start with a digit, an underscore, or a dot followed by a digit: 2012.DNA

- Should not use reserved words (if, else, repeat, etc.)

R atomic constant types:

1. Integer: n <- as.integer(1) or n <- 1L (a plain n <- 1 is stored as numeric)

2. Numeric: a <- 2.5

3. Complex: d <- 3 + 12i

4. Logical: ans <- TRUE

5. Character: name <- "Katia" or name <- 'Katia'

6. Special: NULL, NA, Inf, NaN
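A quick way to check which atomic type a value has is class(). The sketch below is illustrative only, and the variable names are not from the original handout.

```r
# A number literal without the L suffix is stored as a double ("numeric")
n_default <- 1
n_integer <- 1L

type_default   <- class(n_default)   # "numeric"
type_integer   <- class(n_integer)   # "integer"
type_complex   <- class(3 + 12i)     # "complex"
type_logical   <- class(TRUE)        # "logical"
type_character <- class("Katia")     # "character"

# Special values have their own test functions
na_check   <- is.na(NA)       # TRUE
nan_check  <- is.nan(0 / 0)   # TRUE: 0/0 is NaN
inf_check  <- 1 / 0 == Inf    # TRUE
null_check <- is.null(NULL)   # TRUE

print(c(type_default, type_integer, type_complex, type_logical, type_character))
```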

R operators:

Operations           Description
+ - * / %% ^         Arithmetic
> >= < <= == !=      Relational
! & |                Logical
~                    Model formulas
-> <-                Assignment
$                    List indexing
:                    Sequence

R built-in constants:

Constants     Description
LETTERS       26 upper-case letters of the Roman alphabet
letters       26 lower-case letters of the Roman alphabet
month.abb     3-letter abbreviations of month names
month.name    month names
pi            π: ratio of circle circumference to diameter
T, F          TRUE, FALSE

R math functions for scalars and vectors:

Function                            Description
sin, cos, tan, asin, acos, atan,    Various standard trig, log and exp functions
atan2, log, log10, log(x,base),
exp, sinh, cosh, …
min(x), max(x), range(x), abs(x)    Minimum/maximum, range and absolute value
sum(x), diff(x), prod(x)            Sum, difference and product of vector elements
mean(x), median(x), sd(x), var(x)   Mean, median, standard deviation, variance
weighted.mean(x,w)                  Mean of x with weights w
quantile(x,probs=)                  Sample quantiles corresponding to the given probabilities
                                    (defaults to 0,.25,.5,.75,1)
round(x, n)                         Rounds the elements of x to n decimals
Re(x), Im(x), Conj(x)               Real and imaginary parts of a complex number, complex conjugate
Arg(x)                              Angle in radians of the complex number
fft(x)                              Fast Fourier Transform of an array
pmin(x,y,…), pmax(x,y,…)            A vector whose ith element is the min/max of (x[i],y[i],…)
cumsum(x), cumprod(x)               A vector whose ith element is the sum/product from x[1] to x[i]
cummin(x), cummax(x)                A vector whose ith element is the min/max from x[1] to x[i]
var(x,y) or cov(x,y)                Covariance between 2 vectors
cor(x,y)                            Linear correlation between x and y
length(x)                           Get the length of the vector
factorial(n)                        Calculate n!
choose(n,k)                         Combination function: n! / ( k! * (n - k)! )

*Note: Many of these functions take a logical argument na.rm (default FALSE) that controls whether missing values (NA) are removed before the computation.
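The effect of na.rm, along with a few of the vector functions listed above, can be seen in a small sketch. The data values are made up for illustration.

```r
x <- c(4, 8, NA, 15)

# With the default na.rm = FALSE a single NA makes the result NA
mean_with_na <- mean(x)                  # NA
mean_no_na   <- mean(x, na.rm = TRUE)    # (4 + 8 + 15) / 3 = 9

running_sum  <- cumsum(c(1, 2, 3, 4))          # 1 3 6 10
parallel_max <- pmax(c(1, 5, 3), c(4, 2, 6))   # 4 5 6
n_choose_k   <- choose(5, 2)                   # 5! / (2! * 3!) = 10

print(mean_no_na)
print(running_sum)
```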

Directories and Workspace:

Function                        Description
getwd()                         Get working directory
setwd("/projects/myR/")         Set current directory
ls()                            List objects in the current workspace
rm(x,…)                         Remove objects from the current workspace
list.files()                    List files in the current directory
list.dirs()                     List directories
file.info("myfile.xls")         Get file properties
file.exists("myfile.xls")       Check if file exists
file.remove("myfile.xls")       Delete file
file.append(file1, file2)       Append file2 to file1
file.copy(from, to, …)          Copy file
system("ls -la")                Execute command in the operating system
save.image()                    Save contents of the current workspace in the default file .RData
save.image(file="myR.Rdata")    Save contents of the current workspace in the file
save(a,b, file = "ab.Rdata")    Save a and b in the file
load("myR.Rdata")               Restore workspace from the file
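As a minimal sketch of the save() and load() pair from the table, the example below writes two objects to a temporary file and restores them. The temporary location and the variable names are assumptions made so the example does not touch the working directory.

```r
a <- 1:5
b <- "text"

# Write to R's per-session temporary directory instead of the working directory
rdata_path <- file.path(tempdir(), "ab.RData")
save(a, b, file = rdata_path)

# Remove the objects, then restore them from the file
rm(a, b)
restored_names <- load(rdata_path)   # load() returns the names of the restored objects

print(restored_names)
unlink(rdata_path)                   # clean up the temporary file
```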

Loading and Saving Data:

Function                                      Description
read.table(file="myData.txt", header=TRUE)    Read text file
read.csv(file="myData.csv")                   Read csv file ("," is the default separator)
list.files(); dir()                           List all files in current directory
file.show(file="myData.csv")                  Show file content
write.table(file="myData.txt",…)              Save data into a file
write.csv(file="myData.csv",…)                Save data into csv formatted file
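A round trip through write.csv() and read.csv() might look like the sketch below. The data frame contents and the temporary file are invented for illustration; row.names = FALSE is used so that no extra index column is written.

```r
# Small made-up data set
weather_small <- data.frame(day = 1:3, temp = c(61.5, 58.2, 64.0))

csv_path <- tempfile(fileext = ".csv")
write.csv(weather_small, file = csv_path, row.names = FALSE)

# Read the file back; the header line is used for column names by default
restored <- read.csv(file = csv_path)

same_data <- isTRUE(all.equal(weather_small, restored))
print(same_data)
unlink(csv_path)
```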

Exploring the data:

Function            Description
class(x)            Get class attribute of an object
names(x)            Get or set names of an object
head(x), tail(x)    Returns the first/last parts of a vector, matrix, data frame or function
str(x)              Structure of an object
dimnames(x)         Retrieve or set dimnames of an object
length(x)           Get or set the length of a vector or factor
summary(x)          Generic function that produces a summary of the data
attributes(x)       List object's attributes
dim(x)              Retrieve or set the dimension of an object
nrow(x), ncol(x)    Return the number of rows or columns of a matrix or data frame (NULL for a plain vector)
row.names()         Retrieve or set the names of the rows

R script file

- An R script is usually saved in a file with the extension .R (or .r).

- # serves as a comment indicator (every character on the line after the # sign is ignored).

- source("myScript.R") will load the script into the R workspace and execute it.

- source("myScript.R", echo=TRUE) will load and execute the script and also show the content of the file.
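To try source() without an existing script, one option is to write a tiny script to a temporary file first. The file contents and the variable name total are made up for this sketch.

```r
# Create a two-line toy script in a temporary file
script_path <- tempfile(fileext = ".R")
writeLines(c("# Toy script: add the numbers 1 to 10",
             "total <- sum(1:10)"),
           script_path)

# Execute the script; objects it creates become available afterwards
source(script_path)
print(total)

unlink(script_path)
```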

R script example (weather.R)

# This script loads data from a table and explores the data
# Script is written for Introduction to R tutorial

# Load datafile
weather <- read.csv("BostonWeather_sept2012.csv")

# Get header names
names(weather)

# Get class of the loaded object
class(weather)

# Get attributes
attributes(weather)

# Get dimensions of the loaded data
dim(weather)

# Get structure of the loaded object
str(weather)

# Summary of the data
summary(weather)

Installing and loading R packages

- To install an R package from the CRAN website: install.packages("package")

- library(package) loads a package into the workspace. A package has to be loaded every time you open a workspace.

- Another way to load a package into the workspace is require(package). It is usually used inside functions. It returns FALSE and gives a warning (rather than an error) if the package does not exist.

- installed.packages() retrieves details about all installed packages

- library() lists all available packages

- search() lists all loaded packages

- library(help = package) provides information about all the functions in a package
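The difference between a package that is present and one that is missing can be sketched with require(). The package name notARealPackage12345 is hypothetical and is assumed not to be installed; stats ships with R.

```r
# "stats" ships with every R installation, so nothing needs to be installed here
loaded <- require(stats, quietly = TRUE)

# Hypothetical package name, assumed NOT to be installed
missing_pkg <- "notARealPackage12345"
found <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)

if (!found) {
  message("Package '", missing_pkg, "' is not installed; skipping code that needs it")
}

# Attached packages appear in the search path as "package:<name>"
stats_attached <- "package:stats" %in% search()
print(stats_attached)
```

Because require() returns a logical value instead of stopping, it is convenient inside functions that should degrade gracefully when an optional package is absent.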

Getting help

Function                Description                                      Example
?topic                  Get R documentation on topic                     ?mean
help(topic)             Get R documentation on topic                     help(mean)
help.search("topic")    Search the help for topic                        help.search("mean")
example(topic)          Get example of function usage                    example(mean)
apropos("topic")        Get the names of all objects in the search       apropos("mean")
                        list that match string "topic"
methods(function)       List all methods of the function                 methods(mean)
function_name           Printing a function name without parentheses     mean
                        in most cases will show its code

R object types:

o Vector – a set of elements of the same type.

o Matrix - a set of elements of the same type organized in rows and columns.

o Data Frame - a set of elements organized in rows and columns, where columns can be of different types.

o List - a collection of data objects (possibly of different types) – a generalization of a vector.

Vector creation (examples):

# Create a vector using concatenation of elements: c()
v1 <- c(5, 8, 3, 9)
v2 <- c("One", "Two", "Three")

# Generate sequence (from:to)
s1 <- 2:5

# Sequence function: seq(from, to, by, length.out)
seq(0, 1, length.out=5)
[1] 0.00 0.25 0.50 0.75 1.00
seq(1, 6, by = 3)
[1] 1 4
seq(4)
[1] 1 2 3 4

# Generate vector using repeat function: rep(x, times)
rep(7, 3)
[1] 7 7 7

Accessing vector elements:

Indexing vectors    Description
x[n]                nth element
x[-n]               all but the nth element
x[1:n]              first n elements
x[-(1:n)]           elements starting from n+1
x[c(1,3,6)]         specific elements
x[x>3 & x<7]        all elements greater than 3 and less than 7
x[x<3 | x>7]        all elements less than 3 or greater than 7
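The indexing forms in the table can be checked on a small vector. The values below are chosen only for illustration.

```r
x <- c(5, 8, 3, 9, 1, 7)

second      <- x[2]               # 8
all_but_2nd <- x[-2]              # 5 3 9 1 7
first_three <- x[1:3]             # 5 8 3
after_three <- x[-(1:3)]          # 9 1 7
picked      <- x[c(1, 3, 6)]      # 5 3 7
between     <- x[x > 3 & x < 7]   # 5
outside     <- x[x < 3 | x > 7]   # 8 9 1

print(between)
print(outside)
```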

Useful vector operations:

Operation          Description
sort(x)            Returns sorted vector (in increasing order)
rev(x)             Reverses elements of x
which.max(x)       Returns index of the largest element
which.min(x)       Returns index of the smallest element
which(x == a)      Returns vector of indices i, for which x[i]==a
na.omit(x)         Removes the observations with missing data
x[is.na(x)] <- 0   Replace all missing elements with zeros
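A short sketch of these operations on a vector that contains a missing value (the data are made up):

```r
x <- c(4, NA, 9, 2, 9)

sorted     <- sort(x)                  # 2 4 9 9 (NA is dropped by default)
idx_max    <- which.max(x)             # 3: index of the first largest element
idx_min    <- which.min(x)             # 4
idx_nines  <- which(x == 9)            # 3 5
without_na <- as.vector(na.omit(x))    # 4 9 2 9

# Replace missing elements with zeros (on a copy, to keep x unchanged)
filled <- x
filled[is.na(filled)] <- 0             # 4 0 9 2 9

print(sorted)
print(filled)
```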

Matrix creation (examples):

# Create a matrix using function: matrix(data, nrow, ncol, byrow=FALSE)
matrix(1:6, nrow=2)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

# Create a diagonal matrix: diag()
diag(3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1

diag(4, 2, 2)
     [,1] [,2]
[1,]    4    0
[2,]    0    4

# Combine arguments by column: cbind()
cbind(c(1,2,3), c(4,5,6))
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

# Combine arguments by row: rbind()
rbind(c(1,2,3), c(4,5,6))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

# Create matrix using array(x, dim) function
array(1:6, c(2,3))
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

Accessing matrix elements:

Indexing matrices    Description
x[i,j]               Element at row i, column j
x[i,]                Row i (output is a vector)
x[,j]                Column j (output is a vector)
x[c(1,5),]           Rows 1 and 5 (output is a matrix)
x[,c(2,3,6)]         Columns 2, 3 and 6 (output is a matrix)
x["name",]           Row named "name"
x[,"name"]           Column named "name"

Useful matrix operations:

Operation                 Description
t(x)                      Transpose
x * y                     Element-wise multiplication of 2 matrices
x %*% y                   Perform "normal" matrix multiplication
diag(x)                   Returns a vector of diagonal elements
det(x)                    Returns determinant of matrix
solve(x)                  Returns inverse matrix (if it exists), an error otherwise
solve(A,b)                Returns solution vector x of the system Ax=b
rowSums(), colSums()      Returns vector with the sum of each row/column
rowMeans(), colMeans()    Returns vector with the mean value of each row/column
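The difference between * and %*%, and the use of solve(), can be sketched on a 2 x 2 system. The matrix A and vector b are arbitrary illustrative values.

```r
# matrix() fills column by column, so A is
#   2 1
#   1 3
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(3, 5)

elementwise <- A * A       # squares each entry: 4 1 / 1 9
matprod     <- A %*% A     # matrix product:     5 5 / 5 10
determinant <- det(A)      # 2*3 - 1*1 = 5

# Solve A x = b; here 2*x1 + x2 = 3 and x1 + 3*x2 = 5, so x = (0.8, 1.4)
x <- solve(A, b)

# The inverse multiplied by A gives the identity (up to rounding)
A_inv <- solve(A)
print(x)
print(round(A_inv %*% A, 10))
```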

Data frames:

- Elements organized in rows and columns, where columns can be of different types

- All elements in the same column must have the same data type

- Usually obtained by reading a data file

- Can be created using the data.frame() function

# Create a data frame using function: data.frame()
name <- c("Paul", "Simon", "Robert")
age <- c(8, 12, 3)
height <- c(53.5, 64.8, 35.2)
family <- data.frame(Name = name, Age = age, Height = height); family
    Name Age Height
1   Paul   8   53.5
2  Simon  12   64.8
3 Robert   3   35.2

# To sort a data frame using one column
family[order(family$Age),]
    Name Age Height
3 Robert   3   35.2
1   Paul   8   53.5
2  Simon  12   64.8

Accessing data frame elements:

Indexing data frames    Description
x[[i]]                  Accessing column i (returns vector)
x[["name"]]             Accessing column named "name" (returns vector)
x$name                  Accessing column named "name" (returns vector)
x[,i]                   Accessing column i (returns vector)
x[j,]                   Accessing row j (returns data frame!)
x[i:j,]                 Accessing rows from i to j
x[i,j]                  Accessing element in row i and column j
x[i, "name"]            Accessing element in row i and column "name"
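The access forms above can be tried on the family data frame from the earlier example. stringsAsFactors = FALSE is set explicitly here (an addition to the original example) so that the Name column is a character vector in every R version.

```r
family <- data.frame(Name   = c("Paul", "Simon", "Robert"),
                     Age    = c(8, 12, 3),
                     Height = c(53.5, 64.8, 35.2),
                     stringsAsFactors = FALSE)

ages       <- family$Age             # column as a vector: 8 12 3
heights    <- family[["Height"]]     # same idea with [[ ]]
second_row <- family[2, ]            # a one-row data frame, not a vector
one_cell   <- family[3, "Height"]    # 35.2

# A logical condition on one column selects rows
older_names <- family[family$Age > 5, "Name"]   # "Paul" "Simon"

print(second_row)
print(older_names)
```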

Lists:

- Generalization of a vector: ordered collection of components

- Elements can be of any mode or type

- Many R functions return a list as their output object

- Can be created using the list() function

# Create a list using function: list()
lst <- list(name="Fred", no.children=3, child.ages=c(12,8,3))

# Create a list using concatenation: c()
list.ABC <- c(list.A, list.B, list.C)

# A list can be created from different R objects
list.misc <- list(e1 = c(1,2,3), e2 = list.B, e3 = matrix(1:4,2))

Accessing list elements:

Indexing lists    Description
x[[i]]            Accessing component i
x[["name"]]       Accessing component named "name"
x$name            Accessing component named "name"
x[i:j]            Accessing components from i to j (returns a list)
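The distinction between [[ ]] (which extracts a component) and [ ] (which returns a shorter list) is easy to miss. A minimal sketch using the lst example from above:

```r
lst <- list(name = "Fred", no.children = 3, child.ages = c(12, 8, 3))

by_position  <- lst[[1]]             # "Fred"
by_name      <- lst[["name"]]        # "Fred"
second_child <- lst$child.ages[2]    # 8

# Single brackets keep the list structure
sub_list     <- lst[1:2]             # a list with components name and no.children
class_single <- class(lst[1])        # "list"
class_double <- class(lst[[1]])      # "character"

print(names(sub_list))
```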

Factors:

- A vector of integer codes with an attached set of levels (the distinct values the codes stand for). It provides an easy way to store character strings common for categorical variables

Factor operations:

Operation                         Description
factor(x)                         Convert vector to a factor
relevel(x, ref=…)                 Rearrange the order of levels in a factor
levels(x)                         List levels in a factor
attributes(x)                     Inspect attributes of a factor
table()                           Get count of elements in each level
is.factor(x)                      Checks if x is a factor. Returns TRUE or FALSE
cut(x, breaks)                    Divide x into intervals (factors)
gl(n,k,length=n*k,labels=1:n)     Generate factors by specifying pattern
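A brief sketch of how a factor stores its levels and integer codes, using made-up categorical data:

```r
sizes <- factor(c("small", "large", "small", "medium"))

lev    <- levels(sizes)        # "large" "medium" "small" (sorted alphabetically)
codes  <- as.integer(sizes)    # 3 1 3 2: positions in the levels vector
counts <- table(sizes)         # large 1, medium 1, small 2

# Make "small" the first (reference) level
sizes2 <- relevel(sizes, ref = "small")
lev2   <- levels(sizes2)       # "small" "large" "medium"

# cut() turns numeric values into interval factors
age_groups <- cut(c(3, 8, 12), breaks = c(0, 5, 10, 15))
group_lev  <- levels(age_groups)   # "(0,5]" "(5,10]" "(10,15]"

print(counts)
print(age_groups)
```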

Regression analysis

Function       Description
lm()           Linear regression
glm()          Generalized linear regression
nls()          Non-linear regression
residuals()    The difference between observed values and fitted values
deviance()     Returns the deviance
gls()          Fit linear model using generalized least squares (nlme package)
gnls()         Fit nonlinear model using generalized least squares (nlme package)
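As a minimal sketch of lm() together with residuals(), deviance() and predict(), the example below fits a straight line to simulated data. The true intercept 2, slope 3 and noise level are assumptions chosen for the illustration.

```r
set.seed(1)                       # make the simulated noise reproducible
x <- 1:20
y <- 2 + 3 * x + rnorm(20, sd = 0.5)

fit <- lm(y ~ x)                  # model formula: y explained by x

estimates <- coef(fit)            # intercept and slope, close to 2 and 3
res       <- residuals(fit)       # observed minus fitted values
rss       <- deviance(fit)        # for lm this is the residual sum of squares

# Predict the response for a new x value
new_point  <- data.frame(x = 21)
prediction <- predict(fit, newdata = new_point)

print(estimates)
print(prediction)
```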

Miscellaneous functions for data analysis

Function            Description
optim()             General purpose optimization
nlm()               Minimize function
spline()            Spline interpolation
kmeans()            k-means clustering on a data matrix
ts()                Create a time series
t.test()            Student's t-test
binom.test()        Binomial test
merge()             Merge 2 data frames
sample()            Sampling
density()           Kernel density estimates of x
logLik(fit)         Computes the logarithm of the likelihood
predict(fit,…)      Predictions from fit based on input data
anova()             Analysis of variance (or deviance)
aov(formula)        Analysis of variance model
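Of the functions above, merge() is perhaps the one whose behavior is least obvious from its name. The sketch below joins two made-up data frames on a shared id column; by default only ids present in both are kept.

```r
people <- data.frame(id = c(1, 2, 3), name = c("a", "b", "c"),
                     stringsAsFactors = FALSE)
scores <- data.frame(id = c(2, 3, 4), score = c(10, 20, 30))

# Default merge keeps only rows whose id appears in both data frames
merged <- merge(people, scores, by = "id")

# all = TRUE keeps every id and fills the gaps with NA
merged_all <- merge(people, scores, by = "id", all = TRUE)

print(merged)
print(merged_all)
```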

Distributions

Function                              Description
rnorm(n, mean=0, sd = 1)              Gaussian (normal)
runif(n, min=0, max = 1)              Uniform
rexp(n, rate=1)                       Exponential
rgamma(n, shape, scale=1)             Gamma
rpois(n, lambda)                      Poisson
rcauchy(n, location=0, scale=1)       Cauchy
rbeta(n, shape1, shape2)              Beta
rchisq(n, df)                         Chi-squared (Pearson)
rbinom(n, size, prob)                 Binomial
rgeom(n, prob)                        Geometric
rlogistic(n, location=0, scale=1)     Logistic
rlnorm(n, meanlog=0, sdlog=1)         Lognormal
rt(n, df)                             Student's t
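The r-prefixed functions above draw random samples. As a sketch, the example below draws from two of them and also shows the companion d, p and q prefixes (density, cumulative probability, quantile), which are standard in R but are not listed in the table. The seed and parameter values are arbitrary.

```r
set.seed(42)                               # reproducible random draws

draws       <- rnorm(1000, mean = 10, sd = 2)
sample_mean <- mean(draws)                 # close to 10
sample_sd   <- sd(draws)                   # close to 2

coin_counts <- rbinom(5, size = 10, prob = 0.5)   # 5 values between 0 and 10

# Companion functions for the normal distribution
density_at_zero <- dnorm(0)        # 1 / sqrt(2 * pi)
prob_below      <- pnorm(1.96)     # about 0.975
quantile_975    <- qnorm(0.975)    # about 1.96

print(c(sample_mean, sample_sd))
print(coin_counts)
```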
