Introduction to UNIX and R

Microarray analysis in a multi-user environment

Course web page

http://www.cbs.dtu.dk/dtucourse/data.php

• Course program
• Lecture slides
• Exercises
• Project data sets
• Link to the GenePublisher tool

What do you need to know?

• This is not a course on computers

• But you will need some UNIX for the exercises, and for your final project

• You will also need to know some R to handle the exercises

• But GenePublisher will handle R for you during the projects

Microarray Processing Pipeline

Question/Experimental Design

Array Design/Probe Design (or: Buy Chip or Array)

Sample Preparation/Hybridization

Image Analysis

Normalization (Scaling)

Expression Index Calculation

Comparable Gene Expression Data

Statistical Analysis

Advanced Data Analysis: Clustering, PCA, Classification, Promoter Analysis, Regulatory Network

What is UNIX?

• UNIX is not just one operating system, but a collection of many different systems sharing a common interface

• It has evolved alongside the internet and is geared toward multi-user environments and big multiprocessor machines like our servers

• At its heart, UNIX is a text-based operating system, which means a little typing is required

• UNIX Lives! (Particularly Linux!)

The R Project for Statistical Computing

R is an interpreted computer language

#Norm.Int.m, pVal.TF and get.pval.ttest() are defined elsewhere (not shown)
random <- sample(c(9:16)) #Shuffle column numbers 9 to 16
sample.name <- c(rep("+",4),rep("-",4))
fold.m <- Norm.Int.m[,9:12]/Norm.Int.m[,13:16]
fold.mean.m <- apply(fold.m,1,function(x){mean(x,na.rm=TRUE)})
log.fold.mean.m <- log2(fold.mean.m)
permuted.m <- Norm.Int.m[,c(1:8,random)] #Same data with columns 9 to 16 permuted
pVal.permuted.TF <- get.pval.ttest(permuted.m,9:12,13:16)
fold.permuted.m <- permuted.m[,9:12]/permuted.m[,13:16]
log.fold.permuted.mean.m <- log2(apply(fold.permuted.m,1,function(x){mean(x,na.rm=TRUE)}))
plot(log.fold.mean.m,pVal.TF, main="Volcano Plot", log="y",xlab="M (log2 fold change)", ylab="P-value", pch="*", col="blue")

Student Accounts

Group                            Account
Rune, Louise, Birgitte, David    msc36
Iben, Shzeena, Lizette           msc37
Line, Luise, Anna, Kristine      msc38
Nina, Allan, Marianne, Ole       msc39
Christian                        msc40

Password:

Logging on – Windows users

• Start the program ssh; it should be on your desktop
• Click on “Quick Connect”
• Enter the following:
– Host Name: genome.cbs.dtu.dk
– User Name: mscXX (XX = your account number)
– Port Number: 22
– Authentication Method:
• You will then be prompted for your password

Navigating Directories

Key commands:
– ls #lists the files in the current directory
– cd dir #changes working directory to ‘dir’
– mkdir dir #makes a new directory called ‘dir’

Nice tricks:
– The shorthand ‘ll’ is short for ‘ls -l’
– The asterisk ‘*’ is a wildcard: ‘ls *.txt’ will list all files ending with ‘.txt’
– ‘cd ..’ takes you back (up) one directory
– Plain ‘cd’ (no arguments) takes you to your home directory

Starting with R

Just type ‘R’ on the command line

How to get help:

> help.start() #Opens browser
> help() #For help on using help
> help(..) #For help on ..
> help.search("..") #To search for ..

How to leave again:

> q() #Image can be saved to .RData

Basic R commands

Most arithmetic operators work like you would expect in R:

> 4 + 2 #Prints ‘6’
> 3 * 4 #Prints ‘12’
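A few more operators, as a minimal sketch of the same idea. The particular numbers are only illustrative and are not taken from the course material.

```r
# Each expression is evaluated and its value printed at the prompt.
7 - 2      # subtraction: 5
10 / 4     # division: 2.5
2 ^ 3      # exponentiation: 8
7 %% 3     # remainder after division (modulo): 1
7 %/% 3    # integer division: 2
```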

Operators have precedence as known from basic algebra:
> 1 + 2 * 4 #Prints ‘9’, while
> (1 + 2) * 4 #Prints ‘12’

Functions

A function call in R looks like this:
– function_name(arguments)
– Examples:

> cos(pi/3) #Prints ‘0.5’
> exp(1) #Prints ‘2.718282’
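Functions can also take several arguments, given either by position or by name. The sketch below uses the built-in functions sqrt(), round() and log(); the chosen values are only for illustration.

```r
# One argument
sqrt(16)                      # 4

# Two arguments given by position
round(3.14159, 2)             # 3.14

# The same call with the second argument given by name
round(3.14159, digits = 2)    # 3.14

# Named arguments make the intent explicit
log(8, base = 2)              # 3
```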

A function is identified in R by the parentheses
– That’s why it’s: help(), and not: help

Variables (Objects) in R

To assign a value to a variable (object):
> x <- 4 #Assigns 4 to x
> x = 4 #Assigns 4 to x (new)
> x #Prints ‘4’
> y <- x + 2 #Assigns 6 to y
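One point worth seeing in practice: a variable stores a value, not a formula. In this small sketch the names width, height and area are made up for illustration.

```r
width  <- 3
height <- 4
area   <- width * height   # area now holds the value 12
area                       # prints 12

width <- 5                 # reassigning width does NOT update area
area                       # still prints 12
```

To get an updated area, the assignment to area has to be run again.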

Functions for managing variables:
– ls() or objects() lists all existing objects
– str(x) tells the structure (type) of object ‘x’
– rm(x) removes (deletes) the object ‘x’

Vectors in R

A vector in R is a sequence of elements of the same mode (type).
> x <- 1:10 #Creates a vector
> y <- c("a","b","c") #So does this
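"Same mode" has a practical consequence: if different kinds of values are mixed, R converts them all to one common mode. A minimal sketch, where the vector z is a made-up example:

```r
x <- 1:10
y <- c("a", "b", "c")

length(x)   # number of elements: 10
mode(x)     # "numeric"
mode(y)     # "character"

# Mixing a number, a string and a logical: everything becomes character
z <- c(1, "a", TRUE)
mode(z)     # "character"
z           # "1" "a" "TRUE"
```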

Handy functions for vectors:
– c() concatenates arguments into a vector
– min() returns the smallest value in the vector
– max() returns the largest value in the vector
– mean() returns the mean of the vector

More on Vectors

Elements in a vector can be accessed individually:
> x[1] #Prints first element
> x[1:10] #Prints first 10 elements
> x[c(1,3)] #Prints elements 1 and 3
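The same bracket notation can be tried on a small vector with known contents. This is only a sketch with made-up values; the last two forms (selecting by a condition and replacing an element) go slightly beyond what is listed above.

```r
v <- c(5, 8, 13, 21, 34)

v[2]          # second element: 8
v[2:4]        # elements 2 to 4: 8 13 21
v[c(1, 3)]    # elements 1 and 3: 5 13

v[v > 10]     # elements larger than 10: 13 21 34

v[2] <- 0     # replace the second element
v             # 5 0 13 21 34
```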

Most functions expect one vector as argument, rather than individual numbers:
> mean(1,2,3) #Replies ‘1’ (only the first argument is averaged)
> mean(c(1,2,3)) #Replies ‘2’

Graphics and Visualization

Visualization is one of R’s strong points.

R has many functions for drawing graphs, including:
– plot(x,y) draws a basic xy plot of x against y
– hist(x) draws a histogram of values in x
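A minimal sketch of both functions on simulated data. The data, titles and axis labels are made up for illustration. The call pdf(NULL) is an assumption added so the example also runs without a display; at an interactive prompt it and the final dev.off() can be left out.

```r
pdf(NULL)                 # off-screen device, nothing is written to disk

set.seed(1)               # makes the random numbers reproducible
x <- 1:20
y <- 2 * x + rnorm(20)    # a straight line plus some noise

# xy plot of y against x
plot(x, y, main = "y against x", xlab = "x", ylab = "y")

# histogram of the y values; hist() also returns the bin counts
h <- hist(y, main = "Histogram of y")

invisible(dev.off())      # close the device again
```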

Adding stuff to plots:
– points(x,y) adds point (x,y) to the existing plot.
– lines(x,y) connects points with a line.
– text(x,y,str) writes the string str at (x,y).

Graphical Devices in R

A graphical device is what ‘displays’ the graph. It can be a window, it can be the printer.

Functions for plotting “Devices”:
– X11() allows you to change the size and composition of the plotting window.
– par(mfrow=c(x,y)) splits a plotting device into x rows and y columns.
– dev.print(postscript, file="???.ps") saves the plot to a file.
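A sketch of splitting a device into panels and saving the result. Note the assumption: instead of dev.print(), which copies from an open plotting window, this example opens the postscript device directly so that it also works without a display. The file name comes from tempfile() and the plotted values are made up.

```r
out_file <- tempfile(fileext = ".ps")   # hypothetical output file
postscript(out_file)                    # open a PostScript file device

old_par <- par(mfrow = c(1, 2))         # 1 row, 2 columns of panels
plot(1:10, (1:10)^2, main = "Left panel")
hist(c(1, 2, 2, 3, 3, 3, 4, 4, 5), main = "Right panel")
par(old_par)                            # restore the previous layout

invisible(dev.off())                    # close the device, which writes the file
file.exists(out_file)                   # TRUE if the file was written
```

The plot is only guaranteed to be completely written once dev.off() has been called.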