Introduction to UNIX and R
Microarray analysis in a multi-user environment

Course web page
http://www.cbs.dtu.dk/dtucourse/data.php
• Course program
• Lecture Slides
• Exercises
• Project
• Data Sets
• Link to the GenePublisher tool

What do you need to know?
• This is not a course on computers
• But you will need some UNIX for the exercises, and for your final project
• You will also need to know some R to handle the exercises
• But GenePublisher will handle R for you during the projects

Microarray Processing Pipeline
Question/Experimental Design
Array Design/Probe Design, or Buy Chip or Array
Sample Preparation/Hybridization
Image Analysis
Normalization (Scaling)
Expression Index Calculation
Comparable Gene Expression Data
Statistical Analysis
Advanced Data Analysis: Clustering, PCA, Classification, Promoter Analysis, Regulatory Network

What is UNIX?
• UNIX is not just one operating system, but a collection of many different systems sharing a common interface
• It has evolved alongside the internet and is geared toward multi-user environments and big multiprocessor machines like our servers
• At its heart, UNIX is a text-based operating system – that means a little typing is required
• UNIX Lives! (Particularly Linux!)

The R Project for Statistical Computing

R is an interpreted computer language
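Interpreted means that each statement is evaluated as soon as it is entered, so an analysis can be built up one line at a time. The following is a minimal sketch of that style, assuming simulated intensities rather than the course data; the matrix `intensity`, its 6 x 8 layout and the split into columns 1-4 and 5-8 are made up for illustration.

```r
# Simulated intensities: 6 genes (rows) x 8 arrays (columns).
# Assumed layout: columns 1-4 are one condition, columns 5-8 the other.
set.seed(1)
intensity <- matrix(rexp(48, rate = 1/500) + 50, nrow = 6, ncol = 8)

# Per-gene mean of the array-wise ratios, then a log2 transform.
fold <- intensity[, 1:4] / intensity[, 5:8]
log.fold.mean <- log2(apply(fold, 1, mean, na.rm = TRUE))

# One t-test per gene; keep only the p-value.
p.val <- apply(intensity, 1, function(x) t.test(x[1:4], x[5:8])$p.value)

print(round(log.fold.mean, 2))
print(signif(p.val, 2))
```

Each line can be typed at the R prompt on its own, and the intermediate objects can be inspected before moving on.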
# Randomly permute the column indices 9 to 16
random <- sample(c(9:16))
sample.name <- c(rep("+",4),rep("-",4))
# Per-gene fold change: columns 9-12 divided by columns 13-16
fold.m <- Norm.Int.m[,9:12]/Norm.Int.m[,13:16]
fold.mean.m <- apply(fold.m,1,function(x){mean(x,na.rm=TRUE)})
log.fold.mean.m <- log2(fold.mean.m)
# Repeat the calculation with the permuted columns
permuted.m <- Norm.Int.m[,c(1:8,random)]
pVal.permuted.TF <- get.pval.ttest(permuted.m,9:12,13:16)
fold.permuted.m <- permuted.m[,9:12]/permuted.m[,13:16]
log.fold.permuted.mean.m <- log2(apply(fold.permuted.m,1,function(x){mean(x,na.rm=TRUE)}))
plot(log.fold.mean.m,pVal.TF, main="Volcano Plot", log="y",xlab="M (log2 fold change)", ylab="P-value", pch="*", col="blue")

Student Accounts
Group                            Account
Rune, Louise, Birgitte, David    msc36
Iben, Shzeena, Lizette           msc37
Line, Luise, Anna, Kristine      msc38
Nina, Allan, Marianne, Ole       msc39
Christian                        msc40

Password:

Logging on – Windows users
• Start the program ssh, it should be on your desktop
• Click on “Quick Connect”
• Enter the following:
  – Host name: genome.cbs.dtu.dk
  – User Name: mscXX (XX=Your account number)
  – Port Number: 22
  – Authentication Method:
Key commands:
  – ls          #lists the files in the current directory
  – cd dir      #changes working directory to ‘dir’
  – mkdir dir   #makes a new directory called ‘dir’

Nice tricks:
  – The shorthand ‘ll’ is short for ‘ls -l’
  – The asterisk ‘*’ is a wildcard: ‘ls *.txt’ will list all files ending with ‘.txt’
  – ‘cd ..’ takes you up one directory
  – Plain ‘cd’ (no arguments) takes you to your home directory

Starting with R
Just type ‘R’ on the command line
How to get help:
> help.start()        #Opens browser
> help()              #For more on using help
> help(..)            #For help on ..
> help.search("..")   #To search for ..
How to leave again:
> q()   #Workspace image can be saved to .RData

Basic R commands
Most arithmetic operators work like you would expect in R:
> 4 + 2   #Prints ‘6’
> 3 * 4   #Prints ‘12’

Operators have precedence as known from basic algebra:
> 1 + 2 * 4     #Prints ‘9’, while
> (1 + 2) * 4   #Prints ‘12’

Functions
A function call in R looks like this:
  – function_name(arguments)
  – Examples:
> cos(pi/3)   #Prints ‘0.5’
> exp(1)      #Prints ‘2.718282’
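A few more calls in the same form, as a small sketch. The functions `sqrt()` and `round()` are standard R functions that do not appear on the slide; they are used here only to show positional and named arguments.

```r
# Positional argument: the value is matched to the first parameter.
sqrt(16)                     # 4

# Named argument: 'digits' is a parameter of round().
round(exp(1), digits = 3)    # 2.718

# A function call can be used anywhere a value is expected.
cos(pi / 3) + sqrt(16)       # 4.5
```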
A function is identified in R by the parentheses
  – That’s why it’s: help(), and not: help

Variables (Objects) in R
To assign a value to a variable (object):
> x <- 4       #Assigns 4 to x
> x = 4        #Assigns 4 to x (new)
> x            #Prints ‘4’
> y <- x + 2   #Assigns 6 to y
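A short sketch of what assignment does, assuming a fresh session. The variable `z` is introduced only for illustration. Note that `y` stores the value computed at assignment time, so changing `x` later does not update it.

```r
x <- 4            # assign with the arrow operator
z = 4             # assign with the equals sign
identical(x, z)   # TRUE: both hold the same value

y <- x + 2        # y is computed once from the current value of x
x <- 10           # reassigning x afterwards ...
y                 # ... leaves y unchanged at 6
```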
Functions for managing variables:
  – ls() or objects() lists all existing objects
  – str(x) tells the structure (type) of object ‘x’
  – rm(x) removes (deletes) the object ‘x’

Vectors in R
A vector in R is a sequence of elements of the same mode.
> x <- 1:10              #Creates a vector
> y <- c("a","b","c")    #So does this

Handy functions for vectors:
  – c() Concatenates arguments into a vector
  – min() Returns the smallest value in vector
  – max() Returns the largest value in vector
  – mean() Returns the mean of the vector

More on Vectors
Elements in a vector can be accessed individually:
> x[1]        #Prints first element
> x[1:10]     #Prints first 10 elements
> x[c(1,3)]   #Prints element 1 and 3
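The vector functions and indexing forms can be tried together on a small vector. This is a sketch with made-up values; the last two lines (indexing by a logical test and by a negative number) go slightly beyond the slide and are standard R behaviour.

```r
x <- c(12, 7, 3, 18, 5)   # example values, made up for illustration

min(x)        # 3
max(x)        # 18
mean(x)       # 9

x[1]          # 12
x[c(1, 3)]    # 12 3
x[2:4]        # 7 3 18
x[x > 6]      # 12 7 18: a logical test also works as an index
x[-1]         # 7 3 18 5: a negative index drops that element
```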
Most functions expect one vector as argument, rather than individual numbers
> mean(1,2,3)      #Replies ‘1’: only the first argument is averaged, 2 and 3 are taken as other options of mean()
> mean(c(1,2,3))   #Replies ‘2’

Graphics and Visualization
Visualization is one of R’s strong points.
R has many functions for drawing graphs, including:
  – plot(x,y) Draws a basic xy plot of x against y
  – hist(x) Draws a histogram of values in x
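A minimal sketch of both functions on simulated data. The call `pdf(NULL)` is an assumption made so the example runs without a display; in an interactive session it and `dev.off()` can be left out and the plots appear in a window.

```r
set.seed(42)
x <- rnorm(100)             # simulated values
y <- 2 * x + rnorm(100)     # y depends on x plus noise

pdf(NULL)                   # off-screen device, no file is written
plot(x, y, main = "Scatter plot", xlab = "x", ylab = "y")
h <- hist(x, main = "Histogram of x")   # hist() also returns the bin counts
invisible(dev.off())        # close the device

sum(h$counts)               # 100: every value falls into one bin
```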
Adding stuff to plots
  – points(x,y) Add point (x,y) to existing graph.
  – lines(x,y) Connect points with line.
  – text(x,y,str) Writes string at (x,y).

Graphical Devices in R
A graphical device is what ‘displays’ the graph. It can be a window, it can be the printer.
Functions for plotting “Devices”:
  – X11() This function allows you to change the size and composition of the plotting window.
  – par(mfrow=c(x,y)) Splits a plotting device into x rows and y columns.
  – dev.print(postscript, file="???.ps") Use this device to save the plot to a file.
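A sketch of splitting a device and saving the result to a file. As a swap from the slide, it opens the `postscript()` device directly instead of copying an on-screen plot with `dev.print()`, so that it also runs without a window. The file name comes from `tempfile()` and is only a placeholder.

```r
out.file <- tempfile(fileext = ".ps")   # placeholder output location
postscript(out.file)                    # open a file device instead of a window
par(mfrow = c(1, 2))                    # 1 row, 2 columns of plots

x <- seq(0, 2 * pi, length.out = 50)
plot(x, sin(x), main = "sin")           # first panel
lines(x, sin(x))                        # connect the points with a line
plot(x, cos(x), main = "cos")           # second panel
text(pi, 0.5, "cos(x)")                 # write a label at (pi, 0.5)

invisible(dev.off())                    # close the device; the file is written
file.exists(out.file)                   # TRUE
```

In an interactive session the same plotting calls can be made in a window first, followed by the dev.print() call from the list above to copy the result to a file.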