Introduction to UNIX and R Microarray Analysis in a Multi-User Environment
Course web page
http://www.cbs.dtu.dk/dtucourse/data.php
– Course program
– Lecture Slides
– Exercises
– Project
– Data Sets
– Link to the GenePublisher tool

What do you need to know?
• This is not a course on computers
• But you will need some UNIX for the exercises, and for your final project
• You will also need to know some R to handle the exercises
• But GenePublisher will handle R for you during the projects

Microarray Processing Pipeline
– Question/Experimental Design
– Array Design/Probe Design
– Buy Chip or Array
– Sample Preparation/Hybridization
– Image Analysis
– Normalization (Scaling)
– Expression Index Calculation
– Comparable Gene Expression Data
– Statistical Analysis
– Advanced Data Analysis: Clustering, PCA, Classification, Promoter Analysis, Regulatory Network

What is UNIX?
• UNIX is not just one operating system, but a collection of many different systems sharing a common interface
• It has evolved alongside the internet and is geared toward multi-user environments and big multiprocessor machines like our servers
• At its heart, UNIX is a text-based operating system – that means a little typing is required
• UNIX Lives! (Particularly Linux!)

The R Project for Statistical Computing
R is an interpreted computer language

random <- sample(c(9:16))
sample.name <- c(rep("+",4),rep("-",4))
fold.m <- Norm.Int.m[,9:12]/Norm.Int.m[,13:16]
fold.mean.m <- apply(fold.m,1,function(x){mean(x,na.rm=TRUE)})
log.fold.mean.m <- log2(fold.mean.m)
permuted.m <- Norm.Int.m[,c(1:8,random)]
pVal.permuted.TF <- get.pval.ttest(permuted.m,9:12,13:16)
fold.permuted.m <- permuted.m[,9:12]/permuted.m[,13:16]
log.fold.permuted.mean.m <- log2(apply(fold.permuted.m,1,function(x){mean(x,na.rm=TRUE)}))
plot(log.fold.mean.m,pVal.TF, main="Volcano Plot", log="y",
     xlab="M (log2 fold change)", ylab="P-value", pch="*", col="blue")

Student Accounts
Group                            Account
Rune, Louise, Birgitte, David    msc36
Iben, Shzeena, Lizette           msc37
Line, Luise, Anna, Kristine      msc38
Nina, Allan, Marianne, Ole       msc39
Christian                        msc40
Password:

Logging on – Windows users
• Start the program ssh, it should be on your desktop
• Click on “Quick Connect”
• Enter the following:
  – Host name: genome.cbs.dtu.dk
  – User Name: mscXX (XX=Your account number)
  – Port Number: 22
  – Authentication Method: <Profile Settings>
• You will then be prompted for your password

Navigating Directories
Key commands:
– ls           #lists the files in the current directory
– cd dir       #changes working directory to ‘dir’
– mkdir dir    #makes a new directory called ‘dir’
Nice tricks:
– The shorthand ‘ll’ is short for ‘ls -l’
– The asterisk ‘*’ is a wildcard
– ‘ls *.txt’ will list all files ending with ‘.txt’
– ‘cd ..’ takes you back one directory
– plain ‘cd’ (no arguments) takes you to your home directory

Starting with R
Just type ‘R’ on the command line
How to get help:
> help.start()         #Opens browser
> help()               #For more on using help
> help(..)             #For help on ..
> help.search("..")    #To search for ..
How to leave again:
> q()                  #Image can be saved to .RData

Basic R commands
Most arithmetic operators work like you would expect in R:
> 4 + 2          #Prints ‘6’
> 3 * 4          #Prints ‘12’
Operators have precedence as known from basic algebra:
> 1 + 2 * 4      #Prints ‘9’, while
> (1 + 2) * 4    #Prints ‘12’

Functions
A function call in R looks like this:
– function_name(arguments)
– Examples:
> cos(pi/3)      #Prints ‘0.5’
> exp(1)         #Prints ‘2.718282’
A function is identified in R by the parentheses
– That’s why it’s help(), and not help

Variables (Objects) in R
To assign a value to a variable (object):
> x <- 4         #Assigns 4 to x
> x = 4          #Assigns 4 to x (new)
> x              #Prints ‘4’
> y <- x + 2     #Assigns 6 to y
Functions for managing variables:
– ls() or objects() lists all existing objects
– str(x) tells the structure (type) of object ‘x’
– rm(x) removes (deletes) the object ‘x’

Vectors in R
A vector in R is a sequence of elements of the same mode.
> x <- 1:10              #Creates a vector
> y <- c("a","b","c")    #So does this
Handy functions for vectors:
– c()    – Concatenates arguments into a vector
– min()  – Returns the smallest value in vector
– max()  – Returns the largest value in vector
– mean() – Returns the mean of the vector

More on Vectors
Elements in a vector can be accessed individually:
> x[1]           #Prints first element
> x[1:10]        #Prints first 10 elements
> x[c(1,3)]      #Prints elements 1 and 3
Most functions expect one vector as argument, rather than individual numbers:
> mean(1,2,3)    #Replies ‘1’ (only the first argument is treated as data)
> mean(c(1,2,3)) #Replies ‘2’

Graphics and Visualization
Visualization is one of R’s strong points.
R has many functions for drawing graphs, including:
– plot(x,y) – Draws a basic xy plot of x against y
– hist(x)   – Draws a histogram of values in x
Adding stuff to plots:
– points(x,y)   – Adds point (x,y) to existing graph
– lines(x,y)    – Connects points with a line
– text(x,y,str) – Writes string at (x,y)

Graphical Devices in R
A graphical device is what ‘displays’ the graph.
It can be a window, it can be the printer.
Functions for plotting “Devices”:
– X11() – This function allows you to change the size and composition of the plotting window.
– par(mfrow=c(x,y)) – Splits a plotting device into x rows and y columns.
– dev.print(postscript, file="???.ps") – Use this device to save the plot to a file.
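
Example: a two-panel plot saved to a file
The device functions above can be sketched as follows. This is a minimal illustration, not part of the course exercises: the data (x, y) are made-up random numbers and the file name is hypothetical. It opens a postscript device directly rather than copying from an X11 window with dev.print, on the assumption that this is the safer choice on a server session without a display.

```r
# Made-up example data (not from the course data sets)
set.seed(1)
x <- rnorm(100)
y <- x + rnorm(100, sd = 0.5)

# Hypothetical file name, written to R's temporary directory
out_file <- file.path(tempdir(), "two_panels.ps")

postscript(file = out_file)        # open a file device instead of a window
par(mfrow = c(1, 2))               # split the device into 1 row and 2 columns
plot(x, y, main = "x against y")   # left panel: xy plot
hist(x, main = "Histogram of x")   # right panel: histogram
dev.off()                          # close the device so the file is written

file.exists(out_file)              # check that the file was created
```

In an interactive session with a display, replacing the postscript() call with X11() should show the same two panels in a window instead.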