
Unix Essentials: Hands-on Parsing HBI array data

Goal: Process a gene expression file to find genes of interest, sort by expression values, and subset the data for further investigation.

0) In your browser open the page: http://jura.wi.mit.edu/bio/education/hot_topics/unix_essentials_2016/UnixEssentials_HandsOn.txt

You can copy the commands from that page as we need them. In this file all commands are in red.

1) Log in to tak. See handouts.

2) Go to BaRC’s training folder: cd /nfs/BaRC_training

Create a folder with your login name using a command such as mkdir your_login_name. Then go to the directory that you just created with cd your_login_name. [Note: Replace your_login_name with your tak login name.]

Check where you are with pwd

3) Copy the HBI data we will be working with: cp ../HBI.partial.txt .

# If you are following these instructions after the Hot Topics session is over, use this command instead: #cp /nfs/BaRC_Public/Hot_Topics/Unix_Essentials_Oct2016/HBI.partial.txt .

4) View the file in your favorite editor. What is the first field?

Command: gedit HBI.partial.txt &

Without an X window, you can page through the file (for example with more HBI.partial.txt) or print only the first field of the header line: head -1 HBI.partial.txt | cut -f1

Answer: Gene

5) How many genes are in the HBI data? [Note: the file has a header line]

Command: wc -l HBI.partial.txt

 Answer: 999
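The header note in step 5 matters because wc -l counts every line, the header included. A minimal sketch on a toy table (the file name toy.txt and its values are made up for illustration) shows one way to count only the gene rows:

```shell
# Toy tab-delimited table (hypothetical values): one header line plus three gene rows.
tmpdir=$(mktemp -d)
printf 'Gene\tTissueA\nGENE1\t0.5\nGENE2\t1.2\nGENE3\t-0.3\n' > "$tmpdir/toy.txt"

# wc -l counts every line, header included.
total_lines=$(wc -l < "$tmpdir/toy.txt")

# tail -n +2 starts printing at line 2, so the header is skipped before counting.
gene_rows=$(tail -n +2 "$tmpdir/toy.txt" | wc -l)

echo "lines: $total_lines, gene rows: $gene_rows"
rm -r "$tmpdir"
```

The same tail -n +2 pipe can be put in front of wc -l for HBI.partial.txt if you want the count without the header.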

6) Get the first column and columns 20-22, and output them to a file called HBI.partial.new.txt. Use this new file for the rest of the questions.

Command: cut -f 1,20-22 HBI.partial.txt > HBI.partial.new.txt

7) What tissues are included in the new file?

Command: head -1 HBI.partial.new.txt

Answer: Brain (Ganglia)
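To see what cut -f keeps, it can help to try it on a small table first. The sketch below uses a made-up five-column file (toy.txt, with hypothetical column names T1 to T4) and selects field 1 plus a range, the same pattern as cut -f 1,20-22:

```shell
tmpdir=$(mktemp -d)
# Toy table with five tab-separated columns (hypothetical names and values).
printf 'Gene\tT1\tT2\tT3\tT4\nGENE1\t1\t2\t3\t4\nGENE2\t5\t6\t7\t8\n' > "$tmpdir/toy.txt"

# Keep field 1 plus the range 3-4; cut splits on tabs by default.
cut -f 1,3-4 "$tmpdir/toy.txt" > "$tmpdir/toy.new.txt"

# The first line of the new file shows which columns survived.
header=$(head -1 "$tmpdir/toy.new.txt")
echo "$header"
rm -r "$tmpdir"
```

Only the Gene, T2 and T3 columns remain in the new file, so its header line is a quick check that the right fields were selected.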

8) Are there any duplicate genes? [Hint: needs a sorted list]

Command: cut -f 1 HBI.partial.new.txt | sort | uniq -d

Answer: No
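The hint in step 8 is there because uniq only compares neighbouring lines, so repeated names that are not next to each other go unnoticed. A small sketch with an invented table (toy.txt, where GENE2 appears twice but not on adjacent lines) shows the difference:

```shell
tmpdir=$(mktemp -d)
# Hypothetical data: GENE2 occurs twice, separated by another gene.
printf 'Gene\tT1\nGENE2\t0.1\nGENE1\t0.4\nGENE2\t0.9\n' > "$tmpdir/toy.txt"

# Without sort, the two GENE2 lines are not adjacent, so uniq -d reports nothing.
unsorted=$(cut -f 1 "$tmpdir/toy.txt" | uniq -d)

# After sort, identical names sit next to each other and uniq -d prints one copy of each.
sorted=$(cut -f 1 "$tmpdir/toy.txt" | sort | uniq -d)

echo "without sort: '$unsorted'"
echo "with sort:    '$sorted'"
rm -r "$tmpdir"
```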

9) Sort the expression values based on the second column. Which gene has the highest expression level?

Command: sort -k 2,2gr HBI.partial.new.txt | head

Answer: LOC100507311 1.7393

[Note the difference between the sort options -g (general numeric sort) and -n (numeric sort).]

sort -k 2,2gr HBI.partial.new.txt | cut -f1,2 | head

sort -k 2,2nr HBI.partial.new.txt | cut -f1,2 | head
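One case where -g and -n order values differently is scientific notation: -n stops reading at the first character that is not part of a plain decimal number, while -g parses the full floating-point value. The sketch below assumes GNU sort (as found on a typical Linux server) and uses invented values in a made-up file toy.txt:

```shell
tmpdir=$(mktemp -d)
# Hypothetical values; the middle one uses scientific notation (1e3 = 1000).
printf 'A\t2\nB\t1e3\nC\t10\n' > "$tmpdir/toy.txt"

# -n reads only the leading "1" of 1e3, so 10 ranks highest.
top_n=$(sort -k 2,2nr "$tmpdir/toy.txt" | head -1 | cut -f1)

# -g parses 1e3 as 1000, so it ranks highest.
top_g=$(sort -k 2,2gr "$tmpdir/toy.txt" | head -1 | cut -f1)

echo "top with -n: $top_n"
echo "top with -g: $top_g"
rm -r "$tmpdir"
```

If the expression values in a file are all plain decimals, the two options should agree; -g is the safer choice when exponents may be present, at the cost of being slower.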

10) Get all the genes that begin with "ZNF" from HBI.partial.new.txt and output them to a new file. Be sure to include the header line by writing just the header to the new file first. [Hint: use ]

Command: head -1 HBI.partial.new.txt > ZNF_genes.txt

grep "^ZNF" HBI.partial.new.txt >> ZNF_genes.txt

11) At the end of the class, make a folder with your name in your lab folder and copy all the material to it. Refer to the handout for the location of your lab folder on tak.

These are example commands:

mkdir /lab/PI_name_lab/username

mkdir /lab/PI_name_lab/username/unix_essentials_class

cp -r * /lab/PI_name_lab/username/unix_essentials_class
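As a possible shortcut, the two mkdir calls can be combined with the -p option, which creates any missing parent folders in one go. The sketch below is safe to run anywhere because it uses temporary directories as stand-ins for /lab and for your working folder; PI_name_lab and username remain placeholders, and the copied file is an empty example:

```shell
# Temporary stand-ins for the lab area and the class working folder.
lab_root=$(mktemp -d)
workdir=$(mktemp -d)
printf 'Gene\tT1\n' > "$workdir/HBI.partial.new.txt"

# PI_name_lab and username are placeholders, as in the example commands.
dest="$lab_root/PI_name_lab/username/unix_essentials_class"

# -p creates all missing parent folders and does not fail if they already exist.
mkdir -p "$dest"

# -r copies directories recursively; here it copies everything in the working folder.
cp -r "$workdir"/* "$dest"

copied=$(ls "$dest")
echo "$copied"
rm -r "$lab_root" "$workdir"
```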