Unix Essentials: Hands-on Parsing HBI array data
Goal: Process a gene expression file to find genes of interest, sort by expression value, and subset the data for further investigation.
0) In your browser open the page: http://jura.wi.mit.edu/bio/education/hot_topics/unix_essentials_2016/UnixEssentials_HandsOn.txt
You can copy and paste the commands from that page as we need them. In this file, all commands are in red.
1) Log into tak. See handouts.
2) Go to the BaRC training folder: cd /nfs/BaRC_training
Create a folder named after your login with the mkdir command: mkdir your_login_name
Then go to the directory that you just created: cd your_login_name
[Note: Replace your_login_name with your tak login name]
Check where you are: pwd
3) Copy the HBI data we will be working with: cp ../HBI.partial.txt .
# If you are following these instructions after the Hot Topics session is over, use this command instead:
#cp /nfs/BaRC_Public/Hot_Topics/Unix_Essentials_Oct2016/HBI.partial.txt .
4) View the file in your favorite editor. What is the first field?
Command: gedit HBI.partial.txt &
Without an X Window display, you can instead run: more HBI.partial.txt
or: head -1 HBI.partial.txt | cut -f1
Answer: Gene
5) How many genes are in the HBI data? [Note: wc -l also counts the header line]
Command: wc -l HBI.partial.txt
Answer: 999
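Because wc -l counts every line, a file with a header row reports one more line than it has data rows. The sketch below shows this on a made-up table; the file name toy_count.txt and its contents are invented for illustration and are not taken from HBI.partial.txt.

```shell
# Build a small tab-delimited table: 1 header line + 3 gene rows (toy data).
printf 'Gene\tTissueA\nGENE1\t0.5\nGENE2\t-0.2\nGENE3\t1.1\n' > toy_count.txt

# wc -l counts every line, header included.
# tr strips the padding that some wc implementations add around the number.
total_lines=$(wc -l < toy_count.txt | tr -d ' ')

# tail -n +2 starts printing at line 2, so the header is skipped.
gene_rows=$(tail -n +2 toy_count.txt | wc -l | tr -d ' ')

echo "lines: $total_lines, gene rows: $gene_rows"
```

Reading the file through < makes wc print only the number, without the file name.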
6) Get the first column and columns 20-22, and write them to a file called HBI.partial.new.txt. Use this new file for the rest of the questions.
Command: cut -f 1,20-22 HBI.partial.txt > HBI.partial.new.txt
7) What tissues are included in the new file?
Command: head -1 HBI.partial.new.txt
Answer: Brain (Ganglia)
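The column selection in step 6 can be tried on a much smaller table first. This is a minimal sketch with invented data: the file names toy_cut.txt and toy_cut.new.txt and the column names T1 to T4 are assumptions made for the example.

```shell
# Toy tab-delimited table with five columns (made-up names and values).
printf 'Gene\tT1\tT2\tT3\tT4\nGENE1\t1\t2\t3\t4\nGENE2\t5\t6\t7\t8\n' > toy_cut.txt

# Keep column 1 plus columns 3-4; cut splits on tabs by default.
cut -f 1,3-4 toy_cut.txt > toy_cut.new.txt

# The header of the new file lists the columns that were kept.
header=$(head -1 toy_cut.new.txt)
echo "$header"
```

A comma separates individual fields and a dash gives a range, so 1,3-4 here plays the same role as 1,20-22 in the real command.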
8) Are there any duplicate genes? [Hint: uniq needs a sorted list]
Command: cut -f 1 HBI.partial.new.txt | sort | uniq -d
Answer: No
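The hint in step 8 matters because uniq only compares adjacent lines. A small sketch on an invented gene list (toy_genes.txt is a made-up file) shows what happens with and without sort:

```shell
# Toy gene list in which GENE1 appears twice, but not on adjacent lines.
printf 'GENE1\nGENE2\nGENE1\nGENE3\n' > toy_genes.txt

# uniq only compares neighbouring lines, so the unsorted list looks duplicate-free.
unsorted_dups=$(uniq -d toy_genes.txt)

# Sorting first places the repeated names next to each other.
sorted_dups=$(sort toy_genes.txt | uniq -d)

echo "without sort: '$unsorted_dups'"
echo "with sort:    '$sorted_dups'"
```

Only the sorted pipeline reports GENE1, so an empty result from uniq -d is meaningful only when the input was sorted.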
9) Sort the expression values based on the second column. Which gene has the highest expression level?
Command: sort -k 2,2gr HBI.partial.new.txt | head
Answer: LOC100507311 1.7393
[Note: compare the output of the sort options -g (general numeric sort) and -n (numeric sort)]
sort -k 2,2gr HBI.partial.new.txt | cut -f1,2 | head
sort -k 2,2nr HBI.partial.new.txt | cut -f1,2 | head
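One well-known difference between the two options is how they read scientific notation: -g understands a value such as 1e3 as 1000, while -n stops at the first character that is not part of a plain decimal number and reads it as 1. The sketch below uses invented values in a made-up file, toy_expr.txt, and is not a claim about which values occur in the HBI data.

```shell
# Toy table: the second column mixes plain numbers and scientific notation.
printf 'GENE_A\t2\nGENE_B\t1e3\nGENE_C\t30\n' > toy_expr.txt

# General numeric sort, descending: 1e3 is read as 1000 and comes first.
top_g=$(sort -k 2,2gr toy_expr.txt | head -1 | cut -f 1)

# Plain numeric sort, descending: 1e3 is read as 1, so 30 comes first.
top_n=$(sort -k 2,2nr toy_expr.txt | head -1 | cut -f 1)

echo "top with -g: $top_g"
echo "top with -n: $top_n"
```

When a column may contain exponents, -g is the safer choice; -n is usually faster on plain integers and decimals.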
10) Get all the genes whose names begin with "ZNF" from HBI.partial.new.txt, and output them to a new file. Make sure to include the header line by writing just the header to the new file first. [Hint: use grep]
Command: head -1 HBI.partial.new.txt > ZNF_genes.txt
grep "^ZNF" HBI.partial.new.txt >> ZNF_genes.txt
11) At the end of the class, make a folder with your name in your lab folder and copy all the material to it. Refer to the handout for the location of your lab folder on tak.
These are example commands:
mkdir /lab/PI_name_lab/username
mkdir /lab/PI_name_lab/username/unix_essentials_class
cp -r * /lab/PI_name_lab/username/unix_essentials_class
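The copy in step 11 can be rehearsed safely before touching the real lab folder. This is a sketch under stated assumptions: it runs entirely inside a temporary directory, and the sandbox, class_dir, and dest names are hypothetical stand-ins for your class folder and lab path. It also uses mkdir -p, which creates the nested folders in one command instead of two.

```shell
# Work in a throwaway directory so nothing real is touched.
sandbox=$(mktemp -d)
mkdir "$sandbox/class_dir"
printf 'Gene\tT1\nZNF1\t0.3\n' > "$sandbox/class_dir/ZNF_genes.txt"

# Hypothetical destination path standing in for the lab folder.
dest="$sandbox/lab/PI_name_lab/username/unix_essentials_class"

# -p creates any missing parent directories in one step.
mkdir -p "$dest"

# -r copies directories recursively; here it copies every item in class_dir.
cp -r "$sandbox/class_dir"/* "$dest"

ls "$dest"
```

Once the listing looks right, the same two commands can be run with the real lab path from the handout.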