Introduction to Linux
Getting software
Installing new software
• Downloading – wget http://
• bz2 - Bzip2 compressed archive file
• gz - GNU Zipped Archive File
• gzip - GNU Zipped File
• tar - Consolidated Unix File Archive
• tgz - Gzipped Tar File
• zip - ZIP compression
Getting the file out
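A minimal sketch of a full round trip for a gzipped tar archive, assuming GNU or BSD tar is installed. The directory and file names (project, readme.txt) are made up for illustration; everything happens in a temporary directory.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Create a small directory and pack it (c = create, z = gzip, f = archive file).
mkdir "$workdir/project"
echo "hello" > "$workdir/project/readme.txt"
tar czf "$workdir/project.tar.gz" -C "$workdir" project

# List the archive contents without unpacking (t = list).
listing=$(tar tzf "$workdir/project.tar.gz")

# Unpack into a separate directory (x = extract, -C = target directory).
mkdir "$workdir/unpacked"
tar xzf "$workdir/project.tar.gz" -C "$workdir/unpacked"
content=$(cat "$workdir/unpacked/project/readme.txt")
echo "$content"
```

Listing an unfamiliar archive with t before extracting it with x shows whether it unpacks into its own directory or scatters files into the current one.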
• tar xzf file.tar.gz -> file
• gunzip file.gz -> file
• tar xf file.tar -> file
• tar xzf file.tgz -> file
• tar xjf file.tar.bz2 -> file
• bunzip2 file.bz2 -> file
• rar x file.rar -> file
• tar xjf file.tbz2 -> file
• unzip file.zip -> file
Compiling
Normal procedure
• ./configure --prefix=$HOME
• make
• make install
To remove all but the source files
• make clean
Permissions
• chmod - change file modes
  – Permissions are Read, Write, eXecute
  – People affected are User, Group, Others
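A small sketch of how symbolic modes change the permission string shown by ls -l. The file name script.sh is hypothetical, and the file lives in a temporary directory. Separate clauses for different people are joined with a comma.

```shell
workdir=$(mktemp -d)
file="$workdir/script.sh"
echo 'echo hi' > "$file"

# Start from a known state: read/write for the user, read-only for group and others.
chmod 644 "$file"
before=$(ls -l "$file" | cut -c1-10)

# Add execute for the user (u+x) and strip everything from others (o-rwx).
chmod u+x,o-rwx "$file"
after=$(ls -l "$file" | cut -c1-10)

echo "$before -> $after"
# prints: -rw-r--r-- -> -rwxr-----
```

The ten characters are the file type followed by three rwx triplets, in the order user, group, others.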
Example
chmod u+x,o-rwx file
removes all permissions on the file named file for other users and makes it executable for the owner.
Linking
• In the wrong place
• Too large to move or copy
ln -s
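A minimal sketch of a symbolic link standing in for a file that lives elsewhere. The directory names data and analysis and the file reads.txt are assumptions for illustration only.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/data" "$workdir/analysis"
echo "ACGT" > "$workdir/data/reads.txt"

# ln -s TARGET LINKNAME: create a link in analysis/ that points at the real file.
ln -s "$workdir/data/reads.txt" "$workdir/analysis/reads.txt"

# Reading through the link returns the content of the original file.
via_link=$(cat "$workdir/analysis/reads.txt")

# readlink shows where the link points; [ -L ] tests whether a path is a symlink.
target=$(readlink "$workdir/analysis/reads.txt")
if [ -L "$workdir/analysis/reads.txt" ]; then is_link=yes; else is_link=no; fi

echo "$via_link $is_link"
```

Only the link is created in analysis/; the data itself is neither moved nor copied, which is the point when the file is large.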
• Putting output to next program: |
• Putting output into a file: >
• Putting output at the end of a file (append): >>
Part II
Common file types in bioinformatics
FASTA raw data
• Line 1: Single line description, marked by “>”
• Line 2…n: Lines of sequence data
Example
>gi|31563518|ref|NP_852610.1
MKMRFFSSPCGKAAVDPADRCKEVQQIRDQHPS
KIPVIIERYKGEKQLPVLDKTKFLPDHVNMSEL
VKIIRRRLQLNPTQAFFLLVNQHSMVSVSTPIA
DIYEQEKDEDGFLYMVYASQETFGF
FASTQ raw data with quality
• Line 1: @seqName (description – optional)
• Line 2: Sequence
• Line 3: +
• Line 4: Quality scores (Phred scores encoded as ASCII characters, one per base)
Example:
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCAT
+
!''*((((***+))%++)(%%).1*+*''))**55CCF>>>C
SAM/BAM alignment data (text or binary)
• Header Section
• Aligned Reads
• Tab delimited
Examples
Header Section
@HD VN:1.0 SO:unsorted
@SQ SN:O_volvulusOVOC_OM1a LN:2816604
@SQ SN:O_volvulusOVOC_OM1b LN:28345163
@SQ SN:O_volvulusOVOC_OM2 LN:25485961
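As a rough illustration of reading such a header, the sketch below writes the four header lines above to a temporary file with real tab characters and then counts the reference sequences and sums their lengths. It assumes grep and awk are available; the variable names are arbitrary.

```shell
sam=$(mktemp)
# Header lines from the example, written with explicit tabs between fields.
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$sam"
printf '@SQ\tSN:O_volvulusOVOC_OM1a\tLN:2816604\n' >> "$sam"
printf '@SQ\tSN:O_volvulusOVOC_OM1b\tLN:28345163\n' >> "$sam"
printf '@SQ\tSN:O_volvulusOVOC_OM2\tLN:25485961\n' >> "$sam"

# Each @SQ line describes one reference sequence.
n_refs=$(grep -c '^@SQ' "$sam")

# Sum the LN: tags (sequence lengths) over all @SQ lines.
total_len=$(awk -F'\t' '
  $1 == "@SQ" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^LN:/) sum += substr($i, 4)   # drop the "LN:" prefix
  }
  END { printf "%d\n", sum }' "$sam")

echo "$n_refs references, $total_len bases"
```

Searching for the LN: tag by name rather than by column position avoids relying on the order of tags within a header line.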
Aligned Reads
M01137:130:00-A:17009:1352/1 4 * 0 0 * * 0 0 AGCAAAATACAACGATCTGGATGGTAGCATTAGCGATGCGACACTGCTTGAACCGTCAAAG FGGFGCFGFFGC8,,@D?E6EFCF,=AEFFGGDGGGADFGG@>FFEGGG:+<7D>AFCFGG YT:Z:UU
VCF variation data
• Header with meta-data
• Body with variant information
• Tab delimited
Example
Header
##fileformat=VCFv4.0
##source=myImputationProgramV3.1
##FILTER=
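One way to tell header and body apart on the command line is by the leading "#" characters: meta-data lines start with "##", the column header starts with a single "#", and records have none. The sketch below assumes a cut-down file containing the first two header lines above plus the first eight columns of the two body records from this section; the sample columns are left out only to keep it short.

```shell
vcf=$(mktemp)
printf '##fileformat=VCFv4.0\n' > "$vcf"
printf '##source=myImputationProgramV3.1\n' >> "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$vcf"
printf '20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\n' >> "$vcf"
printf '20\t17330\t.\tT\tA\t3\tq10\tNS=3;DP=11;AF=0.017\n' >> "$vcf"

# Meta-data lines start with "##".
n_meta=$(grep -c '^##' "$vcf")

# Records are every line that does not start with "#".
n_records=$(grep -v -c '^#' "$vcf")

# Keep only records whose FILTER column (7) is PASS and print CHROM:POS.
pass_sites=$(awk -F'\t' '!/^#/ && $7 == "PASS" { print $1 ":" $2 }' "$vcf")

echo "$n_meta meta lines, $n_records records, PASS: $pass_sites"
```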
Body
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
GFF annotation data
• One line = one feature
• Tab delimited
Example
scaffold1 EVM gene 7116 13890 . - . ID=gene1;Name=Name1
scaffold1 EVM mRNA 7116 13890 . - . ID=model1;Parent=gene1
scaffold1 EVM exon 13833 13890 . - . ID=exon1;Parent=model1
scaffold1 EVM CDS 13833 13890 . - 0 ID=cds1;Parent=model1
scaffold1 EVM exon 13434 13664 . - . ID=exon2;Parent=model1
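Because every feature is one tab-delimited line, awk can summarise a GFF file directly. The sketch below rebuilds the five example lines with real tabs in a temporary file, then counts the exons and adds up their lengths. It assumes the usual GFF convention that start and end are 1-based and inclusive, so a feature's length is end - start + 1.

```shell
gff=$(mktemp)
printf 'scaffold1\tEVM\tgene\t7116\t13890\t.\t-\t.\tID=gene1;Name=Name1\n' > "$gff"
printf 'scaffold1\tEVM\tmRNA\t7116\t13890\t.\t-\t.\tID=model1;Parent=gene1\n' >> "$gff"
printf 'scaffold1\tEVM\texon\t13833\t13890\t.\t-\t.\tID=exon1;Parent=model1\n' >> "$gff"
printf 'scaffold1\tEVM\tCDS\t13833\t13890\t.\t-\t0\tID=cds1;Parent=model1\n' >> "$gff"
printf 'scaffold1\tEVM\texon\t13434\t13664\t.\t-\t.\tID=exon2;Parent=model1\n' >> "$gff"

# Column 3 holds the feature type; count the lines where it is "exon".
n_exons=$(awk -F'\t' '$3 == "exon" { n++ } END { printf "%d\n", n }' "$gff")

# Columns 4 and 5 are start and end; sum the inclusive lengths of all exons.
exon_len=$(awk -F'\t' '$3 == "exon" { sum += $5 - $4 + 1 } END { printf "%d\n", sum }' "$gff")

echo "$n_exons exons, $exon_len bp"
# prints: 2 exons, 289 bp
```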