Introduction to Linux

Getting software

Installing new software

• Downloading – wget http:///file copies the file to the current directory
• Unpacking – Getting the files out of the downloaded archive
• Compiling – Making the code into an executable program
• Running – Run your new program

Compressed File Extensions

• bz2 - bzip2 compressed archive file
• gz - GNU zipped file
• tar - Consolidated Unix file archive
• tgz - Gzipped tar file
• zip - ZIP compression

Getting the file out

• tar xzf file.tar.gz -> file
• gunzip file.gz -> file
• tar xf file.tar -> file
• tar xzf file.tgz -> file
• tar xjf file.tar.bz2 -> file
• bunzip2 file.bz2 -> file
• unrar x file.rar -> file
• tar xjf file.tbz2 -> file
• unzip file.zip -> file

Compiling

Normal procedure
• ./configure --prefix=$HOME
• make
• make install
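The three steps can be chained so that a failure in one stops the rest. This is a minimal sketch, not a general installer: `build_from_source` is a hypothetical helper name, and it assumes the unpacked source tree ships an autoconf-style `configure` script, which not every project does.

```shell
# Hypothetical helper: run the usual three steps inside an unpacked source tree.
build_from_source() {
    src_dir=$1

    # Stop early if the tree has no executable configure script.
    if [ ! -x "$src_dir/configure" ]; then
        echo "no configure script in $src_dir" >&2
        return 1
    fi

    # Run in a subshell so the caller's working directory is left unchanged.
    # "&&" runs the next step only if the previous one succeeded.
    (
        cd "$src_dir" &&
        ./configure --prefix="$HOME" &&
        make &&
        make install
    )
}
```

Called as `build_from_source ./some-unpacked-dir` (a placeholder path). With `--prefix=$HOME` the install step writes under the home directory, so it normally needs no administrator rights.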

To remove all but the source files
• make clean

Permissions

• chmod - change file modes
  – Permissions are Read, Write, eXecute
  – People affected are User, Group, Others
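The letters combine as who (u, g, o), an operator (+, -, =) and a permission (r, w, x). A small sketch on a throwaway file shows the effect of each step; the variable names are only for illustration.

```shell
# Work on a throwaway file so no real data is touched.
scratch=$(mktemp)

# Start from a known state: the owner may read and write, nobody else may do anything.
chmod u=rw,go= "$scratch"

# Add execute permission for the owner (User).
chmod u+x "$scratch"

# Give the Group read access; Others still have none.
chmod g+r "$scratch"

# Show the result; the mode string should read -rwxr-----.
ls -l "$scratch"

# Keep only the ten mode characters at the start of the listing.
mode=$(ls -l "$scratch" | cut -c1-10)

rm -f "$scratch"
```

Several changes can go into one call when they are separated by commas, as in the first `chmod` line.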

Example
chmod u+x,o-rwx file
removes all permissions on the file named file for other users and makes it executable for its owner.

Linking

• In the wrong place
• Too large to move or copy

ln -s

Redirection

• Putting output to the next program
• Putting output into a file
• Putting output at the end of a file (append)
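The three cases correspond to the operators `|`, `>` and `>>`. A short sketch on a scratch file, with made-up contents, shows each one in turn.

```shell
out=$(mktemp)

# ">" writes the output into a file, replacing whatever was there before.
printf 'banana\napple\n' > "$out"

# ">>" adds the output to the end of the same file.
printf 'cherry\n' >> "$out"

# "|" hands the output of one program to the next: sort the lines, keep the first.
first=$(sort "$out" | head -n 1)

# Pipes can be chained: count the lines, then strip any padding spaces from wc.
lines=$(cat "$out" | wc -l | tr -d ' ')

echo "$first"
echo "$lines"

rm -f "$out"
```

Using `>` a second time on the same file would have discarded the first two lines, which is why appending needs its own operator.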

| > >>

Part II

Common file types in bioinformatics

FASTA raw data
• Line 1: Single line description, marked by “>”
• Line 2…n: Lines of sequence data

Example
>gi|31563518|ref|NP_852610.1
MKMRFFSSPCGKAAVDPADRCKEVQQIRDQHPS
KIPVIIERYKGEKQLPVLDKTKFLPDHVNMSEL
VKIIRRRLQLNPTQAFFLLVNQHSMVSVSTPIA
DIYEQEKDEDGFLYMVYASQETFGF

FASTQ raw data with quality
• Line 1: @seqName (description – optional)
• Line 2: Sequence
• Line 3: +
• Line 4: Quality scores (Phred scores encoded as ASCII characters, one per base)

Example:
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCAT
+
!''*((((***+))%++)(%%).1*+*''))**55CCF>>>C

SAM/BAM alignment data (text or binary)
• Header Section
• Aligned Reads
• Tab delimited

Examples

Header Section
@HD VN:1.0 SO:unsorted
@SQ SN:O_volvulusOVOC_OM1a LN:2816604
@SQ SN:O_volvulusOVOC_OM1b LN:28345163
@SQ SN:O_volvulusOVOC_OM2 LN:25485961
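Because the header is tab delimited, ordinary text tools can read it. The sketch below rebuilds the header lines shown above in a scratch file and lists the reference sequences. It assumes the SN tag is the second field and the LN tag the third, as in this example; other files may order the tags differently.

```shell
sam=$(mktemp)

# Header lines copied from the example; printf turns \t into real tab characters.
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$sam"
printf '@SQ\tSN:O_volvulusOVOC_OM1a\tLN:2816604\n' >> "$sam"
printf '@SQ\tSN:O_volvulusOVOC_OM1b\tLN:28345163\n' >> "$sam"
printf '@SQ\tSN:O_volvulusOVOC_OM2\tLN:25485961\n' >> "$sam"

# For every @SQ line, strip the SN: and LN: prefixes and print name and length.
awk -F '\t' '$1 == "@SQ" { sub(/^SN:/, "", $2); sub(/^LN:/, "", $3); print $2, $3 }' "$sam"

# Count the reference sequences declared in the header.
n_refs=$(grep -c '^@SQ' "$sam")
echo "$n_refs"

rm -f "$sam"
```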

Aligned Reads
M01137:130:00-A:17009:1352/1 4 * 0 0 * * 0 0 AGCAAAATACAACGATCTGGATGGTAGCATTAGCGATGCGACACTGCTTGAACCGTCAAAG FGGFGCFGFFGC8,,@D?E6EFCF,=AEFFGGDGGGADFGG@>FFEGGG:+<7D>AFCFGG YT:Z:UU

VCF variation data
• Header with meta-data
• Body with variant information
• Tab delimited

Example

Header
##fileformat=VCFv4.0
##source=myImputationProgramV3.1
##FILTER=
##FORMAT=
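Meta-data lines start with `##` and the column header line with a single `#`, so the two parts of a VCF file can be told apart by prefix alone. This is a rough sketch on a scratch file: it reuses two of the header lines above and the two records from the body that follows, trimmed to the eight fixed columns to keep the lines short.

```shell
vcf=$(mktemp)

# Two meta-data lines, the column header, and two records; \t becomes a real tab.
printf '##fileformat=VCFv4.0\n' > "$vcf"
printf '##source=myImputationProgramV3.1\n' >> "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$vcf"
printf '20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\n' >> "$vcf"
printf '20\t17330\t.\tT\tA\t3\tq10\tNS=3;DP=11;AF=0.017\n' >> "$vcf"

# Lines starting with "##" are meta-data.
n_meta=$(grep -c '^##' "$vcf")

# Everything that does not start with "#" is a variant record.
n_records=$(grep -v -c '^#' "$vcf")

# Column 7 is FILTER: print the ID (column 3) of records that passed.
passed=$(grep -v '^#' "$vcf" | awk -F '\t' '$7 == "PASS" { print $3 }')

echo "$n_meta meta-data lines, $n_records records, passed: $passed"

rm -f "$vcf"
```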

Body
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3

GFF annotation data
• One line = one feature
• Tab delimited

Example
scaffold1 EVM gene 7116 13890 . - . ID=gene1;Name=Name1
scaffold1 EVM mRNA 7116 13890 . - . ID=model1;Parent=gene1
scaffold1 EVM exon 13833 13890 . - . ID=exon1;Parent=model1
scaffold1 EVM CDS 13833 13890 . - 0 ID=cds1;Parent=model1
scaffold1 EVM exon 13434 13664 . - . ID=exon2;Parent=model1
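With one feature per line and tabs between the columns, GFF lends itself to `awk`. The sketch below rebuilds the five example lines in a scratch file, tallies the feature types in column 3, and adds up the exon lengths from the start and end columns (4 and 5). The coordinates are 1-based and inclusive, hence the `+ 1`.

```shell
gff=$(mktemp)

# The five example features; printf turns \t into real tab characters.
printf 'scaffold1\tEVM\tgene\t7116\t13890\t.\t-\t.\tID=gene1;Name=Name1\n' > "$gff"
printf 'scaffold1\tEVM\tmRNA\t7116\t13890\t.\t-\t.\tID=model1;Parent=gene1\n' >> "$gff"
printf 'scaffold1\tEVM\texon\t13833\t13890\t.\t-\t.\tID=exon1;Parent=model1\n' >> "$gff"
printf 'scaffold1\tEVM\tCDS\t13833\t13890\t.\t-\t0\tID=cds1;Parent=model1\n' >> "$gff"
printf 'scaffold1\tEVM\texon\t13434\t13664\t.\t-\t.\tID=exon2;Parent=model1\n' >> "$gff"

# Tally how many features of each type (column 3) the file contains.
awk -F '\t' '{ count[$3]++ } END { for (t in count) print t, count[t] }' "$gff" | sort

# Number of exon features.
n_exons=$(awk -F '\t' '$3 == "exon"' "$gff" | wc -l | tr -d ' ')

# Total exon length: end - start + 1 for each exon line.
exon_total=$(awk -F '\t' '$3 == "exon" { total += $5 - $4 + 1 } END { print total }' "$gff")

echo "$n_exons exons, $exon_total bases"

rm -f "$gff"
```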