Introduction to Linux
Getting software

Installing new software
• Downloading – wget http://<url>/file copies the file into the current directory
• Unpacking – getting the files out of the downloaded package
• Compiling – turning the source code into an executable program
• Running – running your new program

Compressed File Extensions
• bz2 – bzip2-compressed file
• gz – GNU zip (gzip) compressed file
• gzip – GNU zip compressed file
• tar – consolidated Unix file archive
• tgz – gzipped tar file
• zip – ZIP-compressed archive

Getting the file out
• tar xzf file.tar.gz -> file
• gunzip file.gz -> file
• tar xf file.tar -> file
• tar xzf file.tgz -> file
• tar xjf file.tar.bz2 -> file
• bunzip2 file.bz2 -> file
• rar x file.rar -> file
• tar xjf file.tbz2 -> file
• unzip file.zip -> file

Compiling
Normal procedure
• ./configure --prefix=$HOME (install under your home directory, so no administrator rights are needed)
• make
• make install
To remove the compiled files and keep the source files
• make clean

Permissions
• chmod – change file modes
– Permissions are Read, Write and eXecute
– People affected are User (owner), Group and Others
Example
chmod u+x,o-rwx file
removes all permissions on file for other users and makes it executable for its owner.
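The unpacking commands can be tried safely on a throwaway archive. The sketch below assumes a POSIX shell with tar and gzip installed; the names demo and readme.txt are made up for illustration. It builds a small .tar.gz first so there is something to extract:

```shell
#!/bin/sh
# Sketch: create a tiny archive, then unpack it again.
# demo/ and readme.txt are hypothetical names used only for this example.
set -eu
cd "$(mktemp -d)"              # work in a throwaway directory

mkdir demo
echo "hello" > demo/readme.txt

tar czf demo.tar.gz demo       # c = create, z = gzip, f = archive file name
rm -r demo                     # delete the original so we can watch it come back

tar tzf demo.tar.gz            # t = list the contents without extracting
tar xzf demo.tar.gz            # x = extract; recreates demo/readme.txt
cat demo/readme.txt

gzip demo/readme.txt           # compresses in place to demo/readme.txt.gz
gunzip demo/readme.txt.gz      # restores demo/readme.txt
```

The other rows of the list follow the same pattern with GNU tar; for example, j selects bzip2 where z selects gzip.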
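A minimal sketch of the chmod example, assuming a POSIX shell; hello.sh is a hypothetical file name. Note that two symbolic changes given in one chmod call are separated by a comma (u+x,o-rwx):

```shell
#!/bin/sh
# Sketch of symbolic chmod; hello.sh is a made-up file name.
set -eu
cd "$(mktemp -d)"                  # work in a throwaway directory

printf '#!/bin/sh\necho "it runs"\n' > hello.sh
chmod 644 hello.sh                 # start from a known mode: rw-r--r--
ls -l hello.sh | cut -c1-10        # -rw-r--r--

chmod u+x,o-rwx hello.sh           # user gains x; others lose r, w and x
ls -l hello.sh | cut -c1-10        # -rwxr-----
```

Starting from mode 644, the result is the same as the numeric form chmod 740.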
Linking
Make a link to a file that is
• in the wrong place, or
• too large to move or copy
ln -s <existing file> <link name>

Redirection
• Putting output into the next program: |
• Putting output into a file (overwriting it): >
• Putting output at the end of a file (append): >>

Part II: Common file types in bioinformatics

FASTA (raw data)
• Line 1: single-line description, marked by ">"
• Lines 2…n: lines of sequence data
Example
>gi|31563518|ref|NP_852610.1
MKMRFFSSPCGKAAVDPADRCKEVQQIRDQHPS
KIPVIIERYKGEKQLPVLDKTKFLPDHVNMSEL
VKIIRRRLQLNPTQAFFLLVNQHSMVSVSTPIA
DIYEQEKDEDGFLYMVYASQETFGF

FASTQ (raw data with quality)
• Line 1: @seqName (description – optional)
• Line 2: sequence
• Line 3: +
• Line 4: quality scores, one ASCII character per base (a Phred score plus a fixed offset, usually 33)
Example
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCAT
+
!''*((((***+))%++)(%%).1*+*''))**55CCF>>>C

SAM/BAM (alignment data, text or binary)
• Header section
• Aligned reads
• Tab delimited
Examples
Header section
@HD VN:1.0 SO:unsorted
@SQ SN:O_volvulusOVOC_OM1a LN:2816604
@SQ SN:O_volvulusOVOC_OM1b LN:28345163
@SQ SN:O_volvulusOVOC_OM2 LN:25485961
Aligned reads
M01137:130:00-A:17009:1352/1 4 * 0 0 * * 0 0 AGCAAAATACAACGATCTGGATGGTAGCATTAGCGATGCGACACTGCTTGAACCGTCAAAG FGGFGCFGFFGC8,,@D?E6EFCF,=AEFFGGDGGGADFGG@>FFEGGG:+<7D>AFCFGG YT:Z:UU

VCF (variation data)
• Header with meta-data
• Body with variant information
• Tab delimited
Example
Header
##fileformat=VCFv4.0
##source=myImputationProgramV3.1
##FILTER=<ID=q10,Description="Quality below 10">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
Body
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3

GFF (annotation data)
• One line = one feature
• Tab delimited
Example
scaffold1 EVM gene 7116 13890 . - . ID=gene1;Name=Name1
scaffold1 EVM mRNA 7116 13890 . - . ID=model1;Parent=gene1
scaffold1 EVM exon 13833 13890 . - . ID=exon1;Parent=model1
scaffold1 EVM CDS 13833 13890 . - 0 ID=cds1;Parent=model1
scaffold1 EVM exon 13434 13664 . - . ID=exon2;Parent=model1
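A short sketch of ln -s for the Linking slide, assuming a POSIX shell; the directory layout (data/raw, analysis) and the file name are hypothetical. It also shows a common pitfall: a relative link target is resolved from the directory that holds the link, not from where you ran ln:

```shell
#!/bin/sh
# Sketch of a symbolic link; all paths here are made up for illustration.
set -eu
cd "$(mktemp -d)"

mkdir -p data/raw analysis
echo ">seq1" > data/raw/reads.fa               # stand-in for a large file

# The target path is relative to analysis/, where the link lives.
ln -s ../data/raw/reads.fa analysis/reads.fa
ls -l analysis/reads.fa                        # shows reads.fa -> ../data/raw/reads.fa
readlink analysis/reads.fa                     # ../data/raw/reads.fa
head -n 1 analysis/reads.fa                    # reading through the link: >seq1
```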
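The three redirection operators can be seen side by side in this sketch, assuming a POSIX shell; genes.txt and count.txt are made-up file names:

```shell
#!/bin/sh
# Sketch of |, > and >>; the file names are hypothetical.
set -eu
cd "$(mktemp -d)"

printf 'gene1\ngene2\ngene3\n' > genes.txt   # > creates the file (or overwrites it)
echo "gene4" >> genes.txt                    # >> appends to the end instead

sort -r genes.txt | head -n 2                # | sends sort's output into head
grep -c gene genes.txt > count.txt           # > saves a command's output in a file
cat count.txt                                # 4
```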
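The FASTA example can be inspected with standard tools. This sketch assumes a POSIX shell with grep and awk, and saves the FASTA example record to a file (example.fa is a hypothetical name). It counts records by their ">" header lines and adds up the sequence lines of each record to get its length:

```shell
#!/bin/sh
# Sketch: count FASTA records and report each sequence's length.
set -eu
cd "$(mktemp -d)"

cat > example.fa <<'EOF'
>gi|31563518|ref|NP_852610.1
MKMRFFSSPCGKAAVDPADRCKEVQQIRDQHPS
KIPVIIERYKGEKQLPVLDKTKFLPDHVNMSEL
VKIIRRRLQLNPTQAFFLLVNQHSMVSVSTPIA
DIYEQEKDEDGFLYMVYASQETFGF
EOF

grep -c '^>' example.fa        # number of records (one header per sequence): 1

# On each header, print the previous record; otherwise add the line's length.
awk '/^>/ { if (name != "") print name, len; name = substr($0, 2); len = 0; next }
     { len += length($0) }
     END { if (name != "") print name, len }' example.fa > lengths.txt
cat lengths.txt                # gi|31563518|ref|NP_852610.1 124
```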
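In FASTQ, every record is exactly four lines, and each quality character encodes a Phred score as its ASCII code minus an offset. This sketch assumes the Sanger/Illumina 1.8+ offset of 33 (older Illumina files used 64) and a POSIX shell; example.fq is a made-up name holding the FASTQ example record:

```shell
#!/bin/sh
# Sketch: read count, sequence extraction and Phred decoding (offset 33 assumed).
set -eu
cd "$(mktemp -d)"

cat > example.fq <<'EOF'
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCAT
+
!''*((((***+))%++)(%%).1*+*''))**55CCF>>>C
EOF

# Four lines per record, so reads = lines / 4.
echo $(( $(wc -l < example.fq) / 4 ))

# Line 2 of each record is the sequence.
awk 'NR % 4 == 2' example.fq

# Phred score = ASCII code of the quality character minus 33.
for q in '!' '5' 'C' 'F'; do
    code=$(printf '%d' "'$q")   # "'x" makes printf print the character code of x
    echo "$q -> Q$((code - 33))"
done
# ! -> Q0, 5 -> Q20, C -> Q34, F -> Q37
```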
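In SAM, the second column (FLAG) is a bit field, and the bit with value 4 marks an unmapped read, as in the example alignment line. A sketch assuming a POSIX shell and awk; example.sam is a hypothetical file name, rebuilt from the SAM example with real tab separators:

```shell
#!/bin/sh
# Sketch: list reference sequences from the header and count unmapped reads.
set -eu
cd "$(mktemp -d)"

{
  printf '@HD\tVN:1.0\tSO:unsorted\n'
  printf '@SQ\tSN:O_volvulusOVOC_OM1a\tLN:2816604\n'
  printf '@SQ\tSN:O_volvulusOVOC_OM1b\tLN:28345163\n'
  printf '@SQ\tSN:O_volvulusOVOC_OM2\tLN:25485961\n'
  printf 'M01137:130:00-A:17009:1352/1\t4\t*\t0\t0\t*\t*\t0\t0\t%s\t%s\tYT:Z:UU\n' \
    AGCAAAATACAACGATCTGGATGGTAGCATTAGCGATGCGACACTGCTTGAACCGTCAAAG \
    'FGGFGCFGFFGC8,,@D?E6EFCF,=AEFFGGDGGGADFGG@>FFEGGG:+<7D>AFCFGG'
} > example.sam

# @SQ header lines: strip the "SN:" and "LN:" prefixes to get name and length.
awk -F'\t' '$1 == "@SQ" { print substr($2, 4), substr($3, 4) }' example.sam

# Alignment lines do not start with "@"; test the bit worth 4 in FLAG.
awk -F'\t' '!/^@/ && int($2 / 4) % 2 == 1 { n++ } END { print n + 0 }' example.sam
```

For BAM files, samtools view -c -f 4 file.bam gives the same count, assuming samtools is installed.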
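In VCF, lines starting with ## are meta-data, the #CHROM line names the columns, and each remaining line is one variant. A sketch assuming a POSIX shell and awk; example.vcf is a made-up name, and only two of the four example meta-data lines are kept to save space:

```shell
#!/bin/sh
# Sketch: count variants and keep the ones whose FILTER column is PASS.
set -eu
cd "$(mktemp -d)"

{
  printf '##fileformat=VCFv4.0\n'
  printf '##FILTER=<ID=q10,Description="Quality below 10">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA00001\tNA00002\tNA00003\n'
  printf '20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\tGT:GQ:DP:HQ\t0|0:48:1:51,51\t1|0:48:8:51,51\t1/1:43:5:.,.\n'
  printf '20\t17330\t.\tT\tA\t3\tq10\tNS=3;DP=11;AF=0.017\tGT:GQ:DP:HQ\t0|0:49:3:58,50\t0|1:3:5:65,3\t0/0:41:3\n'
} > example.vcf

grep -vc '^#' example.vcf      # variant lines (not ## or #CHROM): 2

# Column 7 is FILTER; print CHROM, POS, ID and REF>ALT for passing variants.
awk -F'\t' '!/^#/ && $7 == "PASS" { print $1, $2, $3, $4 ">" $5 }' example.vcf
# 20 14370 rs6054257 G>A
```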
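GFF coordinates are 1-based and inclusive, so a feature covers end - start + 1 bases. A sketch assuming a POSIX shell with cut, sort, uniq and awk; example.gff is a hypothetical name for the GFF example lines, written with real tab separators:

```shell
#!/bin/sh
# Sketch: count feature types and sum exon lengths in a small GFF file.
set -eu
cd "$(mktemp -d)"

{
  printf 'scaffold1\tEVM\tgene\t7116\t13890\t.\t-\t.\tID=gene1;Name=Name1\n'
  printf 'scaffold1\tEVM\tmRNA\t7116\t13890\t.\t-\t.\tID=model1;Parent=gene1\n'
  printf 'scaffold1\tEVM\texon\t13833\t13890\t.\t-\t.\tID=exon1;Parent=model1\n'
  printf 'scaffold1\tEVM\tCDS\t13833\t13890\t.\t-\t0\tID=cds1;Parent=model1\n'
  printf 'scaffold1\tEVM\texon\t13434\t13664\t.\t-\t.\tID=exon2;Parent=model1\n'
} > example.gff

# Column 3 is the feature type; count how often each type occurs.
cut -f3 example.gff | sort | uniq -c

# Columns 4 and 5 are start and end (1-based, inclusive).
awk -F'\t' '$3 == "exon" { total += $5 - $4 + 1 } END { print total }' example.gff
# 289
```

For the two exons in the example this is 58 + 231 = 289 bases.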