The Structure of the SAM File

The Structure of the SAM File

10/9/14 The majority of data analyses split off aer 2014 - BMMB 852: Applied Bioinformacs generang a SAM file Week 7, Lecture 14 Sequencing Reads István Albert Alignment So8ware Bioinformacs Consul:ng Center Penn State SAM file Chip-Seq RNA-Seq SNPs Create test queries: The structure of the SAM file exact match, one mismatch, 1 dele:on etc. SAM Headers Alignments 1 10/9/14 Resource: SAM specificaon SAM format: tabular text format 11 required column + op:onal fields Published as The Sequence Alignment/Map format and SAMtools by Heng Li et al Bioinformacs 25, Volume 25, Issue 16, 2009 “A TAB-delimited text format consis:ng of a header sec:on, which is op:onal, and an alignment sec:on.” An aligment consist of 11 tab Column 1 and 2: QNAME and FLAG delimited columns (Query name and bitwise flags) QNAME: the name of the query sequence 2 10/9/14 Column 2: FLAG Columns 3, 4: RNAME and POS the bitwise representaon Reference and Posi:on 1 = 00000001 à paired end read 2 = 00000010 à mapped as proper pair 4 = 00000100 à unmappable read 8 = 00001000 à read mate unmapped 16 = 00010000 à read mapped on reverse strand The flag 11 à 1 + 2 + 8 = 0001011 (condions 1, 2 and 8 sasfied) It is used to save space – but it does make things a bit more difficult. Usually very few flags are needed in prac:ce – 0, 4, 16 are the most generic ones Column 4 POS: 1-based lemost mapping POSion of the first matching base. If you need to construct a more complex flag search for explain SAM flags: Very important to remember later when we need to find the 5’ end (the actual start) hp://picard.sourceforge.net/explain-flags.html Details of the mapping quality Column 5: MAPQ - Mapping Quality computaon – hard to find good answers • • Tool specific – there is no standard of what it should be Phred score, iden:cal to the quality measure in the fastq file. quality Q, probability P: • The repeat structure of the reference. Reads falling in repe::ve regions usually get very low mapping quality. P = 10 ^ (-Q / 10.0) • The base quality of the read. Low quality means the observed read sequence is possibly wrong, and wrong sequence may lead to a wrong alignment. If Q=30, P=1/1000 à on average, one of out 1000 • The sensi:vity of the alignment algorithm. The true hit is more likely to be alignments will be wrong missed by an algorithm with low sensi:vity, which also causes mapping errors. As good as this sounds it is not easy to compute • Paired end or not. Reads mapped in pairs are more likely to be correct. such a quality. (from the MAQ manual) 3 10/9/14 BWA specific high scores BWA specific low scores A read alignment with a mapping quality 30 or Surprisingly difficult to track down the exact behavior above usually implies • Q=0 à if a read can be aligned equally well to mul:ple • The overall base quality of the read is good. posi:ons, BWA will randomly pick one posi:on and give it a mapping quality zero. • The best alignment has few mismatches. • Q=25 à the edit distance equals mismatches and is greater • The read has few or Just one `good' hit on the than zero reference, which means the current alignment is s:ll the best even if one or two bases are actually mutaons or sequencing errors. Columns 7, 8, 9: RNEXT, PNEXT, TLEN Column 6: CIGAR (used in paired end read sequencing) • CIGAR = Compact Idiosyncrac Gapped Alignment Report • RNEXT: the name of the pair • PNEXT: the posi:on of the pair • TLEN: the distance between the leMmost posi:ons of the pairs We will discuss these in more detail later. These can show the posi:on of reads that are distant – allow us to infer genomic variaons 4 10/9/14 Column 10, 11 Column 12 and beyond • SEQ: the query sequence • QUAL: the phred encoded quality sequence SEQ may contain the original sequence or the segment it was aligned to. Not all tools do the same thing. Specific informaon about the alignment process that the tools was able to establish. more details in later lectures Homework 14 Download data-14.tar.gz Align the data in paired end more with bwa. How many dis:nct alignment flags do you get? What does each flag mean? 5 .

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    5 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us