Applied Bioinformatics (Of Nucleic Acid Sequences)

Applied Bioinformatics of Nucleic Acid Sequences
David A. Hendrix
January 9, 2020

For the students and learners of the world.

Contents

1 Introduction to Biological Sequences, Biopython, and GNU/Linux
  1.1 Nucleic Acid Bioinformatics
    1.1.1 GNU/Linux and the command line
  1.2 Sequences, Strings, and the Genetic Code
    1.2.1 Introduction to Sequences and Biopython
    1.2.2 The Central Dogma
    1.2.3 Subsequences and Reverse Complement
  1.3 Sequences File Formats
    1.3.1 FASTA
    1.3.2 FASTQ
    1.3.3 GNU/Linux and Sequence Files
  1.4 Lab 1: Introduction to GNU/Linux and FASTA files
  1.5 Biological Sequence Databases
    1.5.1 NCBI
    1.5.2 Ensembl
    1.5.3 UCSC Genome Bioinformatics
    1.5.4 Uniprot
  1.6 Lab 2: FASTQ and Quality Scores
2 Sequence Motifs
  2.1 Introduction to Motifs
  2.2 String Matching
  2.3 Consensus Sequences
    2.3.1 Searching Consensus Sequences with Biopython
  2.4 Motif Finding
    2.4.1 Sequence Complexity
    2.4.2 Weight Matrices
    2.4.3 Relative Entropy
    2.4.4 Building a Weight Matrix
    2.4.5 Biopython Motifs
  2.5 Promoters
    2.5.1 Core Promoters
    2.5.2 Databases of Promoters/TSSs
  2.6 De novo Motif Finding
    2.6.1 Gibbs Sampling
    2.6.2 MEME and the EM Algorithm
  2.7 Lab 3: Introduction to Motifs
    2.7.1 Part 1: Building a motif and LOGO image
    2.7.2 Part 2: JASPAR Database and "sites" format
    2.7.3 Part 3: Running MEME on the command line
3 Sequence Alignments
  3.1 Alignment Algorithms and Dynamic Programming
    3.1.1 Needleman-Wunsch Algorithm
    3.1.2 Smith-Waterman
    3.1.3 Comparison
    3.1.4 Aligning DNA vs Proteins
  3.2 Alignment Software
    3.2.1 BLAST: Basic Local Alignment Search Tool
  3.3 Alignment Statistics
    3.3.1 Running BLAST from the command line
  3.4 Short Read Mapping
  3.5 Lab 4: Using BLAST on the command line
    3.5.1 Part 1: BLASTing to a protein database
    3.5.2 Biopython and BLAST (optional)
    3.5.3 Part 2: BLASTing to a genome
4 Multiple Sequence Alignments, Molecular Evolution, and Phylogenetics
  4.1 Multiple Sequence Alignment
    4.1.1 MSA Methods
    4.1.2 MSA File Formats
  4.2 Phylogenetic Trees
    4.2.1 Representing a Phylogenetic Tree
    4.2.2 Pairwise Distances
  4.3 Models of mutations
    4.3.1 Genetic Drift
    4.3.2 Substitution Models
    4.3.3 Jukes-Cantor 1969 (JC69)
    4.3.4 Kimura 1980 model (K80)
    4.3.5 Felsenstein 1981 model (F81)
    4.3.6 The Hasegawa, Kishino and Yano model (HKY85)
    4.3.7 Generalized Time-Reversible Model
    4.3.8 Building Phylogenetic Trees
    4.3.9 Evaluating the Quality of a Phylogenetic Tree
    4.3.10 Tree Searching
  4.4 Lab 5: Phylogenetics
    4.4.1 Download Sequences from NCBI
    4.4.2 Create a Multiple Sequence Alignment and Phylogenetic Tree with Clustalw
    4.4.3 Create a Multiple Sequence Alignment and Phylogenetic Tree with phyML
5 Genomics
  5.1 The Three Fundamental "Gotchas" of Genomics
    5.1.1 Different Genome Assemblies/Annotations
    5.1.2 Different Chromosome Deflines
    5.1.3 0 vs 1-based coordinates
  5.2 Genomic Data and File Formats
    5.2.1 Formats for Genomic Locations
    5.2.2 Quantitative Tracks
  5.3 Genome Browsers
    5.3.1 IGV
    5.3.2 UCSC Genome Browser
    5.3.3 Gbrowse
    5.3.4 JBrowse
  5.4 Lab 6: Genome Annotation Data
    5.4.1 Part I: Storing a Genome to a Dictionary
    5.4.2 Part II: Storing a GFF to a list
    5.4.3 Part III: Find a gene of interest in Drosophila melanogaster
6 Transcriptomics
  6.1 High-throughput Sequencing (HTS)
  6.2 RNA Deep Sequencing
  ...

Details

  • File Type
    pdf
  • Content Languages
    English
  • File Pages
    118 pages
