DAMBE Manual

The DAMBE Manual Revise alignment (half done) Xuhua Xia, 2014 2 CONTENTS The DAMBE Manual.................................................................................................................................... 1 Instruction for students ................................................................................................................................ 5 Why bioinformatics? ................................................................................................................................. 5 Why DAMBE? .......................................................................................................................................... 5 Answers all questions at end of each lab ................................................................................................... 6 Acknowledgement ..................................................................................................................................... 6 Lab 1 sequence databases and string matchings ........................................................................................ 7 Summary ................................................................................................................................................... 7 Data resource and software tools............................................................................................................... 7 Representative database and web interface: NCBI, GenBank, and Entrez ........................................... 7 Annotated sequences ............................................................................................................................ 8 Sequence formats .................................................................................................................................. 8 Ambiguous codes, and gap symbols in sequences ................................................................................ 9 Why do we need to extract sequence elements? ................................................................................... 9 Retrieve the KAL153 genome with Entrez ..............................................................................................10 Extracting coding sequences from a GenBank file ..................................................................................10 Computing nucleotide frequencies ...........................................................................................................11 Computing Karlin-Altschul parameters ...................................................................................................12 Which subtype does KAL153 belong to? Is it recombinant? ...................................................................13 Subtypes in HIV-1 M group ................................................................................................................13 Retrieve reference subtype sequences .................................................................................................13 Use BLAST to identify subtypes .........................................................................................................13 Lecture questions: ....................................................................................................................................16 Lab questions: ..........................................................................................................................................16 Lab 2 Making sense of genomes: Position Weight Matrix (PWM) .........................................................17 Introduction ..............................................................................................................................................17 Two approaches to understand the meaning of a nucleotide sequence ................................................17 Extraction of annotated gene features from GenBank files .................................................................17 Position weight matrix .........................................................................................................................18 Objectives .................................................................................................................................................18 Procedures ................................................................................................................................................18 A brief peek into a GenBank file .........................................................................................................18 Extracting annotated sequence elements with DAMBE ......................................................................19 Characterize 5’ and 3’ splice sites with position weight matrix (PWM) .............................................19 Scan sequences for splice site signals ..................................................................................................22 Limitations of PWM ............................................................................................................................22 More questions .........................................................................................................................................23 Lab 3 Gibbs Sampler and Yeast Intron Properties ..................................................................................24 Introduction ..............................................................................................................................................24 Genetic switches ..................................................................................................................................24 Gibbs sampler and its application in molecular biology ......................................................................24 Identifying genetic motifs with Gibbs sampler ....................................................................................24 Objectives .................................................................................................................................................25 Procedures ................................................................................................................................................26 Copy column-based output from DAMBE to EXCEL ........................................................................26 Running Gibbs sampler in DAMBE ....................................................................................................27 Studying S1 and S2 distances ..............................................................................................................29 More questions .........................................................................................................................................30 3 Lab 5 Codon Usage Bias ............................................................................................................................. 32 Introduction ............................................................................................................................................. 32 Codon usage bias ................................................................................................................................ 32 Codon usage bias and tRNA abundance ............................................................................................. 36 Objectives ................................................................................................................................................ 37 Use RSCU and CAI to characterize codon usage ............................................................................... 37 Understand the relationship between tRNA abundance and codon usage .......................................... 37 Procedures ............................................................................................................................................... 37 Computing CAI and RSCU ................................................................................................................ 37 Identifying tRNA anticodon ............................................................................................................... 41 More questions ........................................................................................................................................ 43 Lab 6 RNA secondary structure, minimum folding energy, and IRES.................................................. 44 Introduction ............................................................................................................................................. 44 RNA secondary structure .................................................................................................................... 44 Minimum folding energy (MFE) ........................................................................................................ 44 5’ UTR secondary structure and translation economy ........................................................................ 44 Internal ribosomal entry site ............................................................................................................... 45 Objectives ................................................................................................................................................ 45 Procedures ..............................................................................................................................................

DAMBE Manual

Treebase: an R Package for Discovery, Access and Manipulation of Online Phylogenies

Regulatory Motifs of DNA

Motif Discovery in Sequential Data Kyle L. Jensen

A General Pairwise Interaction Model Provides an Accurate Description of in Vivo Transcription Factor Binding Sites

A New Sequence Logo Plot to Highlight Enrichment and Depletion

Biocreative 2012 Proceedings

Predicting Expression Levels of De NovoProtein Designs in Yeast

Low-N Protein Engineering with Data-Efficient Deep Learning

TESE Kelma Sirleide De Souza.Pdf

Lecture 12. Gene Finding & Rnaseq

Interpretable Convolution Methods for Learning Genomic Sequence Motifs

Tutorial: Environment for Tree Exploration Release 3.0.0B7