A Tool for Detecting Base Mis-Calls in Multiple Sequence Alignments by Semi-Automatic Chromatogram Inspection
Recommended publications
-
DNA Sequencing and Sorting: Identifying Genetic Variations
BioMath DNA Sequencing and Sorting: Identifying Genetic Variations Student Edition Funded by the National Science Foundation, Proposal No. ESI-06-28091 This material was prepared with the support of the National Science Foundation. However, any opinions, findings, conclusions, and/or recommendations herein are those of the authors and do not necessarily reflect the views of the NSF. At the time of publishing, all included URLs were checked and active. We make every effort to make sure all links stay active, but we cannot make any guaranties that they will remain so. If you find a URL that is inactive, please inform us at [email protected]. DIMACS Published by COMAP, Inc. in conjunction with DIMACS, Rutgers University. ©2015 COMAP, Inc. Printed in the U.S.A. COMAP, Inc. 175 Middlesex Turnpike, Suite 3B Bedford, MA 01730 www.comap.com ISBN: 1 933223 71 5 Front Cover Photograph: EPA GULF BREEZE LABORATORY, PATHO-BIOLOGY LAB. LINDA SHARP ASSISTANT This work is in the public domain in the United States because it is a work prepared by an officer or employee of the United States Government as part of that person’s official duties. DNA Sequencing and Sorting: Identifying Genetic Variations Overview Each of the cells in your body contains a copy of your genetic inheritance, your DNA which has been passed down to you, one half from your biological mother and one half from your biological father. This DNA determines physical features, like eye color and hair color, and can determine susceptibility to medical conditions like hypertension, heart disease, diabetes, and cancer. -
Evidence of Selection at the Ramosa1 Locus During Maize Domestication
Molecular Ecology (2010) 19, 1296–1311 doi: 10.1111/j.1365-294X.2010.04562.x Evidence of selection at the ramosa1 locus during maize domestication BRANDI SIGMON and ERIK VOLLBRECHT Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011, USA Abstract Modern maize was domesticated from Zea mays parviglumis, a teosinte, about 9000 years ago in Mexico. Genes thought to have been selected upon during the domestication of crops are commonly known as domestication loci. The ramosa1 (ra1) gene encodes a putative transcription factor that controls branching architecture in the maize tassel and ear. Previous work demonstrated reduced nucleotide diversity in a segment of the ra1 gene in a survey of modern maize inbreds, indicating that positive selection occurred at some point in time since maize diverged from its common ancestor with the sister species Tripsacum dactyloides and prompting the hypothesis that ra1 may be a domestication gene. To investigate this hypothesis, we examined ear phenotypes resulting from minor changes in ra1 activity and sampled nucleotide diversity of ra1 across the phylogenetic spectrum between tripsacum and maize, including a broad panel of teosintes and unimproved maize landraces. Weak mutant alleles of ra1 showed subtle effects in the ear, including crooked rows of kernels due to the occasional formation of extra spikelets, correlating a plausible, selected trait with subtle variations in gene activity. Nucleotide diversity was significantly reduced for maize landraces but not for teosintes, and statistical tests implied directional selection on ra1 consistent with the hypothesis that ra1 is a domestication locus. In maize landraces, a noncoding 3′-segment contained almost no genetic diversity and 5′-flanking diversity was greatly reduced, suggesting that a regulatory element may have been a target of selection. -
DNA Sequencing
Contig Assembly [figure: the two strands ATCGATGCGTAGCAGACTACCGTTACGATGCCTT… and TAGCTACGCATCGTCTGATGGCAATGCTACGGAA.. shown with short overlapping fragments] David Wishart, Ath 3-41 [email protected] DNA Sequencing. Principles of DNA Sequencing [figure: primer and DNA fragment in pBR322 (Amp, Tet, Ori); denature with heat to produce ssDNA; Klenow + ddNTP + dNTP + primers]. The Secret to Sanger Sequencing. Principles of DNA Sequencing [figure: template 5’ GCATGC 3’ with a 5’ primer; four reactions, each with dATP, dCTP, dGTP and dTTP plus one of ddCTP, ddATP, ddTTP or ddGTP; terminated products GddC, GCddA, GCAddT, ddG, GCATGddC, GCATddG]. Principles of DNA Sequencing [figure: gel lanes G, T, C, A with fragments ordered from short (−) to long (+)]. Capillary Electrophoresis: Separation by Electro-osmotic Flow. Multiplexed CE with Fluorescent detection: ABI 3700, 96x700 bases. High Throughput DNA Sequencing. Large Scale Sequencing • Goal is to determine the nucleic acid sequence of molecules ranging in size from a few hundred bp to >10^9 bp • The methodology requires an extensive computational analysis of raw data to yield the final sequence result. Shotgun Sequencing • High throughput sequencing method that employs automated sequencing of random DNA fragments • Automated DNA sequencing yields sequences of 500 to 1000 bp in length • To determine longer sequences you obtain fragmentary sequences and then join them together by overlapping • Overlapping is an alignment problem, but different from those we have discussed up to now. Shotgun Sequencing [figure: isolate chromosome, shear DNA into fragments, clone into seq. vectors, sequence]. Shotgun Sequencing [figure: sequence chromatogram, send to computer, assembled sequence]. Analogy • You have 10 copies of a movie • The film has been cut into short pieces with about 240 frames per piece (10 seconds of film), at random • Reconstruct the film. Multi-alignment & Contig Assembly ATCGATGCGTAGCAGACTACCGTTACGATGCCTT… TAGCTACGCATCGTCTGATGGCAATGCTACGGAA. -
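The slide text above says that longer sequences are obtained by joining fragmentary reads "by overlapping". A minimal sketch of that single step follows, assuming exact (error-free) suffix/prefix overlaps; the function names and the `min_overlap` parameter are hypothetical, and real assemblers tolerate mismatches and use chromatogram quality values, which this sketch does not.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    # Try the longest possible overlap first and shrink until one fits.
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3):
    """Join two reads on their exact overlap; return None if they do not overlap."""
    k = overlap_length(left, right, min_overlap)
    if k == 0:
        return None
    # Keep `left` whole and append only the part of `right` past the overlap.
    return left + right[k:]


if __name__ == "__main__":
    # Two fragments cut from the start of the slide's example strand.
    print(merge_reads("ATCGATGCGTAGC", "GTAGCAGACTACC"))
```

Repeating this merge greedily over the best-overlapping pair is one simple way to grow a contig, though it can misassemble repeats.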
New Softwares for Automated Microsatellite Marker Development
Published online February 21, 2006 Nucleic Acids Research, 2006, Vol. 34, No. 4 e31 doi:10.1093/nar/gnj030 New softwares for automated microsatellite marker development Wellington Martins, Daniel de Sousa1, Karina Proite2, Patrícia Guimarães2, Marcio Moretzsohn2 and David Bertioli3 Department of Computer Science, Catholic University of Goiás, Brazil, 1Department of Computer Science, Catholic University of Rio de Janeiro, Brazil, 2Embrapa Genetic Resources and Biotechnology, Brasília, Brazil and 3Genomic Sciences and Biotechnology, Catholic University of Brasília, Brazil Received November 23, 2005; Revised and Accepted January 31, 2006 ABSTRACT Microsatellites are repeated small sequence motifs that are highly polymorphic and abundant in the genomes of eukaryotes. Often they are the molecular markers of choice. To aid the development of microsatellite markers we have developed a module that integrates a program for the detection of microsatellites (TROLL), with the sequence assembly and analysis software, the Staden Package. The module has easily adjustable parameters for microsatellite lengths and base pair quality control. … whose unit of repetition is between 1 and 6 bp. They are highly abundant in the genomes of eukaryotes, polymorphic and usually co-dominant and transferable between different mapping populations. Microsatellite markers can also be used in automated genotyping techniques. Thus, they have become one of the most useful molecular markers for a large number of organisms. Researchers working on the development of microsatellite markers need an efficient way to go from usually hundreds or thousands of trace and/or text sequence files to the identification of new potential markers. However, the softwares -
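The excerpt above describes microsatellites as tandem repeats whose unit is between 1 and 6 bp. As an illustration only, the sketch below finds such repeats with a regular expression. It is not the TROLL algorithm or the Staden module; `find_microsatellites`, `min_repeats` and `max_unit` are hypothetical names, and no base-quality filtering is done.

```python
import re


def find_microsatellites(seq: str, min_repeats: int = 4, max_unit: int = 6):
    """Return (start, unit, repeat_count) for each exact tandem repeat found."""
    # Group 1 is the shortest unit (1..max_unit bases) that is then
    # repeated at least min_repeats - 1 more times via the backreference.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


if __name__ == "__main__":
    # An (AT)5 repeat flanked by unrelated bases.
    print(find_microsatellites("GGCATATATATATCC"))
```

Because the unit is matched lazily, a run such as AAAAAAAA is reported with unit A rather than AA.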
A Brief History of Sequence Logos
Biostatistics and Biometrics Open Access Journal ISSN: 2573-2633 Mini-Review Biostat Biometrics Open Acc J Volume 6 Issue 3 - April 2018 Copyright © All rights are reserved by Kushal K Dey DOI: 10.19080/BBOAJ.2018.06.555690 A Brief History of Sequence Logos Kushal K Dey* Department of Statistics, University of Chicago, USA Submission: February 12, 2018; Published: April 25, 2018 *Corresponding author: Kushal K Dey, Department of Statistics, University of Chicago, 5747 S Ellis Ave, Chicago, IL 60637, USA. Tel: 312-709-0680; Email: Abstract For nearly three decades, sequence logo plots have served as the standard tool for graphical representation of aligned DNA, RNA and protein sequences. Over the years, a large number of packages and web applications have been developed for generating these logo plots and using them to identify conserved patterns in sequences called motifs. Also, over time, we have seen a considerable upgrade in the look, flexibility of data handling and the overall scope of these plots in biological applications and beyond. Here I attempt to review some popular tools for generating sequence logos, with a focus on how these plots have evolved over time since their origin and how I view the future for these plots. Keywords: Graphical representation; Sequence logo plots; Standard tool; Motifs; Biological applications; Flexibility of data; DNA sequence data; Python library; Interdependencies; PLogo; Depletion of symbols; Alphanumeric strings; Visualizes pairwise; Oligonucleotide RNA sequence data; Visualize succinctly; Predictive power; Initial attempts; Widespread; Stylistic configurations; Multiple sequence alignment; Introduction The seeds of the origin of sequence logos were planted in early 1980s when researchers, equipped with large amounts of … based comparisons and predictions. In the next section, we review the modeling frameworks and functionalities of some of these tools [5]. -
Bioinformatics Manual Sequence Data Analysis with CLC Main Workbench
Introduction to Molecular Biology and Bioinformatics Workshop Biosciences eastern and central Africa - International Livestock Research Institute Hub (BecA-ILRI Hub), Nairobi, Kenya May 5-16, 2014 Bioinformatics Manual Sequence data analysis with CLC Main Workbench Written and compiled by Joyce Njuguna, Mark Wamalwa, Rob Skilton Getting started with CLC Main Workbench A. Quality control of sequenced data B. Assembling sequences C. Nucleotide and protein sequence manipulation D. BLAST sequence search E. Aligning sequences F. Building phylogenetic trees Welcome to CLC Main Workbench -- a user-friendly sequence analysis software package for analysing Sanger sequencing data and for supporting your daily bioinformatics work. Definition Sequence: the order of nucleotide bases [or amino acids] in a DNA [or protein] molecule DNA Sequencing: Biochemical methods used to determine the order of nucleotide bases, adenine (A), guanine (G), cytosine (C) and thymine (T), in a DNA strand TIP CLC Main Workbench includes an extensive Help function, which can be found in the Help menu of the program’s Menu bar. The Help can also be shown by pressing F1. The help topics are sorted in a table of contents and the topics can be searched. Also, it is recommended that you view the Online presentations where a product specialist from CLC bio demonstrates the software. This is a very easy way to get started using the program. Read more about online presentations here: http://clcbio.com/presentation. A. Getting Started with CLC Main Workbench 1. Download the trial version of CLC Main Workbench 6.8.1 from the following URL: http://www.clcbio.com/products/clc-main-workbench-direct-download/ 2. -
A STAT Protein Domain That Determines DNA Sequence Recognition Suggests a Novel DNA-Binding Domain
Downloaded from genesdev.cshlp.org on September 25, 2021 - Published by Cold Spring Harbor Laboratory Press A STAT protein domain that determines DNA sequence recognition suggests a novel DNA-binding domain Curt M. Horvath, Zilong Wen, and James E. Darnell Jr. Laboratory of Molecular Cell Biology, The Rockefeller University, New York, New York 10021 Stat1 and Stat3 are two members of the ligand-activated transcription factor family that serve the dual functions of signal transducers and activators of transcription. Whereas the two proteins select very similar (not identical) optimum binding sites from random oligonucleotides, differences in their binding affinity were readily apparent with natural STAT-binding sites. To take advantage of these different affinities, chimeric Stat1:Stat3 molecules were used to locate the amino acids that could discriminate a general binding site from a specific binding site. The amino acids between residues ~400 and ~500 of these ~750-amino-acid-long proteins determine the DNA-binding site specificity. Mutations within this region result in Stat proteins that are activated normally by tyrosine phosphorylation and that dimerize but have greatly reduced DNA-binding affinities. [Key Words: STAT proteins; DNA binding; site selection] Received January 6, 1995; revised version accepted March 2, 1995. The STAT (signal transducers and activators of transcription) proteins have the dual purpose of, first, signal transduction from ligand-activated receptor kinase complexes, followed by nuclear translocation and DNA binding to activate transcription (Darnell et al. … Whereas oligonucleotides representing these selected sequences exhibited slight binding preferences, the consensus sites overlapped sufficiently to be recognized by both factors. However, by screening different natural -
A Guide to HIV-1 Reverse Transcriptase and Protease Sequencing for Drug Resistance Studies
Reviews. A Guide to HIV-1 Reverse Transcriptase and Protease Sequencing for Drug Resistance Studies Robert W. Shafer1, Kathryn Dupnik1, Mark A. Winters1, Susan H. Eshleman2 1 Division of Infectious Diseases, Stanford University, Stanford, CA 94305 2 Dept. of Pathology, The Johns Hopkins Medical Institutions, Baltimore, MD 21205 I. HIV-1 Drug Resistance A. Introduction HIV-1 RT and protease sequencing and drug susceptibility testing have been done in research settings for more than ten years to elucidate the genetic mechanisms of resistance to antiretroviral drugs. Retrospective studies have shown that the presence of drug resistance before starting a new drug regimen is an independent predictor of virologic response to that regimen (DeGruttola et al., 2000; Hanna and D’Aquila, 2001; Haubrich and Demeter, 2001). Prospective studies have shown that patients whose physicians have access to drug resistance data, particularly genotypic resistance data, respond better to therapy than control patients whose physicians do not have access to the same data (Baxter et al., 2000; Cohen et al., 2000; De Luca et al., 2001; Durant et al., 1999; Melnick et al., 2000; Meynard et al., 2000; Tural et al., 2000). The accumulation of retrospective and prospective data has led three expert panels to recommend the use of resistance testing in the treatment of HIV-infected patients (EuroGuidelines Group for HIV Resistance, 2001; Hirsch et al., 2000; US Department of Health and Human Services Panel on Clinical Practices for Treatment of HIV Infection, 2000) (Table 1). There have been several recent reviews on methods for assessing HIV-1 drug resistance (Demeter and Haubrich, 2001; Hanna and D’Aquila, 2001; Richman, 2000) and on the mutations associated with drug resistance (Deeks, 2001; Hammond et al., 1999; Loveday, 2001; Miller, 2001; Shafer et al., 2000b). -
Bioinformatics I Sanger Sequence Analysis
Updated: October 2020 Lab 5: Bioinformatics I Sanger Sequence Analysis Project Guide The Wolbachia Project Contents: Activity at a Glance; Technical Overview; Activity: How to Analyze Sanger Sequences; DataBase Entry; Appendix I: Illustrated BLAST Alignment; Appendix II: Sanger Sequencing Quick Reference. Content is made available under the Creative Commons Attribution-NonCommercial-No Derivatives International License. Contact ([email protected]) if you would like to make adaptations for distribution beyond the classroom. The Wolbachia Project: Discover the Microbes Within! was developed by a collaboration of scientists, educators, and outreach specialists. It is directed by the Bordenstein Lab at Vanderbilt University. https://www.vanderbilt.edu/wolbachiaproject Activity at a Glance Goals To analyze and interpret the quality of Sanger sequences. To generate a consensus DNA sequence for bioinformatics analyses. Learning Objectives Upon completion of this activity, students will (i) understand the Sanger method of sequencing, also known as the chain-termination method; (ii) be able to interpret chromatograms; (iii) evaluate sequencing Quality Scores; and (iv) generate a consensus DNA sequence based on forward and reverse Sanger reactions. Prerequisite Skills While no computer programming skills are necessary to complete this work, prior exposure to personal computers and the Internet is assumed. Teaching Time: One class period Recommended Background Tutorials • DNA Learning Center Animation: Sanger Method of -
Identification of Factors That Interact with the E1A-Inducible Adenovirus E3 Promoter
Downloaded from genesdev.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press Identification of factors that interact with the E1A-inducible adenovirus E3 promoter Helen C. Hurst and Nicholas C. Jones Gene Regulation Group, Imperial Cancer Research Fund, London, England, WC2A 3PX We have investigated the E1A-inducible E3 promoter of adenovirus type 5 with respect to its ability to bind specific nuclear proteins. Four distinct nucleoprotein-binding sites were detected, located between positions -7 to -33, -44 to -68, -81 to -103, and -154 to -183, relative to the E3 cap site. These sites contain sequences previously shown to be functionally important for efficient E3 transcription. No major qualitative or quantitative differences were found in the binding pattern between nucleoprotein extracts prepared from uninfected or adenovirus-infected HeLa cells. Competition experiments suggest that the factors binding to the -154 to -183 and -81 to -103 sites are the previously identified nucleoproteins, NFI and AP1, respectively. The factor binding to the -44 to -68 site, which we term ATF, also interacts with other E1A-inducible promoters and is very similar and probably identical to the factor that binds to the cAMP-responsive element of somatostatin. We have purified this factor, which is a protein of 43 kD in size. [Key Words: Trans-activation; gel retardation; footprinting; nuclear factors] Received June 12, 1987; revised version accepted October 5, 1987. The control of the rate of transcription initiation is an important element in the regulation of eukaryotic gene expression. … (Borrelli et al. 1984; Velcich and Ziff 1985), whereas the protein of 243 amino acids, although it represses very -
Lecture 7: Sequence Motif Discovery
COSC 348: Computing for Bioinformatics. Lecture 7: Sequence Motif Discovery. Lubica Benuskova http://www.cs.otago.ac.nz/cosc348/ Sequence motif: definitions • In Bioinformatics, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and has been proven or assumed to have a biological significance. • Once we know the sequence pattern of the motif, then we can use the search methods to find it in the sequences (e.g. Boyer-Moore algorithm, Rabin-Karp, suffix trees, etc.) • The problem is to discover the motifs, i.e. what is the order of letters the particular motif is comprised of. Examples of motifs in DNA • The TATA promoter sequence is an example of a highly conserved DNA sequence motif found in eukaryotes. • Another example of motifs: binding sites for transcription factors (TF) near promoter regions of genes, etc. [figure: Gene 1 to Gene 5, each marked with binding sites for TF] Sequence motif: notations • An example of a motif in a protein: N, followed by anything but P, followed by either S or T, followed by anything but P − One convention is to write N{P}[ST]{P} where {X} means any amino acid except X; and [XYZ] means either X or Y or Z. • Another notation: each ‘.’ signifies any single AA, and each ‘*’ indicates one member of a closely-related AA family: − WDIND*.*P..*...D.F.*W***.**.IYS**...A.*H*S*WAMRN • In the 1st assignment we have motifs like A??CG, where the wildcard ? stands for any of A,U,C,G. Sequence motif discovery from conservation. Motif discovery based on alignment • profile analysis is another word for this. -
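The N{P}[ST]{P} convention described in the slide text above maps almost directly onto regular expressions, which gives one simple way to search for a known motif. The sketch below assumes single-letter amino-acid codes and handles only the two constructs named in the excerpt, {X} and [XYZ]; `pattern_to_regex` and `find_motif` are hypothetical helper names, not part of the course material.

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Translate the {X} / [XYZ] motif notation into a Python regex string."""
    # {P} ("anything except P") becomes the negated class [^P];
    # [ST] and plain letters already mean the same thing in a regex.
    return re.sub(r"\{([A-Z]+)\}", r"[^\1]", pattern)


def find_motif(pattern: str, sequence: str) -> list:
    """Return the 0-based start of every (possibly overlapping) motif match."""
    # A lookahead lets consecutive matches overlap each other.
    regex = re.compile("(?=" + pattern_to_regex(pattern) + ")")
    return [match.start() for match in regex.finditer(sequence)]


if __name__ == "__main__":
    print(pattern_to_regex("N{P}[ST]{P}"))
    # NGSA fits the motif; NPSA does not, because P follows N.
    print(find_motif("N{P}[ST]{P}", "AANGSAANPSA"))
```

Note that [^P] accepts any character other than P, so input should be restricted to valid residue letters beforehand.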
What Are DNA Sequence Motifs?
PRIMER What are DNA sequence motifs? Patrik D’haeseleer Sequence motifs are becoming increasingly important in the analysis of gene regulation. How do we define sequence motifs, and why should we use sequence logos instead of consensus sequences to represent them? Do they have any relation with binding affinity? How do we search for new instances of a motif in this sea of DNA? Sequence motifs are short, recurring patterns in DNA that are presumed to have a biological function. Often they indicate sequence-specific binding sites for proteins such as nucleases and transcription factors (TF). Others are involved in important processes at the RNA level, including ribosome binding, mRNA processing (splicing, editing, polyadenylation) and transcription termination. In the past, binding sites were typically determined through DNase footprinting, and gel-shift or reporter construct assays, whereas binding affinities to artificial sequences were explored using SELEX. Nowadays, computational methods are generating a flood of putative regulatory sequence motifs by … [figure, panel a: HEM13 CCCATTGTTCTC; HEM13 TTTCTGGTTCTC; HEM13 TCAATTGTTTAG; ANB1 CTCATTGTTGTC; ANB1 TCCATTGTTCTC; ANB1 CCTATTGTTCTC; ANB1 TCCATTGTTCGT; ROX1 CCAATTGTTTTG; panel b: YCHATTGTTCTC] Restriction enzymes and consensus sequences. Type II restriction enzymes, discovered in the late 1960s, need to bind to their DNA targets in a highly sequence-specific manner, because they are part of a primitive bacterial immune system designed to chop up viral DNA from infecting phages. Straying from their consensus binding site specificity would be the equivalent of an autoimmune reaction that could lead to irreversible damage to the bacterial genome. For example, EcoRI binds to the 6-mer GAATTC, and only to that sequence. Note that this motif is a palindrome, reflecting the fact that the EcoRI protein binds to the DNA as a homodimer. http://www.nature.com/naturebiotechnology Nature Publishing Group -
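The eight aligned HEM13, ANB1 and ROX1 sites listed in the excerpt above are enough to show how a consensus is read off an alignment. The sketch below counts bases per column and reports the most frequent base, writing ties as a bracketed set; this tie rule and the function names are assumptions made for illustration. The string YCHATTGTTCTC in the excerpt uses IUPAC degenerate codes (Y = C or T, H = A, C or T), and the rule that produced it is not stated in the excerpt, so the output here is not expected to reproduce it exactly.

```python
from collections import Counter

# The eight aligned sites as they appear in the excerpt.
SITES = [
    "CCCATTGTTCTC",  # HEM13
    "TTTCTGGTTCTC",  # HEM13
    "TCAATTGTTTAG",  # HEM13
    "CTCATTGTTGTC",  # ANB1
    "TCCATTGTTCTC",  # ANB1
    "CCTATTGTTCTC",  # ANB1
    "TCCATTGTTCGT",  # ANB1
    "CCAATTGTTTTG",  # ROX1
]


def column_counts(sites):
    """Count how often each base occurs in each alignment column."""
    return [Counter(column) for column in zip(*sites)]


def majority_consensus(sites):
    """Most frequent base per column; ties are written as a set like [CT]."""
    consensus = []
    for counts in column_counts(sites):
        top = max(counts.values())
        best = sorted(base for base, n in counts.items() if n == top)
        consensus.append(best[0] if len(best) == 1 else "[" + "".join(best) + "]")
    return "".join(consensus)


if __name__ == "__main__":
    print(majority_consensus(SITES))
```

The per-column counts returned by `column_counts` are also the raw material for a position weight matrix or a sequence logo, which keep the frequency information that a consensus string discards.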