(12) United States Patent (10) Patent No.: US 6,617,156 B1 Doucette-Stamm et al.

(12) United States Patent
(10) Patent No.: US 6,617,156 B1
Doucette-Stamm et al.
(45) Date of Patent: Sep. 9, 2003

(54) NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO ENTEROCOCCUS FAECALIS FOR DIAGNOSTICS AND THERAPEUTICS

(76) Inventors: Lynn A. Doucette-Stamm, 14 Flanagan Dr., Framingham, MA (US) 01701; David Bush, 205 Holland St., Somerville, MA (US) 02144

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/134,000
(22) Filed: Aug. 13, 1998

Related U.S. Application Data
(60) Provisional application No. 60/055,778, filed on Aug. 15, 1997.

(51) Int. Cl.7: C12N 15/31; C12N 15/63; [illegible]
(52) U.S. Cl.: 435/320.1; 536/23.7; 536/24.32; 435/252.3; 435/69.1; 435/6
(58) Field of Search: 536/23.7, 24.32; 435/320.1, 252.3, 6, 69.1, 69.3

(56) References Cited

U.S. PATENT DOCUMENTS
4,835,256 A * 5/1989 Taniguchi et al. 530/351
5,417,971 A * 5/1995 Potter et al. 424/256.1
5,459,034 A * 10/1995 Tabaqchali et al. 435/6
5,624,816 A * 4/1997 Carraway et al. 435/69.1

FOREIGN PATENT DOCUMENTS
WO 9850555 11/1998
WO 98/80554 11/1998 C12N/15/31

OTHER PUBLICATIONS
Frere et al. J. Basic Microbiol. 36(5):305-310, 1996.*
Tanaka et al. Molecular and Cellular Biology 9(2):757-768, 1989.*
Bugert et al. Molecular Microbiology 15(5):917-933, 1995.*
Mason et al. Journal of Bacteriology 175(9):2632-2639, 1993.*
Wagner et al. Journal of Bacteriology 177(21):6144-6152, 1995.*
Chen et al. Yeast 7:287-299, 1991.*

* cited by examiner

Primary Examiner: Mary E. Mosher
(74) Attorney, Agent, or Firm: Genome Therapeutics Corporation

(57) ABSTRACT

The invention provides isolated polypeptide and nucleic acid sequences derived from Enterococcus faecalis that are useful in diagnosis and therapy of pathological conditions, antibodies against the polypeptides, and methods for the production of the polypeptides. The invention also provides methods for the detection, prevention and treatment of pathological conditions resulting from bacterial infection.

19 Claims, No Drawings

NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO ENTEROCOCCUS FAECALIS FOR DIAGNOSTICS AND THERAPEUTICS

This application claims priority of U.S. Provisional application No. 60/055,778, filed Aug. 15, 1997, all of which is hereby incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

The invention relates to isolated nucleic acids and polypeptides derived from Enterococcus faecalis that are useful as molecular targets for diagnosis, prophylaxis and treatment of pathological conditions, as well as materials and methods for the diagnosis, prevention, and amelioration of pathological conditions resulting from bacterial infection.

BRIEF DESCRIPTION OF THE SEQUENCE LISTING

Incorporated herein by reference in its entirety is a Sequence Listing, comprising SEQ ID NO: 1 to SEQ ID NO: 6812. The Sequence Listing is contained on a CD-ROM, three copies of which are filed, the Sequence Listing being in a computer-readable ASCII file named "GTC005.pto", created on Sep. 6, 2001 and of 10.3 megabytes in size, in Windows NT 4.0, ASCII text format.

BACKGROUND OF THE INVENTION

Enterococcus faecalis (E. faecalis) is a gram-positive, facultatively anaerobic coccus that is widely distributed in nature, animals, and humans. Enterococci are part of the normal gastrointestinal and genital tract flora, and among the 17 known species, E. faecalis is dominant in humans, accounting for 80-90% of clinically isolated specimens, and it exhibits increasing levels of multidrug resistance (Kaufhold, A and Klein, R (1995) Zentralblatt fuer Bakteriologie 282(4):507-518; and Svec, P, Sedlacek, I, and Pakrova, E (1996) Epidemiologie Mikrobiologie Imunologie 45:153-157). E. faecalis infections include urinary tract infections (UTI), bacteremia, endocarditis, and wound and abdominal-pelvic infections, accounting for 16% of all UTIs, and 8% of all bacteremias (Ardino, R C, and Murray, B E (1990) Principles and Practice of Infectious Diseases, 3rd ed., Mandell et al, eds., Update Vol. 2, No. 4).

Vancomycin resistant enterococci (VRE) have emerged in the midst of high level resistance to penicillin and aminoglycosides (Centers for Disease Control and Prevention (1993) MMWR 42:597-599; and Handwerger, S, et al (1993) Clin Infect Dis 16:750-755). Resistance can be intrinsic (chromosomally mediated), or acquired (plasmid or transposon mediated), with higher levels of resistance in acquired. VRE are characterized by resistance to virtually all available antibiotics, including vancomycin, considered the "last resort" antibiotic effective against gram-positive bacteria. Treatment options for physicians are limited, with the latest strategy being combinations of antimicrobials or the use of new unproven compounds (Moellering, R C Jr. (1991) J Antimicrob Chemother 28:1-12; and Hayden, M K et al (1994) Antimicrob Agents Chemother 38:1225-1229; and Mobarakai, N et al (1994) J Antimicrob Chemother 33:319-321). From 1989 through 1993, the percentage of nosocomial (hospital incurred) infections by VRE increased from 0.3% to 7.9% (Centers for Disease Control and Prevention (1993) MMWR 42:597-599). There was a 34-fold increase in ICU patients, and an increasing trend among non-ICU patients (Centers for Disease Control and Prevention (1993) MMWR 42:597-599). These numbers may not be an accurate reflection of the actual total, as clinical identification of vancomycin resistance is not consistently detected, especially in the VanB phenotype which confers moderate resistance (Tenover, F C (1993) J Clin Microbiol 31:1695-1699; and Sahm, D F (1990) Antimicrob Agents Chemother 34:1846-1848; and Zabransky, R J (1994) Microbiol Infect Dis 20:113-116). Patients can be colonized and carry VRE without symptoms, with chief areas of colonization being anus, axilla, stool, perineal, umbilicus, wounds, foley catheters, and colostomy sites.

Epidemiology of E. faecalis is not completely understood, but it is thought that most infections and colonizations are a result of the patient's endogenous flora (Murray, B E (1990) Clin Microbiol Rev 3:46-65). Recent evidence suggests that E. faecalis can be spread by direct contact with other infected patients, indirect transmission from hospital personnel (Boyce, J M et al (1994) J Clin Microbiol 32:1148-53; and Rhineheart, E et al (1990) N Engl J Med 323:1814-1818), or from contaminated hospital surfaces and equipment (Karanfil, L V et al (1992) Infect Control Hosp Epidemiol 13:195-200; and Boyce, J M et al (1994) J Clin Microbiol 32:1148-53; and Livornese, L L Jr. (1992) Ann Intern Med 117:112-116). Increased risk for the critically ill, those with underlying disease or immunosuppression (i.e. ICU, oncology, and transplant patients), cardio-thoracic/intraabdominal surgical patients and those with urinary or central venous catheters has been demonstrated. In addition, risk for E. faecalis infection increases for patients with long hospital stays or previous multiantimicrobial or vancomycin treatments (Boyce, J M et al (1994) J Clin Microbiol 32:1148-1153; Boyle, J F et al (1993) J Clin Microbiol 31:1280-1285; Karanfil, L V et al (1992) Infect Control Hosp Epidemiol 13:195-200; Handwerger, S et al (1993) Clin Infect Dis 16:750-755; Montecalvo, M A et al (1994) Antimicrob Agents Chemother 38:1363-1367).

Additional concern stems from the ability of the E. faecalis plasmid borne VanA gene, which confers high level vancomycin resistance, to transfer in vitro to several gram positive microorganisms such as Staphylococcus aureus (Leclercq, R et al (1989) Antimicrob Agents Chemother 33:10-15; and Noble, W C, et al (1992) FEMS Microbiology Letters 72:195-198). To date, no clinical isolates of S. aureus or S. epidermidis have shown vancomycin resistance conferred by plasmid transfer, but clinically isolated strains of S. haemolyticus have (Degner, J E, et al (1994) J Clin Microbiol 32:2260-2265; and Veach, L A, et al (1990) J Clin Microbiol 28:2064-2068).

These concerns point to the need for diagnostic tools and therapeutics aimed at proper identification of strain and eradication of virulence. The design of vaccines that will limit the spread of infection and halt transfer of resistance factors is very desirable.

SUMMARY OF THE INVENTION

The present invention fulfills the need for diagnostic tools and therapeutics by providing bacterial-specific compositions and methods for detecting, treating, and preventing bacterial infection, in particular E. faecalis infection.

The present invention encompasses isolated nucleic acids and polypeptides derived from E. faecalis that are useful as reagents for diagnosis of bacterial disease, components of effective antibacterial vaccines, and/or as targets for antibacterial drugs, including anti-E. faecalis drugs. They can also be used to detect the presence of E. faecalis and other Enterococcus species in a sample, and in screening compounds for the ability to interfere with the E. faecalis life cycle or to inhibit E. faecalis infection. They also have use as biocontrol agents for plants.

The present invention also provides a genome-wide comparison by FASTA of the predicted amino acid sequences of several E. [...]

[...] media having recorded thereon a nucleotide sequence of the present invention. The choice of the data storage structure will generally be based on the means chosen to access the stored information. In addition, a variety of data processor programs and formats can be used to store the nucleotide sequence information of the present invention on computer
Recommended publications
  • Evidence of Selection at the Ramosa1 Locus During Maize Domestication
    Molecular Ecology (2010) 19, 1296–1311 doi: 10.1111/j.1365-294X.2010.04562.x Evidence of selection at the ramosa1 locus during maize domestication BRANDI SIGMON and ERIK VOLLBRECHT Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011, USA Abstract Modern maize was domesticated from Zea mays parviglumis, a teosinte, about 9000 years ago in Mexico. Genes thought to have been selected upon during the domestication of crops are commonly known as domestication loci. The ramosa1 (ra1) gene encodes a putative transcription factor that controls branching architecture in the maize tassel and ear. Previous work demonstrated reduced nucleotide diversity in a segment of the ra1 gene in a survey of modern maize inbreds, indicating that positive selection occurred at some point in time since maize diverged from its common ancestor with the sister species Tripsacum dactyloides and prompting the hypothesis that ra1 may be a domestication gene. To investigate this hypothesis, we examined ear phenotypes resulting from minor changes in ra1 activity and sampled nucleotide diversity of ra1 across the phylogenetic spectrum between tripsacum and maize, including a broad panel of teosintes and unimproved maize landraces. Weak mutant alleles of ra1 showed subtle effects in the ear, including crooked rows of kernels due to the occasional formation of extra spikelets, correlating a plausible, selected trait with subtle variations in gene activity. Nucleotide diversity was significantly reduced for maize landraces but not for teosintes, and statistical tests implied directional selection on ra1 consistent with the hypothesis that ra1 is a domestication locus. In maize landraces, a noncoding 3′-segment contained almost no genetic diversity and 5′-flanking diversity was greatly reduced, suggesting that a regulatory element may have been a target of selection.
  • DNA Sequencing
    DNA Sequencing. David Wishart, Ath 3-41 [email protected]
    Contig Assembly [figure: overlapping sequence fragments aligned into a contig]
    Principles of DNA Sequencing [figure: primer and DNA fragment in pBR322 (Amp, Tet, Ori); denature with heat to produce ssDNA; Klenow + ddNTP + dNTP + primers]
    The Secret to Sanger Sequencing
    Principles of DNA Sequencing [figure: template 5' GCATGC 3' with primer; four reactions, each with dATP, dCTP, dGTP, dTTP and one of ddCTP, ddATP, ddTTP, ddGTP, giving terminated fragments GddC, GCddA, GCAddT, ddG, GCATGddC, GCATddG; gel lanes read from short to long fragments]
    Capillary Electrophoresis: Separation by Electro-osmotic Flow
    Multiplexed CE with Fluorescent detection: ABI 3700, 96x700 bases
    High Throughput DNA Sequencing
    Large Scale Sequencing
    • Goal is to determine the nucleic acid sequence of molecules ranging in size from a few hundred bp to >10^9 bp
    • The methodology requires an extensive computational analysis of raw data to yield the final sequence result
    Shotgun Sequencing
    • High throughput sequencing method that employs automated sequencing of random DNA fragments
    • Automated DNA sequencing yields sequences of 500 to 1000 bp in length
    • To determine longer sequences you obtain fragmentary sequences and then join them together by overlapping
    • Overlapping is an alignment problem, but different from those we have discussed up to now
    Shotgun Sequencing [figure: Isolate Chromosome; Shear DNA into Fragments; Clone into Seq. Vectors; Sequence; Sequence Chromatogram; Send to Computer; Assembled Sequence]
    Analogy
    • You have 10 copies of a movie
    • The film has been cut into short pieces with about 240 frames per piece (10 seconds of film), at random
    • Reconstruct the film
    Multi-alignment & Contig Assembly
  • New Softwares for Automated Microsatellite Marker Development
    Published online February 21, 2006 Nucleic Acids Research, 2006, Vol. 34, No. 4 e31 doi:10.1093/nar/gnj030 New softwares for automated microsatellite marker development Wellington Martins, Daniel de Sousa1, Karina Proite2, Patrícia Guimarães2, Marcio Moretzsohn2 and David Bertioli3 Department of Computer Science, Catholic University of Goiás, Brazil, 1Department of Computer Science, Catholic University of Rio de Janeiro, Brazil, 2Embrapa Genetic Resources and Biotechnology, Brasília, Brazil and 3Genomic Sciences and Biotechnology, Catholic University of Brasília, Brazil Received November 23, 2005; Revised and Accepted January 31, 2006 ABSTRACT whose unit of repetition is between 1 and 6 bp. They are highly abundant in the genomes of eukaryotes, polymorphic and Microsatellites are repeated small sequence motifs usually co-dominant and transferable between different map- that are highly polymorphic and abundant in the ping populations. Microsatellite markers can also be used in genomes of eukaryotes. Often they are the molecular automated genotyping techniques. Thus, they have become markers of choice. To aid the development of micro- one of the most useful molecular markers for a large number satellite markers we have developed a module that of organisms. integrates a program for the detection of microsatel- Researchers working on the development of microsatellite lites (TROLL), with the sequence assembly and markers need an efficient way to go from usually hundreds or analysis software, the Staden Package. The module thousands of trace and/or text sequence files to the identifi- has easily adjustable parameters for microsatellite cation of new potential markers. However, the softwares lengths and base pair quality control.
  • A Tool for Detecting Base Mis-Calls in Multiple Sequence Alignments by Semi-Automatic Chromatogram Inspection
    ChromatoGate: A Tool for Detecting Base Mis-Calls in Multiple Sequence Alignments by Semi-Automatic Chromatogram Inspection Nikolaos Alachiotis Emmanouella Vogiatzi∗ Scientific Computing Group Institute of Marine Biology and Genetics HITS gGmbH HCMR Heidelberg, Germany Heraklion Crete, Greece [email protected] [email protected] Pavlos Pavlidis Alexandros Stamatakis Scientific Computing Group Scientific Computing Group HITS gGmbH HITS gGmbH Heidelberg, Germany Heidelberg, Germany [email protected] [email protected] ∗ Affiliated also with the Department of Genetics and Molecular Biology of the Democritian University of Thrace at Alexandroupolis, Greece. Corresponding author: Nikolaos Alachiotis Keywords: chromatograms, software, mis-calls Abstract Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translates the chromatograms into molecular sequences of A, C, G, T, or N (undetermined) characters. Since chromatogram translation programs frequently introduce errors, a manual inspection of the generated sequence data is required. As sequence numbers and lengths increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis becomes an error-prone, time-consuming, and tedious process. Here, we introduce ChromatoGate (CG), an open-source software that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To provide users full control over the error correction process, a fully automated error correction algorithm has not been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability.
  • A Guide to HIV-1 Reverse Transcriptase and Protease Sequencing for Drug Resistance Studies
    HIV-1 RT and Protease Sequencing for Drug Resistance Studies 1 A Guide to HIV-1 Reverse Transcriptase and Reviews Protease Sequencing for Drug Resistance Studies Robert W. Shafer1, Kathryn Dupnik1, Mark A. Winters1, Susan H. Eshleman2 1 Division of Infectious Diseases, Stanford University, Stanford, CA 94305 2 Dept. of Pathology, The Johns Hopkins Medical Institutions, Baltimore, MD 21205 I. HIV-1 Drug Resistance A. Introduction HIV-1 RT and protease sequencing and drug susceptibility testing have been done in research settings for more than ten years to elucidate the genetic mechanisms of resistance to antiretroviral drugs. Retrospective studies have shown that the presence of drug resistance before starting a new drug regimen is an independent predictor of virologic response to that regimen (DeGruttola et al., 2000; Hanna and D’Aquila, 2001; Haubrich and Demeter, 2001). Prospective studies have shown that patients whose physicians have access to drug resistance data, particularly genotypic resistance data, respond better to therapy than control patients whose physicians do not have access to the same data (Baxter et al., 2000; Cohen et al., 2000; De Luca et al., 2001; Durant et al., 1999; Melnick et al., 2000; Meynard et al., 2000; Tural et al., 2000). The accumulation of retrospective and prospective data has led three expert panels to recommend the use of resistance testing in the treatment of HIV-infected patients (EuroGuidelines Group for HIV Resistance, 2001; Hirsch et al., 2000; US Department of Health and Human Services Panel on Clinical Practices for Treatment of HIV Infection, 2000) (Table 1). There have been several recent reviews on methods for assessing HIV-1 drug resistance (Demeter and Haubrich, 2001; Hanna and D’Aquila, 2001; Richman, 2000) and on the mutations associated with drug resistance (Deeks, 2001; Hammond et al., 1999; Loveday, 2001; Miller, 2001; Shafer et al., 2000b).
  • Next-Generation DNA Sequencing Informatics, 2Nd Edition
    This is a free sample of content from Next-Generation DNA Sequencing Informatics, 2nd edition. Click here for more information on how to buy the book. Index Page references followed by f denote figures. Page references followed by t denote tables. A Needleman–Wunsch (NW) algorithm, 49, 54, 110–113 overview, 109–110 Abeel, Thomas, 103 – – – ABI. See Applied Biosystems Inc. Smith Waterman (SW) algorithm, 38, 49, 62 63, 111 113 Ab initio genome annotation, 172, 178, 180t–181t Splign, 182 – TopHat, 43, 182 ab1PeakReporter software, 52 53 – A-Bruijn graph, 133–134 Alignment score, FASTA, 64 65 ABySS (Assembly by Short Sequencing), 134, 142, 147–153 Allele, 52, 354 Allele frequency, 76, 94, 193 effect of k-mer size and minimum pair number on assembly, fi 148–149, 149f Allele-speci c expression, 155, 298 overview of, 147–148 ALLPATHS, 134 quality of assembly, 149–153, 150t, 151f–152f ALN format, 92 α transcriptome assembly (Trans-ABySS), 158t, 160–161, 166 -diversity indices, 319 – – AceView database, 294, 295f Alternative splicing, 182, 293 296, 294f 295f Acrylamide gels Altschul, Stephen, 65 capillary tube, 4 Amazon Elastic Compute Cloud (EC2), 43, 254, 300, 315, – Sanger sequencing and, 2, 3–4 362 364, 366, 369 – ACT, 179t Amino acids, pairwise comparisons, 48 49 Adapter removal, 37–39, 39f, 43 Amplicons, 8, 30, 89, 204, 309, 312 Adapter Removal program, 38 Amplicon Variant Analyzer, 101 Affine gaps, 42, 110, 111–112 AmpliSeq Cancer Panel (Ion Torrent), 206 Algorithms Annotation, 75. 
See also Genome annotation – – – alignment, 49, 109–124, 129, 223, 338, 344 ChIP-seq peak, 240 242, 255, 259, 262 263, 262f 263f – assembly, 59, 127–129, 133–134, 338 proteogenomics and, 327 328, 328f – database searching, 113–115 of variants, 208 212 development, 364 ANNOVAR, 211 DNA fragment/genome assembly, 127–129, 133–134, 142 Anthrax, 141 dynamic programming, 110–124 Anti-sense RNA, 281 file compression, 79 Application programming interface (API), 368 Golay error-correcting, 31 Applied Biosystems Inc.
  • Comparison of DNA Sequence Assembly Algorithms Using Mixed Data Sources
    Comparison of DNA Sequence Assembly Algorithms Using Mixed Data Sources A Thesis Submitted to the College of Graduate Studies and Research in Partial Fulfillment of the Requirements for the degree of Master of Science in the Department of Computer Science University of Saskatchewan Saskatoon By Tejumoluwa Abegunde © Tejumoluwa Abegunde, April/2010. All rights reserved. Permission to Use In presenting this thesis in partial fulfilment of the requirements for a Postgraduate degree from the University of Saskatchewan, I agree that the Libraries of this University may make it freely available for inspection. I further agree that permission for copying of this thesis in any manner, in whole or in part, for scholarly purposes may be granted by the professor or professors who supervised my thesis work or, in their absence, by the Head of the Department or the Dean of the College in which my thesis work was done. It is understood that any copying or publication or use of this thesis or parts thereof for financial gain shall not be allowed without my written permission. It is also understood that due recognition shall be given to me and to the University of Saskatchewan in any scholarly use which may be made of any material in my thesis. Requests for permission to copy or to make other use of material in this thesis in whole or part should be addressed to: Head of the Department of Computer Science 176 Thorvaldson Building 110 Science Place University of Saskatchewan Saskatoon, Saskatchewan Canada S7N 5C9 Abstract DNA sequence assembly is one of the fundamental areas of bioinformatics.
  • Downloading and Will Run As Stand-Alone Software
    BMC Bioinformatics BioMed Central Software Open Access preAssemble: a tool for automatic sequencer trace data processing Alexei A Adzhubei*1,4, Jon K Laerdahl2 and Anna V Vlasova3 Address: 1Norwegian School of Veterinary Science, BasAM – Genetics, P.O. Box 8146 Dep, NO-0033 Oslo, Norway, 2Centre for Molecular Biology and Neuroscience (CMBN) Institute of Medical Microbiology, Rikshospitalet, NO-0027 Oslo, Norway, 3Engelhardt Institute of Molecular Biology, Vavilov St. 32, 117984 Moscow, Russia and 4The Biotechnology Centre of Oslo, University of Oslo, P.O. Box 1125 Blindern, NO-0317 Oslo, Norway Email: Alexei A Adzhubei* - [email protected]; Jon K Laerdahl - [email protected]; Anna V Vlasova - [email protected] * Corresponding author Published: 17 January 2006 Received: 02 August 2005 Accepted: 17 January 2006 BMC Bioinformatics 2006, 7:22 doi:10.1186/1471-2105-7-22 This article is available from: http://www.biomedcentral.com/1471-2105/7/22 © 2006 Adzhubei et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Background: Trace or chromatogram files (raw data) are produced by automatic nucleic acid sequencing equipment or sequencers. Each file contains information which can be interpreted by specialised software to reveal the sequence (base calling). This is done by the sequencer proprietary software or publicly available programs. Depending on the size of a sequencing project the number of trace files can vary from just a few to thousands of files.
  • Basecalling, Alignment, Assembly and Deconvolution of Sanger
    Rausch et al. BMC Genomics (2020) 21:230 https://doi.org/10.1186/s12864-020-6635-8 SOFTWARE Open Access Tracy: basecalling, alignment, assembly and deconvolution of sanger chromatogram trace files Tobias Rausch1,2,3*† , Markus Hsi-Yang Fritz2†, Andreas Untergasser1,4† and Vladimir Benes1 Abstract Background: DNA sequencing is at the core of many molecular biology laboratories. Despite its long history, there is a lack of user-friendly Sanger sequencing data analysis tools that can be run interactively as a web application or at large-scale in batch from the command-line. Results: We present Tracy, an efficient and versatile command-line application that enables basecalling, alignment, assembly and deconvolution of sequencing chromatogram files. Its companion web applications make all functionality of Tracy easily accessible using standard web browser technologies and interactive graphical user interfaces. Tracy can be easily integrated in large-scale pipelines and high-throughput settings, and it uses state-of-the-art file formats such as JSON and BCF for reporting chromatogram sequencing results and variant calls. The software is open-source and freely available at https://github.com/gear-genomics/tracy, the companion web applications are hosted at https://www.gear-genomics.com. Conclusions: Tracy can be routinely applied in large-scale validation efforts conducted in clinical genomics studies as well as for high-throughput genome editing techniques that require a fast and rapid method to confirm discovered variants or engineered mutations. Molecular biologists benefit from the companion web applications that enable installation-free Sanger chromatogram analyses using intuitive, graphical user interfaces. Keywords: Chromatogram, PCR, Sanger sequencing, Alignment, Variant calling Background patch a reference sequence based on trace information.
  • Gap5—Editing the Billion Fragment Sequence Assembly James K
    Vol. 26 no. 14 2010, pages 1699–1703 BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btq268 Genome analysis Advance Access publication May 30, 2010 Gap5—editing the billion fragment sequence assembly James K. Bonfield∗ and Andrew Whitwham Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK Associate Editor: Dmitrij Frishman ABSTRACT and memory footprint. However, the solutions typically employed Motivation: Existing sequence assembly editors struggle with the by these programs are only amenable for read-only access, with the volumes of data now readily available from the latest generation of exception of NGSView that can perform some minor editing tasks. DNA sequencing instruments. In addition to algorithmic efficiency, the large increase in Results: We describe the Gap5 software along with the data the number of DNA fragments has put a strain on our storage structures and algorithms used that allow it to be scalable. We requirements. By using data compression methods, the storage demonstrate this with an assembly of 1.1 billion sequence fragments burden can be greatly reduced, with the BAM file format being and compare the performance with several other programs. We one such recent example. When coupled with an index, compressed analyse the memory, CPU, I/O usage and file sizes used by Gap5. BAM files can be randomly accessed. Availability and Implementation: Gap5 is part of the Staden We present the Gap5 program: a sequence assembly viewer and Package and is available under an Open Source licence from http:// editor. This encompasses both base by base editing operations as staden.sourceforge.net. It is implemented in C and Tcl/Tk.
  • Download from Ftp:/ Matter of Fashion? Nat Rev Genet 2004, 5:63-69
    BMC Bioinformatics BioMed Central Software Open Access STAMP: Extensions to the STADEN sequence analysis package for high throughput interactive microsatellite marker design Lars Kraemer1,2, Bánk Beszteri2, Steffi Gäbler-Schwarz2, Christoph Held2, Florian Leese2,3, Christoph Mayer3, Kevin Pöhlmann2 and Stephan Frickenhaus*2 Address: 1Institut für Klinische Molekularbiologie, Universität Kiel, Arnold-Heller-Str 3, 24105 Kiel, Germany, 2Alfred Wegener Institute for Polar and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany and 3Animal Ecology, Evolution and Biodiversity, Ruhr University, 44780 Bochum, Germany Email: Lars Kraemer - [email protected]; Bánk Beszteri - [email protected]; Steffi Gäbler-Schwarz - [email protected]; Christoph Held - [email protected]; Florian Leese - [email protected]; Christoph Mayer - [email protected]; Kevin Pöhlmann - [email protected]; Stephan Frickenhaus* - [email protected] * Corresponding author Published: 30 January 2009 Received: 24 September 2008 Accepted: 30 January 2009 BMC Bioinformatics 2009, 10:41 doi:10.1186/1471-2105-10-41 This article is available from: http://www.biomedcentral.com/1471-2105/10/41 © 2009 Kraemer et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Background: Microsatellites (MSs) are DNA markers with high analytical power, which are widely used in population genetics, genetic mapping, and forensic studies. Currently available software solutions for high-throughput MS design (i) have shortcomings in detecting and distinguishing imperfect and perfect MSs, (ii) lack often necessary interactive design steps, and (iii) do not allow for the development of primers for multiplex amplifications.
  • The Staden Package Manual Last Update on 22 October 2002
    The Staden Package Manual Last update on 22 October 2002 James Bonfield, Kathryn Beal, Mark Jordan, Yaping Cheng and Rodger Staden Copyright © 1999-2002, Medical Research Council, Laboratory of Molecular Biology. Permission is given to duplicate this manual in both paper and electronic forms. If you wish to charge more than the duplication costs, or wish to make edits to the manual, please contact the authors by email to [email protected].
    Short Contents
    1 Sequence assembly and finishing using Gap4
    2 Searching for point mutations using pregap4 and gap4
    3 Preparing readings for assembly using pregap4
    4 Marking poor quality and vector segments of readings
    5 Screening Against Vector Sequences
    6 Screening Readings for Contaminant Sequences
    7 Viewing and editing trace data using trev
    8 Analysing and comparing sequences using spin
    9 User Interface
    10 File Formats
    11 Man Pages
    References
    General Index
    File Index
    Variable Index
    Function Index