BIOLOGICAL FEATURE EXTRACTION Elloumi CH10 Date: June 14, 2013 Time: 8:58 Am

PART C
BIOLOGICAL FEATURE EXTRACTION

CHAPTER 10
ALGORITHMS AND DATA STRUCTURES FOR NEXT-GENERATION SEQUENCES

FRANCESCO VEZZI,1,2 GIUSEPPE LANCIA,1 and ALBERTO POLICRITI1,2
1Department of Mathematics and Informatics and 2Institute of Applied Genomics, University of Udine, Udine, Italy

The first genome was sequenced in 1975 [87], and since that first success sequencing technologies have improved significantly, with a strong acceleration in the last few years. Today these technologies allow us to read (huge amounts of) contiguous DNA stretches and are the key to reconstructing the genome sequence of a new species or of an individual within a population, and to studying the expression levels of single cell lines. Even though many different applications use sequencing data today, the “highest” sequencing goal is always the reconstruction of the complete genome sequence. The success in determining the first human genome sequence has encouraged many groups to tackle the problem of reconstructing the codebook of other species, including microbial, mammalian, and plant genomes.

Despite such efforts in sequencing new organisms, most species in the biosphere have not been sequenced yet. There are many reasons for this, but the two main causes are the cost of a sequencing project and the difficulty of building a reliable assembly. Until a few years ago, Sanger sequencing was the only available, unquestioned technology. This method has been used to produce many complete genomes of microbes, vertebrates (e.g., human [96]), and plants (e.g., grapevine [37]). Roughly speaking, to sequence an organism it is necessary to extract its DNA, break it into small fragments, and read their ends.
As a final result one obtains a set of sequences, usually called reads, that may be assembled to reconstruct the original genome sequence or searched against a database containing an already reconstructed genome. Reads are randomly sampled along the DNA sequence, so to be sure that each base in the genome is present in at least one read we have to oversample the genome. Given a set of reads, the ratio C between the sum of all read lengths and the genome length is the coverage, and we say that the genome has been sequenced with depth of coverage C, or C times (C×). One of the first and most important (practical) algorithmic insights in genome assembly (see [96]) was the observation that using reads coming from the two ends of a single sequence, called the paired reads of an insert* of known (estimated) length, greatly simplified the overall downstream assembly process.

Biological Knowledge Discovery Handbook: Preprocessing, Mining, and Postprocessing of Biological Data, First Edition. Edited by Mourad Elloumi and Albert Y. Zomaya. © 2013 John Wiley & Sons, Inc. Published 2013 by John Wiley & Sons, Inc.

Recently, new sequencing methods have emerged [59]. In particular, the commercially available technologies include pyrosequencing (454 [1]), sequencing by synthesis (Illumina [3]), and sequencing by ligation (SOLiD [2]). Compared to the traditional Sanger method, these technologies have significantly lower production costs and much higher throughput. These advances have significantly reduced the cost of several applications that have sequencing or resequencing as an intermediate step. The computationally significant aspect of these new technologies is that the reads produced are much shorter than traditional Sanger reads.
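The coverage definition above, and the reason genomes must be oversampled, can be illustrated with a small Python sketch. The function names are hypothetical, and the uncovered-fraction estimate assumes the Lander-Waterman (Poisson) model, in which read start positions are uniform and independent; the text itself does not state this model, and real read distributions are less uniform.

```python
import math


def depth_of_coverage(read_lengths, genome_length: int) -> float:
    """C = (sum of all read lengths) / (genome length)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sum(read_lengths) / genome_length


def expected_uncovered_fraction(coverage: float) -> float:
    """Expected fraction of genome bases covered by no read, ASSUMING read
    starts are uniform and independent (Poisson / Lander-Waterman model)."""
    return math.exp(-coverage)


# Hypothetical run: 200,000 reads of 100 bp on a 1 Mb genome
c = depth_of_coverage([100] * 200_000, 1_000_000)
print(f"C = {c:.1f}x")  # C = 20.0x
for depth in (1, 5, 10):
    print(depth, expected_uncovered_fraction(depth))
```

Under that assumption, roughly 37% of the bases (e^-1 ≈ 0.37) would be expected to remain uncovered at 1×, and under 1% at 5×, which is why sequencing depths well above 1× are needed.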
At the time of writing, the Illumina HiSeq 2000, the latest Illumina instrument on the market, produces reads of length 150 bp and generates more than 200 billion bases of output per run; the SOLiD 4 System produces paired reads of length 50 bp; and the Roche 454 GS FLX Titanium, which has the lowest throughput, produces single reads of length 400 bp and paired reads of length 200 bp. Other technologies are now emerging (Polonator, Helicos BioSciences, Pacific BioSciences, and Oxford Nanopore Technologies [66]), promising higher throughput and lower costs.

At the beginning of the next-generation sequencing (NGS) era, as a consequence of the extremely short lengths of both reads and inserts, NGS data were used mainly in (several) resequencing projects [9, 23, 42, 102]. A resequencing project relies on the availability of a reference sequence (usually a fairly complete genome sequence) against which short sequences can be aligned using a short-read aligner [48, 52, 71, 81]. Resequencing projects allow the reconstruction of the genetic information of similar organisms and the identification of differences among individuals of the same species. The most important such differences are single-nucleotide polymorphisms (SNPs) [53, 90], copy number variations (CNVs) [17, 18, 32], and insertion/deletion events (indels) [62].

Despite the short length of reads, but encouraged by technology improvements, many groups have started to use NGS data to reconstruct new genomes from scratch. De novo assembly is a difficult problem in general, and it is made even harder not only by short read lengths [69] but also by the lack of reliable models of sequencing errors and read distribution. Many tools have been proposed (see, e.g., Velvet [104], ALLPATHS [56], and ABySS [92], to mention just a few), but the results achievable to date still fall far short of those of Sanger-era assemblers (e.g., PCAP [34]).
The unbridled spread of second-generation sequencing machines has been accompanied by a (natural) effort toward producing computational instruments capable of analyzing the large amounts of newly available data. The aim of this chapter is to present and (comparatively) discuss the data structures that have been proposed in the context of NGS data processing. In particular, we will concentrate on two widely studied areas: data structures for alignment and for de novo assembly.

The chapter is divided into two main sections. In the first we will classify algorithms and data structures specifically designed for aligning the short nucleotide sequences produced by NGS instruments against a database. We will propose a division into categories and describe some of the most successful tools proposed so far. In the second part we will deal with de novo assembly. De novo assembly is a computationally challenging problem, with NP-complete versions easily obtainable from its real-world definition. In this part we will classify the different de novo strategies and describe the available tools. Moreover, we will focus on the limits of the most widely used current tools, discussing when such limits can be traced back to the data structures employed and when, instead, they are a direct consequence of the kind of data processed.

*The name “insert” is used because the sequence providing the reads is inserted into a bacterial genome to be replicated in a sufficient number of copies.

10.1 ALIGNERS

Computational biology is one of the main application areas of string matching. A DNA sequence can be seen as a string over the alphabet Σ = {A, C, G, T}. Given a reference genome sequence, we are interested in searching for (aligning) other sequences (reads) of various lengths within it.
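As a minimal illustration of treating DNA as a string over Σ and of exact matching, the following Python sketch reports every position where a read occurs verbatim in a reference. The helper names are hypothetical. The naive scan costs O(nm) time for a reference of length n and a read of length m, which is precisely the cost that the indexes used by NGS aligners are designed to avoid.

```python
DNA_ALPHABET = frozenset("ACGT")  # the alphabet Sigma = {A, C, G, T}


def is_dna(s: str) -> bool:
    """True if s is a non-empty string over {A, C, G, T}."""
    return len(s) > 0 and set(s) <= DNA_ALPHABET


def exact_occurrences(reference: str, read: str) -> list:
    """All 0-based start positions where read occurs exactly in reference.
    Naive O(n * m) scan over every window of the reference."""
    if not (is_dna(reference) and is_dna(read)):
        raise ValueError("sequences must be over {A, C, G, T}")
    m = len(read)
    return [i for i in range(len(reference) - m + 1)
            if reference[i:i + m] == read]


print(exact_occurrences("ACGTACGTTACG", "ACG"))  # [0, 4, 9]
```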
When aligning such reads against another DNA sequence, we must consider both errors introduced by the sequencer and intrinsic differences due to the variability between individuals of the same species. For these reasons, every program that aligns reads against a reference sequence must handle (at least) mismatches [5, 41]. As a general rule, tools used to align Sanger reads (see [5]) are not suitable, that is, not efficient enough, for aligning next-generation sequencer output, essentially because of the sheer amount of data to handle. (The advent of next-generation sequencers moved the bottleneck from data production to data analysis.) Therefore, to keep pace with data production, new algorithms and data structures have been proposed in recent years.

String matching can be divided into two main areas: exact string matching and approximate string matching. Approximate string matching requires a distance metric between strings. The most commonly used metrics are the edit distance (or Levenshtein distance) [47] and the Hamming distance [29]. Approximate string matching at distance k under the edit metric is called the k-difference problem, while under the Hamming metric it is called the k-mismatch problem. In many practical applications, such as short-sequence alignment, we are interested in finding the best occurrence of the pattern with at most k differences (or mismatches). We will refer to this as the best-k-difference/mismatch problem. Recently, a flurry of papers presenting new indexing algorithms to solve this problem have appeared [46, 50, 51]. While all aligners allow the user to set constraints on the Hamming distance, only some of them also support the edit distance.

All aligners designed for NGS use some form of index to speed up the search phase. Aligners usually build an index over the text (the reference), but some solutions index only the reads, or both.
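The two metrics and the best-k-mismatch problem can be sketched in Python as follows. This is an illustrative, unindexed version with hypothetical function names: `edit_distance` is the textbook dynamic program for the Levenshtein distance, and `best_k_mismatch` scans every window of the reference under the Hamming metric, returning the occurrence with the fewest mismatches (at most k), leftmost first on ties.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    substitutions, insertions and deletions turning a into b.
    Classic O(|a| * |b|) dynamic program keeping only one previous row."""
    prev = list(range(len(b) + 1))          # distances from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)            # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete ca
                         cur[j - 1] + 1,            # insert cb
                         prev[j - 1] + (ca != cb))  # match or substitute
        prev = cur
    return prev[-1]


def best_k_mismatch(reference: str, read: str, k: int):
    """Best occurrence of read in reference with at most k mismatches
    (Hamming metric), as (start, mismatches); None if none exists."""
    m = len(read)
    best = None
    for i in range(len(reference) - m + 1):
        d = hamming(reference[i:i + m], read)
        if d <= k and (best is None or d < best[1]):
            best = (i, d)
            if d == 0:          # an exact match cannot be improved upon
                break
    return best


print(hamming("ACGT", "ACCT"))                      # 1
print(edit_distance("ACGT", "AGT"))                 # 1
print(best_k_mismatch("TTACGTTAGGT", "ACGA", 1))    # (2, 1)
print(best_k_mismatch("TTACGTTAGGT", "ACGA", 0))    # None
```

Note that `edit_distance("ACGT", "AGT")` is 1 (one deletion) even though the Hamming distance is undefined for strings of different lengths; this is why only the edit metric can model indels.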
According to [49], existing alignment algorithms can be clustered into two main classes: algorithms based on hash tables and algorithms based on suffix-based data structures. A third category is formed by algorithms based on merge sorting but, to the best of our knowledge, the only available solution in this category is [57].
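To make the two main classes concrete, here is a toy Python sketch of each: a hash table mapping every k-mer of the reference to its positions, used for a seed-and-verify lookup, and a suffix array searched by binary search. All names are hypothetical and neither structure reflects any particular tool's implementation; the naive suffix-array construction in particular is far slower than the constructions used in practice and is only suitable for tiny inputs.

```python
from collections import defaultdict


# --- Hash-table class: index every k-mer of the reference ----------------
def build_kmer_index(reference: str, k: int) -> dict:
    """Map each k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def seed_and_verify(reference: str, index: dict, k: int, read: str,
                    max_mismatches: int) -> list:
    """Look up non-overlapping k-mer seeds of the read, turn every hit into a
    candidate start, then verify candidates by counting mismatches.
    Returns sorted (start, mismatches) pairs."""
    m = len(read)
    candidates = set()
    for offset in range(0, m - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if start >= 0 and start + m <= len(reference):
                candidates.add(start)
    hits = []
    for start in sorted(candidates):
        mism = sum(x != y for x, y in zip(reference[start:start + m], read))
        if mism <= max_mismatches:
            hits.append((start, mism))
    return hits


# --- Suffix-based class: binary search over a suffix array ---------------
def build_suffix_array(text: str) -> list:
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_array_search(text: str, sa: list, pattern: str) -> list:
    """Exact occurrences of pattern: suffixes starting with pattern form one
    contiguous block of the suffix array, found by two binary searches."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                      # first suffix with prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:                      # first suffix with prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


ref = "ACGTACGTTACG"
idx = build_kmer_index(ref, 3)
print(seed_and_verify(ref, idx, 3, "ACGTTACG", max_mismatches=1))  # [(4, 0)]
print(suffix_array_search(ref, build_suffix_array(ref), "ACG"))     # [0, 4, 9]
```

Splitting the read into non-overlapping seeds relies on a pigeonhole argument: if a read occurs with at most e mismatches and contains at least e + 1 non-overlapping seeds, at least one seed matches exactly, so the occurrence is always among the candidates. In the example, the 8 bp read yields two 3-mer seeds, which guarantees that every occurrence with at most one mismatch is found.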
Recommended publications
  • Understanding the Origins, Dispersal, and Evolution of Bonamia Species (Phylum Haplosporidia) Based on Genetic Analyses of Ribosomal RNA Gene Regions
  • Need and Role of Scala Implementations in Bioinformatics
  • Tracheophyte Genomes Keep Track of the Deep Evolution of the Caulimoviridae. Seydina Diop, Andrew D.W. Geering, et al.
  • Next-Generation DNA Sequencing Informatics, 2nd Edition
  • Identification of the Vascular Plants of Churchill, Manitoba, Using a DNA Barcode Library. Maria L Kuzmina, Karen L Johnson, Hannah R Barron and Paul DN Hebert
  • Comparison of DNA Sequence Assembly Algorithms Using Mixed Data Sources
  • Ultra-High Resolution HLA Genotyping and Allele Discovery by Highly Multiplexed cDNA Amplicon Pyrosequencing
  • Align Multiple Sequences to a Reference
  • Aligner Help by Menu
  • A Defect in Myoblast Fusion Underlies Carey-Fineman-Ziter Syndrome
  • Sequencing of the Core MHC Region of Black Grouse (Tetrao tetrix) and Comparative Genomics of the Galliform MHC. Biao Wang, Robert Ekblom, Tanja M Strand, Silvia Portela-Bens and Jacob Höglund
  • A Survey of Enabling Technologies in Synthetic Biology. Kahl and Endy