Long-Read Sequencing of Trypanosoma Cruzi

Total Page:16

File Type:pdf, Size:1020Kb

Long-Read Sequencing of Trypanosoma Cruzi RESEARCH ARTICLE Berna et al., Microbial Genomics 2018;4 DOI 10.1099/mgen.0.000177 Expanding an expanded genome: long-read sequencing of Trypanosoma cruzi Luisa Berna, 1 Matias Rodriguez,2 María Laura Chiribao,1,3 Adriana Parodi-Talice,1,4 Sebastian Pita,1,4 Gastón Rijo,1 Fernando Alvarez-Valin2,* and Carlos Robello1,3,* Abstract Although the genome of Trypanosoma cruzi, the causative agent of Chagas disease, was first made available in 2005, with additional strains reported later, the intrinsic genome complexity of this parasite (the abundance of repetitive sequences and genes organized in tandem) has traditionally hindered high-quality genome assembly and annotation. This also limits diverse types of analyses that require high degrees of precision. Long reads generated by third-generation sequencing technologies are particularly suitable to address the challenges associated with T. cruzi’s genome since they permit direct determination of the full sequence of large clusters of repetitive sequences without collapsing them. This, in turn, not only allows accurate estimation of gene copy numbers but also circumvents assembly fragmentation. Here, we present the analysis of the genome sequences of two T. cruzi clones: the hybrid TCC (TcVI) and the non-hybrid Dm28c (TcI), determined by PacBio Single Molecular Real-Time (SMRT) technology. The improved assemblies herein obtained permitted us to accurately estimate gene copy numbers, abundance and distribution of repetitive sequences (including satellites and retroelements). We found that the genome of T. cruzi is composed of a ‘core compartment’ and a ‘disruptive compartment’ which exhibit opposite GC content and gene composition. Novel tandem and dispersed repetitive sequences were identified, including some located inside coding sequences. Additionally, homologous chromosomes were separately assembled, allowing us to retrieve haplotypes as separate contigs instead of a unique mosaic sequence. Finally, manual annotation of surface multigene families, mucins and trans-sialidases allows now a better overview of these complex groups of genes. DATA SUMMARY cruzi genomes; three species considered to be representative of the parasitic diversity of the group [1–3]. The new infor- The genome assemblies and annotation of TCC and Dm28c have been deposited in GenBank with the accession num- mation obtained from these genomes opened a new era that bers: PRJNA432753 and PRJNA433042, respectively. Raw allowed researchers to conduct studies with unprecedented sequences of TCC and Dm28c have been deposited in comprehensiveness. Several new types of analyses, such as GenBank with the accession numbers SRP134374 and large-scale comparative genomics, aimed at understanding SRP134013, respectively. Supporting Data tables are avail- the common evolutionary basis of parasitism and pathogen- able on Figshare data: https://figshare.com/s/38da1c- esis, or searching for new vaccine candidates and drug de5667299b3a1e. We confirm that all supporting data, code targets [4, 5] were made possible. These three genomes, and protocols have been provided within the article or however, were published with very different degrees of fin- through supplementary data files. ishing. The genome of L. major was precisely assembled and annotated. The T. brucei genome was also of very good quality but was mostly focused on megabase chromosomes, INTRODUCTION whereas minichromosomes were not included. Conversely, The year 2005 represents a landmark in the study of trypa- the assembly of the T. cruzi genome (CL Brener strain) was nosomatid biology with the simultaneous publication of the extremely fragmented (4098 contigs) with very few contigs Leishmania major, Trypanosoma brucei and Trypanosoma exceeding 100 kb in size. In fact, only 12 contigs met this Received 19 February 2018; Accepted 3 April 2018 Author affiliations: 1Laboratory of Host Pathogen Interactions-UBM, Institut Pasteur de Montevideo, Montevideo, Uruguay; 2Sección Biomatematica - Unidad de Genómica Evolutiva, Facultad de Ciencias-UDELAR, Montevideo, Uruguay; 3Departamento de Bioquímica, Facultad de Medicina-UDELAR, Montevideo, Uruguay; 4Sección Genetica, Facultad de Ciencias-UDELAR, Montevideo, Uruguay. *Correspondence: Fernando Alvarez-Valin, [email protected]; Carlos Robello, [email protected] Keywords: Trypanosoma cruzi; PacBio; whole genome sequencing; Chagas disease. Abbreviations: HSP, high-scoring pairs; SMRT, Single Molecular Real-Time. Data statement: All supporting data, code and protocols have been provided within the article or through supplementary data files. Five supplementary figures and five supplementary tables are available with the online version of this article. 000177 ã 2018 The Authors Downloaded from www.microbiologyresearch.org by This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited. IP: 164.73.80.128 1 On: Fri, 12 Jul 2019 20:33:37 Berna et al., Microbial Genomics 2018;4 threshold and none was larger than 150 kb [3]. Despite this degree of fragmentation, this draft genome was highly valu- IMPACT STATEMENT able because it led to the identification of several novel spe- We present the assembled and annotated genomes of cies-specific multigene families and gave, for the first time, a two Trypanosoma cruzi clones, the hybrid TCC and the draft overview of the genome architecture of T. cruzi. This non-hybrid Dm28c, obtained using PacBio technology. draft, also made it possible to identify the vast majority of The complications associated with this genome (assem- conserved trypanosomatid genes, as indicated by the fact bly fragmentation and collapse of repetitive sequences) that very few genes conserved between Leishmania and were basically solved. These improved assemblies per- T. brucei were not found in the draft T. cruzi genome [4]. mitted us not only to accurately estimate copy numbers Most likely this can be attributed to the fact that genes are of tandemly arrayed genes and multigene families but short in trypanosomatids (since they lack introns), hence also allowed the unambiguous identification of many sin- even moderately small contigs can contain complete coding gle-copy genes. Reliable information on the latter is sequences (CDS). The genome sequences of additional T. required to conduct functional studies based on gene cruzi strains Dm28c [6], Sylvio X10/1 [7], and T. c. marin- knock-out, a fundamental source of information to deter- kellei [8] have been reported since this initial publication mine which genes are essential and to find new drug tar- using new sequencing technologies, such as Roche 454 alone gets for Chagas disease.We found that the genome of T. or in combinations with other methodologies (Illumina and cruzi is compartmentalized in a isochore-like manner, Sanger). A common feature of all the assemblies reported ‘ ’ ‘ for these strains is again high fragmentation, which was containing a core compartment and a disruptive com- ’ even higher than that originally reported for CL Brener. partment , which exhibit significant differences in GC con- tent and gene composition, the former being GC-poorer To tackle the problem of assembly fragmentation, Weath- and composed of conserved genes and the latter erly et al. [9] used complementary sources of data to scaffold enriched in trans-sialidases (TS), mucins and mucin- T. cruzi (CL Brener strain) contigs, aiming to recover full- associated surface protein (MASP) genes.Additionally, length chromosome sequences. Their approach used a com- many homologous chromosomes were separately bined strategy based on sequencing bacterial artificial chro- assembled (haplotypes), and some homologous recombi- mosome (BAC) ends, as well as their co-location on nation events could be identified. The availability of high- chromosomes and synteny conservation with T. brucei and quality genomes opens new possibilities to conduct anal- Leishmania chromosomes. This strategy enabled these ysis on this parasite that require high degrees of preci- authors not only to obtain an assembly that represented a sion of genomic data (e.g. allelic exclusion, population substantial improvement in comparison to previous ver- genomics). sions of the genome but also to reconstruct chromosomes (with a few gaps), with only a relatively minor portion of the genome remaining unassigned to chromosomes. Nonethe- less, the issue of assembly fragmentation is a limitation that malariae [11]. Furthermore, this technology allows obtain- poses a number of complications for diverse types of analy- ing separated assemblies of homologous chromosomes, such ses that require high precision. that haplotypes can be retrieved as separated contigs/scaf- folds instead of a unique mosaic sequence. This feature is Third-generation sequencing technologies are particularly particularly desirable in genomes with a high level of hetero- suitable for addressing the challenges associated with the zygosity, as in the case of some T. cruzi hybrid strains that peculiarities of the T. cruzi’s genome since they allow have arisen from relatively divergent ancestors, in which the sequencing reads of more than 15 kb in average length, and divergence between haplotypes is higher than 5 % [3]. Need- many much longer can be obtained. This opens up the pos- less to say, a single mosaic assembly, representing a mixture sibility of directly determining the full sequence
Recommended publications
  • A Field Guide to Eukaryotic Transposable Elements
    GE54CH23_Feschotte ARjats.cls September 12, 2020 7:34 Annual Review of Genetics A Field Guide to Eukaryotic Transposable Elements Jonathan N. Wells and Cédric Feschotte Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14850; email: [email protected], [email protected] Annu. Rev. Genet. 2020. 54:23.1–23.23 Keywords The Annual Review of Genetics is online at transposons, retrotransposons, transposition mechanisms, transposable genet.annualreviews.org element origins, genome evolution https://doi.org/10.1146/annurev-genet-040620- 022145 Abstract Annu. Rev. Genet. 2020.54. Downloaded from www.annualreviews.org Access provided by Cornell University on 09/26/20. For personal use only. Copyright © 2020 by Annual Reviews. Transposable elements (TEs) are mobile DNA sequences that propagate All rights reserved within genomes. Through diverse invasion strategies, TEs have come to oc- cupy a substantial fraction of nearly all eukaryotic genomes, and they rep- resent a major source of genetic variation and novelty. Here we review the defining features of each major group of eukaryotic TEs and explore their evolutionary origins and relationships. We discuss how the unique biology of different TEs influences their propagation and distribution within and across genomes. Environmental and genetic factors acting at the level of the host species further modulate the activity, diversification, and fate of TEs, producing the dramatic variation in TE content observed across eukaryotes. We argue that cataloging TE diversity and dissecting the idiosyncratic be- havior of individual elements are crucial to expanding our comprehension of their impact on the biology of genomes and the evolution of species. 23.1 Review in Advance first posted on , September 21, 2020.
    [Show full text]
  • Tandemly Arrayed Genes in Vertebrate Genomes
    Tandemly Arrayed Genes in Vertebrate Genomes Deng Pan1 and Liqing Zhang2∗ 1Department of Computer Science, Virginia Tech, 2050 Torgerson Hall, Blacksburg, VA 24061-0106, USA 2Program in Genetics, Bioinformatics, and Computational Biology ∗To whom correspondence should be addressed; E-mail: [email protected] July 9, 2008 Abstract Tandemly arrayed genes (TAGs) are duplicated genes that are linked as neighbors on a chromosome, many of which have important physio- logical and biochemical functions. Here we performed a survey of these genes in 11 available vertebrate genomes. TAGs account for an average of about 14% of all genes in these vertebrate genomes, and about 25% of all duplications. The majority of TAGs (72-94%) have parallel tran- scription orientation (i.e. are encoded on the same strand), in contrast to the genome, which has about 50% of its genes in parallel transcrip- tion orientation. The majority of tandem arrays have only two members. In all species, the proportion of genes that belong to TAGs tends to be higher in large gene families than in small ones; together with our re- cent finding that tandem duplication played a more important role than retroposition in large families, this fact suggests that among all types of duplication mechanisms, tandem duplication is the predominant mecha- nism of duplication, especially in large families. Species-specific tandem arrays (SSTAs) are identified and results of statistical tests suggest that natural selection played a role in maintaining some of the SSTAs. Our 1 work will provide more insight into the evolutionary history of TAGs in vertebrates. 2 1 Introduction Although the importance of duplicated genes in providing raw materials for genetic innovation has been recognized since the 1930s and is highlighted in Ohno’s book Evolution by gene duplication (Ohno, 1970), it is only recently that the availability of numerous genomic sequences has made it possible to quantitatively estimate how many genes in a genome are generated by gene duplication.
    [Show full text]
  • Evolution of Orthologous Tandemly Arrayed Gene Clusters
    Tremblay Savard et al. BMC Bioinformatics 2011, 12(Suppl 9):S2 http://www.biomedcentral.com/1471-2105/12/S9/S2 PROCEEDINGS Open Access Evolution of orthologous tandemly arrayed gene clusters Olivier Tremblay Savard1*, Denis Bertrand2*, Nadia El-Mabrouk1* From Ninth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Com- parative Genomics Galway, Ireland. 8-10 October 2011 Abstract Background: Tandemly Arrayed Gene (TAG) clusters are groups of paralogous genes that are found adjacent on a chromosome. TAGs represent an important repertoire of genes in eukaryotes. In addition to tandem duplication events, TAG clusters are affected during their evolution by other mechanisms, such as inversion and deletion events, that affect the order and orientation of genes. The DILTAG algorithm developed in [1] makes it possible to infer a set of optimal evolutionary histories explaining the evolution of a single TAG cluster, from an ancestral single gene, through tandem duplications (simple or multiple, direct or inverted), deletions and inversion events. Results: We present a general methodology, which is an extension of DILTAG, for the study of the evolutionary history of a set of orthologous TAG clusters in multiple species. In addition to the speciation events reflected by the phylogenetic tree of the considered species, the evolutionary events that are taken into account are simple or multiple tandem duplications, direct or inverted, simple or multiple deletions, and inversions. We analysed the performance of our algorithm on simulated data sets and we applied it to the protocadherin gene clusters of human, chimpanzee, mouse and rat. Conclusions: Our results obtained on simulated data sets showed a good performance in inferring the total number and size distribution of duplication events.
    [Show full text]
  • Correlated Variation and Population Differentiation in Satellite DNA Abundance Among Lines of Drosophila Melanogaster
    Correlated variation and population differentiation in satellite DNA abundance among lines of Drosophila melanogaster Kevin H.-C. Wei, Jennifer K. Grenier, Daniel A. Barbash, and Andrew G. Clark1 Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853-2703 Contributed by Andrew G. Clark, November 18, 2014 (sent for review October 1, 2014; reviewed by Giovanni Bosco and Keith A. Maggert) Tandemly repeating satellite DNA elements in heterochromatin repeats of simple sequences (≤10 bp); the most abundant include occupy a substantial portion of many eukaryotic genomes. Al- AAGAG (aka GAGA-satellite), AACATAGAAT (aka 2L3L), though often characterized as genomic parasites deleterious to and AATAT (11, 14, 15). In comparison, the genome of its sister the host, they also can be crucial for essential processes such as species Drosophila simulans,fromwhichD. melanogaster di- chromosome segregation. Adding to their interest, satellite DNA verged ∼2.5 mya, is estimated to have only 5% satellite DNA, elements evolve at high rates; among Drosophila, closely related more than 10-fold less AAGAG, and little to no AACATA- species often differ drastically in both the types and abundances GAAT (16). For further contrast, nearly 50% of the Drosophila of satellite repeats. However, due to technical challenges, the evo- virilis and less than 0.5% of Drosophila erecta genomes are sat- lutionary mechanisms driving this rapid turnover remain unclear. ellite DNA (16, 17). Strikingly, such rapid changes in genomes have been implicated in postzygotic isolation of species in the Here we characterize natural variation in simple-sequence repeats – of 2–10 bp from inbred Drosophila melanogaster lines derived form of hybrid incompatibility in several species of flies (18 20), from multiple populations, using a method we developed called demonstrating the critical role satellite DNA has on the evolu- k-Seek that analyzes unassembled Illumina sequence reads.
    [Show full text]
  • Genome Analysis of Protozoan Parasites: Proceedings of a Workshop Held at ILRAD, Nairobi, Kenya, 11–13 November 1992, Ed
    G E N O M E A N A L Y S I S OF PROTOZOAN PARASITES PROCEEDINGS OF A WORKSHOP HELD AT ILRAD NAIROBI, KENYA 11–13 NOVEMBER 1992 Edited by S.P. Morzaria THE INTERNATIONAL LABORATORY FOR RESEARCH ON ANIMAL DISEASES BOX 30709 • NAIROBI • KENYA The International Laboratory for Research on Animal Diseases (ILRAD) was established in 1973 with a global mandate to develop effective control measures for livestock diseases that seriously limit world food production. ILRAD’s research program focuses on animal trypanosomiasis and tick-borne diseases, particularly theileriosis (East Coast fever). ILRAD is one of 18 centres in a worldwide agricultural research network sponsored by the Consultative Group on International Agricultural Research. In 1992 ILRAD received funding from the United Nations Development Programme, the World Bank and the governments of Australia, Belgium, Canada, Denmark, Finland, France, Germany, Italy, Japan, the Netherlands, Norway, Sweden, Switzerland, the United Kingdom and the United States of America. Production Editor: Peter Werehire This publication was typeset on a microcomputer and the final pages produced on a laser printer at ILRAD, P.O. Box 30709, Nairobi, Kenya, telephone 254 (2) 632311, fax 254 (2) 631499, e-mail BT GOLD TYMNET CGI017. Colour separations for cover were done by PrePress Productions, P.O. Box 41921, Nairobi, Kenya. Printed by English Press Ltd., P.O. Box 30127, Nairobi, Kenya. Copyright © 1993 by the International Laboratory for Research on Animal Diseases. ISBN 92-9055-296-4 The correct citation for this book is Genome Analysis of Protozoan Parasites: Proceedings of a Workshop Held at ILRAD, Nairobi, Kenya, 11–13 November 1992, ed.
    [Show full text]
  • First Efficient CRISPR-Cas9-Mediated Genome Editing in Leishmania Parasites
    First efficient CRISPR-Cas9-mediated genome editing in Leishmania parasites. Lauriane Sollelis, Mehdi Ghorbal, Cameron Ross Macpherson, Rafael Miyazawa Martins, Nada Kuk, Lucien Crobu, Patrick Bastien, Artur Scherf, Jose-Juan Lopez-Rubio, Yvon Sterkers To cite this version: Lauriane Sollelis, Mehdi Ghorbal, Cameron Ross Macpherson, Rafael Miyazawa Martins, Nada Kuk, et al.. First efficient CRISPR-Cas9-mediated genome editing in Leishmania parasites.. Cellular Mi- crobiology, Wiley, 2015, 17 (10), pp.1405-1412. 10.1111/cmi.12456. hal-01992713 HAL Id: hal-01992713 https://hal.umontpellier.fr/hal-01992713 Submitted on 2 Mar 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. First efficient CRISPR-Cas9-mediated genome editing in Leishmania parasites Lauriane Sollelis1,2, Mehdi Ghorbal1,2, Cameron Ross MacPherson3, Rafael Miyazawa Martins3, Nada Kuk1, Lucien Crobu2, Patrick Bastien1,2,4, Artur Scherf3, Jose-Juan Lopez- Rubio1,2 and Yvon Sterkers1,2,4# Running title: CRISPR-Cas9 for genome editing in Leishmania 1 University of Montpellier, Faculty of Medicine, Laboratory of Parasitology-Mycology, 2 CNRS 5290 - IRD 224 – University of Montpellier (UMR "MiVEGEC"), 3 Institut Pasteur - INSERM U1201 - CNRS ERL9195, "Biology of Host-Parasite Interactions" Unit, Paris, France, 4 CHRU (Centre Hospitalier Universitaire de Montpellier), Department of Parasitology-Mycology, Montpellier, France #To whom correspondence should be addressed.
    [Show full text]
  • Applying CRISPR/Cas for Genome Engineering in Plants
    Available online at www.sciencedirect.com ScienceDirect Applying CRISPR/Cas for genome engineering in plants: the best is yet to come Holger Puchta Less than 5 years ago the CRISPR/Cas nuclease was first zinc finger nucleases (ZFNs) and transcription activator- introduced into eukaryotes, shortly becoming the most efficient like effector nucleases (TALENs), which allow the tar- and widely used tool for genome engineering. For plants, geting of most genomic sites, have been developed and efforts were centred on obtaining heritable changes in most applied for a long period of time in plants (for reviews see transformable crop species by inducing mutations into open [6–8]). Although with the advent of ZFNs the basic reading frames of interest, via non-homologous end joining. problem of targeting DSBs to specific site in the genome Now it is important to take the next steps and further develop was solved, the system revealed numerous drawbacks. the technology to reach its full potential. For breeding, besides The construction of the enzymes is time consuming and using DNA-free editing and avoiding off target effects, it will be expensive and their specificity is limited, resulting in desirable to apply the system for the mutation of regulatory DSB at genomic site that very similar but nor identical elements and for more complex genome rearrangements. to the target site. Thus, unwanted secondary, ‘off-target’ Targeting enzymatic activities, like transcriptional regulators or mutations were introduced regularly. After their intro- DNA modifying enzymes, will be important for plant biology in duction TALENs outcompeted ZFNs in specificity as the future.
    [Show full text]
  • A Broad Analysis of Tandemly Arrayed Genes in the Genomes of Human, Mouse, and Rat
    A Broad Analysis of Tandemly Arrayed Genes in the Genomes of Human, Mouse, and Rat Valia Shoja Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Application Liqing Zhang, Ph.D. Chair Lenwood S. Heath, Ph.D. Zhijian Tu, Ph.D. November 10, 2006 Blacksburg, Virginia Keywords: Comparative Genomics, Gene Duplication, Tandemly Arrayed Genes, Gene Expression. Copyright 2006, Valia Shoja A Broad Analysis of Tandemly Arrayed Genes in the Genomes of Human, Mouse, and Rat Valia Shoja (ABSTRACT) Tandemly arrayed genes (TAG) play an important functional and physiological role in the genome. Most previous studies have focused on individual TAG families in a few species, yet a broad characterization of TAGs is not available. We identified all the TAGs in the genomes of human, chimp, mouse, and rat and performed a comprehensive analysis of TAG distribution, TAG sizes, TAG gene orientations and intergenic distances, and TAG gene functions. TAGs account for about 14-17% of all the genomic genes and nearly one third of all the duplicated genes in the four genomes, highlighting the predominant role that tandem duplication plays in gene duplication. For all species, TAG distribution is highly heterogeneous along chromosomes and some chromosomes are enriched with TAG forests while others are enriched with TAG deserts. The majority of TAGs are of size two for all genomes, similar to the previous findings in C. elegans, A. thaliana, and O. sativa, suggesting that it is a rather general phenomenon in eukaryotes.
    [Show full text]
  • Efficient Inversions and Duplications of Mammalian
    284 | Journal of Molecular Cell Biology (2015), 7(4), 284–298 doi:10.1093/jmcb/mjv016 Published online March 10, 2015 Article Efficient inversions and duplications of mammalian regulatory DNA elements and gene clusters by CRISPR/Cas9 Jinhuan Li1,2,3,4,†, Jia Shou1,2,3,4,†, YaGuo1,2,3,4, Yuanxiao Tang1,2,3,4, Yonghu Wu1,2,3,4, Zhilian Jia1,2,3,4, Yanan Zhai1,2,3,4, Zhifeng Chen1,2,3,4, Quan Xu1,2,3,4, and Qiang Wu1,2,3,4,* 1 Key Laboratory of Systems Biomedicine (Ministry of Education), Center for Comparative Biomedicine, Institute of Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China 2 State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200240, China 3 Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Bio-X Center, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China 4 Collaborative Innovation Center of Systems Biomedicine, Shanghai Jiao Tong University School of Medicine, Shanghai 200240, China † These authors contributed equally to this work. * Correspondence to: Qiang Wu, E-mail: [email protected] The human genome contains millions of DNA regulatory elements and a large number of gene clusters, most of which have not been tested experimentally. The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated nuclease 9 (Cas9) programed with a synthetic single-guide RNA (sgRNA) emerges as a method for genome editing in virtually any organisms. Here we report that targeted DNA fragment inversions and duplications could easily be achieved in human and mouse genomes by CRISPR with two sgRNAs.
    [Show full text]
  • Genome Sequencing and Transcriptome Analysis Reveal Recent Species-Specific Gene Duplications in the Plastic Gilthead Sea Bream (Sparus Aurata)
    fmars-06-00760 December 6, 2019 Time: 15:34 # 1 ORIGINAL RESEARCH published: 20 December 2019 doi: 10.3389/fmars.2019.00760 Genome Sequencing and Transcriptome Analysis Reveal Recent Species-Specific Gene Duplications in the Plastic Gilthead Sea Bream (Sparus aurata) Jaume Pérez-Sánchez1*, Fernando Naya-Català1, Beatriz Soriano2, M. Carla Piazzon3, Ahmed Hafez2, Toni Gabaldón4, Carlos Llorens2, Ariadna Sitjà-Bobadilla3 and Josep A. Calduch-Giner1 1 Nutrigenomics and Fish Growth Endocrinology Group, Institute of Aquaculture Torre de la Sal (IATS), Consejo Superior de Investigaciones Científicas, Castellón de la Plana, Spain, 2 Biotechvana, Parc Cientific, Universitat de València, Valencia, Spain, 3 Fish Pathology Group, Institute of Aquaculture Torre de la Sal (IATS), Consejo Superior de Investigaciones Científicas, Ribera de Cabanes, Castellón de la Plana, Spain, 4 Bioinformatics and Genomics Unit, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain Gilthead sea bream is an economically important fish species that is remarkably well- Edited by: Jianhua Fan, adapted to farming and changing environments. Understanding the genomic basis East China University of Science of this plasticity will serve to orientate domestication and selective breeding toward and Technology, China more robust and efficient fish. To address this goal, a draft genome assembly was Reviewed by: reconstructed combining short- and long-read high-throughput sequencing with genetic Dongmei Wang, Ocean University of China, China linkage maps. The assembled unmasked genome spans 1.24 Gb of an expected Laura Ghigliotti, 1.59 Gb genome size with 932 scaffolds (∼732 Mb) anchored to 24 chromosomes Consiglio Nazionale delle Ricerche, Italy that are available as a karyotype browser at www.nutrigroup-iats.org/seabreamdb.
    [Show full text]
  • Genome-Wide Computational Prediction of Tandem Gene Arrays: Application in Yeasts
    Genome-wide computational prediction of tandem gene arrays: application in yeasts. Laurence Despons, Philippe Baret, Lionel Frangeul, Véronique Louis, Pascal Durrens, Jean-Luc Souciet To cite this version: Laurence Despons, Philippe Baret, Lionel Frangeul, Véronique Louis, Pascal Durrens, et al.. Genome- wide computational prediction of tandem gene arrays: application in yeasts.. BMC Genomics, BioMed Central, 2010, 11 (1), pp.56. 10.1186/1471-2164-11-56. pasteur-00670632 HAL Id: pasteur-00670632 https://hal-pasteur.archives-ouvertes.fr/pasteur-00670632 Submitted on 15 Feb 2012 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Despons et al. BMC Genomics 2010, 11:56 http://www.biomedcentral.com/1471-2164/11/56 METHODOLOGY ARTICLE Open Access Genome-wide computational prediction of tandem gene arrays: application in yeasts Laurence Despons1*, Philippe V Baret2, Lionel Frangeul3, Véronique Leh Louis1, Pascal Durrens4, Jean-Luc Souciet1 Abstract Background: This paper describes an efficient in silico method for detecting tandem gene arrays (TGAs) in fully sequenced and compact genomes such as those of prokaryotes or unicellular eukaryotes. The originality of this method lies in the search of protein sequence similarities in the vicinity of each coding sequence, which allows the prediction of tandem duplicated gene copies independently of their functionality.
    [Show full text]
  • Applying Computational Solutions for Solving Problems in Mammalian
    University of Connecticut OpenCommons@UConn Doctoral Dissertations University of Connecticut Graduate School 12-29-2017 Applying Computational Solutions for solving problems in Mammalian Gene Family Evolution and Single Cell Gene Expression Analysis Ajay Obla University of Connecticut - Storrs, [email protected] Follow this and additional works at: https://opencommons.uconn.edu/dissertations Recommended Citation Obla, Ajay, "Applying Computational Solutions for solving problems in Mammalian Gene Family Evolution and Single Cell Gene Expression Analysis" (2017). Doctoral Dissertations. 1693. https://opencommons.uconn.edu/dissertations/1693 Applying Computational Solutions for solving problems in Mammalian Gene Family Evolution and Single Cell Gene Expression Analysis Ajay Babu Obla Kumaresh University of Connecticut, 2018 Using computational tools to solve various biological problems has become common practice over the last decade. This has been primarily fueled by exponentially growing high throughput biological data and relevant computational biology resources to support it [1]. The work presented in this thesis showcases application of computational techniques to answer critical biological questions pertaining to gene family evolution and single cell gene expression analysis. Dr. Susumu Ohno was a pioneer to propose the significance of gene duplication in driving gene family evolution. Gene duplication has been shown to expand the repertoire of several gene families involved in multitude of important biological functions. Thus, understanding the forces that lead to retention and loss of gene duplicates have been subject to close scrutiny over the past decade. One of the works presented in this thesis provides a strong argument for a novel gene duplication model that seeks to explain retention of previously unexplained gene duplicates.
    [Show full text]