A Massively Parallel Barcoded Sequencing Pipeline Enables Generation of the First Orfeome and Interactome Map for Rice

Total Page:16

File Type:pdf, Size:1020Kb

A Massively Parallel Barcoded Sequencing Pipeline Enables Generation of the First Orfeome and Interactome Map for Rice A massively parallel barcoded sequencing pipeline enables generation of the first ORFeome and interactome map for rice Shayne D. Wierbowskia,b,1, Tommy V. Vob,1,2, Pascal Falter-Braunc,d, Timothy O. Jobee, Lars H. Krusef, Xiaomu Weia, Jin Liangb, Michael J. Meyera,b, Nurten Akturkb, Christen A. Rivera-Erickb, Nicolas A. Corderob, Mauricio I. Paramob,g, Elnur E. Shayhidinb, Marta Bertolottib, Nathaniel D. Tippensa,b, Kazi Aktherh, Rita Sharmai, Yuichi Katayosej, Kourosh Salehi-Ashtianik,l,m,n, Tong Haol,m, Pamela C. Ronaldo,p,q, Joseph R. Eckerr,s, Peter A. Schweitzert, Shoshi Kikuchiu, Hiroshi Mizunov, David E. Hilll,m, Marc Vidall,m, Gaurav D. Moghef, Susan R. McCouchh,3, and Haiyuan Yua,b,3 aDepartment of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853; bWeill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853; cInstitute of Network Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Munich, Germany; dMicrobe-Host Interactions, Faculty of Biology, Ludwig-Maximilians-Universität München, 80539 Munich, Germany; eBotanical Institute, Cluster of Excellence on Plant Sciences (CEPLAS), University of Cologne, 50674 Cologne, Germany; fPlant Biology Section, School of Integrative Plant Sciences, Cornell University, Ithaca, NY 14853; gDepartment of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853; hSection of Plant Breeding and Genetics, School of Integrated Plant Sciences, Cornell University, Ithaca, NY 14853-1901; iSchool of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India; jAdvanced Genomics Breeding Section, Institute of Crop Science, National Agriculture and Food Research Organization, Tsukuba, Ibaraki 305-8634, Japan; kLaboratory of Algal, Systems, and Synthetic Biology, Division of Science and Math, New York University Abu Dhabi, 129188, Abu Dhabi, United Arab Emirates; lCenter for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02215; mDepartment of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115; nCenter for Genomics and Systems Biology, New York University Abu Dhabi, 129188, Abu Dhabi, United Arab Emirates; oDepartment of Plant Pathology, University of California, Davis, CA 95616; pThe Genome Center, University of California, Davis, CA 95616; qInnovative Genomics Institute, Berkeley, CA 94704; rHoward Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, CA 92037; sPlant Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037; tBRC Genomics Facility, Institute of Biotechnology, Cornell University, Ithaca, NY 14853; uPlant Genome Research Unit, Division of Genome and Biodiversity Research, Agrogenomics Research Center, National Institute of Agrobiological Sciences, Tsukuba 305-0856, Japan; and vDepartment of Molecular Biology, National Institute of Agrobiological Resources, Tsukuba 305-0856, Japan PLANT BIOLOGY Contributed by Susan R. McCouch, March 11, 2020 (sent for review October 21, 2019; reviewed by Jing-Dong Jackie Han and Yinong Yang) Systematic mappings of protein interactome networks have pro- regarding plate layout, and cannot detect potential cross- vided invaluable functional information for numerous model contamination between wells (SI Appendix,Fig.S1C and D). organisms. Here we develop PCR-mediated Linkage of barcoded Here, we develop a massively parallel sequencing approach Adapters To nucleic acid Elements for sequencing (PLATE-seq) that called PCR-mediated Linkage of barcoded Adapters To nucleic serves as a general tool to rapidly sequence thousands of DNA elements. We validate its utility by generating the ORFeome for Oryza sativa covering 2,300 genes and constructing a high-quality Significance protein–protein interactome map consisting of 322 interactions between 289 proteins, expanding the known interactions in rice Although recent advances in next-generation sequencing have by roughly 50%. Our work paves the way for high-throughput facilitated the construction of whole genomes from hundreds profiling of protein–protein interactions in a wide range of of organisms, considerable barriers still restrict the functional organisms. understanding of the genes that they contain. An initial pre- requisite for such an understanding is the availability of high- protein–protein interaction | rice | ORFeome | next-generation quality gene libraries (ORFeomes) amenable to high-throughput sequencing | networks functional experiments. Here we develop a massively parallel next-generation sequencing method, PLATE-seq, and leverage the method to construct an ORFeome for rice, providing a toolkit he genomics revolution has democratized sequencing and for systematic functional study in an agricultural species. To structural annotation of genomes; however, assigning func- T demonstrate the utility of these resources, we present a map of tions to predicted genes remains an important unsolved challenge. rice protein–protein interactions. Identification of protein–protein interactions can help advance functional annotation in sequenced genomes. The first step in Author contributions: T.V.V., X.W., J.L., M.J.M., P.C.R., J.R.E., P.A.S., S.K., H.M., D.E.H., systematic, genome-wide mapping of protein–protein interactions M.V., S.R.M., and H.Y. designed research; S.D.W., T.V.V., P.F.-B., T.O.J., L.H.K., X.W., J.L., involves the construction of a comprehensive set of high-quality M.J.M., N.A., C.A.R.-E., N.A.C., M.I.P., E.E.S., M.B., K.A., R.S., Y.K., K.S.-A., S.R.M., and H.Y. performed research; T.V.V., P.F.-B., N.D.T., R.S., Y.K., K.S.-A., T.H., P.C.R., J.R.E., S.K., H.M., open reading frames (ORFs). To generate such ORFeomes, tens D.E.H., and M.V. contributed new reagents/analytic tools; S.D.W., T.V.V., L.H.K., M.J.M., K. of thousands of clones must be sequenced to ensure that only S-A., T.H., G.D.M., S.R.M., and H.Y. analyzed data; and S.D.W., T.V.V., L.H.K., G.D.M., correct, full-length clones are retained. Traditionally, this is S.R.M., and H.Y. wrote the paper. achieved through the labor-intensive and cost-prohibitive process Reviewers: J.-D.J.H., Peking University; and Y.Y., Pennsylvania State University. of Sanger sequencing of each individual clone. Simple barcoding The authors declare no competing interest. strategies that append a barcode at the beginning or end of each Published under the PNAS license. 1 ORF are problematic because they require thousands of unique S.D.W. and T.V.V. contributed equally to this work. barcodes, provide extremely limited ORF coverage, and generate 2Present address: Laboratory of Biochemistry and Molecular Biology, National Cancer SI Institute, National Institutes of Health, Bethesda, MD 20892. a high fraction of uninformative reads that contain no barcode ( 3 Appendix A To whom correspondence may be addressed. Email: [email protected] or haiyuan.yu@ ,Fig.S1 ). A deep-well-pooling approach has recently cornell.edu. been used to sequence ORFeome libraries (1, 2); however, these This article contains supporting information online at https://www.pnas.org/lookup/suppl/ smart pooling approaches cannot accommodate the inclusion of doi:10.1073/pnas.1918068117/-/DCSupplemental. homologous ORFs in one pool, rely on concrete prior knowledge www.pnas.org/cgi/doi/10.1073/pnas.1918068117 PNAS Latest Articles | 1of7 Downloaded by guest on September 25, 2021 acid Elements for sequencing (PLATE-seq)—a broadly utiliz- maximized to span both the plate-specific barcode and the ORF able approach for rapid sequencing of thousands of DNA ele- sequence. Notably, the final low-cycle PCR enrichment ensures that ments. We validate the utility of PLATE-seq by developing an all paired reads contain both barcodes and can be informatively ORFeome for rice and constructing a high-quality, experimen- mapped back to their exact source well. Moreover, because Tn5 tally validated protein–protein interactome map of this impor- tagmentation acts at a random position within the ORF, the frag- tant monocot species. Despite being a staple food for over half of ments generated can theoretically span the entire ORF. However, the world’s population, and an important model for monocot in practice, cluster formation and amplification using current genomics, empirical annotations currently cover fewer than 5% Illumina sequencers become inefficient for prohibitively large of genes in the rice genome (3). Indeed, understanding the fragments. Therefore, we optimized our methods to provide function of plant genes has become a major bottleneck for the coverage of roughly the last 800 to 1,000 bp of each ORF (SI field of plant biology as a whole. Appendix,Fig.S1B). The majority of plant functional studies have been carried out In order to benchmark the accuracy of PLATE-seq, we in the dicot model organism Arabidopsis thaliana (4, 5), and, to implemented the method on a test plate of 94 human ORF clones date, only limited characterization has been performed in mono- selected from the sequence-verified human ORFeome 8.1 library cot species. Combating this disparity in annotation, the develop- (2), demonstrating that PLATE-seq correctly identified the true ment of a comprehensive, high-quality ORFeome for Oryza sativa ORF in 100% of the test cases (Fig. 1C). Determination of clone would enable large-scale reverse-proteomics studies—including identity for each well was made by aligning the reads to the entire the systematic mapping of protein–protein interactions—and thus 8.1 reference library to calculate the fraction of reads in each well would significantly expand the functional genomics toolkit for contributed by each ORF. Importantly, the vast majority of plants. PLATE-seq reads aligned to the true ORF with only a minor A full-length complementary DNA (FLcDNA) clone library fraction aligning to an incorrect ORF introduced through exper- has previously been reported in O. sativa (6); however, such li- imental contamination or alignment ambiguities (Fig. 1D). More- braries are not suitable for high-throughput studies.
Recommended publications
  • Analysis of the Impact of Sequencing Errors on Blast Using Fault Injection
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Illinois Digital Environment for Access to Learning and Scholarship Repository ANALYSIS OF THE IMPACT OF SEQUENCING ERRORS ON BLAST USING FAULT INJECTION BY SO YOUN LEE THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Engineering in the Graduate College of the University of Illinois at Urbana-Champaign, 2013 Urbana, Illinois Adviser: Professor Ravishankar K. Iyer ABSTRACT This thesis investigates the impact of sequencing errors in post-sequence computational analyses, including local alignment search and multiple sequence alignment. While the error rates of sequencing technology are commonly reported, the significance of these numbers cannot be fully grasped without putting them in the perspective of their impact on the downstream analyses that are used for biological research, forensics, diagnosis of diseases, etc. I approached the quantification of the impact using fault injection. Faults were injected in the input sequence data, and the analyses were run. Change in the output of the analyses was interpreted as the impact of faults, or errors. Three commonly used algorithms were used: BLAST, SSEARCH, and ProbCons. The main contributions of this work are the application of fault injection to the reliability analysis in bioinformatics and the quantitative demonstration that a small error rate in the sequence data can alter the output of the analysis in a significant way. BLAST and SSEARCH are both local alignment search tools, but BLAST is a heuristic implementation, while SSEARCH is based on the optimal Smith-Waterman algorithm.
    [Show full text]
  • BIRC7 Sirna (Human)
    For research purposes only, not for human use Product Data Sheet BIRC7 siRNA (Human) Catalog # Source Reactivity Applications CRJ2387 Synthetic H RNAi Description siRNA to inhibit BIRC7 expression using RNA interference Specificity BIRC7 siRNA (Human) is a target-specific 19-23 nt siRNA oligo duplexes designed to knock down gene expression. Form Lyophilized powder Gene Symbol BIRC7 Alternative Names KIAP; LIVIN; MLIAP; RNF50; Baculoviral IAP repeat-containing protein 7; Kidney inhibitor of apoptosis protein; KIAP; Livin; Melanoma inhibitor of apoptosis protein; ML-IAP; RING finger protein 50 Entrez Gene 79444 (Human) SwissProt Q96CA5 (Human) Purity > 97% Quality Control Oligonucleotide synthesis is monitored base by base through trityl analysis to ensure appropriate coupling efficiency. The oligo is subsequently purified by affinity-solid phase extraction. The annealed RNA duplex is further analyzed by mass spectrometry to verify the exact composition of the duplex. Each lot is compared to the previous lot by mass spectrometry to ensure maximum lot-to-lot consistency. Components We offers pre-designed sets of 3 different target-specific siRNA oligo duplexes of human BIRC7 gene. Each vial contains 5 nmol of lyophilized siRNA. The duplexes can be transfected individually or pooled together to achieve knockdown of the target gene, which is most commonly assessed by qPCR or western blot. Our siRNA oligos are also chemically modified (2’-OMe) at no extra charge for increased stability and Application key: E- ELISA, WB- Western blot, IH- Immunohistochemistry,
    [Show full text]
  • Advancing Solutions to the Carbohydrate Sequencing Challenge † † † ‡ § Christopher J
    Perspective Cite This: J. Am. Chem. Soc. 2019, 141, 14463−14479 pubs.acs.org/JACS Advancing Solutions to the Carbohydrate Sequencing Challenge † † † ‡ § Christopher J. Gray, Lukasz G. Migas, Perdita E. Barran, Kevin Pagel, Peter H. Seeberger, ∥ ⊥ # ⊗ ∇ Claire E. Eyers, Geert-Jan Boons, Nicola L. B. Pohl, Isabelle Compagnon, , × † Göran Widmalm, and Sabine L. Flitsch*, † School of Chemistry & Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K. ‡ Institute for Chemistry and Biochemistry, Freie Universitaẗ Berlin, Takustraße 3, 14195 Berlin, Germany § Biomolecular Systems Department, Max Planck Institute for Colloids and Interfaces, Am Muehlenberg 1, 14476 Potsdam, Germany ∥ Department of Biochemistry, Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, U.K. ⊥ Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia 30602, United States # Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States ⊗ Institut Lumierè Matiere,̀ UMR5306 UniversitéLyon 1-CNRS, Universitéde Lyon, 69622 Villeurbanne Cedex, France ∇ Institut Universitaire de France IUF, 103 Blvd St Michel, 75005 Paris, France × Department of Organic Chemistry, Arrhenius Laboratory, Stockholm University, S-106 91 Stockholm, Sweden *S Supporting Information established connection between their structure and their ABSTRACT: Carbohydrates possess a variety of distinct function, full characterization of unknown carbohydrates features with
    [Show full text]
  • Genome Wide Association Study of Response to Interval and Continuous Exercise Training: the Predict‑HIIT Study Camilla J
    Williams et al. J Biomed Sci (2021) 28:37 https://doi.org/10.1186/s12929-021-00733-7 RESEARCH Open Access Genome wide association study of response to interval and continuous exercise training: the Predict-HIIT study Camilla J. Williams1†, Zhixiu Li2†, Nicholas Harvey3,4†, Rodney A. Lea4, Brendon J. Gurd5, Jacob T. Bonafglia5, Ioannis Papadimitriou6, Macsue Jacques6, Ilaria Croci1,7,20, Dorthe Stensvold7, Ulrik Wislof1,7, Jenna L. Taylor1, Trishan Gajanand1, Emily R. Cox1, Joyce S. Ramos1,8, Robert G. Fassett1, Jonathan P. Little9, Monique E. Francois9, Christopher M. Hearon Jr10, Satyam Sarma10, Sylvan L. J. E. Janssen10,11, Emeline M. Van Craenenbroeck12, Paul Beckers12, Véronique A. Cornelissen13, Erin J. Howden14, Shelley E. Keating1, Xu Yan6,15, David J. Bishop6,16, Anja Bye7,17, Larisa M. Haupt4, Lyn R. Grifths4, Kevin J. Ashton3, Matthew A. Brown18, Luciana Torquati19, Nir Eynon6 and Jef S. Coombes1* Abstract Background: Low cardiorespiratory ftness (V̇O2peak) is highly associated with chronic disease and mortality from all causes. Whilst exercise training is recommended in health guidelines to improve V̇O2peak, there is considerable inter-individual variability in the V̇O2peak response to the same dose of exercise. Understanding how genetic factors contribute to V̇O2peak training response may improve personalisation of exercise programs. The aim of this study was to identify genetic variants that are associated with the magnitude of V̇O2peak response following exercise training. Methods: Participant change in objectively measured V̇O2peak from 18 diferent interventions was obtained from a multi-centre study (Predict-HIIT). A genome-wide association study was completed (n 507), and a polygenic predictor score (PPS) was developed using alleles from single nucleotide polymorphisms= (SNPs) signifcantly associ- –5 ated (P < 1 ­10 ) with the magnitude of V̇O2peak response.
    [Show full text]
  • Chapter 14: Functional Genomics Learning Objectives
    Chapter 14: Functional Genomics Learning objectives Upon reading this chapter, you should be able to: ■ define functional genomics; ■ describe the key features of eight model organisms; ■ explain techniques of forward and reverse genetics; ■ discuss the relation between the central dogma and functional genomics; and ■ describe proteomics-based approaches to functional genomics. Outline : Functional genomics Introduction Relation between genotype and phenotype Eight model organisms E. coli; yeast; Arabidopsis; C. elegans; Drosophila; zebrafish; mouse; human Functional genomics using reverse and forward genetics Reverse genetics: mouse knockouts; yeast; gene trapping; insertional mutatgenesis; gene silencing Forward genetics: chemical mutagenesis Functional genomics and the central dogma Approaches to function; Functional genomics and DNA; …and RNA; …and protein Proteomic approaches to functional genomics CASP; protein-protein interactions; protein networks Perspective Albert Blakeslee (1874–1954) studied the effect of altered chromosome numbers on the phenotype of the jimson-weed Datura stramonium, a flowering plant. Introduction: Functional genomics Functional genomics is the genome-wide study of the function of DNA (including both genes and non-genic regions), as well as RNA and proteins encoded by DNA. The term “functional genomics” may apply to • the genome, transcriptome, or proteome • the use of high-throughput screens • the perturbation of gene function • the complex relationship of genotype and phenotype Functional genomics approaches to high throughput analyses Relationship between genotype and phenotype The genotype of an individual consists of the DNA that comprises the organism. The phenotype is the outward manifestation in terms of properties such as size, shape, movement, and physiology. We can consider the phenotype of a cell (e.g., a precursor cell may develop into a brain cell or liver cell) or the phenotype of an organism (e.g., a person may have a disease phenotype such as sickle‐cell anemia).
    [Show full text]
  • XIAP's Profile in Human Cancer
    biomolecules Review XIAP’s Profile in Human Cancer Huailu Tu and Max Costa * Department of Environmental Medicine, Grossman School of Medicine, New York University, New York, NY 10010, USA; [email protected] * Correspondence: [email protected] Received: 16 September 2020; Accepted: 25 October 2020; Published: 29 October 2020 Abstract: XIAP, the X-linked inhibitor of apoptosis protein, regulates cell death signaling pathways through binding and inhibiting caspases. Mounting experimental research associated with XIAP has shown it to be a master regulator of cell death not only in apoptosis, but also in autophagy and necroptosis. As a vital decider on cell survival, XIAP is involved in the regulation of cancer initiation, promotion and progression. XIAP up-regulation occurs in many human diseases, resulting in a series of undesired effects such as raising the cellular tolerance to genetic lesions, inflammation and cytotoxicity. Hence, anti-tumor drugs targeting XIAP have become an important focus for cancer therapy research. RNA–XIAP interaction is a focus, which has enriched the general profile of XIAP regulation in human cancer. In this review, the basic functions of XIAP, its regulatory role in cancer, anti-XIAP drugs and recent findings about RNA–XIAP interactions are discussed. Keywords: XIAP; apoptosis; cancer; therapeutics; non-coding RNA 1. Introduction X-linked inhibitor of apoptosis protein (XIAP), also known as inhibitor of apoptosis protein 3 (IAP3), baculoviral IAP repeat-containing protein 4 (BIRC4), and human IAPs like protein (hILP), belongs to IAP family which was discovered in insect baculovirus [1]. Eight different IAPs have been isolated from human tissues: NAIP (BIRC1), BIRC2 (cIAP1), BIRC3 (cIAP2), XIAP (BIRC4), BIRC5 (survivin), BIRC6 (apollon), BIRC7 (livin) and BIRC8 [2].
    [Show full text]
  • Masterpath: Network Analysis of Functional Genomics Screening Data
    MasterPATH: network analysis of functional genomics screening data Natalia Rubanova, Guillaume Pinna, Jeremie Kropp, Anna Campalans, Juan Pablo Radicella, Anna Polesskaya, Annick Harel-Bellan, Nadya Morozova To cite this version: Natalia Rubanova, Guillaume Pinna, Jeremie Kropp, Anna Campalans, Juan Pablo Radicella, et al.. MasterPATH: network analysis of functional genomics screening data. BMC Genomics, BioMed Central, 2020, 21 (1), pp.632. 10.1186/s12864-020-07047-2. hal-02944249 HAL Id: hal-02944249 https://hal.archives-ouvertes.fr/hal-02944249 Submitted on 26 Nov 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Rubanova et al. BMC Genomics (2020) 21:632 https://doi.org/10.1186/s12864-020-07047-2 METHODOLOGY ARTICLE Open Access MasterPATH: network analysis of functional genomics screening data Natalia Rubanova1,2,3* , Guillaume Pinna4, Jeremie Kropp1, Anna Campalans5,6,7, Juan Pablo Radicella5,6,7, Anna Polesskaya8, Annick Harel-Bellan1 and Nadya Morozova1,4 Abstract Background: Functional genomics employs several experimental approaches to investigate gene functions. High- throughput techniques, such as loss-of-function screening and transcriptome profiling, allow to identify lists of genes potentially involved in biological processes of interest (so called hit list).
    [Show full text]
  • "Phylogenetic Analysis of Protein Sequence Data Using The
    Phylogenetic Analysis of Protein Sequence UNIT 19.11 Data Using the Randomized Axelerated Maximum Likelihood (RAXML) Program Antonis Rokas1 1Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee ABSTRACT Phylogenetic analysis is the study of evolutionary relationships among molecules, phenotypes, and organisms. In the context of protein sequence data, phylogenetic analysis is one of the cornerstones of comparative sequence analysis and has many applications in the study of protein evolution and function. This unit provides a brief review of the principles of phylogenetic analysis and describes several different standard phylogenetic analyses of protein sequence data using the RAXML (Randomized Axelerated Maximum Likelihood) Program. Curr. Protoc. Mol. Biol. 96:19.11.1-19.11.14. C 2011 by John Wiley & Sons, Inc. Keywords: molecular evolution r bootstrap r multiple sequence alignment r amino acid substitution matrix r evolutionary relationship r systematics INTRODUCTION the baboon-colobus monkey lineage almost Phylogenetic analysis is a standard and es- 25 million years ago, whereas baboons and sential tool in any molecular biologist’s bioin- colobus monkeys diverged less than 15 mil- formatics toolkit that, in the context of pro- lion years ago (Sterner et al., 2006). Clearly, tein sequence analysis, enables us to study degree of sequence similarity does not equate the evolutionary history and change of pro- with degree of evolutionary relationship. teins and their function. Such analysis is es- A typical phylogenetic analysis of protein sential to understanding major evolutionary sequence data involves five distinct steps: (a) questions, such as the origins and history of data collection, (b) inference of homology, (c) macromolecules, developmental mechanisms, sequence alignment, (d) alignment trimming, phenotypes, and life itself.
    [Show full text]
  • EMBL-EBI Powerpoint Presentation
    Processing data from high-throughput sequencing experiments Simon Anders Use-cases for HTS · de-novo sequencing and assembly of small genomes · transcriptome analysis (RNA-Seq, sRNA-Seq, ...) • identifying transcripted regions • expression profiling · Resequencing to find genetic polymorphisms: • SNPs, micro-indels • CNVs · ChIP-Seq, nucleosome positions, etc. · DNA methylation studies (after bisulfite treatment) · environmental sampling (metagenomics) · reading bar codes Use cases for HTS: Bioinformatics challenges Established procedures may not be suitable. New algorithms are required for · assembly · alignment · statistical tests (counting statistics) · visualization · segmentation · ... Where does Bioconductor come in? Several steps: · Processing of the images and determining of the read sequencest • typically done by core facility with software from the manufacturer of the sequencing machine · Aligning the reads to a reference genome (or assembling the reads into a new genome) • Done with community-developed stand-alone tools. · Downstream statistical analyis. • Write your own scripts with the help of Bioconductor infrastructure. Solexa standard workflow SolexaPipeline · "Firecrest": Identifying clusters ⇨ typically 15..20 mio good clusters per lane · "Bustard": Base calling ⇨ sequence for each cluster, with Phred-like scores · "Eland": Aligning to reference Firecrest output Large tab-separated text files with one row per identified cluster, specifying · lane index and tile index · x and y coordinates of cluster on tile · for each
    [Show full text]
  • A SARS-Cov-2 Sequence Submission Tool for the European Nucleotide
    Databases and ontologies Downloaded from https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btab421/6294398 by guest on 25 June 2021 A SARS-CoV-2 sequence submission tool for the European Nucleotide Archive Miguel Roncoroni 1,2,∗, Bert Droesbeke 1,2, Ignacio Eguinoa 1,2, Kim De Ruyck 1,2, Flora D’Anna 1,2, Dilmurat Yusuf 3, Björn Grüning 3, Rolf Backofen 3 and Frederik Coppens 1,2 1Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium, 1VIB Center for Plant Systems Biology, 9052 Ghent, Belgium and 2University of Freiburg, Department of Computer Science, Freiburg im Breisgau, Baden-Württemberg, Germany ∗To whom correspondence should be addressed. Associate Editor: XXXXXXX Received on XXXXX; revised on XXXXX; accepted on XXXXX Abstract Summary: Many aspects of the global response to the COVID-19 pandemic are enabled by the fast and open publication of SARS-CoV-2 genetic sequence data. The European Nucleotide Archive (ENA) is the European recommended open repository for genetic sequences. In this work, we present a tool for submitting raw sequencing reads of SARS-CoV-2 to ENA. The tool features a single-step submission process, a graphical user interface, tabular-formatted metadata and the possibility to remove human reads prior to submission. A Galaxy wrap of the tool allows users with little or no bioinformatic knowledge to do bulk sequencing read submissions. The tool is also packed in a Docker container to ease deployment. Availability: CLI ENA upload tool is available at github.com/usegalaxy- eu/ena-upload-cli (DOI 10.5281/zenodo.4537621); Galaxy ENA upload tool at toolshed.g2.bx.psu.edu/view/iuc/ena_upload/382518f24d6d and https://github.com/galaxyproject/tools- iuc/tree/master/tools/ena_upload (development) and; ENA upload Galaxy container at github.com/ELIXIR- Belgium/ena-upload-container (DOI 10.5281/zenodo.4730785) Contact: [email protected] 1 Introduction Nucleotide Archive (ENA).
    [Show full text]
  • The Economic Impact and Functional Applications of Human Genetics and Genomics
    The Economic Impact and Functional Applications of Human Genetics and Genomics Commissioned by the American Society of Human Genetics Produced by TEConomy Partners, LLC. Report Authors: Simon Tripp and Martin Grueber May 2021 TEConomy Partners, LLC (TEConomy) endeavors at all times to produce work of the highest quality, consistent with our contract commitments. However, because of the research and/or experimental nature of this work, the client undertakes the sole responsibility for the consequence of any use or misuse of, or inability to use, any information or result obtained from TEConomy, and TEConomy, its partners, or employees have no legal liability for the accuracy, adequacy, or efficacy thereof. Acknowledgements ASHG and the project authors wish to thank the following organizations for their generous support of this study. Invitae Corporation, San Francisco, CA Regeneron Pharmaceuticals, Inc., Tarrytown, NY The project authors express their sincere appreciation to the following indi- viduals who provided their advice and input to this project. ASHG Government and Public Advocacy Committee Lynn B. Jorde, PhD ASHG Government and Public Advocacy Committee (GPAC) Chair, President (2011) Professor and Chair of Human Genetics George and Dolores Eccles Institute of Human Genetics University of Utah School of Medicine Katrina Goddard, PhD ASHG GPAC Incoming Chair, Board of Directors (2018-2020) Distinguished Investigator, Associate Director, Science Programs Kaiser Permanente Northwest Melinda Aldrich, PhD, MPH Associate Professor, Department of Medicine, Division of Genetic Medicine Vanderbilt University Medical Center Wendy Chung, MD, PhD Professor of Pediatrics in Medicine and Director, Clinical Cancer Genetics Columbia University Mira Irons, MD Chief Health and Science Officer American Medical Association Peng Jin, PhD Professor and Chair, Department of Human Genetics Emory University Allison McCague, PhD Science Policy Analyst, Policy and Program Analysis Branch National Human Genome Research Institute Rebecca Meyer-Schuman, MS Human Genetics Ph.D.
    [Show full text]
  • Systems Biology: Tracking the Protein–Metabolite Interactome
    RESEARCH HIGHLIGHTS SYSTEMS BIOLOGY Tracking the protein–metabolite interactome A method combining limited proteolysis with mass spectrometry systematically detects protein–metabolite interactions. The genome, the transcriptome and the proteome all receive a lot of attention, but there are also multitudes of small molecules in a cell, collectively making up the chemically diverse metabolome. Metabolites wear many hats, serving as enzyme substrates and products, cofac- tors, allosteric regulators, and as mediators of protein-complex assembly. But despite their essential roles in biological processes, their interactions with proteins—which are often tran- sient and low-affinity—remain largely a mystery. “Understanding how these interactions occur on a global scale is essential to understand mechanisms of cellular adaptation and ecosystems’ dynamics,” says Paola Picotti of ETH Zurich in Switzerland. She and her colleagues recently designed a method to systematically discover which proteins bind to a metabolite of interest. The approach may also be useful for drug discovery, as a way to identify druggable sites in proteins and to test for off-target effects. To identify protein–metabolite interactions in Escherichia coli, Picotti’s team treated a whole- cell lysate with a metabolite of interest then added a low amount of the broad-specificity pro- tease proteinase K for a short period of time, all under native conditions. This ‘limited prote- olysis’ approach generates structure-specific protein fragments—metabolite binding can block proteinase K cleavage at locations which would otherwise be severed. Switching to denaturing conditions, the researchers then used the enzyme trypsin to completely digest the metabolite- treated sample and a reference untreated sample, generating peptides for label-free quantitative mass spectrometry analysis, and compared the resulting differential peptide spectral patterns.
    [Show full text]