Meyer Cornellgrad 0058F 10073

Total Page:16

File Type:pdf, Size:1020Kb

Meyer Cornellgrad 0058F 10073 METHODS FOR FUNCTIONAL INFERENCE IN THE PROTEOME AND INTERACTOME A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy by Michael Joseph Meyer January 2017 © 2017 Michael Joseph Meyer ALL RIGHTS RESERVED METHODS FOR FUNCTIONAL INFERENCE IN THE PROTEOME AND INTERACTOME Michael Meyer Cornell University 2017 Over the past several decades, biology has become an increasingly data-driven science. Due in large part to new techniques that allow massive collection of biological data, including next-generation sequencing and high-throughput experimental screening, many of the limitations currently facing the field are in the organization and interpretation of these data. In this dissertation, I present several computational methods and resources designed to organize and perform functional inference on these systems-level biological data sources. In Chapters 2 and 3, I describe the construction of a database and web tool to aid in foundational genomics research by providing predictions of interacting protein domains in interactomes and all-by-all conversions of popular variant identification formats. In Chapter 4, I describe the construction of the first whole-interactome protein interaction network in the fission yeast S. pombe, and, through comparisons with other complete networks in human and the budding yeast S. cerevisiae, demonstrate principles of functional evolution. Finally, in Chapters 5 and 6, I propose two new methods for functional genomic inference—an algorithm to predict cancer driver genes and mutations through 3D atomic clustering of somatic mutations and an ensemble machine learning method to predict the 3D interfaces of protein interactions by taking into account the evolutionary relationships and biophysical properties of proteins. Taken together, this suite of computational resources will help researchers interpret biological function on a genomic scale. BIOGRAPHICAL SKETCH Michael was born in 1989 to Joseph and Nicolette Meyer, and grew up in Worthington, Ohio as the eldest of three children. He graduated from Thomas Worthington High School in 2007 and attended the Massachusetts Institute of Technology from 2007 to 2011, where he earned his bachelor’s degree in Biological Engineering. In the summer of 2010 he interned at Ion Torrent, a startup DNA sequencing company founded by Dr. Jonathan Rothberg, where he helped develop the Ion Torrent Personal Genome Machine. He returned in the summer of 2011 after a successful buyout by Life Technologies. In 2011 he married his high school sweetheart, Sarai Meyer (Itagaki), and they moved to Ithaca, New York to begin their Ph.D.s at Cornell University. Michael entered the Tri- Institutional Training Program in Computational Biology and Medicine (CBM) and completed laboratory rotations at Cornell’s Ithaca campus and at the Weill Cornell Medical School in New York City. He ultimately joined the computational systems biology lab of Dr. Haiyuan Yu in the Weill Institute for Cell and Molecular Biology in Ithaca, where he completed the research presented in this dissertation. iii For my grandfathers v i ACKNOWLEDGEMENTS I am incredibly fortunate to have had the support of many people over the course of my life and studies. There are an awful lot of people to thank. But first, I’d like to acknowledge my incredible fortune to have been born in a time and place where scholarly aspirations are relevant and valued, without which none of this would have been possible. To my parents—Mom and Dad, thank you for giving me every chance to succeed, for investing in me, and for showing me how to be a good person. Mom, thank you for your unconditional love, home-cooked meals, and for giving me that problem-solving itch that I’m still trying to scratch. Dad, thank you pushing me, allowing me to learn from my mistakes, accept failure, and think for myself. And thank you both for being there from a distance to help me through these past five years. And to the rest of my family—thanks for putting up with me, for caring about my well-being, and for occasionally mustering the bravery to ask about my research. To all of my teachers throughout the years—Kathy McCulloch, Mark Hill, Peter Scully, and so many others, thank you for inspiring me. To my most recent mentor, Haiyuan, thank you for entrusting me with a great deal of responsibility in the early years of your lab, and for providing advice when I needed it, but still allowing me the freedom to determine my own course of study and explore my passions. To my committee—Chris, David, and Olivier— thank you for you for your genuine interest in my work and your feedback over the years. To several generations of labmates—thank you for being my daily buffer against the slog of graduate school—for the laughs, distractions, and taunts during March Madness. To the amazing undergraduate researchers I’ve worked with—Ryan, Philip, Mark, and Aaron— thank you for your enthusiasm, willingness to learn, and major contributions to much of the v work in this dissertation. To the old guard—Jishnu, Amanda, Tommy, and Robert—thanks for being there from the beginning and for helping to establish a productive and supportive research environment. Jishnu, thanks for showing me the ropes, for the most interesting arguments, and for sharing your encyclopedic knowledge of the literature. Tommy, thanks for your tireless efforts to make FissionNet a success, and for teaching me a thing or two about the wet lab in the process. To the new kids—Juan, Siwei, Antoine, and Charles—good luck. Juan, thank you for arriving at just the right time to remind me why I love what I do, for all the white-boarding, and for helping to create some really cool stuff. Thanks for the late- night code jams and after-work beers, for provocative discussions of philosophy, and for one awkward bro hug. And to Sarai—thanks for, well, everything. For all the little stuff—slipping butter between two halves of a grab-and-go muffin on a Monday morning, counting cats on summer evening walks, and leaving me little notes to find around the house. Thank you for being there, for smiling and laughing, for spontaneity and routine. And thank you for going with me on this first of many adventures—we made it. vi CONTRIBUTIONS I would also like to acknowledge the work of others that has gone into each of the chapters presented in this dissertation. Due to the collaborative nature of biological research, I have had the pleasure to work with many gifted scientists, several of whom are co-authors on the publications presented here. (* indicates co-first author) Chapter 2: Meyer, M.J.*, Das, J.*, Wang, X.*, and Yu, H. (2013). INstruct: a database of high- quality 3D structurally resolved protein interactome networks. Bioinformatics 29, 1577- 1579. Chapter 3: Meyer, M.J.*, Geske, P.*, and Yu, H. (2016). BISQUE: locus- and variant-specific conversion of genomic, transcriptomic and proteomic database identifiers. Bioinformatics 32, 1598-1600. Chapter 4: Vo, T.V.*, Das, J.*, Meyer, M.J.*, Cordero, N.A., Akturk, N., Wei, X., Fair, B.J., Degatano, A.G., Fragoza, R., Liu, L.G., et al. (2016). A Proteome-wide Fission Yeast Interactome Reveals Network Evolution Principles from Yeasts to Human. Cell 164, 310-323. Chapter 5: Meyer, M.J.*, Lapcevic, R.*, Romero, A.E.*, Yoon, M., Das, J., Beltran, J.F., Mort, M., Stenson, P.D., Cooper, D.N., Paccanaro, A., and Yu H. (2016). mutation3D: Cancer Gene Prediction Through Atomic Clustering of Coding Variants in the Structural Proteome. Human Mutation 37, 447-456. Chapter 6: Meyer, M.J.*, Beltran, J.F.*, Fragoza, R., Rumack, A., and Yu, H. (2016). A pan- interactome map of protein interaction interfaces. Under Review. vii TABLE OF CONTENTS Biographical Sketch . iii Dedication . iv Acknowledgements . v Contributions . vii Table of Contents . viii 1. Introduction: data and inference in the genomics era . 1 1.1 A biological data explosion . 1 1.2 New challenges . 2 1.3 Perspectives for the future . 3 1.4 References . 5 2. INstruct: a database of high quality 3D structurally resolved protein interactome networks . 8 2.1 Abstract . 8 2.2 Introduction . 8 2.3 Methods . 10 2.3.1 Identifying high-quality binary interactions . 10 2.3.2 Interaction Interface Inference and Validation . 12 2.4 Usage . 14 2.5 Looking ahead . 17 2.6 References . 20 3. BISQUE: locus- and variant-specific conversion of genomic, transcriptomic, and proteomic database identifiers . 23 3.1 Abstract . 23 3.2 Introduction . 23 3.3 Methods . 24 3.3.1 Database dependencies . 26 3.3.2 Database updates . 28 3.3.3 Identifiers . 29 3.3.4 Conversion architecture . 30 3.3.5 Determining optimal paths in the conversion graph . 30 3.3.6 Locus- and variant-specific conversion . 31 3.3.7 Conversion quality filtering options . 36 3.4 Results . 38 3.4.1 Usage . 38 3.4.2 Database & conversion scope . 39 3.4.3 Comparison with other conversion utilities . 40 3.5 Discussion . 42 3.6 References . 45 viii 4. A Proteome-wide fission yeast interactome reveals network evolution principles from yeasts to human . 47 4.1 Abstract . ..
Recommended publications
  • An Essential Role for Maternal Control of Nodal Signaling
    RESEARCH ARTICLE elife.elifesciences.org An essential role for maternal control of Nodal signaling Pooja Kumari1,2†, Patrick C Gilligan1†, Shimin Lim1,3, Long Duc Tran4, Sylke Winkler5, Robin Philp6‡, Karuna Sampath1,2,3* 1Temasek Life Sciences Laboratory, National University of Singapore, Singapore, Singapore; 2Department of Biological Sciences, National University of Singapore, Singapore, Singapore; 3School of Biological Sciences, Nanyang Technological University, Singapore, Singapore; 4Mechanobiology Institute, National University of Singapore, Singapore, Singapore; 5Department of Cell Biology and Genetics, Max Planck Institute for Molecular Cell Biology and Genetics, Dresden, Germany; 6Bioprocessing Technology Institute, A*STAR, Singapore, Singapore Abstract Growth factor signaling is essential for pattern formation, growth, differentiation, and maintenance of stem cell pluripotency. Nodal-related signaling factors are required for axis formation and germ layer specification from sea urchins to mammals. Maternal transcripts of the zebrafish Nodal factor, Squint (Sqt), are localized to future embryonic dorsal. The mechanisms by which maternal sqt/nodal RNA is localized and regulated have been unclear. Here, we show that maternal control of Nodal signaling via the conserved Y box-binding protein 1 (Ybx1) is essential. We identified Ybx1 via a proteomic screen. Ybx1 recognizes the 3’ untranslated region (UTR) of *For correspondence: karuna@ sqt RNA and prevents premature translation and Sqt/Nodal signaling. Maternal-effect mutations in tll.org.sg zebrafish ybx1 lead to deregulated Nodal signaling, gastrulation failure, and embryonic lethality. Implanted Nodal-coated beads phenocopy ybx1 mutant defects. Thus, Ybx1 prevents ectopic † These authors contributed Nodal activity, revealing a new paradigm in the regulation of Nodal signaling, which is likely to equally to this work be conserved.
    [Show full text]
  • A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus
    Page 1 of 781 Diabetes A Computational Approach for Defining a Signature of β-Cell Golgi Stress in Diabetes Mellitus Robert N. Bone1,6,7, Olufunmilola Oyebamiji2, Sayali Talware2, Sharmila Selvaraj2, Preethi Krishnan3,6, Farooq Syed1,6,7, Huanmei Wu2, Carmella Evans-Molina 1,3,4,5,6,7,8* Departments of 1Pediatrics, 3Medicine, 4Anatomy, Cell Biology & Physiology, 5Biochemistry & Molecular Biology, the 6Center for Diabetes & Metabolic Diseases, and the 7Herman B. Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN 46202; 2Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202; 8Roudebush VA Medical Center, Indianapolis, IN 46202. *Corresponding Author(s): Carmella Evans-Molina, MD, PhD ([email protected]) Indiana University School of Medicine, 635 Barnhill Drive, MS 2031A, Indianapolis, IN 46202, Telephone: (317) 274-4145, Fax (317) 274-4107 Running Title: Golgi Stress Response in Diabetes Word Count: 4358 Number of Figures: 6 Keywords: Golgi apparatus stress, Islets, β cell, Type 1 diabetes, Type 2 diabetes 1 Diabetes Publish Ahead of Print, published online August 20, 2020 Diabetes Page 2 of 781 ABSTRACT The Golgi apparatus (GA) is an important site of insulin processing and granule maturation, but whether GA organelle dysfunction and GA stress are present in the diabetic β-cell has not been tested. We utilized an informatics-based approach to develop a transcriptional signature of β-cell GA stress using existing RNA sequencing and microarray datasets generated using human islets from donors with diabetes and islets where type 1(T1D) and type 2 diabetes (T2D) had been modeled ex vivo. To narrow our results to GA-specific genes, we applied a filter set of 1,030 genes accepted as GA associated.
    [Show full text]
  • The Variant Call Format Specification Vcfv4.3 and Bcfv2.2
    The Variant Call Format Specification VCFv4.3 and BCFv2.2 27 Jul 2021 The master version of this document can be found at https://github.com/samtools/hts-specs. This printing is version 1715701 from that repository, last modified on the date shown above. 1 Contents 1 The VCF specification 4 1.1 An example . .4 1.2 Character encoding, non-printable characters and characters with special meaning . .4 1.3 Data types . .4 1.4 Meta-information lines . .5 1.4.1 File format . .5 1.4.2 Information field format . .5 1.4.3 Filter field format . .5 1.4.4 Individual format field format . .6 1.4.5 Alternative allele field format . .6 1.4.6 Assembly field format . .6 1.4.7 Contig field format . .6 1.4.8 Sample field format . .7 1.4.9 Pedigree field format . .7 1.5 Header line syntax . .7 1.6 Data lines . .7 1.6.1 Fixed fields . .7 1.6.2 Genotype fields . .9 2 Understanding the VCF format and the haplotype representation 11 2.1 VCF tag naming conventions . 12 3 INFO keys used for structural variants 12 4 FORMAT keys used for structural variants 13 5 Representing variation in VCF records 13 5.1 Creating VCF entries for SNPs and small indels . 13 5.1.1 Example 1 . 13 5.1.2 Example 2 . 14 5.1.3 Example 3 . 14 5.2 Decoding VCF entries for SNPs and small indels . 14 5.2.1 SNP VCF record . 14 5.2.2 Insertion VCF record .
    [Show full text]
  • A Semantic Standard for Describing the Location of Nucleotide and Protein Feature Annotation Jerven T
    Bolleman et al. Journal of Biomedical Semantics (2016) 7:39 DOI 10.1186/s13326-016-0067-z RESEARCH Open Access FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation Jerven T. Bolleman1*, Christopher J. Mungall2, Francesco Strozzi3, Joachim Baran4, Michel Dumontier5, Raoul J. P. Bonnal6, Robert Buels7, Robert Hoehndorf8, Takatomo Fujisawa9, Toshiaki Katayama10 and Peter J. A. Cock11 Abstract Background: Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. Description: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Conclusions: Our ontology allows
    [Show full text]
  • A Flexible Microfluidic System for Single-Cell Transcriptome Profiling
    www.nature.com/scientificreports OPEN A fexible microfuidic system for single‑cell transcriptome profling elucidates phased transcriptional regulators of cell cycle Karen Davey1,7, Daniel Wong2,7, Filip Konopacki2, Eugene Kwa1, Tony Ly3, Heike Fiegler2 & Christopher R. Sibley 1,4,5,6* Single cell transcriptome profling has emerged as a breakthrough technology for the high‑resolution understanding of complex cellular systems. Here we report a fexible, cost‑efective and user‑ friendly droplet‑based microfuidics system, called the Nadia Instrument, that can allow 3′ mRNA capture of ~ 50,000 single cells or individual nuclei in a single run. The precise pressure‑based system demonstrates highly reproducible droplet size, low doublet rates and high mRNA capture efciencies that compare favorably in the feld. Moreover, when combined with the Nadia Innovate, the system can be transformed into an adaptable setup that enables use of diferent bufers and barcoded bead confgurations to facilitate diverse applications. Finally, by 3′ mRNA profling asynchronous human and mouse cells at diferent phases of the cell cycle, we demonstrate the system’s ability to readily distinguish distinct cell populations and infer underlying transcriptional regulatory networks. Notably this provided supportive evidence for multiple transcription factors that had little or no known link to the cell cycle (e.g. DRAP1, ZKSCAN1 and CEBPZ). In summary, the Nadia platform represents a promising and fexible technology for future transcriptomic studies, and other related applications, at cell resolution. Single cell transcriptome profling has recently emerged as a breakthrough technology for understanding how cellular heterogeneity contributes to complex biological systems. Indeed, cultured cells, microorganisms, biopsies, blood and other tissues can be rapidly profled for quantifcation of gene expression at cell resolution.
    [Show full text]
  • Supplementary Materials
    Supplementary Materials COMPARATIVE ANALYSIS OF THE TRANSCRIPTOME, PROTEOME AND miRNA PROFILE OF KUPFFER CELLS AND MONOCYTES Andrey Elchaninov1,3*, Anastasiya Lokhonina1,3, Maria Nikitina2, Polina Vishnyakova1,3, Andrey Makarov1, Irina Arutyunyan1, Anastasiya Poltavets1, Evgeniya Kananykhina2, Sergey Kovalchuk4, Evgeny Karpulevich5,6, Galina Bolshakova2, Gennady Sukhikh1, Timur Fatkhudinov2,3 1 Laboratory of Regenerative Medicine, National Medical Research Center for Obstetrics, Gynecology and Perinatology Named after Academician V.I. Kulakov of Ministry of Healthcare of Russian Federation, Moscow, Russia 2 Laboratory of Growth and Development, Scientific Research Institute of Human Morphology, Moscow, Russia 3 Histology Department, Medical Institute, Peoples' Friendship University of Russia, Moscow, Russia 4 Laboratory of Bioinformatic methods for Combinatorial Chemistry and Biology, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow, Russia 5 Information Systems Department, Ivannikov Institute for System Programming of the Russian Academy of Sciences, Moscow, Russia 6 Genome Engineering Laboratory, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia Figure S1. Flow cytometry analysis of unsorted blood sample. Representative forward, side scattering and histogram are shown. The proportions of negative cells were determined in relation to the isotype controls. The percentages of positive cells are indicated. The blue curve corresponds to the isotype control. Figure S2. Flow cytometry analysis of unsorted liver stromal cells. Representative forward, side scattering and histogram are shown. The proportions of negative cells were determined in relation to the isotype controls. The percentages of positive cells are indicated. The blue curve corresponds to the isotype control. Figure S3. MiRNAs expression analysis in monocytes and Kupffer cells. Full-length of heatmaps are presented.
    [Show full text]
  • A Combined RAD-Seq and WGS Approach Reveals the Genomic
    www.nature.com/scientificreports OPEN A combined RAD‑Seq and WGS approach reveals the genomic basis of yellow color variation in bumble bee Bombus terrestris Sarthok Rasique Rahman1,2, Jonathan Cnaani3, Lisa N. Kinch4, Nick V. Grishin4 & Heather M. Hines1,5* Bumble bees exhibit exceptional diversity in their segmental body coloration largely as a result of mimicry. In this study we sought to discover genes involved in this variation through studying a lab‑generated mutant in bumble bee Bombus terrestris, in which the typical black coloration of the pleuron, scutellum, and frst metasomal tergite is replaced by yellow, a color variant also found in sister lineages to B. terrestris. Utilizing a combination of RAD‑Seq and whole‑genome re‑sequencing, we localized the color‑generating variant to a single SNP in the protein‑coding sequence of transcription factor cut. This mutation generates an amino acid change that modifes the conformation of a coiled‑coil structure outside DNA‑binding domains. We found that all sequenced Hymenoptera, including sister lineages, possess the non‑mutant allele, indicating diferent mechanisms are involved in the same color transition in nature. Cut is important for multiple facets of development, yet this mutation generated no noticeable external phenotypic efects outside of setal characteristics. Reproductive capacity was reduced, however, as queens were less likely to mate and produce female ofspring, exhibiting behavior similar to that of workers. Our research implicates a novel developmental player in pigmentation, and potentially caste, thus contributing to a better understanding of the evolution of diversity in both of these processes. Understanding the genetic architecture underlying phenotypic diversifcation has been a long-standing goal of evolutionary biology.
    [Show full text]
  • VCF/Plotein: a Web Application to Facilitate the Clinical Interpretation Of
    bioRxiv preprint doi: https://doi.org/10.1101/466490; this version posted November 14, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Manuscript (All Manuscript Text Pages, including Title Page, References and Figure Legends) VCF/Plotein: A web application to facilitate the clinical interpretation of genetic and genomic variants from exome sequencing projects Raul Ossio MD1, Diego Said Anaya-Mancilla BSc1, O. Isaac Garcia-Salinas BSc1, Jair S. Garcia-Sotelo MSc1, Luis A. Aguilar MSc1, David J. Adams PhD2 and Carla Daniela Robles-Espinoza PhD1,2* 1Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Querétaro, Qro., 76230, Mexico 2Experimental Cancer Genetics, Wellcome Sanger Institute, Hinxton Cambridge, CB10 1SA, UK * To whom correspondence should be addressed. Carla Daniela Robles-Espinoza Tel: +52 55 56 23 43 31 ext. 259 Email: [email protected] 1 bioRxiv preprint doi: https://doi.org/10.1101/466490; this version posted November 14, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. CONFLICT OF INTEREST No conflicts of interest. 2 bioRxiv preprint doi: https://doi.org/10.1101/466490; this version posted November 14, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. ABSTRACT Purpose To create a user-friendly web application that allows researchers, medical professionals and patients to easily and securely view, filter and interact with human exome sequencing data in the Variant Call Format (VCF).
    [Show full text]
  • CBIT Annual Report 2012
    Centre for Bio-Inspired Technology Annual Report 2012 Centre for Bio-Inspired Technology Institute of Biomedical Engineering Imperial College London South Kensington London SW7 2AZ T +44 (0)20 7594 0701 F +44 (0)20 7594 0704 [email protected] www.imperial.ac.uk/bioinspiredtechnology Centre for Bio-Inspired Technology Annual Report 2012 3 Director’s foreword 22 Medical science in the 21st century: a young engineer’s view 4 Founding Sponsor’s foreword 24 Clinical assessment of the 5 People bio-inspired artificial pancreas in 7 News subjects with type 1 diabetes 7 Honours and awards 26 Development of a genetic analysis 8 People and events platform to create innovative and 10 News from the interactive databases combining genotypic Centre’s ‘spin-out’ companies and phenotypic information for genetic detection devices 12 Research funding 29 Chip gallery 13 Research themes 33 Who we are 14 Metabolic technology 35 Academic staff profiles 15 Cardiovascular technology 40 Administrative staff profiles 16 Genetic technology 42 Staff research reports 17 Neural interfaces & neuroprosthetics 56 Research Students’ and 19 Research facilities Assistants’ reports 1 2 Centre for Bio-Inspired Technology | Annual Report 2012 Director’s foreword This Research Report for 2012 describes research institute, our collaborators include the research activities of The Centre for engineers, scientists, medical researchers and clinicians. We include here some brief reports of these Bio-Inspired Technology as it completes colleagues and how their current research has its third year of research. It is a privilege developed from their work in the Centre. to be Director of the Centre and to work One of the prime aims of the Institute of Biomedical with my colleagues at Imperial and Engineering, from which the Centre developed, was to around the world on a range of transfer technology from the laboratory into a innovative projects.
    [Show full text]
  • Infravec2 Open Research Data Management Plan
    INFRAVEC2 OPEN RESEARCH DATA MANAGEMENT PLAN Authors: Andrea Crisanti, Gareth Maslen, Andy Yates, Paul Kersey, Alain Kohl, Clelia Supparo, Ken Vernick Date: 10th July 2020 Version: 3.0 Overview Infravec2 will align to Open Research Data, as follows: Data Types and Standards Infravec2 will generate a variety of data types, including molecular data types: genome sequence and assembly, structural annotation (gene models, repeats, other functional regions) and functional annotation (protein function assignment), variation data, and transcriptome data; arbovirus and malaria experimental infection data, linked to archived samples; and microbiome data (Operational Taxonomic Units), including natural virome composition. All data will be released according to the appropriate standards and formats for each data type. For example, DNA sequence will be released in FASTA format; variant calls in Variant Call Format; sequence alignments in BAM (Binary Alignment Map) and CRAM (Compressed Read Alignment Map) formats, etc. We will strongly encourage the organisation of linked data sets as Track Hubs, a mechanism for publishing a set of linked genomic data that aids data discovery, sharing, and selection for subsequent analysis. We will develop internal standards within the consortium to define minimal metadata that will accompany all data sets, following the template of the Minimal Information Standards for Biological and Biomedical Investigations (http://www.dcc.ac.uk/resources/metadata-standards/mibbi-minimum-information-biological- and-biomedical-investigations). Data Exploitation, Accessibility, Curation and Preservation All molecular data for which existing public data repositories exist will be submitted to such repositories on or before the publication of written manuscripts, with early release of data (i.e.
    [Show full text]
  • Next-Generation Sequencing for Cancer Diagnostics: a Practical Perspective
    Review Article Next-Generation Sequencing for Cancer Diagnostics: a Practical Perspective Cliff Meldrum,1,4† Maria A Doyle,2† *Richard W Tothill3 1Molecular Pathology, Hunter Area Pathology Service, Newcastle, NSW 2310; 2Bioinformatics and 3Next-generation DNA Sequencing Facility Research Department and 4Pathology Department, Peter MacCallum Cancer Centre, Melbourne, Vic. 3002, Australia. †These authors contributed equally to this manuscript. *For correspondence: Dr Richard Tothill, [email protected] Abstract Next-generation sequencing (NGS) is arguably one of the most significant technological advances in the biological sciences of the last 30 years. The second generation sequencing platforms have advanced rapidly to the point that several genomes can now be sequenced simultaneously in a single instrument run in under two weeks. Targeted DNA enrichment methods allow even higher genome throughput at a reduced cost per sample. Medical research has embraced the technology and the cancer field is at the forefront of these efforts given the genetic aspects of the disease. World-wide efforts to catalogue mutations in multiple cancer types are underway and this is likely to lead to new discoveries that will be translated to new diagnostic, prognostic and therapeutic targets. NGS is now maturing to the point where it is being considered by many laboratories for routine diagnostic use. The sensitivity, speed and reduced cost per sample make it a highly attractive platform compared to other sequencing modalities. Moreover, as we identify more genetic determinants of cancer there is a greater need to adopt multi-gene assays that can quickly and reliably sequence complete genes from individual patient samples. Whilst widespread and routine use of whole genome sequencing is likely to be a few years away, there are immediate opportunities to implement NGS for clinical use.
    [Show full text]
  • DR1 Antibody A
    Revision 1 C 0 2 - t DR1 Antibody a e r o t S Orders: 877-616-CELL (2355) [email protected] Support: 877-678-TECH (8324) 7 4 Web: [email protected] 4 www.cellsignal.com 6 # 3 Trask Lane Danvers Massachusetts 01923 USA For Research Use Only. Not For Use In Diagnostic Procedures. Applications: Reactivity: Sensitivity: MW (kDa): Source: UniProt ID: Entrez-Gene Id: WB, IP H M R Mk Endogenous 19 Rabbit Q01658 1810 Product Usage Information 7. Yeung, K.C. et al. (1994) Genes Dev 8, 2097-109. 8. Kim, T.K. et al. (1995) J Biol Chem 270, 10976-81. Application Dilution 9. Kamada, K. et al. (2001) Cell 106, 71-81. Western Blotting 1:1000 Immunoprecipitation 1:50 Storage Supplied in 10 mM sodium HEPES (pH 7.5), 150 mM NaCl, 100 µg/ml BSA and 50% glycerol. Store at –20°C. Do not aliquot the antibody. Specificity / Sensitivity DR1 Antibody recognizes endogenous levels of total DR1 protein. Species Reactivity: Human, Mouse, Rat, Monkey Species predicted to react based on 100% sequence homology: D. melanogaster, Zebrafish, Dog, Pig Source / Purification Polyclonal antibodies are produced by immunizing animals with a synthetic peptide corresponding to residues surrounding Gly112 of human DR1 protein. Antibodies are purified by protein A and peptide affinity chromatography. Background Down-regulator of transcription 1 (DR1), also known as negative cofactor 2-β (NC2-β), forms a heterodimer with DR1 associated protein 1 (DRAP1)/NC2-α and acts as a negative regulator of RNA polymerase II and III (RNAPII and III) transcription (1-5).
    [Show full text]