Development and Application of Liquid Chromatography and Mass Spectrometry Platforms to Cancer Samples

Advances in Glycoproteomics and Glyco-Biomarker Discovery Studies: Development and Application of Liquid Chromatography and Mass Spectrometry Platforms to Cancer Samples

by Francisca Owusu Gbormittah

B.S. in Chemistry, Kwame Nkrumah University of Science and Technology
M.S. in Chemistry, Indiana University of Pennsylvania

A dissertation submitted to The Faculty of the College of Science of Northeastern University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

August 1, 2014

Dissertation directed by William S. Hancock, Professor of Chemistry and Chemical Biology

UMI Number: 3633335. Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author.

DEDICATION

To my family

ABSTRACT

Analytical technologies for investigating the glycoproteome of clinically relevant samples have improved over the last 10 years. These new developments aim to improve the identification and quantification of disease-specific glyco-biomarkers, which are present at low amounts in biological matrices. Glyco-biomarkers have the potential to contribute significantly to cancer discovery studies in specific areas such as early diagnosis, prognosis, monitoring of cancer recurrence, and improvement of the low survival rate of cancer.
In this thesis, we focused on the development and application of novel liquid affinity chromatography fractionation platforms integrated with nano-LC-MS/MS to characterize and quantify the glycoproteome as well as selected glyco-biomarker candidates of cancer samples. In chapter 1, brief background information covering glycoproteomics and glyco-biomarker discovery studies is presented. Specifically, the protein glycosylation process and how the field of 'omics', which includes glycoproteomics, has revolutionized clinical glyco-biomarker discovery are discussed. Further, various disease models, current sample fractionation strategies, and the analytical methodologies involved in the glyco-biomarker development pipeline are described, along with their significance and their shortfalls. Chapter 1 concludes with a review of biomarker validation and the current bioinformatics tools utilized in glycoproteomics discovery studies. Chapter 2 details the development of a novel multi-dimensional affinity liquid chromatography fractionation approach that combines depletion of the top 12 abundant proteins with multi-lectin fractionation of human plasma. Evaluation and validation of the reproducibility, specificity, and overall recovery of the platform demonstrated the suitability of the developed method for glyco-biomarker discovery studies of clinical samples. After this robust platform was established, it was applied in chapter 3 to comprehensively study the global glycoproteome profile of clear cell renal cell carcinoma (ccRCC) plasma samples in order to identify and characterize potential biomarkers for early detection of the disease. During this study, protein abundance alterations as well as glycan shifts were investigated to understand the sub-proteome of ccRCC. Chapter 4 focuses on the structural characterization of a glycoprotein (clusterin) that was identified during the ccRCC biomarker discovery studies.
Clusterin has been implicated in ccRCC progression; however, its structure and biological function(s) are not yet well defined. Therefore, to gain more structural insight into clusterin, the protein was immuno-affinity purified from ccRCC plasma and then analyzed by tandem mass spectrometry to profile glycoforms, identify N-glycosylation sites, and quantify glycan amounts. We discovered that the levels of bi-antennary digalactosylated disialylated (A2G2S2) and core fucosylated bi-antennary digalactosylated disialylated (FA2G2S2) glycans differed significantly in the plasma of patients before and after curative nephrectomy of localized ccRCC. In chapter 5, a multi-lectin affinity chromatography platform previously developed in our laboratory was optimized and applied to investigate glycoproteins and non-glycoproteins present in pancreatic cyst fluid samples. This study was aimed at identifying potential candidate markers for early detection of malignant cysts (pancreatic cancer precursors). Our data identified proteins with significant differential expression in mucinous cysts (malignant) compared to non-mucinous cysts (benign), of which one protein associated with cancer progression (periostin) was confirmed by immunoblotting assay. In the final chapter (chapter 6), we summarize and conclude our findings in this work and provide our perspective on the potential of glycoproteins in glyco-biomarker discovery studies.
TABLE OF CONTENTS

DEDICATION
ABSTRACT
TABLE OF CONTENTS
APPENDICES
LIST OF FIGURES
LIST OF TABLES

CHAPTER 1
INTRODUCTION: OVERVIEW OF GLYCOPROTEOMICS AND GLYCO-BIOMARKER DISCOVERY STUDIES
1.1 Protein glycosylation and glycoproteomics
1.2 Current status of glyco-biomarkers: advantages and limitations
1.3 Biological matrices
1.3.1 Blood plasma/serum
1.3.2 Tumor tissue
1.3.3 Proximal fluids
1.3.4 Tumor cell lines
1.4 Glycoproteomics sample fractionation strategies
1.4.1 Lectin affinity platforms and applications
1.5 Glycoproteomics characterization using Mass Spectrometry (MS) approaches
1.5.1 Proteolytic enzymes selection
1.5.2 Glycoproteins and glycopeptides enrichment strategies
1.5.3 MS spectrometry platform
1.6 Quantitative technologies in glycoproteomics
1.6.1 Stable isotope quantitation
1.6.2 Targeted-based quantitation
1.6.3 Label free quantitation
1.7 Glyco-biomarker validation strategies
1.8 Data Processing and Statistical Analysis of Glycoproteomics
1.9 References

CHAPTER 2
DEVELOPMENT OF AN IMPROVED FRACTIONATION OF THE HUMAN PLASMA PROTEOME BY A COMBINATION OF ABUNDANT PROTEINS DEPLETION AND MULTI-LECTIN AFFINITY CHROMATOGRAPHY
2.1 Abstract
2.2 Introduction
2.3 Materials and Methods
2.3.1 Materials
2.3.2 Samples for study
2.3.3 Experimental design
2.3.4 Preparation of 12P, M-LAC and reverse phase HPLC columns
2.3.5 High abundance protein depletion and multi-lectin affinity Fractionation
2.3.6 Protein concentration measurements and 1D-SDS PAGE analysis
2.3.7 In-solution protein trypsin digestion
2.3.8 Nano-LC-MS/MS analysis and peptide sequencing
2.4 Results and discussion
2.4.1 12P immuno-affinity depletion
2.4.2 Specificity of 12P depletion column
2.4.3 12P-M-LAC fractionation platform
2.4.4 Recovery studies of 12P-M-LAC platform
2.4.5 Reproducibility studies of protein identification from the 12P-M-LAC platform fractions
2.4.6 Enrichment of low level glycoproteins by 12P-M-LAC platform
2.5 Conclusion
2.6 References

CHAPTER 3
COMPARATIVE STUDIES OF THE PROTEOME, GLYCOPROTEOME AND N-GLYCOME OF CLEAR CELL RENAL CELL CARCINOMA PLASMA BEFORE AND AFTER CURATIVE NEPHRECTOMY
3.1 Abstract
3.2 Introduction
3.3 Materials and Methods
3.3.1 Materials
3.3.2 Sample population
3.3.3 High Abundance Proteins Depletion and Glycoprotein Affinity Fractionation
3.3.4 N-Glycan Release and LC-ESI-MS Analysis
3.3.5 Gel nano-LC-MS/MS Proteomic and Glycoproteomic Analysis
3.3.6 Data processing and statistical analysis
3.4 Results and discussion
3.4.1 The analytical strategy
3.4.2 The 12P-M-LAC analytical platform
3.4.3 Overview of proteomics and glycoproteomics data
3.4.4 Quantification and selection of differentially expressed proteins present in 12P depleted ccRCC plasma proteome
3.4.5 Identification and selection of proteins of interest showing differential M-LAC column binding
3.4.6 Characterization of N-glycan moieties released from depleted M-LAC fractions by porous graphitized carbon (PGC) LC-ESI-IT MS/MS
3.4.7 N-glycan structures alteration analysis
3.4.8 Validation of differentially expressed N-glycans by extracted ion Chromatograms
3.5 Conclusion
3.6 References

CHAPTER 4
TANDEM MASS SPECTROMETRY CHARACTERIZATION OF CLUSTERIN GLYCOPEPTIDE VARIANTS IN THE PLASMA OF CLEAR CELL RENAL CELL CARCINOMA
4.1 Abstract
4.2 Introduction
4.3 Materials and Methods
4.3.1 Materials
4.3.2 Clear cell renal cell carcinoma (ccRCC) plasma sample collection and preparation
4.3.3 Clusterin immuno-affinity HPLC purification
4.3.4 Lectin blot assay of purified Clusterin
4.3.5 One dimensional SDS PAGE and enzymatic digestion
4.3.6 C18 reversed phase nano-LC-MS/MS Analysis
4.3.7 Data and statistical Analysis
4.4 Results and discussion
4.4.1 Development of the Analytical Approach
4.4.2 Glycan occupancy analysis
4.4.3 Characterization of site-specific oligosaccharide heterogeneity
4.4.4 Glycan structures for the selected glycopeptide residue 372-385, N-374
4.4.5 Quantitation of targeted glycoforms in clinical samples
4.4.6 Lectin blot assay
4.5 Conclusion
4.6 References

CHAPTER 5
CHARACTERIZATION OF GLYCOPROTEINS IN PANCREATIC CYST FLUID USING A HIGH PERFORMANCE MULTIPLE
Recommended publications
  • Structural Forms of the Human Amylase Locus and Their Relationships to SNPs, Haplotypes, and Obesity
    Structural Forms of the Human Amylase Locus and Their Relationships to SNPs, Haplotypes, and Obesity. The Harvard community has made this article openly available. Citation: Usher, Christina Leigh. 2015. Structural Forms of the Human Amylase Locus and Their Relationships to SNPs, Haplotypes, and Obesity. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences. Citable link: http://nrs.harvard.edu/urn-3:HUL.InstRepos:17467224 Terms of Use: This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA A dissertation presented by Christina Leigh Usher to The Division of Medical Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the subject of Genetics and Genomics, Harvard University, Cambridge, Massachusetts, March 2015. © 2015 Christina Leigh Usher. All rights reserved. Dissertation Advisor: Professor Steven McCarroll. Abstract: Hundreds of human genes reside in structurally complex loci that elude molecular analysis and assessment in genome-wide association studies (GWAS). One such locus contains the three different amylase genes (AMY2B, AMY2A, and AMY1) responsible for digesting starch into sugar. The copy number of AMY1 is reported to be the genome’s largest influence on obesity, yet has gone undetected in GWAS.
  • Chromosome 1 (Human Genome/Inkae) A
    Proc. Natl. Acad. Sci. USA Vol. 89, pp. 4598-4602, May 1992 Medical Sciences Integration of gene maps: Chromosome 1 (human genome/linkage) A. COLLINS*, B. J. KEATS†, N. DRACOPOLI‡, D. C. SHIELDS*, AND N. E. MORTON* *CRC Research Group in Genetic Epidemiology, Department of Child Health, University of Southampton, Southampton, SO9 4XY, United Kingdom; †Department of Biometry and Genetics, Louisiana State University Center, 1901 Perdido Street, New Orleans, LA 70112; and ‡Center for Cancer Research, Massachusetts Institute of Technology, 40 Ames Street, Cambridge, MA 02139 Contributed by N. E. Morton, February 10, 1992 ABSTRACT A composite map of 177 loci has been constructed in two steps. The first combined pairwise logarithm-of-odds scores on 127 loci into a comprehensive genetic map. Then this map was projected onto the physical map through cytogenetic assignments, and the small amount of physical data was interpolated for an additional 50 loci each of which had been assigned to an interval of less than 10 megabases. The resulting composite map is on the physical scale with a resolution of 1.5 megabases. [...] standard lod tables extracted from the literature. Multiple pairwise analysis of these data was performed by the MAP90 computer program (6), which can estimate an error frequency e (7) and a mapping parameter p such that map distance w is a function of θ, e and p (8). It also includes a bootstrap to optimize order and a stepwise elimination of weakly supported loci to identify a conservative set of reliably ordered (framework) markers. The genetic map was combined with
  • Environmental Influences on Endothelial Gene Expression
    ENDOTHELIAL CELL GENE EXPRESSION John Matthew Jeff Herbert Supervisors: Prof. Roy Bicknell and Dr. Victoria Heath PhD thesis University of Birmingham August 2012 University of Birmingham Research Archive e-theses repository This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by The Copyright Designs and Patents Act 1988 or as modified by any successor legislation. Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder. ABSTRACT Tumour angiogenesis is a vital process in the pathology of tumour development and metastasis. Targeting markers of tumour endothelium provides a means of destroying a tumour's oxygen and nutrient supply via destruction of tumour vasculature, which in turn ultimately benefits patients. Although current anti-angiogenic and vascular targeting strategies help patients, more potently in combination with chemotherapy, there is still a need for more tumour endothelial marker discoveries, as current treatments have cardiovascular and other side effects. For the first time, the analysis of in-vivo biotinylation of an embryonic system is performed to obtain putative vascular targets. Also for the first time, deep sequencing is applied to freshly isolated tumour and normal endothelial cells from lung, colon and bladder tissues for the identification of pan-vascular targets. Integration of the proteomic, deep sequencing, public cDNA library and microarray data delivers 5,892 putative vascular targets to the science community.
  • Characterization of Genomic Copy Number Variation in Mus Musculus Associated with the Germline of Inbred and Wild Mouse Populations, Normal Development, and Cancer
    Western University Scholarship@Western Electronic Thesis and Dissertation Repository 4-18-2019 2:00 PM Characterization of genomic copy number variation in Mus musculus associated with the germline of inbred and wild mouse populations, normal development, and cancer Maja Milojevic The University of Western Ontario Supervisor Hill, Kathleen A. The University of Western Ontario Graduate Program in Biology A thesis submitted in partial fulfillment of the requirements for the degree in Doctor of Philosophy © Maja Milojevic 2019 Follow this and additional works at: https://ir.lib.uwo.ca/etd Part of the Genetics and Genomics Commons Recommended Citation Milojevic, Maja, "Characterization of genomic copy number variation in Mus musculus associated with the germline of inbred and wild mouse populations, normal development, and cancer" (2019). Electronic Thesis and Dissertation Repository. 6146. https://ir.lib.uwo.ca/etd/6146 This Dissertation/Thesis is brought to you for free and open access by Scholarship@Western. It has been accepted for inclusion in Electronic Thesis and Dissertation Repository by an authorized administrator of Scholarship@Western. For more information, please contact [email protected]. Abstract Mus musculus is a human commensal species and an important model of human development and disease with a need for approaches to determine the contribution of copy number variants (CNVs) to genetic variation in laboratory and wild mice, and arising with normal mouse development and disease. Here, the Mouse Diversity Genotyping array (MDGA)-approach to CNV detection is developed to characterize CNV differences between laboratory and wild mice, between multiple normal tissues of the same mouse, and between primary mammary gland tumours and metastatic lung tissue.
  • A Reduction in Drp1-Mediated Fission Compromises
    HMG Advance Access published June 26, 2016 Human Molecular Genetics, 2016, Vol. 0, No. 0 1–13 doi: 10.1093/hmg/ddw173 Advance Access Publication Date: 10 June 2016 Original Article A reduction in Drp1-mediated fission compromises mitochondrial health in autosomal recessive spastic ataxia of Charlevoix Saguenay Teisha Y. Bradshaw1, Lisa E.L. Romano1, Emma J. Duncan1, Suran Nethisinghe1, Rosella Abeti3, Gregory J. Michael2, Paola Giunti3, Sascha Vermeer4 and J. Paul Chapple1,* 1William Harvey Research Institute, Barts and the London School of Medicine, Queen Mary University of London, London EC1M 6BQ, United Kingdom, 2Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, United Kingdom, 3Department of Molecular Neuroscience, UCL Institute of Neurology, London WC1N 3BG, United Kingdom and 4Department of Clinical Genetics, The Netherlands Cancer Institute, 1066 CX Amsterdam, The Netherlands *To whom correspondence should be addressed at: Tel: +44 2078826242; Fax: +44 207 882 6197; Email [email protected] Abstract The neurodegenerative disease autosomal recessive spastic ataxia of Charlevoix Saguenay (ARSACS) is caused by loss of function of sacsin, a modular protein that is required for normal mitochondrial network organization. To further understand cellular consequences of loss of sacsin, we performed microarray analyses in sacsin knockdown cells and ARSACS patient fibroblasts. This identified altered transcript levels for oxidative phosphorylation and oxidative stress genes. These changes in mitochondrial gene networks were validated by quantitative reverse transcription PCR.
  • Role of Amylase in Ovarian Cancer Mai Mohamed University of South Florida, [email protected]
    University of South Florida Scholar Commons Graduate Theses and Dissertations Graduate School July 2017 Role of Amylase in Ovarian Cancer Mai Mohamed University of South Florida, [email protected] Follow this and additional works at: http://scholarcommons.usf.edu/etd Part of the Pathology Commons Scholar Commons Citation Mohamed, Mai, "Role of Amylase in Ovarian Cancer" (2017). Graduate Theses and Dissertations. http://scholarcommons.usf.edu/etd/6907 This Dissertation is brought to you for free and open access by the Graduate School at Scholar Commons. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Scholar Commons. For more information, please contact [email protected]. Role of Amylase in Ovarian Cancer by Mai Mohamed A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy Department of Pathology and Cell Biology Morsani College of Medicine University of South Florida Major Professor: Patricia Kruk, Ph.D. Paula C. Bickford, Ph.D. Meera Nanjundan, Ph.D. Marzenna Wiranowska, Ph.D. Lauri Wright, Ph.D. Date of Approval: June 29, 2017 Keywords: ovarian cancer, amylase, computational analyses, glycocalyx, cellular invasion Copyright © 2017, Mai Mohamed Dedication This dissertation is dedicated to my parents, Ahmed and Fatma, who have always stressed the importance of education, and, throughout my education, have been my strongest source of encouragement and support. They always believed in me and I am eternally grateful to them. I would also like to thank my brothers, Mohamed and Hussien, and my sister, Mariam. I would also like to thank my husband, Ahmed.
  • Product Description P441-A2 SACS-V01
    MRC-Holland ® Product Description version A2-01; Issued 23 April 2020 MLPA Product Description SALSA® MLPA® Probemix P441-A2 SACS To be used with the MLPA General Protocol. Version A2. Compared to version A1, four reference probes have been replaced. For complete product history see page 5. Catalogue numbers: • P441-025R: SALSA MLPA Probemix P441 SACS, 25 reactions. • P441-050R: SALSA MLPA Probemix P441 SACS, 50 reactions. • P441-100R: SALSA MLPA Probemix P441 SACS, 100 reactions. To be used in combination with a SALSA MLPA reagent kit and Coffalyser.Net data analysis software. MLPA reagent kits are either provided with FAM or Cy5.0 dye-labelled PCR primer, suitable for Applied Biosystems and Beckman/SCIEX capillary sequencers, respectively (see www.mlpa.com). Certificate of Analysis: Information regarding storage conditions, quality tests, and a sample electropherogram from the current sales lot is available at www.mlpa.com. Precautions and warnings: For professional use only. Always consult the most recent product description AND the MLPA General Protocol before use: www.mlpa.com. It is the responsibility of the user to be aware of the latest scientific knowledge of the application before drawing any conclusions from findings generated with this product. General information: The SALSA MLPA Probemix P441 SACS is a research use only (RUO) assay for the detection of deletions or duplications in the SACS gene, which is associated with autosomal recessive spastic ataxia of Charlevoix-Saguenay (ARSACS). ARSACS is a neurodegenerative disorder, characterised by early-onset progressive cerebellar ataxia with spasticity and peripheral neuropathy. The classic form of ARSACS is often displayed in early childhood, leading to delayed walking in young toddlers, while individuals with disease onset in teenage or early-adult years are also being described more recently.
  • 1 Mutational Heterogeneity in Cancer Akash Kumar a Dissertation
    Mutational Heterogeneity in Cancer Akash Kumar A dissertation Submitted in partial fulfillment of requirements for the degree of Doctor of Philosophy University of Washington 2014 June 5 Reading Committee: Jay Shendure Pete Nelson Mary Claire King Program Authorized to Offer Degree: Genome Sciences University of Washington ABSTRACT Mutational Heterogeneity in Cancer Akash Kumar Chair of the Supervisory Committee: Associate Professor Jay Shendure Department of Genome Sciences Somatic mutation plays a key role in the formation and progression of cancer. Differences in mutation patterns likely explain much of the heterogeneity seen in prognosis and treatment response among patients. Recent advances in massively parallel sequencing have greatly expanded our capability to investigate somatic mutation. Genomic profiling of tumor biopsies could guide the administration of targeted therapeutics on the basis of the tumor’s collection of mutations. Central to the success of this approach is the general applicability of targeted therapies to a patient’s entire tumor burden. This requires a better understanding of the genomic heterogeneity present both within individual tumors (intratumoral) and amongst tumors from the same patient (intrapatient). My dissertation is broadly organized around investigating mutational heterogeneity in cancer. Three projects are discussed in detail: analysis of (1) interpatient and (2) intrapatient heterogeneity in men with disseminated prostate cancer, and (3) investigation of regional intratumoral heterogeneity in
  • Population Genetics of the Coral Acropora Millepora: Towards a Genomic Predictor of Bleaching
    bioRxiv preprint doi: https://doi.org/10.1101/867754; this version posted December 6, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Population genetics of the coral Acropora millepora: Towards a genomic predictor of bleaching Zachary L. Fuller∗1, Veronique J.L. Mocellin2, Luke Morris2, Neal Cantin2, Jihanne Shepherd1, Luke Sarre1, Julie Peng3, Yi Liao4,5, Joseph Pickrell6, Peter Andolfatto1, Mikhail Matz†4, Line K. Bay†2, and Molly Przeworski†1,2,3 1Department of Biological Sciences, Columbia University 2Australia Institute of Marine Science 3Lewis-Sigler Institute for Integrative Genomics, Princeton University 4Department of Integrative Biology, University of Texas at Austin 5Department of Ecology and Evolutionary Biology, University of California, Irvine 6Gencove, Inc. New York 7Department of Systems Biology, Columbia University 8Program for Mathematical Genomics, Columbia University 6 December 2019 Abstract Although reef-building corals are rapidly declining worldwide, responses to bleaching vary both within and among species. Because these inter-individual differences are partly heritable, they should in principle be predictable from genomic data. Towards that goal, we generated a chromosome-scale genome assembly for the coral Acropora millepora. We then obtained whole genome sequences for 237 phenotyped samples collected at 12 reefs distributed along the Great Barrier Reef, among which we inferred very little population structure. Scanning the genome for evidence of local adaptation, we detected signatures of long-term balancing selection in the heat-shock co-chaperone sacsin. We further used 213 of the samples to conduct a genome-wide association study of visual bleaching score, incorporating the polygenic score derived from it into a predictive model for bleaching in the wild.
  • Improved Detection of Gene Fusions by Applying Statistical Methods Reveals New Oncogenic RNA Cancer Drivers
    bioRxiv preprint doi: https://doi.org/10.1101/659078; this version posted June 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Improved detection of gene fusions by applying statistical methods reveals new oncogenic RNA cancer drivers Roozbeh Dehghannasiri1, Donald Eric Freeman1,2, Milos Jordanski3, Gillian L. Hsieh1, Ana Damljanovic4, Erik Lehnert4, Julia Salzman1,2,5* Author affiliation 1Department of Biochemistry, Stanford University, Stanford, CA 94305 2Department of Biomedical Data Science, Stanford University, Stanford, CA 94305 3Department of Computer Science, University of Belgrade, Belgrade, Serbia 4Seven Bridges Genomics, Cambridge, MA 02142 5Stanford Cancer Institute, Stanford, CA 94305 *Corresponding author [email protected] Short Abstract: The extent to which gene fusions function as drivers of cancer remains a critical open question. Current algorithms do not sufficiently identify false-positive fusions arising during library preparation, sequencing, and alignment. Here, we introduce a new algorithm, DEEPEST, that uses statistical modeling to minimize false-positives while increasing the sensitivity of fusion detection. In 9,946 tumor RNA-sequencing datasets from The Cancer Genome Atlas (TCGA) across 33 tumor types, DEEPEST identifies 31,007 fusions, 30% more than identified by other methods, while calling ten-fold fewer false-positive fusions in non-transformed human tissues. We leverage the increased precision of DEEPEST to discover new cancer biology. For example, 888 new candidate oncogenes are identified based on over-representation in DEEPEST-Fusion calls, and 1,078 previously unreported fusions involving long intergenic noncoding RNAs partners, demonstrating a previously unappreciated prevalence and potential for function.
  • Autosomal Recessive Spastic Ataxia of Charlevoix-Saguenay
    Copyright by John Francis Anderson 2011 THE ROLE OF SACSIN AS A MOLECULAR CHAPERONE by John Francis Anderson, B.S. Dissertation Presented to the Faculty of the Graduate School of The University of Texas Medical Branch in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy The University of Texas Medical Branch May 2011 Dedication To my wife, Erika R. Anderson. Acknowledgements I am indebted to my mentor, Dr. José M. Barral for his investment in my education. Dr. Barral taught me that curiosity drives scientific investigation and rigorous experiments derive new knowledge. This combination of values is rare to find in a mentor and I am extremely grateful for my experience in his lab. I would like to acknowledge Jason J. Chandler and Paige Spencer for technical assistance. I am especially grateful to Dr. Efrain Siller for daily assistance and discussions on all aspects of this project. I would like to thank Dr. Christian Kaiser providing me a practical education on the design and performance biochemical experiments. I acknowledge Dr. Bernard Brais for generously sharing his time and resources; he provided the rare opportunity to visit an ARSACS patient in her home as well as supplied us with SACS knockout mouse brains. My committee members, Dr. Henry F. Epstein, Dr. Andres F. Oberhauser, Dr. George R Jackson, and Dr. Robert O. Fox and Dr. Darren F. Boehning provided invaluable guidance throughout this work for which I am very grateful. I would like to especially thank Dr. Darren F. Boehning and Dr. Henry F. Epstein for being involved in the details of my project from the beginning of my education at UTMB, through sharing reagents and expertise as well as serving on my committee.
  • Supplementary Note 1 - Copy Number Estimation with Fastcn
    1 Supplementary Note 1 - Copy number estimation with fastCN 1.1 Introduction Multiple approaches that utilize read depth to identify regions of copy number variation have been developed. One successful set of approaches utilize the mrFAST and mrsFAST aligners, tools which efficiently return all matching locations for short sequencing reads within a specified edit distance. These tools have been used to analyze CNV patterns in multiple studies of humans and non-human primates [1–4]. However, this estimation required separate steps including mapping, BAM file sorting based on location, and read pileup followed by GC corrections, requiring the storage and manipulation of several large files. Since the total time for disk I/O and the use of multiple intermediate files is a serious bottleneck for large scale analyses, we developed fastCN to efficiently estimate genome copy number from short read data. This program utilizes the data output from the short read mapper mrsFAST [5], and reports per-bp read depth in an efficient compressed binary format. The fastCN software package is available on GitHub (https://github.com/KiddLab/fastCN). 1.2 Implementation and optimization The fastCN core pipeline consists of two major applications. The first, GC_control_gen, generates a control region file for the next stage of the pipeline based on (1) the reference genome and user supplied files indicating (2) regions of the genome assumed to not be copy number variable and (3) regions of the genome which have been masked prior to read mapping. To avoid excessive depth pile ups due to repetitive regions, we utilize a version of the genome reference where all elements defined by RepeatMasker, elements defined by tandem repeat finder (TRF), and 50-mers with at least 20 genome matches within an edit distance of two are masked to ‘N’ prior to short read mapping.