Comprehensive Variant Detection in a Human Genome with PacBio High-Fidelity Reads

William J. Rowell, Paul Peluso, John Harting, Yufeng Qian, Aaron Wenger, Richard Hall, David R. Rank
PacBio, 1305 O’Brien Drive, Menlo Park, CA 94025

Abstract

Human genomic variations range in size from single nucleotide substitutions to large chromosomal rearrangements. Sequencing technologies tend to be optimized for detecting particular variant types and sizes. Short reads excel at detecting SNVs and small indels, while long or linked reads are typically used to detect larger structural variants or phase distant loci. Long reads are more easily mapped to repetitive regions, but tend to have lower per-base accuracy, making it difficult to call short variants. The PacBio Sequel System produces two main data types: long continuous reads (up to 100 kbp), generated by single passes over a long template, and Circular Consensus Sequence (CCS) reads, generated by calculating the consensus of many sequencing passes over a single shorter template (500 bp to 20 kbp). The long-range information in continuous reads is useful for genome assembly and structural variant detection. The higher base accuracy of CCS effectively detects and phases short variants in single molecules.

Recent improvements in library preparation protocols and sequencing chemistry have increased the length, accuracy, and throughput of CCS reads. For the human sample HG002, we collected 28-fold coverage 15 kbp high-fidelity CCS reads with an average read quality above Q20 (99% accuracy). The length and accuracy of these reads allow us to detect SNVs, indels, and structural variants not only in the Genome in a Bottle (GIAB) high confidence regions, but also in segmental duplications, HLA loci, and clinically relevant “difficult-to-map” genes.

As with continuous long reads, we call structural variants at 90.0% recall compared to the GIAB structural variant benchmark “truth” set, with the added advantages of base pair resolution for variant calls and improved recall at compound heterozygous loci.

With minimap2 alignments, GATK4 HaplotypeCaller variant calls, and simple variant filtration, we have achieved a SNP F-Score of 99.51% and an INDEL F-Score of 80.10% against the GIAB short variant benchmark “truth” set, in addition to calling variants outside of the high confidence region established by GIAB using previous technologies.

With the long-range information available in 15 kbp reads, we applied the read-backed phasing tool WhatsHap to generate phase blocks with a mean length of 65 kbp across the entire genome. Using an alignment-based approach, we typed all major MHC class I and class II genes to at least 3-field precision.

This new data type has the potential to expand the GIAB high confidence regions and “truth” benchmark sets to many previously difficult-to-map genes and allow a single sequencing protocol to address both short variants and large structural variants.

Sample Preparation and Primary Analysis

Workflow: 3-4 µg DNA (GIAB HG002); 20 kbp shear (Megaruptor); SMRTbell adapter ligation; 15 kbp band size selection (SageELF); polymerase binding; 12 hr pre-extension; 24 hr movie.

Figure 1. Generation of high-fidelity long reads. (a) Following shearing and SMRTbell library creation, the SageELF system was used to select the 15 kbp band, which was sequenced for 20 hours on the Sequel System. (b) High-fidelity reads were generated as the consensus of multiple sequencing passes over single SMRTbell templates. (c) Q20+ CCS reads from a representative single Sequel SMRT Cell. (d) High-fidelity reads have a tight read length distribution around 14 kbp, and read qualities approaching Q30.
Panel labels: ~88% accuracy; >99% accuracy (QV20).

Structural Variant Calling

Workflow: High-Fidelity 15 kbp reads (PacBio Sequel System); read mapping with minimap2; variant calling with pbsv; visualization in IGV.

Figure 3. Structural variant detection. (a) Structural variants of 20 bp and larger can be detected (b) with minimap2 alignment followed by pbsv analysis. (c) ~39,000 variants are detected in HG002, and (d) we achieve high recall and precision compared to the GIAB SV Tier1 v0.6 benchmark.
Panel labels: SNVs; indels (1-49 bp); structural variants (≥50 bp); 5 Mb, 3 Mb, 10 Mb; deletion, insertion, inversion, translocation.
Benchmark result: Recall 90.0%, Precision 94.0%.

Short Variant Calling

Workflow: High-Fidelity 15 kbp reads (PacBio Sequel System); read mapping with minimap2; variant calling with GATK4 HaplotypeCaller; filtering with GATK4 Filter Variants; phasing with WhatsHap; visualization in IGV.

Figure 4. Short variant detection and phasing. (a) High-fidelity reads can be used with pre-existing short variant callers, (b) achieving SNP recall and precision comparable with short read technology at similar depths. (c) Unfiltered indels are dominated by a low quality peak corresponding to 1 bp indels, which can be addressed by splitting indels by size before applying filters. (d) Distribution of WhatsHap read-backed phase block lengths. (e) 500 kbp phase blocks covering the difficult-to-sequence CYP2D6 locus on chromosome 22. (f) Phasing of the HLA loci.
Panel labels: HG002, HG003, HG004; GIAB ‘truth’ benchmark versus GATK HC on CCS.
Benchmark result: 13,438 (FN); 3,047,837 (TP); 16,248 (FP); 99.56% recall; 99.47% precision.

Mapping Clinically Relevant Genes

Figure 2. High-fidelity reads have improved ‘mappability’ compared to short reads. (a) The majority of high-fidelity reads can be mapped at MAPQ 60. (b) 15 kb high-fidelity reads are mapped with high confidence to the 123 kb region centered on CYP2D6, which has several large segmental duplications.

Figure 5. Structural variant detection. High-fidelity reads can be more easily aligned in repetitive regions, extending SV detection into previously difficult-to-map regions like simple repeats in the AUTS2 locus (left panel), implicated in autism, and the CNTNAP2 locus (right panel), implicated in epilepsy.

Figure 6. Short variant detection in PMS2/PMS2CL. High-fidelity reads can be assigned to the correct locus of the gene-pseudogene pair PMS2 (left panel) and PMS2CL (right panel). Using a simple GATK workflow, short variants can be detected over the majority of the difficult-to-map PMS2 locus, a gene associated with Lynch syndrome.

Figure 7. HLA typing. (a) High-fidelity reads are easily aligned to MHC class I and II genes, such as HLA-DQB1, and (b) allow typing to three fields or greater.

Locus       Allele 1            Allele 2
HLA-A       A*01:01:01:01       A*26:01:01
HLA-B       B*35:08:01          B*38:01:01:01
HLA-C       C*04:01:01          C*12:03:01
HLA-DPA1    DPA1*01:03:01:02    DPA1*01:03:01:04
HLA-DPB1    DPB1*04:01:01:01    *DPB1*04:01:01
HLA-DQA1    DQA1*01:05:01       DQA1*03:01:01
HLA-DQB1    DQB1*05:01:01       DQB1*03:02:01
HLA-DRB1    DRB1*04:03:01       DRB1*10:01:01:03
HLA-DRB4    DRB4*01:03:01       -

Conclusions

• High-fidelity long reads can be mapped definitively to more parts of the genome than short reads, including segmental duplications, tandem repeats, structural variants, and highly divergent sequences.
• Structural variant detection with pbsv is similar to that of continuous long reads (CLR), with the added benefit of base pair resolution of breakpoints.
• Because of the increased accuracy, high-fidelity long reads can be used with existing workflows to detect short variants with comparable SNP F1 score. The majority of indel miscalls are 1 bp in length, and precision can be improved by splitting indels by size before applying filters.
• High-fidelity long reads give users the ability to detect both short variants and structural variants with the same dataset.

References

• Van der Auwera GA, Carneiro M, Hartl C, Poplin R, del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella K, Altshuler D, Gabriel S, DePristo M. From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline. Current Protocols in Bioinformatics, 2013, 43:11.10.1-11.10.33
• Marcel Martin, Murray Patterson, Shilpa Garg, Sarah O. Fischer, Nadia Pisanti, Gunnar W. Klau, Alexander Schoenhuth, Tobias Marschall. WhatsHap: fast and accurate read-based phasing. bioRxiv 085050; doi: https://doi.org/10.1101/085050
• http://www.broadinstitute.org/igv
• https://www.pacb.com/support/software-downloads/

For Research Use Only. Not for use in diagnostic procedures. © Copyright 2018 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. Fragment Analyzer is a trademark of Advanced Analytical Technologies. All other trademarks are the sole property of their respective owners.
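The accuracy and benchmark figures quoted in the poster can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the authors' workflow: the function names `phred_to_accuracy` and `benchmark_metrics` are hypothetical, and it assumes that the TP/FP/FN counts shown with Figure 4 are the SNP comparison against the GIAB benchmark. It converts Phred-scaled read quality to per-base accuracy and recomputes recall, precision, and F1 from the reported counts.

```python
def phred_to_accuracy(q: float) -> float:
    """Convert a Phred-scaled quality Q to accuracy: 1 - 10^(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def benchmark_metrics(tp: int, fp: int, fn: int) -> dict:
    """Recall, precision, and F1 from true-positive, false-positive,
    and false-negative call counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


# Read quality thresholds mentioned in the poster (Q20, Q30).
q20_accuracy = phred_to_accuracy(20)  # about 0.99
q30_accuracy = phred_to_accuracy(30)  # about 0.999

# Counts reported with Figure 4 (assumed here to be the SNP comparison).
snp = benchmark_metrics(tp=3_047_837, fp=16_248, fn=13_438)

if __name__ == "__main__":
    print(f"Q20 accuracy: {q20_accuracy:.4f}")
    print(f"Q30 accuracy: {q30_accuracy:.4f}")
    print(f"recall:    {snp['recall']:.4%}")
    print(f"precision: {snp['precision']:.4%}")
    print(f"F1:        {snp['f1']:.4%}")
```

Under that assumption, the recomputed recall and precision round to the 99.56% and 99.47% shown in the figure, and the F1 value comes out at roughly 99.5%, in line with the SNP F-Score quoted in the abstract.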