Structural Forms of the Human Amylase Locus and Their Relationships to Snps, Haplotypes, and Obesity

Total Page:16

File Type:pdf, Size:1020Kb

Structural Forms of the Human Amylase Locus and Their Relationships to Snps, Haplotypes, and Obesity Structural Forms of the Human Amylase Locus and Their Relationships to SNPs, Haplotypes, and Obesity The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Usher, Christina Leigh. 2015. Structural Forms of the Human Amylase Locus and Their Relationships to SNPs, Haplotypes, and Obesity. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences. Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:17467224 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA Structural forms of the human amylase locus and their relationships to SNPs, haplotypes, and obesity A dissertation presented by Christina Leigh Usher to The Division of Medical Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the subject of Genetics and Genomics Harvard University Cambridge, Massachusetts March 2015 © 2015 Christina Leigh Usher All rights reserved. Dissertation Advisor: Professor Steven McCarroll Christina Leigh Usher Structural forms of the human amylase locus and their relationships to SNPs, haplotypes, and obesity Abstract Hundreds of human genes reside in structurally complex loci that elude molecular analysis and assessment in genome-wide association studies (GWAS). One such locus contains the three different amylase genes (AMY2B, AMY2A, and AMY1) responsible for digesting starch into sugar. The copy number of AMY1 is reported to be the genome’s largest influence on obesity, yet has gone undetected in GWAS. Using droplet digital PCR (ddPCR), sequence analysis, and optical mapping, we characterized eight common structural forms of the amylase locus, their mutational histories, and their relationships to SNPs. We found that AMY1 copy number has a unique distribution undetectable to earlier methods that can be understood from an underlying set of structural forms and their allele frequencies. Despite a history of recurrent structural mutations, AMY1 copy number has maintained partial correlations to nearby SNPs; these SNPs do not associate with body mass index (BMI). To directly test for association, we measured amylase gene copy number using ddPCR in 1,000 Estonians selected for being either obese or lean and in two cohorts totaling ~3,500 individuals using sequence analysis. We had 99% power to detect even the lower bound of the reported effects on BMI and obesity, yet found no association. This study model of using multiple methods to analyze the copy number, structural haplotypes, and surrounding SNP haplotypes of multi-allelic variants will likely facilitate more robust disease association results in future studies. iii Acknowledgements My graduate school experience has been a long journey that would have been even longer without the help of my mentors, friends, and family. I would first like to thank Steve McCarroll for being my mentor. It has been an honor being his first graduate student, and quite possibly, his most stubborn. Yet, despite my greatest protestations, he never stopped giving me good advice that I would secretly follow and good ideas that drove my projects forward. His best gift has been his whole-hearted approval of my career aspirations and the connections to get me there. I would like to thank the members of the McCarroll lab for being fun, supportive, and knowledgeable. I would like to thank the first generation members Cindy Liu, Tom Mullen, and Elizabeth Fels for being a friend, a motivator, and a ridiculously good assistant, respectively. I’d like to thank the newer folks Curtis, Vanessa, Heather, Evan, Dominique, Arpiar, and Michelle for keeping the good vibes going. Of course, I have to give shout outs to Bob Handsaker and Jim Nemesh for being our omnipotent computational specialists. And last, but not least, I have to thank my officemates Linda Boettger, Aswin Sekar, Nolan Kamitaki, and Avery Davis for being more friends than coworkers. Know that even though I grumpily wore my headphones every time you guys were talking, I was still listening. iv I am grateful for having had the opportunity to work with and learn from a wonderful group of collaborators: the labs of Joel Hirschhorn, Timothy Frayling, Charles Lee, and BioNano. It was also a privilege to be guided by my Dissertation Advisory Committee Jesse Gray, Mark Daly, Alkes Price, and James Gusella. I would also like to thank Mark Daly, Terence Capellini, Matt Warman, and John Armour for kindly agreeing to be a part of my dissertation examination committee. And lastly, I’d like to thank the members of my undergraduate lab David Bloom, Nicole Giordani, Dacia Kwiatkowski, Cameron Lilly, and Levi Watson for making science look so fun that I wanted to pursue it in graduate school. I am also indebted to my family (Susan, Michael, Kevin, David, and Joyce Usher, and Heinz and Helga Meincke) for keeping me on track by asking me when I’m going to graduate even more times than the BBS administrative office. But honestly, my life wouldn’t have been as successful without them shipping up to Boston for condo repairs and keeping up long-distance support through cell phones and Facebook. v Table of Contents Introduction ................................................................................................................. 1! Multi-allelic CNVs ................................................................................................... 2! The rationale of the dissertation ............................................................................ 8! The amylase locus .................................................................................................... 9! Chapter 1. Measuring the amylase locus ................................................................ 21! Optimizing the measurement of AMY1 ............................................................... 23! The distribution of amylase gene copy number ................................................. 29! The structural alleles of the amylase locus .......................................................... 31! Chapter 2. Assembling the alleles of the amylase locus ......................................... 35! Published assemblies ............................................................................................. 35! Novel assemblies .................................................................................................... 39! Chapter 3. SNP correlations to the amylase locus ................................................. 45! SNPs capturing structural haplotypes ................................................................ 47! SNPs capturing copy number .............................................................................. 48! Chapter 4. Associating the amylase locus to obesity .............................................. 49! Discussion .................................................................................................................. 54! Discussion of the previous report ......................................................................... 54! Discussion of my proposed study model .............................................................. 57! Future Directions ................................................................................................... 61! Appendix A. Dissecting the timing & origin of gene expression variation .......... 64! Appendix B. Experimental Methods ....................................................................... 87! vi Amylase .................................................................................................................. 87! Het-Seq ................................................................................................................. 107! Appendix C. Protocols for amylase genotyping using ddPCR ........................... 117! References ................................................................................................................ 140! vii Introduction Human genomes have thousands of deletion and duplication polymorphisms. These so-called copy number variants (CNVs) cause as many as ~0.78% of base pairs to differ in copy number between any two individuals’ genomes1. The first indication that these copy number alterations could influence human phenotypes came from sporadic diseases, now termed genomic disorders, caused by de novo structural alterations2. The dominant mutations responsible tended to alter gene dosage and occur in regions of genomic instability caused by segmental duplications and low copy repeats3. It was not long until some inherited genomic disorders and mendelian diseases were linked to inherited CNVs in linkage studies4-6. However, despite the evidence of inheritance, early studies of CNV in normal individuals still referred to common CNVs as recurring, as if these CNVs had independent origins7 and were affecting as yet undefined phenotypes. Though rare and de novo CNVs have well-known roles in disease, with many having strong odds ratios of 2–308-10, we now know from microarray-based genome-wide CNV studies in phenotypically normal populations that most of the CNV in any individual’s genome arises from a reservoir of polymorphisms that are common11, stably 1 inherited11, and mostly benign12. Most deletions and simple duplications are now routinely assessed in genome-wide association
Recommended publications
  • Chromosome 1 (Human Genome/Inkae) A
    Proc. Nati. Acad. Sci. USA Vol. 89, pp. 4598-4602, May 1992 Medical Sciences Integration of gene maps: Chromosome 1 (human genome/inkae) A. COLLINS*, B. J. KEATSt, N. DRACOPOLIt, D. C. SHIELDS*, AND N. E. MORTON* *CRC Research Group in Genetic Epidemiology, Department of Child Health, University of Southampton, Southampton, S09 4XY, United Kingdom; tDepartment of Biometry and Genetics, Louisiana State University Center, 1901 Perdido Street, New Orleans, LA 70112; and tCenter for Cancer Research, Massachusetts Institute of Technology, 40 Ames Street, Cambridge, MA 02139 Contributed by N. E. Morton, February 10, 1992 ABSTRACT A composite map of 177 locI has been con- standard lod tables extracted from the literature. Multiple structed in two steps. The first combined pairwise logarithm- pairwise analysis of these data was performed by the MAP90 of-odds scores on 127 loci Into a comprehensive genetic map. computer program (6), which can estimate an errorfrequency Then this map was projected onto the physical map through e (7) and a mapping parameter p such that map distance w is cytogenetic assignments, and the small amount ofphysical data a function of 0, e and p (8). It also includes a bootstrap to was interpolated for an additional 50 loci each of which had optimize order and a stepwise elimination of weakly sup- been assigned to an interval of less than 10 megabases. The ported loci to identify a conservative set of reliably ordered resulting composite map is on the physical scale with a reso- (framework) markers. The genetic map was combined with lution of 1.5 megabases.
    [Show full text]
  • Characterization of Genomic Copy Number Variation in Mus Musculus Associated with the Germline of Inbred and Wild Mouse Populations, Normal Development, and Cancer
    Western University Scholarship@Western Electronic Thesis and Dissertation Repository 4-18-2019 2:00 PM Characterization of genomic copy number variation in Mus musculus associated with the germline of inbred and wild mouse populations, normal development, and cancer Maja Milojevic The University of Western Ontario Supervisor Hill, Kathleen A. The University of Western Ontario Graduate Program in Biology A thesis submitted in partial fulfillment of the equirr ements for the degree in Doctor of Philosophy © Maja Milojevic 2019 Follow this and additional works at: https://ir.lib.uwo.ca/etd Part of the Genetics and Genomics Commons Recommended Citation Milojevic, Maja, "Characterization of genomic copy number variation in Mus musculus associated with the germline of inbred and wild mouse populations, normal development, and cancer" (2019). Electronic Thesis and Dissertation Repository. 6146. https://ir.lib.uwo.ca/etd/6146 This Dissertation/Thesis is brought to you for free and open access by Scholarship@Western. It has been accepted for inclusion in Electronic Thesis and Dissertation Repository by an authorized administrator of Scholarship@Western. For more information, please contact [email protected]. Abstract Mus musculus is a human commensal species and an important model of human development and disease with a need for approaches to determine the contribution of copy number variants (CNVs) to genetic variation in laboratory and wild mice, and arising with normal mouse development and disease. Here, the Mouse Diversity Genotyping array (MDGA)-approach to CNV detection is developed to characterize CNV differences between laboratory and wild mice, between multiple normal tissues of the same mouse, and between primary mammary gland tumours and metastatic lung tissue.
    [Show full text]
  • Renal Cell Neoplasms Contain Shared Tumor Type–Specific Copy Number Variations
    The American Journal of Pathology, Vol. 180, No. 6, June 2012 Copyright © 2012 American Society for Investigative Pathology. Published by Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.ajpath.2012.01.044 Tumorigenesis and Neoplastic Progression Renal Cell Neoplasms Contain Shared Tumor Type–Specific Copy Number Variations John M. Krill-Burger,* Maureen A. Lyons,*† The annual incidence of renal cell carcinoma (RCC) has Lori A. Kelly,*† Christin M. Sciulli,*† increased steadily in the United States for the past three Patricia Petrosko,*† Uma R. Chandran,†‡ decades, with approximately 58,000 new cases diag- 1,2 Michael D. Kubal,§ Sheldon I. Bastacky,*† nosed in 2010, representing 3% of all malignancies. Anil V. Parwani,*†‡ Rajiv Dhir,*†‡ and Treatment of RCC is complicated by the fact that it is not a single disease but composes multiple tumor types with William A. LaFramboise*†‡ different morphological characteristics, clinical courses, From the Departments of Pathology* and Biomedical and outcomes (ie, clear-cell carcinoma, 82% of RCC ‡ Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania; cases; type 1 or 2 papillary tumors, 11% of RCC cases; † the University of Pittsburgh Cancer Institute, Pittsburgh, chromophobe tumors, 5% of RCC cases; and collecting § Pennsylvania; and Life Technologies, Carlsbad, California duct carcinoma, approximately 1% of RCC cases).2,3 Benign renal neoplasms are subdivided into papillary adenoma, renal oncocytoma, and metanephric ade- Copy number variant (CNV) analysis was performed on noma.2,3 Treatment of RCC often involves surgical resec- renal cell carcinoma (RCC) specimens (chromophobe, tion of a large renal tissue component or removal of the clear cell, oncocytoma, papillary type 1, and papillary entire affected kidney because of the relatively large size of type 2) using high-resolution arrays (1.85 million renal tumors on discovery and the availability of a life-sus- probes).
    [Show full text]
  • Differential Proteomic Analysis of the Pancreas of Diabetic Db/Db Mice Reveals the Proteins Involved in the Development of Complications of Diabetes Mellitus
    Int. J. Mol. Sci. 2014, 15, 9579-9593; doi:10.3390/ijms15069579 OPEN ACCESS International Journal of Molecular Sciences ISSN 1422-0067 www.mdpi.com/journal/ijms Article Differential Proteomic Analysis of the Pancreas of Diabetic db/db Mice Reveals the Proteins Involved in the Development of Complications of Diabetes Mellitus Victoriano Pérez-Vázquez 1,*, Juan M. Guzmán-Flores 1, Daniela Mares-Álvarez 1, Magdalena Hernández-Ortiz 2, Maciste H. Macías-Cervantes 1, Joel Ramírez-Emiliano 1 and Sergio Encarnación-Guevara 2 1 Depto. de Ciencias Médicas, División de Ciencias de la Salud, Campus León, Universidad de Guanajuato, León, Guanajuato 37320, Mexico; E-Mails: [email protected] (J.M.G.-F.); [email protected] (D.M.-A.); [email protected] (M.H.M.-C.); [email protected] (J.R.-E.) 2 Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos 62210, Mexico; E-Mails: [email protected] (M.H.-O.); [email protected] (S.E.-G.) * Author to whom correspondence should be addressed; E-Mail: [email protected]; Tel.: +52-477-7143-812; Fax: +52-477-7167-623. Received: 4 April 2014; in revised form: 14 May 2014 / Accepted: 19 May 2014 / Published: 30 May 2014 Abstract: Type 2 diabetes mellitus is characterized by hyperglycemia and insulin-resistance. Diabetes results from pancreatic inability to secrete the insulin needed to overcome this resistance. We analyzed the protein profile from the pancreas of ten-week old diabetic db/db and wild type mice through proteomics. Pancreatic proteins were separated in two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) and significant changes in db/db mice respect to wild type mice were observed in 27 proteins.
    [Show full text]
  • Chuanxiong Rhizoma Compound on HIF-VEGF Pathway and Cerebral Ischemia-Reperfusion Injury’S Biological Network Based on Systematic Pharmacology
    ORIGINAL RESEARCH published: 25 June 2021 doi: 10.3389/fphar.2021.601846 Exploring the Regulatory Mechanism of Hedysarum Multijugum Maxim.-Chuanxiong Rhizoma Compound on HIF-VEGF Pathway and Cerebral Ischemia-Reperfusion Injury’s Biological Network Based on Systematic Pharmacology Kailin Yang 1†, Liuting Zeng 1†, Anqi Ge 2†, Yi Chen 1†, Shanshan Wang 1†, Xiaofei Zhu 1,3† and Jinwen Ge 1,4* Edited by: 1 Takashi Sato, Key Laboratory of Hunan Province for Integrated Traditional Chinese and Western Medicine on Prevention and Treatment of 2 Tokyo University of Pharmacy and Life Cardio-Cerebral Diseases, Hunan University of Chinese Medicine, Changsha, China, Galactophore Department, The First 3 Sciences, Japan Hospital of Hunan University of Chinese Medicine, Changsha, China, School of Graduate, Central South University, Changsha, China, 4Shaoyang University, Shaoyang, China Reviewed by: Hui Zhao, Capital Medical University, China Background: Clinical research found that Hedysarum Multijugum Maxim.-Chuanxiong Maria Luisa Del Moral, fi University of Jaén, Spain Rhizoma Compound (HCC) has de nite curative effect on cerebral ischemic diseases, *Correspondence: such as ischemic stroke and cerebral ischemia-reperfusion injury (CIR). However, its Jinwen Ge mechanism for treating cerebral ischemia is still not fully explained. [email protected] †These authors share first authorship Methods: The traditional Chinese medicine related database were utilized to obtain the components of HCC. The Pharmmapper were used to predict HCC’s potential targets. Specialty section: The CIR genes were obtained from Genecards and OMIM and the protein-protein This article was submitted to interaction (PPI) data of HCC’s targets and IS genes were obtained from String Ethnopharmacology, a section of the journal database.
    [Show full text]
  • Role of Amylase in Ovarian Cancer Mai Mohamed University of South Florida, [email protected]
    University of South Florida Scholar Commons Graduate Theses and Dissertations Graduate School July 2017 Role of Amylase in Ovarian Cancer Mai Mohamed University of South Florida, [email protected] Follow this and additional works at: http://scholarcommons.usf.edu/etd Part of the Pathology Commons Scholar Commons Citation Mohamed, Mai, "Role of Amylase in Ovarian Cancer" (2017). Graduate Theses and Dissertations. http://scholarcommons.usf.edu/etd/6907 This Dissertation is brought to you for free and open access by the Graduate School at Scholar Commons. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Scholar Commons. For more information, please contact [email protected]. Role of Amylase in Ovarian Cancer by Mai Mohamed A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy Department of Pathology and Cell Biology Morsani College of Medicine University of South Florida Major Professor: Patricia Kruk, Ph.D. Paula C. Bickford, Ph.D. Meera Nanjundan, Ph.D. Marzenna Wiranowska, Ph.D. Lauri Wright, Ph.D. Date of Approval: June 29, 2017 Keywords: ovarian cancer, amylase, computational analyses, glycocalyx, cellular invasion Copyright © 2017, Mai Mohamed Dedication This dissertation is dedicated to my parents, Ahmed and Fatma, who have always stressed the importance of education, and, throughout my education, have been my strongest source of encouragement and support. They always believed in me and I am eternally grateful to them. I would also like to thank my brothers, Mohamed and Hussien, and my sister, Mariam. I would also like to thank my husband, Ahmed.
    [Show full text]
  • Marker Identification of the Grade of Dysplasia of Intraductal Papillary
    cancers Article Marker Identification of the Grade of Dysplasia of Intraductal Papillary Mucinous Neoplasm in Pancreatic Cyst Fluid by Quantitative Proteomic Profiling 1, 2, 1 3 4 Misol Do y , Hongbeom Kim y, Dongyoon Shin , Joonho Park , Haeryoung Kim , Youngmin Han 2, Jin-Young Jang 2,* and Youngsoo Kim 1,3,* 1 Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul 03080, Korea; [email protected] (M.D.); [email protected] (D.S.) 2 Department of Surgery, Seoul National University College of Medicine, Seoul 03080, Korea; [email protected] (H.K.); [email protected] (Y.H.) 3 Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul 03080, Korea; [email protected] 4 Department of Pathology, Seoul National University College of Medicine, Seoul 03080, Korea; [email protected] * Correspondence: [email protected] (J.-Y.J.); [email protected] (Y.K.); Tel.: +82-10-8338-6719 (J.-Y.J.); +82-2-740-8073 (Y.K.) The first two authors contributed equally to this work. y Received: 10 August 2020; Accepted: 20 August 2020; Published: 23 August 2020 Abstract: The incidence of patients with pancreatic cystic lesions, particularly intraductal papillary mucinous neoplasm (IPMN), is increasing. Current guidelines, which primarily consider radiological features and laboratory data, have had limited success in predicting malignant IPMN. The lack of a definitive diagnostic method has led to low-risk IPMN patients undergoing unnecessary surgeries. To address this issue, we discovered IPMN marker candidates by analyzing pancreatic cystic fluid by mass spectrometry. A total of 30 cyst fluid samples, comprising IPMN dysplasia and other cystic lesions, were evaluated.
    [Show full text]
  • Improved Detection of Gene Fusions by Applying Statistical Methods Reveals New Oncogenic RNA Cancer Drivers
    bioRxiv preprint doi: https://doi.org/10.1101/659078; this version posted June 3, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Improved detection of gene fusions by applying statistical methods reveals new oncogenic RNA cancer drivers Roozbeh Dehghannasiri1, Donald Eric Freeman1,2, Milos Jordanski3, Gillian L. Hsieh1, Ana Damljanovic4, Erik Lehnert4, Julia Salzman1,2,5* Author affiliation 1Department of Biochemistry, Stanford University, Stanford, CA 94305 2Department of Biomedical Data Science, Stanford University, Stanford, CA 94305 3Department of Computer Science, University of Belgrade, Belgrade, Serbia 4Seven Bridges Genomics, Cambridge, MA 02142 5Stanford Cancer Institute, Stanford, CA 94305 *Corresponding author [email protected] Short Abstract: The extent to which gene fusions function as drivers of cancer remains a critical open question. Current algorithms do not sufficiently identify false-positive fusions arising during library preparation, sequencing, and alignment. Here, we introduce a new algorithm, DEEPEST, that uses statistical modeling to minimize false-positives while increasing the sensitivity of fusion detection. In 9,946 tumor RNA-sequencing datasets from The Cancer Genome Atlas (TCGA) across 33 tumor types, DEEPEST identifies 31,007 fusions, 30% more than identified by other methods, while calling ten-fold fewer false-positive fusions in non-transformed human tissues. We leverage the increased precision of DEEPEST to discover new cancer biology. For example, 888 new candidate oncogenes are identified based on over-representation in DEEPEST-Fusion calls, and 1,078 previously unreported fusions involving long intergenic noncoding RNAs partners, demonstrating a previously unappreciated prevalence and potential for function.
    [Show full text]
  • Research Article Sex Difference of Ribosome in Stroke-Induced Peripheral Immunosuppression by Integrated Bioinformatics Analysis
    Hindawi BioMed Research International Volume 2020, Article ID 3650935, 15 pages https://doi.org/10.1155/2020/3650935 Research Article Sex Difference of Ribosome in Stroke-Induced Peripheral Immunosuppression by Integrated Bioinformatics Analysis Jian-Qin Xie ,1,2,3 Ya-Peng Lu ,1,3 Hong-Li Sun ,1,3 Li-Na Gao ,2,3 Pei-Pei Song ,2,3 Zhi-Jun Feng ,3 and Chong-Ge You 2,3 1Department of Anesthesiology, Lanzhou University Second Hospital, Lanzhou, Gansu 730030, China 2Laboratory Medicine Center, Lanzhou University Second Hospital, Lanzhou, Gansu 730030, China 3The Second Clinical Medical College of Lanzhou University, Lanzhou, Gansu 730030, China Correspondence should be addressed to Chong-Ge You; [email protected] Received 13 April 2020; Revised 8 October 2020; Accepted 18 November 2020; Published 3 December 2020 Academic Editor: Rudolf K. Braun Copyright © 2020 Jian-Qin Xie et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Ischemic stroke (IS) greatly threatens human health resulting in high mortality and substantial loss of function. Recent studies have shown that the outcome of IS has sex specific, but its mechanism is still unclear. This study is aimed at identifying the sexually dimorphic to peripheral immune response in IS progression, predicting potential prognostic biomarkers that can lead to sex- specific outcome, and revealing potential treatment targets. Gene expression dataset GSE37587, including 68 peripheral whole blood samples which were collected within 24 hours from known onset of symptom and again at 24-48 hours after onset (20 women and 14 men), was downloaded from the Gene Expression Omnibus (GEO) datasets.
    [Show full text]
  • Supplementary Note 1 - Copy Number Estimation with Fastcn
    1 Supplementary Note 1 - Copy number estimation with fastCN 1.1 Introduction Multiple approaches that utilize read depth to identify regions of copy number variation have been developed. One successful set of approaches utilize the mrFAST and mrsFAST aligners, tools which efficiently return all matching locations for short sequencing reads within a specified edit distance. These tools have been used to analyze CNV patterns in multiple studies of humans and non-human primates [1–4]. However, this estimation required separate steps including mapping, BAM file sorting based on location, and read pileup followed by GC corrections, requiring the storage and manipulation of several large files. Since the total time for disk I/O and the use of multiple intermediate files is a serious bottleneck for large scale analyses, we developed fastCN to efficiently estimate genome copy number from short read data. This program utilizes the data output from the short read mapper mrsFAST [5], and reports per-bp read depth in an efficient compressed binary format. The fastCN software package is available on GitHub (https://github.com/KiddLab/fastCN). 1.2 Implementation and optimization The fastCN core pipeline consists of two major applications. The first, GC_control_gen, generates a control region file for the next stage of the pipeline based on (1) the reference genome and user supplied files indicating (2) regions of the genome assumed to not be copy number variable and (3) regions of the genome which have been masked prior to read mapping. To avoid excessive depth pile ups due to repetitive regions, we utilize a version of the genome reference where all elements defined by RepeatMasker, elements defined by tandem repeat finder (TRF), and 50-mers with at least 20 genome matches within an edit distance of two are masked to ‘N’ prior to short read mapping.
    [Show full text]
  • The Wayward Dog: Is the Australian Native Dog Or Dingo a Distinct Species?
    Zootaxa 4317 (2): 201–224 ISSN 1175-5326 (print edition) http://www.mapress.com/j/zt/ Article ZOOTAXA Copyright © 2017 Magnolia Press ISSN 1175-5334 (online edition) https://doi.org/10.11646/zootaxa.4317.2.1 http://zoobank.org/urn:lsid:zoobank.org:pub:3CD420BC-2AED-4166-85F9-CCA0E4403271 The Wayward Dog: Is the Australian native dog or Dingo a distinct species? STEPHEN M. JACKSON1,2,3,9, COLIN P. GROVES4, PETER J.S. FLEMING5,6, KEN P. APLIN3, MARK D.B. ELDRIDGE7, ANTONIO GONZALEZ4 & KRISTOFER M. HELGEN8 1Animal Biosecurity & Food Safety, NSW Department of Primary Industries, Orange, New South Wales 2800, Australia. 2School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, NSW 2052. 3Division of Mammals, National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012, USA. E-mail: [email protected] 4School of Archaeology & Anthropology, Australian National University, Canberra, ACT 0200, Australia. E: [email protected]; [email protected] 5Vertebrate Pest Research Unit, Biosecurity NSW, NSW Department of Primary Industries, Orange, New South Wales 2800, Australia. E-mail: [email protected] 6 School of Environmental & Rural Science, University of New England, Armidale, NSW 2351, Australia. 7Australian Museum Research Institute, Australian Museum, 1 William St. Sydney, NSW 2010, Australia. E-mail: [email protected] 8School of Biological Sciences, Environment Institute, and ARC (Australian Research Council) Centre for Australian Biodiversity and Heritage, University of Adelaide, Adelaide, SA 5005, Australia. E-mail: [email protected] 9Corresponding author. E-mail: [email protected] Abstract The taxonomic identity and status of the Australian Dingo has been unsettled and controversial since its initial description in 1792.
    [Show full text]
  • The Genome-Wide Landscape of Copy Number Variations in the MUSGEN Study Provides Evidence for a Founder Effect in the Isolated Finnish Population
    European Journal of Human Genetics (2013) 21, 1411–1416 & 2013 Macmillan Publishers Limited All rights reserved 1018-4813/13 www.nature.com/ejhg ARTICLE The genome-wide landscape of copy number variations in the MUSGEN study provides evidence for a founder effect in the isolated Finnish population Chakravarthi Kanduri1, Liisa Ukkola-Vuoti1, Jaana Oikkonen1, Gemma Buck2, Christine Blancher2, Pirre Raijas3, Kai Karma4, Harri La¨hdesma¨ki5 and Irma Ja¨rvela¨*,1 Here we characterized the genome-wide architecture of copy number variations (CNVs) in 286 healthy, unrelated Finnish individuals belonging to the MUSGEN study, where molecular background underlying musical aptitude and related traits are studied. By using Illumina HumanOmniExpress-12v.1.0 beadchip, we identified 5493 CNVs that were spread across 467 different cytogenetic regions, spanning a total size of 287.83 Mb (B9.6% of the human genome). Merging the overlapping CNVs across samples resulted in 999 discrete copy number variable regions (CNVRs), of which B6.9% were putatively novel. The average number of CNVs per person was 20, whereas the average size of CNV per locus was 52.39 kb. Large CNVs (41 Mb) were present in 4% of the samples. The proportion of homozygous deletions in this data set (B12.4%) seemed to be higher when compared with three other populations. Interestingly, several CNVRs were significantly enriched in this sample set, whereas several others were totally depleted. For example, a CNVR at chr2p22.1 intersecting GALM was more common in this population (P ¼ 3.3706 Â 10 À44) than in African and other European populations. The enriched CNVRs, however, showed no significant association with music-related phenotypes.
    [Show full text]