A Spatially Aware Likelihood Test to Detect Sweeps from Haplotype Distributions
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
PARSANA-DISSERTATION-2020.Pdf
DECIPHERING TRANSCRIPTIONAL PATTERNS OF GENE REGULATION: A COMPUTATIONAL APPROACH by Princy Parsana A dissertation submitted to The Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy Baltimore, Maryland July, 2020 © 2020 Princy Parsana All rights reserved Abstract With rapid advancements in sequencing technology, we now have the ability to sequence the entire human genome, and to quantify expression of tens of thousands of genes from hundreds of individuals. This provides an extraordinary opportunity to learn phenotype relevant genomic patterns that can improve our understanding of molecular and cellular processes underlying a trait. The high dimensional nature of genomic data presents a range of computational and statistical challenges. This dissertation presents a compilation of projects that were driven by the motivation to efficiently capture gene regulatory patterns in the human transcriptome, while addressing statistical and computational challenges that accompany this data. We attempt to address two major difficulties in this domain: a) artifacts and noise in transcriptomic data, andb) limited statistical power. First, we present our work on investigating the effect of artifactual variation in gene expression data and its impact on trans-eQTL discovery. Here we performed an in-depth analysis of diverse pre-recorded covariates and latent confounders to understand their contribution to heterogeneity in gene expression measurements. Next, we discovered 673 trans-eQTLs across 16 human tissues using v6 data from the Genotype Tissue Expression (GTEx) project. Finally, we characterized two trait-associated trans-eQTLs; one in Skeletal Muscle and another in Thyroid. Second, we present a principal component based residualization method to correct gene expression measurements prior to reconstruction of co-expression networks. -
Approximating Selective Sweeps
Approximating Selective Sweeps by Richard Durrett and Jason Schweinsberg Dept. of Math, Cornell U. Corresponding Author: Richard Durrett Dept. of Mathematics 523 Malott Hall Cornell University Ithaca NY 14853 Phone: 607-255-8282 FAX: 607-255-7149 email: [email protected] 1 ABSTRACT The fixation of advantageous mutations in a population has the effect of reducing varia- tion in the DNA sequence near that mutation. Kaplan, Hudson, and Langley (1989) used a three-phase simulation model to study the effect of selective sweeps on genealogies. However, most subsequent work has simplified their approach by assuming that the number of individ- uals with the advantageous allele follows the logistic differential equation. We show that the impact of a selective sweep can be accurately approximated by a random partition created by a stick-breaking process. Our simulation results show that ignoring the randomness when the number of individuals with the advantageous allele is small can lead to substantial errors. Key words: selective sweep, hitchhiking, coalescent, random partition, paintbox construction 2 When a selectively favorable mutation occurs in a population and is subsequently fixed (i.e., its frequency rises to 100%), the frequencies of alleles at closely linked loci are altered. Alleles present on the chromosome on which the original mutation occurred will tend to increase in frequency, and other alleles will decrease in frequency. Maynard Smith and Haigh (1974) referred to this as the ‘hitchhiking effect,’ because an allele can get a lift in frequency from selection acting on a neighboring allele. They considered a situation with a neutral locus with alleles A and a and a second locus where allele B has a fitness of 1 + s relative to b. -
A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus
Page 1 of 781 Diabetes A Computational Approach for Defining a Signature of β-Cell Golgi Stress in Diabetes Mellitus Robert N. Bone1,6,7, Olufunmilola Oyebamiji2, Sayali Talware2, Sharmila Selvaraj2, Preethi Krishnan3,6, Farooq Syed1,6,7, Huanmei Wu2, Carmella Evans-Molina 1,3,4,5,6,7,8* Departments of 1Pediatrics, 3Medicine, 4Anatomy, Cell Biology & Physiology, 5Biochemistry & Molecular Biology, the 6Center for Diabetes & Metabolic Diseases, and the 7Herman B. Wells Center for Pediatric Research, Indiana University School of Medicine, Indianapolis, IN 46202; 2Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202; 8Roudebush VA Medical Center, Indianapolis, IN 46202. *Corresponding Author(s): Carmella Evans-Molina, MD, PhD ([email protected]) Indiana University School of Medicine, 635 Barnhill Drive, MS 2031A, Indianapolis, IN 46202, Telephone: (317) 274-4145, Fax (317) 274-4107 Running Title: Golgi Stress Response in Diabetes Word Count: 4358 Number of Figures: 6 Keywords: Golgi apparatus stress, Islets, β cell, Type 1 diabetes, Type 2 diabetes 1 Diabetes Publish Ahead of Print, published online August 20, 2020 Diabetes Page 2 of 781 ABSTRACT The Golgi apparatus (GA) is an important site of insulin processing and granule maturation, but whether GA organelle dysfunction and GA stress are present in the diabetic β-cell has not been tested. We utilized an informatics-based approach to develop a transcriptional signature of β-cell GA stress using existing RNA sequencing and microarray datasets generated using human islets from donors with diabetes and islets where type 1(T1D) and type 2 diabetes (T2D) had been modeled ex vivo. To narrow our results to GA-specific genes, we applied a filter set of 1,030 genes accepted as GA associated. -
A New Approach for Using Genome Scans to Detect Recent Positive Selection in the Human Genome
PLoS BIOLOGY A New Approach for Using Genome Scans to Detect Recent Positive Selection in the Human Genome Kun Tang1*, Kevin R. Thornton2, Mark Stoneking1 1 Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, 2 Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America Genome-wide scanning for signals of recent positive selection is essential for a comprehensive and systematic understanding of human adaptation. Here, we present a genomic survey of recent local selective sweeps, especially aimed at those nearly or recently completed. A novel approach was developed for such signals, based on contrasting the extended haplotype homozygosity (EHH) profiles between populations. We applied this method to the genome single nucleotide polymorphism (SNP) data of both the International HapMap Project and Perlegen Sciences, and detected widespread signals of recent local selection across the genome, consisting of both complete and partial sweeps. A challenging problem of genomic scans of recent positive selection is to clearly distinguish selection from neutral effects, given the high sensitivity of the test statistics to departures from neutral demographic assumptions and the lack of a single, accurate neutral model of human history. We therefore developed a new procedure that is robust across a wide range of demographic and ascertainment models, one that indicates that certain portions of the genome clearly depart from neutrality. Simulations of positive selection showed that our tests have high power towards strong selection sweeps that have undergone fixation. Gene ontology analysis of the candidate regions revealed several new functional groups that might help explain some important interpopulation differences in phenotypic traits. -
Refining the Use of Linkage Disequilibrium As A
HIGHLIGHTED ARTICLE | INVESTIGATION Refining the Use of Linkage Disequilibrium as a Robust Signature of Selective Sweeps Guy S. Jacobs,*,†,1 Timothy J. Sluckin,* and Toomas Kivisild‡ *Mathematical Sciences, University of Southampton, Southampton SO17 1BJ, United Kingdom, †Complexity Institute, Nanyang Technological University, Singapore 637723, and ‡Department of Biological Anthropology, University of Cambridge, Cambridge CB2 1QH, United Kingdom ORCID IDs: 0000-0002-4698-7758 (G.S.J.); 0000-0002-9163-0061 (T.J.S.); 0000-0002-6297-7808 (T.K.) ABSTRACT During a selective sweep, characteristic patterns of linkage disequilibrium can arise in the genomic region surrounding a selected locus. These have been used to infer past selective sweeps. However, the recombination rate is known to vary substantially along the genome for many species. We here investigate the effectiveness of current (Kelly’s ZnS and vmax) and novel statistics at inferring hard selective sweeps based on linkage disequilibrium distortions under different conditions, including a human-realistic demographic model and recombination rate variation. When the recombination rate is constant, Kelly’s ZnS offers high power, but is outperformed by a novel statistic that we test, which we call Za: We also find this statistic to be effective at detecting sweeps from standing variation. When recombination rate fluctuations are included, there is a considerable reduction in power for all linkage disequilibrium-based statistics. However, this can largely be reversed by appropriately controlling for expected linkage disequilibrium using a genetic map. To further test these different methods, we perform selection scans on well-characterized HapMap data, finding that all three statistics—vmax; Kelly’s ZnS; and Za—are able to replicate signals at regions previously identified as selection candidates based on population differentiation or the site frequency spectrum. -
Alternative Haplotypes of Antigen Processing Genes in Zebrafish Diverged Early in Vertebrate Evolution
Alternative haplotypes of antigen processing genes in PNAS PLUS zebrafish diverged early in vertebrate evolution Sean C. McConnella,1, Kyle M. Hernandezb, Dustin J. Wciselc,d, Ross N. Kettleboroughe, Derek L. Stemplee, Jeffrey A. Yoderc,d,f, Jorge Andradeb, and Jill L. O. de Jonga,1 aSection of Hematology-Oncology and Stem Cell Transplant, Department of Pediatrics, The University of Chicago, Chicago, IL 60637; bCenter for Research Informatics, The University of Chicago, Chicago, IL 60637; cDepartment of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, NC 27607; dGenomic Sciences Graduate Program, North Carolina State University, Raleigh, NC 27607; eVertebrate Development and Genetics, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, United Kingdom; and fComparative Medicine Institute, North Carolina State University, Raleigh, NC 27607 Edited by Peter Parham, Stanford University School of Medicine, Stanford, CA, and accepted by Editorial Board Member Peter Cresswell June 23, 2016 (received for review May 16, 2016) Antigen processing and presentation genes found within the Recent genomic studies have offered considerable insights into MHC are among the most highly polymorphic genes of vertebrate the evolution of the vertebrate adaptive immune system by com- genomes, providing populations with diverse immune responses to a paring phylogenetically divergent species (11–14). Throughout wide array of pathogens. Here, we describe transcriptome, exome, vertebrates, gene linkage within the MHC region is highly con- and whole-genome sequencing of clonal zebrafish, uncovering the served. For example, MHCI and antigen processing genes remain most extensive diversity within the antigen processing and presen- tightly linked in sharks, members of the oldest vertebrate lineage tation genes of any species yet examined. -
Identification of Selective Sweeps, Major Genes, and Genotype by Diet Interactions Melanie D
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Theses and Dissertations in Animal Science Animal Science Department 12-2015 Genomic Analysis of Sow Reproductive Traits: Identification of Selective Sweeps, Major Genes, and Genotype by Diet Interactions Melanie D. Trenhaile University of Nebraska-Lincoln, [email protected] Follow this and additional works at: http://digitalcommons.unl.edu/animalscidiss Part of the Meat Science Commons Trenhaile, Melanie D., "Genomic Analysis of Sow Reproductive Traits: Identification of Selective Sweeps, Major Genes, and Genotype by Diet Interactions" (2015). Theses and Dissertations in Animal Science. 114. http://digitalcommons.unl.edu/animalscidiss/114 This Article is brought to you for free and open access by the Animal Science Department at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in Theses and Dissertations in Animal Science by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln. GENOMIC ANALYSIS OF SOW REPRODUCTIVE TRAITS: IDENTIFICATION OF SELECTIVE SWEEPS, MAJOR GENES, AND GENOTYPE BY DIET INTERACTIONS By Melanie Dawn Trenhaile A THESIS Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Master of Science Major: Animal Science Under the Supervision of Professor Daniel Ciobanu Lincoln, Nebraska December, 2015 GENOMIC ANALYSIS OF SOW REPRODUCTIVE TRAITS: IDENTIFICATION OF SELECTIVE SWEEPS, MAJOR GENES, AND GENOTYPE BY DIET INTERACTIONS Melanie D. Trenhaile, M.S. University of Nebraska, 2015 Advisor: Daniel Ciobanu Reproductive traits, such as litter size and reproductive longevity, are economically important. However, selection for these traits is difficult due to low heritability, polygenic nature, sex-limited expression, and expression late in life. -
HSD17B8 (NM 014234) Human Tagged ORF Clone – RG203806
OriGene Technologies, Inc. 9620 Medical Center Drive, Ste 200 Rockville, MD 20850, US Phone: +1-888-267-4436 [email protected] EU: [email protected] CN: [email protected] Product datasheet for RG203806 HSD17B8 (NM_014234) Human Tagged ORF Clone Product data: Product Type: Expression Plasmids Product Name: HSD17B8 (NM_014234) Human Tagged ORF Clone Tag: TurboGFP Symbol: HSD17B8 Synonyms: D6S2245E; dJ1033B10.9; FABG; FABGL; H2-KE6; HKE6; KE6; RING2; SDR30C1 Vector: pCMV6-AC-GFP (PS100010) E. coli Selection: Ampicillin (100 ug/mL) Cell Selection: Neomycin ORF Nucleotide >RG203806 representing NM_014234 Sequence: Red=Cloning site Blue=ORF Green=Tags(s) TTTTGTAATACGACTCACTATAGGGCGGCCGGGAATTCGTCGACTGGATCCGGTACCGAGGAGATCTGCC GCCGCGATCGCC ATGGCGTCTCAGCTCCAGAACCGACTCCGCTCCGCACTGGCCTTGGTCACAGGTGCGGGGAGCGGCATCG GCCGAGCGGTCAGTGTACGCCTGGCCGGAGAGGGGGCCACCGTAGCTGCCTGCGACCTGGACCGGGCAGC GGCACAGGAGACGGTGCGGCTGCTGGGCGGGCCAGGGAGCAAGGAGGGGCCGCCCCGAGGGAACCATGCT GCCTTCCAGGCTGACGTGTCTGAGGCCAGGGCCGCCAGGTGCCTGCTGGAACAAGTGCAGGCCTGCTTTT CTCGCCCACCATCTGTCGTTGTGTCCTGTGCGGGCATCACCCAGGATGAGTTTCTGCTGCACATGTCTGA GGATGACTGGGACAAAGTCATAGCTGTCAACCTCAAGGGCACCTTCCTAGTCACTCAGGCTGCAGCACAA GCCCTGGTGTCCAATGGTTGTCGTGGTTCCATCATCAACATCAGTAGCATCGTAGGAAAGGTGGGGAACG TGGGGCAGACAAACTATGCAGCATCCAAGGCTGGAGTGATTGGGCTGACCCAGACCGCAGCCCGGGAGCT TGGACGACATGGGATCCGCTGTAACTCTGTCCTCCCAGGGTTCATTGCAACACCCATGACACAGAAAGTG CCACAGAAAGTGGTGGACAAGATTACTGAAATGATCCCGATGGGACACTTGGGGGACCCTGAGGATGTGG CAGATGTGGTCGCATTCTTGGCATCTGAAGATAGTGGATACATCACAGGGACCTCAGTGGAAGTCACTGG AGGTCTTTTCATG ACGCGTACGCGGCCGCTCGAG -
Epigenetic Reprogramming Underlies Efficacy of DNA Demethylation
www.nature.com/scientificreports OPEN Epigenetic reprogramming underlies efcacy of DNA demethylation therapy in osteosarcomas Naofumi Asano 1,2, Hideyuki Takeshima3, Satoshi Yamashita3, Hironori Takamatsu2,3, Naoko Hattori3, Takashi Kubo4, Akihiko Yoshida5, Eisuke Kobayashi6, Robert Nakayama2, Morio Matsumoto2, Masaya Nakamura2, Hitoshi Ichikawa 4, Akira Kawai6, Tadashi Kondo1 & Toshikazu Ushijima 3* Osteosarcoma (OS) patients with metastasis or recurrent tumors still sufer from poor prognosis. Studies have indicated the efcacy of DNA demethylation therapy for OS, but the underlying mechanism is still unclear. Here, we aimed to clarify the mechanism of how epigenetic therapy has therapeutic efcacy in OS. Treatment of four OS cell lines with a DNA demethylating agent, 5-aza-2′- deoxycytidine (5-aza-dC) treatment, markedly suppressed their growth, and in vivo efcacy was further confrmed using two OS xenografts. Genome-wide DNA methylation analysis showed that 10 of 28 primary OS had large numbers of methylated CpG islands while the remaining 18 OS did not, clustering together with normal tissue samples and Ewing sarcoma samples. Among the genes aberrantly methylated in primary OS, genes involved in skeletal system morphogenesis were present. Searching for methylation-silenced genes by expression microarray screening of two OS cell lines after 5-aza-dC treatment revealed that multiple tumor-suppressor and osteo/chondrogenesis-related genes were re-activated by 5-aza-dC treatment of OS cells. Simultaneous activation of multiple genes related to osteogenesis and cell proliferation, namely epigenetic reprogramming, was considered to underlie the efcacy of DNA demethylation therapy in OS. Osteosarcoma (OS) is the most common malignant tumor of the bone in children and adolescents1. -
Rapid Evolution of a Skin-Lightening Allele in Southern African Khoesan
Rapid evolution of a skin-lightening allele in southern African KhoeSan Meng Lina,1,2, Rebecca L. Siforda,b,3, Alicia R. Martinc,d,e,3, Shigeki Nakagomef, Marlo Möllerg, Eileen G. Hoalg, Carlos D. Bustamanteh, Christopher R. Gignouxh,i,j, and Brenna M. Henna,2,4 aDepartment of Ecology and Evolution, State University of New York at Stony Brook, Stony Brook, NY 11794; bThe School of Human Evolution and Social Change, Arizona State University, Tempe, AZ 85287; cAnalytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114; dProgram in Medical and Population Genetics, Broad Institute, Cambridge, MA 02141; eStanley Center for Psychiatric Research, Broad Institute, Cambridge, MA 02141; fSchool of Medicine, Trinity College Dublin, Dublin 2, Ireland; gDST-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 8000, South Africa; hDepartment of Genetics, Stanford University, Stanford, CA 94305; iColorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045; and jDepartment of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045 Edited by Nina G. Jablonski, The Pennsylvania State University, University Park, PA, and accepted by Editorial Board Member C. O. Lovejoy October 18, 2018 (received for review February 2, 2018) Skin pigmentation is under strong directional selection in northern Amhara or another group who themselves are substantially European and Asian populations. The indigenous KhoeSan popula- admixed with Near Eastern people has been proposed (13, 14). -
Identifying the Favored Mutation in a Positive Selective Sweep
HHS Public Access Author manuscript Author ManuscriptAuthor Manuscript Author Nat Methods Manuscript Author . Author manuscript; Manuscript Author available in PMC 2018 November 12. Published in final edited form as: Nat Methods. 2018 April ; 15(4): 279–282. doi:10.1038/nmeth.4606. Identifying the Favored Mutation in a Positive Selective Sweep Ali Akbari1, Joseph J. Vitti2,3, Arya Iranmehr1, Mehrdad Bakhtiari4, Pardis C. Sabeti2,3, Siavash Mirarab1, and Vineet Bafna4 1Department of Electrical & Computer Engineering, University of California San Diego, La Jolla, California, USA 2Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA 3Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA 4Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA Abstract Methods to identify signatures of selective sweeps in population genomics data have been actively developed, but mostly do not identify the specific mutation favored by selection. We present a method, iSAFE, that uses a statistic derived solely from population genetics signals to accurately pinpoint the favored mutation in a large region (~5 Mbp). iSAFE does not require any knowledge of demography, specific phenotype under selection, or functional annotations of mutations. Human genetic data have revealed a multitude of genomic regions believed to be evolving under positive selection. Methods for detecting regions under selection from genetic variations exploit a variety of genomic signatures. Allele frequency based methods analyze the distortion in site frequency spectrums; Linkage Disequilibrium (LD) based methods use extended homozygosity in haplotypes; other methods use differences in allele frequency between populations; and finally, composite methods combine multiple test scores to improve the resolution1–3. -
Open Data for Differential Network Analysis in Glioma
International Journal of Molecular Sciences Article Open Data for Differential Network Analysis in Glioma , Claire Jean-Quartier * y , Fleur Jeanquartier y and Andreas Holzinger Holzinger Group HCI-KDD, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Auenbruggerplatz 2/V, 8036 Graz, Austria; [email protected] (F.J.); [email protected] (A.H.) * Correspondence: [email protected] These authors contributed equally to this work. y Received: 27 October 2019; Accepted: 3 January 2020; Published: 15 January 2020 Abstract: The complexity of cancer diseases demands bioinformatic techniques and translational research based on big data and personalized medicine. Open data enables researchers to accelerate cancer studies, save resources and foster collaboration. Several tools and programming approaches are available for analyzing data, including annotation, clustering, comparison and extrapolation, merging, enrichment, functional association and statistics. We exploit openly available data via cancer gene expression analysis, we apply refinement as well as enrichment analysis via gene ontology and conclude with graph-based visualization of involved protein interaction networks as a basis for signaling. The different databases allowed for the construction of huge networks or specified ones consisting of high-confidence interactions only. Several genes associated to glioma were isolated via a network analysis from top hub nodes as well as from an outlier analysis. The latter approach highlights a mitogen-activated protein kinase next to a member of histondeacetylases and a protein phosphatase as genes uncommonly associated with glioma. Cluster analysis from top hub nodes lists several identified glioma-associated gene products to function within protein complexes, including epidermal growth factors as well as cell cycle proteins or RAS proto-oncogenes.