Accurate Methods for Ancestry and Relatedness Inference

ACCURATE METHODS FOR ANCESTRY AND RELATEDNESS INFERENCE

A DISSERTATION SUBMITTED TO THE PROGRAM IN BIOMEDICAL INFORMATICS AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Jesse M. Rodriguez
December 2013

© 2013 by Jesse M. Rodriguez. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/cn371vd3410

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Serafim Batzoglou, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Russ Altman

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Carlos Bustamante

Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Abstract

The predisposition to many diseases is strongly influenced by the genome of an individual. However, the association between the genome and most diseases is not fully understood, so there is an ongoing effort to characterize these associations. One way to characterize disease-genome associations is by studying the familial and ancestral origin of individuals in the context of disease. This kind of study relies on the fact that individuals with shared origins tend to have genomes and phenotypes that are similar to one another. Detailed information regarding familial and ancestral origin is often unknown; however, it can be inferred computationally by examining the genome. Therefore, it is important that we have accurate methods to infer this information in order to facilitate disease-genome associations. In this dissertation, I describe the contributions I have made to accurately inferring the ancestry and relatedness of individuals based on their genomes. First, I describe my work on Alloy, a method to infer the ancestral origin of segments of the genome based on a factorial HMM. Next, I present Parente, a method to infer which individuals in a group are related to one another by detecting genomic segments that are identical-by-descent (IBD) using an embedded likelihood ratio test. Finally, I present Parente2, an extension of Parente that incorporates linkage disequilibrium information and results in significantly higher accuracy.

Acknowledgements

I owe a great deal of thanks to many people who have supported me through the years of my PhD. To Pavel, Nitin, and Josef, for giving me a place to start on this journey. To Tiffany, Dan, Sarah, George, Sam, Tom, Marc, Andreas, Noah, Alex, David, Eugene, Marina, and Karen for your instruction, advice, discussions, friendship, and memories. To my BMI classmates and members of the Batzoglou lab for your support, all of the fun, and for being great colleagues. To Mary Jeanne, Nancy, and Alex Sandra, for your help and guidance. To David Paik, Mark Musen, and Larry Fagan, for your mentorship and advice. To Russ, David, and Carlos, for being on my committee. To Arend, Cheryl, Noah, Lin, and Roy, for teaching me so much and being so great to work with. To Serafim, for being relentlessly supportive and giving me the chance to have fun with my PhD. To Sivan, for your friendship, bountiful ideas, and hard work. And to Kelly, Audrey, and my family, for your love, patience, and encouragement.

Contents

Abstract
Acknowledgements
1 Introduction
2 Biology, terminology, and technology
  2.1 Biology and terminology
    2.1.1 Technology
3 Relatedness and ancestry
  3.1 Family genetics
    3.1.1 IBD
  3.2 Ancestry
4 Alloy
  4.1 Introduction
    4.1.1 Previous work
    4.1.2 Overview of methods and results
  4.2 Methods
    4.2.1 Factorial hidden Markov model
    4.2.2 Inference with the forward-backward algorithm
    4.2.3 Transition probabilities
    4.2.4 Linkage disequilibrium model
  4.3 Results
    4.3.1 Simulation of admixed individuals and training the background LD models
    4.3.2 Evaluating Alloy's accuracy under complex and ancient admixtures
    4.3.3 Exploring background LD models
    4.3.4 Measuring robustness to inaccuracies in model parameters
    4.3.5 Evaluating model accuracy under varying amounts of training data
  4.4 Discussion
    4.4.1 Robustness to different admixtures models
    4.4.2 Learning admixture parameters
    4.4.3 Time complexity reduction
    4.4.4 Incorporating additional variation
    4.4.5 Conclusion
5 Parente
  5.1 Introduction
  5.2 Methods
    5.2.1 Likelihood ratio test
    5.2.2 Embedded likelihood ratio test
    5.2.3 Genotyping-error function
    5.2.4 Likelihood-ratio test threshold
  5.3 Results
    5.3.1 Constructing training and testing datasets
    5.3.2 Simulations to evaluate performance
    5.3.3 Parente's accuracy and comparison to fastIBD
    5.3.4 Training Parente's model and thresholds
    5.3.5 Embedded LRT and local thresholds
    5.3.6 Accuracy performance characteristics
  5.4 Discussion
6 Parente2
  6.1 Introduction
  6.2 Methods
    6.2.1 Inner Log-Likelihood Ratio
    6.2.2 Outer Log-Likelihood Ratio
    6.2.3 Genotyping-error function
    6.2.4 Window and window set definitions
    6.2.5 Window filter
    6.2.6 Decreasing running time with the SpeeDB filter
    6.2.7 Facilitating larger window sizes
  6.3 Results
    6.3.1 Data sets
    6.3.2 Simulations
    6.3.3 Experimental parameters
    6.3.4 Accuracy of Parente2 compared to other methods
    6.3.5 Augmented window definitions and window filter yields better performance
    6.3.6 ELR yields better performance
    6.3.7 Increasing window size increases performance
    6.3.8 Parente2 is robust to window set size
  6.4 Discussion
    6.4.1 Speed and accuracy tradeoff
    6.4.2 Amount of training data required
    6.4.3 When no training data is available
    6.4.4 Recommended settings for Parente2
    6.4.5 Applicability to DNA sequencing studies
7 Conclusions and Future Directions
  7.1 Potential impact
  7.2 Future directions
Bibliography

List of Tables

3.1 Expected number and size of IBD segments based on relationship.
5.1 Normality of window scores as a function of window size.
6.1 Example of the tiling method used to break up latent IBD. In this example, 6 source individuals are used to create 3 composite individuals, each having 9 genomic segments (e.g., assuming a chromosome of length 1.8 cM with a segment size of 0.2 cM). Each entry in the table contains the index of the source individual used for the jth genomic segment of the ith composite individual.
6.2 Pairwise accuracy of Parente2 and other methods. Table 6.2A shows sensitivity at lower false positive rates and Table 6.2B shows sensitivity at higher false positive rates. fastIBD was run ten times with ten different seeds according to author recommendations. For the GERMLINE-64 and GERMLINE-128 rows, GERMLINE was run on phased data with GERMLINE's seed size set to 64 and 128, respectively.
6.3 Positional accuracy of Parente2 and other methods. Accuracy was measured based on the portion of the genome that was in or not in a simulated IBD segment for each pair of individuals.
6.4 The accuracy and running time of evaluated IBD inference methods. Each method was used to detect 2 cM IBD segments in 10 trials of the HapMap data set. The Parente2 entry represents when Parente2 was run using the augmented window set with the window filter. Parente2-SpeeDB is the same but with the application of the SpeeDB filter. The Parente2-Std. entry represents when Parente2 was run using the standard window set without the window filter and without SpeeDB. fastIBD was run ten times with ten different random seeds according to the authors' recommendations, and the sum of the running times of all ten runs is reported. GERMLINE-64 and GERMLINE-128 refer to running GERMLINE while using seed sizes of 64 and 128, respectively. The phasing pipeline provided with GERMLINE was used to phase the data prior to running GERMLINE, and its running time is included in the reported running time. The number of pairs of individuals processed by each method per second is reported in the Pairs/sec column.
6.5 Parente2's accuracy at various thresholds when detecting 2 and 4 cM IBD segments.

List of Figures

2.1 Linkage disequilibrium.
3.1 Shared DNA by familial relationship.
4.1 Alloy's factorial HMM.
4.2 The state space of the HMM.
4.3 Alloy's performance over increasing generations since admixture and comparison to WINPOP.