UNIVERSITY of CALIFORNIA SAN DIEGO Identifying And

UNIVERSITY OF CALIFORNIA SAN DIEGO

Identifying and characterizing de novo tandem repeat mutations and their contribution to autism spectrum disorders

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy

in

Bioinformatics and Systems Biology

by

Ileena Mitra

Committee in charge:

    Professor Melissa Gymrek, Chair
    Professor Vineet Bafna
    Professor Joseph Gleeson
    Professor Abraham Palmer
    Professor Jonathan Sebat

2021

Copyright
Ileena Mitra, 2021
All rights reserved.

The dissertation of Ileena Mitra is approved, and it is acceptable in quality and form for publication on microfilm and electronically:

Chair

University of California San Diego

2021

DEDICATION

To my parents, for your unconditional love and support. Your dreams are mine, and my success is yours.

EPIGRAPH

When I had all the answers, the questions changed.

Paulo Coelho

TABLE OF CONTENTS

Signature Page
Dedication
Epigraph
Table of Contents
List of Figures
List of Tables
Acknowledgements
Vita
Abstract of the Dissertation

CHAPTER 1: INTRODUCTION
    1.1 Tandem Repeats
    1.2 Genetic architecture of Autism Spectrum Disorders
    1.3 Outline

CHAPTER 2: IDENTIFYING TANDEM REPEAT MUTATIONS
    2.1 Introduction
    2.2 MonSTR model and method details
        2.2.1 MonSTR statistical model
        2.2.2 MonSTR statistical model for chromosome X
        2.2.3 Naïve mutation model
        2.2.4 Inferring parent of origin
        2.2.5 MonSTR implementation details
        2.2.6 MonSTR limitations and relationship to other tools
    2.3 TR genotyping and de novo mutation discovery pipeline
        2.3.1 Simplex autism family WGS data
        2.3.2 Bioinformatics pipeline architecture
        2.3.3 Genome-wide TR genotyping
        2.3.4 Identifying de novo TR mutations
    2.4 Evaluation of mutations with capillary electrophoresis (CE) fragment analysis
    2.5 Statistical validation of pipeline
        2.5.1 Evaluating MonSTR on simulated WGS data
        2.5.2 Comparison to previously reported mutation rates
    2.6 Acknowledgements

CHAPTER 3: PATTERNS OF TANDEM REPEAT MUTATIONS IN THE GENERAL POPULATION
    3.1 Introduction
    3.2 Methods
        3.2.1 Determinants of TR mutation rates
        3.2.2 Analysis of mutation directionality bias
    3.3 Genome-wide patterns of de novo TR mutations
        3.3.1 Expansions versus contractions TR mutations
        3.3.2 TR mutations show distinct patterns based on parent of origin
        3.3.3 Allele directionality bias
        3.3.4 Influence of local genomic and epigenomic features on TR mutation rates
    3.4 Acknowledgements

CHAPTER 4: PATTERNS OF DE NOVO TANDEM REPEAT MUTATIONS IN AUTISM
    4.1 Introduction
    4.2 Methods
        4.2.1 Mutation burden statistical testing
        4.2.2 Enrichment of common variant risk
        4.2.3 Gene-set enrichment analysis using MAGMA
        4.2.4 Gene expression analysis
    4.3 TR mutation burden in ASD
    4.4 Common variant enrichment analyses
    4.5 Proband mutations predicted to alter gene expression
    4.6 Acknowledgements

CHAPTER 5: PRIORITIZING DISEASE RELEVANT TR MUTATIONS IN AUTISM
    5.1 Introduction
    5.2 Inferring selection coefficients at TRs using SISTR
    5.3 Prioritizing pathogenic TR mutations
    5.4 Acknowledgements

CHAPTER 6: CONCLUSION
    6.1 Conclusion
    6.2 Acknowledgements

References
Appendix

LIST OF FIGURES

Figure 1: Study Design.

