Evolution of Y Chromosome Ampliconic Genes in Great Apes

The Pennsylvania State University
The Graduate School

EVOLUTION OF Y CHROMOSOME AMPLICONIC GENES IN GREAT APES

A Dissertation in
Bioinformatics and Genomics
by
Rahulsimham Vegesna

© 2020 Rahulsimham Vegesna

Submitted in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy

May 2020

The dissertation of Rahulsimham Vegesna was reviewed and approved by the following:

Paul Medvedev
Associate Professor of Computer Science & Engineering
Associate Professor of Biochemistry & Molecular Biology
Dissertation Co-Adviser
Co-Chair of Committee

Kateryna D. Makova
Pentz Professor of Biology
Dissertation Co-Adviser
Co-Chair of Committee

Michael DeGiorgio
Associate Professor of Biology and Statistics

Wansheng Liu
Professor of Animal Genomics

George H. Perry
Chair, Intercollege Graduate Degree Program in Bioinformatics and Genomics
Associate Professor of Anthropology and Biology

ABSTRACT

In addition to the sex-determining gene SRY and several other single-copy genes, the human Y chromosome harbors nine multi-copy gene families that are expressed exclusively in testis. In humans, these gene families are important for spermatogenesis, and their loss is observed in patients suffering from infertility. However, only five of the nine ampliconic gene families are found across great apes, while the others are missing or pseudogenized in some species. My research goal is to understand the evolution of the Y ampliconic gene families in humans and in non-human great ape species. The specific objectives I addressed in this dissertation are:

1. To test whether Y ampliconic gene expression levels depend on their copy number and whether there is gene dosage compensation to counteract the ampliconic gene copy number variation observed in humans. For the nine ampliconic gene families found in humans, the copy number and expression levels were estimated in 149 men. Across the Y ampliconic gene families, those with higher copy number have higher expression levels. Within a Y ampliconic gene family, however, copy number does not influence gene expression; rather, a high tolerance for variation in gene expression was observed in the testis of presumably healthy men. We also found that the expression of five Y ampliconic gene families is coordinated with that of their non-Y (i.e., X or autosomal) homologs. Indeed, five ampliconic gene families had consistently lower expression levels than their non-Y homologs, suggesting dosage regulation, while the HSFY family had higher expression levels than its X homolog and thus lacked dosage regulation.

2. To test whether the Y ampliconic gene copy number and gene expression levels are conserved across great apes. For the ampliconic gene families found in great apes, the copy number and expression levels were estimated in independent datasets ranging from two to 14 samples per species. Our results indicate high variability in gene family size but conservation of gene expression levels in Y ampliconic gene families. This relationship was similar to what was observed in humans. However, for three gene families, size was positively correlated with gene expression levels across species, suggesting that, given sufficient evolutionary time, copy number influences gene expression on the Y chromosome.

3. To study the dynamics of gene (and gene family) loss and gain in great ape Y chromosomes. Using the assemblies and alignments of great ape Y chromosomes, we determined the gene content of the Y chromosomes of bonobo and orangutan. We then reconstructed the evolutionary history of gene content across great apes and observed an increased rate of gene loss in the Pan genus (bonobo and chimpanzee) compared with other great apes. The human palindromes P6 and P7, which are devoid of known ampliconic genes, are conserved across great apes. A potential reason for their conservation is the presence of gene expression regulators, rather than genes, on these palindromes.

The results of this dissertation significantly advance our understanding of Y chromosome evolution in great apes. They provide an overview of variation in gene copy number and expression levels of these highly similar gene families, which have previously been challenging to study.

Table of Contents

LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGMENTS

Chapter 1
Introduction
    References

Chapter 2
Dosage regulation, and variation in gene expression and copy number at human Y chromosome ampliconic genes
    Abstract
    Introduction
    Results
        AmpliCoNE: Ampliconic Copy Number Estimator
        Y ampliconic gene copy number estimates
        Y ampliconic gene families with low copy number in humans are frequently deleted in non-human great apes
        Y ampliconic gene expression
        More copious gene families have higher gene expression levels
        Within a family, copy number and gene expression are not correlated
        Y haplogroups and ampliconic gene families
        The role of age in ampliconic gene expression
        Ampliconic gene dosage regulation
    Discussion
        Variability in Y ampliconic gene copy number
        Variability in Y ampliconic gene expression
        Dosage regulation of Y ampliconic genes
    Materials and Methods
        AmpliCoNE: Ampliconic Copy Number Estimator
        Simulation-based validation of AmpliCoNE
        Datasets
        Pipeline for human WGS analysis
        Experimental validation with droplet digital PCR (ddPCR)
        Estimating gene expression levels
        Human Y haplogroup determination
        Code availability
    References

Chapter 3
Ampliconic genes on the great ape Y chromosomes: Rapid evolution of copy number but conservation of expression levels
    Abstract
    Introduction
    Results
        Dynamic evolution of Y ampliconic gene copy number
        Conservation of Y ampliconic gene expression in great apes
        The relationship between copy number and gene expression levels
        Y ampliconic gene copy number variation and phenotypes related to sperm competition
    Discussion
