Evolution of genetic and regulatory sex differences in mammals

By

Sahin Naqvi

A.B. Molecular Biology Princeton University, 2012

Submitted to the Department of Biology In Partial Fulfillment of the Requirements for the Degree of

DOCTOR OF PHILOSOPHY

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

JUNE 2019

© Sahin Naqvi. All rights reserved.

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Signature of author: Department of Biology, April 4, 2019

Certified by: David C. Page, Professor of Biology, Thesis Advisor

Accepted by: Amy E. Keating, Professor of Biology, Co-Chair, Biology Graduate Committee


Evolution of genetic and gene regulatory sex differences in mammals

by

Sahin Naqvi

Submitted to the Department of Biology on April 4, 2019 in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Biology

Abstract

Sex differences are widespread in mammalian health, development, and disease. Ultimately, sex differences derive from the sex chromosomes: males are XY and females are XX, but the mammalian X and Y chromosomes evolved from an ancestral pair of ordinary autosomes. Through a variety of regulatory mechanisms, these genetic sex differences give rise to sex differences in gene expression across the genome, which in turn result in the observed phenotypic differences between males and females. In this thesis, I take an evolutionary perspective on this pathway, using computational analysis of both publicly available and newly generated data to provide insight into the molecular basis of mammalian sex differences.

First, to better understand the selective forces underlying the evolution of the amniote sex chromosomes from ordinary autosomes, we reconstructed gene-by-gene dosage sensitivities on the ancestral autosomes through phylogenetic analysis of microRNA target sites. We found that preexisting heterogeneities in dosage sensitivity shaped the evolution of both the mammalian XY and avian ZW sex chromosomes. Second, to understand the extent to which genome-wide sex differences are conserved across both tissues and species, we conducted a five-species, twelve-tissue survey of sex differences in gene expression. We found that most sex bias in gene expression has arisen since the last common ancestor of boreoeutherian mammals, and that evolutionary gains or losses of regulation by sex-biased factors likely drove a significant fraction of lineage-specific changes in sex bias. Third, we used the results of this survey to show that conserved sex bias in gene expression contributes to the male bias in height and body size observed in a range of mammalian species, including humans. Together, these studies suggest that dosage sensitivity played a key role both in the evolution of the mammalian sex chromosomes and in their contribution to phenotypic sex differences, and they reveal the widespread nature and phenotypic impact of sex differences in gene expression across the genome.

Thesis Supervisor: David C. Page Title: Professor of Biology

Acknowledgements

I would like to thank my thesis advisor, David Page, for his support and mentorship. I am fortunate to have chosen a thesis advisor who constantly has my best interests in mind. While I am sure my interactions with David over the years have impacted me in ways that I don’t even realize yet, working with him has fundamentally shaped how I communicate scientific ideas and has shown me that one must take risks to continue moving forwards. He has also instilled in me a deep appreciation for the use of the comma.

My thesis committee members, Dave Bartel and Peter Reddien, have provided me with invaluable advice throughout graduate school, on science and beyond. I thank present and past members of the Bartel lab for tolerating many unannounced visits and fielding my relatively uninformed questions about microRNAs (Vikram Agarwal, Sean McGeary, Jeff Morgan, Stephen Eichhorn), and for being valuable collaborators in a more official capacity (Kathy Lin). A special thanks to my outside committee member, Christine Disteche, who very graciously agreed to travel to Boston from Seattle to attend my thesis defense.

This work would not have been possible without the assistance, advice, and friendship of the members of the Page lab, both past and present. I am grateful to Susan Tocio and Jorge Adarme, who do an incredible job of keeping the lab running efficiently. Between the two of them, Winston Bellott and Alex Godfrey have likely read every formal document I have written during my time in the lab. As both unofficial and official collaborators, they have been generous with their time, critical feedback, and creative ideas, and this has contributed immensely to this work. Jenn Hughes has played an invaluable role in shaping the direction of the lab and in the strategic and logistical planning of projects and publications. I thank Pete Nicholls for being the only other lab member to understand cricket, and for introducing me to the intricacies of early germ cell development through a very enjoyable collaboration. Mary Goodheart volunteered many hours to ensure that my incompetency in handling rodents did not derail my thesis work too drastically. I would also like to thank the entire Sex Differences Subgroup for providing a stimulating intellectual environment and a diverse set of perspectives for me to learn from.

I couldn’t have asked for a better group of friends, both at and outside of MIT, who have offered both support and distraction from graduate school when needed.

I would like to thank Leah Dodell for bringing the best out in me.

Above all, I thank my family for constantly believing in my abilities, even when I doubted them the most, and for always encouraging me to pursue my interests, no matter where they might take me. I thank my parents for all that they have sacrificed for my education, and my sisters for making sure that I never took myself too seriously.

Table of contents

Abstract
Acknowledgements
Chapter 1. Introduction
Part 1. The evolution of the mammalian and avian sex chromosomes
The sex-specific chromosome as a degenerating autosome
Dosage compensation of the sex-shared chromosome
Exceptions to the rules: Gene survival on the sex-specific chromosome and a lack of dosage compensation on the sex-shared chromosome
Part 2. Phenotypic and physiological sex differences in mammals
Reproductive tract
Height and body size
Immune system
Cardiovascular system
Metabolism
Part 3. Sex-biased gene expression as an intermediary from sex chromosomes to phenotypic sex differences
Prior studies of sex bias in autosomal gene expression
Upstream causes of sex-biased gene expression: sexually dimorphic hormonal environments
Upstream causes of sex-biased gene expression: sex chromosome complement outside the reproductive tract
Linking sex-biased gene expression to phenotypic sex differences
Evolutionary causes of sex-biased gene expression: sexual conflict and sexually antagonistic selection
Part 4. Concluding remarks
References
Chapter 2. Conserved microRNA targeting reveals preexisting heterogeneities in dosage sensitivity that shaped amniote sex chromosome evolution
Summary
Introduction
Results
Analysis of human copy number variation indicates conserved microRNA targeting of genes sensitive to dosage increases
X-Y pairs and X-inactivated genes have higher miRNA conservation scores than X escape genes
Heterogeneities in X-linked miRNA targeting were present on the ancestral autosomes
Z-W pairs have higher miRNA conservation scores than other ancestral Z-linked genes
Heterogeneities in Z-linked miRNA targeting were present on the ancestral autosomes
Analyses of experimental datasets validate miRNA target site function
Discussion
Methods
References
Chapter 3. Evolutionary dynamics of sex-biased gene expression in mammalian tissues
Summary
Introduction
Results
A five-species, 12-tissue survey of sex differences in gene expression
Conserved sex-biased gene expression exists across the body
Most sex bias in gene expression has arisen since the last common ancestor of boreoeutherian mammals
Sex-biased gene expression is associated with reduced selective constraint
Evolutionary turnover of motifs for sex-biased transcription factors reflects lineage-specific changes in sex bias
Discussion
Methods
References
Chapter 4. Conserved sex bias in autosomal gene expression contributes to sex differences in height
Summary
Introduction
Results
Discussion
Methods
References
Chapter 5. Conclusions and future directions
The contribution of the mammalian sex chromosomes to sex differences
An omnigenic model for sex differences in complex traits
Sex bias in mRNA splicing
From tissues to cell-types to single cells
References


Chapter 1. Introduction

I thank Winston Bellott and Leah Dodell for comments on this chapter.

The existence of two sexes has fascinated biologists for centuries. In humans and other mammals, sex manifests in two major ways: one, phenotypic sexual dimorphism – put simply, males and females “look different” – and two, a highly differentiated, or heteromorphic, pair of sex chromosomes – males are XY, and females are XX. Furthermore, this phenotypic sexual dimorphism is determined by the sex chromosomes. While these observations may seem unremarkable to many, they actually represent the outcome of a deep and wildly unpredictable evolutionary history.

During animal evolution, phenotypic sexual dimorphism likely arose multiple times. For example, one survey of passerine birds suggested that the transition from monomorphic to dimorphic coloring occurred at least 150 times (Price & Birch, 1996). Across species, sexual dimorphism is often, but not always, found together with anisogamy, in which males and females produce differently sized gametes (i.e., sperm and egg). Sex chromosomes were a relatively late addition, arising after the mammalian and avian lineages diverged from their common ancestor (Nanda et al., 1999; Ross et al., 2005). Furthermore, while mammals have an XX/XY sex chromosome system, in birds and some reptiles females are ZW and males are ZZ, indicating that sex chromosomes also arose more than once during evolution. Thus, two evolutionary innovations have combined to create the current situation in humans and other mammals, with distinct sexual dimorphism and an XX/XY sex chromosome pair. This thesis seeks to understand how these two linked but separable facets of animal biology came to be, with a focus on the mammalian lineage. In Part 1 of this introduction, I will review the evolutionary history of the mammalian and avian sex chromosomes, the ultimate determinants of all observed phenotypic sex differences. In Part 2, I will describe some of what is known about mammalian phenotypic sex differences, in both healthy and diseased states. In Part 3, I will introduce a conceptual model linking these two ends of the pathway, hypothesizing how the sex chromosomes ultimately give rise to phenotypic sexual dimorphism. Central to this model are sex differences in the expression of genes across the genome.

Part 1. The evolution of the mammalian and avian sex chromosomes

As mentioned above, sex chromosomes are a relatively recent innovation in the evolutionary history of sex: the X and Y chromosomes arose from a pair of ordinary autosomes after the divergence of mammals and birds ~300 million years ago. The avian Z and W chromosomes arose from a different pair of ordinary autosomes present in that same common ancestor. These two pairs of sex chromosomes can therefore be thought of as parallel and independent experiments of nature, in which an ostensibly random pair of autosomes differentiated into a sex chromosome pair over hundreds of millions of years. As I will detail in the following sections, the two experiments of nature share many aspects of sex chromosome evolution, whereas some features are unique to either mammals or birds. First, I will introduce the concept of the sex-specific chromosome (the Y in mammals and the W in birds) as a degenerating autosome, covering both theoretical and empirical studies in support of this idea. Second, turning to the sex-shared chromosome (the X in mammals and the Z in birds), I will discuss how it responds to the loss of gene dosage from the degenerating sex-specific chromosome, an evolutionary process known as dosage compensation. Finally, I will highlight and seek to explain exceptions to these two themes of degeneration and dosage compensation.

The sex-specific chromosome as a degenerating autosome

The idea that heteromorphic sex chromosomes evolve from an autosomal pair of chromosomes is almost as old as the study of genetics itself. In 1914, Hermann Muller mapped a gene to the fourth chromosome (an autosome) of Drosophila melanogaster (Muller, 1914). With this finding, every chromosome except the male-specific Y chromosome was known to carry a mapped gene. To account for the lack of Y-linked genes, Muller proposed that the X and Y chromosomes evolved from an ordinary pair of autosomes. He reasoned that because D. melanogaster males do not undergo meiotic recombination, the male-specific Y chromosome was unable to rid itself of deleterious mutations, which led to its degradation and gene loss. Over the next century, researchers revised some details of this theory as they considered sex chromosomes in additional species such as mammals and birds. It is nevertheless remarkable that the core idea proposed by Muller has held true time and time again.

Muller’s original theories were modified in two main respects: 1) the molecular mechanisms behind the lack of recombination on sex chromosomes, and 2) the population genetic explanations for why a lack of recombination would lead to degeneration of a sex-specific chromosome. Susumu Ohno first extended Muller’s theories to vertebrates. Many vertebrate lineages, such as mammals, birds, and snakes, possess heteromorphic, or differentiated, sex chromosomes. There are also numerous examples of homomorphic, or cytologically indistinguishable, sex chromosomes, as well as species with no sex chromosomes at all. For example, in alligators (Ferguson & Joanen, 1982), the temperature at which the egg develops determines sex, which is therefore environmentally rather than genetically determined.

Ohno postulated that species with no sex chromosomes, homomorphic sex chromosomes, and heteromorphic sex chromosomes represented a continuum of evolutionary states. In this view, an ordinary pair of autosomes that had acquired mutations conferring sex-determining capacity could then degenerate. Unlike Drosophila chromosomes, vertebrate chromosomes recombine in both male and female meiosis, so there was no intrinsic mechanism for the lack of recombination that, according to Muller, led to the degradation of the Y chromosome. Ohno proposed that large chromosomal inversions on the sex-specific chromosome could suppress recombination. Any recombination event within the inversion would produce both duplications and deletions of large parts of the chromosome and would likely cause death, or at least a drastic reduction in fitness (Ohno, 1967). Implicit in this model was the idea that suppression of recombination, and thus sex chromosome differentiation, could occur in a stepwise manner, since even large inversions would encompass only parts of the chromosome.

Empirical support for this idea was lacking until many years later, when Lahn and Page found evidence consistent with at least four regional, stepwise suppressions of X-Y recombination, which they termed “evolutionary strata” (Lahn & Page, 1999). Suppression of X-Y recombination affects most of the sex chromosomes, but the human X and Y chromosomes continue to recombine at their ends, in the so-called pseudoautosomal regions. The evolutionary forces that follow from suppressed recombination are therefore not expected to act in the pseudoautosomal regions.

The second major revision to Muller’s original theories of sex chromosome evolution concerned how and why a lack of recombination between members of a sex chromosome pair would lead to degeneration of the sex-specific chromosome while the sex-shared chromosome maintained its ancestral gene content. Muller’s original theory stemmed from his work on balanced lethal mutations, in which each member of a homologous chromosome pair carries a different mutation conferring a recessive lethal phenotype. Individuals homozygous for either chromosome do not survive, so only heterozygotes survive, and this configuration can be maintained in a population. Such a configuration can be eliminated only by a recombination event between the two mutations that places them both in cis, i.e., on the same physical chromosome. Muller hypothesized that, because sex chromosomes do not recombine with one another, they would provide an ideal arrangement for the emergence and accumulation of balanced lethal mutations. The sex-specific chromosome, sheltered from selection by the balanced mutations, would continue to accumulate recessive mutations. Meanwhile, the sex-shared chromosome would continue to recombine in the homogametic sex and would thus be exposed to selection (Muller, 1918). However, Fisher later showed that this scenario was unlikely to be the driving force behind degeneration of the sex-specific chromosome. An incipient X- or Z-linked mutation would still create selection for the sex-specific chromosome to maintain the function that had been lost on the sex-shared chromosome (R. A. Fisher, 1935). Alternative theories were therefore required to explain how a lack of recombination could lead to degeneration of the sex-specific chromosome.

The first revised population genetic theory was actually proposed by Muller himself, some years later (Muller, 1964). This theory, called “Muller’s ratchet” and later formalized in a population genetic model by Felsenstein (Felsenstein, 1974), relied on genetic drift to explain the degeneration of a population of sex-specific chromosomes. In the absence of recombination, a population of Y or W chromosomes cannot generate individual chromosomes with fewer deleterious mutations than those that currently exist. As a result, if the least-mutated chromosome in the population were lost due to random genetic drift, the ratchet would have clicked irreversibly towards decay.
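The ratchet can be illustrated with a minimal Wright-Fisher-style simulation. This is a sketch under simplifying assumptions that do not come from Muller's or Felsenstein's formulations. Each chromosome is represented only by its count of deleterious mutations, fitness is multiplicative, each offspring gains at most one new mutation per generation, and there is neither recombination nor back mutation. The function name `simulate_ratchet` and all parameter values are hypothetical. Offspring can only copy a parent and add mutations, so the smallest mutation load in the population can never decrease. Each increase corresponds to one irreversible click of the ratchet.

```python
import random


def simulate_ratchet(pop_size=50, mut_rate=0.5, sel_coef=0.02,
                     generations=500, seed=1):
    """Hypothetical sketch of Muller's ratchet without recombination.

    Each chromosome is represented only by its count of deleterious mutations.
    Returns the smallest mutation count present in each generation.
    """
    rng = random.Random(seed)
    population = [0] * pop_size  # start from mutation-free chromosomes
    min_load = []
    for _ in range(generations):
        # Multiplicative fitness: each mutation lowers fitness by sel_coef.
        fitness = [(1 - sel_coef) ** k for k in population]
        # Selection plus drift: draw parents with probability proportional to fitness.
        parents = rng.choices(population, weights=fitness, k=pop_size)
        # No recombination: each offspring copies its parent's load and, with
        # probability mut_rate, gains one new deleterious mutation.
        population = [k + (1 if rng.random() < mut_rate else 0) for k in parents]
        min_load.append(min(population))
    return min_load


loads = simulate_ratchet()
print("least-loaded class every 100 generations:", loads[::100])
```

In this sketch, raising `pop_size` or `sel_coef` relative to `mut_rate` tends to slow the clicks. Both changes keep the least-loaded class larger, so it is less likely to be lost by drift.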

In addition to the accumulation of deleterious mutations, absolute linkage between different sites, as on a non-recombining, sex-specific chromosome, can interfere with how selection operates on individual sites. This idea was formalized in two models: background selection and genetic hitchhiking. In background selection, selection against linked, strongly deleterious mutations reduces genetic diversity at neutral or nearly neutral sites (Charlesworth, 1994; Charlesworth et al., 1993). In genetic hitchhiking, mutations that confer a substantial fitness benefit rapidly sweep through a population, and slightly deleterious mutations linked to the beneficial mutation become fixed along with it (J. M. Smith & Haigh, 1974). Both processes reduce genetic diversity across a chromosome. They can therefore be thought of as reducing the effective population size of the sex-specific chromosome, which renders it more susceptible to genetic drift and Muller’s ratchet. The relative contributions of Muller’s ratchet, genetic hitchhiking, and background selection to decay of the Y or W chromosome can vary over the course of degeneration (Bachtrog, 2008). Given the old age of both the mammalian and avian sex chromosomes (~200 and ~100 million years, respectively), each of these three forces likely contributed substantially to degeneration of the sex-specific chromosome.

It was soon appreciated that the evolution of sex chromosomes from ordinary autosomes was a unifying feature of various animal lineages. Whether the mammalian and avian sex chromosomes had independent evolutionary origins, however, was not so easily resolved.

Ohno first proposed that the mammalian and avian sex chromosomes evolved from the same ordinary autosome in their common ancestor – in other words, that the avian Z and W chromosomes were homologous to the mammalian X and Y chromosomes (Ohno, 1967).

Mapping studies, first of single genes (Baverstock et al., 1982) and then of small sets of genes (Fridolfsson et al., 1998; Nanda et al., 1999), began to demonstrate that this was not the case. Sequencing of the human X and chicken Z chromosomes further refined these observations. The human X was shown to be orthologous to chicken chromosomes 1 and 4 (Ross et al., 2005), and the chicken Z to human chromosomes 5, 9, and 18 (Bellott et al., 2010).

Dosage compensation of the sex-shared chromosome

Alongside the emerging study of the sex-specific chromosome, fundamental discoveries were being made about the sex-shared chromosome, often by the same researchers who had launched the study of the degenerating sex-specific chromosome. In 1948, Muller showed that an X-linked, loss-of-function allele had a dosage-dependent effect on eye color when male and female flies were considered separately. However, females with two copies of the allele showed the same eye color as males with only one copy. Muller initially proposed that the “compensation” mechanism was a reduction of gene activity in females (Muller, 1948). Later studies showed that dosage compensation in Drosophila in fact consists of a male-specific increase in X-linked gene activity (Mukherjee & Beermann, 1965).

However, the situation in mammals turned out to be different from that in Drosophila. Ohno and Hauschka stained chromosomes of cells from both neoplastic and normal mouse tissue and observed that, in most cells, one X chromosome was far more intensely stained than the other, a phenomenon then termed “allocycly” (Ohno & Hauschka, 1960). They left open the question of whether the more intensely stained X chromosome was of paternal or maternal origin, or could be either.

Shortly thereafter, Mary Lyon proposed that the inactivated X chromosome could be either maternal or paternal, and that this choice was random and occurred relatively early in development. She synthesized studies of various mouse lines with X-linked coat color variants and astutely observed that, in such lines, coat color was most frequently mosaic, with patches of color corresponding to either allele (Lyon, 1961). Lyon then argued, perhaps influenced by Muller’s earlier proposal, that the random inactivation of one X chromosome, and the resulting reduction of gene activity in females, constituted “dosage compensation” in mammals.

Until this point, studies of dosage compensation in both flies and mammals had lacked an evolutionary perspective. Susumu Ohno first provided one. In the same wide-ranging book in which he proposed inversions as a mechanism for suppressing recombination, he argued that X inactivation in mammals actually represented the second step in dosage compensation. In Ohno’s model, degeneration of Y-linked genes left X-linked genes in males with half their ancestral dosage. In response to this reduction, X-linked gene activity was upregulated approximately two-fold in both sexes. Gene activity was now restored to the ancestral level in males but was higher than the ancestral level in females, which in turn led to the acquisition of X inactivation in females. Thus, in this model, X inactivation allows females with two upregulated X chromosomes to achieve expression levels similar to the ancestral, autosomal state (Ohno, 1967). X inactivation was later found to occur by a chromosome-wide mechanism initiated by the noncoding RNA XIST (Brown et al., 1991; Penny et al., 1996), so it was also assumed that mechanisms of dosage compensation evolved on a chromosome-wide scale.
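Simple arithmetic can trace the dosage bookkeeping in Ohno's two-step model. The sketch below assumes, for illustration only, that each active copy of the ancestral autosomal gene contributes one arbitrary unit of expression, so that the ancestral diploid level is 2 units in both sexes. The function `x_dosage` and the stage names are hypothetical labels rather than terms from Ohno's book.

```python
ANCESTRAL_LEVEL = 2  # two active autosomal copies x 1 unit each (illustrative units)


def x_dosage(active_x_copies, per_copy_level):
    """Total expression of one X-linked gene in a cell (hypothetical units)."""
    return active_x_copies * per_copy_level


# For each stage and sex: (number of active X copies, expression per active copy).
stages = {
    "after Y decay":        {"male": (1, 1), "female": (2, 1)},
    "after upregulation":   {"male": (1, 2), "female": (2, 2)},
    "after X inactivation": {"male": (1, 2), "female": (1, 2)},
}

# Express each stage relative to the ancestral autosomal level.
relative = {
    stage: {sex: x_dosage(*state) / ANCESTRAL_LEVEL for sex, state in sexes.items()}
    for stage, sexes in stages.items()
}
for stage, levels in relative.items():
    print(stage, levels)
```

Relative to the ancestral level, males move from 0.5 to 1.0 after upregulation. Females move from 1.0 to 2.0 and return to 1.0 only after X inactivation. In this accounting, the endpoint is the ancestral level in both sexes, not merely equal levels between the sexes.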

However, by studying the X inactivation status of individual genes in a broad range of mammalian species, Jegalian and Page showed that X inactivation could evolve on a gene-by-gene basis (Jegalian & Page, 1998). The key insight from these studies, the “Ohno-Jegalian model” of sex chromosome evolution, is that X inactivation is the second step in an evolutionary process that returns X-linked genes to their ancestral, autosomal expression levels once their Y-linked homologs have decayed. This differs subtly but significantly from the idea that X inactivation, or mammalian dosage compensation more generally, evolved in order to equalize gene expression levels between males and females (Figure 1.1).


[Figure 1.1 image. Stage labels: pseudoautosomal, not X-inactivated; X-linked, Y-linked, not X-inactivated; X-linked, increased expression, not X-inactivated; X-linked, increased expression, X-inactivated. Transition labels: Y gene loss/expression restriction; selective pressure; increased expression; subjection to X inactivation.]

Figure 1.1. The Ohno-Jegalian model of mammalian sex chromosome evolution.

Another important implication of this model is that dosage compensation likely proceeded on a gene-by-gene basis. A number of studies have sought evidence of X-linked upregulation by comparing gene expression levels between the entire X chromosome and all autosomes, with roughly equal numbers of studies supporting and rejecting upregulation (Deng et al., 2011; Julien et al., 2012; Kharchenko et al., 2011; Lin et al., 2012; Xiong et al., 2010). If upregulation occurred gene by gene, it is perhaps not surprising that comparisons pooling all genes on a chromosome have struggled to test the predictions of the model.

Given the seemingly disparate evolutionary paths of dosage compensation in mammals and flies, birds offered an opportunity to clarify how the sex-shared chromosome responds to degeneration of the sex-specific chromosome. Unfortunately, the study of dosage compensation in birds has yielded few clarifying answers, because avian dosage compensation seems to be unlike that of both Drosophila and mammals. Alongside their studies of sex chromatin in mammals, Ohno and colleagues also examined chromosomal staining in chicken cells. They observed sex chromatin in ZW females, but neither of the Z chromosomes in male cells showed particular condensation (Ohno et al., 1960). Some years later, Baverstock et al. measured levels of the Z-linked aconitase enzyme in three different avian species. Males had significantly higher levels than females, but the difference was less than the two-fold change expected from comparing two active copies to one (Baverstock et al., 1982). Although this finding was based on only one gene, it proved remarkably prescient about avian dosage compensation more broadly. Subsequent studies used genomic technologies to measure expression of all Z-linked genes and found, on average, the same result, with some variation in the male/female expression ratio across tissues and developmental time points. These results have been interpreted as “partial” or “incomplete” dosage compensation. It is still unclear, however, whether an active mechanism specific to the Z chromosome regulates the slight reduction in, or variation of, the male/female expression ratio. Recent results suggest that a male-biased, Z-linked microRNA (miRNA) may contribute to a reduction in the expression ratio by preferentially targeting other Z-linked genes (Warnefors et al., 2017). By analogy with Ohno’s model for the X, however, such a preferential downregulation of Z-linked gene expression in males would have had to follow a widespread upregulation of Z-linked gene expression in both sexes. There is not yet convincing evidence for this evolutionary step.

Exceptions to the rules: Gene survival on the sex-specific chromosome and a lack of dosage compensation on the sex-shared chromosome

Thus far, I have described two key features of the evolutionary trajectories of the mammalian and avian sex chromosomes. The first is degeneration of the sex-specific chromosome. The second is dosage compensation of the sex-shared chromosome in response to the loss of gene dosage that this degeneration causes. These can be thought of as general rules of sex chromosome evolution, but there are, as always, numerous exceptions. This section explores these exceptions, namely a) genes that survive genetic decay on the sex-specific chromosome and b) genes that lack some aspect of dosage compensation in mammals or birds.

Amid the emerging theories of Y chromosome degeneration from ancestral autosomes, it remained an open question whether Y chromosomes contained any functional genes at all. Curt Stern systematically debunked early studies of human pedigrees that claimed Y-linked inheritance of multiple traits (Stern, 1957). This contributed to the view that mammalian Y chromosomes, as in Drosophila, contained no functional genes. Evidence that the human (Jacobs & Strong, 1959) and mouse (Welshons & Russell, 1959) Y chromosomes carried the male sex-determining gene marked a significant departure from this view. This evidence led to a series of mapping efforts that culminated in the identification of SRY as the male-determining gene (Koopman et al., 1991). For many years after this discovery, however, many assumed that, apart from sex determination, the Y chromosome had no important functions and would eventually degenerate completely (Graves, 2006).

Two lines of evidence challenged this view. First, striking patterns in the gene content of the human Y chromosome became apparent as additional genes were discovered. These discoveries were enabled by maps of the Y chromosome built from naturally occurring deletions (Vollrath et al., 1992) and overlapping DNA clones (Foote et al., 1992). A systematic survey of the expression patterns of all known Y-linked genes revealed a functional coherence represented by two classes of genes: those expressed specifically in the testis and those expressed broadly across tissues (Lahn, 1997). Many of the broadly expressed Y-linked genes also had homologous X-linked genes, so each X-Y pair was likely derived from the same ancestral, autosomal gene. This suggested that robustly expressed Y-linked genes could in fact survive over the long term. Second, Y chromosome sequencing, first in human (Skaletsky et al., 2003), then in chimpanzee (Hughes et al., 2005) and rhesus macaque (Hughes et al., 2012), allowed an empirical test of the prediction that the Y had continued to degenerate throughout mammalian evolution. The results clearly rejected this prediction and instead showed evidence of rapid, exponential decay to a non-zero baseline. For instance, the ancestral gene content of the human and chimpanzee Y chromosomes was nearly identical. Thus, degeneration of the Y chromosome had effectively ceased during ~12 million years of evolution (six million years on each of the human and chimpanzee branches since their divergence). Coherent sets of genes also seemed to have been retained. Both findings ran contrary to theoretical predictions.
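A simple functional form can make the phrase "exponential decay to a non-zero baseline" concrete. The sketch below assumes the model N(t) = N_inf + (N0 - N_inf) * exp(-k * t). Here N0 is the ancestral gene count, N_inf is the baseline of surviving genes, and k is a decay rate. The function name and all numbers are arbitrary and hypothetical, not values fitted to any Y chromosome data. The sketch shows why two lineages that diverge after most decay has already occurred, as human and chimpanzee did, would be expected to retain nearly identical ancestral gene content.

```python
import math


def genes_remaining(t, n0, n_inf, k):
    """Hypothetical decay model: N(t) = n_inf + (n0 - n_inf) * exp(-k * t)."""
    return n_inf + (n0 - n_inf) * math.exp(-k * t)


# Arbitrary illustrative parameters (not estimates from real data).
N0, N_INF, K = 100, 10, 0.05  # genes, genes, rate per arbitrary time unit

# Genes lost over two intervals of equal length, one early and one late.
early = genes_remaining(10, N0, N_INF, K) - genes_remaining(20, N0, N_INF, K)
late = genes_remaining(200, N0, N_INF, K) - genes_remaining(210, N0, N_INF, K)
print(f"genes lost in an early interval: {early:.1f}")
print(f"genes lost in a late interval of the same length: {late:.3f}")
```

Under this form, the same length of time removes many genes early in decay but almost none once the count approaches `N_INF`. This pattern is consistent with near-identical gene content on recently diverged Y chromosomes.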

The two lines of evidence against degeneration of the Y, the kinetics of decay and the nonrandom nature of survival, were finally brought together by sequencing the ancestral portions of the Y chromosomes of eight mammalian species and analyzing their gene content. Relative to X-linked genes without a Y-linked homolog, X-linked genes with surviving Y-linked homologs were enriched for broad expression across tissues and for gene regulatory functions such as transcription, translation, and splicing (Bellott et al., 2014). Furthermore, these X-Y pair genes showed elevated signs of haploinsufficiency, suggesting that their dosage sensitivity prevented loss of the Y homolog through genetic decay. A recent, analogous analysis of W chromosomes in chicken and 13 other avian species indicated that the same principles likely underlie gene survival in birds (Bellott et al., 2017). These studies demonstrate the power of having two independent experiments of nature, in which the mammalian and avian sex chromosomes evolved from different pairs of ordinary autosomes, and indicate that in both lineages, survival of dosage-sensitive regulators is an important exception to the rule of decay of the sex-specific chromosome.

Along with the discovery of X inactivation came the prediction that some genes on the X chromosome would escape it. The obvious candidates for escape were located in the pseudoautosomal regions of the X and Y (Lyon, 1962): these regions continue to recombine in both males and females, and so should not have traversed the evolutionary pathway from genetic decay to the acquisition of X inactivation, as was demonstrated for a number of individual genes (Ellison et al., 1992; Fialkow, 1970; Goodfellow et al., 1984). The first X-linked gene outside the pseudoautosomal region shown to escape X inactivation was ZFX (Schneider-Gädicke et al., 1989), followed by RPS4X (Fisher et al., 1990). It was immediately noted that both ZFX and RPS4X had broadly expressed Y-linked homologs, placing them in the special class of X-linked genes with surviving Y homologs described above. Through analysis of a larger number of X-Y pair genes, Bellott et al. provided a rationale for such behavior: some X-linked genes were so dosage sensitive that even in females, the fitness costs of inactivating one allele were too high. Later, large-scale surveys (Balaton et al., 2015; Carrel & Willard, 2005; Cotton et al., 2011, 2013, 2015; Tukiainen et al., 2017) indicated that 10-15% of X-linked genes escape X inactivation in humans, many of which do not have Y-linked homologs. Although escape from X inactivation in mice proved far less extensive, first for Zfx and Rps4x individually (Ashworth et al., 1991) and then in systematic surveys (Berletch et al., 2015; F. Yang et al., 2010), it was clear that there was an additional class of genes on the mammalian X chromosome: those that lacked a Y-linked homolog but continued to escape X inactivation in females. In Chapter 2, I will describe a study suggesting that these genes continue to escape X inactivation because they lack dosage sensitivity, whereas genes with greater dosage sensitivity became subject to X inactivation, reducing the effects of overexpression in females. As I will discuss later in this introductory chapter, this also has implications for which sex-linked genes are the most likely candidates to contribute directly to genome-wide sex differences in gene expression.

While other parts of the dosage compensation pathway were also predicted to proceed equally for all genes, this turned out not to be the case; here too, dosage sensitivity likely explains the exceptions. X-linked genes that encode members of large protein complexes, and are thus likely dosage sensitive, show expression levels closer to the autosomal average (Pessia et al., 2012). Chicken Z-linked genes that underwent fewer deletions and duplications while still on the ancestral autosomes, even further back in time, show more equal male/female expression ratios (Zimmer et al., 2016). Thus, multiple lines of evidence point towards an important role for gene-by-gene dosage sensitivity in shaping both survival on the sex-specific chromosome and dosage compensation on the sex-shared chromosome. The study described in Chapter 2 will extend these findings to show that such heterogeneities in dosage sensitivity were present on the ancestral autosomes, leading to an updated version of the Ohno-Jegalian model of sex chromosome evolution.

Part 2. Phenotypic and physiological sex differences in mammals

While transitions between monomorphic and dimorphic sexes in lineages such as fungi and algae provide evidence that sexual dimorphism has evolved repeatedly, all mammals show some form of sexual dimorphism.

In this section, I will describe some of the most prominent phenotypic sex differences in mammals. Phenotypic sexual dimorphism has been most intensely studied in humans due to its obvious clinical implications and has been described in a vast array of tissues and organ systems, but I will focus on phenotypes and organ systems where sexual dimorphism in other mammalian species has also been studied.

Reproductive tract

The reproductive tract is the most sexually dimorphic organ system in mammals. The genetic mechanisms by which sexual dimorphism in the reproductive tract arises during development have been the subject of intense study for decades and can conceptually be divided into two parts: sex determination, the initial switch that directs development of the bipotential gonad down the male or female pathway, and sex differentiation, in which the male or female gonad directs the development of other sex-specific reproductive structures, mostly through sexually dimorphic hormone signaling. In mammals, expression of the Y-linked gene SRY in the developing bipotential gonad results in male sex determination, whereas lack of SRY expression results in progression down the default female pathway of gonadal development (Koopman et al., 1991). Once the bipotential gonad has differentiated sufficiently down the male or female pathway into a testis or an ovary, it secretes male or female gonadal hormones.

The most immediate effect of this hormone secretion is on the fate of the duct system, the reproductive structures surrounding the gonads. Initially, the precursors of both the male and female ducts, the Wolffian and Mullerian ducts respectively, surround the bipotential gonad. The most important hormones for male development are anti-Mullerian hormone (AMH) and androgens, both produced by the testis. AMH causes regression of the Mullerian duct, while androgens drive development of the Wolffian duct into the epididymis, vas deferens, and seminal vesicles (Josso, 1970). In the absence of AMH, the Mullerian duct develops into the oviduct, uterus, and vagina (Behringer et al., 1994), while the Wolffian duct, lacking androgen support, regresses.

Height and body size

At the phenotypic level, one of the most widespread sex differences in mammals is in height or body size; males are generally larger than females, a phenomenon referred to as sexual size dimorphism (SSD). This difference is relatively subtle, as in most cases the male and female height distributions, each approximately normal, overlap substantially. For example, in European populations, males are on average ~13 cm (1.08-fold) taller than females (Sanjak et al., 2017). In other species, body mass, rather than body length (the equivalent of height), is often used to quantify SSD. An analysis of 1370 mammalian species indicated an average male/female mass ratio of 1.18 (Lindenfors et al., 2007). An extreme example of SSD can be found in southern elephant seals, in which adult males weigh 4-10 times more than adult females (Ralls & Mesnick, 2009). While male-biased SSD is largely the rule in mammals, some exceptions do exist, mostly in rodent species; females are heavier and longer than males in both chinchillas (Lammers et al., 2005) and yellow-pine chipmunks (Schulte-Hostedde & Millar, 2000).

The widespread nature of SSD, both in mammals and in other lineages, has led to much speculation about its evolutionary causes. This line of research was initiated by Bernhard Rensch in 1950; he observed that among closely related species, the degree of SSD increased with body size when SSD was generally male-biased, but decreased with body size when SSD was female-biased (Rensch, 1950). The most widely accepted explanation for Rensch’s rule, and for SSD more generally, invokes sexual selection: in males, increased size is associated with increased reproductive success, while smaller females have greater reproductive success. Evidence supporting this hypothesis comes from studies in both mammals (Lindenfors et al., 2007) and birds (Szekely et al., 2004) that correlated the degree of SSD across species with mating patterns and behaviors, which can serve as a proxy for the strength of sexual selection. For instance, polygyny, a mating system in which individual males form long-term mating relationships with more than one female, intensifies sexual selection because it increases the variance in mating (and thus reproductive) success among males (Kirkpatrick, 1987). Additionally, studies in humans have shown a positive correlation between height and reproductive success in males, and a negative correlation in females (Sanjak et al., 2017).
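Rensch’s rule is often formalized in later comparative work as an allometric relationship: across related species, log male size is regressed on log female size, and a slope greater than 1 means that dimorphism grows with body size. The Python sketch below illustrates that reading under stated assumptions. It uses ordinary least squares (many comparative studies instead use reduced major axis regression with phylogenetic corrections), and the species sizes are synthetic numbers generated for illustration, not measurements.

```python
import math


def rensch_slope(female_sizes, male_sizes):
    """Ordinary least-squares slope of log10(male size) on log10(female size).

    Under the allometric reading of Rensch's rule, a slope > 1 means SSD
    increases with body size when males are the larger sex.
    """
    x = [math.log10(f) for f in female_sizes]
    y = [math.log10(m) for m in male_sizes]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Synthetic, hypothetical species: male mass = female mass ** 1.2 (not real data).
females = [10.0, 20.0, 40.0, 80.0, 160.0]
males = [f ** 1.2 for f in females]

# The male/female size ratio rises with body size, as Rensch described.
ratios = [m / f for m, f in zip(males, females)]
print(round(rensch_slope(females, males), 3))  # prints 1.2
```

A slope of exactly 1 would correspond to a constant male/female ratio across species, regardless of body size.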


Immune system

There are substantial sex differences in the human immune system, perhaps best exemplified by the pervasive sex bias in the incidence and prevalence of most autoimmune disorders. The majority of autoimmune diseases show female-biased prevalence, ranging from modest (~2:1 female to male prevalence for multiple sclerosis) to substantial (~9:1 for both systemic lupus erythematosus and Sjogren’s syndrome), but a few, such as ankylosing spondylitis, show male bias (Ngo et al., 2014). Furthermore, there is good evidence for an evolutionarily conserved sex bias in susceptibility to autoimmune disorders; in mice, for example, both spontaneous (Roubinian et al., 1978) and chemically induced (D. L. Smith et al., 2007) lupus-like disease show greater mortality and severity in females. Sex differences in both the innate and adaptive arms of the immune system likely contribute to female biases in autoimmune disease. Studies in both humans (van Eijk et al., 2007) and mice (Scotland & Stables, 2011) indicate that females mount a stronger innate inflammatory response than males when challenged by bacterial infection or endotoxins. Studies of immune cell composition in humans, mice, and dogs have indicated greater numbers of circulating (Amadori et al., 1995; Greeley et al., 1996) and tissue-resident (Scotland & Stables, 2011) CD4+ T cells, key mediators of the adaptive immune response, in females than in males.

Cardiovascular system

Compared to females, males have a consistently higher risk (Lori et al., 2011) and an earlier age of onset (Maas & Appelman, 2010) across a range of cardiovascular diseases. Cardiovascular diseases as a group span a broad range of etiologies, but sex differences are perhaps best exemplified by hypertrophic cardiomyopathy (HCM), a disease defined by abnormal cardiac muscle contractility and cellular enlargement of the left ventricle, which shows greater prevalence and severity, and earlier onset, in males than in females (Codd et al., 1989; Maron et al., 1995). Mutations in a handful of autosomal genes encoding heart-specific sarcomere proteins account for approximately 50% of HCM cases (Burke et al., 2016), suggesting that sex differences in the function of the healthy heart, upon interaction with these mutations, drive the observed difference in HCM incidence. Indeed, healthy males show greater left ventricle mass than females at baseline (Giovanni et al., 1995) and a greater hypertrophic response to strenuous endurance training (Howden et al., 2015). In mice, mutation of the cardiac myosin heavy chain gene recapitulates the male bias in onset and severity observed in humans (Geisterfer-Lowrance et al., 1996), but studies of cardiac adaptation to exercise indicate an increased hypertrophic response in females, the opposite of that observed in humans (Foryst-Ludwig et al., 2011). Nevertheless, right ventricular pacing at a high heart rate led to greater hypertrophy in males than in females (Kiczak et al., 2015). Thus, important aspects of cardiac sex differences are conserved between human and mouse, most clearly a greater susceptibility of males to cardiomyopathy.

Metabolism

Males and females differ substantially in energy storage and usage, at both the systemic and cellular levels. In humans, there are well-known sex differences in total adiposity and the distribution of white adipose tissue: women have greater relative fat mass than men, driven by a greater propensity to store adipose tissue in subcutaneous depots, as compared to the male propensity to store visceral adipose tissue (Fiore et al., 1986). These differences are broadly recapitulated in mice, as females have greater relative fat mass than males. Although the distribution of body fat generally differs between the species, female mice have greater deposits of subcutaneous inguinal fat, whereas males have greater visceral gonadal fat deposits (Chen et al., 2012). In addition, in both humans and mice, females have more active brown adipose tissue than males (Cypess et al., 2009; Oliver et al., 2002). Physiological studies in humans have shown that females are more insulin sensitive than males (Frias et al., 2001), and studies of isolated cells in mice have shown this difference to hold at the level of adipocytes themselves (Macotela et al., 2009). These systemic and cell-intrinsic metabolic differences likely contribute to sex differences in conditions such as obesity, which is more prevalent in women (Ng et al., 2014), and both type 1 (Östman et al., 2008) and type 2 diabetes (Ogurtsova et al., 2017), which are more prevalent in men.

An additional aspect of metabolism that differs significantly between the sexes is drug clearance. Some of the best-known examples in humans are that males clear acetaminophen (Miners et al., 1983) and various benzodiazepines (a class of anti-anxiety medications) (Bigos et al., 2008; Macleod et al., 1979) faster than females. In contrast, the antibiotic erythromycin (Austin et al., 1980) and the antihypertensive drug nifedipine (Krecic-Shepard et al., 2000) are cleared faster in women. Outside of humans, these sex differences have been best studied in the rat, as it displays some extreme sex differences conserved with humans: male rats metabolize diazepam ~10 times faster than female rats (Nau & Liddiard, 1980). However, studies in rats also suggest important differences from humans, as liver microsomes from males metabolize nifedipine faster than those from females (Niwa et al., 1995), the opposite direction of bias to that seen in humans. Furthermore, studies in both mice and rats have shown no significant sex difference in acetaminophen metabolism (Dai et al., 2006; Tarloff et al., 1996). Interestingly, studies of liver microsomes from hamsters, another rodent, revealed a female bias in erythromycin clearance, similar to humans (Miura et al., 1988).

This section is by no means an exhaustive summary of phenotypic sex differences. For example, human males have an approximately 20% greater risk than females of developing any cancer, while some individual cancer types show a greater than two-fold male bias in incidence (Edgren et al., 2012). Furthermore, sex differences in brain morphology have been observed in both humans (Ruigrok et al., 2014) and rodents (Gorski et al., 1978), with mice also displaying distinct sexually dimorphic behaviors under hormonal control (Wu & Shah, 2011). However, the examples described above, from the reproductive tract and the cardiovascular, immune, and metabolic systems, illustrate the phenotypic breadth of mammalian sex differences. Besides humans, phenotypic sex differences have been studied mostly in rodents, owing to the ease of experimental manipulation. Therefore, the full evolutionary history of many sexually dimorphic phenotypes is unknown, and can be determined only by further study in a diverse array of species spanning the breadth of the mammalian phylogenetic tree.

Part 3. Sex-biased gene expression as an intermediary from sex chromosomes to phenotypic sex differences

Thus far in this introductory chapter, I have discussed the evolution of the mammalian and avian sex chromosomes, the ultimate source of differences between males and females, followed by phenotypic and physiological sex differences across the body, emphasizing that sex differences beyond the reproductive tract are present in a number of mammalian species, not just humans. A natural question, then, is how the sex chromosomes ultimately lead to phenotypic sex differences in health and disease. In general, phenotypic variation both within and between species is thought to arise largely from gene expression differences resulting from genetic variation in cis-regulatory sequences (Carroll, 2008; Ramos et al., 2009). While it is now appreciated that the Y chromosome carries additional genes of some importance, the total number of protein-coding genes is modest (~27 in humans, depending on how genes are counted) (Skaletsky et al., 2003). The X chromosome, which can contribute to sex differences through escape from X inactivation, has a greater number of genes (700-1,000) (Ross et al., 2005), but genetic studies have shown that X-linked genetic variation explains only ~1% of total phenotypic variance for a number of complex traits (J. Yang et al., 2011). Therefore, it is unlikely that the direct, cell-intrinsic effects of sex-linked genes could fully account for the numerous phenotypic sex differences observed in mammals, which calls for a model that integrates sex differences in autosomal gene expression.

In this section, I conceptualize and describe such a pathway connecting genetic sex differences to sex differences in health and disease. The basics of the pathway are as follows. First, the sex chromosomes lead to genome-wide sex differences in gene expression through a combination of sexually dimorphic hormonal signaling emanating from the gonads and cell-intrinsic effects of sex-linked genes acting outside the reproductive tract. Next, sex-biased gene expression leads to the observed phenotypic and physiological sex differences, which, when pushed over disease thresholds by environmental or genetic insults, finally result in sex differences in disease incidence, prevalence, or severity (Figure 1.2). This pathway is a simplifying model, as there is likely feedback between the steps of the linear progression described. For example, sex-linked genes could contribute to sex differences in hormone signaling through cell-intrinsic effects, and vice versa. Furthermore, phenotypic sex differences likely also feed back onto sex-biased gene expression; for example, males being larger on average could place different strains on the male heart than on the female heart, which would presumably result in some sex differences in gene expression. Nevertheless, regardless of such subtleties, sex-biased gene expression remains a central mediator of the effects, cell-intrinsic or hormonally mediated, of sex chromosomes on sex differences in mammalian phenotypes and physiology.


Figure 1.2. A pathway from sex chromosomes to sex differences in health and disease.

Prior studies of sex bias in autosomal gene expression

Alongside discoveries regarding the existence of genes on the Y chromosome and the escape of genes from X chromosome inactivation, it was apparent (and shown in some cases) that such genes would show sex-biased expression: male bias of Y-linked genes by virtue of being on a male-specific chromosome, and female bias of X-linked genes escaping X inactivation as a result of two transcriptionally active copies in females and only one in males.

Around the time that these discoveries were being made, sex differences in the mRNA levels of autosomal genes in tissues outside the reproductive tract were also being discovered, sometimes concurrently with the cloning of the genes themselves (Hastie et al., 1979). Even in these early studies, conducted mostly in liver tissue, it was apparent that some sex biases in expression outside the reproductive tract were conserved. For example, major urinary protein was shown to be strongly male-biased in mouse and rat, while prolactin activity was strongly female-biased in rat, rabbit, and guinea pig (Posner et al., 1974). However, the search for sex-biased gene expression was limited by technology: with so many genes in the genome, there was no unbiased way to search for those with sex-biased expression.

The advent of techniques for genome-wide gene expression profiling, first by microarray hybridization and then by RNA sequencing (RNA-seq), allowed researchers to assess global patterns of sex-biased gene expression. The first studies to do so focused on Drosophila, in which sex-biased gene expression was found to be both widespread across the genome (W. Jin et al., 2001) and remarkably different between otherwise closely related species, implying changes in the magnitude and direction of sex-biased gene expression over short evolutionary timespans (Ranz et al., 2003; Zhang et al., 2007). However, these and other studies used RNA extracted from either the gonads or the entire body of the fly; therefore, most of the sex biases described by these studies originated from the reproductive tract. As discussed in Part 2 (Phenotypic and physiological sex differences in mammals – Reproductive tract), these differences are expected to be extensive owing to the cellular differentiation of male and female gonadal tissue.

Soon after these initial studies, researchers began applying microarray techniques to study sex-biased gene expression globally in non-reproductive mammalian tissues. The first major study in this space assessed sex-biased gene expression in mouse liver, kidney, and hypothalamus, in addition to the gonads, finding a relatively small number of genes with sex differences in expression in the liver and kidney (six and 20, respectively), involved in drug and steroid metabolism (Rinn et al., 2004). While this small number might have been surprising, it was likely because Rinn et al. focused on sex-specific expression (i.e., genes expressed almost exclusively in males or in females). Furthermore, the use of RNA pooled from multiple individuals as biological replicates may have masked subtler variation by sex. A later study of sex-biased gene expression in mouse liver, muscle, and adipose tissue, with a much larger sample size, found extremely widespread sex bias that was mostly tissue-specific (Yang et al., 2006). Importantly, while a handful of genes showed nearly sex-specific expression, most sex bias was of smaller magnitude (less than a two-fold difference between the sexes).
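To make the distinction between sex-specific and subtler sex-biased expression concrete, the Python sketch below computes, for a single gene, the log2 fold change of mean male over mean female expression and a Welch t statistic. It is an illustration only: the function names, the pseudocount, the two-fold cutoff (|log2 fold change| ≥ 1), and the expression values are hypothetical, and published analyses typically fit dedicated statistical models to normalized counts and correct for multiple testing. The t statistic also shows why individual replicates matter: a single pooled sample per sex provides no estimate of within-sex variance.

```python
import math
from statistics import mean, variance


def sex_bias(male, female, pseudocount=1.0):
    """Return (log2 fold change male/female, Welch t statistic) for one gene.

    `male` and `female` are per-individual expression values in hypothetical
    normalized units. At least two values per sex are required, because
    variance cannot be estimated from a single (e.g. pooled) sample.
    """
    if len(male) < 2 or len(female) < 2:
        raise ValueError("need at least two individuals per sex")
    m, f = mean(male), mean(female)
    # The pseudocount avoids division by zero for genes silent in one sex.
    log2fc = math.log2((m + pseudocount) / (f + pseudocount))
    se = math.sqrt(variance(male) / len(male) + variance(female) / len(female))
    if se == 0:
        t = 0.0 if m == f else math.copysign(math.inf, m - f)
    else:
        t = (m - f) / se
    return log2fc, t


def at_least_two_fold(log2fc):
    """Hypothetical cutoff: True if mean expression differs at least two-fold."""
    return abs(log2fc) >= 1.0


# Hypothetical values for two genes (not real measurements).
subtle = sex_bias([110, 120, 130], [90, 100, 110])  # ~1.2-fold male bias
specific = sex_bias([0, 1, 0], [200, 250, 300])  # nearly female-specific
print(at_least_two_fold(subtle[0]), at_least_two_fold(specific[0]))  # False True
```

The first gene differs by only ~1.2-fold yet has a sizeable t statistic because within-sex variation is small, which is the kind of subtle bias that pooling or a focus on sex-specific genes can miss.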

Because of the inherent variability, both biological and technical, in assaying gene expression in human samples, quantifying the extent of sex-biased expression in human tissues has proved more challenging. Initial studies did not detect substantial sex-biased expression outside of genes on the X or Y chromosomes (Isensee et al., 2008; Vawter et al., 2004; Welle et al., 2008), but better-powered studies found substantial sex-biased expression in individual tissues such as liver (Zhang et al., 2011) and brain (Reinius et al., 2008). Interestingly, both of these studies, which compared human sex bias to that in mouse and in other primates, respectively, found evidence of conservation of sex bias, extending earlier single-gene observations of sex-biased expression in multiple species to suggest a wider signature of conserved sex-biased gene expression. At the same time, these studies noted cases where sex bias was lineage-specific, either in its direction (i.e., male-biased in one species and female-biased in another) or in its presence (i.e., sex-biased in only one or a subset of species). However, these studies did not systematically assess whether most sex bias in the liver or brain was conserved or lineage-specific.

The advent of RNA sequencing, in which RNA fragments are reverse transcribed to cDNA and then sequenced in a high-throughput manner, provided several advantages over microarrays: more accurate estimation of relative expression levels with reduced technical noise, less biased assessment of splicing, and easier comparison across species, which in microarray experiments was hindered by differential hybridization rates. Researchers soon took advantage of these features, with one of the first such studies being a comparison of sex-biased expression and splicing in human, chimpanzee, and macaque livers (Blekhman et al., 2010). While studies thus far had focused on a fairly limited set of tissues, the Genotype-Tissue Expression (GTEx) Consortium collected up to 44 different post-mortem tissues from 499 human individuals, yielding a total of 7,051 samples. This permitted surveys of sex-biased expression across an unprecedented range of tissues, with the overall finding that most sex bias is tissue-specific (Gershoni & Pietrokovski, 2017).

Upstream causes of sex-biased gene expression: sexually dimorphic hormonal environments

It is well appreciated that androgens secreted from the testis and estrogens secreted from the ovary are the primary hormonal drivers of sexual dimorphism across the body. One caveat is that androgens and estrogens are not strictly male- and female-specific hormones, respectively; small amounts of androgens are produced by the adrenal gland in both sexes (Stewart & Krone, 2016), and the ovary also produces some androgens (Bulun, 2016). Furthermore, androgens can be converted to estrogens by an enzymatic reaction called aromatization, leading to a small amount of functional estrogen in males (Flores et al., 1973; Naftolin et al., 1971, 1972). Androgens and estrogens each refer to a class of closely related molecules; for example, while the major androgen is testosterone, its derivative dihydrotestosterone binds much more strongly to the androgen receptor (Tóth & Zakar, 1983).

There are three major types of estrogens: estrone, estradiol, and estriol. Estradiol is the predominant estrogen during most of the reproductive lifespan, but estrone dominates after menopause (Judd et al., 1976) and estriol during pregnancy (Tulchinsky et al., 1972). While the major effects of sex hormones manifest during puberty, there is also a short burst of activity late in prenatal development or immediately after birth in both males and females, a phenomenon termed “minipuberty” that has been observed in humans, rodents, and horses (Chellakooty et al., 2003; Corbier et al., 1992; Forest et al., 1974). These exceptions notwithstanding, the effects of androgens and estrogens on sex differences can be broadly divided into two types of mechanisms: 1) direct effects of circulating androgens or estrogens on transcription and signaling in cells and tissues across the body, and 2) androgens and estrogens establishing sexual dimorphism in additional hormonal axes, such as the hypothalamic-pituitary-adrenal (HPA) and growth hormone (GH) axes, which then regulate tissues in a sexually dimorphic manner.

The primary mechanism of action for both androgens and estrogens is binding to cytosolically located receptors: the androgen receptor (AR), encoded by the AR gene, and estrogen receptors (ER) alpha and beta, encoded by the ESR1 and ESR2 genes, respectively. Once bound, these hormone receptors can translocate to the nucleus, where they associate with short, palindromic sequence motifs termed androgen (Ham et al., 1988) or estrogen (Klein-Hitpaß et al., 1986) response elements (AREs, EREs). Genome-wide studies of AR and ER binding by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) indicate that, as for almost all transcription factors, the presence of AREs or EREs is neither necessary nor sufficient for binding; rather, binding is determined by the higher-order chromatin state and interactions at each locus. However, because mutations of AR and ER are crucial determinants in the incidence and severity of prostate and breast cancer, respectively, such studies have been largely performed in prostate (J. S. Carroll et al., 2007; L. Yang et al., 2013; Yu et al., 2010) or breast cancer (Chew et al., 2009; Hurtado et al., 2011; Palmieri et al., 2012) cell lines or tissues.

Nevertheless, these studies have collectively found that AR and ER activate some genes and repress others through the recruitment of transcriptional coactivators or corepressors. Because the transcriptional programs regulated by AR and ER are likely highly tissue-specific, studies in other non-reproductive tissues assessing the contribution of sexually dimorphic transcriptional regulation by hormones to sex-biased gene expression have relied on sequence matches to AREs and EREs as a proxy (Blekhman et al., 2010; Roberts et al., 2016). Such approaches have suggested that AR- and ER-mediated regulation of transcription is at most a weak contributor to sex-biased gene expression, but alternate approaches to identify AR- or ER-regulated genes in a tissue-specific manner may yield different insights.
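Studies that use sequence matches to AREs and EREs as a proxy for hormone-receptor regulation scan candidate regulatory sequences for these motifs. The Python sketch below shows the simplest form of such a scan, assuming the commonly cited consensus ERE GGTCAnnnTGACC and, for AREs, one commonly cited consensus, AGAACAnnnTGTTCT; it counts mismatches against the consensus, treating n as any base. It is not the method of the cited studies, which may use position weight matrices and statistical thresholds, and the function name and example sequences are hypothetical.

```python
ERE = "GGTCANNNTGACC"  # assumed consensus estrogen response element
ARE = "AGAACANNNTGTTCT"  # assumed consensus androgen response element


def find_motif(seq, consensus, max_mismatches=0):
    """Return 0-based start positions where `seq` matches `consensus`.

    'N' in the consensus matches any base. Both consensus motifs above are
    palindromic (equal to their own reverse complement), so mismatch counts
    are identical on either strand and one forward-strand pass suffices.
    """
    seq = seq.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Count positions where a fixed consensus base is not matched.
        mismatches = sum(
            1 for base, want in zip(window, consensus) if want != "N" and base != want
        )
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


# Made-up sequence containing one perfect ERE starting at position 4.
promoter = "ttacGGTCAgcaTGACCttag"
print(find_motif(promoter, ERE))  # [4]
print(find_motif(promoter, ARE))  # []
```

Allowing even one mismatch (`max_mismatches=1`) greatly increases the number of candidate sites in real genomic sequence, which is one reason motif matches are only a rough proxy for receptor binding.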

A secondary mechanism of action for androgens and estrogens does not involve transcriptional regulation, but rather rapid intracellular signaling responses (the “non-genomic” pathway). Here, hormone-bound AR or ER can directly participate in signaling cascades without regulating transcription. Both androgens and estrogens have been shown to regulate calcium mobilization (Improta-Brears et al., 1999; Lieberherr & Grosse, 1994), cyclic AMP production (Aronica et al., 1994; Shakil et al., 2002), and multiple members of the mitogen-activated protein kinase (MAPK) pathway involved in cell proliferation (Kousteni et al., 2001; Migliaccio et al., 2000). It is likely that both intracellular and membrane-bound AR and ER mediate many of these non-genomic functions (Benten et al., 1999a; Pappas et al., 1995), but other studies indicate that non-genomic functions can take place in cells lacking AR or ER expression (Benten et al., 1999b; Toran-Allerand et al., 2002). By activating various signaling pathways in a sexually dimorphic manner, the non-genomic pathway of androgens and estrogens could still contribute to sex-biased gene expression. However, the relative contributions of the genomic and non-genomic pathways to sex-biased gene expression are unknown.

Starting in the 1960s, a large number of studies in rodents have elucidated the mechanisms by which sexually dimorphic GH signaling, initiated by sexually dimorphic gonadal hormones, results in sex-biased gene expression in the liver. The pituitary secretes GH in a sexually dimorphic pattern: secretion is pulsatile in males and continuous in females. This pattern is clearest in rodents (Jansson et al., 1985), but there is evidence of such sexual dimorphism in adult humans as well (Hindmarsh et al., 1999; Jaffe et al., 1998). Studies in rodents indicate that both sexes show pulsatile GH secretion throughout puberty, with females having a higher baseline than males, and that the stereotypical sexual dimorphism in pulsatility only becomes apparent well after puberty (Gabriel et al., 1992). Sexually dimorphic GH secretion patterns are due to a combination of sexually dimorphic signaling from the hypothalamus to the pituitary and intrinsic sexual dimorphism in the pituitary’s sensitivity to hypothalamic signals, both of which result from organizational effects of androgens and estrogens during development (Tannenbaum & Painson, 1991). Sexually dimorphic GH is both necessary and sufficient for a large fraction of sex-biased gene expression in both mouse (Wauthier et al., 2010) and rat (Wauthier & Waxman, 2008) liver, as shown by removal of the pituitary followed by treatment with pulsatile or continuous GH in either sex. The transcription factor STAT5b is a key mediator of the effect of sexually dimorphic GH on sex-biased gene expression: in mouse knockout models, a majority of sex-biased gene expression is lost (Clodfelter et al., 2006; Laz et al., 2007). High levels of GH activate STAT5b through phosphorylation and nuclear localization; STAT5b activity levels thus mirror GH levels, resulting in pulsatile activity in males and continuous activity in females (Choi & Waxman, 2000). STAT5b shows sex-biased binding at hundreds of genes; generally, male-biased STAT5b binding correlates with male-biased expression, while female-biased STAT5b binding correlates with female-biased expression, indicating that the direct effects of STAT5b on sex-biased gene expression are largely mediated through transcriptional activation (Zhang et al., 2012). However, the mechanism by which pulsatile versus continuous STAT5b activity produces sex differences in STAT5b binding across the genome is unknown. Furthermore, the effect of sexually dimorphic STAT5b signaling on sex-biased gene expression is enhanced by additional sex-biased, STAT5b-dependent transcription factors. For example, continuous STAT5b activity in female liver represses BCL6, resulting in its male-biased expression; BCL6 is a transcriptional repressor that competes with STAT5b for binding in both sexes, but to a greater extent in males and specifically at female-biased genes (Laz et al., 2009; Zhang et al., 2012). The transcription factor Cux2 essentially plays the opposite role to BCL6, showing female-specific expression as a result of repression by pulsatile STAT5b activity in males (C.-S. Chen et al., 2007). In addition to repressing male-biased genes in female liver (mirroring the role of BCL6 in male liver), Cux2 also activates female-biased genes. Together, these detailed studies indicate that the indirect effects of gonadal hormones on the liver (through sexually dimorphic GH patterns) modulate a hierarchical transcriptional network that accounts for a large proportion of sex-biased gene expression.

Whether this type of regulatory logic applies to other tissues or species remains to be seen.
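The inference drawn from the STAT5b binding studies rests on a simple sign argument: if STAT5b acts mainly as an activator, genes with male-biased binding should tend to show male-biased expression, and likewise for female-biased genes. A minimal sketch of that check follows, assuming per-gene log2(male/female) values for binding and for expression are already available; the function name and the toy numbers are hypothetical and are not data from Zhang et al. (2012), whose actual analysis may differ.

```python
def sign_concordance(binding_log2fc, expression_log2fc):
    """Fraction of genes whose sex bias in STAT5b binding and in expression share a direction.

    Both inputs are per-gene log2(male/female) values in the same gene order.
    Genes with exactly zero bias in either measurement are skipped.
    A fraction near 1 is what predominantly activating binding would predict;
    a fraction near 0 would instead point to predominantly repressive binding.
    """
    if len(binding_log2fc) != len(expression_log2fc):
        raise ValueError("inputs must have the same length")
    pairs = [(b, e) for b, e in zip(binding_log2fc, expression_log2fc) if b != 0 and e != 0]
    if not pairs:
        raise ValueError("no genes with nonzero bias in both measurements")
    concordant = sum(1 for b, e in pairs if (b > 0) == (e > 0))
    return concordant / len(pairs)


# Hypothetical toy values for five genes (positive = male-biased).
binding = [1.2, 0.8, -1.5, -0.6, 0.4]
expression = [2.0, 1.1, -2.3, -0.9, -0.2]
print(sign_concordance(binding, expression))  # 4 of 5 genes concordant: 0.8
```

In this toy set the last gene has male-biased binding but female-biased expression, the pattern that a repressive mechanism such as the BCL6 competition described above could produce.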

A third major hormonal axis is the hypothalamic-pituitary-adrenal (HPA) axis, a neuroendocrine system that primarily functions in the response to various stressors. The hypothalamus releases corticotropin-releasing hormone (CRH) and arginine vasopressin (AVP), which synergistically stimulate the release of adrenocorticotropic hormone (ACTH) from the pituitary. ACTH in turn leads to the synthesis and release of glucocorticoids from the adrenal gland (Goel et al., 2014). While the best-studied effects of glucocorticoids are on metabolism, where they stimulate glucose production through effects on the liver, muscle, and adipose tissue, glucocorticoids are now appreciated to have wide-ranging effects. These are mediated through binding to their cognate receptors, which are expressed in virtually all tissues (Carroll et al., 2017), and subsequent localization to the nucleus, where the receptors regulate transcription. Sex differences in the HPA axis manifest most strongly in the response to stress; in rodents, females show more rapid and greater increases in glucocorticoid levels than males in response to a range of stressors (Frederic et al., 1993; Kant et al., 1983). This sex difference in HPA responsivity is broadly recapitulated in humans, with some exceptions. No sex difference in cortisol levels was observed after endurance exercise (Friedmann & Kindermann, 1989), but both cold exposure (Gerra et al., 1992) and pharmacological stimulation (Gallucci et al., 1993; Uhart et al., 2006) of the HPA axis yield increased cortisol responses in females relative to males. Notably, this sexual dimorphism in response to direct HPA stimulation has also been observed in cattle, suggesting deep evolutionary conservation (Hulbert et al., 2012).

Even under basal conditions, female rats show higher glucocorticoid pulse frequency and pulse amplitude than males (Seale et al., 2004). Studies in rodents based on removal of the gonads (Critchlow et al., 1963; Gaskin & Kitay, 1971), correlation with the estrous cycle in females (Carey et al., 1995), or treatment with sex hormones (Handa et al., 1994; Weiser & Handa, 2009) have suggested that opposing effects of androgens and estrogens partially explain this sex difference: androgens inhibit the HPA axis, while estrogens both stimulate basal HPA activity and potentiate the response to stress. While no studies have yet tied sex differences in HPA activity to specific sex differences in gene expression, the glucocorticoid receptor binds to the same DNA elements as the androgen receptor (Scheidereit et al., 1983), raising the possibility that the sexually dimorphic response to stress, mediated through the HPA axis, could exaggerate preexisting sex biases due to androgens.

Upstream causes of sex-biased gene expression: sex chromosome complement outside the reproductive tract

For many years, the prevailing view was that outside the reproductive tract, essentially all sex-biased gene expression could be explained by the direct or indirect effects of androgens in males and estrogens in females. However, with an emerging understanding of the exceptions to the rules of sex chromosome evolution, namely survival on the Y chromosome and escape from X inactivation, it became appreciated that genes on the sex chromosomes could have cell-intrinsic effects on sex differences throughout the body. For genes without a Y-linked homolog, escape from X inactivation presents a relatively simple mechanism: because two active copies are expressed in females and only one in males, genes escaping X inactivation will show female-biased activity, and this female bias could then have regulatory effects on the rest of the genome.
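The dosage argument for escape genes can be sketched with a minimal calculation of the expected female:male expression ratio for an X-linked gene with no Y homolog. The sketch assumes, hypothetically, that the active X contributes one unit of expression in both sexes and that the inactive X in females contributes a fraction `escape_fraction` of that amount; the function name and parameter are illustrative and not taken from any cited study. Under these assumptions the female bias can never exceed two-fold, and partial escape yields a smaller bias.

```python
import math


def female_to_male_ratio(escape_fraction: float) -> float:
    """Expected female:male expression ratio for an X-linked gene without a Y homolog.

    Hypothetical model: the active X (Xa) contributes 1.0 unit in both sexes, and
    the inactive X (Xi) in females contributes `escape_fraction` units
    (0 = fully silenced, 1 = fully escaping).
    """
    if not 0.0 <= escape_fraction <= 1.0:
        raise ValueError("escape_fraction must lie between 0 and 1")
    female = 1.0 + escape_fraction  # Xa plus the (partially) escaping Xi copy
    male = 1.0                      # a single Xa
    return female / male


for f in (0.0, 0.25, 0.5, 1.0):
    ratio = female_to_male_ratio(f)
    print(f"escape={f:.2f}  F/M={ratio:.2f}  log2(F/M)={math.log2(ratio):.2f}")
```

For example, an escape fraction of 0.5 gives a 1.5-fold female bias, or about 0.58 on a log2 scale; complete escape gives the two-fold ceiling.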

For genes with a surviving Y homolog, there are multiple ways in which such X-Y pairs could contribute to sex-biased gene expression across the genome. The simplest model is that the Y-linked homologs, some time after the suppression of recombination, evolved new functions distinct from those of their X-linked counterparts, and these functions are thus male-specific. However, there is to date no evidence of a broadly expressed Y-linked homolog with a demonstrated function different from that of its X-linked counterpart. While the absence of evidence does not rule out Y-specific functions, the current evidence suggests that they may not be a significant contributor to sex-biased gene expression. The second class of models requires that Y-linked homologs are either functionally interchangeable with their X-linked counterparts or have somewhat degenerated in function. If Y-linked homologs have retained the same protein functions as their X-linked counterparts, upregulation of the Y homolog relative to the X would create a higher dosage of the X-Y pair in males than in females, where two active X-linked copies would be expressed. Alternatively, either lower expression of a functionally equivalent Y-linked protein or loss of Y-linked protein function would result in greater activity of the X-Y pair in females, because the X-linked homologs escape X inactivation. This second class of models arguably has greater support in the literature, as the X-linked homologs of X-Y pairs escape X inactivation consistently across species and in a wide range of tissues. Furthermore, there is evidence that individual X-Y pairs are functionally interchangeable (Sekiguchi et al., 2004; Wang et al., 2015; Watanabe et al., 1993), or that the Y-linked homologs have partially reduced functionality relative to their X-linked counterparts (Shpargel et al., 2012, 2017; Walport et al., 2014).
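The dosage-based models for X-Y pairs can be made concrete with a toy calculation. The sketch below treats the activity of an X-Y pair as the sum of its copies, each weighted by relative expression and, for the Y homolog, by relative protein function. The class, its fields, and the numeric scenarios are hypothetical, chosen only to show the direction of the predicted sex bias rather than to estimate it for any real gene. By default the female inactive X is assumed to be fully expressed (`xi_escape=1.0`), matching the case of two active X-linked copies.

```python
from dataclasses import dataclass


@dataclass
class XYPair:
    """Hypothetical dosage model of an X-Y gene pair (all values relative to one active X copy)."""
    y_expression: float      # Y-homolog expression relative to the X homolog on the active X
    y_function: float        # activity per Y protein molecule relative to the X protein (0..1)
    xi_escape: float = 1.0   # expression from the inactive X in females relative to the active X

    def male_activity(self) -> float:
        # One active X plus the Y homolog, weighted by its relative function.
        return 1.0 + self.y_expression * self.y_function

    def female_activity(self) -> float:
        # Active X plus the copy escaping inactivation on the inactive X.
        return 1.0 + self.xi_escape

    def male_to_female_ratio(self) -> float:
        return self.male_activity() / self.female_activity()


scenarios = {
    "upregulated, interchangeable Y": XYPair(y_expression=1.5, y_function=1.0),
    "lower expression of interchangeable Y": XYPair(y_expression=0.5, y_function=1.0),
    "partially degenerated Y protein": XYPair(y_expression=1.0, y_function=0.5),
}
for name, pair in scenarios.items():
    ratio = pair.male_to_female_ratio()
    bias = "male-biased" if ratio > 1 else "female-biased" if ratio < 1 else "unbiased"
    print(f"{name}: M/F = {ratio:.2f} ({bias})")
```

With these illustrative values the first scenario gives M/F = 1.25 (male-biased) and the other two give M/F = 0.75 (female-biased), mirroring the qualitative predictions in the text. Only the product of Y expression and Y function enters the male total, which is why reduced expression and reduced function are indistinguishable in this simple model.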

The degree to which the two classes of genes described above (X-linked genes without a Y homolog that escape X inactivation, and X-Y pairs) contribute to sex differences in gene expression depends on two important factors: their genome-wide gene regulatory functions and their dosage sensitivity. Relative to X-linked genes without a Y homolog (both those escaping and those subject to X inactivation), X-Y pairs are particularly enriched for regulatory functions such as transcription, splicing, and translation, suggesting that they may contribute disproportionately to sex differences. A number of studies have characterized regulatory targets of individual X-Y pairs using assays such as chromatin immunoprecipitation (to map DNA binding of KDM6A (Mansour et al., 2012), KDM5C (Holstege et al., 2013), and ZFX (Schreiner et al., 2018)) or crosslinking immunoprecipitation (to map RNAs bound by DDX3X (Nostrand et al., 2017; J. Zhang et al., 2016)) followed by high-throughput sequencing, finding that they regulate a large number of genes across the genome. Dosage sensitivity matters because some of the contributions to sex-biased gene expression described above rest on relatively small differences in the dosage of X-linked homologs between males and females; most sex bias resulting from escape from X inactivation in human tissues is less than two-fold in magnitude (Tukiainen et al., 2017). As mentioned in Part 1, X-Y pairs are enriched for markers of dosage sensitivity relative to all other ancestral X-linked genes without a surviving Y homolog. Thus, two lines of evidence, regulatory function and dosage sensitivity, suggest that X-Y pairs may be significant contributors to sex bias in autosomal gene expression.

Linking sex-biased gene expression to phenotypic sex differences

Thus far, I have described how both sexually dimorphic hormonal environments and cell-intrinsic regulatory roles of the sex chromosomes could lead to sex bias in autosomal gene expression. An important next step in this pathway is the link from sex-biased gene expression to phenotypic sex differences. Conceptually, this requires evidence for a causal relationship between the expression level of the gene that shows sex bias and the sex-biased phenotype of interest. In this section, I discuss known cases where sex-biased expression of specific genes has been shown to be a likely contributor to sex differences in a phenotype, using examples from drug metabolism and autoimmunity.

One well-understood example is the case of sex differences in hepatic drug metabolism; as discussed in Part 2, male mice and rats metabolize a number of drugs faster than females and thus require a higher dose to achieve the same effect. A number of drug-metabolizing enzymes, primarily a subset of the cytochrome P450 enzymes, show practically sex-specific expression patterns in the liver, with almost no expression in one sex and high expression in the other (Laz et al., 2006; Wauthier & Waxman, 2008). While this link is evident in rodents, which show stark sex differences both in drug metabolism and in expression of the relevant P450 enzymes, the relationship is less clear in other species such as humans, which show much more subtle sex differences in both P450 expression and drug metabolism. Nevertheless, even in humans, continuous administration of growth hormone, which is characteristic of a female hormonal environment and has been shown to cause sex-specific P450 enzyme expression in rodents, increases the activity of CYP3A4, an important catalyst of oxidative metabolism in the liver (Jaffe et al., 2002).

Recently, sex-biased expression of a number of individual genes has been linked to sex differences in immune phenotypes, with special relevance to autoimmune disease. A study of sex bias in human skin and keratinocytes identified female-biased expression of the autosomal transcription factor VGLL3 as an important mediator of sex-biased expression across the genome. While VGLL3 itself was not associated with immune phenotypes or autoimmune disease, it was shown to positively regulate several additional genes across the genome, resulting in their female-biased expression. Notably, a number of these female-biased targets of VGLL3 have important roles in female-biased autoimmune disorders. Administration of exogenous IL-7 in mice accelerates the development of Sjögren’s syndrome-like symptoms, while blocking IL-7 suppresses the same symptoms (Jin et al., 2013). In humans, a therapeutic antibody that neutralizes the cytokine TNF ligand superfamily member 13B (TNFSF13B, also known as B-cell-activating factor of the TNF family, or BAFF) is an effective therapy for systemic lupus erythematosus (Vincent et al., 2014). A second study, in a mouse model of multiple sclerosis, showed that male-biased expression of Il33 in mast cells following immunization preferentially attenuates the proinflammatory, anti-myelin response, resulting in a less severe response to immunization in males (Russi et al., 2018). Finally, in an example that illustrates how sex-biased gene expression emanating from the sex chromosomes can directly affect phenotypic sex differences, the Toll-like receptor TLR7 was recently shown to escape X inactivation in a range of immune cell types, leading to higher TLR7 dosage in females than in males. In vitro studies of B lymphocytes indicated that female cells in which TLR7 escapes X inactivation show a greater proliferative and immunoglobulin class-switch response to TLR7 ligands (Souyris et al., 2018). Recognition of self ribonucleoproteins is a hallmark of systemic lupus erythematosus, a highly female-biased disease, and increased Tlr7 dosage in mice accelerates both B cell autoreactivity and lupus progression (Pisitkun et al., 2006; Subramanian et al., 2006). Since TLR7 lacks a surviving Y homolog, this example illustrates how sex chromosomes can contribute directly to phenotypic sex differences, without the intermediate of sex-biased autosomal gene expression.

In summary, studies thus far have identified a number of cases where sex biases in the expression of individual genes likely contribute to known sex differences in a range of phenotypes. However, many complex traits and common diseases that also show some degree of sex bias, such as height and multiple sclerosis, are highly polygenic, with many if not most genes in the genome exerting some small effect on phenotypic variation or disease risk (Boyle et al., 2017). Previous studies have not taken advantage of this polygenic architecture when connecting sex-biased gene expression, which is widespread, to phenotypic sex differences in health and disease.

Evolutionary causes of sex-biased gene expression: sexual conflict and sexually antagonistic selection

In addition to seeking to understand the biochemical and physiological causes of sex-biased gene expression, researchers have also sought to understand its evolutionary causes. It has long been appreciated that, from the perspective of reproductive fitness, males and females can have different optimal values for various traits (Darwin, 1871). This is expected to result in evolutionary pressures for each sex to attain its own optimum, a concept referred to as sexually antagonistic selection, or sexual conflict. Because most of the genome (besides the sex-specific Y or W chromosome) is shared between the sexes, genetic variants that shift a phenotype towards the optimum for one sex can be detrimental to the fitness of the other. This has been demonstrated most clearly through artificial mating schemes in Drosophila, where haploid genomes show opposing effects on reproductive fitness when present in males or females (Chippindale et al., 2001), and through studies of wild red deer, where males with relatively high reproductive fitness fathered daughters with relatively low fitness (Foerster et al., 2007). If sex-biased gene expression ultimately leads to such phenotypic sex differences, then sexual antagonism can be thought of as the evolutionary cause of sex-biased gene expression. Both direct and indirect evidence, mainly from the reproductive tract, support this idea. One such study was an expression analysis of Drosophila lines in which various haploid genomes were expressed in either males or females and reproductive fitness was measured. This design allowed direct identification of sexually antagonistic genes, whose expression levels showed opposite correlations with reproductive fitness in the two sexes, essentially bypassing the phenotypic intermediate described above. This approach identified hundreds of genes that demonstrated both sexually antagonistic and sex-biased expression patterns (Innocenti & Morrow, 2010). A study of sex-biased gene expression in the gonads of six avian species found a correlation between the number of gains or losses of male-biased expression in a species’ lineage and the degree of sexual ornamentation (Harrison et al., 2015). However, outside the reproductive tract, the link between sex-biased gene expression and sexually antagonistic selection is less clear, in part due to the difficulties described above in linking sex-biased gene expression to phenotypic sex differences.
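The logic of the Drosophila expression design, in which a gene is a candidate for sexual antagonism when its expression correlates with fitness in opposite directions in the two sexes, can be sketched as follows. This is a simplified illustration, not the statistical procedure used by Innocenti & Morrow (2010): it assumes per-line expression and fitness measurements in each sex, uses a plain Pearson correlation with an arbitrary threshold (`min_abs_r`, a hypothetical parameter), and the gene names and values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def classify(expr_m, expr_f, fit_m, fit_f, min_abs_r=0.5):
    """Label a gene by comparing its expression-fitness correlations in males and females.

    expr_m / expr_f: the gene's expression in each line, measured in males / females.
    fit_m / fit_f: reproductive fitness of the same lines in males / females.
    """
    r_m = pearson(expr_m, fit_m)
    r_f = pearson(expr_f, fit_f)
    strong = abs(r_m) >= min_abs_r and abs(r_f) >= min_abs_r
    if strong and r_m * r_f < 0:
        return "antagonistic"   # higher expression helps one sex and hurts the other
    if strong:
        return "concordant"     # same direction of effect in both sexes
    return "unclassified"       # correlation too weak in at least one sex


# Hypothetical data for four haploid-genome lines; fitness ranks are reversed between
# the sexes, mimicking a negative intersexual genetic correlation for fitness.
fitness_m = [1.0, 2.0, 3.0, 4.0]
fitness_f = [4.0, 3.0, 2.0, 1.0]
genes = {
    "geneA": ([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]),
    "geneB": ([2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]),
}
for name, (em, ef) in genes.items():
    print(name, classify(em, ef, fitness_m, fitness_f))
```

Here geneA is labeled antagonistic (r = +1 in males, r = -1 in females) and geneB concordant. A real analysis would need many more lines, a proper significance test, and correction for testing thousands of genes.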

Part 4. Concluding remarks

This introductory chapter has outlined and attempted to connect genetic, gene regulatory, and phenotypic sex differences in mammals. Part 1 described the evolution of the mammalian sex chromosomes, the ultimate determinants of all sex differences, using the avian sex chromosomes as an independent experiment of nature. Decades of study have revealed that degeneration of the sex-specific chromosome (the Y in mammals and the W in birds), combined with dosage compensation of the sex-shared chromosome (the X in mammals and the Z in birds), drove the evolution of the sex chromosomes from a pair of ordinary autosomes. However, exceptions to these two rules also occur, partly as a result of gene dosage sensitivity; as described in Part 3, these exceptions may be important contributors to hormone-independent differences between males and females. Part 2 explored some of the known phenotypic sex differences in mammals, reaching the overall conclusion that sex differences are widespread across organ systems and, in some cases, conserved across species. Finally, Part 3 linked these observations into a conceptual model for how sex chromosomes ultimately lead to phenotypic sex differences. A key feature of this pathway is that sex bias in autosomal gene expression mediates much of the effect of the sex chromosomes, either through indirect effects of sexually dimorphic hormonal environments following gonadal differentiation or through direct, cell-intrinsic effects of sex-linked genes.

The three studies described in subsequent chapters of this thesis fall at different points along this proposed pathway. Chapter 2 assesses the role of preexisting heterogeneities in dosage sensitivity in determining the fate of genes on both the mammalian X and avian Z chromosomes. We do so by quantifying the conservation of binding sites for microRNAs (miRNAs), a class of small noncoding RNAs that tune gene expression levels through repressive effects. Our analyses indicate that three classes of X-linked genes (those with a surviving Y homolog, those with no Y homolog that are subject to X inactivation, and those with no Y homolog that escape X inactivation) differed significantly in dosage sensitivity on the ancestral autosomes, suggesting that this preexisting heterogeneity played an important role in determining their ultimate fates. We found X-linked genes with a surviving Y homolog to be the most dosage-sensitive, and those with no Y homolog that escape X inactivation to be the least dosage-sensitive; as discussed at the end of this thesis, this has important implications for future studies of the contribution of these two classes of genes to cell-intrinsic sex differences across the body. In Chapter 3, we describe the generation and analysis of an RNA sequencing dataset to assess genome-wide sex differences in gene expression in twelve tissues in each of five mammalian species: human, cynomolgus macaque, mouse, rat, and dog. While there is conserved sex bias in gene expression in every tissue, most sex bias has been acquired since the last common ancestor of the five species studied, a finding that has important implications for the use of non-human mammals as models of human sex differences. In Chapter 4, we use the results of this survey to show that conserved sex bias in autosomal gene expression contributes to the known sex differences in mammalian height and body size, where males are most commonly larger than females. These results serve as a proof of concept for understanding how sex bias in gene expression contributes to sex differences in complex traits and, given the likely sexual selection acting on height, document an instance where sexual selection results in sex-biased gene expression outside the reproductive tract.

References

Amadori, A., Zamarchi, R., De Silvestro, G., Forza, G., Cavatton, G., Danieli, G. A., Clementi, M., and Chieco-Bianchi, L. (1995). Genetic control of the CD4/CD8 T-cell ratio in humans. Nature Medicine, 1(12), 1279–1283. Aronica, S. M., Kraus, W. L., and Katzenellenbogen, B. S. (1994). Estrogen action via the cAMP signaling pathway: stimulation of adenylate cyclase and cAMP-regulated gene transcription. Proceedings of the National Academy of Sciences of the United States of America, 91(18), 8517–8521. Ashworth, A., Rastan, S., Lovell-Badge, R., and Kay, G. (1991). X-chromosome inactivation

49 may explain the difference in viability of XO humans and mice. Nature, 351(6325), 406– 408. Austin, K. L., Mather, L. E., Philpot, C. R., and McDonald, P. J. (1980). Intersubject and dose- related variability after intravenous administration of erythromycin. British Journal of Clinical Pharmacology, 10(3), 273–279. Bachtrog, D. (2008). The temporal dynamics of processes underlying Y chromosome degeneration. Genetics, 179(3), 1513–1525. Balaton, B. P., Cotton, A. M., and Brown, C. J. (2015). Derivation of consensus inactivation status for X-linked genes from genome-wide studies. Biology of Sex Differences, 6, 35. Baverstock, P. R., Adams, M., Polkinghorne, R. W., and Gelder, M. (1982). A sex-linked enzyme in birds - Z-chromosome conservation but no dosage compensation. Nature, 296(5859), 763–766. Behringer, R. R., Finegold, M. J., and Cate, R. L. (1994). Müllerian-inhibiting substance function during mammalian sexual development. Cell, 79(3), 415–425. Bellott, D. W., Hughes, J. F., Skaletsky, H., Brown, L. G., Pyntikova, T., Cho, T.-J., Koutseva, N., Zaghlul, S., Graves, T., Rock, S., Kremitzki, C., Fulton, R. S., Dugan, S., Ding, Y., Morton, D., Khan, Z., Lewis, L., … Page, D. C. (2014). Mammalian Y chromosomes retain widely expressed dosage-sensitive regulators. Nature, 508(7497), 494–499. Bellott, D. W., Skaletsky, H., Cho, T.-J., Brown, L., Locke, D., Chen, N., Galkina, S., Pyntikova, T., Koutseva, N., Graves, T., Kremitzki, C., Warren, W. C., Clark, A. G., Gaginskaya, E., Wilson, R. K., and Page, D. C. (2017). Avian W and mammalian Y chromosomes convergently retained dosage-sensitive regulators. Nature Genetics, in press. Bellott, D. W., Skaletsky, H., Pyntikova, T., Mardis, E. R., Graves, T., Kremitzki, C., Brown, L. G., Rozen, S., Warren, W. C., Wilson, R. K., and Page, D. C. (2010). Convergent evolution of chicken Z and human X chromosomes by expansion and gene acquisition. Nature, 466(7306), 612–616. Benten, W. P. 
M., Lieberherr, M., Giese, G., Wrehlke, C., Stamm, O., Sekeris, C. E., Mossmann, H., and Wunderlich, F. (1999). Functional testosterone receptors in plasma membranes of T cells. The FASEB Journal, 13(1), 123–133. Benten, W. P. M., Lieberherr, M., Stamm, O., Wrehlke, C., Guo, Z., and Wunderlich, F. (1999). Testosterone Signaling through Internalizable Surface Receptors in Androgen Receptor-free Macrophages. Molecular Biology of the Cell, 10(10), 3113–3123. Berletch, J. B., Ma, W., Yang, F., Shendure, J., Noble, W. S., Disteche, C. M., and Deng, X. (2015). Escape from X Inactivation Varies in Mouse Tissues. PLOS Genetics, 11(3), e1005079. Bigos, K. L., Pollock, B. G., Coley, K. C., Miller, D. D., Marder, S. R., Aravagiri, M., Kirshner, M. A., Schneider, L. S., and Bies, R. R. (2008). Sex, Race, and Smoking Impact Olanzapine Exposure. The Journal of Clinical Pharmacology, 48(2), 157–165. Blekhman, R., Marioni, J. C., Zumbo, P., Stephens, M., and Gilad, Y. (2010). Sex-specific and lineage-specific alternative splicing in primates. Genome Research, 20(2), 180–189. Boyle, E. A., Li, Y. I., and Pritchard, J. K. (2017). An expanded view of complex traits: from polygenic to omnigenic. Cell, 169(7), 1177–1186. Brown, C. J., Ballabio, A., Rupert, J. L., Lafreniere, R. G., Grompe, M., Tonlorenzi, R., and Willard, H. F. (1991). A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature, 349(6304), 38–44. Bulun, S. E. (2016). Chapter 17 - Physiology and Pathology of the Female Reproductive Tract.

50 In S. Melmed, K. S. Polonsky, P. R. Larsen, & H. M. B. T. Kronenberg (Eds.), Williams Textbook of Endocrinology (13th ed., pp. 581–652). Philadelphia: Elsevier. Burke, M. A., Cook, S. A., Seidman, J. G., and Seidman, C. E. (2016). Clinical and Mechanistic Insights Into the Genetics of Cardiomyopathy. Journal of the American College of Cardiology, 68(25), 2871–2886. Carey, M. P., Deterd, C. H., Koning, J. de, Helmerhorst, F., and Kloet, E. R. de. (1995). The influence of ovarian steroids on hypothalamic-pituitary-adrenal regulation in the female rat. Journal of Endocrinology, 144(2), 311–321. Carrel, L., and Willard, H. F. (2005). X-inactivation profile reveals extensive variability in X- linked gene expression in females. Nature, 434(March), 400–404. Carroll, J. S., Chinnaiyan, A. M., Keeton, E. K., Liu, X. S., Li, W., Pienta, K. J., Wang, Q., Jänne, O. A., and Brown, M. (2007). A Hierarchical Network of Transcription Factors Governs Androgen Receptor-Dependent Prostate Cancer Growth. Molecular Cell, 27(3), 380–392. Carroll, S. B. (2008). Evo-Devo and an Expanding Evolutionary Synthesis: A Genetic Theory of Morphological Evolution. Cell, 134(1), 25–36. Carroll, T. B., Aron, D., Findling, J., and Tyrrell, J. (2017). Chapter 9: Glucocorticoids and Adrenal Androgens. In D. G. Gardner, D. M. Shoback, & F. S. Greenspan (Eds.), Greenspan’s Basic and Clinical Endocrinology (10th ed., pp. 322–365). New York: McGraw-Hill Medical. Charlesworth, B. (1994). The effect of background selection against deleterious mutations on weakly selected, linked variants. Genetical Research, 63(3), 213–227. Charlesworth, B., Morgan, M. T., and Charlesworth, D. (1993). The effect of deleterious mutations on neutral molecular variation. Genetics, 134(4), 1289–1303. Chellakooty, M., Schmidt, I. M., Haavisto, A. M., Boisen, K. A., Damgaard, I. N., Mau, C., Petersen, J. H., Juul, A., Skakkebæk, N. E., and Main, K. M. (2003). 
Inhibin A, inhibin B, follicle-stimulating hormone, luteinizing hormone, estradiol, and sex hormone-binding globulin levels in 473 healthy infant girls. Journal of Clinical Endocrinology and Metabolism, 88(8), 3515–3520. Chen, C.-S., Laz, E. V, Holloway, M. G., and Waxman, D. J. (2007). Characterization of Three Growth Hormone-Responsive Transcription Factors Preferentially Expressed in Adult Female Liver. Endocrinology, 148(7), 3327–3337. Chen, X., McClusky, R., Chen, J., Beaven, S. W., Tontonoz, P., Arnold, A. P., and Reue, K. (2012). The number of x chromosomes causes sex differences in adiposity in mice. PLoS Genetics, 8(5), e1002709. Chew, E. G. Y., Joseph, R., Ruan, X., Wei, C.-L., Ho, A., Pan, Y. F., Lim, K. S., Bourque, G., Desai, K. V., Fullwood, M. J., Welboren, W.-J., Lee, Y. K., Choy, P. Y., Karuturi, R. K. M., Liu, E. T., Ruan, Y., Herve, T., … Liu, J. (2009). An oestrogen-receptor-α-bound human chromatin interactome. Nature, 462(7269), 58–64. Chippindale, A. K., Gibson, J. R., and Rice, W. R. (2001). Negative genetic correlation for adult fitness between sexes reveals ontogenetic conflict in Drosophila. Proceedings of the National Academy of Sciences, 98(4), 1671–1675. Choi, H. K., and Waxman, D. J. (2000). Plasma Growth Hormone Pulse Activation of Hepatic JAK-STAT5 Signaling: Developmental Regulation and Role in Male-Specific Liver Gene Expression. Endocrinology, 141(9), 3245–3255. Clodfelter, K. H., Holloway, M. G., Park, S.-H., Waxman, D. J., Hodor, P., and Ray, W. J.

51 (2006). Sex-Dependent Liver Gene Expression Is Extensive and Largely Dependent upon Signal Transducer and Activator of Transcription 5b (STAT5b): STAT5b-Dependent Activation of Male Genes and Repression of Female Genes Revealed by Microarray Analysis. Molecular Endocrinology, 20(6), 1333–1351. Codd, M., Sugrue, D., Gersh, B. J., and Melton, L. J. (1989). Epidemiology of idiopathic dilated and hypertrophic cardiomyopathy. A population-based study in Olmsted County, Minnesota, 1975-1984. Circulation (Vol. 80). Corbier, P., Edwards, D. A., and Roffi, J. (1992). The neonatal testosterone surge: A comparative study. Archives Internationales de Physiologie, de Biochimie et de Biophysique, 100(2), 127–131. Cotton, A. M., Ge, B., Light, N., Adoue, V., Pastinen, T., and Brown, C. J. (2013). Analysis of expressed SNPs identifies variable extents of expression from the human inactive X chromosome. Genome Biology, 14(11), R122. Cotton, A. M., Lam, L., Affleck, J. G., Wilson, I. M., Peñaherrera, M. S., McFadden, D. E., Kobor, M. S., Lam, W. L., Robinson, W. P., and Brown, C. J. (2011). Chromosome-wide DNA methylation analysis predicts human tissue-specific X inactivation. Human Genetics, 130, 187–201. Cotton, A. M., Price, E. M., Jones, M. J., Balaton, B. P., Kobor, M. S., and Brown, C. J. (2015). Landscape of DNA methylation on the X chromosome reflects CpG density, functional chromatin state and X-chromosome inactivation. Human Molecular Genetics, 24(6), 1528– 1539. Critchlow, V., Liebelt, R. A., Bar-Sela, M., Mountcastle, W., and Lipscomb, H. S. (1963). Sex difference in resting pituitary-adrenal function in the rat. American Journal of Physiology- Legacy Content, 205(5), 807–815. Cypess, A. M., Lehman, S., Williams, G., Tal, I., Rodman, D., Goldfine, A. B., Kuo, F. C., Palmer, E. L., Tseng, Y.-H., Doria, A., Kolodny, G. M., and Kahn, C. R. (2009). Identification and Importance of Brown Adipose Tissue in Adult Humans. New England Journal of Medicine, 360(15), 1509–1517. 
Dai, G., He, L., Chou, N., and Wan, Y.-J. Y. (2006). Acetaminophen Metabolism Does Not Contribute to Gender Difference in Its Hepatotoxicity in Mouse. Toxicological Sciences, 92(1), 33–41. Darwin, C. (1871). The descent of man and selection in relation to sex. London: J. Murray. Deng, X., Hiatt, J. B., Nguyen, D. K., Ercan, S., Sturgill, D., Hillier, L. W., Schlesinger, F., Davis, C. A., Reinke, V. J., Gingeras, T. R., Shendure, J., Waterston, R. H., Oliver, B., Lieb, J. D., and Disteche, C. M. (2011). Evidence for compensatory upregulation of expressed X-linked genes in mammals, Caenorhabditis elegans and Drosophila melanogaster. Nature Genetics, 43(12), 1179–1185. Edgren, G., Liang, L., Adami, H.-O., and Chang, E. T. (2012). Enigmatic sex disparities in cancer incidence. European Journal of Epidemiology, 27(3), 187–196. Ellison, J. W., Ramos, C., Yen, P. H., and Shapiro, L. J. (1992). Structure and expression of the human pseudoautosomal gene XE7. Human Molecular Genetics, 1, 691–696. Felsenstein, J. (1974). The evolutionary advantage of recombination. Genetics, 78(2), 737–756. Ferguson, M. W. J., and Joanen, T. (1982). Temperature of egg incubation determines sex in Alligator mississippiensis. Nature, 296(5860), 850–853. Fialkow, P. J. (1970). X-chromosome inactivation and the Xg locus. American Journal of Human Genetics, 22(4), 460–463. Fiore, D., Zurlo, F., Enzi, G., Gasparo, M., Semisa, M., and Biondetti, P. R. (1986). Subcutaneous and visceral fat distribution according to sex, age, and overweight, evaluated by computed tomography. The American Journal of Clinical Nutrition, 44(6), 739–746. Fisher, E. M., Beer-Romero, P., Brown, L. G., Ridley, A., McNeil, J. A., Lawrence, J. B., Willard, H. F., Bieber, F. R., and Page, D. C. (1990). Homologous ribosomal protein genes on the human X and Y chromosomes: Escape from X inactivation and possible implications for Turner syndrome. Cell, 63(6), 1205–1218. Fisher, R. A. (1935). The Sheltering of Lethals. The American Naturalist, 69(724), 446–455. Flores, F., Naftolin, F., and Ryan, K. J. (1973). Aromatization of Androstenedione and Testosterone by Rhesus Monkey Hypothalamus and Limbic System. Neuroendocrinology, 11(3), 177–182. Foerster, K., Coulson, T., Sheldon, B. C., Pemberton, J. M., Clutton-Brock, T. H., and Kruuk, L. E. B. (2007). Sexually antagonistic genetic variation for fitness in red deer. Nature, 447(7148), 1107–1110. Foote, S., Vollrath, D., Hilton, A., and Page, D. C. (1992). The Human Y Chromosome: Spanning the Euchromatic Region. Science, 258(October), 60–66. Forest, M. G., Sizonenko, P. C., Cathiard, A. M., and Bertrand, J. (1974). Hypophyso-Gonadal Function in Humans during the First Year of Life. Journal of Clinical Investigation, 53(3), 819–828. Foryst-Ludwig, A., Kreissl, M. C., Sprang, C., Thalke, B., Böhm, C., Benz, V., Gürgen, D., Dragun, D., Schubert, C., Mai, K., Stawowy, P., Spranger, J., Regitz-Zagrosek, V., Unger, T., and Kintscher, U. (2011). Sex differences in physiological cardiac hypertrophy are associated with exercise-mediated changes in energy substrate availability. American Journal of Physiology-Heart and Circulatory Physiology, 301(1), H115–H122. Frederic, F., Oliver, C., Wollman, E., Delhaye-Bouchaud, N., and Mariani, J. (1993).
IL-1 and LPS induce a sexually dimorphic response of the hypothalamo-pituitary-adrenal axis in several mouse strains. European Cytokine Network, 4(5), 321–329. Frias, J. P., Macaraeg, G. B., Ofrecio, J., Yu, J. G., Olefsky, J. M., and Kruszynska, Y. T. (2001). Decreased Susceptibility to Fatty Acid–Induced Peripheral Tissue Insulin Resistance in Women. Diabetes, 50(6), 1344–1350. Fridolfsson, A.-K., Cheng, H., Copeland, N. G., Jenkins, N. A., Liu, H.-C., Raudsepp, T., Woodage, T., Chowdhary, B., Halverson, J., and Ellegren, H. (1998). Evolution of the avian sex chromosomes from an ancestral pair of autosomes. Proceedings of the National Academy of Sciences, 95(14), 8147–8152. Friedmann, B., and Kindermann, W. (1989). Energy metabolism and regulatory hormones in women and men during endurance exercise. European Journal of Applied Physiology and Occupational Physiology, 59(1–2), 1–9. Gabriel, S. M., Roncancio, J. R., and Ruiz, N. S. (1992). Growth Hormone Pulsatility and the Endocrine Milieu during Sexual Maturation in Male and Female Rats. Neuroendocrinology, 56(5), 619–628. Gallucci, W. T., Baum, A., Laue, L., Rabin, D. S., Chrousos, G. P., Gold, P. W., and Kling, M. A. (1993). Sex differences in sensitivity of the hypothalamic-pituitary-adrenal axis. Health Psychology. US: American Psychological Association. Gaskin, J., and Kitay, J. (1971). Hypothalamic and Pituitary Regulation of Adrenocortical Function in the Hamster: Effects of Gonadectomy and Gonadal Hormone Replacement. Endocrinology, 89(4), 1047–1053. Geisterfer-Lowrance, A. A. T., Christe, M., Conner, D. A., Ingwall, J. S., Schoen, F. J., Seidman, C. E., and Seidman, J. G. (1996). A Mouse Model of Familial Hypertrophic Cardiomyopathy. Science, 272(5262), 731–734. Gerra, G., Volpi, R., Delsignore, R., Maninetti, L., Caccavari, R., Vourna, S., Maestri, D., Chiodera, P., Ugolotti, G., and Coiro, V. (1992). Sex-related responses of beta-endorphin, ACTH, GH and PRL to cold exposure in humans. Acta Endocrinologica, 126(1), 24–28. Gershoni, M., and Pietrokovski, S. (2017). The landscape of sex-differential transcriptome and its consequent selection in human adults. BMC Biology, 15(1), 1–15. Giovanni, de S., B., D. R., R., D. S., and A., M. R. (1995). Gender Differences in Left Ventricular Growth. Hypertension, 26(6), 979–983. Goel, N., Workman, J. L., Lee, T. T., Innala, L., and Viau, V. (2014). Sex differences in the HPA axis. Comprehensive Physiology, 4(3), 1121–1155. Goodfellow, P., Pym, B., Mohandas, T., and Shapiro, L. J. (1984). The cell surface antigen locus, MIC2X, escapes X-inactivation. American Journal of Human Genetics, 36(4), 777–782. Gorski, R. A., Gordon, J. H., Shryne, J. E., and Southam, A. M. (1978). Evidence for a morphological sex difference within the medial preoptic area of the rat brain. Brain Research, 148(2), 333–346. Graves, J. A. M. (2006). Sex chromosome specialization and degeneration in mammals. Cell, 124(5), 901–914. Greeley, E. H., Kealy, R. D., Ballam, J. M., Lawler, D. F., and Segre, M. (1996). The influence of age on the canine immune system. Veterinary Immunology and Immunopathology, 55(1), 1–10. Ham, J., Thomson, A., Needham, M., Webb, P., and Parker, M. (1988). Characterization of response elements for androgens, glucocorticoids and progestins in mouse mammary tumour virus. Nucleic Acids Research, 16(12), 5263–5276. Handa, R. J., Nunley, K. M., Lorens, S. A., Louie, J. P., McGivern, R. F., and Bollnow, M. R. (1994).
Androgen regulation of adrenocorticotropin and corticosterone secretion in the male rat following novelty and foot shock stressors. Physiology & Behavior, 55(1), 117–124. Harrison, P. W., Wright, A. E., Zimmer, F., Dean, R., Montgomery, S. H., Pointer, M. A., and Mank, J. E. (2015). Sexual selection drives evolution and rapid turnover of male gene expression. Proceedings of the National Academy of Sciences, 112(14), 4393–4398. Hastie, N. D., Held, W. A., and Toole, J. J. (1979). Multiple genes coding for the androgen-regulated major urinary proteins of the mouse. Cell, 17(2), 449–457. Hindmarsh, P. C., Dennison, E., Pincus, S. M., Cooper, C., Fall, C. H. D., Matthews, D. R., Pringle, P. J., and Brook, C. G. D. (1999). A Sexually Dimorphic Pattern of Growth Hormone Secretion in the Elderly. Journal of Clinical Endocrinology & Metabolism, 84(8), 2679–2685. Holstege, F. C. P., Grosveld, F. G., Timmers, H. T. M., de Graaf, P., van IJcken, W. F. J., Muiño, J. M., Kaufmann, K., van Leenen, D., Koerkamp, M. J. G., and Outchkourov, N. S. (2013). Balancing of Histone H3K4 Methylation States by the Kdm5c/SMCX Histone Demethylase Modulates Promoter and Enhancer Function. Cell Reports, 3(4), 1071–1079. Howden, E. J., Perhonen, M., Peshock, R. M., Zhang, R., Arbab-Zadeh, A., Adams-Huet, B., and Levine, B. D. (2015). Females have a blunted cardiovascular response to one year of intensive supervised endurance training. Journal of Applied Physiology, 119(1), 37–46.
Hughes, J. F., Skaletsky, H., Brown, L. G., Pyntikova, T., Graves, T., Fulton, R. S., Dugan, S., Ding, Y., Buhay, C. J., Kremitzki, C., Wang, Q., Shen, H., Holder, M., Villasana, D., Nazareth, L. V, Cree, A., Courtney, L., … Page, D. C. (2012). Strict evolutionary conservation followed rapid gene loss on human and rhesus Y chromosomes. Nature, 483(7387), 82–86. Hughes, J. F., Skaletsky, H., Pyntikova, T., Minx, P. J., Graves, T., Rozen, S., Wilson, R. K., and Page, D. C. (2005). Conservation of Y-linked genes during human evolution revealed by comparative sequencing in chimpanzee. Nature, 437(7055), 100–103. Hulbert, L. E., Carroll, J. A., Ballou, M. A., Burdick, N. C., Dailey, J. W., Caldwell, L. C., Loyd, A. N., Vann, R. C., Welsh, T. H., and Randel, R. D. (2012). Sexually dimorphic stress and pro-inflammatory cytokine responses to an intravenous corticotropin-releasing hormone challenge of Brahman cattle following transportation. Innate Immunity, 19(4), 378–387. Hurtado, A., Holmes, K. A., Ross-Innes, C. S., Schmidt, D., and Carroll, J. S. (2011). FOXA1 is a key determinant of estrogen receptor function and endocrine response. Nature Genetics, 43(1), 27–33. Improta-Brears, T., Whorton, A. R., Codazzi, F., York, J. D., Meyer, T., and McDonnell, D. P. (1999). Estrogen-induced activation of mitogen-activated protein kinase requires mobilization of intracellular calcium. Proceedings of the National Academy of Sciences, 96(8), 4686–4691. Innocenti, P., and Morrow, E. H. (2010). The sexually antagonistic genes of Drosophila melanogaster. PLoS Biology, 8(3). Isensee, J., Witt, H., and Pregla, R. (2008). Sexually dimorphic gene expression in the heart of mice and men. J Mol Med, 61–74. Jacobs, P. A., and Strong, J. A. (1959). A Case of Human Intersexuality Having a Possible XXY Sex-Determining Mechanism. Nature, 183(4657), 302–303. Jaffe, C. A., Ocampo-Lim, B., Guo, W., Krueger, K., Sugahara, I., DeMott-Friberg, R., Bermann, M., and Barkan, A. L. (1998).
Regulatory mechanisms of growth hormone secretion are sexually dimorphic. Journal of Clinical Investigation, 102(1), 153–164. Jaffe, C. A., Turgeon, D. K., Lown, K., Demott-Friberg, R., and Watkins, P. B. (2002). Growth hormone secretion pattern is an independent regulator of growth hormone actions in humans. American Journal of Physiology-Endocrinology and Metabolism, 283(5), E1008–E1015. Jansson, J.-O., Isaksson, O., and Edén, S. (1985). Sexual Dimorphism in the Control of Growth Hormone Secretion. Endocrine Reviews, 6(2), 128–150. Jegalian, K., and Page, D. C. (1998). A proposed path by which genes common to mammalian X and Y chromosomes evolve to become X inactivated. Nature, 394(August), 776–780. Jin, J. O., Kawai, T., Cha, S., and Yu, Q. (2013). Interleukin-7 enhances the Th1 response to promote the development of Sjögren’s syndrome-like autoimmune exocrinopathy in mice. Arthritis and Rheumatism, 65(8), 2132–2142. Jin, W., Riley, R. M., Wolfinger, R. D., White, K. P., Passador-Gurgell, G., and Gibson, G. (2001). The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nature Genetics, 29(4), 389–395. Josso, N. (1970). Action of testosterone on the Wolffian duct of rat fetus in organ culture. Archives d’Anatomie Microscopique et de Morphologie Experimentale, 59(1), 37–49. Judd, H. L., Yen, S. S. C., and Lucas, W. E. (1976). Serum 17β-Estradiol and Estrone Levels in Postmenopausal Women With and Without Endometrial Cancer. The Journal of Clinical Endocrinology & Metabolism, 43(2), 272–278. Julien, P., Brawand, D., Soumillon, M., Necsulea, A., Liechti, A., Schütz, F., Daish, T., Grützner, F., and Kaessmann, H. (2012, January). Mechanisms and evolutionary patterns of mammalian and avian dosage compensation. PLoS Biology. Kant, G. J., Lenox, R. H., Bunnell, B. N., Mougey, E. H., Pennington, L. L., and Meyerhoff, J. L. (1983). Comparison of stress response in male and female rats: Pituitary cyclic AMP and plasma prolactin, growth hormone and corticosterone. Psychoneuroendocrinology. Netherlands: Elsevier Science. Kharchenko, P. V, Xi, R., and Park, P. J. (2011). Evidence for dosage compensation between the X chromosome and autosomes in mammals. Nature Genetics, 43(12), 1167–1169. Kiczak, L., Tomaszek, A., Pas, U., Bania, J., Noszczyk-Nowak, A., Skrzypczak, P., Pas, R., Zacharski, M., Janiszewski, A., Kuropka, P., Kuropka, P., Ponikowski, P., and Jankowska, E. A. (2015). Sex differences in porcine left ventricular myocardial remodeling due to right ventricular pacing. Biology of Sex Differences, 6(32), 1–16. Kirkpatrick, M. (1987). Sexual Selection by Female Choice in Polygynous Animals. Annual Review of Ecology and Systematics, 18(1), 43–70. Klein-Hitpaß, L., Schorpp, M., Wagner, U., and Ryffel, G. U. (1986). An estrogen-responsive element derived from the 5′ flanking region of the Xenopus vitellogenin A2 gene functions in transfected human cells. Cell, 46(7), 1053–1061. Koopman, P., Gubbay, J., Vivian, N., Goodfellow, P., and Lovell-Badge, R. (1991). Male development of chromosomally female mice transgenic for Sry. Nature, 351(6322), 117–121. Kousteni, S., Bellido, T., Plotkin, L. I., O’Brien, C. A., Bodenner, D. L., Han, L., Han, K., DiGregorio, G. B., Katzenellenbogen, J. A., Katzenellenbogen, B. S., Roberson, P. K., Weinstein, R. S., Jilka, R. L., and Manolagas, S. C. (2001). Nongenotropic, Sex-Nonspecific Signaling through the Estrogen or Androgen Receptors: Dissociation from Transcriptional Activity.
Cell, 104(5), 719–730. Krecic-Shepard, M. E., Park, K., Barnas, C., Slimko, J., Kerwin, D. R., and Schwartz, J. B. (2000). Race and sex influence clearance of nifedipine: Results of a population study. Clinical Pharmacology & Therapeutics, 68(2), 130–142. Lahn, B. T. (1997). Functional Coherence of the Human Y Chromosome. Science, 278(5338), 675–680. Lahn, B. T., and Page, D. C. (1999). Four evolutionary strata on the human X chromosome. Science, 286(5441), 964–967. Lammers, A. R., Dziech, H. A., and German, R. Z. (2005). Ontogeny of Sexual Dimorphism in Chinchilla lanigera (Rodentia: Chinchillidae). Journal of Mammalogy, 82(1), 179–189. Laz, E. V, Holloway, M. G., and Waxman, D. J. (2006). Codependence of Growth Hormone-Responsive, Sexually Dimorphic Hepatic Gene Expression on Signal Transducer and Activator of Transcription 5b and Hepatic Nuclear Factor 4α. Molecular Endocrinology, 20(3), 647–660. Laz, E. V, Holloway, M. G., Waxman, D. J., Hosui, A., Hennighausen, L., and Cui, Y. (2007). Loss of Sexually Dimorphic Liver Gene Expression upon Hepatocyte-Specific Deletion of Stat5a-Stat5b Locus. Endocrinology, 148(5), 1977–1986. Laz, E. V, Meyer, R. D., Su, T., and Waxman, D. J. (2009). Male-Specific Hepatic Bcl6: Growth Hormone-Induced Block of Transcription Elongation in Females and Binding to Target Genes Inversely Coordinated with STAT5. Molecular Endocrinology, 23(11), 1914–1926.
Lieberherr, M., and Grosse, B. (1994). Androgens increase intracellular calcium concentration and inositol 1,4,5-trisphosphate and diacylglycerol formation via a pertussis toxin-sensitive G-protein. Journal of Biological Chemistry, 269(10), 7217–7223. Lin, F., Xing, K., Zhang, J., and He, X. (2012). Expression reduction in mammalian X chromosome evolution refutes Ohno’s hypothesis of dosage compensation. Proceedings of the National Academy of Sciences, 109(29), 11752–11757. Lindenfors, P., Gittleman, J. L., and Jones, K. (2007). Sexual size dimorphism in mammals. In Evolutionary Studies of Sexual Size Dimorphism (pp. 16–26). Oxford University Press. Lori, M., Elizabeth, B.-C., and Nanette, K. W. (2011). Sex/Gender Differences in Cardiovascular Disease Prevention. Circulation, 124(19), 2145–2154. Lyon, M. F. (1961). Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature, 190, 372–373. Lyon, M. F. (1962). Sex chromatin and gene action in the mammalian X-chromosome. American Journal of Human Genetics, 14, 135–148. Maas, A. H. E. M., and Appelman, Y. E. A. (2010). Gender differences in coronary heart disease. Netherlands Heart Journal, 18(12), 598–603. Macleod, S. M., Giles, H. G., Bengert, B., Liu, F. F., and Sellers, E. M. (1979). Age- and Gender-Related Differences in Diazepam Pharmacokinetics. The Journal of Clinical Pharmacology, 19(1), 15–19. Macotela, Y., Boucher, J., Tran, T. T., and Kahn, C. R. (2009). Sex and depot differences in adipocyte insulin sensitivity and glucose metabolism. Diabetes, 58(4), 803–812. Mansour, A. A., Gafni, O., Weinberger, L., Zviran, A., Ayyash, M., Rais, Y., Krupalnik, V., Zerbib, M., Amann-Zalcenstein, D., Maza, I., Geula, S., Viukov, S., Holtzman, L., Pribluda, A., Canaani, E., Horn-Saban, S., Amit, I., … Hanna, J. H. (2012). The H3K27 demethylase Utx regulates somatic and germ cell epigenetic reprogramming. Nature, 488(7411), 409–413. Maron, B. J., Gardin, J. M., Flack, J. M., Gidding, S. S., Kurosaki, T.
T., and Bild, D. E. (1995). Prevalence of Hypertrophic Cardiomyopathy in a General Population of Young Adults. Circulation, 92(4), 785–789. Migliaccio, A., Castoria, G., Di Domenico, M., de Falco, A., Bilancio, A., Lombardi, M., Barone, M. V, Ametrano, D., Zannini, M. S., Abbondanza, C., and Auricchio, F. (2000). Steroid-induced androgen receptor-oestradiol receptor beta-Src complex triggers prostate cancer cell proliferation. The EMBO Journal, 19(20), 5406–5417. Miners, J. O., Attwood, J., and Birkett, D. J. (1983). Influence of sex and oral contraceptive steroids on paracetamol metabolism. British Journal of Clinical Pharmacology, 16(5), 503–509. Miura, T., Komori, M., Iwasaki, M., Kurozumi, K., Ohta, K., Ohmori, S., Kitada, M., and Kamataki, T. (1988). Sex-related difference in oxidative metabolism of testosterone and erythromycin by hamster liver microsomes. FEBS Letters, 231(1), 183–186. Mukherjee, A. S., and Beermann, W. (1965). Synthesis of Ribonucleic Acid by the X-Chromosomes of Drosophila melanogaster and the Problem of Dosage Compensation. Nature, 207(4998), 785–786. Muller, H. J. (1914). A gene for the fourth chromosome of Drosophila. Journal of Experimental Zoology, 17(3), 325–336. Muller, H. J. (1918). Genetic Variability, Twin Hybrids and Constant Hybrids, in a Case of Balanced Lethal Factors. Genetics, 3(5), 422–499. Muller, H. J. (1948). Evidence of the precision of genetic adaptation. Harvey Lect Ser, 43, 165–229. Muller, H. J. (1964). The relation of recombination to mutational advance. Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, 1(1), 2–9. Naftolin, F., Ryan, K. J., and Petro, Z. (1971). Aromatization of Androstenedione by Limbic System Tissue From Human Foetuses. Journal of Endocrinology, 51(4), 795–796. Naftolin, F., Ryan, K. J., and Petro, Z. (1972). Aromatization of Androstenedione by the Anterior Hypothalamus of Adult Male and Female Rats. Endocrinology, 90(1), 295–298. Nanda, I., Shan, Z., Schartl, M., Burt, D. W., Koehler, M., Nothwang, H., Grützner, F., Paton, I. R., Windsor, D., Dunn, I., Engel, W., Staeheli, P., Mizuno, S., Haaf, T., and Schmid, M. (1999). 300 million years of conserved synteny between chicken Z and human chromosome 9. Nature Genetics, 21(3), 258–259. Nau, H., and Liddiard, C. (1980). Postnatal development of sex-dependent differences in the metabolism of diazepam by rat liver. Biochemical Pharmacology, 29(3), 447–449. Ng, M., Fleming, T., Robinson, M., Thomson, B., Graetz, N., Margono, C., Mullany, E. C., Biryukov, S., Abbafati, C., Abera, S. F., Abraham, J. P., Abu-Rmeileh, N. M. E., Achoki, T., AlBuhairan, F. S., Alemu, Z. A., Alfonso, R., Ali, M. K., … Gakidou, E. (2014). Global, regional, and national prevalence of overweight and obesity in children and adults during 1980–2013: a systematic analysis for the Global Burden of Disease Study 2013. The Lancet, 384(9945), 766–781. Ngo, S. T., Steyn, F. J., and McCombe, P. A. (2014). Gender differences in autoimmune disease. Frontiers in Neuroendocrinology, 35(3), 347–369. Niwa, T., Kaneko, H., Naritomi, Y., Togawa, A., Shiraga, T., Iwasaki, K., Tozuka, Z., and Hata, T. (1995). Species and sex differences of testosterone and nifedipine oxidation in liver microsomes of rat, dog and monkey. Xenobiotica, 25(10), 1041–1049. Nostrand, E. L. Van, Freese, P., Pratt, G. A., Wang, X., Wei, X., Xiao, R., Blue, S. M., Dominguez, D., Cody, N. A.
L., Olson, S., Zhan, L., Bazile, C., Philip, L., Bouvrette, B., Duff, M. O., Garcia, K. E., Gelboin-Burkhart, C., … Yeo, G. (2017). A Large-Scale Binding and Functional Map of Human RNA Binding Proteins. BioRxiv. Ogurtsova, K., da Rocha Fernandes, J. D., Huang, Y., Linnenkamp, U., Guariguata, L., Cho, N. H., Cavan, D., Shaw, J., and Makaroff, L. (2017). IDF Diabetes Atlas: Global estimates for the prevalence of diabetes for 2015 and 2040. Diabetes Research and Clinical Practice (Vol. 128). Ohno, S. (1967). Sex chromosomes and sex-linked genes. Springer-Verlag. Ohno, S., and Hauschka, T. S. (1960). Allocycly of the X-Chromosome in Tumors and Normal Tissues. Cancer Research, 20(4), 541–545. Ohno, S., Kaplan, W. D., and Kinosita, R. (1960). On the sex chromatin of Gallus domesticus. Experimental Cell Research, 19, 180–183. Oliver, J., Roca, P., Frontera, M., Pujol, E., Gianotti, M., Rodríguez-Cuenca, S., and Justo, R. (2002). Sex-dependent Thermogenesis, Differences in Mitochondrial Morphology and Function, and Adrenergic Response in Brown Adipose Tissue. Journal of Biological Chemistry, 277(45), 42958–42963. Östman, J., Lönnberg, G., Arnqvist, H. J., Blohmé, G., Bolinder, J., Schnell, A. E., Eriksson, J. W., Gudbjörnsdottir, S., Sundkvist, G., and Nyström, L. (2008). Gender differences and temporal variation in the incidence of type 1 diabetes: results of 8012 cases in the nationwide Diabetes Incidence Study in Sweden 1983–2002. Journal of Internal Medicine, 263(4), 386–394.
Palmieri, C., Ellis, I. O., Ali, S., Chin, S.-F., Holmes, K. A., Green, A. R., Carroll, J. S., Teschendorff, A. E., Gojis, O., Ross-Innes, C. S., Dunning, M. J., Ali, H. R., Brown, G. D., Caldas, C., and Stark, R. (2012). Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature, 481(7381), 389–393. Pappas, T. C., Gametchu, B., and Watson, C. S. (1995). Membrane estrogen receptors identified by multiple antibody labeling and impeded-ligand binding. The FASEB Journal, 9(5), 404–410. Penny, G. D., Kay, G. F., Sheardown, S. A., Rastan, S., and Brockdorff, N. (1996). Requirement for Xist in X chromosome inactivation. Nature, 379(6561), 131–137. Pessia, E., Makino, T., Bailly-Bechet, M., McLysaght, A., and Marais, G. A. B. (2012). Mammalian X chromosome inactivation evolved as a dosage-compensation mechanism for dosage-sensitive genes on the X chromosome. Proceedings of the National Academy of Sciences of the United States of America, 109(14), 5346–5351. Pisitkun, P., Deane, J. A., Difilippantonio, M. J., Tarasenko, T., Satterthwaite, A. B., and Bolland, S. (2006). Autoreactive B Cell Responses to RNA-Related Antigens Due to TLR7 Gene Duplication. Science, 312(5780), 1669–1672. Posner, B. I., Friesen, H. G., Kelly, P. A., and Tsushima, T. (1974). Studies of Insulin, Growth Hormone and Prolactin Binding: Ontogenesis, Effects of Sex and Pregnancy. Endocrinology, 95(2), 532–539. Price, T., and Birch, G. L. (1996). Repeated evolution of sexual color dimorphism in passerine birds. The Auk, 113(4), 842–848. Ralls, K., and Mesnick, S. L. (2009). Sexual Dimorphism. Encyclopedia of Marine Mammals, 1005–1011. Ramos, E. M., Sethupathy, P., Junkins, H. A., Mehta, J. P., Collins, F. S., Manolio, T. A., and Hindorff, L. A. (2009). Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences, 106(23), 9362–9367. Ranz, M., Castillo-Davis, C.
I., and Meiklejohn, C. D. (2003). Sex-Dependent Gene Expression and Evolution of the Drosophila Transcriptome. Science, 300(June), 1742–1745. Reinius, B., Saetre, P., Leonard, J. A., Blekhman, R., Merino-Martinez, R., Gilad, Y., and Jazin, E. (2008). An evolutionarily conserved sexual signature in the primate brain. PLoS Genetics, 4(6), e1000100. Rensch, B. (1950). Die Abhängigkeit der Relativen Sexual-differenz von der Körpergrösse. Bonner Zool. Beitr., 1, 58–69. Rinn, J. L., Rozowsky, J. S., Laurenzi, I. J., Petersen, P. H., Zou, K., Zhong, W., Gerstein, M., and Snyder, M. (2004). Major molecular differences between mammalian sexes are involved in drug metabolism and renal function. Developmental Cell, 6(6), 791–800. Roberts, C. T., Buckberry, S., Shoubridge, C., Bianco-Miotto, T., Clifton, V., Breen, J., and Mayne, B. T. (2016). Large Scale Gene Expression Meta-Analysis Reveals Tissue-Specific, Sex-Biased Gene Expression in Humans. Frontiers in Genetics, 7(October), 1–14. Ross, M. T., Grafham, D. V, Coffey, A. J., Scherer, S., McLay, K., Muzny, D., and Platzer, M. (2005). The DNA sequence of the human X chromosome. Nature, 434(March), 325–337. Roubinian, J. R., Talal, N., Greenspan, J. S., Goodman, J., and Siiteri, P. K. (1978). Effect of castration and sex hormone treatment on survival, anti-nucleic acid antibodies, and glomerulonephritis in NZB/NZW F1 mice. Journal of Experimental Medicine, 147(6), 1568–1583.
Ruigrok, A. N. V., Salimi-Khorshidi, G., Lai, M. C., Baron-Cohen, S., Lombardo, M. V., Tait, R. J., and Suckling, J. (2014). A meta-analysis of sex differences in human brain structure. Neuroscience and Biobehavioral Reviews, 39, 34–50. Russi, A. E., Ebel, M. E., Yang, Y., and Brown, M. A. (2018). Male-specific IL-33 expression regulates sex-dimorphic EAE susceptibility. Proceedings of the National Academy of Sciences, 201710401. Sanjak, J. S., Sidorenko, J., Robinson, M. R., Thornton, K. R., and Visscher, P. M. (2017). Evidence of directional and stabilizing selection in contemporary humans. Proceedings of the National Academy of Sciences, 115(20), 201707227. Scheidereit, C., Geisse, S., Westphal, H. M., and Beato, M. (1983). The glucocorticoid receptor binds to defined nucleotide sequences near the promoter of mouse mammary tumour virus. Nature (Vol. 304). Schneider-Gädicke, A., Beer-Romero, P., Brown, L. G., Nussbaum, R., and Page, D. C. (1989). ZFX has a gene structure similar to ZFY, the putative human sex determinant, and escapes X inactivation. Cell, 57(7), 1247–1258. Schreiner, S., Guo, Y., Witt, H., Rhie, S. K., Luo, Z., Farnham, P. J., Yao, L., and Perez, A. A. (2018). ZFX acts as a transcriptional activator in multiple types of human tumors by binding downstream from transcription start sites at the majority of CpG island promoters. Genome Research, 28(3), 310–320. Schulte-Hostedde, A. I., and Millar, J. S. (2000). Measuring sexual size dimorphism in the yellow-pine chipmunk (Tamias amoenus). Canadian Journal of Zoology, 78(5), 728–733. Scotland, R., and Stables, M. (2011). Sex differences in resident immune cell phenotype underlie more efficient acute inflammatory responses in female mice. Blood, 118(22), 5918–5928. Seale, J. V., Wood, S. A., Atkinson, H. C., Bate, E., Lightman, S. L., Ingram, C. D., Jessop, D. S., and Harbuz, M. S. (2004).
Gonadectomy reverses the sexually diergic patterns of circadian and stress-induced hypothalamic-pituitary-adrenal axis activity in male and female rats. Journal of Neuroendocrinology, 16(6), 516–524. Sekiguchi, T., Iida, H., Fukumura, J., and Nishimoto, T. (2004). Human DDX3Y, the Y-encoded isoform of RNA helicase DDX3, rescues a hamster temperature-sensitive ET24 mutant cell line with a DDX3X mutation. Experimental Cell Research, 300, 213–222. Shakil, T., Belsham, D. D., Hoque, A. N. E., and Husain, M. (2002). Differential Regulation of Gonadotropin-Releasing Hormone Secretion and Gene Expression by Androgen: Membrane Versus Nuclear Receptor Activation. Molecular Endocrinology, 16(11), 2592–2602. Shpargel, K. B., Sengoku, T., Yokoyama, S., and Magnuson, T. (2012). UTX and UTY demonstrate histone demethylase-independent function in mouse embryonic development. PLoS Genetics, 8(9), e1002964. Shpargel, K. B., Starmer, J., Ge, K., Wang, C., and Magnuson, T. (2017). UTX-guided neural crest function underlies craniofacial features of Kabuki syndrome. Proceedings of the National Academy of Sciences, 114(43), E9046–E9055. Skaletsky, H., Kuroda-Kawaguchi, T., Minx, P. J., Cordum, H. S., Hillier, L., Brown, L. G., Repping, S., Pyntikova, T., Ali, J., Bieri, T., Chinwalla, A., Delehaunty, A., Delehaunty, K., Du, H., Fewell, G., Fulton, L., Fulton, R., … Page, D. C. (2003). The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature, 423, 825–838. Smith, D. L., Dong, X., Du, S., Oh, M., Singh, R. R., and Voskuhl, R. R. (2007). A female preponderance for chemically induced lupus in SJL/J mice. Clinical Immunology, 122(1), 101–107. Smith, J. M., and Haigh, J. (1974). The hitch-hiking effect of a favourable gene. Genetical Research, 23(1), 23–35. Souyris, M., Cenac, C., Azar, P., Daviaud, D., Canivet, A., Grunenwald, S., Pienkowski, C., Chaumeil, J., Mejía, J. E., and Guéry, J.-C. (2018). TLR7 escapes X chromosome inactivation in immune cells. Science Immunology, 3(19), eaap8855. Stern, C. (1957). The problem of complete Y-linkage in man. American Journal of Human Genetics, 9(3), 147–166. Stewart, P. M., and Krone, N. P. (2016). Chapter 15 - Adrenal Cortex. In S. Melmed, K. S. Polonsky, P. R. Larsen, & H. M. Kronenberg (Eds.), Williams Textbook of Endocrinology (13th ed., pp. 479–536). Philadelphia: Elsevier. Subramanian, S., Tus, K., Li, Q.-Z., Wang, A., Tian, X.-H., Zhou, J., Liang, C., Bartov, G., McDaniel, L. D., Zhou, X. J., Schultz, R. A., and Wakeland, E. K. (2006). A Tlr7 translocation accelerates systemic autoimmunity in murine lupus. Proceedings of the National Academy of Sciences, 103(26), 9970–9975. Szekely, T., Freckleton, R. P., and Reynolds, J. D. (2004). Sexual selection explains Rensch’s rule of size dimorphism in shorebirds. Proceedings of the National Academy of Sciences, 101(33), 12224–12227. Tannenbaum, G. S., and Painson, J.-C. (1991). Sexual Dimorphism of Somatostatin and Growth Hormone-Releasing Factor Signaling in the Control of Pulsatile Growth Hormone Secretion in the Rat. Endocrinology, 128(6), 2858–2866. Tarloff, J. B., Khairallah, E. A., Cohen, S. D., and Goldstein, R. S. (1996). Sex- and Age-Dependent Acetaminophen Hepato- and Nephrotoxicity in Sprague-Dawley Rats: Role of Tissue Accumulation, Nonprotein Sulfhydryl Depletion, and Covalent Binding. Toxicological Sciences, 30(1), 13–22. Toran-Allerand, C. D., Guan, X., MacLusky, N. J., Horvath, T. L., Diano, S., Singh, M., Connolly, E. S., Nethrapalli, I. S., and Tinnikov, A. A. (2002).
ER-X: A Novel, Plasma Membrane-Associated, Putative Estrogen Receptor That Is Regulated during Development and after Ischemic Brain Injury. The Journal of Neuroscience, 22(19), 8391–8401. Tóth, M., and Zakar, T. (1983). Relative binding affinities of testosterone, 19-nortestosterone and their 5α-reduced derivatives to the androgen receptor and to other androgen-binding proteins. Journal of Steroid Biochemistry, 17, 653–660. Tukiainen, T., Villani, A.-C., Yen, A., Rivas, M. A., Marshall, J. L., Satija, R., Aguirre, M., Gauthier, L., Fleharty, M., Kirby, A., Cummings, B. B., Castel, S. E., Karczewski, K. J., Aguet, F., Byrnes, A., Aguet, F., Ardlie, K. G., … MacArthur, D. G. (2017). Landscape of X chromosome inactivation across human tissues. Nature, 550(7675), 244–248. Tulchinsky, D., Hobel, C. J., Yeager, E., and Marshall, J. R. (1972). Plasma estrone, estradiol, estriol, progesterone, and 17-hydroxyprogesterone in human pregnancy: I. Normal pregnancy. American Journal of Obstetrics and Gynecology, 112(8), 1095–1100. Uhart, M., Chong, R. Y., Oswald, L., Lin, P. I., and Wand, G. S. (2006). Gender differences in hypothalamic-pituitary-adrenal (HPA) axis reactivity. Psychoneuroendocrinology, 31(5), 642–652. van Eijk, L. T., Dorresteijn, M. J., Smits, P., van der Hoeven, J. G., Netea, M. G., and Pickkers, P. (2007). Gender differences in the innate immune response and vascular reactivity following the administration of endotoxin to human volunteers. Critical Care Medicine, 35(6).
Vawter, M. P., Evans, S., Choudary, P., Tomita, H., Meador-, J., Molnar, M., Li, J., Lopez, J. F., Myers, R., Cox, D., Watson, S. J., Akil, H., Jones, E. G., and Bunney, W. E. (2004). Gender-Specific Gene Expression in Post-Mortem Human Brain: Localization to Sex Chromosomes. Neuropsychopharmacology, 29, 373–384. Vincent, F. B., Morand, E. F., Schneider, P., and Mackay, F. (2014). The BAFF/APRIL system in SLE pathogenesis. Nature Reviews Rheumatology, 10, 365. Vollrath, D., Foote, S., Hilton, A., Brown, L. G., Beer-Romero, P., Bogan, J. S., and Page, D. C. (1992). The human Y chromosome: A 43-interval map based on naturally occurring deletions. Science, 258(5079), 52–59. Walport, L. J., Hopkinson, R. J., Vollmar, M., Madden, S. K., Gileadi, C., Oppermann, U., Schofield, C. J., and Johansson, C. (2014). Human UTY (KDM6C) Is a Male-specific N-Methyl Lysyl Demethylase. Journal of Biological Chemistry, 289(26), 18302–18313. Wang, T., Birsoy, K., Hughes, N. W., Krupczak, K. M., Post, Y., Wei, J. J., Lander, E. S., and Sabatini, D. M. (2015). Identification and characterization of essential genes in the human genome. Science, 350(6264), 1096–1101. Warnefors, M., Mossinger, K., Halbert, J., Studer, T., VandeBerg, J. L., Lindgren, I., Fallahshahroudi, A., Jensen, P., and Kaessmann, H. (2017). Sex-biased microRNA expression in mammals and birds reveals underlying regulatory mechanisms and a role in dosage compensation. Genome Research, 1–13. Watanabe, M., Zinn, A. R., Page, D. C., and Nishimoto, T. (1993). Functional equivalence of human X- and Y-encoded isoforms of ribosomal protein S4 consistent with a role in Turner syndrome. Nature Genetics, 4(3), 268–271. Wauthier, V., Sugathan, A., Meyer, R. D., Dombkowski, A. A., and Waxman, D. J. (2010). Intrinsic Sex Differences in the Early Growth Hormone Responsiveness of Sex-Specific Genes in Mouse Liver. Molecular Endocrinology, 24(March), 667–678. Wauthier, V., and Waxman, D. J. (2008).
Sex-Specific Early Growth Hormone Response Genes in Rat Liver. Molecular Endocrinology, 22(8), 1962–1974. Weiser, M. J., and Handa, R. J. (2009). Estrogen impairs glucocorticoid dependent negative feedback on the hypothalamic-pituitary-adrenal axis via within the hypothalamus. Neuroscience, 159(2), 883–895. Welle, S., Tawil, R., and Thornton, C. A. (2008). Sex-Related Differences in Gene Expression in Human Skeletal Muscle. PLoS ONE, (1), e1385. Welshons, W. J., and Russell, L. B. (1959). The Y-chromosome as the bearer of male determining factors in the mouse. Proceedings of the National Academy of Sciences of the United States of America, 45(4), 560–566. Wu, M. V, and Shah, N. M. (2011). Control of masculinization of the brain and behavior. Current Opinion in Neurobiology, 21(1), 116–123. Xiong, Y., Chen, X., Chen, Z., Wang, X., Shi, S., Wang, X., Zhang, J., and He, X. (2010). RNA sequencing shows no dosage compensation of the active X-chromosome. Nature Genetics, 42(12), 1043–1047. Yang, F., Babak, T., Shendure, J., and Disteche, C. M. (2010). Global survey of escape from X inactivation by RNA-sequencing in mouse. Genome Research, 20(5), 614–622. Yang, J., Manolio, T. A., Pasquale, L. R., Boerwinkle, E., Caporaso, N., Cunningham, J. M., de Andrade, M., Feenstra, B., Feingold, E., Hayes, M. G., Hill, W. G., Landi, M. T., Alonso, A., Lettre, G., Lin, P., Ling, H., Lowe, W., … Visscher, P. M. (2011). Genome partitioning of genetic variation for complex traits using common SNPs. Nature Genetics, 43, 519.

62 Yang, L., Lin, C., Jin, C., Yang, J. C., Tanasa, B., Li, W., Merkurjev, D., Ohgi, K. A., Meng, D., Zhang, J., Evans, C. P., and Rosenfeld, M. G. (2013). LncRNA-dependent mechanisms of androgen-receptor-regulated gene activation programs. Nature, 500(7464), 598–602. Yang, X., Schadt, E. E., Wang, S., Wang, H., Arnold, A. P., Ingram-Drake, L., Drake, T. a, and Lusis, A. J. (2006). Tissue-specific expression and regulation of sexually dimorphic genes in mice. Genome Research, 16(8), 995–1004. Yu, J., Yu, J., Mani, R. S., Cao, Q., Brenner, C. J., Cao, X., Wang, X., Wu, L., Li, J., Hu, M., Gong, Y., Cheng, H., Laxman, B., Vellaichamy, A., Shankar, S., Li, Y., Dhanasekaran, S. M., … Chinnaiyan, A. M. (2010). An Integrated Network of Androgen Receptor, Polycomb, and TMPRSS2-ERG Gene Fusions in Prostate Cancer Progression. Cancer Cell, 17(5), 443–454. Zhang, J., Gilbertson, R. J., Wang, Y.-D., Kanagaraj, A., Rusch, M., Patmore, D. M., Ellison, D. W., Valentin-Vega, Y. A., Finkelstein, D., Taylor, J. P., Kim, H. J., Moore, J., and Parker, M. (2016). Cancer-associated DDX3X mutations drive stress granule assembly and impair global translation. Scientific Reports, 6(1), 1–16. Zhang, Y., Klein, K., Sugathan, A., Nassery, N., Dombkowski, A., Zanger, U. M., and Waxman, D. J. (2011). Transcriptional profiling of human liver identifies sex-biased genes associated with polygenic dyslipidemia and coronary artery disease. PloS One, 6(8), e23506. Zhang, Y., Laz, E. V., and Waxman, D. J. (2012). Dynamic, Sex-Differential STAT5 and BCL6 Binding to Sex-Biased, Growth Hormone-Regulated Genes in Adult Mouse Liver. Molecular and Cellular Biology, 32(4), 880–896. Zhang, Y., Sturgill, D., Parisi, M., Kumar, S., and Oliver, B. (2007). Constraint and turnover in sex-biased gene expression in the genus Drosophila. Nature, 450(7167), 233–237. Zimmer, F., Harrison, P. W., Dessimoz, C., and Mank, J. E. (2016). Compensation of Dosage- Sensitive Genes on the Chicken Z Chromosome. 
Genome Biology and Evolution, 8(4), 1233–1242.


Chapter 2. Conserved microRNA targeting reveals preexisting gene dosage sensitivities that shaped amniote sex chromosome evolution

Sahin Naqvi, Daniel W. Bellott, Kathy S. Lin, & David C. Page

Author contributions S.N., D.W.B. and D.C.P. designed the study. S.N. performed analyses with assistance from D.W.B. K.S.L. developed and implemented the step-detection algorithm. S.N. and D.C.P. wrote the paper.

Acknowledgements We thank V. Agarwal, S. Eichorn, S. McGeary, and D. Bartel for assistance with the TargetScan database and helpful discussions; A. Godfrey for updated human-chicken orthology information; and A. Godfrey, J. Hughes and H. Skaletsky for critical reading of the manuscript.

Adapted from: Naqvi, S., Bellott, D.W., Lin, K.S., & Page, D.C. Conserved microRNA targeting reveals preexisting gene dosage sensitivities that shaped amniote sex chromosome evolution. Genome Research. 28, 474-483 (2018), doi: 10.1101/gr.230433.117


Summary

Mammalian X and Y Chromosomes evolved from an ordinary autosomal pair. Genetic decay of the Y led to X Chromosome inactivation (XCI) in females, but some Y-linked genes were retained during the course of sex chromosome evolution, and many X-linked genes did not become subject to XCI. We reconstructed gene-by-gene dosage sensitivities on the ancestral autosomes through phylogenetic analysis of microRNA (miRNA) target sites and compared these preexisting characteristics to the current status of Y-linked and X-linked genes in mammals. Preexisting heterogeneities in dosage sensitivity, manifesting as differences in the extent of miRNA-mediated repression, predicted either the retention of a Y homolog or the acquisition of XCI following Y gene decay. Analogous heterogeneities among avian Z-linked genes predicted either the retention of a W homolog or gene-specific dosage compensation following W gene decay. Genome-wide analyses of human copy number variation indicate that these heterogeneities consisted of sensitivity to both increases and decreases in dosage. We propose a model of XY/ZW evolution incorporating such preexisting dosage sensitivities in determining the evolutionary fates of individual genes. Our findings thus provide a more complete view of the role of dosage sensitivity in shaping the mammalian and avian sex chromosomes, and reveal an important role for post-transcriptional regulatory sequences (miRNA target sites) in sex chromosome evolution.

Introduction

The mammalian X and Y Chromosomes evolved from a pair of ordinary autosomes over the past 300 million years (Lahn & Page, 1999). Only 3% of genes on the ancestral pair of autosomes survive on the human Y Chromosome (Bellott et al., 2010; Skaletsky et al., 2003), compared to 98% on the X Chromosome (Mueller et al., 2013). In females, one copy of the X Chromosome is silenced by X inactivation (XCI); this silencing evolved on a gene-by-gene basis following Y gene loss in males and X upregulation in both sexes (Berletch et al., 2015; Jegalian & Page, 1998; Ross et al., 2005; Tukiainen et al., 2017), and some genes escape XCI in humans (Carrel & Willard, 2005) and other mammals (Yang et al., 2010). Dosage compensation refers to any mechanism restoring ancestral dosage following gene loss from the sex-specific chromosome. In mammalian males, therefore, dosage compensation consisted solely of X upregulation, as it returned X-linked gene expression to ancestral levels following Y gene loss. In females, dosage compensation involved both X upregulation and the acquisition of XCI, which increased and decreased X-linked expression levels, respectively. Since females did not undergo any initial decrease in ancestral dosage due to Y gene loss, X upregulation and the acquisition of XCI together restored ancestral expression levels.
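The dosage bookkeeping in this paragraph can be made explicit with idealized numbers. As a simplifying assumption not stated in the text, suppose each active copy of an ancestral gene contributes one unit of expression and X upregulation exactly doubles per-copy output:

```latex
\begin{aligned}
\text{Ancestral autosomal pair (either sex):}\quad & 1 + 1 = 2\\
\text{Males, after Y gene loss:}\quad & 1\\
\text{Males, after X upregulation:}\quad & 2 \times 1 = 2\\
\text{Females, after X upregulation:}\quad & 2 \times (1 + 1) = 4\\
\text{Females, after acquiring XCI:}\quad & 2 \times 1 = 2
\end{aligned}
```

Both sexes return to the ancestral level of 2 units, but only females need XCI to offset the upregulation. Real upregulation need not be exactly two-fold; the numbers only illustrate the direction of each step.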

In parallel, the avian Z and W sex chromosomes evolved from a different pair of autosomes than the mammalian X and Y Chromosomes (Bellott et al., 2010; Nanda et al., 1999; Ross et al., 2005). Decay of the female-specific W Chromosome was similarly extensive, but birds did not evolve a large-scale inactivation of Z-linked genes analogous to XCI in mammals (Itoh et al., 2007). Dosage compensation, as measured by a male/female expression ratio close to 1, has been observed for some Z-linked genes in some tissues (Mank & Ellegren, 2009; Uebbing et al., 2015; Zimmer et al., 2016). Thus, genes previously found on the ancestral autosomes that gave rise to the mammalian or avian sex chromosomes have undergone significant changes in gene dosage. In modern mammals, these molecular events have resulted in three classes of ancestral X-linked genes representing distinct evolutionary fates: those with a surviving Y homolog, those with no Y homolog and subject to XCI, and those with no Y homolog but escaping XCI. In birds, two clear classes of ancestral Z-linked genes have arisen: those with or without a W homolog, with additional heterogeneity among Z-linked genes without a W homolog as a result of gene-specific dosage compensation. Identifying gene-by-gene properties that distinguish classes of X- and Z-linked genes is thus crucial to understanding the selective pressures underlying the molecular events of mammalian and avian sex chromosome evolution.

Emerging evidence suggests a role for gene dosage sensitivity in mammalian and avian sex chromosome evolution. X- and Z-linked genes with surviving homologs on the mammalian Y or avian W Chromosomes are enriched for important regulatory functions and predictors of haploinsufficiency compared to those lacking Y or W homologs (Bellott et al., 2014, 2017); similar observations have been made in fish (White et al., 2015) and Drosophila (Kaiser et al., 2011). Human X- and chicken Z-linked genes that show the strongest signatures of dosage compensation in either lineage also show signs of dosage sensitivity as measured by membership in large protein complexes (Pessia et al., 2012) or evolutionary patterns of gene duplication and retention (Zimmer et al., 2016). Despite these advances, little is known regarding selective pressures resulting from sensitivity to dosage increases, as these studies either focused on haploinsufficiency or employed less direct predictors of dosage sensitivity. Furthermore, it is not known whether heterogeneities in dosage sensitivity among classes of sex-linked genes were acquired during sex chromosome evolution, or predated the emergence of sex chromosomes, as there has been no explicit, systematic reconstruction of dosage sensitivity on the ancestral autosomes that gave rise to the mammalian and avian sex chromosomes.

To assess the role of preexisting dosage sensitivities in XY and ZW evolution, we sought to employ a measure of dosage sensitivity that could be 1) demonstrably informative with respect to sensitivity to dosage increases, and 2) explicitly reconstructed on the ancestral autosomes. We focused on regulation by microRNAs (miRNAs), small noncoding RNAs that function as tuners of gene dosage by lowering target mRNA levels through pairing to the 3’ untranslated region (UTR) (Bartel, 2009). The repressive nature of miRNA targeting is informative with respect to sensitivity to dosage increases, allowing for a more complete understanding of the role of dosage sensitivity in sex chromosome evolution. Both miRNAs themselves and their complementary target sites can be preserved over millions of years of vertebrate evolution, facilitating the reconstruction of miRNA targeting on the ancestral autosomes through cross-species sequence alignments. As miRNA targeting occurs post-transcriptionally, reconstruction of its ancestral state is decoupled from transcriptional regulatory mechanisms such as XCI that evolved following X-Y differentiation.

Results

Analysis of human copy number variation indicates conserved microRNA targeting of genes sensitive to dosage increases

We first sought to determine whether conserved targeting by microRNAs (miRNAs) correlates with sensitivity to dosage increases across the human genome. To estimate pressure to maintain miRNA targeting, we used published probabilities of conserved targeting (PCT scores) for each gene-miRNA interaction in the human genome. The PCT score reflects an estimate of the probability that a given gene-miRNA interaction is conserved due to miRNA targeting, obtained by calculating the conservation of the relevant miRNA target sites relative to the conservation of the entire 3’ UTR (Friedman et al., 2009). In this manner, the PCT score intrinsically controls for differences in background conservation and sequence composition, both of which vary widely among 3’ UTRs due to differing rates of expression divergence and/or sequence evolution. We refer to these PCT scores as “miRNA conservation scores” in the remainder of the text.

A recent study reported a correlation between these miRNA conservation scores and predicted haploinsufficiency (Pinzón et al., 2016), indicating that conserved miRNA targeting broadly corresponds to dosage sensitivity. However, such a correlation does not isolate the effects of sensitivity to dosage increases, which we expect to be particularly important in the context of miRNA targeting. We reasoned that genes for which increases in dosage are deleterious should be depleted from the set of observed gene duplications in healthy human individuals. We used a catalogue of rare genic copy number variation among 59,898 control human exomes (Exome Aggregation Consortium, ExAC) (Ruderfer et al., 2016) to classify autosomal protein-coding genes as exhibiting or lacking duplication or deletion in healthy individuals (see Methods). We compared duplicated and non-duplicated genes with the same deletion status in order to control for differences in sensitivity to underexpression. We found that non-duplicated genes have significantly higher miRNA conservation scores than duplicated genes, irrespective of deletion status (Figure 2.1A,B). Non-deleted genes also have significantly higher scores than deleted genes irrespective of duplication status (Figure 2.2), but duplication status has a greater effect on miRNA conservation scores than does deletion status (blue boxes, deleted but non-duplicated genes, vs. orange boxes, duplicated but non-deleted genes; Figure 2.1C). Thus, conserved miRNA targeting is a feature of genes sensitive to changes in gene dosage in humans and is especially informative with regard to sensitivity to dosage increases, consistent with the known role of miRNAs in tuning gene dosage by lowering target mRNA levels.

Figure 2.1: Conserved miRNA targeting of autosomal genes stratified by copy number variation in 59,898 human exomes. Probabilities of conserved targeting (PCT) of all gene-miRNA interactions involving non-duplicated and duplicated genes, further stratified as (A) deleted (grey, n = 69,339 interactions from 4,118 genes; blue, n = 80,290 interactions from 3,976 genes) or (B) not deleted (orange, n = 51,514 interactions from 2,916 genes; purple, n = 72,826 interactions from 3,510 genes). *** p < 0.001, two-sided Kolmogorov-Smirnov test. (C) Mean gene-level PCT scores. ** p < 0.01, *** p < 0.001, two-sided Wilcoxon rank-sum test.

Figure 2.2: Effect of deletion status on autosomal PCT scores. Probabilities of conserved targeting (PCT) of all gene-miRNA interactions involving non-deleted and deleted genes, further stratified as (A) duplicated (grey, n = 69,339 interactions from 4,118 genes; orange, n = 51,514 interactions from 2,916 genes) or (B) not duplicated (purple, n = 72,826 interactions from 3,510 genes; blue, n = 80,290 interactions from 3,976 genes). *** p < 0.001, two-sided Kolmogorov-Smirnov test. (C) PCT scores for all gene sets in (A) and (B) superimposed on one plot. (D) Mean gene-level PCT scores when aggregating sets of duplicated/not duplicated (left) or deleted/not deleted (right) genes. *** p < 0.0001, two-sided Wilcoxon rank-sum test.
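The stratified comparison described above can be sketched in Python. This is a minimal illustration, not the analysis code used in this chapter: the input format (one `(gene, PCT)` pair per gene-miRNA interaction, plus per-gene boolean duplication and deletion calls) and the function names are hypothetical, and only the two-sample Kolmogorov-Smirnov statistic is computed here, without its p-value. Holding deletion status fixed while comparing duplicated with non-duplicated genes is what isolates sensitivity to dosage increases.

```python
from collections import defaultdict


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of samples x and y."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Step both ECDFs past every observation equal to v (handles ties).
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def stratify_by_cnv(interactions, duplicated, deleted):
    """Group interaction-level PCT scores by (duplicated, deleted) status.

    interactions: iterable of (gene, pct) pairs, one per gene-miRNA interaction.
    duplicated, deleted: hypothetical dicts mapping gene -> bool, e.g. derived
    from a CNV callset.
    """
    groups = defaultdict(list)
    for gene, pct in interactions:
        groups[(duplicated[gene], deleted[gene])].append(pct)
    return groups


# Toy usage: four hypothetical genes, all deleted, two of them duplicated.
interactions = [("A", 0.9), ("A", 0.7), ("B", 0.8), ("C", 0.2), ("D", 0.1)]
duplicated = {"A": False, "B": False, "C": True, "D": True}
deleted = {"A": True, "B": True, "C": True, "D": True}
groups = stratify_by_cnv(interactions, duplicated, deleted)
# Within the deleted stratum, compare non-duplicated vs. duplicated genes.
d = ks_statistic(groups[(False, True)], groups[(True, True)])  # d == 1.0
```

In practice a library routine such as `scipy.stats.ks_2samp` would also supply the p-value; the hand-written version is shown only to make the statistic itself explicit.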

X-Y pairs and X-inactivated genes have higher miRNA conservation scores than X escape genes

We next assessed whether the three classes of X-linked genes differ with respect to dosage sensitivity as inferred by conserved miRNA targeting. To delineate these classes, we began with the set of ancestral genes reconstructed through cross-species comparisons of the human X Chromosome and orthologous chicken autosomes (Bellott et al., 2010, 2014, 2017; Hughes et al., 2012; Mueller et al., 2013). We designated ancestral X-linked genes with a surviving human Y homolog (Skaletsky et al., 2003) as X-Y pairs and also considered the set of X-linked genes with a surviving Y homolog in any of eight mammals (Bellott et al., 2014) to increase the phylogenetic breadth of findings regarding X-Y pairs. A number of studies have catalogued the inactivation status of X-linked genes in various human tissues and cell types. We used a meta-analysis that combined results from three studies by assigning a “consensus” X-inactivation status to each gene (Balaton et al., 2015) to designate the remainder of ancestral genes lacking a Y homolog as subject to or escaping XCI. In summary, we classified genes as either: 1) X-Y pairs, 2) lacking a Y homolog and subject to XCI (X-inactivated), or 3) lacking a Y homolog but escaping XCI (X escape).
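The classification rule above can be written as a small Python function. This is a sketch under stated assumptions: the status labels (`"subject"`, `"escape"`, `"variable escape"`) are hypothetical stand-ins for the consensus calls of Balaton et al. (2015), whose exact vocabulary may differ. Variably escaping genes are grouped with X escape genes, consistent with the gene counts in Figures 2.3 and 2.6, and discordant calls are left unclassified, as in the main comparisons.

```python
from typing import Optional

# Hypothetical consensus labels; the real annotation may use different strings.
SUBJECT_LABELS = {"subject"}
ESCAPE_LABELS = {"escape", "variable escape"}


def classify_x_gene(has_y_homolog: bool, xci_status: Optional[str]) -> Optional[str]:
    """Assign an ancestral X-linked gene to one of the three classes.

    A surviving Y homolog takes precedence: XCI status is consulted only
    for genes that lack one.
    """
    if has_y_homolog:
        return "X-Y pair"
    if xci_status in SUBJECT_LABELS:
        return "X-inactivated"
    if xci_status in ESCAPE_LABELS:
        return "X escape"
    # Discordant or missing calls are excluded from the three classes.
    return None
```

The ordering of the checks encodes the priority in the text: a gene is only placed in an XCI-based class after it has been established that it lacks a Y homolog.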

We found that human X-Y pairs have the highest miRNA conservation scores, followed by X-inactivated and finally X escape genes (Figure 2.3A,B). The expanded set of X-Y pairs across eight mammals also has significantly higher miRNA conservation scores than ancestral X-linked genes with no Y homolog (Figure 2.4). Observed differences between miRNA conservation scores are not driven by distinct subsets of genes in each class, as indicated by gene resampling with replacement (Figure 2.5). The decrease in miRNA conservation scores of X escape genes relative to X-inactivated genes and X-Y pairs is not driven by genes that escape XCI variably across individuals (Figure 2.6), and remains consistent even when ambiguous genes are included as either X-inactivated or X escape genes (Figure 2.7). We also verified that these differences were not driven by artificially inflated or deflated conservation scores of certain target sites due to non-uniformity in 3’ UTR conservation (Methods, Figure 2.8).
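The gene-resampling checks described above (Figure 2.5) can be sketched in Python with the standard library. This is a hedged illustration rather than the procedure actually used: it assumes each bootstrap replicate pools the interaction-level PCT scores of genes drawn with replacement and records their mean, and that the empirical p-value draws random non-overlapping gene sets of the original sizes from the pooled genes of the two classes. Function names and these details are assumptions.

```python
import random
import statistics


def bootstrap_mean_pct(gene_scores, n_boot=1000, seed=0):
    """Resample genes with replacement; return the median and an approximate
    95% interval of the pooled mean PCT across replicates.

    gene_scores: dict mapping gene -> list of its interaction-level PCT scores.
    """
    rng = random.Random(seed)
    genes = list(gene_scores)
    means = []
    for _ in range(n_boot):
        sample = rng.choices(genes, k=len(genes))
        pooled = [s for g in sample for s in gene_scores[g]]
        means.append(statistics.fmean(pooled))
    means.sort()
    lower = means[int(0.025 * n_boot)]
    upper = means[int(0.975 * n_boot) - 1]
    return statistics.median(means), lower, upper


def empirical_p(scores_a, scores_b, n_perm=1000, seed=0):
    """Fraction of random, non-overlapping splits of the pooled genes whose
    difference in median gene-level mean PCT is at least the observed one.
    Assumes gene names in scores_a and scores_b are disjoint.
    """
    rng = random.Random(seed)
    pooled = {**scores_a, **scores_b}

    def median_of_gene_means(genes):
        return statistics.median(statistics.fmean(pooled[g]) for g in genes)

    observed = median_of_gene_means(scores_a) - median_of_gene_means(scores_b)
    genes = list(pooled)
    n_a = len(scores_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(genes)
        diff = median_of_gene_means(genes[:n_a]) - median_of_gene_means(genes[n_a:])
        if diff >= observed:
            hits += 1
    return hits / n_perm
```

The p-value here is one-sided (class A higher than class B); a common refinement adds one to both the numerator and the denominator so that the estimate is never exactly zero.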

Figure 2.3. X-Y pairs and X-inactivated genes have higher miRNA conservation scores than X escape genes. PCT score distributions of all gene-miRNA interactions involving (A) human X-Y pairs (n = 371 interactions from 15 genes), X-inactivated genes (n = 6,743 interactions from 329 genes), and X escape genes (n = 1,037 interactions from 56 genes). ** p < 0.01, two-sided Kolmogorov-Smirnov test. (B) Mean gene-level PCT scores. * p < 0.05, ** p < 0.01, two-sided Wilcoxon rank-sum test.

Figure 2.4: PCT scores of X-Y pairs across 8 mammals. (A) PCT score distributions of all gene-miRNA interactions involving X-Y pairs across eight sequenced mammalian Y chromosomes (n = 647 interactions from 32 genes) and other ancestral X genes (n = 8,831 interactions from 457 genes). ** p < 0.01, two-sided Kolmogorov-Smirnov test. (B) Gene-level mean PCT scores. * p < 0.05, two-sided Wilcoxon rank-sum test.

Figure 2.5: Resampled mean PCT scores of X-linked genes. (A) Resampled gene-miRNA PCT scores for human X-Y pairs (n = 15 genes), X-inactivated genes (n = 329 genes) and X escape genes (n = 56 genes). (B) Resampled gene-miRNA PCT scores for X-Y pairs across eight mammals (n = 32 genes) and genes with no Y homolog in any of eight mammals (n = 457 genes). Points and error bars represent the median and 95% confidence intervals from 1,000 gene samplings with replacement. * p < 0.05, ** p < 0.01, empirical p-value computed as the fraction of random non-overlapping gene sets with a median difference in PCT score at least as large as the true difference.

Figure 2.6: PCT score comparisons with consistent and variable escape genes separated. (A) PCT score distributions of all gene-miRNA interactions involving X-Y pairs (n = 371 interactions from 16 genes), X-inactivated genes (n = 6,743 interactions from 329 genes), consistent escape genes (n = 567 interactions from 30 genes), or variable escape genes (n = 470 interactions from 26 genes) as defined by Balaton et al. (2015). * p < 0.05, ** p < 0.01, two-sided Kolmogorov-Smirnov test. (B) Resampled gene-miRNA PCT scores of gene classes from (A). Points and error bars represent the median and 95% confidence intervals from 1,000 gene samplings with replacement. * p < 0.05, empirical p-value computed as the fraction of random non-overlapping gene sets with a median difference in PCT score at least as large as the true difference.

Figure 2.7: PCT score comparisons with discordant genes included as X-inactivated or escape. PCT score distributions of all gene-miRNA interactions (A,C) or mean gene-level PCT score (B,D) of classes of X-linked genes with genes with a discordant XCI call (n = 721 interactions from 40 genes) included as X-inactivated (A,B) or X escape (C,D). Numbers of gene-miRNA interactions and genes as in Figure 2.3, but with the addition of discordant gene numbers/interactions to X-inactivated genes (A,B) or X escape genes (C,D). * p < 0.05, ** p < 0.01, two-sided Kolmogorov-Smirnov (A,C) or Wilcoxon rank-sum (B,D) test.

Figure 2.8: Variation in within-UTR conservation does not account for observed differences in PCT score among classes of X-linked genes. (A) Example of step-detection to segment 3’ UTRs. Top, base-wise branch length scores; bottom, probabilities of transition to a new section. Dashed line indicates p-value cutoff used to delineate a new section (plotted as alternating magenta/yellow points). (B) Boxplots of within-UTR conservation bias (see Methods) for all gene-miRNA interactions involving classes of X-linked genes. (C) Comparisons of PCT scores normalized by within-UTR bias. ** p < 0.01, *** p < 0.001, two-sided Kolmogorov-Smirnov test.

Finally, we assessed whether miRNA conservation scores distinguish the three classes by providing additional information not accounted for by known factors (Bellott et al., 2014) influencing evolutionary outcomes. We used logistic regression to model, for each gene, the probability of falling into each of the three classes (X-Y pair, X-inactivated, or X escape), with the log-odds expressed as a linear combination of haploinsufficiency probability (pHI) (Huang et al., 2010); human expression breadth (GTEx Consortium, 2015); purifying selection, measured by the ratio of non-synonymous to synonymous substitution rates (dN/dS) between human and mouse orthologs (Yates et al., 2016); and mean gene-level miRNA conservation scores. We note that pHI is a score composed of several genic features, one of which is the number of protein-protein interactions, consistent with the idea that members of large protein complexes tend to be dosage-sensitive (Papp et al., 2003; Pessia et al., 2012). Removing either miRNA conservation or pHI as predictors from the full model resulted in inferior model fits as measured by Akaike’s information criterion (AIC) (full model, AIC 321.5; full model minus miRNA conservation, AIC 327.7; full model minus pHI, AIC 327.3; higher AIC indicates an inferior model). Therefore, miRNA conservation and pHI contribute independent information that distinguishes the three classes of X-linked genes. Based on our analyses of autosomal copy number variation (Figure 2.1), we attribute this independence to the fact that miRNA conservation scores are most informative with regard to sensitivity to dosage increases. Taken together, these results indicate significant heterogeneity in dosage sensitivity, as inferred by miRNA target site conservation, among the three classes of ancestral X-linked genes: X-Y pairs are the most dosage-sensitive, X-inactivated genes are of intermediate dosage sensitivity, and X escape genes are the least dosage-sensitive.
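The AIC comparison can be made concrete with a short Python sketch. For a model with k parameters and maximized likelihood L, AIC = 2k - 2 ln L, and the model with the lower AIC is preferred. The relative likelihood exp(-ΔAIC/2) computed below is a standard way to express an AIC difference, but it is an addition for illustration and is not reported in this chapter; the function names are hypothetical.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def compare_to_full(full_aic: float, reduced_aics: dict) -> dict:
    """For each reduced model, return (delta AIC, relative likelihood).

    The relative likelihood exp(-delta / 2) expresses how probable the
    reduced model is, relative to the full model, to minimize information loss.
    """
    out = {}
    for name, value in reduced_aics.items():
        delta = value - full_aic
        out[name] = (delta, math.exp(-delta / 2))
    return out


# AIC values reported above for the three-class X-linked model.
result = compare_to_full(321.5, {"minus miRNA conservation": 327.7,
                                 "minus pHI": 327.3})
for name, (delta, rel) in result.items():
    print(f"{name}: dAIC = {delta:.1f}, relative likelihood = {rel:.3f}")
```

With the reported values, dropping miRNA conservation or pHI raises AIC by 6.2 and 5.8, respectively, so each reduced model is only about 0.05 times as likely as the full model. The same arithmetic applied to the Z-linked models reported later in this chapter gives ΔAIC values of 10.7 (minus miRNA conservation) and 5.6 (minus pHI).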

Heterogeneities in X-linked miRNA targeting were present on the ancestral autosomes

We next asked whether differences in miRNA targeting were present on the ancestral autosomes that gave rise to the mammalian X and Y Chromosomes. To reconstruct the ancestral state of miRNA targeting, we first focused on miRNA target sites in the 3’ UTR of human orthologs that align with perfect identity to a site in the corresponding chicken ortholog; these sites were likely present in the common ancestor of mammals and birds (Figure 2.9A,B). We found that X-Y pairs have the most human-chicken conserved target sites, followed by X-inactivated genes, and then X escape genes (Figure 2.9C, top). Unlike the miRNA conservation scores used earlier, this metric does not account for background conservation; we therefore estimated the background conservation of each 3’ UTR using shuffled miRNA family seed sequences (see Methods). X-Y pairs, X-inactivated genes, and X escape genes do differ significantly with respect to background conservation (Figure 2.10), but these differences cannot account for the observed differences in true human-chicken conserved sites (Figure 2.9C, bottom). We observed similar results for the expanded set of X-Y pairs across 8 mammals (Figure 2.11A).
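The two quantities compared in this paragraph, true conserved sites and a shuffled-seed background, can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure described in Methods: sites are taken to be 7mer-m8 matches (the reverse complement of miRNA positions 2-8, written as DNA), a site counts as conserved only if the aligned sequence in the other species is identical and gap-free, and the background is the mean count over a fixed number of seed shuffles. The alignment format, site type, shuffle count, and function names are all assumptions.

```python
import random

# Watson-Crick complements for RNA nucleotides.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def site_from_seed(seed7: str) -> str:
    """7mer-m8 target site (as DNA) for a 7-nt RNA seed (miRNA positions 2-8)."""
    return seed7.translate(_RNA_COMPLEMENT)[::-1].replace("U", "T")


def conserved_site_count(human_aln: str, other_aln: str, site: str) -> int:
    """Count occurrences of `site` in the human 3' UTR whose aligned columns
    in the other species are identical and gap-free.

    human_aln, other_aln: equal-length rows of a pairwise alignment, '-' = gap.
    """
    if len(human_aln) != len(other_aln):
        raise ValueError("alignment rows must have equal length")
    # Alignment column of each ungapped human nucleotide.
    cols = [i for i, c in enumerate(human_aln) if c != "-"]
    human = "".join(human_aln[i] for i in cols)
    k = len(site)
    count = 0
    start = human.find(site)
    while start != -1:
        first, last = cols[start], cols[start + k - 1]
        # Contiguous columns mean the other species has no insertion inside the site.
        if last - first == k - 1 and other_aln[first:last + 1] == site:
            count += 1
        start = human.find(site, start + 1)
    return count


def background_count(human_aln, other_aln, mirna, n_shuffles=100, seed=0):
    """Mean conserved-site count obtained with shuffled versions of the seed."""
    rng = random.Random(seed)
    nts = list(mirna[1:8])
    total = 0
    for _ in range(n_shuffles):
        rng.shuffle(nts)
        total += conserved_site_count(human_aln, other_aln, site_from_seed("".join(nts)))
    return total / n_shuffles
```

Because the counting function only needs two aligned rows, the same sketch applies unchanged to chicken-opossum alignments of XAR genes, where the human sequence is ignored.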

Differences in the number of human-chicken conserved sites among the three classes of X-linked genes could be explained by heterogeneity in miRNA targeting present on the ancestral autosomes, or by ancestral homogeneity followed by different rates of target site loss during or following X-Y differentiation. To distinguish between these two possibilities, we took advantage of previous reconstructions of human sex chromosome evolution (Figure 2.9A) (Bellott et al., 2014), which confirmed that, following the divergence of placental mammals from marsupials, an X-autosome chromosomal fusion generated the X-added region (XAR) (Watson et al., 1990). Genes on the XAR are therefore X-linked in placental mammals, but autosomal in marsupials such as the opossum. We limited our analysis to genes in the XAR and target sites conserved between orthologous chicken and opossum 3’ UTRs, ignoring site conservation in humans; these sites were likely present in the common ancestor of mammals and birds, and, because these genes remain autosomal in both opossum and chicken, an absence of such sites cannot be explained by site loss following X-Y differentiation. We observed the same pattern as with the human-chicken conserved sites, both before and after accounting for background 3’ UTR conservation (Figure 2.9D, three gene classes; Figure 2.11B, X-Y pairs across 8 mammals). These results demonstrate that the autosomal precursors of X-Y pairs and X-inactivated genes were subject to more miRNA-mediated regulation than X escape genes.

Combined with our earlier results, we conclude that present-day heterogeneities in dosage sensitivity on the mammalian X Chromosome existed on the ancestral autosomes from which it derived.


Figure 2.9. Heterogeneities in X-linked miRNA targeting were present on the ancestral autosomes. (A) Example reconstruction of an ancestral miR-96 target site in the 3’ UTR of KDM6A, an X-linked gene in the X-added region (XAR) with a surviving Y homolog. Dots in non-human species indicate identity with the human sequence; dashes indicate gaps in the multiple sequence alignment. (B) Distributions of sites conserved between 3’ UTRs of human and chicken orthologs (top) or comparisons to background expectation (bottom, see Methods) for human X-Y pairs (n = 16), X-inactivated genes (n = 251), and X escape genes (n = 42). (C) Statistics as in (B), but using sites conserved between chicken and opossum 3’ UTRs only for genes in the XAR; X-Y pairs (n = 11), X-inactivated genes (n = 58), and X escape genes (n = 27).


Figure 2.10: Background human-chicken 3’ UTR conservation among classes of X-linked genes. Mean number of human-chicken conserved sites found using shuffled miRNA seed sequences for (A) human X-Y pairs (n = 15 genes), X-inactivated genes (n = 329 genes) and X escape genes (n = 56 genes), and (B) X-Y pairs across eight mammals (n = 32 genes) and genes with no Y homolog in any of eight mammals (n = 457 genes). *** p < 0.001, two-sided Wilcoxon rank-sum test.


Figure 2.11: Ancestral miRNA targeting of X-Y pairs across 8 mammals. (A) Distributions of sites conserved between 3’ UTRs of human and chicken orthologs (top) or comparisons to background expectation (bottom, see Methods) for X-Y pairs across 8 mammals (n = 25) and other ancestral X genes (n = 351). (B) Statistics as in (A), but using sites conserved between chicken and opossum 3’ UTRs only for genes in the XAR; X-Y pairs across 8 mammals (n = 15), other ancestral X genes (n = 102).

Z-W pairs have higher miRNA conservation scores than other ancestral Z-linked genes

We next assessed whether classes of avian Z-linked genes, those with and without a W homolog, show analogous heterogeneities in sensitivity to dosage increases. We used the set of ancestral genes reconstructed through cross-species comparisons of the avian Z Chromosome and orthologous human autosomes and focused on the set of Z-W pairs identified by sequencing of the chicken W Chromosome (Bellott et al., 2010, 2017). To increase the phylogenetic breadth of our comparisons, we also included candidate Z-W pairs obtained through comparisons of male and female genome assemblies (4 species set) or inferred by read-depth changes in female genome assemblies (14 species set, see Methods for details) (Zhou et al., 2014). The more complete 3’ UTR annotations in the human genome relative to chicken allow for a more accurate assessment of conserved miRNA targeting. Accordingly, we analyzed the 3’ UTRs of the human orthologs of chicken Z-linked genes.

We found that the human orthologs of Z-W pairs have higher miRNA conservation scores than the human orthologs of other ancestral Z genes (Figure 2.12A,B). Differences in miRNA conservation scores between Z-W pairs and other ancestral Z genes remained significant when considering the expanded sets of Z-W pairs across four and 14 avian species (Figure 2.13). These differences are not driven by distinct subsets of genes, as indicated by gene resampling with replacement (Figure 2.14), and cannot be accounted for by within-UTR variation in regional conservation (Figure 2.15). Logistic regression models indicate that miRNA conservation scores provide additional information not captured by known factors (Bellott et al., 2017) influencing survival of W-linked genes (full model, AIC 127.1; full model minus miRNA conservation, AIC 137.8; full model minus pHI, AIC 132.7; higher AIC indicates an inferior model). Together, these results indicate that Z-linked genes with a surviving W homolog are more sensitive to changes in dosage, both increases and decreases, than are genes without a surviving W homolog.

Figure 2.12. Z-W pairs have higher miRNA conservation scores than other ancestral Z-linked genes. PCT score distributions of all gene-miRNA interactions involving the human orthologs of (A) chicken Z-W pairs (n = 832 interactions from 28 genes) and other ancestral Z genes (n = 16,692 interactions from 657 genes). *** p < 0.001, two-sided Kolmogorov-Smirnov test. (B) Mean gene-level PCT scores. *** p < 0.001, two-sided Wilcoxon rank-sum test.

Figure 2.13: PCT scores of Z-W pairs across 4 and 14 birds. (A,C) PCT score distributions of all gene-miRNA interactions involving (A) Z-W pairs including predictions from three additional birds with male and female genome sequence (n = 2,187 interactions from 78 genes) and other ancestral Z genes (n = 15,357 interactions from 607 genes), or (C) Z-W pairs including read depth-based predictions from 10 additional birds with only female genome sequence (n = 4,458 interactions from 157 genes) and other ancestral Z genes (n = 13,086 interactions from 528 genes). *** p < 0.001, two-sided Kolmogorov-Smirnov test. (B,D) Gene-level mean PCT scores. *** p < 0.01, two-sided Wilcoxon rank-sum test.

Figure 2.14: Resampled mean PCT scores of Z-linked genes. Gene sets: (A) chicken Z-W pairs (n = 28 genes) and other ancestral Z genes (n = 657 genes), (B) Z-W pairs across four birds (n = 78 genes) compared to the remainder of ancestral Z genes (n = 607 genes), and (C) Z-W pairs across 14 birds (n = 157 genes) compared to the remainder of ancestral Z genes (n = 528 genes). Points and error bars represent the median and 95% confidence intervals from 1,000 gene samplings with replacement. *** p < 0.001, empirical p-value computed as the fraction of random non-overlapping gene sets with a median difference in PCT score at least as large as the true difference.

Figure 2.15: Variation in within-UTR conservation cannot account for observed differences in PCT score among classes of Z-linked genes. (A) Boxplots of within-UTR conservation bias (see Methods) for all gene-miRNA interactions involving chicken Z-W pairs or other ancestral Z genes. Numbers of interactions and genes as in Figure 2.12A. ** p < 0.01, two-sided Wilcoxon rank-sum test. (B) Comparisons of PCT scores normalized by within-UTR bias. *** p < 0.001, two-sided Kolmogorov-Smirnov test.

Figure 2.16: Correlation of Z-linked gene-specific dosage compensation with gene-level PCT score. Distributions of chicken male/female expression ratio, normalized to that of human and anolis (y-axis) as a function of mean gene-level PCT quartile (x-axis) for all expressed Z-linked genes with no W homolog. Expression ratios are plotted on a log2 scale; values closer to 0 imply more effective dosage compensation following W gene loss.

While there are two clear classes of Z-linked genes (those with or without a W homolog), studies of Z-linked gene expression have suggested additional heterogeneity among Z-linked genes without a W homolog due to gene-specific dosage compensation (Mank & Ellegren, 2009; Uebbing et al., 2015; Zimmer et al., 2016). If Z-linked genes with no W homolog lie on a continuum from non-compensated to dosage-compensated, those that are more compensated should have more conserved miRNA target sites, reflecting greater dosage sensitivity. We quantified dosage compensation by using RNA sequencing data (Marin et al., 2017) to compare, in four somatic tissues, the chicken male/female expression ratio to the analogous ratio in human and anolis (see Methods). In the brain, kidney, and liver, Z-linked genes with no W homolog and higher mean miRNA conservation scores had male/female expression ratios closer to 1 (Figure 2.16). Thus, in addition to the above-described differences between Z-linked genes with or without a W homolog, Z-linked genes with no W homolog but with more effective dosage compensation have more conserved miRNA target sites than non-compensated genes.

Heterogeneities in Z-linked miRNA targeting were present on the ancestral autosomes

We next asked whether differences in miRNA targeting between Z-W pairs and other ancestral Z-linked genes were present on the ancestral autosomes that gave rise to the avian Z and W Chromosomes. We found that chicken Z-W pairs have more human-chicken-conserved miRNA target sites than their Z-linked counterparts without surviving W homologs, both before (Figure 2.17B, top) and after (Figure 2.17B, bottom) accounting for the background conservation of each individual 3’ UTR. To confirm that these differences represent ancestral heterogeneity rather than differential site loss during or following Z-W differentiation, we instead considered the number of sites conserved between human and anolis lizard, which diverged from birds prior to Z-W differentiation (Figure 2.17A). Chicken Z-W pairs contain an excess of human-anolis-conserved miRNA target sites, both before (Figure 2.17C, top) and after (Figure 2.17C, bottom) accounting for the background conservation of each individual 3’ UTR. We observed similar results with the predicted four-species (Figure 2.18) and 14-species (Figure 2.19) sets of Z-W pairs. Thus, the autosomal precursors of avian Z-W pairs were subject to more miRNA-mediated regulation than the autosomal precursors of Z-linked genes that lack a W homolog. Furthermore, in the liver and brain, Z-linked genes with no W homolog but with an excess of human-chicken-conserved miRNA sites had male/female expression ratios closer to 1, implying more effective dosage compensation (Figure 2.20). Together, these results indicate heterogeneity in dosage sensitivity among genes on the ancestral autosomes that gave rise to the avian Z Chromosome.

Analyses of experimental datasets validate miRNA target site function

Our results to this point, which indicate preexisting heterogeneities in dosage constraints among X- or Z-linked genes as inferred from predicted miRNA target sites, lead to predictions regarding the function of these sites in vivo. To test these predictions, we turned to publicly available experimental datasets consisting both of gene expression profiling following transfection or knockout of individual miRNAs and of high-throughput crosslinking-immunoprecipitation (CLIP) to identify sites that bind Argonaute in vivo (see Methods). If the above-studied sites are effective in mediating target repression, targets of an individual miRNA should show decreased expression levels or increased Argonaute binding following miRNA transfection, and increased expression levels following miRNA knockout. Together, our analyses of publicly available datasets fulfilled these predictions, validating the function of these sites in multiple cellular contexts and species (Figure 2.21). From the gene expression profiling data, we observed results consistent with effective targeting by a) eleven different miRNA families in human HeLa cells (Figure 2.22), b) four different miRNAs in human HCT116 and HEK293 cells (Figure 2.23), and c) miR-155 in mouse B and Th1 cells (Figure 2.24). In the CLIP data, the human orthologs of X- or Z-linked targets of miR-124 are enriched for Argonaute-bound clusters that appear following miR-124 transfection, while a similar but non-significant enrichment is observed for miR-7 (Figure 2.25). Thus, conserved miRNA target sites used to infer dosage constraints on X-linked genes and the autosomal orthologs of Z-linked genes can effectively mediate target repression in living cells.

Figure 2.17. Heterogeneities in Z-linked miRNA targeting were present on the ancestral autosomes. (A) Example reconstruction of an ancestral miR-145 target site: 3’ UTR sequence alignment for RASA1, a Z-linked gene with a surviving W homolog, with the miR-145 target site highlighted in gray. (B) Numbers of sites conserved between 3’ UTRs of human and chicken orthologs (top) or comparisons to background expectation (bottom) for chicken Z-W pairs (n = 27) and other ancestral Z genes (n = 578). (C) Statistics as in (B), but using sites conserved between human and anolis 3’ UTRs.

Figure 2.18: Ancestral miRNA targeting of Z-W pairs across 4 birds. (A) Distributions of sites conserved between 3’ UTRs of human and chicken orthologs (top) or comparisons to background expectation (bottom, see Methods) for Z-W pairs across chicken and three additional birds with male and female genome sequence (4 birds, n = 73) and other ancestral Z genes (n = 532). (B) Statistics as in (A), but using sites conserved between human and anolis 3’ UTRs; Z-W pairs across 4 birds (n = 73), other ancestral Z genes (n = 527). *** p < 0.001, two-sided Fisher’s exact test.

Figure 2.19: Ancestral miRNA targeting of predicted Z-W pairs across 14 birds. (A) Distributions of sites conserved between 3’ UTRs of human and chicken orthologs (top) or comparisons to background expectation (bottom, see Methods) for Z-W pairs in chicken, predicted in three additional birds with male and female genome sequence, and predicted based on read depth from 10 additional birds with only female genome sequence (14 birds, n = 147), and other ancestral Z genes (n = 458). (B) Statistics as in (A), but using sites conserved between human and anolis 3’ UTRs; Z-W pairs across 14 birds (n = 147), other ancestral Z genes (n = 453).

Figure 2.20: Correlation of Z-linked gene-specific dosage compensation with human-chicken-conserved site excess. Distributions of chicken male/female expression ratios, normalized to those of human and anolis (y-axis), for expressed Z-linked genes with no W homolog with (left) or without (right) an excess of human-chicken-conserved miRNA sites. ** p < 0.01, *** p < 0.0001, Wilcoxon rank-sum test.
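Most of the expression-profiling comparisons in this section (Figures 2.21 to 2.24) are two-sided two-sample Kolmogorov-Smirnov tests between site-containing genes and genes lacking sites. The sketch below shows what that comparison computes; it is a pure-Python illustration, not the code used for the figures. The p-value is the large-sample (asymptotic) Kolmogorov approximation, which is an assumption of this sketch; statistical packages may instead report exact small-sample p-values. Input names such as `target_fc` and `nosite_fc` are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    # The ECDF difference can only change at observed values, so checking them suffices.
    return max(abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
               for v in xs + ys)


def ks_pvalue_asymptotic(d, n1, n2, terms=100):
    """Two-sided p-value from the asymptotic Kolmogorov distribution (an approximation)."""
    lam = math.sqrt(n1 * n2 / (n1 + n2)) * d
    if lam < 0.2:
        return 1.0  # the alternating series converges slowly here, and p is essentially 1
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, terms + 1))
    return min(max(p, 0.0), 1.0)


# Hypothetical usage with log2 fold changes after transfecting one miRNA:
# d = ks_statistic(target_fc, nosite_fc)
# p = ks_pvalue_asymptotic(d, len(target_fc), len(nosite_fc))
```

D alone does not say which way the distributions differ; the predicted direction (down after transfection, up after knockout) would be read off the CDFs or medians.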

Discussion

Here, through the evolutionary reconstruction of microRNA (miRNA) target sites, we provide evidence for preexisting heterogeneities in dosage sensitivity among genes on the mammalian X and avian Z Chromosomes. We first showed that, across all human autosomal genes, dosage sensitivity, as indicated by patterns of genic copy number variation, correlates with the degree of conserved miRNA targeting. We found that conserved targeting correlates especially strongly with sensitivity to dosage increases, consistent with miRNA targeting serving to reduce gene expression. Turning to the sex chromosomes of mammals and birds, genes that retained a homolog on the sex-specific Y or W Chromosome (X-Y and Z-W pairs) have more conserved miRNA target sites than genes with no Y or W homolog. In mammals, genes with no Y homolog that became subject to XCI have more conserved sites than those that continued to escape XCI following Y gene decay. In birds, across Z-linked genes with no W homolog, the degree of conserved miRNA targeting correlates with the degree of gene-specific dosage compensation.

Figure 2.21. Analyses of experimental datasets validate miRNA target site function. Responses to transfection (A,B,C) or knockout (D) of indicated miRNAs in human (A,B,C) or mouse (D) cell types. Panels depict corresponding changes in mRNA levels (A,B), in the fraction of Argonaute-bound genes (C), and in mRNA stability and translational efficiency as measured by ribosome-protected fragments (RPF; D). In each case, X-linked genes and the human orthologs of Z-linked genes containing target sites with an assigned PCT score (red) for the indicated miRNA were compared to all expressed genes lacking target sites (black); gene numbers are indicated in parentheses. (A,B,D) *** p < 0.001, two-sided Kolmogorov-Smirnov test. (C) * p < 0.05, two-sided Fisher’s exact test.

Figure 2.22: Gene expression changes following small RNA transfections in human HeLa cells. * p < 0.05, *** p < 0.001, two-sided Kolmogorov-Smirnov test.

Figure 2.23: Gene expression changes following transfection or knockdown of additional miRNAs in human HCT116 or HEK293 cells. *** p < 0.001, two-sided Kolmogorov-Smirnov test.

Figure 2.24: Changes in gene expression, mRNA stability, and translational efficiency following miR-155 knockout in mouse immune cells. In each case, mouse orthologs of X- or Z-linked genes containing a human-mouse-conserved (hsa-mmu) miR-155 site were compared to mouse genes containing only nonconserved miR-155 sites. * p < 0.05, *** p < 0.001, two-sided Kolmogorov-Smirnov test.

Figure 2.25: Argonaute binding measured by high-throughput crosslinking-immunoprecipitation (CLIP) following miRNA transfection in HEK293 cells. * p < 0.05, two-sided Fisher’s exact test.

We then reconstructed the ancestral state of miRNA targeting, observing significant heterogeneities in the extent of miRNA targeting, and thus dosage sensitivity, on the ancestral autosomes that gave rise to the mammalian and avian sex chromosomes. Finally, through analysis of publicly available experimental datasets, we validated the function, in living cells, of the miRNA target sites used to infer dosage sensitivity. We thus conclude that differences in dosage sensitivity (both to increases and to decreases in gene dosage) among genes on the ancestral autosomes influenced their evolutionary trajectory during sex chromosome evolution, not only on the sex-specific Y and W Chromosomes, but also on the sex-shared X and Z Chromosomes.

Our findings build upon previous work in three important ways. First, our analysis of miRNA-mediated repression indicates that these heterogeneities consist of sensitivities to both dosage increases and decreases, whereas previous studies had either focused on sensitivity to underexpression or could not differentiate the two. Second, our reconstruction of miRNA targeting on the ancestral autosomes provides direct evidence that heterogeneities in dosage sensitivity among classes of X- and Z-linked genes were preexisting rather than acquired during sex chromosome evolution. Finally, by pointing to specific regulatory sequences (miRNA target sites) that tune gene dosage both prior to and during sex chromosome evolution, our study provides a view of dosage compensation that encompasses post-transcriptional regulation.

Human disease studies support the claim that increased dosage of X-Y pairs and X-inactivated genes is deleterious to fitness. Copy number gains of the X-linked gene KDM6A, which has a surviving human Y homolog, are found in patients with developmental abnormalities and intellectual disability (Lindgren et al., 2013). HDAC6, CACNA1F, GDI1, and IRS4 all lack Y homologs and are subject to XCI in humans. A mutation in the 3’ UTR of HDAC6 abolishing targeting by miR-433 has been linked to familial chondrodysplasia in both sexes (Simon et al., 2010). Likely gain-of-function mutations in CACNA1F cause congenital stationary night blindness in both sexes (Hemara-Wahanui et al., 2005). Copy number changes of GDI1 correlate with the severity of X-linked mental retardation in males, with female carriers preferentially inactivating the mutant allele (Vandewalle et al., 2009). Somatic genomic deletions downstream of IRS4 lead to its overexpression in lung squamous carcinoma (Weischenfeldt et al., 2017).

Males with partial X disomy due to translocation of the distal long arm of the X Chromosome (Xq28) to the long arm of the Y Chromosome show severe mental retardation and developmental defects (Lahn et al., 1994). Most genes in Xq28 are inactivated in 46,XX females but escape inactivation in such X;Y translocations, suggesting that increased dosage of Xq28 genes caused the cognitive and developmental defects. We anticipate that further studies will reveal additional examples of the deleterious effects of increases in gene dosage of X-Y pairs and X-inactivated genes.

We and others previously proposed that Y gene decay drove upregulation of homologous X-linked genes in both males and females, and that XCI subsequently evolved at genes sensitive to increased expression from two active X-linked copies in females (Jegalian & Page, 1998; Ohno, 1967). Our finding that X-inactivated genes have higher miRNA conservation scores than X escape genes is consistent with this aspect of the model. However, recent studies indicating heterogeneity in dosage sensitivity between classes of mammalian X- or avian Z-linked genes (Bellott et al., 2014, 2017; Pessia et al., 2012; Zimmer et al., 2016), combined with the present finding that these dosage sensitivities existed on the ancestral autosomes, challenge the previous assumption of a single evolutionary pathway for all sex-linked genes.

We therefore propose a revised model of X-Y and Z-W evolution in which the ancestral autosomes that gave rise to the mammalian and avian sex chromosomes contained three (or two, in the case of birds) classes of genes with differing dosage sensitivities (Figure 2.26A,B). For ancestral genes with high dosage sensitivity, Y or W gene decay would have been highly deleterious, and thus the Y- or W-linked genes were retained. According to our model, these genes’ high dosage sensitivity also precluded upregulation of the X- or Z-linked homolog and, in mammals, subsequent X-inactivation; indeed, their X-linked homologs continue to escape XCI (Bellott et al., 2014). For ancestral mammalian genes of intermediate dosage sensitivity, Y gene decay did occur and was accompanied or followed by compensatory upregulation of the X-linked homolog in both sexes; the resultant increased expression in females was deleterious and led to the acquisition of XCI. Ancestral mammalian genes of low dosage sensitivity continued to escape XCI following Y decay; heterogeneity in X upregulation may further subdivide such genes (Figure 2.26A). These genes’ dosage insensitivity set them apart biologically, and evolutionarily, from the other class of X-linked genes escaping XCI: those with a surviving Y homolog.

Our revised model relates preexisting, gene-by-gene heterogeneities in dosage sensitivity to the outcomes of sex chromosome evolution. However, the suppression of X-Y recombination did not occur on a gene-by-gene basis; instead, it initiated Y gene decay and subsequent dosage compensation through a series of large-scale inversions encompassing many genes (Lahn & Page, 1999). The timings and boundaries of these evolutionary strata varied among mammalian lineages, leading to unique chromosome-scale evolutionary dynamics across mammals. These large-scale changes would then have allowed genic selection to take place according to the preexisting dosage sensitivities outlined above. In this way, the course of sex chromosome evolution in mammals is a composite of 1) preexisting, gene-by-gene dosage sensitivities and 2) the manner in which the history of the X and Y unfolded in particular lineages via discrete, large-scale inversions.

Figure 2.26. An evidence-based model of preexisting heterogeneities in dosage sensitivity shaping mammalian and avian sex chromosome evolution. In this model, preexisting heterogeneities in dosage sensitivity determined the trajectory of Y/W gene loss in both mammals and birds, and of subsequent X-inactivation in mammals and dosage compensation in birds. Colored arrow widths are scaled approximately to the number of ancestral genes in each class. (A) The dashed orange line represents the possibility that a subset of X-linked genes may not have undergone compensatory X upregulation following Y gene decay. (B) Ancestral Z genes with no W homolog follow a gradient of preexisting dosage sensitivity (top, grey to white), which determined the degree of dosage compensation following W gene loss (bottom).

In this study, we have focused on classes of ancestral X-linked genes delineated by the survival of a human Y homolog or by the acquisition of XCI in humans, but such evolutionary states can differ among mammalian lineages and species. In mouse, for instance, both Y gene decay (Bellott et al., 2014) and the acquisition of X-inactivation (Yang et al., 2010) are more complete than in humans or other mammals, as exemplified by RPS4X, which retains a Y homolog and continues to escape XCI in primates, but has lost its Y homolog and is subject to XCI in rodents. These observations could be explained by shorter generation times in the rodent lineage, which provide more generations over which the forces leading to Y gene decay and the acquisition of X-inactivation could act (Charlesworth & Crow, 1978; Jegalian & Page, 1998; Ohno, 1967). Another case of lineage differences involves HUWE1, which lacks a Y homolog and is subject to XCI in both human and mouse, but retains a functional Y homolog in marsupials, where it continues to escape XCI. In the future, more complete catalogues of X-inactivation and escape in additional mammalian lineages would make it possible to examine whether analogous, preexisting dosage sensitivities differentiate the three classes of X-linked genes (X-Y pairs, X-inactivated genes, and X escape genes) in other species.

Previous studies have sought evidence of X-linked upregulation during mammalian sex chromosome evolution by comparing gene expression levels between the whole of the X Chromosome and all of the autosomes, with some studies rejecting and others finding evidence consistent with upregulation (Deng et al., 2011; Julien et al., 2012; Kharchenko et al., 2011; Lin et al., 2012; Xiong et al., 2010). This is likely due to gene-by-gene heterogeneity in dosage sensitivities that resulted in a stronger signature of upregulation at more dosage-sensitive genes (Pessia et al., 2012). Similarly, studies of Z-linked gene expression in birds provide evidence for the gene-by-gene nature of Z dosage compensation, as measured by comparisons of gene expression levels between ZZ males and ZW females (Itoh et al., 2007; Mank & Ellegren, 2009; Uebbing et al., 2015), and indicate a stronger signature of dosage compensation at predicted dosage-sensitive genes (Zimmer et al., 2016). By showing that such dosage sensitivities existed on the ancestral autosomes and consist of sensitivity to both increases and decreases, our findings highlight an additional aspect of dosage compensation that affects both birds and mammals.

In addition to revealing similarities between mammals and birds, our study provides a view of dosage compensation that highlights post-transcriptional regulatory mechanisms, pointing to specific non-coding sequences with known mechanisms (microRNA target sites) functioning across evolutionary time. A recent study in birds showed a role for a Z-linked miRNA, miR-2954-3p, in dosage compensation of some Z-linked genes (Warnefors et al., 2017). Our study suggests an additional, broader role for miRNA targeting, with hundreds of different miRNAs acting to tune gene dosage both before and during sex chromosome evolution. Furthermore, our finding of greater conserved miRNA targeting of X-inactivated genes relative to X escape genes shows that it is possible to predict the acquisition of a transcriptional regulatory state (XCI) during sex chromosome evolution on the basis of a preexisting, post-transcriptional regulatory state. Perhaps additional post-transcriptional regulatory mechanisms and their associated regulatory elements will be shown to play roles in mammalian and avian dosage compensation.

Recent work has revealed that the sex-specific chromosome (the Y in mammals and the W in birds) convergently retained dosage-sensitive genes with important regulatory functions (Bellott et al., 2014, 2017). Our study, by reconstructing the ancestral state of post-transcriptional regulation, provides direct evidence that such heterogeneity in dosage sensitivity existed on the ancestral autosomes that gave rise to the mammalian and avian sex chromosomes. This heterogeneity influenced both survival on the sex-specific chromosomes in mammals and birds and the evolution of XCI in mammals. Thus, two independent experiments of nature offer empirical evidence that modern-day amniote sex chromosomes were shaped, during evolution, by the properties of the ancestral autosomes from which they derive.

Methods

Human genic copy number variation

To annotate gene deletions and duplications, we used data from the Exome Aggregation Consortium (ExAC) (ftp://ftp.broadinstitute.org/pub/ExAC_release/release0.3.1/cnv/), which consist of autosomal genic duplications and deletions (both full and partial) called in 59,898 exomes (Ruderfer et al., 2016). We used the publicly available genic deletion counts but recomputed genic duplication counts using only full duplications, reasoning that partial duplications are unlikely to result in increased dosage of the full gene product. We thus required that an individual duplication fully overlap the longest protein-coding transcript (GENCODE v19) of a gene, using BEDtools (RRID:SCR_006646) (Quinlan & Hall, 2010). We removed genes flagged by ExAC as lying in known regions of recurrent CNVs. This yielded 4,118 genes within both duplications and deletions, 3,976 genes within deletions but not duplications, 2,916 genes within duplications but not deletions, and 3,510 genes not subject to duplication or deletion.
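The full-duplication requirement reduces to an interval-containment test, after which each gene falls into one of the four CNV classes counted above. The sketch below assumes BED-style half-open coordinates and hypothetical inputs (`longest_tx`, `duplications`, `deletion_counts`, `recurrent_cnv_genes`, and the toy gene names); it is not the original BEDtools-based pipeline, though it is roughly comparable to intersecting transcripts (as `-a`) with duplications using `bedtools intersect -f 1.0`.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # assumption: 0-based, half-open coordinates (BED convention)
    end: int


def fully_covers(cnv: Interval, tx: Interval) -> bool:
    """True if one CNV spans the whole transcript; partial overlaps do not count."""
    return cnv.chrom == tx.chrom and cnv.start <= tx.start and cnv.end >= tx.end


def cnv_class(n_full_dups: int, n_dels: int) -> str:
    if n_full_dups and n_dels:
        return "duplication_and_deletion"
    if n_dels:
        return "deletion_only"
    if n_full_dups:
        return "duplication_only"
    return "neither"


def classify_genes(longest_tx, duplications, deletion_counts, recurrent_cnv_genes):
    """longest_tx: gene -> Interval of its longest protein-coding transcript.
    deletion_counts: gene -> released deletion count (full or partial deletions).
    recurrent_cnv_genes: genes flagged as lying in recurrent-CNV regions (dropped)."""
    classes = {}
    for gene, tx in longest_tx.items():
        if gene in recurrent_cnv_genes:
            continue
        n_full_dups = sum(fully_covers(d, tx) for d in duplications)
        classes[gene] = cnv_class(n_full_dups, deletion_counts.get(gene, 0))
    return classes


# Toy example with hypothetical genes and coordinates.
tx = {"GENE_A": Interval("1", 100, 200), "GENE_B": Interval("1", 500, 900)}
dups = [Interval("1", 50, 250), Interval("1", 600, 1000)]  # the second only partially covers GENE_B
print(classify_genes(tx, dups, {"GENE_B": 3}, recurrent_cnv_genes=set()))
# {'GENE_A': 'duplication_only', 'GENE_B': 'deletion_only'}
```

The partial duplication over GENE_B is ignored by design, mirroring the reasoning that it would not raise dosage of the full gene product.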

X- and Z-linked gene sets

Analyses of conserved miRNA targeting based on multiple species alignments are unreliable for multicopy or ampliconic genes due to ambiguous sequence alignment between species. To avoid such issues, we first removed multicopy and ampliconic genes (Mueller et al., 2013) from a previously published set of human X genes present in the amniote ancestor (Bellott et al., 2014). We then excluded genes in the human pseudoautosomal regions (PARs), since these genes have not been exposed to the same evolutionary forces as genes in regions where X-Y recombination has been suppressed. Of the remaining ancestral X genes, we classified the 15 genes with human Y-linked homologs as X-Y pairs. We also analyzed the larger set of 32 X-Y pairs across eight mammals (human, chimpanzee, rhesus macaque, marmoset, mouse, rat, bull, and opossum) with sequenced Y Chromosomes (Bellott et al., 2014).

To classify ancestral X-linked genes without Y homologs as subject to or escaping XCI in humans, we used a collection of consensus XCI calls (Balaton et al., 2015) that aggregates the results of three studies assaying XCI escape (Carrel & Willard, 2005; Cotton et al., 2013, 2015). Of the 472 ancestral X genes without a human Y homolog that were assigned an XCI status by Balaton et al., 329 were subject to XCI (“Subject” or “Mostly subject” in Balaton et al.), 26 displayed variable escape from XCI (“Variable escape” or “Mostly variable escape”), and 30 showed consistent escape (“Escape” or “Mostly escape”). We excluded 40 ancestral X genes with a “Discordant” XCI status as assigned by Balaton et al. In the main text, we present results obtained after combining variable and consistent escape calls into one class, yielding the following counts: 15 X-Y pairs, 329 ancestral X genes subject to XCI, and 56 ancestral X genes with evidence of escape from XCI. We also performed analyses considering escape and variable escape genes separately.
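The filtering and classification steps above can be summarized as a small decision procedure. The sketch assumes a hypothetical input layout (plain Python sets and a gene-to-label dictionary, with placeholder gene symbols in the example); only the label-to-class mapping and the order of exclusions are taken from the text.

```python
# Consensus labels from Balaton et al. (2015) mapped to the classes used in this chapter.
# "Discordant" (and any gene without a call) is deliberately absent, so it is excluded.
BALATON_TO_CLASS = {
    "Subject": "subject",
    "Mostly subject": "subject",
    "Variable escape": "variable_escape",
    "Mostly variable escape": "variable_escape",
    "Escape": "escape",
    "Mostly escape": "escape",
}


def classify_ancestral_x(genes, multicopy, par_genes, y_homologs, xci_calls, combine_escape=True):
    """Assign each ancestral X gene to "X-Y pair", "subject", or an escape class.

    Hypothetical inputs: genes (iterable of symbols), multicopy and par_genes (sets to
    exclude), y_homologs (genes with a human Y-linked homolog), xci_calls (gene -> label).
    """
    classes = {}
    for gene in genes:
        if gene in multicopy or gene in par_genes:
            continue  # ambiguous alignments, or never exposed to X-Y differentiation
        if gene in y_homologs:
            classes[gene] = "X-Y pair"
            continue
        label = BALATON_TO_CLASS.get(xci_calls.get(gene))
        if label is None:
            continue  # "Discordant" or no consensus call
        if combine_escape and label == "variable_escape":
            label = "escape"  # main-text analysis pools variable and consistent escape
        classes[gene] = label
    return classes
```

Passing `combine_escape=False` reproduces the alternative analysis in which escape and variable escape genes are kept separate; counting the returned values (for example with `collections.Counter`) gives per-class totals.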

microRNA target sites

Pre-calculated PCT scores for all gene-miRNA family interactions were obtained from TargetScanHuman v7.1 (RRID:SCR_010845) (http://www.targetscan.org/vert_71/vert_71_data_download/Summary_Counts.all_predictions.txt.zip) (Friedman et al., 2009). We excluded mammalian-specific miRNA families based on classifications by Friedman et al. (2009), as updated in TargetScanHuman v7.1 (Agarwal et al., 2015). To account for gene-specific variability in the number and PCT scores of gene-miRNA interactions within a group of genes, we sampled genes 1,000 times with replacement from the same group and, for each sampling, computed the mean PCT score across all associated gene-miRNA interactions. These 1,000 samplings were then used to estimate the median resampled gene-miRNA PCT score and its 95% confidence interval.
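The resampling above is a gene-level bootstrap: genes, not individual interactions, are drawn with replacement, so a gene with many interactions is kept or dropped as a unit. A minimal sketch, assuming a hypothetical `gene_to_pcts` mapping and a simple percentile convention for the 95% interval (the exact quantile rule used originally is not stated):

```python
import random
import statistics


def resampled_mean_pct(gene_to_pcts, n_samplings=1000, seed=0):
    """Gene-level bootstrap of the mean gene-miRNA PCT score.

    gene_to_pcts: gene -> list of PCT scores, one per gene-miRNA interaction (hypothetical).
    Returns (median of resampled means, (lower, upper) percentile 95% interval).
    """
    genes = [g for g, pcts in gene_to_pcts.items() if pcts]  # genes with >= 1 interaction
    if not genes:
        raise ValueError("no gene has any scored interaction")
    rng = random.Random(seed)
    means = []
    for _ in range(n_samplings):
        sample = rng.choices(genes, k=len(genes))  # resample genes with replacement
        scores = [s for g in sample for s in gene_to_pcts[g]]  # pool their interactions
        means.append(sum(scores) / len(scores))
    means.sort()
    # Assumption: simple percentile interval (2.5th and 97.5th order statistics).
    lower = means[int(0.025 * n_samplings)]
    upper = means[int(0.975 * n_samplings) - 1]
    return statistics.median(means), (lower, upper)
```

Comparing the intervals returned for two gene classes (for example, Z-W pairs versus other ancestral Z genes) corresponds to the points and error bars described for Figure 2.14.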

Site-wise alignment information was obtained from TargetScanHuman v7.1 (http://www.targetscan.org/vert_71/vert_71_data_download/Conserved_Family_Info.txt.zip). To determine which target sites are present in the 3’ UTRs of both human and chicken orthologs, we counted, for genes with both a human and a chicken ortholog, the number of miRNA interactions that had at least one target site in both human and chicken. To control for gene-specific background 3’ UTR conservation, we generated six control k-mers for each miRNA family seed sequence that were matched exactly for nucleotide and CpG content; six was the maximum number of unique control k-mers that could be generated for all seed sequences. We repeated the above counting analysis with each of the control k-mers using scripts from TargetScan and compared, for each gene, the observed number of human-chicken-conserved miRNA interactions (the observed conservation signal) to the average number from controls (the background conservation). This same procedure was repeated for alternative species pairs (opossum-chicken and human-anolis lizard).
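The control procedure can be illustrated with shuffled seeds: any rearrangement of a seed keeps its nucleotide content, and filtering on CpG count enforces the second match. The sketch below is a simplification under stated assumptions and is not the original Biopython/TargetScan workflow: sites are treated as exact substrings equal to the reverse complement of the seed (RNA alphabet), TargetScan site types and alignment-based conservation are ignored, and all names and sequences are hypothetical.

```python
import itertools
import random

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def cpg_count(seq):
    """Number of CpG dinucleotides in seq."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def control_kmers(seed, n=6, exclude=(), rng_seed=0):
    """Draw n distinct shuffles of seed with identical nucleotide and CpG content."""
    shuffles = {"".join(p) for p in itertools.permutations(seed)}  # same composition
    forbidden = {seed, *exclude}  # e.g. other real seeds (an assumption of this sketch)
    matched = sorted(k for k in shuffles - forbidden if cpg_count(k) == cpg_count(seed))
    if len(matched) < n:
        raise ValueError(f"only {len(matched)} matched controls for {seed}")
    return random.Random(rng_seed).sample(matched, n)


def site_for(seed):
    """Simplified mRNA site: reverse complement of the seed (assumption)."""
    return seed.translate(COMPLEMENT)[::-1]


def conserved_count(seeds, human_utr, chicken_utr):
    """Number of seeds with at least one site in both orthologous 3' UTRs."""
    return sum(1 for s in seeds if site_for(s) in human_utr and site_for(s) in chicken_utr)


def conservation_excess(seeds, controls, human_utr, chicken_utr):
    """Observed count, mean count over control sets, and their difference for one gene."""
    observed = conserved_count(seeds, human_utr, chicken_utr)
    n_ctrl = len(controls[seeds[0]])
    background = sum(
        conserved_count([controls[s][j] for s in seeds], human_utr, chicken_utr)
        for j in range(n_ctrl)
    ) / n_ctrl
    return observed, background, observed - background
```

The j-th control set replaces every real seed with its j-th control, so the background for a gene is the average over six matched "pseudo-miRNA repertoires".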

Variation in within-UTR conservation bias

To address the possibility that non-uniformity in regional 3’ UTR conservation could artificially inflate or deflate the conservation scores of certain target sites, we implemented a step-detection algorithm to segment 3’ UTRs into regions of homogeneous background conservation and calculated miRNA site conservation relative to these smaller regions. The PCT of a given miRNA target site depends on the conservation of the site, as measured by the total branch length of the phylogenetic tree containing the target site (branch length score, BLS), relative to the mean BLS of the whole 3’ UTR; we therefore segmented each 3’ UTR into regions of homogeneous BLS values. To call steps within a 3’ UTR, we computed the t-test p-value between the BLS values of the 50-nt windows upstream and downstream of each nucleotide position. Transitions were called at a log p-value cutoff of -15. Because of noise in the BLS signal, the log p-value often dips below -15 several times around each transition; if more than one position met the cutoff within 100 nucleotides of each other, we took only the one with the smallest p-value. We then computed, for each miRNA site, the ratio of the mean BLS of its section to that of the entire 3’ UTR; we term this statistic the “within-UTR conservation bias.” Values greater than 1 indicate that the PCT overestimates the relative conservation of a given target site, while values less than 1 indicate that the PCT underestimates conservation. For gene-miRNA interactions with multiple sites, we used the mean within-UTR conservation bias across all sites. We also repeated PCT score comparisons between classes of X- and Z-linked genes with PCT scores normalized by the corresponding gene-miRNA within-UTR conservation bias (Figure 2.8C, Figure 2.15B).
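The segmentation and bias computation can be sketched as follows. This is not the original `plot_transitions.py`; it rests on several assumptions: a Student's two-sample t-test with pooled variance, a base-10 log for the -15 cutoff, single-linkage grouping of hits within 100 nt, and each site assigned to a segment by its start coordinate. The t-test p-value is computed in pure Python via the regularized incomplete beta function (continued-fraction method) so the block has no dependencies; `scipy.stats.ttest_ind` could replace `pooled_t_pvalue`.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value for a Student t statistic with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def pooled_t_pvalue(up, down):
    """Student's two-sample t-test (pooled variance; an assumption) between two windows."""
    n1, n2 = len(up), len(down)
    m1, m2 = sum(up) / n1, sum(down) / n2
    ss = sum((v - m1) ** 2 for v in up) + sum((v - m2) ** 2 for v in down)
    df = n1 + n2 - 2
    se = math.sqrt(ss / df * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        return 1.0 if m1 == m2 else 0.0
    return t_two_sided_p((m1 - m2) / se, df)


def find_transitions(bls, window=50, log10_cutoff=-15.0, min_gap=100):
    """Call BLS steps; among hits closer than min_gap, keep only the smallest p-value."""
    hits = []
    for i in range(window, len(bls) - window + 1):
        p = pooled_t_pvalue(bls[i - window:i], bls[i:i + window])
        log_p = math.log10(p) if p > 0.0 else float("-inf")  # assumption: base-10 log
        if log_p < log10_cutoff:
            hits.append((i, log_p))
    transitions, cluster = [], []
    for pos, log_p in hits:
        if cluster and pos - cluster[-1][0] > min_gap:
            transitions.append(min(cluster, key=lambda h: h[1])[0])
            cluster = []
        cluster.append((pos, log_p))
    if cluster:
        transitions.append(min(cluster, key=lambda h: h[1])[0])
    return transitions


def within_utr_bias(bls, transitions, site_starts):
    """Mean over sites of (mean BLS of the site's segment) / (mean BLS of the whole UTR)."""
    bounds = [0] + sorted(transitions) + [len(bls)]
    utr_mean = sum(bls) / len(bls)
    ratios = []
    for pos in site_starts:
        for lo, hi in zip(bounds, bounds[1:]):
            if lo <= pos < hi:
                ratios.append(sum(bls[lo:hi]) / (hi - lo) / utr_mean)
                break
    return sum(ratios) / len(ratios)
```

Under this sketch, a PCT score would be normalized by dividing it by the interaction's within-UTR bias, so that sites sitting in unusually conserved stretches of a UTR are down-weighted.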

Logistic regression

Logistic regression models were constructed using the function ‘multinom’ in the R package ‘nnet’. We used previously published values for known factors in the survival of Y-linked (Bellott et al., 2014) and W-linked (Bellott et al., 2017) genes, except for human expression breadth, which we recalculated using data from the GTEx Consortium v6 data release (The GTEx Consortium, 2015). Briefly, kallisto was used to estimate transcripts per million (TPM) values in the 10 male samples with the highest RNA integrity numbers (RINs) from each of 37 tissues, and expression breadth across tissues was calculated as described in Bellott et al. (2014), using median TPM values for each tissue.

Assessing Z-linked dosage compensation using cross-species RNA-sequencing data

Raw RNA-seq reads of male and female samples from four somatic tissues (liver, brain, kidney, and heart) from human, chicken, and anolis were obtained from Marin et al. (2017) (GSE97367). Kallisto was used to pseudoalign reads and quantify transcript abundances (human, GENCODE v26; chicken and anolis, Ensembl 87) with the following options: “--bias”, “--single”, “-l 200”, “-s 20”. Transcript-level estimates were summed to the gene level using the tximport R package with the option “lengthScaledTPM”. Ensembl one-to-one orthologs were used, except for ancestral Z-linked genes, for which orthology assignments from Bellott et al. were used. Within each tissue, gene-level counts were normalized across species using the trimmed mean of M-values (TMM) method in the edgeR R package. Genes were considered for analysis only if they were expressed at > 1 TPM in all human, chicken, and anolis samples from that tissue. The limma/voom R package was used to quantify the male/female expression ratio in chicken relative to the male/female ratio in human and anolis.
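The quantity estimated above is a contrast of contrasts: chicken log2(male/female) minus the corresponding human and anolis log2(male/female). The sketch below computes a simplified version for one gene in one tissue from per-sample TPMs. It is not the limma/voom model and omits TMM normalization and variance weighting; the nested-dictionary layout is a hypothetical assumption.

```python
import math
from statistics import mean

SPECIES = ("human", "chicken", "anolis")


def passes_expression_filter(tpm, min_tpm=1.0):
    """Keep a gene only if it exceeds min_tpm in every sample of every species.

    tpm (hypothetical layout): species -> {"M": [TPMs], "F": [TPMs]} for one gene and tissue.
    """
    return all(v > min_tpm for sp in SPECIES for vals in tpm[sp].values() for v in vals)


def log2_mf(tpm, species):
    """Mean log2 expression in males minus mean log2 expression in females."""
    return (mean(math.log2(v) for v in tpm[species]["M"])
            - mean(math.log2(v) for v in tpm[species]["F"]))


def chicken_relative_mf(tpm):
    """Chicken log2(M/F) relative to the mean of human and anolis log2(M/F).

    Values near 0 suggest effective dosage compensation; a value near 1 would
    correspond to an uncompensated twofold ZZ/ZW difference.
    """
    return log2_mf(tpm, "chicken") - mean([log2_mf(tpm, "human"), log2_mf(tpm, "anolis")])


example = {
    "chicken": {"M": [8.0, 8.0], "F": [4.0, 4.0]},
    "human": {"M": [5.0, 5.0], "F": [5.0, 5.0]},
    "anolis": {"M": [3.0, 3.0], "F": [3.0, 3.0]},
}
if passes_expression_filter(example):
    print(chicken_relative_mf(example))  # 1.0: a twofold male bias only in chicken
```

Using human and anolis as the reference removes sex differences already present in the ancestral autosomal state, so only the chicken-specific (Z-linked) component remains.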

Gene expression profiling and crosslinking datasets

Fold-changes in mRNA expression from a compendium of small RNA (sRNA) transfections (corresponding to twelve different miRNAs) in HeLa cells were obtained from Agarwal and colleagues (Agarwal et al., 2015) (GSM210904, GSM37601, GSM210913, GSM210903, GSM210911, GSM210898, GSM210897, GSM210897, GSM210901, GSM210909, GSM119747; E-MEXP-1402(1595297513)). Further datasets describing the effects of transfecting miR-103 in HCT116 cells (Linsley et al., 2007) (GSM156580), knocking down miR-92a in HEK293 cells (Hafner et al., 2010) (GSM538818), transfecting miR-7 or miR-124 in HEK293 cells (Hausser et al., 2009) (GSM363763, GSM363766, GSM363769, GSM363772, GSM363775, GSM363778), or knocking out miR-155 in mouse B cells (Eichhorn et al., 2014) (GSM1479572, GSM1479576, GSM1479580, GSM1479584), T cells (Loeb et al., 2012) (GSM1012118, GSM1012119, GSM1012120, GSM1012121, GSM1012122, GSM1012123), or Th1 and Th2 cells (Rodriguez et al., 2007) (E-TABM-232), processed as described in Agarwal et al. (2015), were provided by V. Agarwal. Targets for the PAR-CLIP study (Hafner et al., 2010) were inferred from an online resource of HEK293 clusters observed after transfection of either miR-124 (http://www.mirz.unibas.ch/restricted/clipdata/RESULTS/miR124_TRANSFECTION/miR124_TRANSFECTION.html) or miR-7 (http://www.mirz.unibas.ch/restricted/clipdata/RESULTS/miR7_TRANSFECTION/miR7_TRANSFECTION.html).

Code availability

A custom Python (RRID:SCR_008394) script using Biopython (RRID:SCR_007173) was used to generate shuffled miRNA family seed sequences. Target site matches to the shuffled seed sequences were identified using the ‘targetscan_70.pl’ Perl script (http://www.targetscan.org/vert_71/vert_71_data_download/targetscan_70.zip). 3′ UTR segmentation was performed with the ‘plot_transitions.py’ Python script. Code is available at: https://github.com/snaqvi1990/Naqvi17-code.
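The sketch below shows how shuffled control seeds might be generated. It is not the author's script, which used Biopython and lives in the repository above. This version uses only the standard library and assumes that:

- a control is a random permutation of the 7-nt seed, preserving nucleotide composition;
- a control must not coincide with the original seed or any real family seed.

The "real seed" list here is hypothetical except for the let-7 seed (GAGGUAG). `seed_match` returns the reverse complement a 3′ UTR site would contain. The sketch does not reproduce the downstream TargetScan scan.

```python
import random

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def shuffled_seeds(seed, real_seeds, n, rng, max_tries=10_000):
    """Return up to n distinct shuffles of `seed` with the same nucleotide
    composition that match neither `seed` itself nor any real family seed."""
    excluded = set(real_seeds) | {seed}
    found = []
    for _ in range(max_tries):
        if len(found) == n:
            break
        letters = list(seed)
        rng.shuffle(letters)
        candidate = "".join(letters)
        if candidate not in excluded and candidate not in found:
            found.append(candidate)
    return found


def seed_match(seed):
    """Reverse complement of an RNA seed: the sequence a 3' UTR site must contain."""
    return "".join(COMPLEMENT[base] for base in reversed(seed))


rng = random.Random(0)
# "AGCUGCU" is a hypothetical stand-in for other real family seeds
controls = shuffled_seeds("GAGGUAG", real_seeds=["GAGGUAG", "AGCUGCU"], n=3, rng=rng)
for s in controls:
    print(s, seed_match(s))
print(seed_match("GAGGUAG"))  # CUACCUC
```

Preserving composition keeps GC content, and therefore the background frequency of site matches, comparable between real and control seeds. That is the usual motivation for shuffling rather than drawing random sequences.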

References

Agarwal, V., Bell, G. W., Nam, J., and Bartel, D. P. (2015). Predicting effective microRNA target sites in mammalian mRNAs. eLife, 4, e05005.
Balaton, B. P., Cotton, A. M., and Brown, C. J. (2015). Derivation of consensus inactivation status for X-linked genes from genome-wide studies. Biology of Sex Differences, 6, 35.
Bartel, D. P. (2009). MicroRNAs: target recognition and regulatory functions. Cell, 136(2), 215–233.
Bellott, D. W., Hughes, J. F., Skaletsky, H., Brown, L. G., Pyntikova, T., Cho, T.-J., Koutseva, N., Zaghlul, S., Graves, T., Rock, S., Kremitzki, C., Fulton, R. S., Dugan, S., Ding, Y., Morton, D., Khan, Z., Lewis, L., … Page, D. C. (2014). Mammalian Y chromosomes retain widely expressed dosage-sensitive regulators. Nature, 508(7497), 494–499.
Bellott, D. W., Skaletsky, H., Cho, T.-J., Brown, L., Locke, D., Chen, N., Galkina, S., Pyntikova, T., Koutseva, N., Graves, T., Kremitzki, C., Warren, W. C., Clark, A. G., Gaginskaya, E., Wilson, R. K., and Page, D. C. (2017). Avian W and mammalian Y chromosomes convergently retained dosage-sensitive regulators. Nature Genetics, in press.
Bellott, D. W., Skaletsky, H., Pyntikova, T., Mardis, E. R., Graves, T., Kremitzki, C., Brown, L. G., Rozen, S., Warren, W. C., Wilson, R. K., and Page, D. C. (2010). Convergent evolution of chicken Z and human X chromosomes by expansion and gene acquisition. Nature, 466(7306), 612–616.
Berletch, J. B., Ma, W., Yang, F., Shendure, J., Noble, W. S., Disteche, C. M., and Deng, X. (2015). Escape from X Inactivation Varies in Mouse Tissues. PLOS Genetics, 11(3), e1005079.
Carrel, L., and Willard, H. F. (2005). X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature, 434, 400–404.
Charlesworth, B., and Crow, J. F. (1978). Model for evolution of Y chromosomes and dosage compensation. Proceedings of the National Academy of Sciences of the United States of America, 75(11), 5618–5622.
Cotton, A. M., Ge, B., Light, N., Adoue, V., Pastinen, T., and Brown, C. J. (2013). Analysis of expressed SNPs identifies variable extents of expression from the human inactive X chromosome. Genome Biology, 14(11), R122.
Cotton, A. M., Price, E. M., Jones, M. J., Balaton, B. P., Kobor, M. S., and Brown, C. J. (2015). Landscape of DNA methylation on the X chromosome reflects CpG density, functional chromatin state and X-chromosome inactivation. Human Molecular Genetics, 24(6), 1528–1539.
Deng, X., Hiatt, J. B., Nguyen, D. K., Ercan, S., Sturgill, D., Hillier, L. W., Schlesinger, F., Davis, C. A., Reinke, V. J., Gingeras, T. R., Shendure, J., Waterston, R. H., Oliver, B., Lieb, J. D., and Disteche, C. M. (2011). Evidence for compensatory upregulation of expressed X-linked genes in mammals, Caenorhabditis elegans and Drosophila melanogaster. Nature Genetics, 43(12), 1179–1185.
Eichhorn, S. W., Guo, H., McGeary, S. E., Rodriguez-Mias, R. A., Shin, C., Baek, D., Hsu, S.-H., Ghoshal, K., Villén, J., and Bartel, D. P. (2014). mRNA Destabilization Is the Dominant Effect of Mammalian MicroRNAs by the Time Substantial Repression Ensues. Molecular Cell.
Friedman, R. C., Farh, K. K.-H., Burge, C. B., and Bartel, D. P. (2009). Most mammalian mRNAs are conserved targets of microRNAs. Genome Research, 19(1), 92–105.
Hafner, M., Landthaler, M., Burger, L., Khorshid, M., Berninger, P., Rothballer, A., Ascano, M., Jr., Munschauer, M., Ulrich, A., Wardle, G. S., Dewell, S., Zavolan, M., and Tuschl, T. (2010). Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell, 141(1), 129–141.
Hausser, J., Landthaler, M., Jaskiewicz, L., Gaidatzis, D., and Zavolan, M. (2009). Relative contribution of sequence and structure features to the mRNA binding of Argonaute/EIF2C–miRNA complexes and the degradation of miRNA targets. Genome Research, 19(11), 2009–2020.
Hemara-Wahanui, A., Berjukow, S., Hope, C. I., Dearden, P. K., Wu, S.-B., Wilson-Wheeler, J., Sharp, D. M., Lundon-Treweek, P., Clover, G. M., Hoda, J.-C., Striessnig, J., Marksteiner, R., Hering, S., and Maw, M. A. (2005). A CACNA1F mutation identified in an X-linked retinal disorder shifts the voltage dependence of Cav1.4 channel activation. Proceedings of the National Academy of Sciences of the United States of America, 102(21), 7553–7558.
Huang, N., Lee, I., Marcotte, E. M., and Hurles, M. E. (2010). Characterising and predicting haploinsufficiency in the human genome. PLoS Genetics, 6(10), e1001154.
Hughes, J. F., Skaletsky, H., Brown, L. G., Pyntikova, T., Graves, T., Fulton, R. S., Dugan, S., Ding, Y., Buhay, C. J., Kremitzki, C., Wang, Q., Shen, H., Holder, M., Villasana, D., Nazareth, L. V., Cree, A., Courtney, L., … Page, D. C. (2012). Strict evolutionary conservation followed rapid gene loss on human and rhesus Y chromosomes. Nature, 483(7387), 82–86.
Itoh, Y., Melamed, E., Yang, X., Kampf, K., Wang, S., Yehya, N., Van Nas, A., Replogle, K., Band, M. R., Clayton, D. F., Schadt, E. E., Lusis, A. J., and Arnold, A. P. (2007). Dosage compensation is less effective in birds than in mammals. Journal of Biology, 6(1), 2.
Jegalian, K., and Page, D. C. (1998). A proposed path by which genes common to mammalian X and Y chromosomes evolve to become X inactivated. Nature, 394, 776–780.
Julien, P., Brawand, D., Soumillon, M., Necsulea, A., Liechti, A., Schütz, F., Daish, T., Grützner, F., and Kaessmann, H. (2012). Mechanisms and evolutionary patterns of mammalian and avian dosage compensation. PLoS Biology.
Kaiser, V. B., Zhou, Q., and Bachtrog, D. (2011). Nonrandom gene loss from the Drosophila miranda neo-Y chromosome. Genome Biology and Evolution, 3, 1329–1337.
Kharchenko, P. V., Xi, R., and Park, P. J. (2011). Evidence for dosage compensation between the X chromosome and autosomes in mammals. Nature Genetics, 43(12), 1167–1169.
Lahn, B. T., Ma, N., Breg, R. W., Stratton, R., Surti, U., and Page, D. C. (1994). Xq-Yq interchange resulting in supernormal X-linked gene expression in severely retarded males with 46,XYq- karyotype. Nature Genetics, 8, 362–369.
Lahn, B. T., and Page, D. C. (1999). Four evolutionary strata on the human X chromosome. Science, 286(5441), 964–967.
Lin, F., Xing, K., Zhang, J., and He, X. (2012). Expression reduction in mammalian X chromosome evolution refutes Ohno’s hypothesis of dosage compensation. Proceedings of the National Academy of Sciences, 109(29), 11752–11757.
Lindgren, A. M., Hoyos, T., Talkowski, M. E., Hanscom, C., Blumenthal, I., Chiang, C., Ernst, C., Pereira, S., Ordulu, Z., Clericuzio, C., Drautz, J. M., Rosenfeld, J. A., Shaffer, L. G., Velsher, L., Pynn, T., Vermeesch, J., Harris, D. J., … Morton, C. C. (2013). Haploinsufficiency of KDM6A is associated with severe psychomotor retardation, global growth restriction, seizures and cleft palate. Human Genetics, 132(5), 537–552.
Linsley, P. S., Schelter, J., Burchard, J., Kibukawa, M., Martin, M. M., Bartz, S. R., Johnson, J. M., Cummins, J. M., Raymond, C. K., Dai, H., Chau, N., Cleary, M., Jackson, A. L., Carleton, M., and Lim, L. (2007). Transcripts targeted by the microRNA-16 family cooperatively regulate cell cycle progression. Molecular and Cellular Biology, 27(6), 2240–2252.
Loeb, G. B., Khan, A. A., Canner, D., Hiatt, J. B., Shendure, J., Darnell, R. B., Leslie, C. S., and Rudensky, A. Y. (2012). Transcriptome-wide miR-155 Binding Map Reveals Widespread Noncanonical MicroRNA Targeting. Molecular Cell, 48(5), 760–770.
Mank, J. E., and Ellegren, H. (2009). All dosage compensation is local: gene-by-gene regulation of sex-biased expression on the chicken Z chromosome. Heredity, 102(3), 312–320.
Marin, R., Cortez, D., Lamanna, F., Pradeepa, M. M., Leushkin, E., Julien, P., Liechti, A., Halbert, J., Brüning, T., Mössinger, K., Trefzer, T., Conrad, C., Kerver, H. N., Wade, J., Tschopp, P., and Kaessmann, H. (2017). Convergent origination of a Drosophila-like dosage compensation mechanism in a reptile lineage. Genome Research, 1–14.
Mueller, J. L., Skaletsky, H., Brown, L. G., Zaghlul, S., Rock, S., Graves, T., Auger, K., Warren, W. C., Wilson, R. K., and Page, D. C. (2013). Independent specialization of the human and mouse X chromosomes for the male germ line. Nature Genetics, 45(9), 1083–1087.
Nanda, I., Shan, Z., Schartl, M., Burt, D. W., Koehler, M., Nothwang, H., Grützner, F., Paton, I. R., Windsor, D., Dunn, I., Engel, W., Staeheli, P., Mizuno, S., Haaf, T., and Schmid, M. (1999). 300 million years of conserved synteny between chicken Z and human chromosome 9. Nature Genetics, 21, 258–259.
Ohno, S. (1967). Sex chromosomes and sex-linked genes. Springer-Verlag.
Papp, B., Pal, C., and Hurst, L. D. (2003). Dosage sensitivity and the evolution of gene families in yeast. Nature, 424, 194–197.
Pessia, E., Makino, T., Bailly-Bechet, M., McLysaght, A., and Marais, G. A. B. (2012). Mammalian X chromosome inactivation evolved as a dosage-compensation mechanism for dosage-sensitive genes on the X chromosome. Proceedings of the National Academy of Sciences of the United States of America, 109(14), 5346–5351.
Pinzón, N., Li, B., Martinez, L., Sergeeva, A., Presumey, J., Apparailly, F., and Seitz, H. (2016). The number of biologically relevant microRNA targets has been largely overestimated. (November), 1–11.
Quinlan, A. R., and Hall, I. M. (2010). BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6), 841–842.
Rodriguez, A., Vigorito, E., Clare, S., Warren, M. V., Couttet, P., Soond, D. R., Dongen, S. Van, Grocock, R. J., Das, P. P., Miska, E. A., Vetrie, D., Okkenhaug, K., Enright, A. J., Dougan, G., Turner, M., and Bradley, A. (2007). Requirement of bic/microRNA-155 for normal immune function. Science, 316, 608–611.
Ross, M. T., Grafham, D. V., Coffey, A. J., Scherer, S., McLay, K., Muzny, D., and Platzer, M. (2005). The DNA sequence of the human X chromosome. Nature, 434, 325–337.
Ruderfer, D. M., Hamamsy, T., Lek, M., Karczewski, K. J., Kavanagh, D., Samocha, K. E., Exome Aggregation Consortium, Daly, M. J., MacArthur, D. G., Fromer, M., and Purcell, S. M. (2016). Patterns of genic intolerance of rare copy number variation in 59,898 human exomes. Nature Genetics, 48(10), 1107–1111.
Simon, D., Laloo, B., Barillot, M., Barnetche, T., Blanchard, C., Rooryck, C., Marche, M., Burgelin, I., Coupry, I., Chassaing, N., Gilbert-Dussardier, B., Lacombe, D., Grosset, C., and Arveiler, B. (2010). A mutation in the 3′-UTR of the HDAC6 gene abolishing the post-transcriptional regulation mediated by hsa-miR-433 is linked to a new form of dominant X-linked chondrodysplasia. Human Molecular Genetics, 19(10), 2015–2027.
Skaletsky, H., Kuroda-Kawaguchi, T., Minx, P. J., Cordum, H. S., Hillier, L., Brown, L. G., Repping, S., Pyntikova, T., Ali, J., Bieri, T., Chinwalla, A., Delehaunty, A., Delehaunty, K., Du, H., Fewell, G., Fulton, L., Fulton, R., … Page, D. C. (2003). The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature, 423, 825–838.
The GTEx Consortium. (2015). The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science, 348(6235), 648–660.
Tukiainen, T., Villani, A.-C., Yen, A., Rivas, M. A., Marshall, J. L., Satija, R., Aguirre, M., Gauthier, L., Fleharty, M., Kirby, A., Cummings, B. B., Castel, S. E., Karczewski, K. J., Aguet, F., Byrnes, A., Ardlie, K. G., … MacArthur, D. G. (2017). Landscape of X chromosome inactivation across human tissues. Nature, 550(7675), 244–248.
Uebbing, S., Konzer, A., Xu, L., Backström, N., Brunström, B., Bergquist, J., and Ellegren, H. (2015). Quantitative mass spectrometry reveals partial translational regulation for dosage compensation in chicken. Molecular Biology and Evolution, 32(10), 2716–2725.
Vandewalle, J., Van Esch, H., Govaerts, K., Verbeeck, J., Zweier, C., Madrigal, I., Mila, M., Pijkels, E., Fernandez, I., Kohlhase, J., Spaich, C., Rauch, A., Fryns, J. P., Marynen, P., and Froyen, G. (2009). Dosage-dependent severity of the phenotype in patients with mental retardation due to a recurrent copy-number gain at Xq28 mediated by an unusual recombination. American Journal of Human Genetics, 85(6), 809–822.
Warnefors, M., Mossinger, K., Halbert, J., Studer, T., VandeBerg, J. L., Lindgren, I., Fallahshahroudi, A., Jensen, P., and Kaessmann, H. (2017). Sex-biased microRNA expression in mammals and birds reveals underlying regulatory mechanisms and a role in dosage compensation. Genome Research, 1–13.
Watson, J. M., Spencer, J. A., Riggs, A. D., and Graves, J. A. (1990). The X chromosome of monotremes shares a highly conserved region with the eutherian and marsupial X chromosomes despite the absence of X chromosome inactivation. Proceedings of the National Academy of Sciences, 87(18), 7125–7129.
Weischenfeldt, J., Dubash, T., Drainas, A. P., Mardin, B. R., Chen, Y., Stütz, A. M., Waszak, S. M., Bosco, G., Halvorsen, A. R., Raeder, B., Efthymiopoulos, T., Erkek, S., Siegl, C., Brenner, H., Brustugun, O. T., Dieter, S. M., Northcott, P. A., … Korbel, J. O. (2017). Pan-cancer analysis of somatic copy-number alterations implicates IRS4 and IGF2 in enhancer hijacking. Nature Genetics, 49, 65–74.
White, M. A., Kitano, J., and Peichel, C. L. (2015). Purifying Selection Maintains Dosage-Sensitive Genes during Degeneration of the Threespine Stickleback Y Chromosome. Molecular Biology and Evolution, 32(8), 1981–1995.
Xiong, Y., Chen, X., Chen, Z., Wang, X., Shi, S., Wang, X., Zhang, J., and He, X. (2010). RNA sequencing shows no dosage compensation of the active X-chromosome. Nature Genetics, 42(12), 1043–1047.
Yang, F., Babak, T., Shendure, J., and Disteche, C. M. (2010). Global survey of escape from X inactivation by RNA-sequencing in mouse. Genome Research, 20(5), 614–622.
Yates, A., Akanni, W., Amode, M. R., Barrell, D., Billis, K., Carvalho-Silva, D., Cummins, C., Clapham, P., Fitzgerald, S., Gil, L., Girón, C. G., Gordon, L., Hourlier, T., Hunt, S. E., Janacek, S. H., Johnson, N., Juettemann, T., … Flicek, P. (2016). Ensembl 2016. Nucleic Acids Research, 44(D1), D710–D716.
Zhou, Q., Zhang, J., Bachtrog, D., An, N., Huang, Q., Jarvis, E. D., Gilbert, M. T. P., and Zhang, G. (2014). Complex evolutionary trajectories of sex chromosomes across bird taxa. Science, 346(6215), 1246338.
Zimmer, F., Harrison, P. W., Dessimoz, C., and Mank, J. E. (2016). Compensation of Dosage-Sensitive Genes on the Chicken Z Chromosome. Genome Biology and Evolution, 8(4), 1233–1242.


Chapter 3. Evolutionary dynamics of sex-biased gene expression in mammalian tissues

Sahin Naqvi, Alexander K. Godfrey, Jennifer F. Hughes, Mary L. Goodheart, Richard N. Mitchell, & David C. Page

Author contributions: S.N., A.K.G., J.F.H., and D.C.P. designed the study. J.F.H. procured cyno tissue samples. M.L.G. procured mouse and rat tissue samples, with assistance from S.N. S.N. processed tissue samples and performed computational analyses, with assistance from A.K.G. R.N.M. performed histological evaluations of human tissue sections. D.C.P. supervised the work. S.N. and D.C.P. wrote the paper.

Acknowledgements: We thank Daniel W. Bellott, Lukas Chmatal, and Richard Ransohoff for critical reading of the manuscript.

Adapted from an article currently in review for publication.

Summary

Sex differences are widespread in human health and disease, and they are frequently modeled in other mammalian species. However, the extent to which molecular sex differences are conserved across tissues and species remains unclear. We conducted a 12-tissue, five-species survey of sex differences in gene expression using both publicly available (human) and newly generated (cynomolgus macaque, mouse, rat, and dog) RNA sequencing data. In each of the tissues assessed, we identified between 128 and 805 genes with conserved sex-biased expression. Nevertheless, most sex bias in gene expression (~77%), both female and male, has arisen since the last common ancestor of boreoeutherian mammals, under reduced selective constraint. Evolutionary gains and losses of regulation by sex-biased transcription factors drove a significant fraction (~27%) of lineage-specific changes in sex bias. Our study suggests that care will be required when modeling human sex differences in other species.

Introduction

Healthy males and females exhibit differences across a wide range of biological processes. Studies in humans have documented sexual dimorphism in anthropometric traits (Wells, 2007), energy metabolism (Green et al., 1984), brain morphology (Ruigrok et al., 2014), and immune (Klein & Flanagan, 2016) and cardiac (Hayward et al., 2001) function. Sex differences are also evident in the incidence, prevalence, and mortality of a wide array of diseases, including autoimmune disorders (Ngo et al., 2014), cardiovascular diseases (Regitz-Zagrosek et al., 2016), and autism (Werling & Geschwind, 2013). Beyond humans, sexual dimorphism is common in other mammalian species, many of which are extensively used as models of sex-biased human traits and diseases (Olson et al., 2000). For example, males are larger than females in most mammalian species (Lindenfors et al., 2007). In rodents, sex differences have been observed in brain structure (Gorski et al., 1978) and in immune (Scotland & Stables, 2011) and cardiac (Shioura et al., 2008) function. Thus, sex differences in health and disease are pervasive both across the body and across mammalian evolution. These phenotypic sex differences are likely to be associated with, and in many cases caused in whole or in part by, sex differences in gene activity or function.

The mammalian sex chromosomes are one source of such differences in gene activity. The Y chromosome harbors male-specific genes (Skaletsky et al., 2003), some of which are broadly expressed (Bellott et al., 2014). Meanwhile, incomplete inactivation of the second X chromosome in females can result in female-biased expression of a subset of X-linked genes (Tukiainen et al., 2017). However, gene networks are large and complex, and autosomal genes far outnumber sex-linked genes. It is therefore unlikely that sexually dimorphic expression of sex-linked genes alone accounts for the full range of sexual dimorphism in mammals.

Understanding the molecular origins of such sexual dimorphism therefore requires a genome-wide, multi-tissue, and comparative approach to sex differences in gene expression.

Our understanding of sex bias in mammalian gene expression is currently lacking in two important regards. First, the degree to which sex-biased expression is conserved across the mammalian lineage is largely unknown, as is the extent of such conservation in different tissues and organ systems. Multi-tissue studies of sex bias in gene expression have focused on a single species, either humans (Gershoni & Pietrokovski, 2017; Mele et al., 2015) or mice (Yang et al., 2006). Multi-species studies in Drosophila (Assis et al., 2012; Grath & Parsch, 2012; Perry et al., 2014; Ranz et al., 2003; Zhang et al., 2007) have assayed RNA extracted from whole carcasses or gonads, finding rapid evolution of reproduction-related genes. Mammalian studies that have examined non-reproductive tissues have each focused on a single tissue (Blekhman et al., 2010; Reinius et al., 2008). Second, apart from single-locus studies in Drosophila (Williams et al., 2008), the lineage-specific regulatory changes that drive the evolution of sex-biased expression remain unexplored. Progress has been made in understanding the diverse mechanisms of X-linked dosage compensation (Disteche, 2012, 2016), the absence of which can lead to sex-biased expression on the X chromosome. However, additional mechanisms are likely responsible for the evolution of sex-biased gene expression genome-wide.

To address these questions, we assessed sex differences in gene expression in 12 tissues from human, cynomolgus macaque (cyno), mouse, rat, and dog. These five species span the evolution of Boreoeutheria, which includes all placental mammals except Afrotheria and Xenarthra (represented by species such as elephants and anteaters, respectively). The last common ancestor of Boreoeutheria lived 80–100 million years ago. We find ~3,000 genes with conserved sex bias that was likely present in this common ancestor. However, these instances of conserved bias are a minority of total sex bias. When we expand our analysis to include lineage-specific sex biases, we find that ~77% of sex-biased gene expression in these five species has arisen since their last common ancestor, likely facilitated by reduced selective constraint. We show that a significant fraction (~27%) of lineage-specific changes in sex bias were likely driven by gains and losses of regulation by sex-biased transcription factors. Our results provide insight into the extent and mechanisms of sex-biased gene expression across tissues and species.

Results

A five-species, 12-tissue survey of sex differences in gene expression

To assess sex differences in non-human mammals, we generated RNA sequencing data from three males and three females each of cynomolgus macaque (cyno), mouse, rat, and dog. We sampled 12 tissues from each individual: adipose, adrenal gland, brain, colon, heart, liver, lung, muscle, pituitary, skin, spleen, and thyroid. These tissues were chosen to sample a diverse array of organ systems and include derivatives of all three germ layers (Figure 3.1A). We designed tissue collection and processing procedures to minimize additional sources of biological and technical variation, in three ways:

- sampling genetically homogeneous populations;
- controlling for variation in the estrous cycle (only in non-human mammals);
- randomizing sample processing with respect to batch (see Methods).

We sequenced a subset of samples at high coverage with longer reads (>10⁸ paired-end reads, 100 bp × 100 bp). We sequenced the remainder at moderate coverage with shorter reads (>3 × 10⁷ paired-end reads, 75 bp × 75 bp). We used our RNA-seq data to systematically improve the transcriptome annotations of each non-human species. With RNA-seq data from independent studies, a higher percentage of reads mapped to our improved transcriptomes than to existing annotations (e.g., a 16% increase in dog; Figure 3.2).

To assess sex differences in humans, we analyzed RNA-seq data from the Genotype-Tissue Expression Consortium (GTEx, v6p release). In unselected post-mortem samples, apparent sex differences in gene expression might arise from sex biases in cell-type composition, pathology, or other factors. We took a number of steps to avoid these confounders. For each of the 12 target tissues, we chose the GTEx samples most likely to be derived from healthy individuals, based on medical history, cause of death, and notes from GTEx pathologists. We then conducted detailed evaluations of histological images from the chosen samples (see Methods).

Figure 3.1: A five-species, twelve-tissue survey of sex differences in gene expression. (A) Schematic of study design, with tissues chosen for analysis in all five species highlighted in humans. (B) Hierarchical clustering of 349 RNA-seq samples. Top, pairwise estimates of Jensen-Shannon divergence (JSD) between pairs of samples. Six random human samples per tissue, in addition to all non-human samples, were included for display purposes. Middle, tree dendrogram obtained by hierarchical clustering (average linkage) based on pairwise JSD values. Bottom, sample labels by tissue, species, and sex.

[Figure 3.2 plot: percentage of reads mapped (y-axis) for cyno, dog, mouse, and rat, comparing existing annotations with annotations from this study.]

Figure 3.2: Comparison of read mapping rates between existing transcriptome annotations and those generated as part of this study. For each species, RNA-seq reads from an independent study (cyno, M5 Spleen from https://doi.org/10.6084/m9.figshare.4697539.v2; dog, SRR388736; mouse, SRR594408; rat, SRR594435) were pseudomapped with salmon to either Ensembl 91 transcript annotations or the annotations from this study.

We also adjusted gene expression values using top principal components to remove variation due to hidden technical or biological confounders. Some tissues have expression data from purified cell populations. In these tissues, sample-level cell-type proportions estimated by CIBERSORT (Newman et al., 2015) are highly correlated with top principal component loadings (Figure 3.3). This indicates that principal component analysis effectively captures variation due to cell-type composition.
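The sketch below illustrates principal-component adjustment on simulated data. It assumes a samples × genes matrix of log-scale expression and subtracts the top-k components fit by SVD. The thesis Methods define the actual choice of k and model; here k = 3 is arbitrary. CIBERSORT is not run: `fraction` is a simulated stand-in for an estimated cell-type proportion, and all data are hypothetical. The sketch also shows why PC loadings can track cell-type composition. When one hidden proportion shifts many genes at once, PC1 recovers it.

```python
import numpy as np


def remove_top_pcs(expr, k):
    """Subtract the top-k principal components from a samples x genes matrix.

    Returns (adjusted, scores): the adjusted matrix with gene means restored,
    and the per-sample loadings (scores) on the removed components.
    """
    means = expr.mean(axis=0)
    centered = expr - means
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :k] * s[:k]          # sample loadings on PCs 1..k
    fitted = scores @ vt[:k, :]        # expression explained by those PCs
    return centered - fitted + means, scores


# Simulated data: a hidden cell-type fraction shifts many genes at once
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 200
fraction = rng.uniform(0.2, 0.8, n_samples)      # stand-in for a CIBERSORT estimate
effect = rng.normal(0.0, 2.0, n_genes)           # per-gene sensitivity to that fraction
expr = 5.0 + np.outer(fraction, effect) + rng.normal(0.0, 0.1, (n_samples, n_genes))

adjusted, scores = remove_top_pcs(expr, k=3)
r = np.corrcoef(scores[:, 0], fraction)[0, 1]    # PC sign is arbitrary, so inspect |r|
print(f"|r| between PC1 loadings and the simulated fraction: {abs(r):.2f}")
```

Removing PCs can also remove genuine sex-associated signal if sex dominates a top component. This is one reason the choice of k matters in practice.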

We removed outlier samples (see Methods) to obtain 740 human and 277 non-human RNA-seq samples. We then clustered all non-human samples and a randomly chosen subset of human samples based on the expression levels of 12,939 one-to-one orthologous protein-coding genes. Samples cluster first by tissue and then by species (Figure 3.1B). The exceptions are human adipose and lung samples, which cluster with each other rather than with adipose and lung samples from the non-human mammals. This tissue-dominated clustering is consistent with prior studies (Brawand et al., 2011; Merkin et al., 2012). It indicates that tissues were sampled consistently across species and studies, and that the non-human data generated in this study are comparable to the human data from GTEx. Notably, in no case do samples cluster by sex before tissue or species. This suggests that, overall, species-specific requirements for tissue function, as assessed by gene expression, outweigh sex-specific requirements.
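The sketch below shows the clustering in miniature. Following the Figure 3.1 legend, it computes pairwise Jensen-Shannon divergence (JSD) between expression profiles and merges samples by average linkage. The four profiles and their labels are hypothetical, and each profile is rescaled to sum to 1 before the JSD is computed, which is an assumption of this sketch. The naive average-linkage loop is written out for clarity; a library routine would normally be used at the scale of 349 samples and 12,939 genes.

```python
import numpy as np


def jsd(p, q):
    """Jensen-Shannon divergence (log base 2, so 0 <= JSD <= 1) between two
    expression profiles, each rescaled to sum to 1."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def average_linkage(dist):
    """Naive UPGMA: repeatedly merge the two clusters whose members have the
    smallest mean pairwise distance. Returns merges as (members, members, distance)."""
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.mean([dist[i][j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merges[-1][0] + merges[-1][1])
    return merges


# Hypothetical TPMs for four genes in four samples
labels = ["liver_human", "liver_mouse", "brain_human", "brain_mouse"]
profiles = [[100, 80, 5, 5], [90, 95, 8, 4], [5, 10, 120, 60], [8, 4, 100, 70]]
dist = [[jsd(p, q) for q in profiles] for p in profiles]
for left, right, d in average_linkage(dist):
    print([labels[i] for i in left], "+", [labels[i] for i in right], f"JSD={d:.3f}")
```

In this toy example, the two liver samples and the two brain samples merge before any cross-tissue merge. This mirrors the tissue-first pattern described above.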

Nevertheless, more detailed analysis indicates that sex does contribute significantly to gene expression variation. Pairwise within-sex distances in each tissue-species combination are significantly lower than pairwise between-sex distances (Figure 3.4). We also turned to independent RNA-seq and microarray datasets. Both our re-analysis of GTEx and our newly generated data replicated published estimates of sex bias in six different human and mouse tissues (Figures 3.5 and 3.6). These results support our strategy of combining re-analyzed, publicly available human RNA-seq data with newly generated data from non-human mammals. This approach provides expression values that are comparable across species and yields reproducible estimates of sex bias.

[Figure 3.3 plots: relative cell-type fractions (keratinocyte, fibroblast, neuronal, macroglia, endothelial, adipocyte, macrophage) plotted against sample loadings on top principal components of skin (PC2), brain (PC1, PC2), and adipose (PC1 to PC3), with males and females plotted separately; each panel is annotated with a correlation coefficient.]

Figure 3.3: Correlation between sample loadings on top principal components and estimated cell-type fractions. For each tissue (A, skin; B, brain; C, adipose), sample loadings (x-axis) on the top principal components were estimated as described in Methods. The y-axis represents the proportion of the indicated cell type as estimated by CIBERSORT (see Methods). Each plot is annotated with a Pearson correlation coefficient.

[Figure 3.4 plot: pairwise Jensen-Shannon divergence (y-axis) for sample pairs from the same tissue and species in different batches (technical variation), and for same-sex and different-sex pairs from the same tissue and species, the same tissue in different species, and different tissues in the same species. Annotated p-values: 0.85, 0.95, and 2.18e-09.]

Figure 3.4: Sources of biological and technical variation. Each boxplot represents JSD values for all pairs of samples meeting the criteria indicated below it. The left-most boxplot represents technical variation due to library preparation and sequencing, while the other boxplots represent biological sources of variation. Indicated p-values are from two-sided Wilcoxon rank-sum tests comparing within-sex and between-sex distances for samples in the same or different tissue or species. P-values for all other comparisons were < 2.2e-16 (two-sided Wilcoxon rank-sum test).
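The sketch below shows what the two-sided Wilcoxon rank-sum comparison computes, using hypothetical within-sex and between-sex JSD values. It uses the large-sample normal approximation, applies no tie or continuity correction, and assigns tied values the mean of their ranks. An established routine, such as R's `wilcox.test` or SciPy's `scipy.stats.mannwhitneyu`, would normally be used instead, since it handles exact p-values and tie corrections.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test, normal approximation
    without tie or continuity correction. Returns (U for x, p-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2      # Mann-Whitney U for group x
    mu = n1 * n2 / 2                              # expected U under the null
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))    # equals 2 * (1 - Phi(|z|))


# Hypothetical JSD values for within-sex and between-sex sample pairs
within_sex = [0.10, 0.12, 0.11, 0.09, 0.13]
between_sex = [0.15, 0.17, 0.16, 0.18, 0.14]
u, p = rank_sum_test(within_sex, between_sex)
print(u, round(p, 4))
```

Here U = 0 because every within-sex distance is smaller than every between-sex distance. This is the pattern the test detects when within-sex distances are significantly lower.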

[Figure 3.5 plots: six panels (skin, liver, brain, muscle, heart, adipose), each plotting log2(M/F) from our re-analysis of GTEx (y-axis) against log2(M/F) from an independent study (x-axis; Liang 2017, Zhang 2011, Werling 2016, Lindholm 2014, Newman 2017, and Viguerie 2012, respectively), annotated with Pearson r and p-values.]

Figure 3.5: Correlation between published estimates of sex bias and estimates of sex bias in this study (human). For each tissue, significantly sex-biased genes from published studies were defined as described in Methods. The x-axis represents sex bias from the independent study, and the y-axis represents sex bias from our re-analysis of GTEx. Each plot is annotated with a Pearson correlation coefficient.

[Figure 3.6 scatter plots, mouse. Panels: liver (x-axis from Li 2017 and Marin 2017), muscle (Li 2017), heart (Li 2017 and Marin 2017), spleen (Li 2017), adrenal (Li 2017), lung (Li 2017), and lung (Franco 2010). x-axis: log2(M/F) from the independent study; y-axis: this study log2(M/F).]

Figure 3.6: Correlation between published estimates of sex bias and estimates of sex bias in this study (mouse). For each tissue, significantly sex-biased genes from published studies were defined as described in Methods. The x-axis represents sex bias from the independent study, and the y-axis represents sex bias from data newly generated as part of this study. Note that for lung, estimates from Li et al. (2017) do not replicate, but estimates from Franco et al. (2010) do replicate. Each plot is annotated with a Pearson correlation coefficient.

Conserved sex-biased gene expression exists across the body

We sought to identify genes that showed evolutionarily conserved sex-biased expression. Within each tissue, we used a linear mixed model approach to identify genes that showed a consistent sex bias (10% FDR) across species while controlling for differences in expression variability and sample size between species. We further required that genes show a fold change > 1.05 in the same direction in at least four of the five species studied; such genes were likely sex-biased in the common ancestor of boroeutheria (example in Figure 3.7A). We refer to such genes as having a conserved sex bias. In total, 3,885 of 113,853 expressed gene-tissue pairs, corresponding to 3,161 genes, show a conserved sex bias. Conserved sex bias is generally of modest magnitude (~90% of sex-biased gene-tissue pairs had < 2-fold change between the sexes; Figure 3.8) but reproducible in independent datasets (Figure 3.9). The number of genes with conserved sex bias per tissue ranges from 128 in colon to 805 in pituitary (Figure 3.7B) and is not correlated with tissue sample size or with rates of between-species gene expression divergence (Figure 3.10). Of genes with any conserved sex bias, 562 (18%) are sex-biased in more than one tissue (Figure 3.7C). In cases of multi-tissue sex bias, the bias is significantly more likely to be in the same direction in multiple tissues (Figure 3.7D). Thus, conserved sex bias in gene expression is mostly tissue-specific, but a significant minority of genes shows concordant sex bias across multiple tissues, implying that some regulatory factors produce similar profiles of sex-biased expression in multiple tissues or cell types.
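The two criteria for a conserved sex bias can be expressed as a simple filter applied to each gene-tissue pair. The sketch below assumes the FDR from the cross-species linear mixed model has already been computed elsewhere and that per-species effects are given as linear male/female fold changes; the function name, argument names, and the strict "< 10%" cutoff are illustrative choices rather than the thesis code.

```python
def has_conserved_sex_bias(mixed_model_fdr, fold_changes, fdr_cutoff=0.10,
                           min_fold=1.05, min_species=4):
    """Apply the two conserved-bias criteria to one gene-tissue pair.

    mixed_model_fdr: FDR-adjusted p-value for sex from a cross-species
        linear mixed model (assumed to be computed elsewhere).
    fold_changes: dict mapping species -> male/female fold change (M/F).
    Returns "male", "female", or None.
    """
    if mixed_model_fdr >= fdr_cutoff:
        return None
    # Count species with > min_fold bias in each direction; female bias of
    # 1.05-fold corresponds to M/F below 1 / 1.05.
    n_male = sum(1 for fc in fold_changes.values() if fc > min_fold)
    n_female = sum(1 for fc in fold_changes.values() if fc < 1 / min_fold)
    if n_male >= min_species:
        return "male"
    if n_female >= min_species:
        return "female"
    return None


# Hypothetical gene-tissue pair: female-biased in four of five species.
example = {"human": 0.80, "cyno": 0.85, "mouse": 0.90, "rat": 0.93, "dog": 1.02}
print(has_conserved_sex_bias(0.03, example))  # female
```

Requiring four of five species, rather than all five, tolerates a single species in which the bias is too weak to pass the fold-change threshold.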

Figure 3.7: Conserved sex bias in gene expression across the body. (A) Example of a gene with conserved female-biased expression, defined as significant (10% FDR) in a linear mixed model and with concordant direction of bias (> 1.05-fold change) in at least four of the five species examined. (B) Heatmap of conserved male (blue) and female (orange) sex bias across genes (rows) and tissues (columns). (C) Number of genes (y-axis) with conserved sex bias in one (left) or multiple (right) tissues. (D) Of genes with conserved sex bias in multiple tissues, the number with concordant (same direction) or discordant (opposite direction) bias across tissues. Significance was assessed by two-sided Fisher’s exact test comparing to equal proportions.

[Figure 3.8 plot: cumulative fraction (y-axis) of |log2(M/F)| (x-axis, 0.0 to 1.5) for human, cynomolgus macaque, mouse, rat, dog, and the average across species.]

Figure 3.8: Magnitude of male/female difference in expression of genes with conserved sex bias. Cumulative distribution plots of the magnitude of sex bias (estimated by simple linear regression in each species) for all gene-tissue pairs with a conserved sex bias.

[Figure 3.9 plot: replication Pearson r (x-axis) for spleen, adrenal, liver, heart, muscle, and lung, in three panels: conserved, rodent-specific, and mouse-specific sex bias, each replicated in independent mouse data (Li 2017 and Marin 2017, Yang 2006, Franco 2010).]

Figure 3.9: Correlations between estimates of sex bias for evolutionary classes of sex-biased genes from this study with estimates from independent studies (mouse). For conserved (left), rodent-specific (middle), or mouse-specific (right) sex biases defined as described in Methods, the correlation between sex bias estimates from our study and independent studies is shown.

[Figure 3.10 plots: number of sex-biased genes per tissue (y-axis) versus between-species divergence (left) and sample size (right) for the 12 tissues.]

Figure 3.10: No correlation between extent of conserved sex bias and sample size or expression divergence of tissues. Between-species divergence for each tissue was calculated as the median JSD for all pairs of samples from that tissue but from different species.

[Figure 3.11 heatmaps: fraction of analyzed one-to-one orthologs with conserved male (A) or female (B) bias in each tissue, for autosomes, X, Y, X-inactivated, X escape (Balaton), and X-Y genes.]

Figure 3.11: Enrichment of genes with conserved female or male bias with respect to autosomes, sex chromosomes, and classes of sex-linked genes. Colors represent the fraction of analyzed one-to-one orthologs on the indicated chromosome(s) with conserved male (A) or female (B) bias. For female bias, all X-linked genes (left) are further subdivided into those previously annotated as escaping or subject to X-inactivation in females by a recent meta-analysis of three studies (middle). X escape genes are further subdivided into those with or without a surviving Y-linked homolog (right). *, Benjamini-Hochberg-adjusted p-value < 0.05, two-sided Fisher’s exact test comparing to autosomes.

We next considered the extent to which genes with a conserved sex bias were enriched for sex linkage. As expected, all assayed Y-linked genes are male-biased (Figure 3.11A), while X-linked genes are significantly enriched for conserved female bias (a 2.1- to 10.2-fold increase relative to autosomes, depending on tissue). The enrichment for X-linked genes is almost entirely driven by genes known to escape X-inactivation in females. In turn, the enrichment for X-escape genes is almost entirely driven by a specific subset of X-escape genes: those with a surviving, non-recombining Y-linked homolog in mammals (Figure 3.11B). However, despite these enrichments, most genes with conserved sex bias (85 to 95%, depending on tissue) are autosomal (Figure 3.12). Thus, the mammalian sex chromosomes, primarily by harboring genes with both X- and Y-linked homologs, contribute a small but significant fraction of conserved sex bias in gene expression.
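The enrichment tests behind Figure 3.11 (two-sided Fisher’s exact test against autosomes, Benjamini-Hochberg adjustment across comparisons) can be sketched with the standard library alone. This is an illustrative implementation of those two textbook procedures, not the thesis code; the function names and the per-tissue counts in the usage example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one count.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts per tissue: (X-linked female-biased, X-linked other,
#                                  autosomal female-biased, autosomal other)
counts = {"tissue_1": (12, 300, 150, 12000), "tissue_2": (3, 310, 140, 12010)}
raw = {t: fisher_exact_two_sided(*c) for t, c in counts.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for t, (a, b, c, d) in counts.items():
    fold = (a / (a + b)) / (c / (c + d))  # enrichment relative to autosomes
    print(t, round(fold, 1), adjusted[t])
```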

Finally, we assessed whether genes with a conserved sex bias were overrepresented in specific biological pathways via Gene Ontology (GO) category enrichment. Genes with conserved male bias in pituitary were enriched for, among other functions, involvement in cyclic AMP (cAMP) signaling, which plays an important role in the response to stress. For example, the pituitary adenylate cyclase-activating polypeptide (PACAP) type 1 receptor, ADCYAP1R1, shows conserved male bias (Figure 3.13); PACAP-dependent stimulation of the hypothalamic-pituitary-adrenal (HPA) axis plays a crucial role in sustained hormone secretion during stress (Stroth et al., 2011). Interestingly, an intronic variant in ADCYAP1R1 is associated with post-traumatic stress disorder (PTSD) in females but not males (Ressler et al., 2011), and it likely acts by decreasing ADCYAP1R1 expression through disruption of the estrogen response element in which it lies (Mercer et al., 2016).

Genes with conserved female bias in colon and thyroid were enriched for a broad range of adaptive immune pathways, perhaps reflecting sex differences in tissue-resident immune cells, while genes with conserved female bias in adipose tissue were enriched for mitochondrial translation, metabolic processes, and ribosomal RNA processing. Mitochondrial function is crucial for maintaining metabolic homeostasis in adipocytes, and female bias of mitochondrial translation factors could contribute to the known functional differences between male and female adipose tissue (Fuente-Martín et al., 2013). In turn, sex differences in adipose function likely influence both susceptibility to obesity and its complications, including metabolic syndrome and type II diabetes, and responses to pharmacological interventions (Mauvais-Jarvis, 2015). Our results also suggest that these differences, in HPA axis control by the pituitary and in adipose function and metabolic homeostasis, are widely conserved across boroeutherian mammals. More broadly, beyond these specific enrichments, a great diversity of gene types and biological processes show evidence of conserved sex bias that was likely present in the common ancestor of boroeutheria.

[Figure 3.12 plots: fraction of total conserved female (left) or male (right) sex bias (x-axis, 0.70 to 1.00) attributable to autosomes, X-Y, X escape, X-inactivated, and Y genes in each of the 12 tissues.]

Figure 3.12: Fractions of genes with conserved female (left) or male (right) bias on autosomes, sex chromosomes, or classes of X-linked genes.

Figure 3.13: Conserved male bias of cAMP-mediated signaling components in pituitary. Left, example of ADCYAP1R1. Right, additional genes (y-axis) involved in cAMP signaling (GO:001993) and displaying conserved male bias in pituitary; colors represent the average magnitude of sex bias across the five species in each of 12 tissues (x-axis). White squares indicate that the gene was not expressed highly enough in that tissue for analysis of sex bias.

Most sex bias in gene expression has arisen since the last common ancestor of boroeutherian mammals

Our analyses thus far have focused on conserved instances of sex-biased gene expression; we next investigated sex-biased gene expression specific to subsets of the five species. Differences in statistical power between species could produce false-positive calls of lineage-specific sex bias. For example, a gene with true primate-specific male bias might falsely appear to have a human-specific male bias if its expression is significantly biased in humans but does not reach statistical significance in cynomolgus macaques. We therefore turned to a statistical method, mashr (Urbut et al., 2019), which allowed us to model the covariation in sex bias across both tissues and species and thus more confidently determine the lineage of sex bias (see Methods). As an example of mashr’s greater power to detect lineage-specific sex bias, using mashr instead of raw estimates of sex bias substantially increased the number of events assigned as rodent-specific gains of sex bias in most tissues (Figure 3.14). After implementing mashr-based estimation of sex bias in each tissue-species combination, we assigned each sex-biased gene-tissue pair by parsimony to one of 12 lineage-specific categories: primate-specific gains or losses, rodent-specific gains or losses, gains specific to one of the five species, multiple gains or losses, and more complex patterns of sex bias inconsistent with single gains or losses (examples in Figure 3.15A).

We estimated the false discovery rate (FDR) of each class of lineage-specific sex bias by permutation and used it to estimate the number of true-positive sex-biased gene-tissue pairs in each lineage-specific category.
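A parsimony assignment of this kind can be illustrated on the species tree ((human, cyno), (mouse, rat)), dog by counting the fewest clade-level events that explain a presence/absence pattern of sex bias, once assuming an unbiased ancestor (gains) and once assuming a biased ancestor (losses). This is a hypothetical sketch, not the thesis procedure: the function names, the rule that four or more biased species count as conserved, and the choice to treat dog-only bias as a dog gain are all assumptions made for illustration. It does, however, reproduce the 12 categories listed above (4 primate/rodent gains or losses, 5 species-specific gains, multiple gains, multiple losses, and complex).

```python
PRIMATES = frozenset({"human", "cyno"})
RODENTS = frozenset({"mouse", "rat"})
ALL_SPECIES = PRIMATES | RODENTS | {"dog"}
# Internal clades usable as a single gain or loss event. Hypothetical choice:
# the primate+rodent clade is excluded so that dog-only patterns read as
# dog-specific gains rather than as a loss on the primate+rodent branch.
CLADES = {"primate": PRIMATES, "rodent": RODENTS}


def minimal_cover(species):
    """Fewest clades (named clades or single species) that exactly cover `species`."""
    remaining = set(species)
    labels = []
    for name, members in CLADES.items():
        if members <= remaining:
            labels.append(name)
            remaining -= members
    return labels + sorted(remaining)


def classify_lineage(biased):
    """Assign a presence/absence pattern of sex bias to a lineage category."""
    biased = frozenset(biased)
    if not biased <= ALL_SPECIES:
        raise ValueError(f"unknown species: {sorted(biased - ALL_SPECIES)}")
    if not biased:
        return None
    if len(biased) >= 4:
        return "conserved"
    gains = minimal_cover(biased)                 # events if ancestor unbiased
    losses = minimal_cover(ALL_SPECIES - biased)  # events if ancestor biased
    if len(gains) < len(losses):
        return f"{gains[0]} gain" if len(gains) == 1 else "multiple gains"
    if len(losses) < len(gains):
        return f"{losses[0]} loss" if len(losses) == 1 else "multiple losses"
    return "complex"  # gains and losses equally parsimonious


for pattern in [{"human", "cyno"}, {"mouse", "rat", "dog"}, {"dog"},
                {"human", "mouse"}, {"human", "mouse", "rat"}]:
    print(sorted(pattern), classify_lineage(pattern))
```

Ties between the gain and loss explanations are labeled "complex", which mirrors the text's category of patterns that cannot be explained by a single gain or loss.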

[Figure 3.14 bar plot: number of genes with a rodent gain of sex bias (y-axis, 0 to 1200) in each of the 12 tissues, using separate tests (grey) or mashr (red).]

Figure 3.14: Increased detection of rodent gains of sex bias using mashr. Number of genes detected as having a rodent-specific sex bias (male or female) in each tissue when using standard separate tests for each species (grey; limma/voom FDR < 0.1 and same direction of bias in mouse and rat only) versus when using mashr (red) as described in Methods.

Figure 3.15: Most sex bias in gene expression has arisen since the last common ancestor of boroeutheria. (A) Examples of genes with lineage-specific sex bias. (B) The number of true-positive sex-biased gene-tissue pairs (y-axis) in each evolutionary class was calculated as the difference between the total numbers discovered across all tissues using true or permuted sex labels. Evolutionary classes, defined as described in the main text, are designated as ancestral, acquired, or complex relative to the last common ancestor of boroeutheria (the common ancestor of the five species considered in this study). (C) Comparisons of ancestral to acquired sex biases as in (B), but performed in each tissue separately. Upper and lower bounds represent the fraction of sex bias estimated to be ancestral when counting all complex events as ancestral and acquired, respectively.

Using these classifications, we assessed how much of the sex-biased gene expression observed in the five species was present in the common ancestor of boroeutheria (i.e., ancestral) or subsequently acquired. Instances of ancestral sex bias included gene-tissue pairs that we previously identified as having a “conserved” sex bias, as well as gene-tissue pairs that lost sex bias in the primate or rodent lineage or in multiple lineages. By contrast, instances of acquired sex bias included gene-tissue pairs with primate-, rodent-, or species-specific sex bias, as well as multiple gains of sex bias. By this logic, 6,539 (23%) of sex-biased gene-tissue pairs were likely sex-biased in the common ancestor, and 22,194 (77%) likely acquired sex bias after the common ancestor. An additional 8,495 gene-tissue pairs with more complex patterns could not be confidently assigned as ancestral or acquired (Figure 3.15B). If all such “complex” events were ancestral, the ancestral fraction of sex bias would be 40%, whereas if they were all acquired, it would be 18%. We also performed these calculations in each tissue separately and found that, although the estimated ancestral fraction of sex bias varied between tissues, it constituted the minority of sex bias in all tissues except pituitary (Figure 3.15C). Together, these results indicate that most sex bias in gene expression in non-reproductive mammalian tissues arose during, rather than prior to, the mammalian radiation.
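The ancestral fraction and its bounds follow directly from the three counts given in the text; the true-positive counts themselves are, per the Figure 3.15 caption, the number of discoveries with true sex labels minus the number with permuted labels. The short sketch below reproduces the reported percentages; the function names are hypothetical, and the counts in the `true_positive_count` check are made up for illustration.

```python
def true_positive_count(n_true_labels, n_permuted_labels):
    """Discoveries with true sex labels minus discoveries with permuted labels."""
    return n_true_labels - n_permuted_labels


def ancestral_fraction(ancestral, acquired, complex_events):
    """Point estimate and bounds for the fraction of sex bias that is ancestral.

    The point estimate ignores complex events; the bounds count all complex
    events as acquired (lower bound) or as ancestral (upper bound).
    """
    point = ancestral / (ancestral + acquired)
    total = ancestral + acquired + complex_events
    lower = ancestral / total
    upper = (ancestral + complex_events) / total
    return point, lower, upper


point, lower, upper = ancestral_fraction(6539, 22194, 8495)
print(f"ancestral: {point:.0%} (bounds {lower:.0%} to {upper:.0%})")
# ancestral: 23% (bounds 18% to 40%)
```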

Sex-biased gene expression is associated with reduced selective constraint

Our findings thus far indicate that while there is significant conservation of sex-biased gene expression, most present-day sex bias has arisen since the last boroeutherian common ancestor. Reduced selective constraint may have facilitated such rapid evolution by allowing genes to be recruited into new sex-biased roles. Reasoning that genes functioning across many tissues and cell types face increased selective constraint on expression levels, we compared breadth of expression between genes with and without any type of sex bias, performing these comparisons within each of the 12 tissues separately. Sex-biased genes showed significantly lower expression breadth than unbiased genes, with the exception of lung, where sex-biased genes were more broadly expressed (Figure 3.16A). These differences in expression breadth could either be downstream consequences of the observed sex bias or have predated it. To distinguish between these possibilities, we analyzed expression breadth in chicken, an evolutionary outgroup to mammals, reasoning that patterns common to both human and chicken were likely present in the common mammalian ancestor, prior to the acquisition of sex bias. Again, sex-biased genes showed almost uniformly lower expression breadth in chicken than did unbiased genes (Figure 3.17).
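The precise definition of expression breadth is given in Methods and not reproduced here. As a minimal sketch, assuming breadth is the fraction of tissues in which a gene is expressed above a fixed TPM threshold (the 1 TPM cutoff, the function names, and the toy data are all hypothetical), the per-tissue comparison of group medians shown in the heatmaps might look like this:

```python
from statistics import median


def expression_breadth(tpm_by_tissue, threshold=1.0):
    """Fraction of tissues in which a gene's expression exceeds `threshold` TPM."""
    values = list(tpm_by_tissue.values())
    return sum(v > threshold for v in values) / len(values)


def compare_breadth(expression, sex_biased, unbiased, threshold=1.0):
    """Median breadth of sex-biased versus unbiased genes (one focal tissue)."""
    biased_median = median(expression_breadth(expression[g], threshold)
                           for g in sex_biased)
    unbiased_median = median(expression_breadth(expression[g], threshold)
                             for g in unbiased)
    return biased_median, unbiased_median


# Hypothetical expression (TPM) of four genes across four tissues.
expression = {
    "geneA": {"heart": 5.0, "liver": 0.2, "brain": 0.0, "lung": 3.0},
    "geneB": {"heart": 0.5, "liver": 0.1, "brain": 2.0, "lung": 0.0},
    "geneC": {"heart": 9.0, "liver": 4.0, "brain": 7.0, "lung": 2.5},
    "geneD": {"heart": 3.0, "liver": 1.5, "brain": 0.4, "lung": 6.0},
}
print(compare_breadth(expression, ["geneA", "geneB"], ["geneC", "geneD"]))
# (0.375, 0.875)
```

In the thesis the two groups are compared with a two-sided Wilcoxon rank-sum test rather than by medians alone; the medians here correspond to the values plotted in the heatmaps.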

To directly assess conservation of expression levels in a tissue-specific manner, we used estimates of mammalian gene expression-level constraint learned from 16 species and seven tissues, five of which were also assessed in our study (Chen et al., 2018). As with expression breadth, sex-biased genes showed lower constraint than unbiased genes in heart, muscle, and liver, with the opposite true in lung (Figure 3.16B).

[Figure 3.16 heatmaps: group medians of (A) human expression breadth, (B) expression constraint, and (C) sequence conservation for genes with no bias or with sex bias, in each of the 12 tissues (heart, muscle, liver, lung, brain, pituitary, adrenal, thyroid, adipose, colon, spleen, skin).]

Figure 3.16: Sex-biased gene expression is associated with reduced selective constraint. Within each tissue, genes were binned as showing no sex bias, or sex bias of any evolutionary type. Expression constraint represents the genome-wide percentile, and sequence conservation is calculated as the mean coding phyloP score (see Methods for details). In each heatmap, the group median of the indicated gene-level trait is plotted; asterisks indicate a Benjamini-Hochberg-adjusted p-value < 0.05 from a two-sided Wilcoxon rank-sum test, placed on the group (“No bias” or “Sex-biased”) with the lower value of the gene-level trait.

[Figure 3.17 heatmap: group median of chicken expression breadth for genes with no bias or with sex bias, in each of the 12 tissues.]

Figure 3.17: Differences in chicken expression breadth between genes with or without sex-biased gene expression. Chicken expression breadth was calculated using RNA-seq data from nine male tissues from Merkin et al. (2012). In the heatmap, the group median of expression breadth is plotted, and asterisks indicate a Benjamini-Hochberg-adjusted p-value < 0.05 for a two-sided Wilcoxon rank-sum test, placed on the group (“No bias” or “Sex-biased”) with the lower median expression breadth.

To assess whether these differences in selective constraint extend to the nucleotide sequence level, we compared levels of coding sequence conservation. Results were mixed: genes sex-biased in heart, spleen, and liver showed decreased sequence conservation relative to genes with no bias in those tissues, while genes sex-biased in adipose, brain, and lung showed increased sequence conservation relative to unbiased genes (Figure 3.16C). This incomplete agreement between expression breadth or constraint and sequence conservation is consistent with recent reports that selective constraint can act independently on expression level or nucleotide sequence (Chen et al., 2018). Considering both expression levels and sequence conservation, however, these results indicate that in the majority of comparisons, sex-biased gene expression is associated with reduced selective constraint, a difference that existed prior to the divergence of the boroeutherian lineages assessed in this study. This is consistent with the idea that genes under reduced selective constraint in the common boroeutherian ancestor were more readily recruited into new sex-biased roles during evolution.

Evolutionary turnover of motifs for sex-biased transcription factors reflects lineage-specific changes in sex bias

We next sought to understand regulatory mechanisms underlying lineage-specific changes in sex bias. One mechanism by which sex-biased expression could evolve is via sex-biased transcription factors (TFs). For example, male-biased TF expression in muscle would result in higher TF activity in male muscle. Genes that acquired motifs for this TF in, say, the primate lineage would then show a primate-specific sex bias in muscle. To test this idea, we searched for motifs enriched in the promoters of sex-biased genes with lineage-specific changes in a given tissue, relative to their unbiased orthologs. We found 83 instances in which motifs gained or lost in the sex-biased orthologs matched predicted binding sites of TFs with sex bias in the same tissue (Figure 3.18A, Table 3.1). This was significantly more than the ~67 matches expected by chance, as estimated by permuting the tissue of TF sex bias (Figure 3.18B). By quantifying the enrichment of each motif in its corresponding set of sex-biased orthologs, we estimated that these 83 instances account for the lineage-specific sex bias of 6,073 gene-tissue pairs, or 27% of all lineage-specific sex bias.
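The permutation used to estimate the number of matches expected by chance can be sketched as follows: count motif-TF matches in which the TF is sex-biased in the same tissue as the motif change, then repeat the count after reassigning each TF’s tissue of sex bias at random. This is a simplified illustration under stated assumptions; the data structures, function names, random reassignment scheme, and toy inputs are hypothetical and not taken from the thesis code.

```python
import random


def count_matches(motif_hits, tf_bias):
    """Count motif hits whose matching TF is sex-biased in the same tissue.

    motif_hits: list of (tissue, tf) pairs, one per gained or lost motif
        that matches a known TF binding site.
    tf_bias: set of (tissue, tf) pairs for which the TF is sex-biased.
    """
    return sum((tissue, tf) in tf_bias for tissue, tf in motif_hits)


def permutation_test(motif_hits, tf_bias, tissues, n_perm=1000, seed=0):
    """Observed matches, mean matches under permutation, and empirical p."""
    rng = random.Random(seed)
    observed = count_matches(motif_hits, tf_bias)
    null = []
    for _ in range(n_perm):
        # Keep each sex-biased TF but move its bias to a random tissue.
        shuffled = {(rng.choice(tissues), tf) for _, tf in tf_bias}
        null.append(count_matches(motif_hits, shuffled))
    expected = sum(null) / n_perm
    # One-sided add-one p-value: at least as many matches as observed.
    p = (1 + sum(n >= observed for n in null)) / (n_perm + 1)
    return observed, expected, p


# Hypothetical toy input.
hits = [("muscle", "PKNOX1"), ("adrenal", "ZFX"), ("skin", "NR5A2"), ("lung", "MEF2C")]
bias = {("muscle", "PKNOX1"), ("adrenal", "ZFX"), ("skin", "NR5A2")}
tissues = ["muscle", "adrenal", "skin", "lung", "liver"]
print(permutation_test(hits, bias, tissues, n_perm=200))

# Share of lineage-specific sex bias explained, using counts from the text.
print(f"{6073 / 22194:.0%}")  # 27%
```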

To confirm that genes with lineage-specific gains or losses of motifs for sex-biased TFs are TF-bound in living cells, we leveraged publicly available data from chromatin immunoprecipitation sequencing (ChIP-seq) in human and mouse (Table 3.1). Although these assays were almost invariably performed in a different cell type than the tissue of TF sex bias and motif gain, we reasoned that sex-biased genes with gained motifs in a given tissue should nevertheless show enrichment for TF ChIP-seq signal. Consistent with this prediction, 11 of 15 cases with available data showed significant enrichment of ChIP-seq peaks in the promoters of genes with a gain or loss of sex bias and the relevant motif, relative to a background set of genes lacking the motif (Figure 3.18C). Thus, the evolutionary gains and losses of motifs we observed likely correspond to gains and losses of binding by cognate TFs.
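Figure 3.18C summarizes each ChIP-seq comparison as a log2 odds ratio with a 95% confidence interval from Fisher’s exact test, which uses a conditional maximum-likelihood interval. As a simpler stand-in, the sketch below computes the sample odds ratio with a Woolf (log-normal) interval and a Haldane-Anscombe correction for zero cells. It is a different and approximate technique, shown only to make the 2x2 layout concrete; the cell assignments and function name are hypothetical.

```python
import math


def log2_odds_ratio_ci(a, b, c, d, z=1.959963984540054):
    """log2 odds ratio with an approximate (Woolf) 95% confidence interval.

    a: target genes (lineage-specific sex bias + motif) with a ChIP-seq peak
    b: target genes without a peak
    c: background genes (no motif) with a peak
    d: background genes without a peak
    Adds 0.5 to every cell if any cell is zero (Haldane-Anscombe correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = [x + 0.5 for x in (a, b, c, d)]
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    to_log2 = 1 / math.log(2)
    return (log_or * to_log2,
            (log_or - z * se) * to_log2,
            (log_or + z * se) * to_log2)


# Hypothetical counts: 20 of 30 targets vs 10 of 30 background genes have peaks.
print(log2_odds_ratio_ci(20, 10, 10, 20))
```

An interval lying entirely above zero corresponds to significant enrichment of ChIP-seq peaks among the motif-containing, lineage-specifically sex-biased genes.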

If gains and losses of motifs for sex-biased TFs contribute to lineage-specific changes in sex bias in their target genes, there should be directional agreement between the activating or repressive effect of the TF, the sex bias of the TF, and the sex bias of the target gene. For example, target genes activated (or repressed) by a male-biased TF should be male (or female)- biased, and the opposite should be true for female-biased TFs. Rigorously testing this prediction requires experimental manipulation of the TF in the same tissue in which lineage-specific changes in sex bias of its targets are observed. Such data is available for PKNOX1, a

152

Figure 3.18: Evolutionary turnover of motifs for sex-biased transcription factors (TFs) is associated with gains and losses of sex bias. (A) Representative gained or lost motifs in promoters of genes with lineage-specific gains or losses of sex bias (top) aligned with motifs for sex-biased TFs in the same tissue (bottom). The lineage of sex bias gain or loss is indicated above each motif; the sex-biased TF and the lineage of its sex bias are indicated below. (B) Total number of matches between gained/lost motifs and sex-biased TFs when considering the tissue of TF sex bias (black) or randomly chosen tissues (grey). (C) Enrichment of ChIP-seq peaks in promoters of genes with lineage-specific sex biases containing gained or lost motifs for the TF. The sex-biased TF, along with the tissue of sex bias and motif gain/loss and the cell-type in which ChIP-seq was performed, are indicated to the left. The log2 odds ratio for genes with lineage-specific sex bias and containing the motif as compared to a background set of genes with no motif is shown on x-axis, with 95% confidence intervals calculated by Fisher’s exact test. (D) Effect of Pknox1 knockout (x-axis, Kanzleiter et al40) versus sex bias (y-axis), both in mouse muscle, for genes that show loss of sex bias in the primate lineage and contain a motif for PKNOX1 in mouse.

153

Figure 3.19: Conserved sex bias of PKNOX1 (A) and representative PKNOX1 motif- containing genes with primate loss of sex bias (B, C).

154 Excess # Discove Tissue of true- red sex bias positive ChIP accession motif Discovered and motif Matching TF sex bias motif number used lineage motif gain/loss TF lineage matches for validation Human gain AAAATTAG Lung MEF2C Conserved 153 Mouse gain AAAGCHA Adrenal IRF7 Conserved 165 Conserved; Rodent TCF7;RA gain;Rodent RG;RAR gain;Rodent A;NR2C2; gain;Rodent Rat gain AAAGGHC Skin NR5A2 gain 49 Mouse IRF8;IRF Conserved; gain AACCCAM Adrenal 8 Rodent gain 110 Human AACCYGG gain G Adipose SIX1 Conserved 110 Rodent AAGCCTM gain G Adrenal ZFX Conserved 14 AATAAATA ,TAAAAAA Dog A,AATAAA gain T Colon MEF2D Conserved 216 Mouse gain ACAGCTK Liver SNAI2 Conserved 65 Conserved; Mouse ACAGTGG SOX13;G Conserved; gain G Skin LIS1;SP4 Rodent gain 28 Human gain ACCAYGCC Adrenal KLF13 Conserved 39 ENCFF381GEK Human gain ACTTTGGG Adrenal IRF8 Conserved 64 Mouse PKNOX2; gain ACWTGTG Spleen TFE3 Conserved 46 Mouse GSM864673,GS gain AGAGHTC Adrenal RXRB Conserved 112 M864674 Rodent gain AGAKAAC Adipose GATA6 Rodent gain 21 Mouse AGCHTTG, RXRA;R Conserved; gain CCTTTCW Pituitary XRG Rodent gain 77 Human AGGAGWT gain C Muscle BCL6B Conserved 151 Rodent loss AGGCGSC Adrenal ZFX Conserved 13 GSE102616 AGGCHAG Rodent C,CAAGGM gain C Skin NR5A2 Rodent gain 289 Human AGGCTGA Adrenal ZFX Conserved 61 GSE102616

155 gain G,GCCTRT A Primate PKNOX1; loss AGGTGWC Muscle BHLHE41 Conserved 32 GSM972967 Human AGYGAGA gain C Heart IRF7 Conserved 159 Primate AGYGAGA IRF8;IRF gain C Adrenal 7 Conserved 12 Rat gain AMCGGAA Lung ELK4 Conserved 34 Human gain ATCRCTTG Pituitary BHLHE40 Conserved 202 Human TCF7;BC Conserved;P gain ATTAYAGG Thyroid L6 rimate gain 129 Mouse NR2C2;R gain AVAGGAC Skin ARG Rodent gain 63 Rodent BCTGGGCG loss ,TGACCTCR Spleen GLI2 Conserved 42 Rodent gain CAAGGMC Skin REST Rodent gain 148 Rodent loss CACCNCGC Spleen GLIS2 Conserved 24 ENCFF855XBY Human TCF3;US Conserved;P gain CACYTGAG Adrenal F1 rimate gain 50 Rodent CAGAGGC gain A Skin TFAP2A Conserved 154 Rodent CAGAGTW gain C Adipose SIX1 Conserved 13 Human gain CAGATCAY Adipose RORA Conserved 82 Human CAGGAGA gain A Muscle NFATC2 Conserved 134 Rodent gain CASCAAGC Adipose PBX1 Rodent gain 16 Rodent BHLHE40 Conserved; ENCFF981EN gain CCAAGTGY Skin ;USF1 Rodent gain 30 W GLIS3;GL Human IS2;RUN gain CCACCACR Pituitary X3 Conserved 120 ENCFF855XBY Human gain CCACCACR Skin GLIS1 Conserved 71 Rodent GLIS3;GL loss CCCCSGG Pituitary IS2 Conserved 10 ENCFF855XBY CCCGGGDC Dog ,CGTGGGR gain C Adipose CREB3L2 Conserved 33 Rat gain CCCGRAG Liver EHF Rodent gain 82 Rodent GABPA;Z gain CCCTCTTC Skin BTB7A Rodent gain 41 Dog CCCYCYGC Pituitary GLIS3 Conserved 182

156 gain Dog CCCYCYGC gain ,CCGCGDG Pituitary GLIS2 Conserved Dog gain CCGCGDG Pituitary HEY1 Conserved 108 Dog gain CCGKCACC Pituitary ATF1 Conserved 44 Human CCHGGGA ELF1;EB Conserved;P ENCFF256CVB gain G Spleen F1 rimate gain 89 ENCFF256CVB Dog gain CCSCGCGK Lung MYCN Conserved 99 ARNTL;B Conserved; Human HLHE41; Conserved;P gain CCTCGTSA Thyroid CREB3L1 rimate gain 37 Mouse LEF1;TC gain CCTTTCW Pituitary F7L1 Conserved 59 Dog gain CGGRCCTC Pituitary ZFX Conserved 45 Dog gain CGSCCGGA Lung ELK4 Conserved 63 Primate ENCFF431YW gain CGTCYCCA Spleen RFX3 Conserved 22 W Dog CREB3L1 gain CGTGGGRC Adipose ; Conserved 17 Dog CKCGGGC, gain CSGAGGSC Lung ZFX Rodent loss Primate gain CRCCTGTA Adipose SNAI2 Conserved 9 Mouse CTABACA, gain GHACACA Adipose FOXO4 Conserved 85 TEAD2;E Rodent LF1;STA gain CTBGGAA Skin T6 Rodent gain 95 Rat gain CTDAGGA Skin STAT6 Rodent gain 54 Mouse gain CTRACTC Spleen NFE2 Conserved 45 Rodent GSM864673 gain CTTAGMC Adrenal RXRB Conserved 23 GSM864674 Mouse gain CTTCCAW Pituitary ELF4 Conserved 59 Rodent gain CTTCTGDC Adipose RARB Conserved 19 Rodent CYGGGCG loss M Adrenal KLF13 Conserved 16 ENCFF381GEK Mouse gain GACAYCC Pituitary GLIS3 Conserved 61 Mouse GACAYCC, gain CCTTTCW Pituitary PKNOX2 Conserved 59 Human GAGGCCR Adipose ZFX Primate gain 195 GSE102616

gain A FOXO1;F Rodent OXP1;FO gain GBAAGCA Skin XO3 Conserved 108 Human GCAGGAG gain A Adipose SNAI2 Conserved 188 GCGCYAA Dog A,CRGCGC gain C Pituitary Conserved 81 GGAGTGC Human A,CTCMCA gain CC Adipose TBX21 Conserved 96 GGGYGAC Human A,CTCMCA ENCFF777MY gain CC Adipose SREBF1 Conserved 172 W GGTAGAC Mouse Y,GCWTGA gain CA Skin SMAD4 Rodent gain 27 Human GGTGAAA RARA;R gain C Adrenal XRB Conserved 43 Mouse gain GHACACA Adipose AR Conserved 59 Rodent NR3C1;N Conserved; gain GHACACA Pituitary R3C2 Rodent gain 22 Human gain GTAATCCY Adrenal RELA Conserved 62 Rat gain GWCCTAS Adrenal RORA Conserved 71 RARA Mouse USF1 Rodent gain gain RTGACCA Skin RARG 52 GSM1299598 Mouse gain TCTGAKA Adipose SIX1 Conserved 52 Rodent loss TGACCTCR Spleen NR2F6 Conserved 19 Human TGGCAVA gain A Heart CREB3L1 Conserved 73

Table 3.1. Motif gains and losses correlated with gains or losses of sex bias.

TF with conserved male-biased expression in muscle; genes with a muscle-specific loss of sex bias in the primate lineage show depletion of PKNOX1-matching motifs relative to mouse, rat, and dog (Figure 3.18A; examples of PKNOX1 targets with a primate loss of sex bias are shown in Figure 3.19). Using gene expression data from a muscle-specific mouse knockout of Pknox1 (Kanzleiter et al., 2014), we found a significant positive correlation between the effect of Pknox1 knockout and the sex bias of genes with a PKNOX1-matching motif, confirming the above prediction (Figure 3.18D). Thus, both ChIP-seq and TF knockout data support the notion that gains and losses of regulation by sex-biased transcription factors have contributed to the evolution of sex bias.

Discussion

We have assessed and compared sex bias in mammalian gene expression across an unprecedented number of tissues and species. We found that conserved sex-biased gene expression exists across the body, but that most sex bias in gene expression has arisen since the last common ancestor of Boreoeutheria, facilitated by reduced selective constraint on expression levels and, to some extent, on nucleotide sequence. Finally, we found evidence that sex-biased transcription factors (TFs) have driven the evolution of a significant fraction of sex-biased gene expression, using independent experimental datasets to validate TF binding and, in the case of PKNOX1, its activation or repression of target genes.

Previous studies have observed rapid evolution of sex-biased gene expression in the gonads (Harrison et al., 2015; Perry et al., 2014; Ranz et al., 2003; Zhang et al., 2007). This rapid evolution likely reflects species-specific reproductive functions and can be partially explained by sexual conflict: biases that increase the reproductive fitness of one sex would create pressure in the other sex to acquire fitness-increasing biases at different genes (Connallon & Clark, 2010; Connallon & Knowles, 2005; Parsch & Ellegren, 2013). Our finding that most sex bias in gene expression has arisen since the last common boreoeutherian ancestor raises the possibility that similar processes dominate in non-reproductive tissues. Our observation that genes with sex-biased expression show signs of reduced selective constraint is consistent with this idea, as the absence of deleterious effects associated with an initial expression change in one sex would have facilitated its acquisition. However, it is also possible that most cases of acquired sex bias have relatively little phenotypic impact and can therefore come and go rapidly during evolution. Exploring the functional impact of acquired sex bias is thus an important future step.

Comparative studies of sex-biased gene expression such as ours have important implications for the use of non-human mammals as models of human traits or diseases that show sex bias. Conserved sex bias in gene expression across the body indicates that certain molecular sex differences in humans, such as those influencing hormonal stress responses or metabolic pathways, are amenable to study in a wide range of mammalian model organisms. However, our finding that most sex bias in gene expression arose more recently in evolution, and is thus not shared among most mammals, suggests that in many cases non-human models may not adequately recreate the human sex differences in question. Indeed, a recent study found that genetic variants that decrease expression of the transcription factor KLF14 in human subcutaneous adipose tissue tend to increase insulin resistance and risk of type 2 diabetes only in females, whereas elimination of Klf14 expression in mouse adipose tissue leads to analogous phenotypes in both sexes (Small et al., 2018). In such cases, non-human mammals may still be useful as models of physiological or systems-level sex differences, but caution should be applied when extrapolating specific molecular findings to humans.

In this study, we have focused on characterizing and explaining sex differences at a molecular level, across both tissues and species. An important next step is to use such a catalogue to dissect sex differences at the phenotypic level, especially outside the reproductive tract. We anticipate that the broad range of tissues assessed will prove useful to this end, as phenotypic sexual dimorphism extends across the body and, for many phenotypes, may result from sex bias in multiple cell types. In Chapter 4, we use results from this study to investigate sex differences in height, finding that conserved sex bias in gene expression contributes to the male bias in height or body size seen in most mammalian species.

Finally, our finding that sex-biased transcription factors underlie lineage-specific changes in sex bias provides molecular insight into the mechanisms by which sex-biased gene expression evolves in non-reproductive mammalian tissues. Here we focused on regulatory changes in promoter regions because tissue-specific enhancer annotations are lacking for cyno, rat, and dog, but single-gene studies in Drosophila have shown that sex-biased gene expression can also evolve through more complex changes in cis-regulatory elements located at greater genomic distances from their target genes (Williams et al., 2008). Future studies cataloguing both the tissue- and species-specificity of mammalian enhancers will enable more detailed analyses of the cis-regulatory changes driving gains or losses of sex-biased gene expression during mammalian evolution. Most of the sex-biased transcription factors we identified as contributing to lineage-specific evolution of sex bias are autosomal, meaning that their own sex biases could arise either from trans-regulatory effects of the sex chromosomes or from sex hormones; distinguishing between these two possibilities is an important direction for future research.

Methods

Selection of human (GTEx) samples

We considered the following tissues as described in GTEx terminology: skin (not sun-exposed), heart – left ventricle, muscle – skeletal, brain – cortex, spleen, adipose – visceral (omentum), thyroid, colon – transverse, lung, liver, pituitary, and adrenal gland. RNA-seq samples from each of these tissues were first subjected to a series of filters based on sample-level quality metrics and on individual-level medical history and cause of death. At the sample level, we excluded samples that met any of the following criteria: flagged by GTEx (SMTORMVE = FLAGGED), RNA integrity number (SMRIN) less than 6.0, or autolysis score (SMATSSCR) greater than 2. At the individual level, all samples from an individual were excluded if the individual met any of the following criteria: Hardy scale death classification (DTHHRDY) of 4 (slow death), death from metabolic acidosis or shock (DTHCAT), more than 72 hours on a ventilator prior to death (DTHVNTD), or a medical history of any of the following: ascites (MHASCITES), lupus (MHLUPUS), Reye's syndrome (MHREYES), scleroderma (MHSCLRDRM), or sarcoidosis (MHSRCDSS). For a number of additional medical histories and causes of death, samples from specific subsets of tissues from an individual were excluded.
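The sample- and individual-level filters above amount to simple predicates over GTEx attribute tables. The Python sketch below illustrates them with plain dictionaries keyed by the GTEx variable names quoted in the text. The `SUBJID` field linking samples to individuals, the string labels used for DTHCAT, the 1 = yes coding of the medical-history flags, and the assumption that DTHVNTD is already expressed in hours are all hypothetical, and the tissue-specific exclusions are omitted.

```python
# Minimal sketch of the GTEx exclusion filters (hypothetical record layout:
# plain dicts keyed by GTEx variable names).
RIN_MIN = 6.0               # exclude SMRIN < 6.0
AUTOLYSIS_MAX = 2           # exclude SMATSSCR > 2
VENTILATOR_HOURS_MAX = 72   # exclude > 72 h on a ventilator (DTHVNTD assumed in hours)
HISTORY_FLAGS = ("MHASCITES", "MHLUPUS", "MHREYES", "MHSCLRDRM", "MHSRCDSS")
EXCLUDED_DEATH_CAUSES = {"metabolic acidosis", "shock"}  # hypothetical DTHCAT labels


def keep_sample(sample):
    """True if a sample passes the sample-level quality filters."""
    return (
        sample.get("SMTORMVE") != "FLAGGED"
        and sample["SMRIN"] >= RIN_MIN
        and sample["SMATSSCR"] <= AUTOLYSIS_MAX
    )


def keep_individual(subject):
    """True if none of the individual-level exclusion criteria apply."""
    if subject.get("DTHHRDY") == 4:  # Hardy scale 4 = slow death
        return False
    if subject.get("DTHCAT") in EXCLUDED_DEATH_CAUSES:
        return False
    if (subject.get("DTHVNTD") or 0) > VENTILATOR_HOURS_MAX:
        return False
    # Medical-history flags are assumed to be coded 1 = yes.
    return not any(subject.get(flag) == 1 for flag in HISTORY_FLAGS)


def filter_samples(samples, subjects):
    """Keep samples passing both filters; `subjects` maps a hypothetical SUBJID to its record."""
    return [s for s in samples
            if keep_sample(s) and keep_individual(subjects[s["SUBJID"]])]
```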

We next evaluated notes on sample histology provided by GTEx in order to eliminate samples with grossly skewed cell-type proportions or other significant pathologies (Table 3.2).

Detailed histological evaluation of GTEx samples

For a subset of tissues, we performed a detailed additional histological evaluation of all GTEx samples that passed the filters described above, with the goal of ensuring that male and female samples were histologically comparable. Samples were evaluated by a single pathologist (RM) without knowledge of age or sex. Evaluations, performed in one or two sittings, were based on low- to high-power scans of hematoxylin and eosin-stained tissues downloaded from the GTEx website. Adipose tissue (visceral omentum) was evaluated for the percentage of the sample that was either connective tissue or vasculature, as opposed to pure adipocytes. Adrenal gland was assessed for general preservation (degree of autolysis), extent of fibrosis, presence or absence of medulla, and the extent of any associated mononuclear cell infiltrate.

Liver was evaluated on a qualitative scale of 0-3 for the degree of vascular congestion, 0-3 for the extent of periportal mononuclear cell infiltrate, the percentage of fibrosis in the sample (relative to hepatocytes), the presence of the capsule, and the extent of steatosis. Muscle (skeletal) was assessed on a qualitative scale of 0-3 for the extent of fibrosis and adipose tissue, and for the presence of myocyte atrophy (none/focal/multifocal). Spleen was evaluated for the percentage of connective tissue, the amount of white pulp (low, average, increased), and the presence of extramedullary hematopoiesis (none/focal/multifocal). Thyroid was evaluated on a qualitative scale of 0-3 for the extent of interstitial fibrosis and 0-3 for mononuclear cell infiltrate. In general, there were no significant differences between male and female samples with respect to these histological evaluations (Table 3.3). The exceptions were the percentage of connective tissue/vasculature in adipose tissue (Wilcoxon rank-sum p = 0.029) and the degree of interstitial fibrosis in thyroid (Fisher's exact p = 0.017); we adjusted for these differences in downstream analyses.

Sample collection for non-human mammals

Cynomolgus macaque samples were collected from a breeding colony in Mauritius (Noveprim). Dog samples were collected from a beagle breeding facility in Wisconsin (Ridglan Farms). Mouse and rat samples were collected from the C57BL/6J (Jackson Labs; stock 000664) and Brown Norway (Charles River; strain code 091) strains, respectively. Age ranges for each species were as follows: cyno, 72-107 months; dog, 19-37 months; mouse, 8 weeks; rat, 9 weeks.

Tissue | GTEx pathology note exclusionary criteria
Adipose | >= 10% fibrous areas, > "minimal"/"mild"/"small" lymphocytic aggregates
Adrenal | >20% adherent fat, hyperplasia
Brain | Any meninges
Colon | >20% mucosa, no mucosa, no muscularis, surface sloughing, autolyzed mucosa
Heart | > minimal/mild ischemic damage or fibrosis
Liver | > mild congestion, >10% steatosis
Lung | > mild congestion/pneumonia, >10% adipose, > patchy fibrosis, chronic inflammation
Muscle | >= 10% adipose
Pituitary | All adenohypophysis (anterior), minute neurohypophysis (posterior), any dura
Skin | > 10%/minimal subcutaneous fat, abundant adnexal structures
Spleen | > mild congestion, numerous blood vessels
Thyroid | >10% adipose, >10% stroma, interstitial fibrosis, Hashimoto's thyroiditis, lymphocytic infiltration, > mild inflammation, prominent regressive changes, nodular goiter

Table 3.2. GTEx pathology criteria used to exclude samples for each tissue.

Tissue | Variable | Test | Statistic | p-value
Spleen | PERC_FIBR_CONN | Wilcoxon rank-sum | 127.5 | 0.7618
Spleen | WHITE_PULP | Fisher's exact | | 0.2887
Spleen | EMH | Fisher's exact | | 0.7914
Adipose | PERC_VESS_CONN | Wilcoxon rank-sum | 377.5 | 0.02922
Liver | PERC_FIBR_PORT | Wilcoxon rank-sum | 73 | 0.3334
Liver | CONGESTION | Fisher's exact | | 0.815
Liver | INFLAMM_PERIPORT | Fisher's exact | | 0.6753
Liver | CAPSULE_STEATOSIS | Fisher's exact | | 0.3052
Adrenal | PERC_ADIP_FIBR | Wilcoxon rank-sum | 270.5 | 0.6365
Adrenal | PRES | Fisher's exact | | 1
Adrenal | MEDULLA | Fisher's exact | | 0.1819
Adrenal | INFLAMMATION | Fisher's exact | | 0.315
Muscle | FIBROSIS | Fisher's exact | | 0.4421
Muscle | PERC_FIBROADIPOSE_TIS | Wilcoxon rank-sum | 1057.5 | 0.6372
Muscle | ATROPHY | Fisher's exact | | 0.5848
Thyroid | INTERSTIT_FIBROSIS | Fisher's exact | | 0.0169
Thyroid | INFLAMMATION | Fisher's exact | | 0.5677

Table 3.3. Comparisons of histological features between male and female GTEx samples.

Female macaques and dogs in estrus were excluded based on visual inspection prior to tissue isolation. Vaginal swabs were taken from female mice and rats at the time of tissue collection, and estrous phase was assessed post-mortem using the Diff-Quick Stain Kit (Thermo Fisher). Enough females were collected that tissues from three non-estrus females were available after post-mortem assessment of estrous phase. All tissues were isolated from freshly sacrificed animals (< 1 hour from time of death to tissue fixation). To make data from the non-human mammals maximally comparable to human RNA-seq data from GTEx, we adapted the GTEx Tissue Harvesting Work Instruction (PR-0004-W1, available at https://biospecimens.cancer.gov/resources/sops/library.asp) to each species, sampling the same part of each tissue as GTEx (the list of GTEx tissues for which the protocol was adapted is provided in the "Selection of human (GTEx) samples" section). Tissues were washed in cold PBS and stored in RNAlater (Ambion) per the manufacturer's instructions.

Library construction and sequencing

RNAs were extracted and sequencing libraries prepared in independent batches, with batch assignments randomized with respect to tissue, sex, and species. Technical replicates from each species were included in each library preparation and sequencing batch. Testis samples were used for technical replicates as the testis is among the most transcriptionally complex organs, allowing us to assess the impact of batch on the expression of the greatest number of genes.

Sequencing libraries were prepared using the TruSeq mRNA library kit (Illumina) according to the manufacturer's instructions, with the following modifications. After preparation, libraries were size-selected using 2% agarose gels on the PippinHT system (Sage Science) with a capture window of 300-600 bases. The size-selected material was then subjected to one additional cycle of PCR with fresh reagents and P5/P7 primers (Illumina) to ensure that all library fragments were fully double-stranded. This step significantly reduced the percentage of smaller fragments co-purifying in the final gel size selection. After PCR, libraries were size-selected a second time on the PippinHT with the same settings, and the resulting libraries were sequenced for either 100x100 or 75x75 cycles on an Illumina HiSeq 2500 sequencer.

Read mapping and analysis

RNA-seq libraries from the non-human mammals were mapped to their respective reference genomes (cyno, macFas5; dog, canFam3; mouse, mm10; rat, rn6) using STAR (Dobin et al., 2013) v2.5.0 with the following parameters: --outFilterMultimapNmax 50 --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.15. For cynomolgus macaque, we generated a custom assembly of the Y chromosome by aligning whole-genome sequencing data (study SRP045278; runs SRR1564773, SRR1564774, SRR1564776, SRR1564777) to our published assembly of the Y chromosome of the closely related rhesus macaque (Hughes et al., 2012) using bowtie (Langmead et al., 2009) v1.1.1 with default parameters, and then manually verifying and correcting substitutions using Consed (Gordon & Green, 2013). StringTie (Pertea et al., 2015) v1.3.3b with the --rf parameter was run on each resulting .bam file to assemble novel isoforms relative to the reference transcriptome annotation (Ensembl 91 for cyno; Ensembl 84 for dog, mouse, and rat). The sets of novel and existing transcript annotations from each library were then compiled into a single set of annotations for each species using the --merge option in StringTie. Abundances for these sets of transcripts for each species were then quantified using salmon (Patro et al., 2017) v0.9.1 with the following parameters/flags: --seqBias, --gcBias, --useVBOpt. For human samples from GTEx, transcript abundances for GENCODE v26 annotations were quantified using salmon with the same parameters.
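The mapping and quantification steps can be sketched as command lines assembled in Python. Only the STAR filtering options, StringTie's `--rf` and `--merge`, and salmon's `--seqBias`, `--gcBias`, and `--useVBOpt` come from the text; the index and file paths, the coordinate-sorted BAM output, and salmon's automatic library-type detection (`-l A`) are assumptions added to make the commands complete. The functions only build argument lists and execute nothing.

```python
import shlex

# Options stated in the text.
STAR_FILTERS = ["--outFilterMultimapNmax", "50",
                "--outFilterMismatchNmax", "999",
                "--outFilterMismatchNoverReadLmax", "0.15"]
SALMON_BIAS_FLAGS = ["--seqBias", "--gcBias", "--useVBOpt"]


def star_cmd(genome_dir, read1, read2, prefix):
    # Coordinate-sorted BAM output is an assumption so that StringTie can read it.
    return ["STAR", "--genomeDir", genome_dir, "--readFilesIn", read1, read2,
            "--outSAMtype", "BAM", "SortedByCoordinate",
            "--outFileNamePrefix", prefix, *STAR_FILTERS]


def stringtie_cmd(bam, ref_gtf, out_gtf):
    # --rf: reads come from an fr-firststrand stranded library.
    return ["stringtie", bam, "--rf", "-G", ref_gtf, "-o", out_gtf]


def stringtie_merge_cmd(gtf_list, ref_gtf, merged_gtf):
    # gtf_list: text file naming one per-library GTF per line.
    return ["stringtie", "--merge", "-G", ref_gtf, "-o", merged_gtf, gtf_list]


def salmon_cmd(index_dir, read1, read2, out_dir):
    # "-l A" (automatic library-type detection) is an assumption.
    return ["salmon", "quant", "-i", index_dir, "-l", "A",
            "-1", read1, "-2", read2, *SALMON_BIAS_FLAGS, "-o", out_dir]


if __name__ == "__main__":
    # Hypothetical paths; commands are printed, not executed.
    print(shlex.join(star_cmd("mm10_star", "s1_R1.fq", "s1_R2.fq", "s1.")))
    print(shlex.join(salmon_cmd("mouse_idx", "s1_R1.fq", "s1_R2.fq", "s1_quant")))
```

Keeping each command as a list rather than a single string lets it be passed directly to `subprocess.run` without shell quoting issues.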

Removal of samples identified as expression outliers

To remove expression outliers in the human GTEx dataset, we calculated, for each tissue, the pairwise Pearson correlation $r_{ij}$ between samples $i$ and $j$ on log-transformed TPM values for all genes. The typical relatedness of sample $i$ to other samples in that tissue is $\bar{r}_i = \operatorname{median}_j(r_{ij})$.

We then express this relatedness in median absolute deviations as

$$D_i = \frac{\lvert \bar{r}_i - \bar{r} \rvert}{\operatorname{median}_i\left(\lvert \bar{r}_i - \bar{r} \rvert\right)}$$

where $\bar{r}$ is the grand correlation median. Across the 12 tissues, 53 samples with $D > 6$ were marked as outliers and removed from subsequent analyses. To remove expression outliers in the non-human mammals, we performed hierarchical clustering of all samples within each species based on pairwise Pearson correlations of log-transformed TPM values. Samples that did not cluster clearly with all other samples of the same tissue and were verified not to be label swaps were marked as expression outliers. Across the 12 tissues and four non-human mammals, 15 out of 292 samples were marked as outliers and removed from subsequent analyses.
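The outlier score $D_i$ can be illustrated with a short, dependency-free Python sketch. It assumes a log2(TPM + 1) transform (the pseudocount is not stated in the text), excludes each sample's self-correlation when taking the median, and takes the grand median over the per-sample medians $\bar{r}_i$; these are interpretive choices, not the exact GTEx implementation.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (assumes non-zero variance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_relatedness(samples):
    """r_bar_i: median over j != i of r_ij, computed on log2(TPM + 1) values."""
    logged = [[math.log2(v + 1.0) for v in s] for s in samples]
    n = len(logged)
    return [median(pearson(logged[i], logged[j]) for j in range(n) if j != i)
            for i in range(n)]


def mad_scores(r_bar):
    """D_i = |r_bar_i - grand median| / median_i |r_bar_i - grand median|."""
    grand = median(r_bar)
    dev = [abs(r - grand) for r in r_bar]
    mad = median(dev)
    if mad == 0:
        return [0.0] * len(r_bar)  # no spread: no sample can be scored as an outlier
    return [d / mad for d in dev]


def flag_outliers(samples, threshold=6.0):
    """Indices of samples (lists of per-gene TPM values) with D > threshold."""
    scores = mad_scores(median_relatedness(samples))
    return [i for i, d in enumerate(scores) if d > threshold]
```

Scaling by the median absolute deviation rather than the standard deviation keeps a handful of very poor samples from inflating the yardstick used to detect them.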

Gene expression calculation and clustering

The tximport R package (Soneson et al., 2016) (with countsFromAbundance = "lengthScaledTPM") was used to sum transcript-level counts and transcripts per million (TPM) values from salmon to the gene level. Jensen-Shannon divergence between counts per million (CPM) values was used as a measure of sample distance, and average-linkage hierarchical clustering was used to cluster samples. First, samples within each species were clustered. Next, gene counts from all five species (all non-human mammal samples, as well as six random human samples per tissue) were collected into a single matrix on the basis of one-to-one orthologs. We relied upon orthology information from Ensembl 91, manually adding orthology information for the X- and Y-linked gene pairs DDX3X/DDX3Y, ZFX/ZFY, KDM6A/UTY, KDM5C/KDM5D, and USP9X/USP9Y. Gene-level counts across all samples were normalized using the TMM method implemented in the edgeR package (Robinson et al., 2009), and the resulting normalized CPM values were used for clustering as in Figure 1b.
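The distance and clustering step can be illustrated as follows: each CPM profile is normalized to a probability distribution, pairwise Jensen-Shannon divergence (base 2) is computed, and samples are merged by naive average linkage. This is a teaching sketch, not the R code used for the analysis; TMM normalization is omitted, and using the divergence itself (rather than its square root) as the distance is an assumption.

```python
import math


def to_probs(cpm):
    """Normalize a non-negative CPM profile to sum to 1."""
    total = sum(cpm)
    return [v / total for v in cpm]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, bounded in [0, 1]) between two profiles."""
    p, q = to_probs(p), to_probs(q)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0 because b is the mixture.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def average_linkage(profiles):
    """Naive agglomerative clustering; returns merges as (cluster_a, cluster_b, height)."""
    n = len(profiles)
    d = [[js_divergence(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = {i: [i] for i in range(n)}
    merges, next_id = [], n
    while len(clusters) > 1:
        keys = sorted(clusters)
        best = None
        for idx, a in enumerate(keys):
            for b in keys[idx + 1:]:
                # Average linkage: mean of all pairwise distances between members.
                dist = sum(d[i][j] for i in clusters[a] for j in clusters[b]) / (
                    len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[2]:
                    best = (a, b, dist)
        a, b, dist = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, dist))
        next_id += 1
    return merges
```

The merge list follows the usual convention that a new cluster receives the next unused integer label, so it can be converted to a dendrogram by most plotting tools.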

Principal components analysis to adjust for hidden confounders

For human samples, principal component analysis (PCA) within each tissue was performed on the gene × sample matrix of residual CPM values after first subtracting the mean effect of sex. We chose the number of principal components (PCs) to adjust for by picking the smallest number of PCs after which the marginal gain in percent variance explained began to decrease. The chosen number of PCs ranged between two and four across the 12 tissues. Within each tissue, a limma/voom linear model consisting of the selected PCs was then constructed, and the residuals of this model were used as adjusted expression values. These adjusted expression values were then combined with data from the additional species for downstream clustering and differential expression analyses.
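Adjusting for hidden confounders amounts to keeping the residuals of each gene's expression after regression on the selected PC scores. The sketch below computes ordinary least-squares residuals on an intercept plus the PC score vectors via Gram-Schmidt orthogonalization. It ignores the voom precision weights used in the actual limma model, and the text does not say whether the gene mean was added back, so plain residuals are returned.

```python
import math


def residualize(y, covariates):
    """OLS residuals of y regressed on an intercept plus covariate vectors (e.g. PC scores)."""
    n = len(y)
    basis = []  # orthonormal basis of span{1, covariates}
    for v in [[1.0] * n] + [list(map(float, c)) for c in covariates]:
        w = v[:]
        for b in basis:  # modified Gram-Schmidt
            proj = sum(x * z for x, z in zip(w, b))
            w = [x - proj * z for x, z in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > 1e-10:  # drop covariates collinear with earlier ones
            basis.append([x / norm for x in w])
    resid = [float(v) for v in y]
    for b in basis:  # remove the projection onto each basis vector
        proj = sum(x * z for x, z in zip(resid, b))
        resid = [x - proj * z for x, z in zip(resid, b)]
    return resid
```

In use, every gene row of the CPM matrix would be residualized on the same two to four PC score vectors for that tissue.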

Estimation of relative cell-type proportions in bulk tissue samples

We used CIBERSORT version 1.04 (https://cibersort.stanford.edu) to estimate relative cell-type proportions in human brain, skin, and adipose, three tissues for which reference datasets of pure tissue-resident cell populations were readily accessible. For each tissue, we input the gene × sample TPM matrices (not adjusted by PCA). To estimate relative cell-type proportions from these samples, CIBERSORT requires gene expression profiles of the reference cell types expected to make up the tissue, from which it derives a "signature matrix" consisting of the genes that most accurately differentiate the cell types. For brain, we used FPKM estimates in five adult cell types (astrocytes, oligodendrocytes, neurons, microglia, endothelial cells) from Zhang et al. (2016) (http://web.stanford.edu/group/barres_lab/brainseq2/TableS4-HumanMouseMasterFPKMList.xlsx). For skin, we used FPKM estimates in three cell types (keratinocytes, fibroblasts, melanocytes) from Reemann et al. (2014). For adipose, we constructed the signature matrix as described in Glastonbury et al. (2018).

Linear modeling and differential expression

Linear modeling was performed using the limma R package, with the voom functionality for analyzing RNA-seq read counts (Ritchie et al., 2015). Within each tissue, we assessed one-to-one orthologs with mean TPM greater than 1 in at least four of the five species. This resulted in 8,166 (muscle) to 10,449 (pituitary) orthologs assessed per tissue (across all species), with 12,013 orthologs assessed in at least one of the 12 tissues. To detect sex bias conserved across the majority of species, we modeled expression (TMM-normalized CPM values) within each tissue as a linear function of species and sex. A challenge in modeling sex across multiple species is that the species differ in both sample size and variability, with a greater number of human samples that also exhibit greater variability. To control for between-species differences in sample size and variability, we applied a linear mixed model (LMM) approach in which species was specified as the random effect, using limma's duplicateCorrelation function with species as the block variable. We used the voomWithQualityWeights function (Liu et al., 2015) to estimate group-specific variances. P-values from all assessed gene-tissue pairs were adjusted using the qvalue method. For a gene-tissue pair to be classified as showing conserved sex bias, we required that the LMM yield a q-value < 0.1 and that the gene-tissue pair show a fold-change magnitude > 1.05, as determined by linear regression in each species separately, in the same direction in at least four of the five species.
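The ortholog filter and the conserved sex-bias call reduce to two predicates. In this sketch, per-species fold changes are expressed as male/female ratios (an assumed convention), so a fold-change magnitude above 1.05 means a ratio above 1.05 (male-biased) or below 1/1.05 (female-biased); the q-value is assumed to come from the LMM fit described above.

```python
SPECIES = ("human", "cyno", "dog", "mouse", "rat")


def is_assessed(mean_tpm, min_species=4):
    """Keep a one-to-one ortholog if its mean TPM exceeds 1 in >= 4 of the 5 species."""
    return sum(mean_tpm.get(sp, 0.0) > 1.0 for sp in SPECIES) >= min_species


def is_conserved_sex_bias(qvalue, fold_changes, q_max=0.1, fc_min=1.05, min_species=4):
    """Conserved call: LMM q-value < 0.1 and fold-change magnitude > 1.05 in the
    same direction in at least four species.

    fold_changes maps species -> male/female expression ratio (assumed convention)
    from a separate per-species regression.
    """
    if qvalue >= q_max:
        return False
    male = sum(fc > fc_min for fc in fold_changes.values())
    female = sum(fc < 1.0 / fc_min for fc in fold_changes.values())
    return max(male, female) >= min_species
```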

To assess lineage-specific sex bias, we performed linear regressions for sex within each of the 60 tissue-species combinations using limma/voom. The effect size estimates for sex and the corresponding standard errors from each tissue-species combination were then combined into a 12,013 × 60 matrix. Missing values (i.e., orthologs not expressed in a particular tissue-species combination) were set to an effect size of 0 and a standard error of 1000. mashr (Urbut et al., 2019) performs empirical Bayes shrinkage of effect size estimates while learning patterns from the data, and works best when provided with broad trends in effect sizes, upon which it performs additional refinements. To this end, we performed sparse factor analysis (SFA) (Engelhardt & Stephens, 2010) with default parameters on the 12,013 × 60 matrix of ortholog Z-scores (effect size divided by standard error). The learned sparse factors were then converted into 60 × 60 covariance matrices, combined with canonical covariance matrices and covariance matrices learned using PCA, and input to the main mash function along with the effect size and standard error estimates. Within each of the 60 tissue-species combinations, we called sex-biased genes with a mashr local false sign rate < 5% and an estimated posterior mean fold-change > 1.05. Within each tissue, we assigned the lineage of gain or loss of sex bias based on parsimony across the five species. Genes sex-biased in only one of the five species were assigned as species-specific gains. Genes sex-biased in the same direction in both human and cyno, or in both mouse and rat, were assigned as gains in the primate or rodent lineage, respectively. Genes sex-biased in the same direction in human, cyno, and dog were assigned as losses in the rodent lineage, while those sex-biased in mouse, rat, and dog were assigned as losses in the primate lineage. The most parsimonious explanation for genes sex-biased in the same direction in one primate species (human or cyno) and one rodent species (mouse or rat) is two independent gains of sex bias, one in each lineage. Similarly, genes sex-biased in the same direction in exactly one primate, exactly one rodent, and dog are most consistent with two independent losses of sex bias. The remaining cases were assigned as "complex". To obtain empirical estimates of false discovery rates for each of the 10 evolutionary classes of sex bias, we permuted the sample labels for sex within each tissue and repeated the above analyses.
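The parsimony rules above map the set of species showing sex bias in the same direction to an evolutionary class. A minimal Python version follows; the class labels are illustrative, and a gene sex-biased in all five species falls through to "complex" here because conserved bias is detected separately by the LMM.

```python
PRIMATES = frozenset({"human", "cyno"})
RODENTS = frozenset({"mouse", "rat"})


def assign_lineage(biased_species):
    """Parsimony class for a gene sex-biased (in one direction) in `biased_species`."""
    biased = set(biased_species)
    if not biased:
        raise ValueError("gene is not sex-biased in any species")
    primates, rodents, dog = biased & PRIMATES, biased & RODENTS, "dog" in biased
    if len(biased) == 1:
        return f"{next(iter(biased))} gain"  # species-specific gain
    if biased == PRIMATES:
        return "primate gain"
    if biased == RODENTS:
        return "rodent gain"
    if biased == PRIMATES | {"dog"}:
        return "rodent loss"
    if biased == RODENTS | {"dog"}:
        return "primate loss"
    if len(primates) == 1 and len(rodents) == 1:
        # One primate + one rodent: two independent gains; with dog as well: two losses.
        return "independent losses" if dog else "independent gains"
    return "complex"
```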

We used the results of this analysis with permuted sample labels to calculate the FDR at two levels: 1) for all sex-biased gene-tissue pairs of a given evolutionary class, regardless of tissue (i.e., Figure 3.15), and 2) for each tissue separately, an estimate of the number of genes belonging to each evolutionary class of sex bias and an FDR for the number of genes in each class, which we used when performing the motif enrichment analyses described below.
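The permutation-based FDR can be sketched as the number of calls obtained with shuffled sex labels divided by the number obtained with the true labels. The number of permutations, averaging across them, and capping the ratio at 1 are assumptions; the text states only that sex labels were permuted within each tissue and the analysis repeated.

```python
import random
from statistics import mean


def permute_sex_labels(labels, rng=None):
    """One permutation of sex labels among the samples of a single tissue."""
    rng = rng or random.Random()
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def empirical_fdr(observed_calls, permuted_calls):
    """Mean number of calls under permuted labels / number of calls with true labels.

    observed_calls: count of sex-biased genes of one class with the real labels.
    permuted_calls: counts for the same class from one or more permuted runs.
    """
    if observed_calls == 0:
        return float("nan")  # undefined without any observed calls
    return min(1.0, mean(permuted_calls) / observed_calls)
```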

Estimates of sex bias in independent datasets

Transcript counts and abundances were quantified using RNA-seq reads from Liang et al. (2017) (human skin, GSE63980), Lindholm et al. (2014) (human muscle, GSE58387 and GSE59608), Li et al. (2017) (mouse liver, muscle, heart, lung, spleen, adrenal, PRJNA375882), and Marin et al. (2017) (mouse liver, heart, GSE97367). Raw read (fastq) files were input to salmon, supplying the human or mouse transcriptome annotations described above. Sex bias in each dataset was estimated by linear regression in limma/voom. Since Lindholm et al. sampled multiple muscle regions from the same individual, limma's linear mixed model approach was used to specify individual as a random effect and sex as a fixed effect. For mouse liver and heart, data from both Li et al. and Marin et al. were combined, and the effect of sex was estimated while specifying the study as a covariate. Sex bias in microarray data from Yang et al. (2006) and Franco et al. (2010) (mouse muscle, GSE3088, and lung, GSE16510, respectively) was analyzed using NCBI's GEO2R tool. Genes with FDR < 0.1 in these studies were considered significant; for these genes, the male/female fold-changes in the published study and in our study were compared. When raw data were not publicly available, estimates for significantly sex-biased genes calculated in each study were used (D. M. Werling et al., 2016, human brain; M. S. Newman et al., 2017, human heart; Viguerie et al., 2012, human adipose; Zhang et al., 2011, human liver).

Expression breadth, expression constraint, and sequence conservation

To calculate expression breadth in humans, we first estimated the expression level of each gene in each of the 12 tissues as its median TPM value among the samples from that tissue. For each gene, we then normalized its TPM values across tissues to its maximum TPM value, and took the average of these normalized values across tissues. To calculate expression breadth in chicken, we used the same strategy, except with RNA-seq data from nine male tissues (Merkin et al., 2012).
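Expression breadth as defined above (median TPM per tissue, scaled to the gene's maximum across tissues, then averaged across tissues) can be computed as follows; assigning a breadth of 0 to a gene with zero median TPM in every tissue is an assumption.

```python
from statistics import median


def expression_breadth(tpm_by_tissue):
    """Mean across tissues of (median TPM in tissue / max median TPM across tissues).

    tpm_by_tissue maps tissue name -> per-sample TPM values for one gene.
    """
    levels = [median(values) for values in tpm_by_tissue.values()]
    peak = max(levels)
    if peak == 0:
        return 0.0  # assumption: a gene expressed nowhere has breadth 0
    return sum(level / peak for level in levels) / len(levels)
```

A gene expressed equally in all tissues scores 1, while a gene restricted to a single tissue scores 1/12 with the 12 tissues used here.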

Rather than comparing expression breadth (or any other gene-level trait) between genes with or without sex bias in any of the 12 tissues, we performed comparisons within each of the 12 tissues separately. This avoids artificial skewing due to detection bias; for example, broadly expressed genes, by simple virtue of their broad expression, could be more likely to be detected as sex-biased in at least one of the 12 tissues. To estimate evolutionary constraint on gene expression in mammals, we used genome-wide percentiles of evolutionary constraint from Chen et al. (2018). We measured sequence conservation by calculating the mean phyloP score across all coding bases of the longest annotated transcript of each gene.

Motif analysis

We identified motifs enriched in the promoters (±1 kb around the most distal transcription start site as annotated in human GENCODE v26, or in each of our improved non-human mammalian transcriptomes) of genes with lineage-specific sex bias using a two-step procedure adapted from a recent study of epigenetic poising in the mammalian germline (Lesch et al., 2016). In the first step, we used DREME (Bailey, 2011) with default settings to identify motifs enriched in the promoters (±1 kb around the most distal transcription start site) of each set of lineage-specific sex-biased genes within each tissue. For any given type of lineage-specific bias, we only analyzed tissues with FDR < 0.30 as assessed by permutation. In the second step, we used AME (McLeay & Bailey, 2010) to scan 100 random, equally sized sets of orthologous promoters for enrichment of the motifs detected in the first step. The fraction of random promoter sets showing motif enrichment with a lower p-value than that of the motif in the true promoter set (re-estimated using AME) constituted a raw p-value. This second step ensures that the DREME motif enrichment is specific to the genes in question and not the result of an overall enrichment of the motif in the species or lineage in question. P-values from all tissues for a given type of lineage-specific sex bias (e.g., primate loss or rodent gain) were then adjusted by the Benjamini-Hochberg method. We matched motifs with FDR < 0.10 to binding sites for known transcription factors from the JASPAR CORE vertebrates database (Mathelier et al., 2016) using TOMTOM (Gupta et al., 2007) with parameters -thresh 10 -evalue -dist ed.
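The background-controlled second step and the multiple-testing correction can be sketched as below. `empirical_p` is the fraction of random promoter sets whose AME p-value is lower than the true one, as described; `benjamini_hochberg` is the standard step-up adjustment. Both take p-values that would come from AME runs, which are not reproduced here.

```python
def empirical_p(true_p, random_ps):
    """Fraction of random promoter sets whose motif p-value is lower than the true set's."""
    return sum(p < true_p for p in random_ps) / len(random_ps)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With 100 random sets this estimator can return exactly 0; a common alternative is the (k + 1)/(n + 1) correction, though the text does not indicate whether it was used.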

We devised a simple metric to estimate the fraction of lineage-specific sex biases that are accounted for by gains or losses of motifs for sex-biased TFs. For each of the 83 instances where the gained or lost motif corresponded to a sex-biased TF in the same tissue (e.g., matches for the PKNOX1 motif in genes with a primate loss of sex bias in muscle), we computed the difference between the number of sex-biased orthologs with a motif match (the "observed" number) and the number expected from the promoters of non-sex-biased orthologs, as calculated by DREME in the initial discovery step. This calculation was done at the level of orthologs in individual species (i.e., for genes with a primate gain of sex bias, human and cyno orthologs are considered independently); to avoid such double counting, we divided this number by two if the gain or loss of sex bias in that tissue involved two species in the phylogeny in this study. We then multiplied this excess by the estimated true positive rate (1 - FDR) for that specific lineage/tissue combination. These estimates for each of the 83 instances, provided in Table 3.1, were then summed to obtain the total estimated number of sex-biased gene-tissue pairs that could be accounted for by motifs for sex-biased TFs.
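The excess-motif metric can be written as a short function over the instances. The dictionary keys (`observed`, `expected`, `two_species`, `fdr`) are hypothetical field names; each instance contributes its observed-minus-expected motif count, halved when orthologs from two species were counted, and weighted by the true positive rate 1 - FDR.

```python
def estimated_explained(instances):
    """Total estimated sex-biased gene-tissue pairs explained by sex-biased-TF motifs.

    Each instance is a dict with hypothetical keys:
      observed    - sex-biased orthologs with a motif match
      expected    - matches expected from non-sex-biased promoters (DREME)
      two_species - True if the gain/loss involved two species in the phylogeny
      fdr         - permutation FDR for that lineage/tissue combination
    """
    total = 0.0
    for inst in instances:
        excess = inst["observed"] - inst["expected"]
        if inst["two_species"]:
            excess /= 2  # orthologs from both species were counted separately
        total += excess * (1 - inst["fdr"])
    return total
```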

Validation of TF binding with ChIP-seq data

For sex-biased TFs with publicly available data, ChIP-seq peaks (.bed files) were downloaded from the source indicated in Table 3.1. Each set of TF peaks was then intersected with the set of all promoters (±1 kb around the most distal transcription start site) for all genes using bedtools (Quinlan & Hall, 2010); a gene was classified as TF-bound if its promoter contained at least one ChIP-seq peak. Motif-containing genes (human or mouse, as all analyzed ChIP-seq data were from these two species) were identified by running FIMO (Grant et al., 2011) on the same promoter sequences using the motif discovered in the DREME step described above. The TF-bound fraction of motif-containing, sex-biased genes of the relevant evolutionary class (e.g., rodent gain in muscle) was compared with the TF-bound fraction of all expressed one-to-one orthologs without a motif using Fisher's exact test.
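The ChIP-seq validation can be sketched in pure Python: promoters as half-open ±1 kb windows around the TSS (BED-style coordinates assumed), a peak-overlap test standing in for the bedtools intersection, and a Fisher's exact test on the 2 × 2 table of bound/unbound counts for motif-containing sex-biased genes versus expressed orthologs without a motif. The function names are illustrative, and the two-sided alternative is an assumption since the text does not specify one.

```python
from math import comb


def promoter(chrom, tss, flank=1000):
    """Half-open (chrom, start, end) window spanning +/- flank bp around the TSS."""
    return chrom, max(0, tss - flank), tss + flank


def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def is_bound(prom, peaks):
    """A gene counts as TF-bound if its promoter overlaps at least one peak."""
    return any(overlaps(prom, peak) for peak in peaks)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for [[a, b], [c, d]], e.g.
    [[bound, unbound] for motif-containing sex-biased genes,
     [bound, unbound] for orthologs without a motif]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):  # hypergeometric numerator, kept as an exact integer
        return comb(row1, x) * comb(n - row1, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, col1)
```

Comparing integer weights rather than floating-point probabilities avoids the tolerance issues that arise when deciding which tables are "at least as extreme" as the observed one.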

References

Assis, R., Zhou, Q., and Bachtrog, D. (2012). Sex-biased transcriptome evolution in Drosophila. Genome Biology and Evolution, 4(11), 1189–1200.
Bailey, T. L. (2011). DREME: Motif discovery in transcription factor ChIP-seq data. Bioinformatics, 27(12), 1653–1659.
Bellott, D. W., Hughes, J. F., Skaletsky, H., Brown, L. G., Pyntikova, T., Cho, T.-J., Koutseva, N., Zaghlul, S., Graves, T., Rock, S., Kremitzki, C., Fulton, R. S., Dugan, S., Ding, Y., Morton, D., Khan, Z., Lewis, L., … Page, D. C. (2014). Mammalian Y chromosomes retain widely expressed dosage-sensitive regulators. Nature, 508(7497), 494–499.
Blekhman, R., Marioni, J. C., Zumbo, P., Stephens, M., and Gilad, Y. (2010). Sex-specific and lineage-specific alternative splicing in primates. Genome Research, 20(2), 180–189.
Brawand, D., Soumillon, M., Necsulea, A., Julien, P., Csárdi, G., Harrigan, P., Weier, M., Liechti, A., Aximu-Petri, A., Kircher, M., Albert, F. W., Zeller, U., Khaitovich, P., Grützner, F., Bergmann, S., Nielsen, R., Pääbo, S., and Kaessmann, H. (2011). The evolution of gene expression levels in mammalian organs. Nature, 478(7369), 343–348.
Chen, J., Swofford, R., Johnson, J., Cummings, B. B., Rogel, N., Lindblad-Toh, K., Haerty, W., Palma, F. di, and Regev, A. (2018). A quantitative framework for characterizing the evolutionary history of mammalian gene expression. Genome Research, 29(1), 53–63.
Connallon, T., and Clark, A. G. (2010). Sex linkage, sex-specific selection, and the role of recombination in the evolution of sexually dimorphic gene expression. Evolution, 64(12), 3417–3442.
Connallon, T., and Knowles, L. L. (2005). Intergenomic conflict revealed by patterns of sex-biased gene expression. Trends in Genetics, 21(9), 495–499.
Disteche, C. M. (2012). Dosage Compensation of the Sex Chromosomes. Annual Review of Genetics, 46(1), 537–560.
Disteche, C. M. (2016). Dosage compensation of the sex chromosomes and autosomes. Seminars in Cell and Developmental Biology, 56, 9–18.
Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T. R. (2013). STAR: Ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), 15–21.
Engelhardt, B. E., and Stephens, M. (2010). Analysis of population structure: A unifying framework and novel methods based on sparse factor analysis. PLoS Genetics, 6(9).
Franco, M. De, Colombo, F., Galvan, A., Cecco, L. De, Spada, E., Milani, S., Ibanez, O. M., and Dragani, T. A. (2010). Transcriptome of normal lung distinguishes mouse lines with different susceptibility to inflammation and to lung tumorigenesis. Cancer Letters, 294(2), 187–194.
Fuente-Martín, E., Argente-Arizón, P., Ros, P., Argente, J., and Chowen, J. A. (2013). Sex differences in adipose tissue. Adipocyte, 2(3), 128–134.
Gershoni, M., and Pietrokovski, S. (2017). The landscape of sex-differential transcriptome and its consequent selection in human adults. BMC Biology, 15(1), 1–15.
Glastonbury, C. A., Small, K. S., Alves, A. C., and Moustafa, J. S. E.-S. (2018). Cell-type heterogeneity in adipose tissue is associated with complex traits and reveals disease-relevant cell-specific eQTLs. BioRxiv, 283929.
Gordon, D., and Green, P. (2013). Consed: A graphical editor for next-generation sequencing. Bioinformatics, 29(22), 2936–2937.
Gorski, R. A., Gordon, J. H., Shryne, J. E., and Southam, A. M. (1978). Evidence for a morphological sex difference within the medial preoptic area of the rat brain. Brain Research, 148(2), 333–346.
Grant, C. E., Bailey, T. L., and Noble, W. S. (2011). FIMO: Scanning for occurrences of a given motif. Bioinformatics, 27(7), 1017–1018.
Grath, S., and Parsch, J. (2012). Rate of amino acid substitution is influenced by the degree and conservation of male-biased transcription over 50 Myr of Drosophila evolution. Genome Biology and Evolution, 4(3), 346–359.
Green, H. J., Fraser, I. G., and Ranney, D. A. (1984). Male and female differences in enzyme activities of energy metabolism in vastus lateralis muscle. Journal of the Neurological Sciences, 65(3), 323–331.
Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L., and Noble, W. S. (2007). Quantifying similarity between motifs. Genome Biology, 8(2).
Harrison, P. W., Wright, A. E., Zimmer, F., Dean, R., Montgomery, S. H., Pointer, M. A., and Mank, J. E. (2015). Sexual selection drives evolution and rapid turnover of male gene expression. Proceedings of the National Academy of Sciences, 112(14), 4393–4398.
Hayward, C. S., Kalnins, W. V., and Kelly, R. P. (2001). Gender-related differences in left ventricular chamber function. Cardiovascular Research, 49(2), 340–350.
Hughes, J. F., Skaletsky, H., Brown, L. G., Pyntikova, T., Graves, T., Fulton, R. S., Dugan, S., Ding, Y., Buhay, C. J., Kremitzki, C., Wang, Q., Shen, H., Holder, M., Villasana, D., Nazareth, L. V., Cree, A., Courtney, L., … Page, D. C. (2012). Strict evolutionary conservation followed rapid gene loss on human and rhesus Y chromosomes. Nature, 483(7387), 82–86.
Kanzleiter, T., Rath, M., Penkov, D., Puchkov, D., Schulz, N., Blasi, F., and Schurmann, A. (2014). Pknox1/Prep1 Regulates Mitochondrial Oxidative Phosphorylation Components in Skeletal Muscle. Molecular and Cellular Biology, 34(2), 290–298.
Klein, S. L., and Flanagan, K. L. (2016). Sex differences in immune responses. Nature Reviews Immunology, 16(10), 626–638.
Langmead, B., Trapnell, C., Pop, M., and Salzberg, S. L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10(3), R25.
Lesch, B. J., Silber, S. J., McCarrey, J. R., and Page, D. C. (2016). Parallel evolution of male germline epigenetic poising and somatic development in animals. Nature Genetics, 48(8), 888–894.
Li, B., Qing, T., Zhu, J., Wen, Z., Yu, Y., Fukumura, R., Zheng, Y., Gondo, Y., and Shi, L. (2017). A comprehensive mouse transcriptomic BodyMap across 17 tissues by RNA-seq. Scientific Reports, 7(1), 1–10.
Liang, Y., Tsoi, L. C., Xing, X., Beamer, M. A., Swindell, W. R., Sarkar, M. K., Berthier, C. C., Stuart, P. E., Harms, P. W., Nair, R. P., Elder, J. T., Voorhees, J. J., Kahlenberg, J. M., and Gudjonsson, J. E. (2017). A gene network regulated by the transcription factor VGLL3 as a promoter of sex-biased autoimmune diseases. Nature Immunology, 18(2), 152–160.
Lindenfors, P., Gittleman, J. L., and Jones, K. (2007). Sexual size dimorphism in mammals. In Evolutionary Studies of Sexual Size Dimorphism (pp. 16–26). Oxford University Press.
Lindholm, M. E., Huss, M., Solnestam, B. W., Kjellqvist, S., Lundeberg, J., and Sundberg, C. J. (2014). The human skeletal muscle transcriptome: Sex differences, alternative splicing, and tissue homogeneity assessed with RNA sequencing. FASEB Journal, 28(10), 4571–4581.
Liu, R., Holik, A. Z., Su, S., Jansz, N., Chen, K., Leong, H. S., Blewitt, M. E., Asselin-Labat, M. L., Smyth, G. K., and Ritchie, M. E. (2015). Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses. Nucleic Acids Research, 43(15).
Marin, R., Cortez, D., Lamanna, F., Pradeepa, M. M., Leushkin, E., Julien, P., Liechti, A., Halbert, J., Brüning, T., Mössinger, K., Trefzer, T., Conrad, C., Kerver, H. N., Wade, J., Tschopp, P., and Kaessmann, H. (2017). Convergent origination of a Drosophila-like dosage compensation mechanism in a reptile lineage. Genome Research, 1–14.
Mathelier, A., Fornes, O., Arenillas, D. J., Chen, C. Y., Denay, G., Lee, J., Shi, W., Shyr, C., Tan, G., Worsley-Hunt, R., Zhang, A. W., Parcy, F., Lenhard, B., Sandelin, A., and Wasserman, W. W. (2016). JASPAR 2016: A major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Research, 44(D1), D110–D115.
Mauvais-Jarvis, F. (2015). Sex differences in metabolic homeostasis, diabetes, and obesity. Biology of Sex Differences, 6(1), 1–9.
McLeay, R. C., and Bailey, T. L. (2010). Motif Enrichment Analysis: A unified framework and an evaluation on ChIP data. BMC Bioinformatics, 11.
Mele, M., Deluca, D. S., Segrè, A. V., Sullivan, T. J., Young, T. R., Gelfand, E. T., Trowbridge, C. A., Maller, J. B., Tukiainen, T., Lek, M., Ward, L. D., Kheradpour, P., Iriarte, B., Meng, Y., Palmer, C. D., Esko, T., Winckler, W., … Wright, F. A. (2015). The human transcriptome across tissues and individuals. Science, 348(6235), 660–665.
Mercer, K. B., Dias, B., Shafer, D., Maddox, S. A., Mulle, J. G., Hu, P., Walton, J., and Ressler, K. J. (2016). Functional evaluation of a PTSD-associated genetic variant: Estradiol regulation and ADCYAP1R1. Translational Psychiatry, 6(12), 1–7.
Merkin, J., Russell, C., Chen, P., and Burge, C. B. (2012). Evolutionary dynamics of gene and isoform regulation in Mammalian tissues. Science, 338(6114), 1593–1599.
Newman, A. M., Liu, C. L., Green, M. R., Gentles, A. J., Feng, W., Xu, Y., Hoang, C. D., Diehn, M., and Alizadeh, A. A. (2015). Robust enumeration of cell subsets from tissue expression profiles. Nature Methods, 12(5), 453–457.
Newman, M. S., Nguyen, T., Watson, M. J., Hull, R. W., and Yu, H.-G. (2017). Transcriptome profiling reveals novel BMI- and sex-specific gene expression signatures for human cardiac hypertrophy. Physiological Genomics, 49(7), 355–367.
Ngo, S. T., Steyn, F. J., and McCombe, P. A. (2014). Gender differences in autoimmune disease. Frontiers in Neuroendocrinology, 35(3), 347–369.
Olson, H., Betton, G., Robinson, D., Thomas, K., Monro, A., Kolaja, G., Lilly, P., Sanders, J., Sipes, G., Bracken, W., Dorato, M., Van Deun, K., Smith, P., Berger, B., and Heller, A. (2000). Concordance of the toxicity of pharmaceuticals in humans and in animals. Regulatory Toxicology and Pharmacology, 32(1), 56–67.
Parsch, J., and Ellegren, H. (2013). The evolutionary causes and consequences of sex-biased gene expression. Nature Reviews Genetics, 14(2), 83–87.
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A., and Kingsford, C. (2017). Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods, 14(4), 417–419.
Perry, J. C., Harrison, P. W., and Mank, J. E. (2014). The ontogeny and evolution of sex-biased gene expression in Drosophila melanogaster. Molecular Biology and Evolution, 31(5), 1206–1219.
Pertea, M., Pertea, G. M., Antonescu, C. M., Chang, T. C., Mendell, J. T., and Salzberg, S. L. (2015). StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology, 33(3), 290–295.
Quinlan, A. R., and Hall, I. M. (2010). BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6), 841–842.
Ranz, M., Castillo-Davis, C. I., and Meiklejohn, C. D. (2003). Sex-Dependent Gene Expression and Evolution of the Drosophila Transcriptome. Science, 300(June), 1742–1745.
Reemann, P., Reimann, E., Ilmjärv, S., Porosaar, O., Silm, H., Jaks, V., Vasar, E., Kingo, K., and Kõks, S. (2014). Melanocytes in the skin - Comparative whole transcriptome analysis of main skin cell types. PLoS ONE, 9(12), 1–17.
Regitz-Zagrosek, V., Oertelt-Prigione, S., Prescott, E., Franconi, F., Gerdts, E., Foryst-Ludwig, A., Maas, A. H. E. M., Kautzky-Willer, A., Knappe-Wegner, D., Kintscher, U., Ladwig, K. H., Schenck-Gustafsson, K., and Stangl, V. (2016). Gender in cardiovascular diseases: Impact on clinical manifestations, management, and outcomes. European Heart Journal, 37(1), 24–34.
Reinius, B., Saetre, P., Leonard, J. A., Blekhman, R., Merino-Martinez, R., Gilad, Y., and Jazin, E. (2008). An evolutionarily conserved sexual signature in the primate brain. PLoS Genetics, 4(6), e1000100.
Ressler, K. J., Mercer, K. B., Bradley, B., Jovanovic, T., Mahan, A., Kerley, K., Norrholm, S. D., Kilaru, V., Smith, A. K., Myers, A. J., Ramirez, M., Engel, A., Hammack, S. E., Toufexis, D., Braas, K. M., Binder, E. B., and May, V. (2011). Post-traumatic stress disorder is associated with PACAP and the PAC1 receptor. Nature, 470(7335), 492–497.
Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., and Smyth, G. K. (2015). Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47.
Robinson, M. D., McCarthy, D. J., and Smyth, G. K. (2009). edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1), 139–140.
Ruigrok, A. N. V., Salimi-Khorshidi, G., Lai, M. C., Baron-Cohen, S., Lombardo, M. V., Tait, R. J., and Suckling, J. (2014). A meta-analysis of sex differences in human brain structure. Neuroscience and Biobehavioral Reviews, 39, 34–50.

Scotland, R., and Stables, M. (2011). Sex differences in resident immune cell phenotype underlie more efficient acute inflammatory responses in female mice. Blood, 118(22), 5918–5928.
Shioura, K. M., Geenen, D. L., and Goldspink, P. H. (2008). Sex-related changes in cardiac function following myocardial infarction in mice. American Journal of Physiology-Regulatory, Integrative and Comparative Physiology, 295(2), R528–R534.
Skaletsky, H., Kuroda-Kawaguchi, T., Minx, P. J., Cordum, H. S., Hillier, L., Brown, L. G., Repping, S., Pyntikova, T., Ali, J., Bieri, T., Chinwalla, A., Delehaunty, A., Delehaunty, K., Du, H., Fewell, G., Fulton, L., Fulton, R., … Page, D. C. (2003). The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature, 423, 825–838.
Small, K. S., Todorčević, M., Civelek, M., El-Sayed Moustafa, J. S., Wang, X., Simon, M. M., Fernandez-Tajes, J., Mahajan, A., Horikoshi, M., Hugill, A., Glastonbury, C. A., Quaye, L., Neville, M. J., Sethi, S., Yon, M., Pan, C., Che, N., … McCarthy, M. I. (2018). Regulatory variants at KLF14 influence type 2 diabetes risk via a female-specific effect on adipocyte size and body composition. Nature Genetics, 50(4), 572–580.
Soneson, C., Love, M. I., and Robinson, M. D. (2016). Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research, 4(2), 1521.
Stroth, N., Holighaus, Y., Ait-Ali, D., and Eiden, L. E. (2011). PACAP: A master regulator of neuroendocrine stress circuits and the cellular stress response. Annals of the New York Academy of Sciences, 1220(1), 49–59.
Tukiainen, T., Villani, A.-C., Yen, A., Rivas, M. A., Marshall, J. L., Satija, R., Aguirre, M., Gauthier, L., Fleharty, M., Kirby, A., Cummings, B. B., Castel, S. E., Karczewski, K. J., Aguet, F., Byrnes, A., Aguet, F., Ardlie, K. G., … MacArthur, D. G. (2017). Landscape of X chromosome inactivation across human tissues. Nature, 550(7675), 244–248.
Urbut, S. M., Wang, G., Carbonetto, P., and Stephens, M. (2019). Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nature Genetics, 51(1), 187–195.
Viguerie, N., Montastier, E., Maoret, J. J., Roussel, B., Combes, M., Valle, C., Villa-Vialaneix, N., Iacovoni, J. S., Martinez, J. A., Holst, C., Astrup, A., Vidal, H., Clément, K., Hager, J., Saris, W. H. M., and Langin, D. (2012). Determinants of Human Adipose Tissue Gene Expression: Impact of Diet, Sex, Metabolic Status, and Cis Genetic Regulation. PLoS Genetics, 8(9).
Wells, J. C. K. (2007). Sexual dimorphism of body composition. Best Practice & Research Clinical Endocrinology & Metabolism, 21(3), 415–430.
Werling, D. M., and Geschwind, D. H. (2013). Sex differences in autism spectrum disorders. Current Opinion in Neurology, 26(2), 146–153.
Werling, D. M., Parikshak, N. N., and Geschwind, D. H. (2016). Gene expression in human brain implicates sexually dimorphic pathways in autism spectrum disorders. Nature Communications, 7, 1–11.
Williams, T. M., Selegue, J. E., Werner, T., Gompel, N., Kopp, A., and Carroll, S. B. (2008). The Regulation and Evolution of a Genetic Switch Controlling Sexually Dimorphic Traits in Drosophila. Cell, 134(4), 610–623.
Yang, X., Schadt, E. E., Wang, S., Wang, H., Arnold, A. P., Ingram-Drake, L., Drake, T. A., and Lusis, A. J. (2006). Tissue-specific expression and regulation of sexually dimorphic genes in mice. Genome Research, 16(8), 995–1004.
Zhang, Y., Klein, K., Sugathan, A., Nassery, N., Dombkowski, A., Zanger, U. M., and Waxman, D. J. (2011). Transcriptional profiling of human liver identifies sex-biased genes associated with polygenic dyslipidemia and coronary artery disease. PLoS ONE, 6(8), e23506.
Zhang, Y., Sloan, S. A., Clarke, L. E., Caneda, C., Plaza, C. A., Blumenthal, P. D., Vogel, H., Steinberg, G. K., Edwards, M. S. B., Li, G., Duncan, J. A., Cheshier, S. H., Shuer, L. M., Chang, E. F., Grant, G. A., Gephart, M. G. H., and Barres, B. A. (2016). Purification and Characterization of Progenitor and Mature Human Astrocytes Reveals Transcriptional and Functional Differences with Mouse. Neuron, 89(1), 37–53.
Zhang, Y., Sturgill, D., Parisi, M., Kumar, S., and Oliver, B. (2007). Constraint and turnover in sex-biased gene expression in the genus Drosophila. Nature, 450(7167), 233–237.

Chapter 4. Conserved sex bias in autosomal gene expression contributes to sex differences in mammalian height and body size

Sahin Naqvi & David C. Page

Author contributions S.N. performed all analyses. S.N. and D.C.P. designed the study and wrote the paper. All data is either publicly available, where indicated, or available in the main text or the supplementary materials.

Acknowledgements We thank Sasha Gusev for assistance with TWAS methodology and data, and Alex K. Godfrey for helpful discussions.

Adapted from an article currently in review for publication.

Summary

In most mammalian species, the distribution of height or body size is shifted upwards in males relative to females, a difference often explained by opposing selective pressures in males and females. Here, by jointly analyzing sex-biased gene expression from five mammalian species and the effects of human genetic variation on both expression and height, we show that conserved sex bias in autosomal gene expression contributes to sex differences in height. Genes with conserved male- or female-biased expression show opposing effects on height, a difference that explains ~1.6 cm (12%) of the sex difference in human height. Further analyses highlight the female-biased transcription factor LCORL, previously associated with height and body size in multiple mammalian species. Thus, sex-biased gene expression, when layered upon genetic pathways acting identically in males and females, can lead to trait distributions that are shifted between the sexes. Taken together with previous studies, our findings suggest that opposing selective pressures on male and female height have resulted in sex-biased expression with important phenotypic consequences.

Introduction

Sex in mammals is genetically determined. Females are XX and males are XY, which leads to sexual differentiation of the gonads into either ovaries or testes. Despite this initial anatomical binary, male and female distributions for many traits overlap substantially. One example is height. In human populations, males are, on average, 10-15 cm (7-13%) taller than females, but the male and female height distributions overlap considerably (Figure 4.1). This sex difference in body size is present in many additional mammalian species, with males more often being the larger sex (Lindenfors et al., 2007). Human height is highly polygenic; a recent meta-analysis reported 712 genome-wide significant loci (Yengo et al., 2018), and one recent study suggested that 100,000 single nucleotide polymorphisms (SNPs), spanning most of the genome, exert independent, causal effects on height (Boyle et al., 2017). However, the genetic architecture of height is very similar between the sexes, with recent human studies reporting a between-sex genetic correlation of 0.96 (Rawlik et al., 2016; Sidorenko et al., 2018). Only a handful of loci with sex-specific associations with height have been discovered, despite the use of increasingly large sample sizes (Sidorenko et al., 2018; Tukiainen et al., 2014). Sexual dimorphism in the growth hormone signaling axis likely plays an important role in the sex difference in mean body size; in both humans and rodents, males and females show distinct patterns of growth hormone secretion from the pituitary gland (Jaffe et al., 1998; Pincus et al., 1996; Robinson & Hindmarsh, 2011). How these and other mechanisms contribute to the conserved sex difference in mammalian height and body size remains underexplored. Despite a largely sex-shared genetic architecture, height is likely subject to opposing selective pressures between the sexes. Studies in humans and other mammals have concluded that increased height favors reproductive success in males, while decreased height favors reproductive success in females (Sanjak et al., 2017; Stulp & Barrett, 2016).

[Figure 4.1: density (y-axis) of height in cm (x-axis) for females and males; annotated difference of 13 cm (8% increase)]

Figure 4.1. Overlapping but shifted distributions of male and female heights. Theoretical normal distributions using published means and standard deviations of male and female heights in individuals of European ancestry from the United Kingdom (Sanjak et al., 2017).

These three key features of height as a trait (extreme polygenicity, sex-shared genetic architecture, and opposing selective pressures in males and females) led us to hypothesize that sex differences in gene expression could contribute to the sex difference in height and body size observed in humans and other mammals. Testing this hypothesis requires linking variation in gene expression levels to variation in height. To do so, we turned to transcriptome-wide association studies (TWAS) (Gusev et al., 2016), which integrate two types of data: an expression quantitative trait locus (eQTL) study from a given tissue or cell type, and a genome-wide association study (GWAS) of a given trait. Briefly, TWAS uses an eQTL study as a reference panel to impute the relative expression of a given gene from multiple SNPs. This imputation is performed for all individuals assessed in the GWAS, allowing imputed gene expression levels to be associated with the trait of interest while leveraging the much larger GWAS sample size.

By integrating the effects of multiple SNPs on both gene expression and the trait of interest, TWAS provides directional expression-trait associations: genes with trait-increasing effects have a positive association between imputed expression and the trait, while genes with trait-decreasing effects have a negative association.
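The summary-statistic form of this test, as introduced by Gusev et al. (2016), can be written compactly. With expression weights w for a gene's cis SNPs (learned from the eQTL reference panel), GWAS Z-scores z for the same SNPs, and their LD correlation matrix Σ, the TWAS statistic is Z_TWAS = wᵀz / √(wᵀΣw). The sketch below computes this quantity in plain Python; the toy weights and LD values are invented, and the analyses in this chapter used precomputed TWAS results rather than this code.

```python
import math


def twas_z(weights: list[float], gwas_z: list[float], ld: list[list[float]]) -> float:
    """Summary-statistic TWAS Z-score: w'z / sqrt(w' Sigma w).

    weights: per-SNP effects on the gene's expression (from the eQTL model)
    gwas_z:  per-SNP GWAS Z-scores for the trait, aligned to the same alleles
    ld:      SNP-by-SNP LD (correlation) matrix
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n:
        raise ValueError("weights, gwas_z and ld must describe the same SNPs")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n))
    return numerator / math.sqrt(variance)


if __name__ == "__main__":
    # Two SNPs in moderate LD whose expression-increasing alleles also raise the trait,
    # so the gene receives a positive (trait-increasing) TWAS Z-score.
    print(twas_z([0.5, 0.3], [4.0, 3.0], [[1.0, 0.4], [0.4, 1.0]]))
```

The sign carries the direction used throughout this chapter: a negative Z-score means that alleles raising predicted expression tend to lower the trait.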

We reasoned that if sex-biased gene expression contributes to sex differences in height, genes with male-biased expression levels should show largely height-increasing effects, as measured by TWAS, while female-biased genes should show largely height-decreasing effects.

We further reasoned that sex-biased expression in a broad range of cell types could contribute to the sex difference in height; consistent with this, a recent study found that eQTLs in a wide range of adult tissues were enriched for height GWAS associations (Gamazon et al., 2018). In Chapter 3, we described sex differences in gene expression across five species (human, cynomolgus macaque, mouse, rat, and dog) and 12 tissues, which allowed us to stratify sex differences across the body by their evolutionary conservation when testing our hypothesis. We focused on three classes of sex bias: conserved (defined as showing a consistent direction of bias in at least four of the five species), primate-specific, and human-specific.
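As a concrete reading of the "conserved" definition above, the sketch below labels a gene-tissue pair as conserved when at least four of the five species show sex bias in the same direction. The input format (a mapping from species name to "male", "female", or None for no significant bias) and the species keys are hypothetical simplifications; the human-specific and primate-specific classes, which depend on the lineage analysis in Chapter 3, are not reproduced here.

```python
from typing import Optional

# Hypothetical species keys for the five species surveyed in Chapter 3
SPECIES = ("human", "cynomolgus", "mouse", "rat", "dog")


def conserved_direction(biases: dict[str, Optional[str]], min_species: int = 4) -> Optional[str]:
    """Return 'male' or 'female' if at least min_species species share that bias direction.

    biases maps each species to 'male', 'female', or None (not sex-biased);
    species missing from the mapping are treated as not sex-biased.
    """
    for direction in ("male", "female"):
        if sum(biases.get(sp) == direction for sp in SPECIES) >= min_species:
            return direction
    return None
```

With five species and a threshold of four, at most one direction can qualify, so the order in which directions are checked does not matter.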

Results

We first considered genes with genome-wide significant associations for height, as annotated in the NHGRI-EBI GWAS Catalog (MacArthur et al., 2017). To measure associations between expression of these genes and height, we used publicly available TWAS that integrated height GWAS summary statistics, from the UK Biobank (Loh et al., 2018) and the GIANT consortium (Wood et al., 2014), with reference eQTL panels from 43 different tissues generated by the Genotype-Tissue Expression Project (GTEx; The GTEx Consortium, 2017). For each reference eQTL panel-GWAS combination, TWAS yields expression-trait associations in the form of a signed Z-score for genes whose expression can be reasonably predicted from SNPs. TWAS Z-scores using either UK Biobank or GIANT height summary statistics were highly correlated (Figure 4.2), so we used summary statistics from a meta-analysis of the two studies (Yengo et al., 2018). TWAS Z-scores largely agree in sign across the 43 different tissues (Figure 4.3), to a greater extent than expected by chance (Figure 4.4); we therefore combined Z-scores for each gene across tissues by meta-analysis. Sixty-two genome-wide significant height genes have both computed TWAS Z-scores and conserved sex bias in at least one of 12 tissues.
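The cross-tissue steps in this paragraph can be sketched as below. Two choices are assumptions rather than details stated in the text: per-gene sign agreement is taken as the fraction of tissues sharing the gene's majority sign, and Z-scores are combined across tissues with unweighted Stouffer's method (the sum of Z-scores divided by the square root of the number of tissues), which treats tissues as independent even though eQTL effects are correlated across tissues. The permutation null shuffles Z-scores among gene-tissue pairs, as described for Figure 4.4.

```python
import math
import random


def sign_agreement(zs: list[float]) -> float:
    """Fraction of tissues whose Z-score shares the gene's majority sign (assumed definition)."""
    pos = sum(z > 0 for z in zs)
    neg = sum(z < 0 for z in zs)
    return max(pos, neg) / len(zs)


def mean_agreement(gene_to_z: dict[str, list[float]]) -> float:
    """Sign agreement averaged across genes."""
    return sum(sign_agreement(zs) for zs in gene_to_z.values()) / len(gene_to_z)


def permutation_p(gene_to_z: dict[str, list[float]], n_perm: int = 1000, seed: int = 0):
    """Observed mean agreement and a one-sided permutation p-value.

    The null shuffles Z-scores among all gene-tissue pairs while keeping
    each gene's number of tissues fixed.
    """
    rng = random.Random(seed)
    genes = list(gene_to_z)
    sizes = [len(gene_to_z[g]) for g in genes]
    pooled = [z for g in genes for z in gene_to_z[g]]
    observed = mean_agreement(gene_to_z)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = {}, 0
        for gene, k in zip(genes, sizes):
            shuffled[gene] = pooled[start:start + k]
            start += k
        if mean_agreement(shuffled) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)


def stouffer(zs: list[float]) -> float:
    """Unweighted Stouffer combination of per-tissue Z-scores (assumes independence)."""
    return sum(zs) / math.sqrt(len(zs))
```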

Genes with conserved male-biased expression have more positive TWAS Z-scores than genes with conserved female-biased expression (Figure 4.5A), but this difference is not observed when repeating the analysis for genes with human-specific (Figure 4.5B) or primate-specific (Figure 4.5C) sex bias. These results indicate that, among genes with genome-wide significant associations for height, those with conserved male bias show broadly height-increasing effects across tissues, while those with conserved female bias show mostly height-decreasing effects.
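The figure legends describe P-values from a permutation test of the difference in mean TWAS Z-score between male- and female-biased genes. A minimal two-sided version is sketched below; the number of permutations, the two-sided alternative, and the add-one correction are assumptions, and the 95% confidence intervals shown in the figures are not reproduced.

```python
import random
from statistics import fmean


def perm_test_mean_diff(male: list[float], female: list[float],
                        n_perm: int = 10000, seed: int = 0) -> tuple[float, float]:
    """Two-sided permutation test of mean(male) - mean(female).

    Group labels are shuffled across the pooled Z-scores; the p-value is the
    fraction of permutations with an absolute difference at least as large
    as the observed one (with an add-one correction).
    """
    rng = random.Random(seed)
    observed = fmean(male) - fmean(female)
    pooled = list(male) + list(female)
    n_male = len(male)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_male]) - fmean(pooled[n_male:])
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Each input list would hold one TWAS Z-score per gene (cross-tissue analysis) or per gene-tissue pair (same-tissue analysis).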

[Figure 4.2: scatter plot of TWAS Z-score (UKBB, x-axis) versus TWAS Z-score (GIANT, y-axis), shaded by number of gene-tissue pairs (log10); Pearson r = 0.83, p < 2.2e-16]

Figure 4.2. Correlation of TWAS Z-scores using height GWAS from either UK Biobank (x-axis) or GIANT (y-axis). Each point represents the TWAS Z-score for a gene in a single eQTL reference panel (one of 48 GTEx tissues).

[Figure 4.3: two panels, "All genes" and "Genes in GWS height loci"; x-axis, # tissues with positive TWAS Z-score (UKBB); y-axis, # tissues with negative TWAS Z-score (UKBB); shading, # genes (log10)]

Figure 4.3. Cross-tissue agreement of genic TWAS Z-scores. For each gene (left, all genes; right, all genome-wide significant height genes from the GWAS catalog), the number of tissues with positive (x-axis) or negative (y-axis) Z-scores is plotted.

[Figure 4.4: histogram of the fraction sign agreement of TWAS Z-scores across tissues (x-axis) versus frequency (y-axis) for permuted gene-tissue TWAS Z-scores, with the true value marked; p < 0.001]

Figure 4.4. Greater agreement of TWAS Z-scores across tissues than expected by chance. Fraction of tissues that agree in sign with respect to TWAS Z-scores, averaged across all genes (Figure 4.3). The average fraction for true TWAS Z-scores is shown in black, and the fractions obtained when permuting TWAS Z-scores among gene-tissue pairs are shown in grey.

[Figure 4.5: three panels, (A) Conserved sex bias, n = 34 female and 31 male; (B) Human gain sex bias, n = 49 female and 60 male; (C) Primate gain sex bias, n = 6 female and 5 male; y-axis, cross-tissue TWAS Z-score (UKBB)]

Figure 4.5. Cross-tissue expression-height associations of genome-wide significant height genes showing sex-biased expression. TWAS Z-scores for genome-wide significant height genes with either female (orange) or male (blue) bias in one of 12 tissues, either (A) conserved across mammals, (B) specific to humans, or (C) specific to primates. For each gene, TWAS Z-scores were meta-analyzed across 48 GTEx tissues. Points represent group means; whiskers represent 95% confidence intervals. P-values calculated by permutation test of mean difference.

We next expanded our analyses to include TWAS results for all genes, not just those in genome-wide significant loci. The larger gene set allowed a more stringent analysis: we considered only TWAS Z-scores calculated for the same tissue in which sex-biased expression was observed, making the unit of analysis a gene-tissue pair. Five hundred sixty gene-tissue pairs have both computed TWAS Z-scores and conserved sex bias; these are distributed across all 12 tissues, with the largest numbers in muscle, adipose, and pituitary (Figure 4.6). Gene-tissue pairs with conserved male bias have more positive TWAS Z-scores than those with conserved female bias (Figure 4.7A). No significant difference was observed for gene-tissue pairs with human-specific (Figure 4.7B) or primate-specific (Figure 4.7C) sex bias. These results indicate that, when sex bias and expression-height associations are considered in the same tissue, genes with conserved male-biased expression show height-increasing effects, while genes with conserved female-biased expression show height-decreasing effects.

[Figure 4.6: bar chart of # genes with conserved sex bias and TWAS Z-scores (y-axis) for each of 12 tissues: skin, liver, lung, brain, heart, colon, spleen, muscle, thyroid, adrenal, adipose, pituitary]

Figure 4.6. Representation of conserved sex bias and TWAS Z-scores among 12 tissues. For each tissue, the number of genes with both conserved sex bias (blue, male bias; orange, female bias) and significantly predictive TWAS Z-scores is shown.

[Figure 4.7: three panels, (A) Conserved sex bias, n = 200 female and 248 male; (B) Human gain sex bias, n = 848 female and 834 male; (C) Primate gain sex bias, n = 30 female and 35 male; y-axis, same-tissue TWAS Z-score]

Figure 4.7. Same-tissue expression-height associations of gene-tissue pairs showing sex-biased expression. TWAS Z-scores for gene-tissue pairs with either female (orange) or male (blue) bias (A) conserved across mammals, (B) specific to humans, or (C) specific to primates, in all cases in the same tissue as the computed TWAS Z-score. Points represent group means; whiskers represent 95% confidence intervals. P-values calculated by permutation test of mean difference.

[Figure 4.8: absolute effect size (y-axis) for conserved sex bias and for the best eQTL (x-axis); Wilcoxon p = 1.6e-15]

Figure 4.8. Comparison of effect sizes for conserved sex bias and eQTLs. For sex bias, the effect size (y-axis) is the log2(M/F) expression ratio of the gene in the tissue in which it is sex-biased. For the best eQTL, we used the log2 allelic fold-change (aFC) in the same tissue as sex bias, when it was available, scaled to a per-allele estimate (see Materials and Methods).

We next sought to quantify the fraction of the sex difference in height that could be explained by conserved sex bias in gene expression. Across human populations, the sex difference in mean heights ranges from approximately 10 to 15 cm, representing an 8-15% greater mean height in males. We focused on gene-tissue pairs where the conserved sex bias was in the same tissue as the TWAS Z-score (i.e., Figure 4.7A), as this represents the more stringent analysis. We estimated the contribution of conserved sex bias to the height sex difference with two approaches (Materials and Methods). In the first approach, we performed calculations on a physical scale, using the height effect size of the SNP that was the best cis-eQTL for each gene-tissue pair; this is likely a conservative approximation, given the relative magnitudes of sex bias and cis-eQTL effect sizes (Figure 4.8). In the second approach, we used the TWAS Z-score to estimate the contribution of each gene-tissue pair as a relative fold-change in height between the sexes, incorporating both the magnitude of sex bias and the cis-heritability of expression. In both cases, we combined the additive effects of each gene-tissue pair based on the agreement between the direction of conserved sex bias and the sign of the TWAS Z-score. The first approach estimated that conserved sex bias contributes a 1.6 cm sex difference in height, which is 12% of the ~13 cm sex difference in mean heights observed in Europeans. The second approach estimated that conserved sex bias contributes a 0.98% difference between males and females, or 12% of the observed 8% increase in the mean height of males. Thus, two different approaches yielded similar estimates of the contribution of conserved sex-biased gene expression to the sex difference in human height: approximately 12% of the observed phenotypic sex difference. As a comparison, all common SNPs explain approximately 60% of the phenotypic variance in total human height (Yang et al., 2010).

Our results thus far have focused on expression-height associations in humans, where large GWAS and eQTL studies provide the richest resources. However, the observation that TWAS Z-scores differ significantly between male- and female-biased genes only when considering conserved sex bias suggests that sex-biased expression may also contribute to sex differences in size in other mammals. Indeed, all five species assessed for conserved sex-biased gene expression show sex differences in size (Figure 4.9). A striking example is the transcription factor LCORL, which shows conserved female bias in the pituitary gland (Figure 4.10A) and is height-decreasing in humans (cross-tissue TWAS Z-score = -28.7). Although LCORL does not have a computed TWAS Z-score for the pituitary gland, because multi-SNP models did not significantly predict its expression in that tissue, a single allele at the LCORL locus that is associated with increased expression in the pituitary gland is also associated with decreased height (Figure 4.10B). Notably, genetic variation at the LCORL locus has also been associated with height or body size in dogs (Hayward et al., 2016), cattle (Bouwman et al., 2018), and horses (Signer-Hasler et al., 2012).

[Figure 4.9: male/female height or body size ratio (y-axis, 1.00 to 1.20) for rat, dog, cyno, human, and mouse.]

Figure 4.9. Published estimates of sexual size dimorphism in the five species. Estimates from human (Ruff, 2002), cyno (Dittus, 2004), and dog (Frynta et al., 2012) represent male/female standing height ratios, while estimates from mouse (Gregorová et al., 2008) and rat (Porter et al., 2015) represent male/female body length ratios (nose to anus or base of tail).

[Figure 4.10: (A) LCORL expression in the pituitary, log2 (CPM), in females and males of human, cyno, mouse, rat, and dog. (B) rs7669359 (C allele): GTEx pituitary eQTL slope (top) and height effect size in s.d. units in GIANT and UK Biobank (bottom). (C) Boroeutherian pituitary log2(M/F), Naqvi 2019 (x-axis), vs. cattle pituitary log2(M/F), Seo 2016 (y-axis); r = 0.73, p < 2.2e−16; LCORL labeled.]

Figure 4.10. Conserved sex bias and expression-height associations for LCORL. (A) Conserved female bias of LCORL, a transcription factor genetically associated with height, in the pituitary gland. (B) For a SNP at the LCORL locus, the effects on LCORL expression in the pituitary gland (top) and on height (bottom) are plotted as mean ± standard error. (C) For every gene with conserved sex bias in the pituitary, the average sex bias across five species (human, cyno, mouse, rat, and dog; x-axis) is compared to sex bias in the cattle pituitary (Seo et al., 2016) (y-axis).

Re-analysis of publicly available RNA-seq data (Seo et al., 2016) shows that LCORL is one of the most strongly female-biased autosomal genes in the cattle pituitary gland (Figure 4.10C). Remarkably, an allele associated with increased body size in horses is associated with decreased LCORL expression in hair root (Metzger et al., 2013), indicating that the negative association between LCORL expression and height extends beyond humans. Together, these observations suggest that female-biased expression of LCORL contributes to sex differences in size in multiple mammalian species. Beyond LCORL, studies have observed significant overlap in genome-wide significant height loci among humans (Wood et al., 2014), dogs (Hayward et al., 2016), and cattle (Bouwman et al., 2018), suggesting that conserved sex-biased gene expression may contribute more broadly to sex differences in body size in a range of non-human mammalian species.

Discussion

By integrating information on sex-biased gene expression with transcriptome-wide association studies linking changes in gene expression to variation in height, we have shown that conserved sex bias in gene expression explains an appreciable fraction of the sex difference in human height. Further analysis of LCORL, a transcription factor with demonstrated roles in body size in multiple mammalian species, suggests that conserved sex bias in gene expression also contributes to sex differences in body size in a range of non-human mammalian species.

Height is likely subject not only to sexually antagonistic selection, in which increased height favors reproductive success in males while decreased height favors reproductive success in females, but also to stabilizing selection, in which extreme values in either direction reduce reproductive fitness in both sexes (Sanjak et al., 2017). The combination of sexually antagonistic and stabilizing selection is expected to maintain phenotypic sex differences over long evolutionary timescales, and is consistent with our finding that conserved (rather than species- or lineage-specific) sex bias in gene expression contributes to sex differences in height.

Studies of selection on height have illustrated how males and females can have different optimal values for a quantitative trait. Our results provide a concrete example of how such optimal values can be reached: through the acquisition and maintenance of sex-biased gene expression (Parsch & Ellegren, 2013).

Beyond evolutionary considerations, our results illustrate one way in which sex-biased gene expression can lead to phenotypic sex differences: genes that act identically in males and females to influence a trait can be expressed more abundantly in one sex. While studies suggest that, for height, the vast majority of genetic variation acts identically in the two sexes, pronounced sex-specific genetic effects have been demonstrated for waist-to-hip ratio, body mass index, and adjusted combinations of the two (Randall et al., 2013; Shungin et al., 2015; Winkler et al., 2015), as well as for thyroid hormone levels (Porcu et al., 2013) and obsessive-compulsive disorder (Khramtsova et al., 2018). Fully accounting for such sex differences in genetic architecture in association mapping, and integrating this information with either sex-specific effects on gene expression or overall sex biases in gene expression, will require the development of new analytical approaches (Kang et al., 2018). Nevertheless, our study provides a proof of concept for understanding how sex bias in gene expression contributes to phenotypic sex differences.

Methods

Transcriptome-wide association study (TWAS) data

We generated TWAS Z-scores using FUSION (Gusev et al., 2016) with default parameters. We used three sources of height GWAS summary statistics: 1) the UK Biobank (Loh et al., 2018) (https://data.broadinstitute.org/alkesgroup/UKBB/body_HEIGHTz.sumstats.gz), 2) the GIANT consortium (Wood et al., 2014) (https://portals.broadinstitute.org/collaboration/giant/images/0/01/GIANT_HEIGHT_Wood_et_al_2014_publicrelease_HapMapCeuFreq.txt.gz), and 3) a meta-analysis of the two studies (Yengo et al., 2018) (https://portals.broadinstitute.org/collaboration/giant/images/6/63/Meta-analysis_Wood_et_al%2BUKBiobank_2018.txt.gz).

For eQTL data, we used preprocessed reference panels for 48 tissues from the GTEx Consortium (v7) (The GTEx Consortium, 2017), downloaded from http://gusevlab.org/projects/fusion/. For all downstream analyses, we used only Z-scores from TWAS models that were significantly predictive of expression levels, as determined by model cross-validation (Benjamini-Hochberg FDR 5%). We combined TWAS Z-scores for the same gene across tissue reference panels under a fixed-effects model, summing all Z-scores for a gene and dividing by the square root of the number of Z-scores for that gene.
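The cross-tissue combination described above (sum of Z-scores divided by the square root of their count, i.e. an unweighted Stouffer-style combination) can be sketched as follows. This is a minimal illustration, not the FUSION code: the record layout, the hypothetical `model_is_significant` flag standing in for the cross-validation filter, and the gene names are all assumptions.

```python
import math
from collections import defaultdict

def combine_twas_z(records):
    """Fixed-effects (Stouffer-style) combination of per-tissue TWAS Z-scores.

    records: iterable of (gene, tissue, z, model_is_significant) tuples, where
    model_is_significant is a hypothetical flag for passing the
    cross-validation filter (Benjamini-Hochberg FDR 5%).
    Returns a dict mapping gene -> combined Z = sum(Z_i) / sqrt(k).
    """
    z_by_gene = defaultdict(list)
    for gene, _tissue, z, model_is_significant in records:
        if model_is_significant:          # keep only predictive models
            z_by_gene[gene].append(z)
    return {gene: sum(zs) / math.sqrt(len(zs)) for gene, zs in z_by_gene.items()}


# Hypothetical example input
records = [
    ("GENE_A", "Liver", 3.0, True),
    ("GENE_A", "Lung", 1.0, True),
    ("GENE_A", "Skin", 5.0, False),       # dropped: model not predictive
    ("GENE_B", "Pituitary", -2.0, True),
]
combined = combine_twas_z(records)
# combined["GENE_A"] == 4 / sqrt(2) ≈ 2.83; combined["GENE_B"] == -2.0
```

Dividing by the square root of the count keeps the combined score on the standard-normal scale when the per-tissue Z-scores are treated as independent.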

Estimating the contribution of conserved sex bias to sex differences in mean height

For each of 560 gene-tissue pairs, we have a direction of conserved sex bias (male- or female-biased) as well as a TWAS Z-score (positive means that increased expression is height-increasing; negative means that increased expression is height-decreasing). In the first approach, for each gene, we used the SNP that is the best eQTL (in the same tissue as sex bias) as a proxy. We assume that all of this SNP's per-allele effect on height is mediated through cis effects on gene expression. While this assumption likely does not hold for all SNPs, we found the magnitude of a gene's sex bias to be approximately two-fold greater than the per-allele effect size of the best eQTL (Figure 4.8), meaning that even if only ~50% of a SNP's per-allele effect on height were mediated through cis effects on gene expression, using the SNP as a proxy for the sex bias of a gene would still be reasonable. Next, we signed the height effect of the SNP based on the agreement between sex bias and the TWAS Z-score. For example, if a gene shows conserved male (female) bias and the TWAS Z-score is positive (negative), the sex bias of this gene is expected to contribute to the sex difference in height, and the associated SNP effect size for this analysis is therefore positive. In contrast, if a gene shows conserved male (female) bias and the TWAS Z-score is negative (positive), the sex bias of this gene should reduce the sex difference in height, and the associated SNP effect size is therefore negative. Making the common assumption that each SNP contributes additively to height, we summed all such signed effect sizes to obtain an estimate of the contribution of conserved sex bias in gene expression to sex differences in height. Finally, since height effect sizes are reported in units of sex-specific standard deviations (6.2 cm in UK Biobank females, 6.6 cm in UK Biobank males), we multiplied the summed effect sizes by 6.4 cm, the mean of the male and female standard deviations.
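The sign rule and conversion to centimeters in the first approach can be sketched as below. This is a minimal illustration under the stated assumptions (additivity, full mediation through cis effects); the `GeneTissuePair` record, its field names, and the example values are hypothetical, not taken from the thesis data.

```python
from dataclasses import dataclass

# Mean of UK Biobank female (6.2 cm) and male (6.6 cm) height standard deviations.
SD_CM = (6.2 + 6.6) / 2

@dataclass
class GeneTissuePair:          # hypothetical record layout
    gene: str
    tissue: str
    sex_bias: str              # conserved direction: "male" or "female"
    twas_z: float              # TWAS Z-score; sign = direction of expression-height association
    eqtl_height_beta: float    # per-allele height effect of the best eQTL (s.d. units)

def signed_effect(pair: GeneTissuePair) -> float:
    """Positive if the sex bias should widen the male-female height difference."""
    bias_sign = 1 if pair.sex_bias == "male" else -1
    twas_sign = 1 if pair.twas_z > 0 else -1   # significant Z-scores are nonzero
    return bias_sign * twas_sign * abs(pair.eqtl_height_beta)

def height_difference_cm(pairs) -> float:
    """Sum signed per-allele effects (additive model) and convert s.d. units to cm."""
    return SD_CM * sum(signed_effect(p) for p in pairs)

pairs = [
    GeneTissuePair("GENE_A", "Pituitary", "male", 5.0, 0.010),   # agrees: +0.010
    GeneTissuePair("GENE_B", "Liver", "female", -3.0, -0.020),   # agrees: +0.020
    GeneTissuePair("GENE_C", "Muscle", "male", -4.0, 0.005),     # opposes: -0.005
]
print(round(height_difference_cm(pairs), 3))  # 0.16
```

Only the magnitude of each SNP effect is used; its sign comes entirely from whether the direction of sex bias agrees with the sign of the TWAS Z-score.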

In the second approach, we computed height fold-changes directly from TWAS Z-scores. Under the assumption that the standard errors of TWAS effect sizes scale as $1/\sqrt{N}$, where $N$ is the GWAS sample size,

$$r = \frac{Z_{\mathrm{TWAS}}}{\sqrt{N}}.$$

Here, $r$ represents the correlation between imputed expression and height. However, the imputation of expression can only be as good as the overall cis-heritability of a gene's expression levels, $h^2_{\mathrm{cis}}$, which is measured in the reference eQTL panel. Assuming that inaccuracies in imputation are randomly distributed, lower heritability (closer to 0) would mean a lower true correlation between expression and height; the correlation between true expression and height can then be defined as

$$r_{\mathrm{true}} = r \cdot h^2_{\mathrm{cis}}.$$

This number represents the percentage change in height resulting from a 100% increase (i.e. doubling) of expression. For all of the gene-tissue pairs under consideration, the magnitude of sex bias is smaller than a 100% change; we therefore defined

$$r_{\mathrm{sex,true}} = r_{\mathrm{true}} \cdot p_{\mathrm{sex}},$$

where $p_{\mathrm{sex}}$ is the percentage increase in expression in the sex with higher expression. $r_{\mathrm{sex,true}}$ represents the percentage change in height resulting from the sex bias of a given gene-tissue pair. As in the first approach, we signed each $r_{\mathrm{sex,true}}$ based on the agreement between the direction of sex bias and the direction of the TWAS Z-score for each gene-tissue pair, and summed all $r_{\mathrm{sex,true}}$ to obtain an estimate of the total percentage change in height explained by conserved sex bias.
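The second approach chains three scalings: the TWAS Z-score divided by the square root of the GWAS sample size, multiplied by the cis-heritability of expression, then multiplied by the fractional sex bias, followed by the same sign rule as the first approach. The sketch below follows the text in treating the resulting correlation as a proportional change in height per doubling of expression; the function name, argument names, and example inputs are hypothetical.

```python
import math

def pair_contribution(twas_z, n_gwas, h2_cis, expr_increase, sex_bias):
    """Signed fractional change in height attributed to one gene-tissue pair.

    expr_increase: fractional increase in expression in the higher-expressing
    sex (e.g. 0.25 for a 25% increase); sex_bias: "male" or "female".
    """
    r = twas_z / math.sqrt(n_gwas)      # correlation of imputed expression with height
    r_true = abs(r) * h2_cis            # scaled by cis-heritability, as in the text
    r_sex = r_true * expr_increase      # scaled from a doubling to the observed sex bias
    bias_sign = 1 if sex_bias == "male" else -1
    twas_sign = 1 if twas_z > 0 else -1
    return bias_sign * twas_sign * r_sex

# Hypothetical inputs: (twas_z, n_gwas, h2_cis, expr_increase, sex_bias)
pairs = [
    (20.0, 10_000, 0.5, 0.2, "male"),     # +0.02
    (-10.0, 10_000, 0.4, 0.5, "female"),  # +0.02
]
total = sum(pair_contribution(*p) for p in pairs)
print(f"{100 * total:.1f}% of mean height")  # 4.0% of mean height
```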

cis-eQTL effect sizes

To quantify cis-eQTL effect sizes, we relied on log2 allelic fold change (aFC) estimates, which are provided for significant eGene associations (https://storage.googleapis.com/gtex_analysis_v7/single_tissue_eqtl_data/GTEx_Analysis_v7_eQTL.tar.gz). aFC represents the expected log2 fold-change in expression between individuals homozygous for either the reference or alternative allele (i.e. 0 vs. 2 copies of the alternative allele) (Mohammadi et al., 2017); however, in this study we focus on the per-allele effects of the best eQTL on height as reported in GWAS. Therefore, we converted aFC estimates to a per-allele estimate using the formula

$$\mathrm{aFC}_{\mathrm{het}} = \log_2\left(\frac{1 + 2^{\mathrm{aFC}}}{2}\right),$$

where $\mathrm{aFC}$ is the original estimate and $\mathrm{aFC}_{\mathrm{het}}$ is the new estimate, which represents the expected log2 fold-change in expression between individuals homozygous for the reference allele and individuals heterozygous for the reference and alternative alleles. This follows because a reference homozygote carries two reference haplotypes, whereas a heterozygote carries one reference and one alternative haplotype, whose expression ratio is $2^{\mathrm{aFC}}$.
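The per-allele conversion can be checked numerically with a short sketch, assuming (as in Mohammadi et al., 2017) that log2 aFC is the log ratio of alternative- to reference-haplotype expression. The function name is hypothetical.

```python
import math

def afc_per_allele(afc_log2: float) -> float:
    """Convert a GTEx log2 aFC to a per-allele log2 fold-change.

    log2 aFC is the log ratio of alternative- to reference-haplotype expression,
    so ref/ref expression ~ 2 * e_ref and ref/alt expression ~ e_ref * (1 + 2**aFC).
    """
    return math.log2((1 + 2 ** afc_log2) / 2)

for afc in (-1.0, 0.0, 1.0):
    print(afc, round(afc_per_allele(afc), 3))
# -1.0 -0.415
# 0.0 0.0
# 1.0 0.585
```

The conversion preserves the sign of the effect while always shrinking its magnitude, since a heterozygote carries only one copy of the alternative allele.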

Sex-biased expression

We used sex bias assignments from Chapter 3, where we determined three relevant evolutionary classes of sex bias in each of 12 tissues (adipose, adrenal, brain, colon, heart, liver, lung, muscle, pituitary, skin, spleen, and thyroid): conserved sex bias, defined as having a consistent direction of effect in four of the five species (human, cynomolgus macaque, mouse, rat, and dog); human-specific sex bias; and primate-specific sex bias.

To assess sex bias in the cattle pituitary gland, we used RNA-seq data from Seo et al. (2016) (GSE65125). Transcript-level transcripts per million (TPM) values and counts were quantified using salmon v0.9.1 (Patro et al., 2017) with the flags --seqBias, --gcBias, and --useVBOpt, and a salmon index built on cattle transcriptome annotations from Ensembl v91. The tximport R package (Soneson et al., 2016), with countsFromAbundance = “lengthScaledTPM”, was used to sum transcript-level counts and TPM values from salmon to the gene level. Linear modeling to determine sex-biased expression was performed using the limma R package (Ritchie et al., 2015), with the voom functionality (through the voomWithQualityWeights function; Liu et al., 2015) for analyzing RNA-seq read counts. We assessed sex bias for all genes with mean TPM > 1 in either males or females.
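The quantification and modeling steps above were run in R (salmon, tximport, limma/voom); the Python sketch below illustrates only the final expression filter (mean TPM > 1 in males or in females). The input layout (a dict of per-sample TPM values plus aligned sex labels), the function name, and the gene names are hypothetical.

```python
from statistics import mean

def expressed_genes(tpm, sexes, threshold=1.0):
    """Genes whose mean TPM exceeds `threshold` in males or in females.

    tpm: dict mapping gene -> list of per-sample TPM values
    sexes: list of "M"/"F" labels in the same sample order
    (assumes at least one sample of each sex)
    """
    keep = []
    for gene, values in tpm.items():
        male = [v for v, s in zip(values, sexes) if s == "M"]
        female = [v for v, s in zip(values, sexes) if s == "F"]
        if mean(male) > threshold or mean(female) > threshold:
            keep.append(gene)
    return keep

sexes = ["M", "M", "F", "F"]
tpm = {
    "GENE_A": [0.5, 0.5, 3.0, 2.0],  # female mean 2.5: kept
    "GENE_B": [0.2, 0.4, 0.1, 0.3],  # both means below 1: dropped
    "GENE_C": [2.0, 1.5, 0.0, 0.0],  # male mean 1.75: kept
}
print(expressed_genes(tpm, sexes))  # ['GENE_A', 'GENE_C']
```

Filtering on either sex, rather than on the pooled mean, retains genes that are expressed strongly in only one sex, which are exactly the candidates for strong sex bias.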

References

Bouwman, A. C., Daetwyler, H. D., Chamberlain, A. J., Ponce, C. H., Sargolzaei, M., Schenkel, F. S., Sahana, G., Govignon-Gion, A., Boitard, S., Dolezal, M., Pausch, H., Brøndum, R. F., Bowman, P. J., Thomsen, B., Guldbrandtsen, B., Lund, M. S., Servin, B., … Hayes, B. J. (2018). Meta-analysis of genome-wide association studies for cattle stature identifies common genes that regulate body size in mammals. Nature Genetics, 50(3), 362–367.

Boyle, E. A., Li, Y. I., and Pritchard, J. K. (2017). An expanded view of complex traits: from polygenic to omnigenic. Cell, 169(7), 1177–1186.

Dittus, W. (2004). Demography: a window to social evolution. In B. Thierry, M. Singh, and W. Kaumanns (Eds.), Macaque societies: a model for the study of social organization (pp. 87–112). Cambridge (UK): Cambridge University Press.

Frynta, D., Baudyšová, J., Hradcová, P., Faltusová, K., and Kratochvíl, L. (2012). Allometry of sexual size dimorphism in domestic dog. PLoS ONE, 7(9), 5–10.

Gamazon, E. R., Segrè, A. V., Van De Bunt, M., Wen, X., Xi, H. S., Hormozdiari, F., Ongen, H., Konkashbaev, A., Derks, E. M., Aguet, F., Quan, J., Nicolae, D. L., Eskin, E., Kellis, M., Getz, G., McCarthy, M. I., Dermitzakis, E. T., … Ardlie, K. G. (2018). Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation. Nature Genetics, 50(7), 956–967.

Gregorová, S., Divina, P., Storchova, R., Trachtulec, Z., Fotopulosova, V., Svenson, K. L., Donahue, L. R., Paigen, B., and Forejt, J. (2008). Mouse consomic strains: Exploiting genetic divergence between Mus m. musculus and Mus m. domesticus subspecies. Genome Research, 18(3), 509–515.

Gusev, A., Ko, A., Shi, H., Bhatia, G., Chung, W., Penninx, B. W. J. H., Jansen, R., De Geus, E. J. C., Boomsma, D. I., Wright, F. A., Sullivan, P. F., Nikkola, E., Alvarez, M., Civelek, M., Lusis, A. J., Lehtimäki, T., Raitoharju, E., … Pasaniuc, B. (2016). Integrative approaches for large-scale transcriptome-wide association studies. Nature Genetics, 48(3), 245–252.

Hayward, J. J., Castelhano, M. G., Oliveira, K. C., Corey, E., Balkman, C., Baxter, T. L., Casal, M. L., Center, S. A., Fang, M., Garrison, S. J., Kalla, S. E., Korniliev, P., Kotlikoff, M. I., Moise, N. S., Shannon, L. M., Simpson, K. W., Sutter, N. B., … Boyko, A. R. (2016). Complex disease and phenotype mapping in the domestic dog. Nature Communications, 7, 1–11.

Jaffe, C. A., Ocampo-Lim, B., Guo, W., Krueger, K., Sugahara, I., DeMott-Friberg, R., Bermann, M., and Barkan, A. L. (1998). Regulatory mechanisms of growth hormone secretion are sexually dimorphic. Journal of Clinical Investigation, 102(1), 153–164.

Kang, E. Y., Lee, C. H., Furlotte, N. A., Joo, J. W. J., Kostem, E., Zaitlen, N., Eskin, E., and Han, B. (2018). An association mapping framework to account for potential sex difference in genetic architectures. Genetics, 209(July), genetics.300501.2017.

Khramtsova, E. A., Heldman, R., Derks, E. M., Yu, D., Davis, L. K., and Stranger, B. E. (2018). Sex differences in the genetic architecture of obsessive–compulsive disorder. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, 1–14.

Lindenfors, P., Gittleman, J. L., and Jones, K. (2007). Sexual size dimorphism in mammals. In Evolutionary Studies of Sexual Size Dimorphism (pp. 16–26). Oxford University Press.

Liu, R., Holik, A. Z., Su, S., Jansz, N., Chen, K., Leong, H. S., Blewitt, M. E., Asselin-Labat, M. L., Smyth, G. K., and Ritchie, M. E. (2015). Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses. Nucleic Acids Research, 43(15).

Loh, P., Kichaev, G., Gazal, S., Schoech, A. P., and Price, A. L. (2018). Mixed-model association for biobank-scale datasets. Nature Genetics, 50(July), 906–908.

MacArthur, J., Bowler, E., Cerezo, M., Gil, L., Hall, P., Hastings, E., Junkins, H., McMahon, A., Milano, A., Morales, J., MayPendlington, Z., Welter, D., Burdett, T., Hindorff, L., Flicek, P., Cunningham, F., and Parkinson, H. (2017). The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Research, 45(D1), D896–D901.

Metzger, J., Schrimpf, R., Philipp, U., and Distl, O. (2013). Expression levels of LCORL are associated with body size in horses. PLoS ONE, 8(2), 1–9.

Mohammadi, P., Castel, S. E., Brown, A. A., and Lappalainen, T. (2017). Quantifying the regulatory effect size of cis-acting genetic variation using allelic fold change. Genome Research, 27(11), 1872–1884.

Naqvi, S., Godfrey, A. K., Hughes, J. F., Goodheart, M. L., Mitchell, R. N., and Page, D. C. (2019). Evolutionary dynamics of sex-biased gene expression in mammalian tissues. In Prep.

Parsch, J., and Ellegren, H. (2013). The evolutionary causes and consequences of sex-biased gene expression. Nature Reviews Genetics, 14(2), 83–87.

Patro, R., Duggal, G., Love, M. I., Irizarry, R. A., and Kingsford, C. (2017). Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods, 14(4), 417–419.

Pincus, S. M., Gevers, E. F., Robinson, I. C., Berg, G. van den, Roelfsema, F., Hartman, M. L., and Veldhuis, J. D. (1996). Females secrete growth hormone with more process irregularity than males in both humans and rats. American Journal of Physiology - Endocrinology And Metabolism, 270(1), E107–E115.

Porcu, E., Medici, M., Pistis, G., Volpato, C. B., Wilson, S. G., Cappola, A. R., Bos, S. D., Deelen, J., den Heijer, M., Freathy, R. M., Lahti, J., Liu, C., Lopez, L. M., Nolte, I. M., O’Connell, J. R., Tanaka, T., Trompet, S., … Naitza, S. (2013). A meta-analysis of thyroid-related traits reveals novel loci and gender-specific differences in the regulation of thyroid function. PLoS Genetics, 9(2).

Porter, F. H., Costa, F., Rodrigues, G., Farias, H., Cunha, M., Glass, G. E., Reis, M. G., Ko, A. I., and Childs, J. E. (2015). Morphometric and demographic differences between tropical and temperate Norway rats (Rattus norvegicus). Journal of Mammalogy, 96(2), 317–323.

Randall, J. C., Winkler, T. W., Kutalik, Z., Berndt, S. I., Jackson, A. U., Monda, K. L., Kilpeläinen, T. O., Esko, T., Mägi, R., Li, S., Workalemahu, T., Feitosa, M. F., Croteau-Chonka, D. C., Day, F. R., Fall, T., Ferreira, T., Gustafsson, S., … Heid, I. M. (2013). Sex-stratified genome-wide association studies including 270,000 individuals show sexual dimorphism in genetic loci for anthropometric traits. PLoS Genetics, 9(6).

Rawlik, K., Canela-Xandri, O., and Tenesa, A. (2016). Evidence for sex-specific genetic architectures across a spectrum of human complex traits. Genome Biology, 17(1), 1–8.

Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., and Smyth, G. K. (2015). Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47.

Robinson, I. C. A. F., and Hindmarsh, P. C. (2011). The Growth Hormone Secretory Pattern and Statural Growth. In Comprehensive Physiology (pp. 329–395). American Cancer Society.

Ruff, C. (2002). Variation in human body size and shape. Annual Review of Anthropology, 31(1), 211–232.

Sanjak, J. S., Sidorenko, J., Robinson, M. R., Thornton, K. R., and Visscher, P. M. (2017). Evidence of directional and stabilizing selection in contemporary humans. Proceedings of the National Academy of Sciences, 115(20), 201707227.

Seo, M., Caetano-Anolles, K., Rodriguez-Zas, S., Ka, S., Jeong, J. Y., Park, S., Kim, M. J., Nho, W. G., Cho, S., Kim, H., and Lee, H. J. (2016). Comprehensive identification of sexually dimorphic genes in diverse cattle tissues using RNA-seq. BMC Genomics, 17(1), 1–18.

Shungin, D., Winkler, T. W., Croteau-Chonka, D. C., Ferreira, T., Locke, A. E., Mägi, R., Strawbridge, R. J., Pers, T. H., Fischer, K., Justice, A. E., Workalemahu, T., Wu, J. M. W., Buchkovich, M. L., Heard-Costa, N. L., Roman, T. S., Drong, A. W., Song, C., … Mohlke, K. L. (2015). New genetic loci link adipose and insulin biology to body fat distribution. Nature, 518(7538), 187–196.

Sidorenko, J., Kassam, I., Kemper, K., Zeng, J., Lloyd-Jones, L., Montgomery, G. W., Gibson, G., Metspalu, A., Esko, T., Yang, J., McRae, A. F., and Visscher, P. M. (2018). The effect of X-linked dosage compensation on complex trait variation. BioRxiv, 433870.

Signer-Hasler, H., Flury, C., Haase, B., Burger, D., Simianer, H., Leeb, T., and Rieder, S. (2012). A genome-wide association study reveals loci influencing height and other conformation traits in horses. PLoS ONE, 7(5), 3–8.

Soneson, C., Love, M. I., and Robinson, M. D. (2016). Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research, 4(2), 1521.

Stulp, G., and Barrett, L. (2016). Evolutionary perspectives on human height variation. Biological Reviews, 91(1), 206–234.

The GTEx Consortium. (2017). Genetic effects on gene expression across human tissues. Nature, 550(7675), 204–213.

Tukiainen, T., Pirinen, M., Sarin, A., Ladenvall, C., Kettunen, J., Lokki, M., Perola, M., Sinisalo, J., and Vlachopoulou, E. (2014). Chromosome X-wide association study identifies loci for fasting insulin and height and evidence for incomplete dosage compensation. PLoS Genetics, 10(2).

Winkler, T. W., Justice, A. E., Graff, M., Barata, L., Feitosa, M. F., Chu, S., Czajkowski, J., Esko, T., Fall, T., Kilpeläinen, T. O., Lu, Y., Mägi, R., Mihailov, E., Pers, T. H., Rüeger, S., Teumer, A., Ehret, G. B., … Loos, R. J. F. (2015). The influence of age and sex on genetic associations with adult body size and shape: a large-scale genome-wide interaction study. PLoS Genetics, 11(10), 1–42.

Wood, A. R., Esko, T., Yang, J., Vedantam, S., Pers, T. H., Gustafsson, S., Chu, A. Y., Estrada, K., Luan, J., Kutalik, Z., Amin, N., Buchkovich, M. L., Croteau-Chonka, D. C., Day, F. R., Duan, Y., Fall, T., Fehrmann, R., … Frayling, T. M. (2014). Defining the role of common variation in the genomic and biological architecture of adult human height. Nature Genetics, 46(11), 1173–1186.

Yang, J., Benyamin, B., McEvoy, B. P., Gordon, S., Henders, A. K., Nyholt, D. R., Madden, P. A., Heath, A. C., Martin, N. G., Montgomery, G. W., Goddard, M. E., and Visscher, P. M. (2010). Common SNPs explain a large proportion of the heritability for human height. Nature Genetics, 42(7), 565–569.

Yengo, L., Sidorenko, J., Kemper, K. E., Zheng, Z., Wood, A. R., Weedon, M. N., Frayling, T. M., Hirschhorn, J., Yang, J., Visscher, P. M., and the GIANT Consortium. (2018). Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry. BioRxiv.


Chapter 5.

Conclusions and future directions

Sex differences are widespread in mammalian health and disease, but remain poorly understood.

In this thesis, I have described three studies that trace the pathway from the sex chromosomes to phenotypic sex differences, all with an evolutionary lens. In this chapter, I discuss the broad implications of these studies and consider future directions of research.

The contribution of the mammalian sex chromosomes to sex differences

Naturally occurring sex chromosome aneuploidies and sex-reversed mouse models represent exciting opportunities to study the contribution of the sex chromosomes to sex differences in gene expression. These approaches have already provided evidence that the sex chromosomes contribute to phenotypes known to be sexually dimorphic. For example, individuals with an XXY karyotype, which often results in Klinefelter's syndrome, have an increased incidence of autoimmune disorders such as Sjögren's syndrome, multiple sclerosis, and systemic lupus erythematosus relative to XY individuals (Seminog et al., 2015), suggesting a contribution of X chromosome dosage to susceptibility to autoimmune disorders. In the sex-reversed mouse model, Sry is deleted from the Y chromosome and instead located on an autosome, allowing gonadal and chromosomal sex to be dissociated through the breeding of XY males, XX females, XY females, and XX males (Arnold & Chen, 2009; Itoh et al., 2015).

This model, often referred to as the “four core genotypes” model, has been used to show that sex chromosome complement influences a number of sexually dimorphic phenotypes, including adiposity and metabolism (Chen et al., 2012, 2013), immune responses (Palaszynski et al., 2005), and brain differentiation and behavior (Carruth et al., 2002; Quinn et al., 2007). However, studies of both human sex chromosome aneuploidies and sex-reversed mice integrate the effects of all genes on the sex chromosomes, so follow-up studies are required to interrogate the effects of specific genes.

As discussed in the introductory chapter of this thesis, X-linked genes with a surviving Y homolog (X-Y pairs) and those with no Y homolog that escape X inactivation (X escape genes) are candidates for creating sex-biased gene expression across the genome as the result of differences in their dosage between males and females. They are therefore also candidates for mediating the above-described effects observed in human sex chromosome aneuploidies or sex-reversed mice. Chapter 2 described a study of dosage sensitivity on the mammalian and avian sex chromosomes using the tool of conserved microRNA (miRNA) targeting. This approach revealed that among X-linked genes, X-Y pairs are the most dosage-sensitive, X-inactivated genes are of intermediate dosage sensitivity, and X escape genes are the least dosage-sensitive.

The fact that X escape genes are the least dosage-sensitive suggests that differences in their expression between males and females due to escape from X inactivation may not be of great functional consequence. In contrast, X-Y pairs are highly dosage-sensitive, which suggests that even small sex differences in expression resulting from the X-linked homolog escaping X inactivation, relative changes in the expression or function of the Y-linked homolog, or a combination of the two, could have significant impacts on gene expression throughout the rest of the genome.

An omnigenic model for sex differences in complex traits

Chapter 4 described how conserved sex bias in autosomal gene expression explains an appreciable fraction (approximately 12%) of the sex difference in mean human height. As described in Chapter 3, the magnitude of individual instances of sex bias is relatively small (mostly less than two-fold), but many instances of conserved sex bias (560 gene-tissue pairs with both conserved sex bias and a significant TWAS association with height) appear to contribute. This finding, of a large number of genes each contributing a small amount to sex differences, resembles the recently proposed “omnigenic” model for the link between genetic variation and complex traits (Boyle et al., 2017).

This model was motivated by the observation that for many complex traits, a very large number of common genetic variants spread widely across the genome explain most of the heritability. To explain these observations, the omnigenic model proposes that while there is a set of “core” genes that directly affect a trait, gene regulatory networks are sufficiently interconnected that any gene expressed in a relevant cell type (so-called “peripheral” genes) can contribute to variation by impacting the expression of core genes. These effects, while individually small and largely driven by trans-regulatory interactions, can cumulatively contribute the majority of trait heritability (Liu et al., 2018).

This model provides a convenient explanation for how sex-biased gene expression across the genome can contribute to sex differences in complex traits: even sex bias in the expression of peripheral genes can, through interconnected regulatory networks, lead to sex-biased activity of core gene pathways. Boyle et al. proposed that a better understanding of regulatory networks through high-throughput functional approaches could aid in the identification of core and peripheral genes, which are currently defined conceptually rather than empirically (Boyle et al., 2017). Understanding where sex-biased genes fit into these regulatory networks should aid in further dissecting how sex-biased gene expression impacts phenotypic differences between males and females.

Sex bias in mRNA splicing

The studies described in this thesis have focused on characterizing and explaining sex differences in mRNA levels. While sex differences in functional protein levels are what ultimately determine the differences in cellular and physiological functions observed in mammals, studies have shown that at steady state, mRNA levels are a reasonable proxy for protein levels, with relatively little regulation at the translation stage (Jovanovic et al., 2015; Li et al., 2014). However, another important source of variation both between and within species is mRNA splicing. Briefly, splicing is a highly conserved eukaryotic process that removes segments of transcribed pre-mRNA (introns), leaving the segments (exons) that either encode the final protein or are required for regulation of mRNA stability or translation (Blencowe, 2006). Splicing can regulate mRNA levels, but it can also contribute orthogonal variation by modifying the sequence of the translated protein, even when expression levels are held constant. Indeed, in Drosophila, alternative splicing of the Sxl gene is the central point of the sex determination pathway (Bell et al., 1988). A study of cis-regulatory variation between individuals suggested that an important fraction of disease risk was mediated through effects on mRNA splicing, independent of gene expression (Li et al., 2016).

Thus, there are reasons to believe that sex differences in mRNA splicing may contribute significantly to phenotypic sex differences in mammals.

Despite these suggestions and the generation of RNA sequencing datasets that allow for assessment of sex differences in splicing, there have been relatively few such studies in mammals. The first such study, which examined liver samples from primate species, found some evidence of sex-biased splicing (Blekhman et al., 2010). A recent study using data from the GTEx Consortium found close to 6,000 sex-biased alternative splicing events, and also showed that the genes containing these events overlap with sex-biased genes more than expected by chance (Karlebach et al., 2018). A natural next question is the degree to which sex-biased splicing, which is widespread in human tissues, is conserved or lineage-specific across a range of mammalian tissues, a question that could be addressed using the RNA sequencing data described in Chapter 3 of this thesis. Another important goal is to connect sex-biased splicing to phenotypic sex differences. This is arguably a more challenging objective than making the same link for sex-biased gene expression, in part because it is difficult to interpret how differences in splicing affect ultimate protein function. Nevertheless, transcriptome-wide association studies analogous to those used in Chapter 4 to connect sex-biased gene expression to phenotypic sex differences are also possible with splicing data; this represents one possible avenue forward.
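One common way to quantify a single splicing event is the “percent spliced in” (PSI), the fraction of informative junction reads that support inclusion of an alternative exon. The Python sketch below computes PSI from inclusion and exclusion read counts and compares males and females with a simple permutation test on the difference in mean PSI. The read counts are invented for illustration, and the sketch ignores isoform-length normalization and read-count overdispersion, which dedicated splicing tools model explicitly; it should not be read as the method used by Karlebach et al. (2018).

```python
import random
from statistics import mean


def psi(inclusion_reads, exclusion_reads):
    """Percent spliced in: fraction of informative junction reads supporting exon inclusion."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no junction reads support either isoform")
    return inclusion_reads / total


def delta_psi_permutation(male_psi, female_psi, n_perm=10_000, seed=0):
    """Male-minus-female difference in mean PSI with a two-sided permutation p-value."""
    observed = mean(male_psi) - mean(female_psi)
    pooled = list(male_psi) + list(female_psi)
    n_male = len(male_psi)
    rng = random.Random(seed)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign sex labels
        diff = mean(pooled[:n_male]) - mean(pooled[n_male:])
        if abs(diff) >= abs(observed):
            n_extreme += 1
    # The +1 terms count the observed labelling, so the p-value is never exactly 0
    return observed, (n_extreme + 1) / (n_perm + 1)


# Hypothetical (inclusion, exclusion) junction read counts for one exon, per individual
male_psi = [psi(i, e) for i, e in [(80, 20), (75, 25), (85, 15), (78, 22)]]
female_psi = [psi(i, e) for i, e in [(50, 50), (55, 45), (45, 55), (52, 48)]]

observed, p_value = delta_psi_permutation(male_psi, female_psi)
print(f"delta PSI = {observed:.3f}, permutation p = {p_value:.3f}")
```

With four individuals per sex, only 70 distinct assignments of sex labels exist, which limits the smallest attainable p-value; a genome-wide analysis would also need to correct for testing thousands of events.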

From tissues to cell-types to single cells

The studies described in Chapters 3 and 4 of this thesis are concerned with sex differences in gene expression measured in tissue samples, which constitute a heterogeneous mix of different cell-types. While measures were taken to control for the possibility that male and female tissues have different cell-type compositions, the results nevertheless represent an average of sex-biased gene expression across any number of different cell-types, weighted by the contribution of each cell-type to the bulk mRNA pool. Thus, one important future direction is to deconvolve this averaged signal by assessing sex-biased gene expression in purified cell populations. Typically, cell populations have been defined by the expression of cell surface markers and purified through cell sorting, with RNA sequencing then performed on the pooled cells. This approach has been used to examine sex-biased gene expression in various immune cell populations (Ecker et al., 2017; Schmiedel et al., 2018), but performing it systematically across the body is a major challenge due to the relative lack of cell-type markers in other tissues.
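The weighted-average point above can be made concrete with a small calculation. In the Python sketch below, a hypothetical liver gene is expressed at identical levels in male and female cells of each type, but the two sexes are assumed to differ in the share of bulk mRNA contributed by hepatocytes versus immune cells. All numbers are invented for illustration.

```python
def bulk_expression(mrna_fractions, expression_by_cell_type):
    """Expected bulk expression as a weighted average of per-cell-type expression.

    `mrna_fractions` gives each cell type's share of the bulk mRNA pool (must sum to 1).
    """
    if abs(sum(mrna_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("mRNA fractions must sum to 1")
    return sum(w * expression_by_cell_type[ct] for ct, w in mrna_fractions.items())


# Hypothetical gene with identical per-cell-type expression in both sexes
expression = {"hepatocyte": 10.0, "immune": 100.0}

# Only the (assumed) cell-type composition differs between the sexes
male_bulk = bulk_expression({"hepatocyte": 0.9, "immune": 0.1}, expression)
female_bulk = bulk_expression({"hepatocyte": 0.8, "immune": 0.2}, expression)

print(f"female/male = {female_bulk / male_bulk:.2f}")  # prints female/male = 1.47
```

Because the bulk measurement mixes composition with within-cell-type expression, a difference in composition alone produces an apparent sex bias here, which is the ambiguity that measurements in purified populations or single cells are meant to resolve.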

While purifying predefined cell populations is one way to deconvolve estimates of sex bias from tissue samples, recent improvements in single-cell RNA sequencing (scRNA-seq) offer exciting alternatives. Broadly, performing scRNA-seq on dissociated tissue or tissue sections would allow for the unbiased grouping of single cells through clustering (Andrews and Hemberg, 2018), and subsequent analysis of sex differences either within or across clusters/cell-types. As mentioned above, this approach is particularly attractive in tissues with few marker-defined cell-types. There is a plethora of scRNA-seq methods (reviewed in Kolodziejczyk et al., 2015) that trade off the depth and breadth of profiling in each cell against the number of cells sequenced. These tradeoffs, as well as high levels of both technical and biological noise in scRNA-seq data, pose challenges for discovering sex differences in gene expression, as many instances of sex bias are small in magnitude or may be present in only a subset of cells. For example, a recent study that performed scRNA-seq using two different technologies on 20 mouse tissues from both males and females found only a handful of sex-biased genes when analyzing single hepatocytes (Schaum et al., 2018), whereas studies using RNA from whole livers have found over 1,000 sex-biased genes (Clodfelter et al., 2006). An alternative approach with current technologies could involve first defining sex-biased genes from bulk tissue, and then using in situ, microscopy-based profiling methods such as MERFISH (Chen et al., 2015) or STARmap (Wang et al., 2018) to assess expression of this more limited set of genes. Despite these challenges, the increasing availability of very large scRNA-seq datasets such as the Human Cell Atlas (Regev et al., 2017), together with novel technologies, should allow for the profiling of sex-biased gene expression at the single-cell level.
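One way to analyze sex differences within clusters, sketched below in Python, is to sum counts across all cells from the same individual within a cluster (a “pseudobulk” profile), normalize to counts per million (CPM), and compare males and females using individuals rather than cells as replicates. The input format, gene names, and pseudocount are hypothetical, cluster labels are assumed to come from a prior clustering step, and a real analysis would pass the pseudobulk counts to a count-based differential expression method rather than report a raw fold change.

```python
import math
from collections import defaultdict
from statistics import mean


def pseudobulk_log2fc(cells, gene, pseudocount=1.0):
    """Per-cluster male-vs-female log2 fold change of one gene from pseudobulk CPM.

    `cells` is an iterable of dicts with hypothetical keys 'donor', 'sex' ('M' or 'F'),
    'cluster' (a label from a prior clustering step) and 'counts' (gene -> count).
    """
    gene_counts = defaultdict(float)   # (cluster, donor) -> summed counts of `gene`
    library_size = defaultdict(float)  # (cluster, donor) -> summed counts of all genes
    donor_sex = {}
    for cell in cells:
        key = (cell["cluster"], cell["donor"])
        gene_counts[key] += cell["counts"].get(gene, 0)
        library_size[key] += sum(cell["counts"].values())
        donor_sex[cell["donor"]] = cell["sex"]

    # CPM per donor within each cluster: donors, not cells, are the replicates
    cpm = defaultdict(lambda: {"M": [], "F": []})
    for (cluster, donor), total in library_size.items():
        if total > 0:
            cpm[cluster][donor_sex[donor]].append(gene_counts[(cluster, donor)] / total * 1e6)

    result = {}
    for cluster, by_sex in cpm.items():
        if by_sex["M"] and by_sex["F"]:  # need at least one donor of each sex
            result[cluster] = (math.log2(mean(by_sex["M"]) + pseudocount)
                               - math.log2(mean(by_sex["F"]) + pseudocount))
    return result


# Toy data: one cell per donor per cluster; GeneX is doubled in male hepatocytes only
cells = []
for donor, sex, genex in [("m1", "M", 20), ("m2", "M", 20), ("f1", "F", 10), ("f2", "F", 10)]:
    cells.append({"donor": donor, "sex": sex, "cluster": "hepatocyte",
                  "counts": {"GeneX": genex, "other": 1000 - genex}})
    cells.append({"donor": donor, "sex": sex, "cluster": "endothelial",
                  "counts": {"GeneX": 15, "other": 985}})

fc = pseudobulk_log2fc(cells, "GeneX")
print(fc)
```

Aggregating cells within each donor dampens per-cell technical noise, and using donors as replicates reflects the fact that cells from the same individual are not independent observations of sex differences.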

References

Andrews, T. S., and Hemberg, M. (2018). Identifying cell populations with scRNASeq. Molecular Aspects of Medicine, 59, 114–122.
Arnold, A., and Chen, X. (2009). What does the “four core genotypes” mouse model tell us about sex differences in the brain and other tissues? Frontiers in Neuroendocrinology, 30(1), 1–9.
Bell, L. R., Maine, E. M., Schedl, P., and Cline, T. W. (1988). Sex-lethal, a Drosophila sex determination switch gene, exhibits sex-specific RNA splicing and sequence similarity to RNA binding proteins. Cell, 55(6), 1037–1046.
Blekhman, R., Marioni, J. C., Zumbo, P., Stephens, M., and Gilad, Y. (2010). Sex-specific and lineage-specific alternative splicing in primates. Genome Research, 20(2), 180–189.
Blencowe, B. J. (2006). Alternative Splicing: New Insights from Global Analyses. Cell, 126(1), 37–47.
Boyle, E. A., Li, Y. I., and Pritchard, J. K. (2017). An expanded view of complex traits: from polygenic to omnigenic. Cell, 169(7), 1177–1186.
Carruth, L. L., Reisert, I., and Arnold, A. P. (2002). Sex chromosome genes directly affect brain sexual differentiation. Nature Neuroscience, 5(10), 933–4.
Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S., and Zhuang, X. (2015). Spatially resolved, highly multiplexed RNA profiling in single cells. Science, 348(6233), 1360–1363.
Chen, X., McClusky, R., Chen, J., Beaven, S. W., Tontonoz, P., Arnold, A. P., and Reue, K. (2012). The number of X chromosomes causes sex differences in adiposity in mice. PLoS Genetics, 8(5), e1002709.
Chen, X., McClusky, R., Itoh, Y., Reue, K., and Arnold, A. P. (2013). X and Y chromosome complement influence adiposity and metabolism in mice. Endocrinology, 154(3), 1092–104.
Clodfelter, K. H., Holloway, M. G., Park, S.-H., Waxman, D. J., Hodor, P., and Ray, W. J. (2006). Sex-Dependent Liver Gene Expression Is Extensive and Largely Dependent upon Signal Transducer and Activator of Transcription 5b (STAT5b): STAT5b-Dependent Activation of Male Genes and Repression of Female Genes Revealed by Microarray Analysis. Molecular Endocrinology, 20(6), 1333–1351.
Ecker, S., Chen, L., Pancaldi, V., Bagger, F. O., Fernández, J. M., Carrillo de Santa Pau, E., Juan, D., Mann, A. L., Watt, S., Casale, F. P., Sidiropoulos, N., Rapin, N., Merkel, A., Stunnenberg, H. G., Stegle, O., Frontini, M., Downes, K., … Ouwehand, W. H. (2017). Genome-wide analysis of differential transcriptional and epigenetic variability across human immune cell types. Genome Biology, 18(1), 1–17.
Itoh, Y., Kampf, K., O’Neill, R., Brown, J. D., Domadia, S., Arnold, A. P., and Mackie, R. (2015). Four Core Genotypes mouse model: localization of the Sry transgene and bioassay for testicular hormone levels. BMC Research Notes, 8(1), 69.
Jovanovic, M., Rooney, M. S., Mertins, P., Przybylski, D., Chevrier, N., Satija, R., Rodriguez, E. H., Fields, A. P., Schwartz, S., Raychowdhury, R., Mumbach, M. R., Eisenhaure, T., Rabani, M., Gennert, D., Lu, D., Delorey, T., Weissman, J. S., … Regev, A. (2015). Dynamic profiling of the protein life cycle in response to pathogens. Science, 347(6226).
Karlebach, G., Veiga, D. F. T., Mays, A. D., Kesarwani, A. K., Danis, D., Kararigas, G., Zhang, X. A., George, J., Ananda, G., Steinhaus, R., Hansen, P., Seelow, D., Bizon, C., Boyles, R., Ball, C., McMurry, J. A., Haendel, M. A., … Robinson, P. N. (2018). The impact of sex on alternative splicing. bioRxiv, 1, 490904.
Kolodziejczyk, A. A., Kim, J. K., Svensson, V., Marioni, J. C., and Teichmann, S. A. (2015). The Technology and Biology of Single-Cell RNA Sequencing. Molecular Cell, 58(4), 610–620.
Li, J. J., Bickel, P. J., and Biggin, M. D. (2014). System wide analyses have underestimated protein abundances and the importance of transcription in mammals. PeerJ, 2, e270.
Li, Y. I., Geijn, B. Van De, Raj, A., Knowles, D. A., Petti, A. A., Golan, D., Gilad, Y., and Pritchard, J. K. (2016). RNA splicing is a primary link between genetic variation and disease. Science, 352(6285), 600–4.
Liu, X., Li, Y. I., and Pritchard, J. K. (2018). Trans effects on gene expression can drive omnigenic inheritance. bioRxiv, 425108.
Palaszynski, K. M., Smith, D. L., Kamrava, S., Burgoyne, P. S., Arnold, A. P., and Voskuhl, R. R. (2005). A yin-yang effect between sex chromosome complement and sex hormones on the immune response. Endocrinology, 146(8), 3280–5.
Quinn, J. J., Hitchcott, P. K., Umeda, E. A., Arnold, A. P., and Taylor, J. R. (2007). Sex chromosome complement regulates habit formation. Nature Neuroscience, 10(11), 1398–400.
Regev, A., Teichmann, S. A., Lander, E. S., Amit, I., Benoist, C., Birney, E., Bodenmiller, B., Campbell, P., Carninci, P., Clatworthy, M., Clevers, H., Deplancke, B., Dunham, I., Eberwine, J., Eils, R., Enard, W., Farmer, A., … Participants, H. C. A. M. (2017). The Human Cell Atlas. eLife, 6, e27041.
Schaum, N., Karkanias, J., Neff, N. F., May, A. P., Quake, S. R., Wyss-Coray, T., Darmanis, S., Batson, J., Botvinnik, O., Chen, M. B., Chen, S., Green, F., Jones, R. C., Maynard, A., Penland, L., Pisco, A. O., Sit, R. V., … investigators, P. (2018). Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature, 562(7727), 367–372.
Schmiedel, B. J., Singh, D., Madrigal, A., Valdovino-Gonzalez, A. G., White, B. M., Zapardiel-Gonzalo, J., Ha, B., Altay, G., Greenbaum, J. A., McVicker, G., Seumois, G., Rao, A., Kronenberg, M., Peters, B., and Vijayanand, P. (2018). Impact of Genetic Polymorphisms on Human Immune Cell Gene Expression. Cell, 175(6), 1701–1715.e16.
Seminog, O. O., Seminog, A. B., Yeates, D., and Goldacre, M. J. (2015). Associations between Klinefelter’s syndrome and autoimmune diseases: English national record linkage studies. Autoimmunity, 48(2), 125–128.
Wang, X., Allen, W. E., Wright, M. A., Sylwestrak, E. L., Samusik, N., Vesuna, S., Evans, K., Liu, C., Ramakrishnan, C., Liu, J., Nolan, G. P., Bava, F.-A., and Deisseroth, K. (2018). Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science, 361(6400), eaat5691.