The Evolution of Gene Function in Caenorhabditis spp.

by

Adrian Verster

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Molecular Genetics
University of Toronto

© Copyright 2014 by Adrian Verster

Abstract

The Evolution of Gene Function in Caenorhabditis spp.

Adrian Verster
Doctor of Philosophy
Graduate Department of Molecular Genetics
University of Toronto
2014

The sequence of the genome gradually evolves, and such changes can affect the function of the genes encoded within. Here I try to understand the causes and consequences of changes in gene function between related species, particularly in Caenorhabditis nematodes. The goal of the first section of my thesis is to compare the biological function of 1-1 orthologs by using loss-of-function phenotypes in C. elegans and C. briggsae. I did this by constructing and screening an RNA interference (RNAi) library in C. briggsae and comparing these RNAi phenotypes to those in C. elegans. This approach found 91 examples of orthologs with different in vivo functions, around 7% of the genes screened. For one of these examples, sac-1, I was able to explain the different biological function by a difference in molecular function, in this case a difference in gene expression. Given the extremely high phenotypic similarity of these two species, I hypothesize that many of these different RNAi phenotypes represent changes in gene function that have preserved developmental phenotypes because of strong stabilizing selection, a model known as Developmental System Drift. In support of this, I found several cases where a phenotype is not present for the C. briggsae copy of an ortholog but is present for another related family member, suggesting that this related gene has taken over the function of the original gene in C. briggsae.

I also found that recently evolved genes are enriched for having different RNAi phenotypes, which led me to consider what could explain their rapid rate of evolution of in vivo function. By measuring the co-expression of novel genes with genes in different molecular pathways, I was able to construct a series of features which could accurately predict which novel genes become essential. An analysis of this model showed that co-expression with ancient, essential pathways is highly predictive of a novel gene acquiring an essential function. These results support a picture of how novel genes become integrated into cellular networks and are subsequently preserved by evolutionary forces.

Acknowledgements

First and foremost I would like to thank my thesis advisor, Andy Fraser, who guided me through these intellectually challenging years. You were always available to set me on track when I was lost. I am also grateful to my thesis committee (Gary Bader, Asher Cutter and Sabine Cordes), who were always willing to give me advice on a multitude of issues. You each provided an important perspective on the work, and I could not have completed this without your help.

Members of the Fraser lab have provided tremendous help over the years: Arun Ramani, Nadege Pelte, Mark Spensley, Nattha Wannissorn, Victoria Vu, June Tan, Steve Van Doormaal, Tunga Chuluunbaatar and Mike Schertzberg. Many people helped me with difficult scientific issues, but I would particularly like to thank those who taught me computational biology, namely Leopold Parts, Azin Sayad, Traver Hart, Lee Zamparo and Carl de Boer.

I could not have made it through without my friends and others who are close to me. Kathleen Cook, my lovely fiancée, was always there to help me through difficult times. I will not exhaustively name you, but anyone else who was close to me, you know who you are.

I would also like to acknowledge Marie-Anne Félix for the gift of the SID-2 expressing JU1018 transgenic C. briggsae line; the pJH1774 plasmid containing mWormCherry was a gift from Arshad Desai’s lab. Finally, I would like to thank Google and Wikipedia for providing me with a deep well of knowledge that I repeatedly drew from when I was missing pieces of the puzzle.

Contents

1 Introduction

1.1 Mechanisms for genome change

1.1.1 Mutation

1.1.2 Selection

1.2 Experimental Evolution

1.2.1 How reproducible is evolution?

1.3 Change in gene function of orthologous genes

1.3.1 Case studies from quantitative genetics

1.3.2 Case studies from the evolution of development

1.4 Evolution of novel genes

1.4.1 Evolution of gene function for taxonomically restricted genes

1.4.2 The evolution of novel genes by gene duplication

1.4.3 The evolution of novel genes from non-coding DNA

1.5 Pathway evolution

1.5.1 Convergent evolution in the C. elegans and C. briggsae sex development pathways

1.5.2 RNA interference and Argonaute proteins

1.6 High-throughput approaches to genome evolution

1.6.1 How much of evolution can single genes explain?

1.6.2 Gene expression divergence

1.6.3 Transcription factor binding divergence

1.7 Developmental System Drift

1.8 How much of molecular function is noise?

1.9 Open questions and thesis goals

2 Evolution of ortholog gene function in Caenorhabditis spp.

2.1 Abstract

2.2 Introduction

2.3 Results

2.3.1 Construction and screening of the C. briggsae RNAi library

2.3.2 Genes with different phenotypes are enriched for transcription factors and recently evolved novel genes

2.3.3 Changes in gene function during DSD are often the result of promoter evolution

2.3.4 Ortholog pairs encoding more divergent protein sequences are more likely to have different RNAi phenotypes

2.3.5 Orthologs may have different organismal roles due to changes in other genes

2.3.6 Conservation of function can be maintained at the level of gene family and not gene family members

2.4 Discussion

2.5 Methods

2.5.1 Construction of the C. briggsae RNAi Library

2.5.2 Manual screening of the C. briggsae RNAi Library

2.5.3 Fitness Assay

2.5.4 qPCR

2.5.5 Examination of C. elegans and C. briggsae gene phylogenetic age

2.5.6 GFP Stitching and Microscopy

2.5.7 Transgenic rescue experiments

2.5.8 Examination of C. elegans and C. briggsae protein similarity

2.5.9 Predictability of phenotype differences

2.5.10 Identifying functionally related genes

3 Evolution of essential functions in novel genes

3.1 Abstract

3.2 Introduction

3.3 Results

3.3.1 The number of TRGs in the worm genome

3.3.2 Novel genes preferentially form functional links with other novel genes

3.3.3 Essential TRGs are enriched for functional links

3.3.4 Prediction of novel gene function based on co-expression profiles

3.3.5 Novel genes contribute to drug resistance predictions

3.4 Discussion

3.5 Methods

3.5.1 Finding Taxonomically restricted genes

3.5.2 TRG functional data

3.5.3 Essential TRG classification

3.5.4 Feature importance analysis

3.5.5 Drug resistance predictions

4 Discussion and concluding remarks

4.1 Summary

4.2 Turnover of genes within pathways

4.3 Transversal of adaptive peaks

4.4 Novel gene function at high resolution

4.5 How do novel genes change functional networks?

4.6 Overall Significance

Bibliography

Chapter 1

Introduction

Evolution has generated the incredibly diverse array of life forms that inhabit the planet. Small changes in the genetic material, DNA, are passed down to offspring, and these changes can eventually become fixed in the population. Improvements in genomic technologies have revolutionized our understanding of evolution: changes in the heritable material are now measurable, and thus we can measure the changes that occur during evolution. For example, the old view of the tree of life was that of 5 kingdoms, Monera (bacteria), Protists, Plants, Fungi and Animals (Whittaker, 1969), which was based on our (naive) understanding of the biology of different organisms. However, the first attempts to measure the heritable material (rDNA sequences) seriously challenged this view: different types of bacteria were found to be as divergent from each other as bacteria are from eukaryotes, and this led to a classification system of eukaryotes, bacteria, and a third group, archaebacteria (Woese and Fox, 1977). Since then, similar molecular taxonomy has changed our view of the eukaryotic tree of life as well. The emerging view is of 5 major groupings: Unikonts, which include fungi and animals; Plantae, which include land plants and algae; Excavates; Cercozoa; and Chromalveolates (Keeling et al., 2005). The exact relationships between these groups remain unclear.

In parallel to our ability to measure heritable changes in DNA, we have also acquired the technology to characterize the function of the genes encoded within the DNA sequence. Gene function exists at two levels; consider for example the Cyclin-Dependent Kinases (CDKs). We can understand the biological function of a gene by characterizing mutant phenotypes (for example, CDKs have a necessary function in cell cycle progression), and we can also understand the molecular function of a gene by doing biochemical experiments (for example, CDKs have a kinase function). Evolutionary biology is very interested in understanding the phenotypic differences between related species, which are often the result of adaptive evolution. Many adaptive evolutionary changes can be understood in terms of changes in molecular gene function which also affect biological gene function. For example, adaptive loss of spines in benthic forms of sticklebacks has occurred because of evolution at the pelvic enhancer of the Pitx1 homeobox gene (Chan et al., 2010). In this case, changes in the molecular regulation of the gene have eliminated a biological function in pelvic spine development, which in turn has led to adaptive evolution.

The goal of this thesis is to understand the causes and consequences of evolutionary differences in gene function between related species. I will address a small part of this by comparing loss-of-function phenotypes in Caenorhabditis spp. and by trying to explain which novel genes acquire essential functions, using co-expression interactions with different molecular pathways.

1.1 Mechanisms for genome change

In the pre-genome era many evolutionary biologists thought that selection was the major force causing evolutionary change. However, this was before any molecular data were available, and the view changed with the first argument for what would become the Neutral Theory of Molecular Evolution: at the molecular level there was far too much change, and so most of it must be selectively neutral (Kimura, 1968).

Evolution is the process of genetic alleles changing in frequency through time, and the mechanisms by which this can occur are mutation, genetic drift, recombination and selection. While they are all important, the topic of this thesis is the evolution of gene function, and thus I will focus on mutation and selection which are the most likely processes to be consequential for changes in gene function.

1.1.1 Mutation

One of the major mechanisms for altering genome function is mutation. The best understood mechanism is single nucleotide errors introduced during replication or repair of DNA. One experimental technique for measuring these errors is the use of mutation accumulation lines, in which a single individual is propagated every generation. Application of this method has revealed that the rate of spontaneous point mutations is around 10^-8 to 10^-9 per base per generation in worms (Denver et al., 2004), yeast (Lynch et al., 2008) and Arabidopsis (Ossowski et al., 2010). There is increasing evidence that the mutation rate can be reduced by natural selection, and that the lower limit is determined by the strength of selection relative to drift in different populations (Lynch, 2010a). However, not all mutations are point mutations: there are also insertions, deletions and the recently discovered Copy Number Variants (CNVs). In human populations these sorts of mutations occur regularly, with a handful of insertions/deletions in every offspring (Lynch, 2010a). Additionally, nine novel CNVs greater than 100 kb were identified in a study of 772 offspring (Itsara et al., 2010).

Although most novel mutations are neutral with respect to gene function, some mutations can have serious effects on the function of genes. Many Mendelian diseases are caused by loss-of-function point mutations in protein coding genes with critical functions in organism physiology (Hamosh et al., 2005). Other types of mutation can also cause disease. For example, CNV deletions overlapping the chromodomain-containing CHD7 gene are associated with CHARGE syndrome, whose patients experience congenital deformations in the heart, ear and retina (Vissers et al., 2004). There is growing evidence that large de novo CNVs are associated with complex diseases such as autism (Sebat et al., 2007) and schizophrenia (Xu et al., 2008).
Negative selection is thought to be far more prevalent in nature than positive selection. This reflects the fact that mutations are much more likely to cause loss of function than gain of function, since it is easier to break something than to create it; I will discuss this in more detail below.

One consequence of random mutation is that genes are continuously acquiring deleterious mutations which must be purged by natural selection. For example, any de novo mutation that causes sterility will be purged from the population by natural selection. Such loss-of-function mutations cause an overall reduction in the average fitness of the population, a phenomenon called the mutational load (reviewed in Agrawal and Whitlock (2012)). When the effects of selection are weak, populations can acquire a large number of mildly deleterious mutations, which collectively lower the overall fitness and thus health of a population (Lynch, 2010b). There is evidence that deleterious mutations can be pulled to high frequencies by genetic hitchhiking; consider TYR, in which one polymorphism (S192Y) has been selected and is associated with a loss of freckles, while another polymorphism (R402Q) is deleterious and is associated with minor ocular albinism (Chun and Fay, 2011). It is possible that deleterious mutations fix in non-outcrossing species, which can eventually lead to their extinction. Consider C. elegans, which has recently evolved hermaphroditism; it has been suggested that C. elegans will go extinct within a million years (Loewe and Cutter, 2008).
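To give a sense of scale for the spontaneous point mutation rates quoted in this section (10^-8 to 10^-9 per base per generation), the following minimal sketch converts a per-base rate into an expected number of new mutations per genome copy per generation. The genome size of roughly 100 Mb is an assumption used only for illustration (a round figure for the C. elegans haploid genome, not a value taken from the text), and the function name is hypothetical. Mutations are assumed to arise independently, so the count per genome is treated as Poisson distributed.

```python
import math


def expected_new_mutations(rate_per_base: float, genome_size_bp: float) -> float:
    """Mean number of new point mutations in one haploid genome copy per generation."""
    return rate_per_base * genome_size_bp


# ASSUMPTION: ~1e8 bp is a round figure for a haploid worm genome,
# chosen for illustration; it is not a value given in the text.
GENOME_SIZE_BP = 1e8

for rate in (1e-8, 1e-9):  # the range of rates quoted in the text
    mean = expected_new_mutations(rate, GENOME_SIZE_BP)
    # Under a Poisson model, the chance that a genome copy carries
    # no new mutation at all is exp(-mean).
    p_none = math.exp(-mean)
    print(f"rate={rate:.0e}/bp: mean={mean:.2f} new mutations, P(no new mutation)={p_none:.2f}")
```

Under these assumptions each genome copy acquires on the order of one new point mutation or fewer per generation, which is why mutation accumulation lines must be propagated for many generations before enough mutations are available to estimate a rate.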

1.1.2 Selection

At the molecular level, by far the most common form of selection is negative selection, which acts to preserve established function from the constant stream of random mutation to the genome. Considering that the human genome sequence revealed far fewer genes and far more transposable elements than expected (Lander et al., 2001), an ongoing question has been how much of the human genome sequence is under negative selection. Based on whole genome alignments to the mouse genome, Chiaromonte et al. (2003) estimated that 19.2% of 50 bp windows were under selection, and since 27.1% of the human genome is covered by the alignment, this analysis gives an estimate of around 5% of the human genome being under selection (0.192 × 0.271 ≈ 0.052). Siepel et al. (2005) used a more sophisticated statistical model, a phylogenetic Hidden Markov Model, and found that 3%-8% of the human genome is conserved. Studies which consider conservation at a level above that of sequence, for example DNA topology, have estimated that closer to 12% of the human genome is under negative selection (Parker et al., 2009). These results have been criticized, however, since they do not take into consideration the evolutionary turnover which can occur in non-coding sequence (Ponting and Hardison, 2011); I will discuss these ideas in more detail in the section on Developmental System Drift.

Although it is less common, there is considerable interest in finding instances of positive selection since these can illuminate the molecular basis of adaptation. The most widely used method looks at coding sequence and compares the rate of non-synonymous changes to that of synonymous changes (Ka/Ks). A ratio close to one in a given gene would indicate neutral evolution, while an elevated number of non-synonymous changes would occur after a period of positive selection.
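As a rough illustration of the Ka/Ks calculation, the sketch below computes the ratio from counts of non-synonymous and synonymous differences and sites. It assumes that the counting of sites and differences (for example in the style of Nei and Gojobori) has already been done upstream, applies a simple Jukes-Cantor correction for multiple substitutions at the same site, and uses hypothetical counts that are not data from any study cited here. Function names are illustrative only.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites for multiple hits (Jukes-Cantor)."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from counts of differences and sites."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("Ks is zero, so the ratio is undefined")
    return ka, ks, ka / ks


# Hypothetical counts for a single ortholog pair (illustration only).
ka, ks, ratio = ka_ks(nonsyn_diffs=12, nonsyn_sites=700, syn_diffs=45, syn_sites=300)
print(f"Ka={ka:.4f}  Ks={ks:.4f}  Ka/Ks={ratio:.2f}")

# Conventional reading: a ratio well below 1 suggests negative selection,
# a ratio near 1 neutral evolution, and a ratio above 1 positive selection.
```

Because the ratio is averaged over the whole gene in this sketch, a few positively selected residues in an otherwise constrained protein would still give a value below one, which is one reason selection is sometimes detectable only in part of a gene.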
This approach revealed that across many lineages the genes experiencing positive selection are those involved in the immune system and reproduction (Yang and Bielawski, 2000). Rapid evolution of proteins in the immune system likely represents the legacy of an ongoing evolutionary arms race between pathogen and host. Sometimes selection can only be detected in part of a gene; for example, detailed studies of co-evolution between the mouse TfR1 receptor and the viral GP1 protein suggest that only a few residues on the surface are actively changing under positive selection, and that these residues are divorced from the iron uptake function of TfR1 (Demogines et al., 2013). Similarly, rapid evolution of reproductive genes is thought to be due to a different type of evolutionary arms race: that between sperm cells in a process known as sperm competition, or other processes like sexual selection or sexual conflict (Swanson and Vacquier, 2002).

One gene of great interest which shows positive selection based on the Ka/Ks metric is FoxP2, which may have contributed to the emergence of language in modern humans. This gene was originally identified as the genetic cause of a developmental disorder in which patients have pronounced problems with language (Lai et al., 2001). When the human sequence of this gene was compared to those of other great apes, it was found that there had been positive selection on the human branch from a conserved ancestor, suggesting that positive selection on this gene was involved in human language capabilities (Enard et al., 2002).

Ka/Ks methods are limited for a number of reasons, but a primary one is that they can only test for selection within coding sequences. Another approach took advantage of multiple species alignments and looked for regions with a statistically elevated number of substitutions along the human branch.
Several of these regions were shown to be putatively functionally important for aspects of human-specific biology. For example, HAR1 is an RNA gene expressed in the developing neocortex and is hypothesized to have a role in the evolution of human brain function (Pollard et al., 2006). Another region, HAR2, is an enhancer: the human version shows specific activity in the developing hindlimb in a transgenic mouse assay and may contribute to the evolution of bipedalism in modern humans (Prabhakar et al., 2008).

The publication of the Neutral Theory of Molecular Evolution left many evolutionary biologists believing that most molecular substitutions were neutral. This has been challenged recently by several results from a statistical method developed by McDonald and Kreitman (1991), originally applied to the Adh gene. This method compares the ratio of fixed differences between species to polymorphisms within species for non-synonymous and synonymous sites; the rationale is that positively selected polymorphisms will spread through the population rapidly and will lead to many fixed differences but little standing variation. The most dramatic results using this method came from a study of 91 genes in Drosophila, which alleged that 95% of fixed non-synonymous differences were positively selected, though with very small selective coefficients (Sawyer et al., 2007). A genome scale analysis of polymorphisms in Drosophila at over 10,000 genes suggested that 54% of non-synonymous differences had been fixed by positive selection (Begun et al., 2007). However, these results remain controversial, because when the same methods are applied to other lineages such as yeast, humans or Arabidopsis, they detect primarily neutral evolution (Nei et al., 2010).
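The comparison at the heart of the McDonald and Kreitman approach can be sketched as a 2x2 table of fixed versus polymorphic changes at non-synonymous versus synonymous sites. The example below is a minimal illustration, not the implementation used in any of the cited studies: the counts are hypothetical, the function names are illustrative, the neutrality index and the quantity alpha = 1 - NI are commonly used summaries rather than something defined in the text above, and the Fisher exact test is written out with the standard library so that the block is self-contained.

```python
from math import comb


def mk_summary(dn, ds, pn, ps):
    """Neutrality index (NI) and alpha from McDonald-Kreitman counts.

    dn, ds: fixed non-synonymous / synonymous differences between species
    pn, ps: non-synonymous / synonymous polymorphisms within species
    """
    ni = (pn / ps) / (dn / ds)
    alpha = 1 - ni  # rough estimate of the fraction of dn fixed by positive selection
    return ni, alpha


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts (illustration only, not data from any cited study).
dn, ds, pn, ps = 20, 30, 5, 40
ni, alpha = mk_summary(dn, ds, pn, ps)
p_value = fisher_exact_two_sided(dn, ds, pn, ps)
print(f"NI={ni:.4f}  alpha={alpha:.4f}  Fisher p={p_value:.4g}")
```

An excess of fixed non-synonymous differences relative to polymorphism gives NI below one and a positive alpha, which is the kind of signal behind the estimates of 95% and 54% quoted above. The estimate is sensitive to mildly deleterious polymorphisms, which is one reason such results are debated.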

1.2 Experimental Evolution

One of the methods that experimental biologists have used to connect genome change to phenotype evolution is experimental evolution, in which experimenters passage an organism for long periods of time and measure how its fitness changes. By using DNA sequencing, biologists can explain these changes in fitness in terms of functional alterations in the genome, and often they can understand what has happened at the resolution of changes in individual genes.

One of the longest running studies is the evolution of 12 replicate populations of E. coli in laboratory conditions, which has been in progress for 20 years (Barrick et al., 2009). During this experiment E. coli fitness has increased substantially relative to the ancestor (Lenski and Travisano, 1994). There are on average fewer than 50 mutations in the genomes of these strains after propagation for 20,000 generations (Barrick et al., 2009). Many of the same mutations occurred in different experimental lines, which manifests as similar changes in gene expression between these lines (Cooper et al., 2003).

The experimental evolution experiments in E. coli have suggested that the majority of mutations which occur are adaptive (Barrick et al., 2009), but similar experiments in other organisms such as S. cerevisiae support a different picture. In yeast populations which were propagated for 1000 generations, Lang et al. (2013) found that mutations tend to move through populations in ‘cohorts’, which is suggestive of genetic hitchhiking.

1.2.1 How reproducible is evolution?

A famous debate in evolutionary biology surrounds the issue of “rewinding the tape”: if the whole of evolutionary history were to run again, would we get the same result? (Gould, 1989). Experimental evolution of replicate populations can shed some light on this question. In the experimental evolution of E. coli detailed above, there are twelve replicates and certain events only occur in a subset of these populations. For example, the hypermutability phenotype arises in only seven cultures. Certain adaptations, such as the evolution of citrate metabolism, occurred in only a single culture. This evolutionary transition was contingent upon another unknown mutation (Blount et al., 2008), suggesting that certain evolutionary events are contingent on other stochastic events and thus that certain evolutionary transitions are fundamentally random occurrences.

Conversely, some evolutionary changes are very reproducible, for example the convergent loss of stickleback pelvic spines in different populations in similar lake environments (Coyle et al., 2007). It is unclear what the general picture is, but even a small number of evolutionary transitions which are contingent upon other random mutations essentially guarantees that aspects of evolutionary history would come out differently if the process were repeated.

1.3 Change in gene function of orthologous genes

The most important functional parts of the genome are the genes encoding proteins or RNAs. In many cases changes in the function of genes result in clear effects on the phenotype of the organism. These examples come to us from quantitative genetics and from studies of the evolution of development, and I will discuss each in turn here.

1.3.1 Case studies from quantitative genetics

There has been substantial interest in mapping polymorphic population traits to genomic loci (Quantitative Trait Loci, QTL). The locations of many of these traits have been narrowed down even further to a specific nucleotide and the gene whose function it alters. Since these population traits may eventually become fixed, they represent the first step in the evolution of gene function between related species.

One of the first polymorphisms in C. elegans that was mapped to a causative gene is the plugging phenotype. In some natural isolates males deposit a copulatory plug after mating, which reduces the ability of subsequent males to successfully mate with the hermaphrodite (Barker, 1994). Based on QTL mapping between the non-plugging Bristol isolate and 2 separate plugging isolates, researchers were able to discover a loss-of-function mutation in a gene, plg-1, which had been disrupted by the insertion of a transposable element (Palopoli et al., 2008). Plg-1 encodes a large glycoprotein, which is expressed in the tail cells of the male and likely forms a significant structural component of the copulatory plug. Hermaphroditism has evolved recently in C. elegans (Cutter et al., 2008), and evolution in the plg-1 gene is thought to have arisen due to reduced selection on male function (Palopoli et al., 2008).

Another mapped polymorphism in C. elegans is involved in drug susceptibility. Sensitivity to the drug avermectin was mapped to glc-1, a glutamate-gated chloride channel, whereby a four amino acid deletion in the gene confers resistance (Ghosh et al., 2012). This allele is thought to be maintained in the population because of balancing selection, as the deletion causes a sensitivity to the bacterium S. avermitilis.

Possibly the most well known example of the evolution of gene function from quantitative genetics in C. elegans is the gene npr-1. Several strains of C. elegans exhibit a social feeding phenotype, whereby they aggregate into groups on bacterial lawns. This phenotype is associated with a single non-synonymous mutation in the neuropeptide Y receptor npr-1, which phenocopies loss-of-function mutations (de Bono and Bargmann, 1998). The mechanism by which npr-1 affects this behaviour is thought to involve sensing the low-oxygen environment that C. elegans prefers, which is found within the social aggregations (Gray et al., 2004). This variation is thought to be an adaptation to laboratory conditions and not a natural polymorphism (McGrath et al., 2009). The polymorphism in npr-1 has very pleiotropic effects on the phenotypes of the animals; for example, it also affects ethanol tolerance (Davies et al., 2004) as well as pathogen susceptibility (Reddy et al., 2009).

While npr-1 is a major effect gene which has very pleiotropic effects on the behaviour and physiology of C. elegans, other genes with smaller effects have also been discovered for the phenotypes that it affects. For example, QTL mapping for the rate of food patch abandonment identifies a peak around npr-1, and finer mapping with near-isogenic lines (NILs) revealed that a second gene, tyra-3, also contributes to this trait (Bendesky et al., 2011). Tyra-3 contains a number of different polymorphisms, and cross-isolate rescue experiments showed that mutations in the promoter of tyra-3 were responsible for the difference in food patch leaving behaviour. Another example is that social aggregation behaviour is influenced by non-synonymous substitutions in exc-1 (Bendesky et al., 2012).

These represent some of the examples that experimental biologists have identified of how the variation which exists in a population affects gene function at the molecular or biological level. There are also a number of examples of fixed differences between species, particularly affecting developmental pathways, which I will survey in the next section.

1.3.2 Case studies from the evolution of development

Studies of the evolution of development have provided numerous examples to illustrate the evolution of gene function. This viewpoint attempts to understand how evolutionary change of developmental genes and pathways can lead to the wide diversity of form observable in nature.

One of the successful lines of inquiry in the field has been the evolution of pigmentation patterns in different species of Drosophila, for example at the yellow gene (Gompel et al., 2005). Yellow has pleiotropic functions and it is thought that protein coding mutations within the gene could be deleterious. Transcription factor binding sites present in D. biarmipes but not D. melanogaster drive yellow expression within the wing, which results in adult wing spots. This mechanism has been proposed to explain the diversity of wing spots that are observed throughout Diptera (Wittkopp et al., 2003). Evolution of gene function during the evolution of fly pigmentation patterns has occurred in many more genes such as tan (Jeong et al., 2008), bab (Williams et al., 2008), and ebony (Rebeiz et al., 2009).

Another example of changes in orthologous gene function is in the evolution of trichome patterns. Different species of Drosophila have lost dorsal trichomes, and a phylogenetic analysis shows this has occurred in three different lineages. The evolutionary loss of expression of Shavenbaby, a transcription factor which is genetically required for trichome development in D. melanogaster, is highly correlated with the loss of such trichomes in other species of Drosophila (Sucena et al., 2003). Furthermore, three enhancers were identified which control the expression of Shavenbaby in D. melanogaster, and the orthologous regions in D. sechellia showed modified expression patterns when tested transgenically (McGregor et al., 2007). Finally, based on interspecies recombinant mapping between D. sechellia and D. mauritiana, the researchers found that the phenotypic difference mapped to a region upstream of Shavenbaby which overlaps with these enhancers.

The clearest example of a change in gene function affecting developmental pathways in Caenorhabditis spp. is the change in salt tolerance due to evolution in the lin-48 gene. C. elegans has a much greater tolerance of high salt conditions than related species, and this change correlates with a difference in the morphology of the excretory duct (Wang and Chamberlin, 2004). Lin-48 mutations in C. elegans phenocopy the C. briggsae duct cell morphology and salt tolerance phenotypes, and rescue experiments with the C. elegans regulatory sequence combined with coding sequence from other species indicate that the evolution in this gene occurs because of changes in the promoter (Wang and Chamberlin, 2004). Promoter bashing experiments have detected a piece of DNA with several predicted ces-2 binding sites in the C. elegans promoter which are not present in the C. briggsae promoter, but transgenic experiments in C. briggsae suggest that changes in the trans expression environment are also important for the change in expression between species (Wang and Chamberlin, 2002).

A final case study that we will consider is the evolution of Pitx1 between marine and freshwater sticklebacks. A pelvic spine is present in marine stickleback species that has been reduced in several different populations of freshwater sticklebacks (Bell, 1987). This phenotypic difference is thought to be adaptive, due either to a change in the predatory environment (Reimchen, 1980) or to calcium availability (Giles, 1983). QTL mapping between marine and freshwater fish has found a large QTL which accounts for 13.5%-43.7% of the variation in pelvic size (Shapiro et al., 2004). Using a candidate gene approach, the authors identified Pitx1 as the causative gene behind the major QTL, given that it is known to have a role in hindlimb development in model vertebrate systems (Szeto et al., 1999). QTL mapping in other stickleback isolates has shown that Pitx1 is likely the causal gene in multiple populations which have lost pelvic spines (Coyle et al., 2007).

Different sticklebacks do not have non-synonymous changes in their Pitx1 coding sequence (Shapiro et al., 2004), and thus the functional difference is likely to be regulatory.
GFP promoter bashing experiments found a region upstream of Pitx1 that drives expression in the pelvic region and that is absent in stickleback isolates with reduced pelvic spines (Chan et al., 2010). Transgenic expression experiments using this enhancer and Pitx1 were able to create pelvic spines in pelvic-reduced isolates. Finally, the authors were able to find evidence of selection at this locus based on an excess of derived alleles. Evolution at the Pitx1 locus provides us with an excellent model for understanding how adaptive evolution can occur because of positive selection on the expression pattern of a single gene. It has shown us that in certain circumstances there are specific genes that are mutated to cause large phenotypic changes for the organism, and that such mutations can occur repeatedly when different populations exist in similar environments.

Sean Carroll has argued extensively that the evolution of gene expression is a major force in the evolution of development, because many of the developmental regulators controlling changes in form are highly pleiotropic and evolutionary changes in their regulatory regions allow for a fine tuning of their function without disrupting other biological processes (Carroll, 2008). There has been some pushback to the idea that regulatory evolution is the dominant mechanism in genomic adaptation (Hoekstra and Coyne, 2007), since there are several examples of protein coding evolution having a significant effect on the evolution of form. For example, in D. melanogaster Ultrabithorax has the capacity to repress Distal-less expression in the abdomen of flies to prevent legs from developing, but the homologous protein from the phylum Onychophora does not have this capacity (Grenier and Carroll, 2000). This difference arises from expansion of the QA motif next to the homeodomain (Galant and Carroll, 2002; Ronshaugen et al., 2002). Another example of protein evolution having an important role in the evolution of form is the MC1R gene. MC1R is a receptor for melanocortin, and mutations which reduce its function can explain part of the variation in pigmentation color in mice (Hoekstra et al., 2006).
The same gene is at least partially responsible for variation in pigmentation in reptiles (Rosenblum et al., 2004) and cavefish (Gross et al., 2009).

As with many competing models in the biological sciences, both sides have merit: regulatory and coding changes are both important evolutionary forces, and there are clear examples of each being required for adaptive evolutionary change.

1.4 Evolution of novel genes

Sequencing the yeast genome revealed that around one third of identified genes had no ortholog in any other known species and thus were called ‘orphan’ (Dujon, 1996). Orphan genes have been found in many different lineages; however, their definition is problematic because it depends on the density of published genome sequences. Consider that with the completion of genomes of other species in the Saccharomyces genus, many of the originally identified orphan genes were found to have homologs. For this reason, a more descriptive name was proposed: Taxonomically Restricted Gene (TRG), which reflects the fact that many genes exist in restricted lineages and are likely important for lineage specific biology, even if they exist in multiple genomes (Wilson et al., 2005). TRGs are somewhat enigmatic because their function is often unknown, but they could be very important for the evolution of lineage specific traits.

1.4.1 Evolution of gene function for taxonomically restricted genes

A current model for the evolution of development is that a core set of developmental genes is re-used in different ways to produce the wide array of animal forms we observe today. Early studies supported this by demonstrating that there had been little increase in the number of Hox genes from the onychophoran outgroup to the radiation of arthropod species (Grenier et al., 1997). This supported a model in which core developmental genes are reused through novel cis-regulatory modules (Carroll, 2008). Despite this, there are many examples of how novel genes can be important for species specific biology.

TRGs are hypothesized to play an important part in lineage specific biology, but they are very difficult to study because of the lack of homology to genes in other lineages. For example, the Sphinx gene has emerged within the past 2-3 million years in the Drosophila melanogaster genome from a retrotransposition of the ATPS-F gene that recruited another exon and intron to create an RNA gene (Wang et al., 2002). The function of this gene would remain opaque except that mutations in Sphinx cause defects in proper mating behavior, with an increase in male-male courtship behaviors (Dai et al., 2008). Another example of the biological importance of such genes comes from species of Hydra. Nematocytes are cell types specific to Cnidarians which are used to sting and capture prey; based on a large scale expression study of these cells in Hydra magnipapillata, 80% of genes expressed specifically in nematocytes were found to not have homologs outside the phylum (Hwang et al., 2007). A more specific example from Cnidarians is Hym301, which is found in H. oligactis and is expressed in 2 long tentacles which develop before the rest. Transgenic expression experiments in H. vulgaris phenocopy the developmental pattern, arguing for an important role in tentacle development (Khalturin et al., 2009).

There are a number of similar examples in model organisms. BSC4 has evolved recently in the S. cerevisiae genome and it is undetectable in related species of Saccharomyces by sequence searches or Southern blot. Based on synthetic lethal interactions and its coexpression with members of the DNA damage pathway, it is thought to be a novel member of that pathway (Cai et al., 2008). Another example is the mouse gene Poldi, which has arisen within the past 3.5 million years from non-coding DNA and is expressed in testis. Poldi knockout mice display a reduced sperm motility phenotype (Heinen et al., 2009). TRGs in the D. melanogaster genome also appear to be expressed preferentially in the testis and may play a role in the evolution of germ line specific functions (Levine et al., 2006).
Similar testis biased expression patterns for evolutionarily novel genes were identified in D. yakuba ESTs (Begun et al., 2007).

Recently evolved genes are also thought to be important for human evolution. By looking for H. sapiens genes which are not present in the chimpanzee genome and which are supported by proteomic databases, Knowles and McLysaght (2009) were able to detect three genes which are specific to the human genome. The sequence of the syntenic chimpanzee DNA is highly similar to these genes, but there are inactivating features, such as a premature stop, which prevent it from encoding genes. These inactivating features are found in other species such as gorilla and thus are ancestral; the derived removal of these inactivating features is required for the creation of these novel genes. Another example of a novel gene in the human genome is FLJ33706, which arose from non-coding DNA along with the insertion of an Alu sequence to form an RNA gene whose expression is enriched in the brain and which is associated with nicotine addiction in human populations (Li et al., 2010).

Novel genes are being created throughout evolution, and in some cases it is possible to understand the function they have acquired. Next I will consider what sort of mechanisms can create novel genes.

1.4.2 The evolution of novel genes by gene duplication

One significant source of novel genes is the process of gene duplication, which had been hypothesized to be of great importance long before genome sequences were available (Ohno, 1970). After many genomes were sequenced, scientists were surprised by how many duplicate genes were present: for example, nearly 10% of genes in C. elegans were found to be duplicates (Lynch, 2000). Duplicated genes can arise in tandem because of unequal crossing over during meiosis, which results in a duplicated pair sitting next to one another. An alternative mechanism is reverse transcription and transposition back into the genome. Since the processed gene copy is what ends up being duplicated, this method has been thought to produce pseudogenes. However, it has recently been discovered that reverse transcribed sequence can be used for more innovative forms of novel gene evolution, such as gene fusion (reviewed in Kaessmann (2010)).

Once gene duplicates are created there are two possible outcomes for the evolution of their function. One possibility is neofunctionalization, in which one copy retains the ancestral function while the other copy adopts a novel function. An example of this is the evolution of trichromatic vision in H. sapiens. The human red and green opsins are extremely similar in amino acid sequence and are thought to be tandem duplicates on the X-chromosome (Nathans et al., 1986). Despite the relatively small difference in amino acid sequence, the red and green opsin proteins have a difference in peak absorbance of about 30 nm (Bowmaker, 1981), which gives us the capacity to distinguish red from green.

An alternative model is that of subfunctionalization, in which a pleiotropic ancestral gene has its separate functions partitioned into different child genes. An example of this is ZAG1 and ZMM2 in maize, which are homologous to AGAMOUS in Arabidopsis, a gene involved in the development of both carpels and stamens (Coen and Meyerowitz, 1991). In maize this function appears to be subfunctionalized: ZAG1 is expressed in developing carpels, and knockout phenotypes confirm a necessary function in that tissue (Schmidt et al., 1993), while ZMM2 is highly expressed in stamens and likely required for the development of that tissue (Mena et al., 1996).

There are two models to explain how selection operates during subfunctionalization. The first argues for neutral evolution and is called the duplication, degeneration, complementation (DDC) model. Under this model, selection is relaxed on the duplicates and neutral mutations which degenerate different aspects of each duplicate can become fixed, until both are required for the full function of the ancestral gene (Force et al., 1999). Such a mechanism is thought to explain the subfunctionalization of ZAG1 and ZMM2 which I described above.

The alternative selective model for subfunctionalization of gene duplicates is the escape from adaptive conflict model.
In this model, a multifunctional protein can have each function optimized separately in the duplicated genes and such changes are selected for; it was originally proposed to explain an example from yeast (Hittinger and Carroll, 2007). In Kluyveromyces lactis (a yeast species which diverged before the whole genome duplication), GAL1 is a multifunctional protein which acts as a coinducer of the galactose pathway, as well as the first enzyme in the pathway, a galactokinase. In species which have experienced a whole genome duplication, such as S. cerevisiae, GAL1 has the galactokinase activity, while GAL3 has the coinducer activity. In S. cerevisiae there has been apparent adaptation of each gene for its specific function; GAL1 is now induced over 1000 fold in response to galactose, compared to 3-5 fold in K. lactis. This has occurred because of evolution of GAL4 binding sites in the promoters of these genes (Hittinger and Carroll, 2007). In this example the pleiotropic ancestral gene has had each of its functions optimized post duplication. There are two other known examples of this model: divergence of the dihydroflavonol-4-reductase gene into separate enzymatic activities in Ipomoea (Des Marais and Rausher, 2008), and the split of sialic acid binding activity and ice binding activity in the evolution of the antifreeze gene in an Antarctic zoarcid fish (Deng et al., 2010).

An intriguing mechanism with significant consequences for gene duplication is where the entire genome duplicates, producing a duplication of every single gene. The first serious evidence for such a model came with the sequence of the Kluyveromyces waltii genome, which contained sequences that were syntenic to two separate regions of the S. cerevisiae genome, unlike the 1-1 relationship found for genomes of most closely related species (Kellis et al., 2004).
In the post duplication lineage many duplicate pairs reverted to single copies, likely 50% before the divergence of S. castelli (Scannell et al., 2006). Yeast is not the only lineage in which there is evidence for whole genome duplication, and detailed analysis of the growing number of genomes is providing more evidence. There has been a duplication in the Arabidopsis lineage (Bowers et al., 2003), two rounds in the H. sapiens lineage (Dehal and Boore, 2005), and at least three rounds in the ciliate Paramecium tetraurelia (Aury et al., 2006). Currently it is thought that Drosophila and C. elegans have not experienced a whole genome duplication, but it is likely a recurring phenomenon throughout evolution.

1.4.3 The evolution of novel genes from non-coding DNA

Another model for the creation of genes is the transition of non-coding DNA into a bona fide gene by the acquisition of genic features such as transcription and translation start/stop sites. Searches of the yeast genome find 267,000 ORFs (start codon to in frame stop codon), far more than the roughly 6,000 legitimate genes. The “proto-gene” model argues that these ORFs exist in a large continuum of gene-like entities. There is some evidence from ribosome footprinting data that a subset of these proto-genes are translated, representing intermediates in the continuum between intergenic and genic sequence (Carvunis et al., 2012). The three novel genes in the human genome I discussed earlier were likely created through such a mechanism, since the crucial mutations resulting in them becoming genes have removed “disablers”, such as the elimination of a premature stop in C22orf45 (Knowles and McLysaght, 2009).

These different mechanisms for the evolution of novel genes have likely had different functional outcomes. When a gene evolves from gene duplication, it will contain domains which give it an established molecular function that it can evolve from; for example, FOG-2 emerged with an F-box domain (discussed in the next section) which allowed it to participate in protein-protein interactions while its derived biological function in sex development was being established. However, if a non-coding sequence turns into a gene then it is unlikely to have any existing protein domains in it, and no previously existing molecular function can form the base for its evolution.
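To make the ORF definition used above concrete (a start codon followed by the first in frame stop codon), the following is a minimal Python sketch of a six-frame ORF scan. It is only an illustration under stated assumptions: the function names and the `min_codons` threshold are hypothetical, and the criteria behind the 267,000 figure for yeast (for example any minimum length or overlap rules) are not given here, so this should not be read as the procedure used in the cited work.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=1):
    """List (start, end) of ORFs on one strand, scanning all three frames.

    An ORF runs from the first ATG seen in a frame to the first in-frame
    stop codon; `end` is exclusive and includes the stop codon.
    `min_codons` (a hypothetical threshold) counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new ORF at this start codon
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return orfs


def count_orfs_both_strands(seq, min_codons=1):
    """Count ORFs in all six reading frames (both strands)."""
    forward = find_orfs(seq, min_codons)
    reverse = find_orfs(reverse_complement(seq.upper()), min_codons)
    return len(forward) + len(reverse)
```

Even a permissive scan like this one shows why the number of ORFs far exceeds the number of annotated genes: any ATG followed by an in frame stop is counted, whether or not the sequence is transcribed or translated.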

1.5 Pathway evolution

1.5.1 Convergent evolution in the C. elegans and C. briggsae sex development pathways

Since genes fit into molecular pathways, we can also understand their evolution in terms of how they affect these pathways. An excellent example of this is sex determination in Caenorhabditis, since three separate species have made the transition to a hermaphrodite-male system from a male-female ancestor. This transition is thought to have occurred recently based on an analysis of when the relaxation in selection occurred in those species (Cutter et al., 2008).

In C. elegans males, male germ cell identity comes from the soma, which secretes HER-1; this binds to and inactivates the receptor TRA-2, which induces a signalling cascade to initiate sperm production (Kuwabara and Kimble, 1995). In order for an ancestral female to become a hermaphrodite, a pathway must evolve to downregulate TRA-2 in females. In C. elegans hermaphrodites TRA-2 is downregulated post-transcriptionally by the RNA binding protein GLD-1, which complexes with FOG-2 (Nayak et al., 2005). FOG-2 is an F-box protein which is found only in C. elegans, and thus its recent creation could be central to the mechanism by which C. elegans has become a hermaphrodite. In C. briggsae, evolution of hermaphroditism does not depend on fog-2 since it is not present in the C. briggsae genome. Instead, SHE-1, which is a different F-box protein, acts genetically upstream of tra-2, but does not physically bind to GLD-1 (Guo et al., 2009). Thus the mechanisms by which C. elegans and C. briggsae repress TRA-2 are distinct; the only common theme is the existence of an F-box protein.

The idea that the evolution of hermaphroditism only required the downregulation of tra-2 has been partially recapitulated by experimental biologists in the male-female species C. remanei. Here, RNAi of tra-2 produces pseudohermaphrodites in some animals (Baldi et al., 2009). However, such an evolutionary transition must have required changes in more genes, since such experiments also required inactivation of swm-1 in order to activate the sperm cells and produce viable hermaphrodites.

1.5.2 RNA interference and Argonaute proteins

RNA interference (RNAi) is a widely used technique to substantially reduce the expression of a gene of interest, simulating loss of function. It is of interest to this thesis because it is the central experimental technique that I use and because related pathways are experiencing ongoing evolutionary change in Caenorhabditis. For these reasons I will briefly discuss it here.

RNAi was originally observed in plants when researchers were attempting to increase the amount of pigmentation in Petunias by transgenically expressing chalcone synthase, a central gene in the pathway (Van Blokland et al., 1994). However, instead of increasing, the amount of pigmentation was reduced, a phenomenon termed co-suppression. Although the mechanism behind this was not understood at the time, it is now known to be caused by RNAi. In C. elegans RNAi was originally observed in the form of double stranded RNA complementary to the gene of interest, which when experimentally introduced produced a robust knockdown in expression level (Fire et al., 1998). The authors remarked that there was likely an amplification mechanism involved, since only a few molecules of RNA were required to elicit knockdown. Using this system, a number of genetic mutants were identified (Tabara et al., 1999), and this, along with the purification of Dicer from Drosophila S2 cells (Hammond et al., 2000), led to a characterization of the mechanism behind RNAi. In outline, double stranded RNA is cleaved into small segments of RNA by the Dicer complex, this signal is amplified by RNA-dependent RNA polymerases, and it is then used by the RNA induced silencing complex (RISC) to cleave complementary endogenous RNA molecules (Grishok, 2005).

The biological role of RNAi and Argonaute genes in C. elegans is thought to be an anti-viral or anti-transposable element system. There is only 1 virus known to infect C. elegans naturally, which is an RNA nodavirus, and critical RNAi machinery genes, such as rde-1, rde-2 and rde-4, are necessary for keeping the viral load and associated phenotypes low (Félix et al., 2011). There are 25 Argonaute genes in the C. elegans genome and many of these belong to the worm-specific ago (WAGO) family not found in other lineages (Buck and Blaxter, 2013). This diversification of Argonaute genes has led to their involvement in many different pathways (Yigit et al., 2006); for example, csr-1 is involved in chromosome segregation (Claycomb et al., 2009). Only a subset of these genes are conserved in other Caenorhabditis species, suggesting that they are a rapidly expanding family of proteins (Dalzell et al., 2011).

C. elegans has traits which make RNAi particularly useful for experimental biologists: it can take up double stranded RNA from the environment, and silencing signals will reach every tissue in the body (except sperm and neurons) once double stranded RNA is in the body wall cavity. This has led to the experimental technique of RNAi by feeding, which is the experimental basis of chapter 2 of my thesis. Several systemic RNA interference-deficient (sid) genes have been discovered which are required for these processes. The first characterized was sid-1, which encodes a transmembrane protein expressed on cell surfaces and required for uptake of the silencing signal in cells which are capable of RNAi; transgenic expression of sid-1 in neurons turns them from systemic RNAi resistant to systemic RNAi competent (Winston et al., 2002). A biochemical study of sid-1 expressed in Drosophila S2 cells led to a model in which sid-1 is a dsRNA gated, dsRNA specific channel which allows for passive diffusion (Shih and Hunter, 2011).
Another gene is sid-2, which is an intestinal gene required for uptake of double stranded RNA from the environment via the gut lumen (Winston et al., 2007). The ability to take up environmental RNAi signals is uncommon in Caenorhabditis; however, transgenic expression of C. elegans sid-2 can render some of these other species capable of environmental RNAi (Nuez and Félix, 2012). Two other pathway members required for the import of dsRNA into cells have been described: sid-3 is a cytoplasmic tyrosine kinase (Jose et al., 2012) and sid-5 is an endosome associated protein with a transmembrane domain (Hinas et al., 2012). Work to understand the rest of the pathway is still ongoing, but it provides an interesting example of a C. elegans pathway which has evolved recently, since homology searches do not detect these genes in more divergent species (Dalzell et al., 2011).

1.6 High-throughput approaches to genome evolution

1.6.1 How much of evolution can single genes explain?

Through case studies we are led to a picture whereby phenotypic differences between related species occur through changes in individual genes with substantial effects on the development or physiology of the organism. However, it is unclear if this is a general picture, since researchers are drawn to what is likely to give positive results: large effect QTLs. Consider for example the case of pelvic spine reduction in stickleback, which makes a very nice story about how selection at a single gene leads to adaptive evolution. Mapping experiments have identified a number of small effect QTLs in addition to the Pitx1 locus (Shapiro et al., 2004), so even in this case the story is not quite as simple as we would like to believe.

Rockman (2012) has argued that by focusing on large effect mutations to explain phenotype evolution, researchers have missed out on a large part of the picture, as there is no theoretical reason why evolution should proceed by changes in single large effect loci. Based on results from quantitative genetics, the genomic basis behind many different traits can be explained by a large number of small effect loci. For example, pooled QTL mapping experiments in yeast can identify upwards of 20 loci which can explain the majority of the variance behind chemical resistance (Ehrenreich et al., 2010). In addition, mapping loci for yeast growth under different environmental conditions yielded 591 QTLs for 46 traits, an average of nearly 13 per trait (Bloom et al., 2013), again explaining most of the variance in the trait.

Technology development in molecular biology is changing the perspective by giving us the ability to look at the function of large numbers of genes at once.
Different types of experiments allow the measurement of different facets of gene function: for example, it is now possible to measure on a genomic scale the expression levels of different genes (Hibbs et al., 2007), protein-protein physical interactions (Gavin et al., 2002), protein-DNA interactions (Martone et al., 2003), genetic interactions (Costanzo et al., 2010), and gene loss of function phenotypes (Kamath et al., 2003). Many groups have been comparing these measurements between related species in order to understand the evolution of gene function.

1.6.2 Gene expression divergence

The first technology that opened the door to a high-throughput look at the evolution of gene function was the microarray. One of the early studies to make use of microarrays to understand the evolution of gene expression looked at expression in several different tissues in humans, chimpanzees, orangutans and macaques, and found that the brain was highly enriched for expression differences in the human lineage (Enard et al., 2002). Follow up studies confirmed this general pattern by showing that human-specific expression differences are not observed in other tissues such as liver (Preuss et al., 2004). The brain genes which are upregulated in the human lineage come from a diverse set of functions such as regulation of transcription, signal transduction and lipid metabolism.

While there is substantial interest in identifying positive selection leading to gene expression divergence, it is likely that most transcriptome divergence is neutral or negatively selected. Khaitovich et al. (2004) found that expression divergence accumulated linearly with divergence time in primates, and that transcribed pseudogenes diverge at the same rate as real genes, suggesting that neutral evolution dominates the transcriptome. In contrast to this, a study in C. elegans compared expression divergence in mutation accumulation lines (where selection is inefficient) to natural isolates (where selection is efficient) and found that there was higher divergence in mutation accumulation lines, suggesting that negative selection dominates the transcriptome (Denver et al., 2005). Not all types of genes are equally constrained; it appears that signal transduction genes are under higher levels of negative selection than carbohydrate, amino acid and lipid metabolism genes.

One of the most intriguing results to come from applying microarrays to the study of gene expression divergence is the finding that most divergence results from changes in cis. S.
cerevisiae and S. paradoxus can hybridize, and using custom designed microarrays, Tirosh et al. (2009) compared allele specific expression between the two species and their hybrid to determine how much of the divergence occurs in trans and how much occurs in cis. They found that the majority of the change could be explained by changes in cis, and furthermore, that changes in cis were independent of the condition the yeast were grown in while changes in trans were dependent upon condition. In another study, expression was measured from a copy of human chromosome 21 carried in mouse hepatocytes, which confirmed this result by finding that the majority of the human expression pattern was recapitulated within the trans environment of the mouse cell (Wilson et al., 2008). These results support a model in which the majority of the evolution of gene expression levels occurs due to changes in cis regulatory sequence, such as transcription factor binding sites and sequence which influences promoter chromatin organization.

A model which was hypothesized prior to the development of high-throughput technologies is that genetic pathways should be most conserved at the middle stage of animal development, when the body plan of the animal is being established: the so called “hourglass” model (Kalinka and Tomancak, 2012). For example, observations of development in different nematode species have found that establishment of AP asymmetry and P granule segregation occur differently than in C. elegans, despite similar development after this point (Goldstein et al., 1998). After correcting for differences in the timing of developmental progression in species of Drosophila, it was found that gene expression divergence supports the hourglass model of development, with the highest level of conservation during the middle arthropod phylotypic period (Kalinka et al., 2010). Further results from zebrafish have confirmed this pattern, not by looking at average expression over all genes in the genome but by attempting to build relevant gene modules (Piasecka et al., 2013). The hourglass model has been supported at a molecular level in multiple phyla and using multiple methods; it is an excellent example of how a molecular finding can support previously conceived models in evolutionary developmental biology.

By using genomic technologies, evolutionary biologists have been able to understand aspects of what governs the evolution of gene expression between related species. Such genomic technologies are also useful in studying how the transcription factor binding sites which control gene expression levels change between related species, and I will survey some of this research in the next section.
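The logic of separating cis from trans divergence using a hybrid, as discussed earlier in this section, can be illustrated with a short Python sketch. In a hybrid both alleles share the same trans environment, so any difference in allele specific expression is attributed to cis, and whatever remains of the divergence between the parental species is attributed to trans. This is a commonly used decomposition and is shown only as an illustration: the function and argument names are hypothetical, the numbers are made up, and it is not necessarily the exact statistical procedure used by Tirosh et al. (2009).

```python
import math


def cis_trans_decomposition(parent_a, parent_b, hybrid_a, hybrid_b):
    """Split expression divergence of one gene into cis and trans parts.

    parent_a, parent_b: expression of the gene in each parental species.
    hybrid_a, hybrid_b: allele specific expression of each species'
    allele measured within the hybrid.
    All values are assumed to be positive, normalized expression levels.
    Returns (total, cis, trans) as log2 ratios, with total = cis + trans.
    """
    total = math.log2(parent_a / parent_b)  # divergence between species
    cis = math.log2(hybrid_a / hybrid_b)    # persists in a shared trans environment
    trans = total - cis                     # remainder attributed to trans
    return total, cis, trans


# Hypothetical gene: 4-fold difference between the parents,
# of which a 2-fold difference persists between alleles in the hybrid.
total, cis, trans = cis_trans_decomposition(400.0, 100.0, 200.0, 100.0)
print(total, cis, trans)
```

For this hypothetical gene half of the log2 divergence is assigned to cis and half to trans; a gene whose allelic ratio in the hybrid matched the parental ratio would be classified as diverging entirely in cis.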

1.6.3 Transcription factor binding divergence

The evolution of gene expression in cis is thought to be due to the evolution of transcription factor binding sites or of sequences controlling the organization of nucleosomes. Experimentally, transcription factor binding sites can be mapped using chromatin immunoprecipitation followed by microarray hybridization (ChIP-chip). One of the first studies to use such an experiment for an evolutionary comparison mapped the binding sites of the pseudohyphal regulators Ste12 and Tec1 in different yeast species (Borneman et al., 2007). The authors found that binding sites turn over far more rapidly than orthologous gene content, implying that such changes could be an important phenomenon in evolution.

Comparisons between multicellular organisms are more complicated due to the presence of multiple different cell types in differing ratios between species. However, the liver is an excellent tissue for comparison because it is very homogeneous in cell type. A study looking at the binding sites of 4 key liver transcription factors (FOXA2, HNF1A, HNF4A and HNF6) found something very similar to yeast: there is a very high level of change in the genes whose promoters are bound by these transcription factors (Schmidt et al., 2010). Furthermore, when these factors do bind to the same promoter, the sites often do not align, implying turnover of binding sites within genes.

Given how quickly transcription factor binding sites turn over, the question remains whether all of these changes result in corresponding changes in gene expression. If we examine yeast gene promoters with binding site divergence, there is no statistically significant increase in expression divergence (Tirosh et al., 2008).
However, when binding site turnover is examined for a single transcription factor (Ste12) in a biological context that is well understood (mating response), about half the divergence in gene expression can be explained by divergence in Ste12 binding (Tirosh et al., 2008). These results highlight the potential difficulty in understanding evolution from a functional genomics perspective, especially when transcription factor binding data are considered in aggregate.

Another model for studying the evolution of transcription factor binding sites is the development of the Drosophila blastoderm embryo, which is a conserved process throughout the genus. In comparing binding of core transcription factors (BICOID, GIANT, HUNCHBACK and KRÜPPEL) in the early developing blastoderm, it was found that only one third of ChIP-chip peaks were conserved between D. melanogaster and D. pseudoobscura (Paris et al., 2013), and that such peaks are changing far faster than gene expression levels. Another study which compared D. melanogaster to the more closely related D. yakuba found there were significant changes in quantitative binding, even when there were peaks at the same locus (Bradley et al., 2010).

Another force that could explain evolutionary turnover in gene expression is evolution of the chromatin environment around genes. Nucleosome occupancy was originally measured genome wide in yeast using micrococcal nuclease digestion followed by hybridization to a DNA microarray (Lee et al., 2007). Similar measurements in 12 Hemiascomycota yeast species suggest that the evolution of chromatin organization can affect the evolution of gene expression, since nucleosomes can obscure transcription factor binding sites (Tsankov et al., 2010). Another study looking at hybrid yeast found that the majority of divergence in nucleosome occupancy occurs in cis rather than in trans (Tirosh et al., 2010), similar to what is the case for expression levels (Tirosh et al., 2009).

Genomic technologies such as microarrays and high-throughput sequencing are greatly expanding our current views of evolution, moving us past the search for the single gene which underlies a specific trait, towards trying to explain how sets of genes and their biochemical activities can explain a given trait.

1.7 Developmental System Drift

I have discussed numerous examples of how a positively selected change in gene function can lead to adaptation, and I have also discussed how many polymorphisms do not affect gene function: they are selectively neutral and change in population frequency through the process of genetic drift. There is a third outcome, in which genome changes affect gene function, but due to strong stabilizing selection the phenotypic output of the genetic circuit remains conserved (Figure 1.1). This is known as Developmental System Drift (DSD) (True and Haag, 2001). It implies that molecular evolution is a continuously occurring process, even when species are in apparent periods of evolutionary stasis.

A significant mechanism of DSD is changes in transcription factor binding sites that maintain the gene expression output of the promoter/enhancer (Weirauch and Hughes, 2010). A good model for this is the stripe 2 enhancer of even-skipped in Drosophila, whose activation is controlled by 12 binding sites for Bicoid, Hunchback, Giant and Krüppel. Ludwig et al. (1998) found that the locations and numbers of these binding sites are not conserved in different species of Drosophila, but that those enhancers still produced the correct stripe 2 of even-skipped when tested transgenically in D. melanogaster. Furthermore, the locations and/or numbers of the binding sites within each species are critical, since a hybrid enhancer between D. melanogaster and D. pseudoobscura did not produce a functional stripe (Ludwig et al., 2000). Follow up studies in Sepsid flies had similar findings: the function of even-skipped enhancers is conserved despite widely divergent sequence and divergent binding site motifs (Hare et al., 2008). These data are consistent with the billboard model of enhancer function, in which each binding site or group of binding sites is read independently by the molecular machinery of the cell, and the sum of their effects determines activation or repression (Kulkarni and Arnosti, 2003).

Another example is unc-47, which has a conserved low level of expression in SDQR and SDQL neurons in both C. elegans and C. briggsae. The regulatory circuitry controlling this has diverged, as hybrid promoters between the species do not retain the conserved expression pattern (Barrière et al., 2012). Furthermore there is apparent co-evolution between the cis and trans regulatory machinery to maintain unc-47 expression, possibly due to stabilizing selection.

The evolution of post-translational gene regulatory mechanisms such as phosphorylation could also be explained by DSD. Consider as an example the cell cycle in related species of yeast, which is an extremely highly conserved biological process. CDK phospho-sites in the human or yeast ORC1 linker region are often disrupted by a mutation in the critical S/T or P residue in closely related species; however, there are often other possible sites in the same linker region, which could be explained by a turnover model (Moses et al., 2007).
Consistent with this, these phospho-sites exist in loose clusters along pre-RC proteins and these phosphorylation events do not induce allosteric changes in the conformation of these proteins; rather, the change of function occurs through bulk electrostatics, thus allowing the number and location of these sites to change without affecting the function of the interaction (Serber and Ferrell, 2007).

Experiments measuring RNA polymerase III binding to tRNA genes suggest that there can be little conservation at the level of the individual gene while conservation exists at a higher order level. Specifically, Pol III binding was divergent among ∼500 tRNA genes in 5 mammalian species, but more conservation was observed when the authors merged the tRNA genes into sets of aggregate anticodon isoacceptor or amino acid isotype (Kutter et al., 2011). These results are consistent with a changing binding profile among a redundant set of tRNA genes, as long as there is conservation of the levels of different tRNA bound amino acids available to the cell.

Another excellent example is the regulation of a-specific genes (asgs) in S. cerevisiae and C. albicans. In both species, asgs are transcribed in a cells and repressed in α cells, but while in C. albicans this occurs by activation of the asgs in a cells, in S. cerevisiae it occurs via repression of the asgs in α cells (Tsong et al., 2003). This difference exists because of evolution on the S. cerevisiae branch of the tree; the circuitry of C. albicans is ancestral. Molecularly, this difference exists for two reasons. First, there was the loss of the interaction between a2 and Mcm1, which controlled activation of asgs in a cells. Second, there was a gain of an interaction between the α2 protein and Mcm1, which led to repression of asgs in α cells (Tsong et al., 2006). In order to avoid a state without regulation of the asgs, there was likely an intermediate state with both activation and repression, which still partially exists in intermediate extant species (Baker et al., 2012).

Beyond gene expression, there are examples of molecular pathways which can be fairly divergent with little effect on the phenotype of the organism. In the nematodes C. elegans and C. briggsae, the pattern and timing of cell division and cell death are the same up until the 350-cell stage of embryogenesis (Zhao et al., 2008). However, the genetic networks that regulate this are not the same: knocking down the Wnt-pathway effector pop-1 by RNAi in C. elegans causes the MS cell to adopt an E cell like fate, whereas in C. briggsae it causes the E cell to adopt an MS cell like fate (Lin et al., 2009; Zhao et al., 2010).

Another example of pathway evolution with a conserved phenotype is the development of the vulva in C. elegans and Pristionchus pacificus. While in C. elegans vulval development is induced by EGF/Ras signaling from the anchor cell (with a minor role for Wnt signaling), in P. pacificus it is induced by the Wnt pathway from the anchor cell and gonad (Tian et al., 2008). This change in reliance on different pathways may have at least partially been driven by changes in protein architecture. In P. pacificus there are multiple SH3 interaction peptide motifs in the Wnt pathway gene lin-18 (Ryk), which are not present in C. elegans - these are thought to be negative regulators of the gene (Wang and Sommer, 2011).

Although these examples help demonstrate several mechanisms for DSD, it is currently not clear how promoters or molecular pathways can change from one state to another, because doing so would involve moving across a region of low fitness in which the molecular pathway would not produce the desired output. An attractive hypothesis is that there is a transition through a redundant intermediate such as an enhancer with multiple binding sites, or a developmental process with multiple parallel pathways or redundant genes (Haag, 2006).

1.8 How much of molecular function is noise?

In the previous section I discussed how the mechanics of transcription factors activating gene expression or molecular pathways can change without any effect on the phenotype of the organism. This raises questions about how much of the molecular function that high-throughput biologists are currently measuring has any effect on the phenotype of the organism at all.

The ENCODE project claimed that 80% of the human genome holds a function (Bernstein et al., 2012), in contrast to the current thought that it is mostly disposable (e.g. a megabase size gene desert can be deleted from the mouse genome without obvious phenotypes, Nóbrega et al. (2004)). Given ENCODE’s definition of function as a DNA segment which produces a transcript or has a biochemical signature such as protein binding or histone modification, the statement that 80% holds a function is technically correct. However, this has been criticized because only 10% of the human genome shows any evidence of negative selection, and thus there is no evidence that most of this biochemical function manifests itself in a phenotype which is detectable by evolution (Graur et al., 2013).

If we think about function from a gene centric rather than a DNA centric point of view then the molecular function of a transcription factor is the activation or repression of genes by binding to their promoters. When transcription factor binding sites were first mapped using ChIP-chip on the human genome it was estimated that there were over 10,000 binding sites - it has been suggested that at least some of these are likely non-functional binding events that do not activate or repress the transcription of any gene (Euskirchen and Snyder, 2004). Thus, many of these DNA elements may not appear to be functional when we consider their effect on gene expression.

Consider the example of pha-4, which is a FoxA transcription factor whose activity is necessary and sufficient for pharynx specificity in C. elegans (Horner et al., 1998). Microarray experiments have identified 240 genes which are expressed in the pharynx (Gaudet and Mango, 2002), but ChIP-Seq experiments have found 4800 genes whose promoters are bound by pha-4 (Zhong et al., 2010). These data are consistent with a model in which there is pervasive transcription factor binding across the genome due to a lack of selection against the random creation of spurious binding sites, and in which these spurious binding sites do not activate gene transcription.
In support of this, evolutionary comparisons of binding events of core developmental transcription factors in Drosophila species show that there is extremely little conservation of individual sites, and this rapid turnover likely indicates that many binding events are undetectable by evolution (Paris et al., 2013).

Much like how transcription factor binding sites change rapidly due to a lack of negative selection, it has been suggested that a lack of negative selection against non-functional protein-protein interactions (PPIs), particularly in species with a low effective population size where selection is ineffectual, has led to a number of such non-functional interactions in PPI networks (Levy et al., 2009). As evidence for this hypothesis, Landry et al. (2009) found that S/T sites with evidence for phosphorylation, but without evidence of a functional role for that phosphorylation event, evolve at the same rate as non-phosphorylated S/T residues. While the random birth of a non-functional PPI intuitively seems like it should be a rare evolutionary occurrence, groups have found that PPIs can be formed by mutation of a single residue or a handful of residues (Skerker et al., 2008; Grueninger et al., 2008).

Along with protein-DNA or protein-protein interactions, another example of a molecular event which may be widespread with little functional consequence is alternative splicing. High-throughput techniques have identified that up to 95% of genes in the human genome are alternatively spliced (Pan et al., 2008). However, it is apparent that only a subset of these events is conserved between related species; only 50% of human alternatively spliced exons are conserved in chimpanzees and only 20% are conserved in mouse (Barbosa-Morais et al., 2012). These results are consistent with a model in which a lack of negative selection against functionally irrelevant splice isoforms causes their proliferation, especially in species with very ineffectual selection such as humans.

It is difficult to convincingly show that a given molecular event does not have a phenotypic effect; however, our current understanding of the strength of selection suggests that many other molecular events are likely to represent evolutionary noise.
For example, neutral evolution could generate binding sites, which in turn lead to gene expression in new tissues, despite no functional importance in that tissue. It has been estimated that pol-II has only a 10^4-fold higher specificity for maximally active promoters than for average sequence (compare this to 10^8 for nucleotide addition during DNA replication), and based on the size of the yeast genome 90% of pol-II initiation events in yeast are transcriptional noise (Struhl, 2007). Consistent with this, genes which are expressed in a single tissue are rare in C. elegans (Hunt-Newbury et al., 2007).

If we consider evolution as the mechanisms which can change population allele frequency, and if we accept how ineffectual negative selection can be, particularly in the human genome, then we should also accept that there is a neutral proliferation of molecular functions which are irrelevant for the organism level phenotype. Evolution is not a process which generates optimal molecular pathways without any form of waste, but rather a process which creates and maintains pathways which are just barely good enough to get the population to the next generation.

1.9 Open questions and thesis goals

High-throughput technologies such as gene expression microarrays have been successfully used on related species for making evolutionary comparisons. However, there are other high-throughput technologies for looking at gene function which have not yet been used on closely related species to understand how gene function evolves. In particular, RNAi has been used extensively to measure biological gene function in C. elegans (Kamath et al., 2003), and recent developments which have made C. briggsae amenable to RNAi by feeding (Winston et al., 2007) have opened the door to a genome scale comparison of loss of function phenotypes in order to understand how biological gene function differs between species.

While C. elegans and C. briggsae are a good pair of species in which to investigate changes in gene function using RNAi by feeding for reasons that I have discussed, there do not appear to be many instances of adaptation between these species. They look nearly identical under the microscope, they have a highly conserved developmental lineage (Zhao et al., 2010), they are even capable of cross fertilizing one another although the embryos die (Baird et al., 1992), and they occur in an overlapping ecological niche (Félix and Duveau, 2012). These points are highly suggestive that strong stabilizing selection has acted over the past tens of millions of years to preserve the biology of these species, which can be explained by the model of DSD. Thus, a high throughput look at differences in biological gene function by using RNAi screening in the model pair of C. elegans and C. briggsae could shed substantial light on how much gene function changes during DSD - this is one of the goals I have for this thesis.

Another aspect of the evolution of gene function which is not well understood is how novel genes acquire functions after being born. There are some examples from the literature of an evolutionarily novel gene being created and then changing enough to acquire a biological function, for example Poldi has acquired a necessary function for sperm motility (Heinen et al., 2009). However, there has been no high-throughput look at which novel genes go on to develop essential functions for the organism. Given that large numbers of high-throughput datasets containing a large amount of information on gene function have been produced, it should be possible to use that information to predict which novel genes become essential, and to understand what kinds of functional features cause novel genes to become essential. This is the goal of the second section of my thesis.

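[Figure 1.1 schematic: an ancestral state (organism-level phenotype, molecular phenotype) undergoes genome evolution to a derived state with three possible outcomes. Adaptation: molecular phenotype changed, organism-level phenotype changed. DSD: molecular phenotype changed, organism-level phenotype same. Neutral: molecular phenotype same, organism-level phenotype same.]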

Figure 1.1: Possible outcomes of genome evolution. As a genome evolves, the accumulated mutations can be neutral, having no impact on the molecular phenotype (that is, the functions encoded in the genome and the ways that these are regulated), or they can lead to adaptation via changes in heritable phenotype due to changes in the molecular phenotype. Developmental System Drift (DSD) describes a third possibility: while the overall phenotype of the organism remains identical, the underlying genetic networks underpinning this phenotype have changed. A key outcome of this is that some orthologous genes play different in vivo roles in phenotypically identical, related species.

Chapter 2

Evolution of ortholog gene function in Caenorhabditis spp.

The work in this chapter was published in part in: Verster AJ, Ramani AK, McKay SJ, Fraser AG. (2014). Comparative RNAi Screens in C. elegans and C. briggsae Reveal the Impact of Developmental System Drift on Gene Function. PLoS Genetics, 10(2):e1004077

I performed all work associated with this chapter, with the following exceptions: Arun K. Ramani provided a second set of eyes in the manual RNAi screening, Sheldon J. McKay wrote the C. briggsae RNAi library primer design pipeline, and Andrew G. Fraser and Arun K. Ramani contributed to writing the text.


2.1 Abstract

Although two related species may have extremely similar phenotypes, the genetic networks underpinning this conserved biology may have diverged substantially since they last shared a common ancestor. This is termed Developmental System Drift (DSD) and reflects the plasticity of genetic networks. One consequence of DSD is that some orthologous genes will have evolved different in vivo functions in two such phenotypically similar, related species and will therefore have different loss of function phenotypes. Here I report an RNAi screen in C. elegans and C. briggsae to identify such cases. I screened 1333 genes in both species and identified 91 orthologs that have different RNAi phenotypes. Intriguingly, I found that recently evolved genes of unknown function have the fastest evolving in vivo functions and, in several cases, I identify the molecular events driving these changes. I thus find that DSD has a major impact on the evolution of gene function and I anticipate that the C. briggsae RNAi library reported here will drive future comparative functional genomics screens in these nematodes.

2.2 Introduction

As genomes evolve, new genes are born and older genes may adopt novel functions, fuse, or disappear altogether. What are the phenotypic consequences of this continual molecular change? One striking consequence of the evolution of genomes is adaptation: novel genetic variants can underpin the evolution of novel organism-level phenotypes such as new anatomical structures or behaviors and, if these result in improved fitness, these can become fixed in the population through selection. At the molecular level, such novel organism-level phenotypes can arise through the evolution of entirely novel biochemical activities such as novel genes, new protein domains, or new classes of functional RNAs: for instance, metazoan genomes encode classes of proteins that are absent from single-celled eukaryotes and that participate in metazoan-specific processes (e.g. netrins in axon guidance, immunoglobulins and MHC subunits in the immune system). New organism-level phenotypes can also result from the rewiring of already existing activities such as the shuffling of existing domains into novel combinations (e.g. the rapidly evolving architectures of chromatin regulators, Lander et al. (2001)) or through changes in the regulation of expression of otherwise conserved genes - for example, evolution of lin-48 expression affects salt tolerance in C. elegans (Wang and Chamberlin, 2004), evolution of the yellow gene alters wing spots in different Drosophila species (Gompel et al., 2005), and evolution at the Pitx1 locus causes adaptive loss of pelvic spines in sticklebacks (Chan et al., 2010). Adaptation is dependent on changes in the molecular phenotype of the organism - the functional activities encoded by the genome and the way they are regulated - which result in selectable changes in the phenotype of the organism.

At the other end of the spectrum from adaptation is neutral drift. Many genomic changes have no impact on the phenotype of the organism since they do not have any impact on the molecular phenotype, that is, on the functions encoded in the genome and their precise regulation. Such changes are therefore under no selection - while they may disappear or become fixed in a species, neither outcome is a consequence of their effect on phenotype. All changes in organism-level phenotype (such as those that result in adaptation) are thus underpinned by changes in molecular phenotype and, conversely, genomic changes that do not affect molecular phenotype cannot alter organism level phenotype and are therefore neutral. However, there is a third outcome, a phenomenon known as Developmental System Drift (DSD) (True and Haag, 2001).
In DSD two related species share an identical organism-level function that was also present in their last shared ancestor; however, since the species diverged, the genetic networks that underpin this function have drifted. Unlike in classical drift, molecular change in DSD is under strong stabilizing selection to preserve the phenotype of the organism. In DSD then, the molecular phenotype has changed, while the organism-level phenotype has remained unaltered; this is a reflection of the plasticity of genetic networks.

One effect of the changes in molecular phenotype that accompany DSD is that some orthologs evolve different roles in related organisms - these will therefore have different loss of function phenotypes. Knowing the entire set of orthologous genes that have different loss of function phenotypes in two related species that have very similar phenotypes would provide a global view of how gene function can drift while maintaining the same organism level phenotype - this is my goal here. Specifically, by examining how DSD affects gene function in a systematic manner, I would like to examine whether the in vivo function of certain classes of genes evolves faster than others and begin to explore the molecular changes which underpin the types of changes in gene function that nonetheless preserve the same overall organism-level phenotype.

C. elegans and C. briggsae are both free-living hermaphroditic nematodes that share the same ecological niche (Félix and Duveau, 2012). Their anatomical structures are strikingly similar and, up to the 350-cell stage of embryogenesis, the lineages and timings of cell division are nearly identical (Zhao et al., 2008). However, their genomes have diverged significantly in the ∼20 million years since they last shared a common ancestor (Cutter, 2008): only ∼60% of their genes have 1-1 orthologs, with many species-specific expansions, losses, and chromosomal rearrangements (Stein et al., 2003). There is already good evidence that while C. elegans and C. briggsae have very similar biology, the genetic networks that control this are not the same, since while they can fertilize each other, the resulting interspecific hybrids die as embryos (Baird et al., 1992). More specifically, a small number of genes are also known to play very different roles in otherwise identical processes - for example, while early embryogenesis is identical in both species, knocking down the Wnt-pathway effector pop-1 by RNA-mediated interference (RNAi) causes opposite cell fate transformations in the two nematodes (Zhao et al., 2010; Lin et al., 2009). Thus, while many of the organism-level phenotypes are highly conserved between these two worms, the genetic networks underpinning these functions may have diverged considerably. A systematic comparison of loss of function phenotypes between orthologous genes in these two related nematodes might thus shed light on how DSD affects gene function.

RNAi-based screens have been used extensively in C. elegans to identify the in vivo (i.e. organism-level) functions of each gene (Fraser et al., 2000; Kamath et al., 2003; Sönnichsen et al., 2005; Gönczy et al., 2000). However, no analogous screens have been carried out in C. briggsae. In this chapter, I describe the construction of a C. briggsae RNAi library of 1333 dsRNA-expressing bacterial strains analogous to the well-characterized C. elegans RNAi library (Fraser et al., 2000; Kamath et al., 2003) - feeding any single bacterial strain to C. briggsae targets a single C. briggsae gene. The genes targeted in the library are the great majority of the C. briggsae 1-1 orthologs of C. elegans genes that have a well-characterized RNAi phenotype (see Methods). Comparing the RNAi phenotypes of the C. briggsae gene with the RNAi phenotypes of its C. elegans ortholog thus allows identification of orthologs that have different loss of function phenotypes in these two worms, indicating that they play different roles in the development and function of these anatomically highly similar animals.

In this section of my thesis I report the construction of a C. briggsae RNAi library and a screen to identify orthologs that have different RNAi phenotypes in C. elegans and C. briggsae. These data indicate that while these two species have very similar morphology and behavior, many orthologous genes have different in vivo functions, suggesting that DSD has a major impact on the evolution of gene function.
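As a minimal sketch of the comparison described above, the following Python snippet takes a list of 1-1 ortholog pairs and the set of RNAi phenotypes scored for each gene in each species, and reports the pairs whose phenotype sets differ. The gene names, the phenotype labels and the function name `differing_orthologs` are hypothetical placeholders chosen for illustration; they are not data or code from the screen itself.

```python
def differing_orthologs(ortholog_pairs, ce_phenotypes, cb_phenotypes):
    """Return ortholog pairs whose RNAi phenotype sets differ between species.

    ortholog_pairs: iterable of (C. elegans gene, C. briggsae gene) tuples.
    ce_phenotypes / cb_phenotypes: dict mapping gene name -> set of phenotype labels.
    A gene missing from a dict is treated as having no detectable phenotype.
    """
    differences = []
    for ce_gene, cb_gene in ortholog_pairs:
        ce = set(ce_phenotypes.get(ce_gene, ()))
        cb = set(cb_phenotypes.get(cb_gene, ()))
        if ce != cb:
            differences.append({
                "pair": (ce_gene, cb_gene),
                "only_in_elegans": sorted(ce - cb),
                "only_in_briggsae": sorted(cb - ce),
            })
    return differences


# Hypothetical toy input (placeholder names and labels, not real screen data).
pairs = [("ce-gene1", "cb-gene1"), ("ce-gene2", "cb-gene2")]
ce = {"ce-gene1": {"embryonic lethal"}, "ce-gene2": {"sterile", "slow growth"}}
cb = {"cb-gene1": {"embryonic lethal"}, "cb-gene2": {"slow growth"}}

result = differing_orthologs(pairs, ce, cb)
print(result)
```

A set comparison of this kind only flags that the scored phenotypes differ. It does not by itself distinguish a weaker phenotype from a qualitatively different one, which would need separate handling.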

2.3 Results

2.3.1 Construction and screening of the C. briggsae RNAi library

RNAi is an extremely powerful tool for examining gene function in C. elegans (Fire et al., 1998). RNAi allows the knock-down of any gene in vivo and thus can be used to rapidly identify the role any gene plays in the development and function of the worm, that is, its organism-level function. In C. elegans, RNAi can be induced by feeding worms with bacteria expressing dsRNA complementary to a gene of interest (so-called RNAi by feeding, Timmons and Fire (1998); Kamath and Martinez-Campos (2001)) and a library of dsRNA-expressing bacteria has been constructed that allows the researcher to individually target over 80% of all predicted C. elegans genes (Fraser et al., 2000; Kamath et al., 2003). I wished to construct an analogous library for C. briggsae and use it to compare RNAi phenotypes of orthologous genes between species.

Constructing and screening a genome-scale RNAi library for C. briggsae is a huge undertaking. Since my principal goal was to identify genes that have different RNAi phenotypes in C. elegans and C. briggsae, the great majority of genes would be uninformative since they have no readily detectable RNAi phenotype in either worm (∼85% of genes have no readily detectable phenotype in C. elegans (Kamath et al., 2003), and this is likely to be broadly similar in C. briggsae). I thus decided to construct a library targeting only the set of 1437 C. briggsae genes that are direct 1-1 orthologs of genes among the 1640 previously shown to have a robust, readily detectable RNAi phenotype in C. elegans (Kamath et al., 2003) (see Methods, Figure 2.1A). Although this excludes a small number of genes that have no apparent phenotype in C. elegans but that have a phenotype in C. briggsae, this set will nonetheless cover the great majority of genes that have phenotypes in C. briggsae. I made the library according to the same design principles as the C. elegans RNAi library (Fraser et al., 2000; Kamath et al., 2003), and as far as possible targeted the region of the C. briggsae gene orthologous to that targeted by the C. elegans RNAi fragment (Figure 2.1B). In total, I was able to construct targeting strains for 93% (1333) of the 1437 targeted genes in C. briggsae (Methods, Figure 2.1A).

The central goal of this project is to compare the loss of function phenotypes of orthologous genes in C. elegans and C. briggsae - accurate identification of orthologs is thus critical. I initially used InParanoid 6.1 (Berglund et al., 2008) to identify putative 1-1 orthologs - these candidates are similar to candidates that would be identified using reciprocal BLAST, and this is a reasonable place to start. To increase my confidence that the identified putative orthologs are indeed likely to be true orthologs, I carried out three sets of additional tests. First, I determined whether there are additional closely related genes in either genome, in which case orthology can be harder to assign, or whether the putative orthologs appear to be the sole related gene in either genome, in which case orthology is fairly unambiguous. For example, K04G7.1 and CBG16609 are reciprocal best hits and have a BLAST E-value of 0 in either direction; in C. briggsae, the next closest BLAST hit is CBG20138, with an E-value of 8×10^-4, and in the other direction, the next closest C. elegans hit is C01H6.2 with an E-value of 4.3. When the difference in E-value between the best and next closest hit was greater than 20 on a log10 scale I called the pair unambiguous; 72% of my ortholog pairs fall into this class. Second, I checked whether the ortholog pairs identified via InParanoid, a graph-based method, were also identified using a tree-based method, which is a very different and complementary approach (Kuzniar et al., 2008). In this case I used TreeFam (Li et al., 2006) and I found that 90% of my putative orthologs are identified as orthologs in TreeFam. Finally, I used synteny to resolve harder assignments. Alignments of the C. elegans and C. briggsae genomes indicate that considerable proportions of these genomes are syntenic (Stein et al., 2003), that is, many segments can be identified in which gene order has been preserved in both species since the last common ancestor. Synteny can be used to aid in identification of likely true orthologs in complex cases (e.g. large families of closely related genes, or cases where orthologs have diverged greatly). I was able to find evidence of upstream or downstream synteny in 87% of cases. Together these results suggest that my ortholog identification is correct in the great majority of cases - 72% are unambiguous, and a further 27% of the putative orthologs can be confirmed either through TreeFam or synteny - thus 99% of my orthologs can be confirmed by other complementary approaches. These data are all summarized in Table 2.1.

To screen the C. briggsae library, I followed an identical screening protocol to that used in the first genome-scale screens in C. elegans (Fraser et al., 2000; Kamath et al., 2003) and assessed the same developmental and morphological phenotypes (see Methods for a complete list). However, while wild-type C. briggsae is capable of RNAi when the dsRNA is delivered by injection, RNAi by feeding is ineffective, at least in part because of the inability of the C. briggsae SID-2 to actively take up dsRNA (Winston et al., 2007). This defect can be rescued by transgenic expression of C. elegans sid-2 (Nuez and Félix, 2012), however, and thus all my screening was carried out not in wild-type C. briggsae but in a transgenic line expressing C. elegans sid-2. I note that this could produce some false positive results due to genetic interactions in the background I am using, such as synthetic lethality with the expression of SID-2, but this is likely to be only a minority of cases.

To identify genes with different phenotypes in C. elegans and C. briggsae, I not only compared the phenotypes in C. briggsae to previously published data for C. elegans (Kamath et al., 2003) but I also screened C. elegans side by side with C. briggsae as shown schematically in Figure 2.2A. The RNAi phenotypes of each pair of orthologs were compared in the two species at two time points by two independent observers; three C. elegans replicates and six C. briggsae replicates were examined in any single experiment.
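The ortholog checks described earlier in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used in the thesis: the function names, the requirement that the E-value gap hold in both BLAST directions, and the floor used for E-values reported as 0 are all assumptions made for the example.

```python
import math

# Assumption: an E-value reported as 0 is clamped to this floor so its log10 is defined.
EVALUE_FLOOR = 1e-180


def log10_evalue(evalue):
    """Return log10 of a BLAST E-value, clamping values below EVALUE_FLOOR."""
    return math.log10(max(evalue, EVALUE_FLOOR))


def is_unambiguous(best, next_best, min_gap=20.0):
    """True if the next-best hit is more than min_gap log10 units worse than the best hit."""
    return log10_evalue(next_best) - log10_evalue(best) > min_gap


def ortholog_tier(forward, reverse, in_treefam, has_synteny):
    """Classify a putative 1-1 ortholog pair.

    forward / reverse: (best E-value, next-best E-value) for each BLAST direction.
    Assumption: both directions must pass the gap test to count as unambiguous.
    """
    if is_unambiguous(*forward) and is_unambiguous(*reverse):
        return "unambiguous"
    if in_treefam or has_synteny:
        return "confirmed"
    return "unconfirmed"


# The K04G7.1 / CBG16609 example from the text: best E-value 0 in both directions,
# next-best hits at 8e-4 (CBG20138) and 4.3 (C01H6.2).
# The TreeFam and synteny flags are placeholders; they are not consulted for this pair.
tier = ortholog_tier((0.0, 8e-4), (0.0, 4.3), in_treefam=False, has_synteny=False)
print(tier)
```

Under these assumptions the example pair passes the gap test in both directions, while pairs that fail it fall back on the TreeFam and synteny checks.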
Any genes showing differences were retested in an independent experiment, and genes where I detected a different phenotype in at least 3 out of 4 observations between the 2 observers and 2 experiments were considered as potential hits. Based on these criteria, I examined the loss of function phenotypes of 1333 orthologous genes by RNAi in C. elegans and C. briggsae and identified 679 orthologs that have different phenotypes in the two species (Figure 2.3A).

There are two major sources of false positives in this initial screen, which I try to deal with using secondary filters and rescreening. The first source of false positives in my primary screen is that RNAi is more efficient in the transgenic SID-2-expressing line of C. briggsae than in C. elegans - I generally get a stronger RNAi knockdown of C. briggsae genes as measured by qPCR (see Figure 2.4). Many genes thus have stronger RNAi phenotypes in C. briggsae (e.g. ytk-6 has a growth defect in C. elegans but is completely sterile in C. briggsae) but this does not reflect any true difference in in vivo function. To partly test this idea, I tested a 111-gene subset of the 508 genes that have a stronger phenotype in C. briggsae using the lin-35(n745) C. elegans strain, which has increased RNAi efficiency compared with wild-type C. elegans. I found that a substantial proportion of these genes (36%; 40/111) also have stronger phenotypes in lin-35(n745) worms than in wild-type C. elegans, which provides some support for the view that the stronger phenotypes seen for many genes in C. briggsae may be due to a higher level of knockdown in C. briggsae than in C. elegans. Crucially, however, this increased RNAi efficiency in C. briggsae means that in the cases where the RNAi phenotype is weaker in C. briggsae, this is unlikely to be due to a weaker knockdown in C. briggsae (given that a majority of a tested set of genes showed a stronger knockdown in C.
briggsae as shown by qPCR in Figure 2.4), and more likely reflects a genuine difference in the in vivo function. I thus focus the rest of this chapter on studying genes whose phenotypes are weaker in C. briggsae than in C. elegans and exclude all genes that had stronger RNAi phenotypes in C. briggsae from any downstream analysis.

The second source of false positives is that some of the C. briggsae RNAi library clones do not produce adequate knockdowns in C. briggsae - these genes will thus appear to have weaker phenotypes in C. briggsae than in C. elegans. To address this, I made independent RNAi clones targeting a different region of the gene to that used in the
primary screen (where possible) and screened these. I re-examined the RNAi phenotypes of all 204 genes that had weaker phenotypes in C. briggsae in this way and found that 91 genes still showed reproducibly weaker phenotypes in C. briggsae with the independent clones (final breakdown of hits is shown in Figure 2.3B, genes are shown in Table 2.2). I note that while rescreening with independent targeting clones is fairly rigorous, it is still possible that both independent clones failed to generate good knockdown in C. briggsae. To assess how often this may happen, I used qPCR to examine levels of knockdown in C. elegans and C. briggsae for genes that have weaker phenotypes in C. briggsae - of the 8 genes examined, 7 showed similar or stronger knockdown in the SID-2-expressing transgenic C. briggsae than in C. elegans (and thus are true positives) and only a single example had weaker knockdown in C. briggsae. This last example, tsr-1, is a false positive in my dataset. I thus estimate that around 80-90% of my hits are true positives, but acknowledge that a few rare examples are false positives due to poor knockdown in C. briggsae.

As a final confirmation of the differences in RNAi phenotype seen using the manual phenotyping described above, I retested 50 of the hits from my manual screen and a random subset of 324 additional genes using a fully automated phenotyping method (shown schematically in Figure 2.2B). This is highly complementary to manual screening. The manual screening described above has many advantages - multiple time-points are examined, many phenotypes are scored at once and, for the purposes of this screen, it allowed me to assess RNAi phenotypes in C. briggsae using the exact same methodology used for the initial screens in C. elegans. One disadvantage, however, is that it is not fully quantitative and this affects sensitivity in two ways.
Firstly, there is a limit to what the eye can detect at high throughput: while differentiating between a sterile worm and one with a normal brood size is trivial, it is hard to tell the difference between a worm that has 50% of normal brood size and one that has 35% of normal brood size. Secondly, different worm strains and especially different worm species do not grow identically. The C. briggsae sid-2-expressing transgenic line that I use for all my experiments grows slightly more slowly than N2, and this inherent difference in growth rate can make identification of subtle differences in phenotype more difficult. For these reasons, I also carried out a fully automated quantitative screen using a commercially available worm sorter (Union Biometrica) which addresses both the issues of sensitivity and normalization for different growth of the two species. In outline, RNAi experiments are set up in liquid culture in 96-well format. At the start of the experiment, each well contains a saturated culture of dsRNA-expressing bacteria and 10 L1 worms; phenotypes are examined after 96 hours by which time, in a normally growing culture, the initial L1 animals have grown to fertile adults, laid the next generation, and these will have hatched. Using the worm sorter, I quantify the number of worms in each well, as well as the sizes and optical densities of each worm in each well. These data allow me to precisely measure brood size as well as identify differences in growth rate, body size, and embryonic lethality (see Methods for more details of the analysis). Crucially, by comparing the phenotypes seen after targeting a specific gene with phenotypes of worms growing in bacteria expressing a control non-targeting dsRNA, all phenotypes are normalized for any inherent differences in worm growth between the two species.
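The normalization idea described here can be illustrated with a minimal Python sketch. It is not the analysis pipeline used in this work (see Methods for that); the function name and all worm counts below are hypothetical and chosen only to show how expressing each targeting well relative to the non-targeting control of the same species removes inherent growth differences between the two species.

```python
from statistics import mean


def relative_brood(target_counts, control_counts):
    """Mean worm count in targeting-RNAi wells as a fraction of the
    mean count in non-targeting control wells of the same species."""
    return mean(target_counts) / mean(control_counts)


# Hypothetical worm counts per well after 96 hours (not data from this study).
ce_control = [210, 195, 205]   # C. elegans, non-targeting dsRNA
ce_target = [60, 55, 65]       # C. elegans, gene-specific dsRNA
cb_control = [150, 160, 140]   # C. briggsae, non-targeting dsRNA
cb_target = [120, 130, 125]    # C. briggsae, gene-specific dsRNA

ce_rel = relative_brood(ce_target, ce_control)
cb_rel = relative_brood(cb_target, cb_control)

# A positive value means the brood-size defect is weaker in C. briggsae
# once each species has been normalized to its own control.
difference = cb_rel - ce_rel
print(f"C. elegans: {ce_rel:.2f}, C. briggsae: {cb_rel:.2f}, difference: {difference:.2f}")
```

Because each species is compared only with its own control wells, the slower growth of the transgenic C. briggsae line does not by itself register as a phenotype.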
Using this pipeline, I confirmed statistically significant differences in phenotype for 26 of the 50 tested manual phenotyping hits; 21 showed brood size differences and a further 5 showed differences in growth rate or embryonic lethality (see Methods for data processing details). I failed to see differences in phenotype for 24 - the majority of these show subtle phenotypic differences (e.g. cuticle defects, or movement defects) that are not readily detectable in the sorter and I believe this explains the difference in the two assays. Finally, I note that I see an additional 57 genes having significantly different effects on brood size in these two species using the automated pipeline, suggesting that the true number of genes with different phenotypes in these two species is significantly greater than was detected by manual phenotyping, which has few false positives but a
substantial false negative rate.

In summary, I constructed an RNAi library targeting 1333 C. briggsae genes. I used this library to compare the RNAi phenotypes of orthologs in C. elegans and C. briggsae using manual phenotyping and identified 91 genes that have different RNAi phenotypes in these two species that are likely to be due to a genuine difference in their in vivo function. The majority of these differences could be confirmed by a quantitative phenotyping method designed specifically to measure differences in brood size, lethality, and growth rate. This list of genes undoubtedly has some false positives due to inadequate RNAi knockdown in C. briggsae (e.g. the example of tsr-1 above, or pal-1, which has a detectable embryonic lethal phenotype in C. briggsae when using RNAi by soaking (Winston et al., 2002) but has no phenotype in my screen) - however, my qPCR analysis suggests that only ∼15% of my reported hits are such false positives and thus that the great majority of my hits are true positives. The rest of this chapter is concerned with examining this set of genes to explore the molecular changes that underlie this difference.
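The hit-calling rule and the headline proportions of the screen can be restated as a short Python sketch. The function name and the example observation vector are illustrative assumptions; the counts (1333 genes screened, 91 final hits, 1 of 8 qPCR-tested hits with weaker knockdown) are those reported in the text.

```python
def is_potential_hit(observations, min_calls=3):
    """Apply the '3 out of 4' rule: observations holds four booleans,
    one per observer (2) per independent experiment (2), each True when
    that observer scored a different phenotype in the two species."""
    if len(observations) != 4:
        raise ValueError("expected 2 observers x 2 experiments = 4 observations")
    return sum(observations) >= min_calls


# Hypothetical scoring of one ortholog pair.
example_call = is_potential_hit([True, True, False, True])

# Proportions derived from the counts reported in the text.
hit_rate = 91 / 1333             # final hits among genes screened
false_positive_estimate = 1 / 8  # qPCR: one of eight tested hits had weaker knockdown

print(example_call)
print(f"hit rate: {hit_rate:.1%}, estimated false positives: {false_positive_estimate:.1%}")
```

With only eight genes tested by qPCR, the false positive estimate is necessarily rough, which is why the text quotes a range rather than a single figure.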

2.3.2 Genes with different phenotypes are enriched for transcription factors and recently evolved novel genes

I identified 91 genes that have a different RNAi phenotype in C. elegans and C. briggsae - I refer to these from here on as the ‘Different Function’ genes. To begin to understand why these ‘Different Function’ genes have such differing in vivo roles, I initially assessed whether this set of genes was enriched for any specific molecular functions. I annotated genes into the functional categories previously used by Kamath et al. (2003) and found that transcription factors and genes of unknown function are enriched among the ‘Different Function’ genes, while genes involved in protein synthesis are under-represented (Figure 2.5A, p < 0.01, Hypergeometric test). This indicates firstly that the basic machinery of the eukaryotic cell has changed very little in organismal function over time and, secondly, suggests that transcription factors have more rapidly evolving
organismal roles than other classes of gene. These two findings are unsurprising. The individual genes that encode components of the basic eukaryotic cell machineries (e.g. DNA replication, transcription, translation etc.) are essential in organisms as divergent as worms and yeasts (Kamath et al., 2003; Tischler et al., 2006), so finding great similarity in these genes between two related species is expected. Likewise, transcription networks are well known to be extremely plastic across evolution (Tsong et al., 2006) and thus finding an enrichment of transcription factors in the set of genes with different in vivo functions in C. elegans and C. briggsae is not unexpected. However, the finding that genes with different phenotypes are enriched for genes of unknown function is intriguing since many of these unknown function genes are nematode-specific (see analysis below). This suggested that more recently evolved genes may have the most rapidly changing in vivo roles and I examined this further.

To investigate more closely whether there was any correlation between the evolutionary age of a gene (i.e. when any such gene arose de novo from non-coding sequences) and the likelihood that it had a different in vivo function between C. elegans and C. briggsae, I carried out a phylogenetic analysis for each gene screened and dated the emergence of these genes to their last common ancestor using a method similar to the phylostratum approach (Domazet-Lošo and Tautz, 2010; see Methods). I found that the more recently a gene has arisen, the more likely it is to have a different phenotype between C. elegans and C. briggsae.
Ancient genes (those that I was able to date to the emergence of the Opisthokont lineage) are the least likely to show a difference in phenotype (<5%, p < 0.01, Hypergeometric test, Figure 2.5B) while extremely recently evolved genes (those which date to the emergence of the Caenorhabditis genus) are the most likely (>15%, p < 0.01, Hypergeometric test, Figure 2.5B), suggesting that phylogenetically novel genes have a high rate of evolution of their in vivo functional roles. These bulk analyses thus reveal that just as changes in transcriptional networks and the ‘invention’ of entirely novel classes of gene are major forces driving the evolution of novel organismal functions in adaptive evolution (for example, Dai et al., 2008), these classes of gene are also those whose in vivo functions evolve fastest during DSD.
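The hypergeometric enrichment test used in this section can be sketched in a few lines of Python using only the standard library. This is an illustrative implementation, not the code used for Figure 2.5. The population size (1333 genes screened) and sample size (91 ‘Different Function’ genes) come from the text, whereas the category size and the number of hits in the category are hypothetical placeholders.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from a
    population of N genes, K of which belong to the category of interest."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


N_SCREENED = 1333   # genes screened (from the text)
N_HITS = 91         # 'Different Function' genes (from the text)
category_size = 100     # hypothetical: genes in one functional category
hits_in_category = 20   # hypothetical: 'Different Function' genes in that category

expected = N_HITS * category_size / N_SCREENED
p_enrichment = hypergeom_upper_tail(hits_in_category, N_SCREENED, category_size, N_HITS)
print(f"expected by chance: {expected:.1f}, observed: {hits_in_category}, p = {p_enrichment:.2e}")
```

Under-representation, as reported for protein synthesis genes, would be assessed with the complementary lower tail, P(X <= k), in the same way.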

2.3.3 Changes in gene function during DSD are often the result of promoter evolution

I found that the 91 genes that have significantly different in vivo functions in C. elegans and C. briggsae are enriched for both transcription factors and for recently evolved genes of unknown function. However, this does not tell us why they have different in vivo functions (and thus different RNAi phenotypes). There are three possible reasons that orthologous genes could have a different RNAi phenotype in C. elegans and C. briggsae: they might encode the same molecular function but be expressed in different tissues; the coding sequences might have diverged such that they have different molecular functions; or, while the orthologs are functionally identical both in terms of expression and encoded functions, changes in some other genes may have altered the level at which these orthologs are required in these two worms. I examined each possibility in turn.

I initially focused on testing whether genes with different RNAi phenotypes in C. elegans and C. briggsae might have different expression patterns in these two species. This could be due to many different levels of gene regulation from transcriptional to post-transcriptional and translational control - for the purposes of these analyses, I focused on transcriptional control of gene expression since this is a major step of regulation of gene expression. In outline, I used PCR stitching (Hobert, 2002) to generate pairs of constructs in which either the promoter of the C. elegans gene drives GFP expression or the syntenic region of the orthologous C. briggsae promoter drives expression of mWormCherry. In this way, I could make C. elegans worms transgenic for both constructs and rapidly identify cells that were either exclusively GFP or mWormCherry positive, indicating that the C. elegans and C. briggsae orthologs might be expressed in different cell types. In any cases where I found differences in C. elegans, I repeated the experiment in C.
briggsae to test whether any differences in tissue expression were due to evolved changes in the promoters or any changes in trans-acting factors.

Since it would have been an impractical amount of work to do this analysis for all 91 ortholog pairs, I focused my effort on examining the expression patterns of the ‘Different Function’ genes of unknown function that are uniquely found in nematode genomes since this class of gene was enriched in my dataset. I analyzed expression patterns for 12 such worm-specific ‘Different Function’ genes; in addition, to sample other gene classes, I examined expression patterns of 10 random ‘Different Function’ genes in my dataset. I identified 3 worm-specific orthologs, C03D6.1, K04G7.1, and C27F2.7, that had clearly visible differences in expression pattern between C. elegans and C. briggsae (Figure 2.6A-C); in addition, one gene in my random set, sac-1, also had a different expression pattern in the two species (Figure 2.6D). In all four cases this was due to differences in the promoter and not to differences in trans-acting factors since the expression patterns seen in C. elegans could be faithfully recapitulated in C. briggsae (all data shown in Figure 2.6 are expression patterns in transgenic C. briggsae). Crucially, in all four cases, the difference in expression pattern is likely to explain the difference in phenotype since the tissue expression in C. briggsae, where the phenotype is weaker, is a restricted subset of the tissue expression in C. elegans. For example, C03D6.1 has a strong growth defective RNAi phenotype in C. elegans and is expressed in the gut, the hypodermis, and a small number of tail cells; in C. briggsae, where its expression is restricted to only a handful of cells in the tail, it has no obvious phenotype at all. These data strongly suggest that the reason for the differences in RNAi phenotypes between C. elegans and C.
briggsae for the four genes examined here is that they are expressed in a very different set of tissues in these two animals, leading to a differential requirement for these genes for organismal viability.

To test this prediction directly, I took a cross-species rescue strategy. In outline, I examined the ability of a set of transgenes (shown schematically in Figure 2.7A) to rescue the phenotype of a null C. elegans mutant and designed these to be able to test which parts of the C. elegans and C. briggsae genes are functionally interchangeable - the promoter, the coding region, neither, or both. Of the four orthologs that I could have tested, there was only a suitable null mutant for one of these, sac-1, and I focused my attention on this gene. I found that transgenic expression of the C. elegans sac-1 ORF under control of the C. elegans sac-1 promoter gives robust rescue of the growth arrest phenotype of C. elegans homozygous for the null allele sac-1(ok1602), but that the C. briggsae sac-1 ORF under control of the syntenic region of the C. briggsae sac-1 promoter does not, indicating that these genes have indeed functionally diverged. When I used hybrid rescue constructs, I found that while the coding sequences are apparently functionally interchangeable, the promoters are not: only the C. elegans promoter drives expression in the correct tissues to rescue the sac-1(ok1602) phenotype (Figure 2.7B). These data show that at least in the case of sac-1 the difference in RNAi phenotype in C. elegans and C. briggsae is entirely due to promoter evolution.
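The logic of the cross-species rescue strategy can be written out as a small decision function. This Python sketch assumes a set of four constructs (each species' promoter paired with each species' ORF), which is my reading of the design summarized in Figure 2.7A; the function name, the dictionary layout, and the wording of the returned labels are illustrative assumptions.

```python
def interpret_rescue(rescue):
    """Interpret rescue of a C. elegans null mutant by four constructs.

    rescue maps (promoter_species, orf_species) -> True/False, where
    'Ce' is C. elegans and 'Cb' is C. briggsae.
    """
    if not rescue[("Ce", "Ce")]:
        return "inconclusive: the C. elegans control construct does not rescue"
    if rescue[("Cb", "Cb")]:
        return "promoter and coding region are both interchangeable"
    cb_orf_works = rescue[("Ce", "Cb")]       # Cb ORF driven by Ce promoter
    cb_promoter_works = rescue[("Cb", "Ce")]  # Ce ORF driven by Cb promoter
    if cb_orf_works and not cb_promoter_works:
        return "promoter has diverged; coding region is interchangeable"
    if cb_promoter_works and not cb_orf_works:
        return "coding region has diverged; promoter is interchangeable"
    if not cb_orf_works and not cb_promoter_works:
        return "neither promoter nor coding region is interchangeable"
    return "each part works in a hybrid but not in combination"


# Pattern matching the sac-1 result described in the text.
sac1 = {
    ("Ce", "Ce"): True,
    ("Cb", "Cb"): False,
    ("Ce", "Cb"): True,
    ("Cb", "Ce"): False,
}
sac1_result = interpret_rescue(sac1)
print(sac1_result)
```

The first branch reflects the need for a working positive control: without robust rescue by the fully C. elegans construct, differences between the hybrid constructs cannot be interpreted.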

2.3.4 Ortholog pairs encoding more divergent protein sequences are more likely to have different RNAi phenotypes

I examined the expression patterns of 22 pairs of orthologs that have different RNAi phenotypes in C. elegans and C. briggsae and found that 4 of these have obviously different expression patterns, suggesting that promoter evolution underlies the differences in in vivo function that I observe for these genes. However, as shown in Figure 2.7A, differences in in vivo function might also be due to evolution of coding sequences - if the C. elegans and C. briggsae orthologs encode different enzymatic activities, for example, this could result in different in vivo functions. Using a similar hybrid transgene rescue strategy to that for sac-1 above, I tested whether coding sequences of bli-4, bli-5, vha-5, flr-1, sma-3, and sem-5 were functionally interchangeable, or whether there was evidence that they had evolved functional differences. I selected these 6 genes since for each gene
there was a null allele available in C. elegans that had a readily detectable phenotype; for most genes, there was no null allele available at the time, and thus I could not carry out similar tests for most of my dataset.

I found no clear examples where the difference in RNAi phenotype of orthologs in C. elegans and C. briggsae could be conclusively shown to be due to evolution of coding sequences. However, I only tested a very small number of cases and, in many of these cases, I failed to get strong enough rescue of the null phenotype by transgenic expression of the C. elegans coding sequence under the C. elegans promoter to allow me to distinguish between the ability of different hybrid transgenes to give different levels of rescue. These are therefore inconclusive experiments and, as more null alleles are being generated, it will be interesting to revisit this. I note however that bulk analyses of the protein sequences encoded by C. elegans and C. briggsae orthologs indicate that divergence of protein sequence between orthologs does appear to correlate with the likelihood that orthologs have different RNAi phenotypes. I compared the proteins encoded by orthologous ‘Different Function’ genes in C. elegans and C. briggsae and found that the ‘Different Function’ genes have drifted slightly more in sequence than the ‘Same Function’ genes, as would be expected if changes in protein function have in part driven the evolution of different organismal functions for these genes. I found that the alignable regions are more divergent (as measured by the Ka or Ka/Ks metrics; see Figure 2.8A,B, p < 0.01, Mann-Whitney U test) and that both the number and the total length of non-alignable regions are slightly increased (Figure 2.8C,D, p < 0.01, Mann-Whitney U test). This is consistent with a model in which drift in the proteins encoded by orthologous genes might contribute to DSD, but this effect is modest at this level of bulk analysis.
It is nonetheless predictive: the orthologs that differ most in sequence are substantially more likely to have different RNAi phenotypes than more similar orthologs and this is shown in Figure 2.8E. I thus find that the greater the divergence in protein sequence between orthologs,
the greater the likelihood that they will have different in vivo functions, as identified by different RNAi phenotypes. However, I have no conclusive evidence to show that this is causative rather than correlative: it could simply be that genes with differing in vivo roles have more rapidly diverging coding sequences and this is still an open question from my data.
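The Mann-Whitney U comparison used for the sequence divergence metrics can be sketched with the Python standard library. This is a simplified illustration (normal approximation, no tie or continuity correction), not the statistical code behind Figure 2.8, and the Ka values below are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p-value) using midranks for ties and
    the normal approximation without tie or continuity correction."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    return u_x, erfc(abs(z) / sqrt(2))


# Hypothetical Ka values for two small groups of ortholog pairs.
ka_different = [0.12, 0.15, 0.20, 0.22, 0.30]
ka_same = [0.05, 0.07, 0.08, 0.10, 0.11]

u_stat, p_value = mann_whitney_u(ka_different, ka_same)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

A rank-based test is a natural choice here because divergence metrics such as Ka and Ka/Ks are typically skewed, although a significant difference between the groups does not by itself distinguish cause from correlation, as noted above.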

2.3.5 Orthologs may have different organismal roles due to changes in other genes

I tested whether changes in RNAi phenotype might be due either to changes in gene expression or to changes in the molecular functions of the encoded protein. I identified four genes with a different RNAi phenotype between C. elegans and C. briggsae which is likely to be due to changes in promoter sequence and for one of these, sac-1, I showed that to be the case. In addition, given the increased protein divergence between orthologs that have different RNAi phenotypes in the two worms, it appears that many of the molecular events that lead to changes in the level of requirement for a specific gene are likely to be linked to changes in the gene itself, either in its promoter or in its coding region. As shown in Figure 2.7A, there is a final possibility: that orthologs in the two species might encode identical proteins and be expressed in an identical manner, yet still have very different RNAi phenotypes due to changes in other genes that alter the level at which the orthologs are required. In such cases, both the coding regions and the regulatory sequences are functionally interchangeable between the orthologs, but the RNAi phenotypes in the two species still differ. Similar cross-species transgenic approaches have been used to great effect between C. elegans and C. briggsae. For example, a similar cross-species rescue experiment has been used to show that the different RNAi phenotype of gld-1 lies in the overall genetic context of C. elegans and C. briggsae and not in the molecular function of gld-1 (Beadell et al., 2011; Liu et al., 2012) and careful analysis of unc-47 has revealed extensive compensatory evolution in the regulation of gene expression in these two species
(Barrière et al., 2011, 2012).

I found two examples of orthologs that have differing in vivo functions in C. elegans and C. briggsae due to changes in other genes. bli-4 and bli-5 act together to regulate molting and have very different phenotypes in the two species studied - for example, bli-5 has a strong blistering phenotype in C. elegans but not in C. briggsae (Figure 2.9A,B). bli-4 encodes a subtilisin-like serine protease (Peters et al., 1991) whereas bli-5 encodes a Kunitz-family serine protease inhibitor thought to act with BLI-4 (Page et al., 2006). Given that these genes are hypothesized to act together to affect cuticle development, I wondered whether the difference in requirement for these two genes in C. elegans and C. briggsae might not be due to independent functional changes in bli-4 and bli-5, but to changes in the requirement for this entire pathway between the two worms due to changes in other genes. Using transgenic rescue experiments I found that both the coding sequences and the promoters of C. elegans and C. briggsae are functionally interchangeable for both bli-4 and bli-5: expression of the C. briggsae bli-5 coding region under control of the C. briggsae bli-5 promoter gives as robust rescue of the C. elegans bli-5(e518) null mutant as expression of the C. elegans bli-5 coding region under control of the C. elegans bli-5 promoter (Figure 2.9C); the same is true for rescue of the C. elegans bli-4(e937) mutant by C. briggsae bli-4 (Figure 2.9D). Thus, at least in these two cases, I have found examples where the difference in the RNAi phenotype for orthologs in C. elegans and C. briggsae is not due to any difference in the genes themselves, but rather in the level of requirement for the pathway in which the genes act.

2.3.6 Conservation of function can be maintained at the level of gene family and not gene family members

In the case of bli-4 and bli-5 above, these genes have differing RNAi phenotypes in the two species studied because of changes elsewhere in the genetic networks of these worms but one cannot trivially pinpoint these other changes. However, for a subset of genes
with differing phenotypes one can make an educated guess - the set of genes that are members of multigene families. In these cases, it is possible that both worms have an essential requirement for a specific gene activity but that this is carried out by different members of the same gene family in the two worms. Although I have not followed this in depth, I have data that are consistent with this. I first examined all 91 C. briggsae genes that had a weaker phenotype and searched for related genes in the C. briggsae genome (see Methods for details) that might instead be carrying out the required molecular function. If this is indeed the case, these related genes would be expected to have a stronger phenotype in C. briggsae. There are 49 genes with a weaker phenotype in C. briggsae for which I was able to find one or more related genes in the C. briggsae genome that might have a similar molecular function. When I compared RNAi phenotypes in C. elegans and C. briggsae for these related genes, I found 5 examples where the C. briggsae gene has a stronger phenotype than the C. elegans gene (Figure 2.10). The numbers of genes I examined here (5 out of 49 examples) are too low to support a statistical analysis of these findings; rather, they are exploratory. For example, rsp-3 is an SR protein which is 100% embryonic lethal in C. elegans, but not in C. briggsae, while rsp-6, a different SR protein, is 100% embryonic lethal in C. briggsae but not in C. elegans. The family of rsp genes is known to have multiple functional overlaps in C. elegans (Longman et al., 2001; Kawano et al., 2000) and I suggest that not only is this true in C. briggsae but, crucially, that the relative importance of each family member differs in the two species. This is consistent with a model in which both C. elegans and C. briggsae require a specific molecular function, but that this function can be carried out by different members of the same family of genes in the two species.
In summary, I have generated an RNAi library targeting 1333 C. briggsae genes; each targeted gene is the direct ortholog of a C. elegans gene known to have a clear detectable RNAi phenotype. I screened for genes that have major differences in in vivo function between the two nematodes but clearly many more refined RNAi screens are
possible using this reagent and I anticipate that the availability of my library will help drive progress in this area of comparative evolutionary development. I identified 91 genes with obviously different in vivo functions and examining these genes reveals key features of the molecular events driving the changes in gene function that accompany DSD. In more focused studies, I showed that multiple genes with different in vivo functions have evolved different expression patterns and, in the case of sac-1, I showed that promoter evolution is indeed the cause of the change in in vivo function. This is only one example and I anticipate that my dataset, along with the RNAi library itself, will provide a rich source of other future detailed studies to pinpoint the molecular causes of the changes in in vivo function that I observe.

2.4 Discussion

C. elegans and C. briggsae are phenotypically extremely similar. They live in the same ecological niche (Félix and Duveau, 2012), they have near-identical development (Zhao et al., 2008), and are sufficiently morphologically close that they can be crossed and can fertilize each other (Baird et al., 1992). The resulting interspecies hybrids are not viable, however, indicating that while the biology of these two nematodes is nearly identical, the molecular pathways that underpin this conserved biology have diverged substantially, a phenomenon termed Developmental System Drift (DSD) (True and Haag, 2001). One of the consequences of DSD is that some orthologous genes play different in vivo roles in the two species and thus their loss of function phenotypes will be different.

My goal in this study was to investigate the consequences of DSD on gene function in C. elegans and C. briggsae. Rather than examine one specific process in great detail, as has been done successfully before in these two species (Félix, 2007; Pénigault and Félix, 2011; Hoyos et al., 2011), I chose instead to carry out a broad screen to identify as many cases as possible of genes that have different in vivo roles due to DSD and hence to gain insight
into the following questions. How many genes have changed their in vivo roles as these species diverged? Is this common or extremely rare? Do specific classes of genes change more frequently? Finally, can I identify any common features in the molecular events that underlie the changes in gene function that I identify? Addressing these questions gives insight both into how great an impact DSD has on the evolution of gene function and into how gene functions evolve during DSD. I used RNAi to target over 1300 genes in both C. elegans and C. briggsae. Each of these genes has a readily detectable RNAi phenotype in C. elegans and thus I could identify genes whose RNAi phenotypes (and hence whose in vivo functions) differ between these two species as the result of DSD. Using a manual phenotyping method designed to screen for a broad range of phenotypes, I identified 91 orthologs that have obviously different RNAi phenotypes in these two species (the ‘Different Function’ genes). In parallel to this, I also screened 374 genes using an automated quantitative phenotyping method which allows detection of more subtle differences in brood size and growth rate.

This more sensitive assay identified significant differences in phenotype for ∼21% of genes. Taken together, I estimate that over 25% of genes have different in vivo functions in C. elegans and C. briggsae as the result of DSD. I note that this estimate is likely to be a substantial underestimate of the true rate at which gene functions are diverging during DSD for several reasons. Firstly, while I tried hard to eliminate false positives from my dataset, both through multiple rounds of rescreening and by re-designing additional RNAi clones for each potential hit, I have little means to estimate my false negative rate. This is likely to be significant: the screen was carried out at high throughput, the phenotypes examined were fairly crude and, at least in the case of the manual phenotyping, differences needed to be quite large for me to detect. All these factors will result in false negatives and thus the proportion of the genes that I screened which have truly different phenotypes is almost certainly higher than I report here. Secondly, because of the difference in RNAi efficacy in the two species, I
could only detect biologically meaningful differences in RNAi phenotype if the phenotype was weaker in C. briggsae than in C. elegans. In all likelihood, there are as many genes that have a weaker phenotype in C. elegans than in C. briggsae as vice versa, I just cannot identify them in my screen. Finally, I screened an extremely selectively chosen gene set i.e. the set of genes that have a readily detectable RNAi phenotype in C. elegans (<15% of all C. elegans genes (Kamath et al., 2003)) and that also have a 1-1 ortholog in

C. briggsae. While only ∼60% of genes have a 1-1 ortholog in C. elegans and C. briggsae

(Stein et al., 2003), my gene set is extremely highly conserved: ∼90% have 1-1 orthologs between these two species. Furthermore, many of the genes I screened are known to be functionally conserved over extremely long evolutionary distances: for example, 60% of the genes giving lethal or sterile phenotypes in C. elegans are also essential for viability in S. cerevisiae (Tischler et al., 2006). The set of genes I screened are likely to be the most functionally conserved between C. elegans and C. briggsae of any genes in the genome. Taking this all together, my finding that during DSD over 25% of these have evolved different functional roles in the two species is surprisingly high and suggests that DSD has a major impact on the evolution of gene function. What are the underlying molecular causes of the differences in gene function that I observe as differences in RNAi phenotype? I found that three main types of molecular events explain many of the changes in gene function that I identified. Firstly, I found multiple examples in which orthologous genes that have different RNAi phenotypes also have different in vivo expression patterns. I examined the expression patterns of 22 such ‘Different Function’ genes in both species and find that 4 have a clearly different expression pattern in C. elegans and C. briggsae that is entirely due to promoter evolution. In all four cases, the species in which the RNAi phenotype is less penetrant (and thus the species which has a lower requirement for the function of that ortholog) is also the species in which the expression is far more restricted, suggesting that the difference in phenotype might indeed be explained by the difference in expression. Chapter 2. Evolution of ortholog gene function in Caenorhabditis spp. 60

In one case, sac-1, I tested this explicitly and showed that this is indeed true. Gene expression, as a result of promoter evolution, thus plays a significant role in the way genes change in vivo functions during DSD.

Secondly, certain types of gene are more likely than others to evolve different in vivo functions as a result of DSD. While most of the core conserved components of the eukaryotic cell (the ribosome, the proteasome etc.) tend to have the same functions in both species, transcription factors and recently evolved genes of unknown function often have different phenotypes. In the case of transcription factors, this result is perhaps expected: transcriptional networks are known to be extremely plastic and can rewire extensively while still having similar outputs and responses (Baker et al., 2012). For the recently evolved genes, however, this is intriguing. None of them have orthologs outside nematodes, indeed many are specific to Caenorhabditis species, and few have any functional annotation. Why should a gene that is absolutely essential for C. elegans viability be more likely to be dispensable in C. briggsae if it evolved recently than if it is an ancient gene? What essential roles do these novel genes play in nematode biology and why do they seem to be changing so rapidly? Some carefully dissected examples already exist, such as fog-2 and she-1 in the independent evolution of hermaphroditism in these two species. fog-2 is a recently evolved gene which has evolved a specific function in sperm development in C. elegans, while the non-orthologous F-box protein she-1 plays the same role in C. briggsae (Clifford et al., 2000; Nayak et al., 2005; Guo et al., 2009). The roles of such novel recently evolved genes in nematode biology and evolution are intriguing open questions that will require extensive follow-up studies.

Finally, my data suggest that the individual members of multigene families frequently adopt different in vivo roles during DSD. There are often multiple redundancies among members of gene families and I suggest that this results in the requirement for any single family member being extremely fluid over time. For example, there are well described redundancies in the SR family of splicing regulators in C. elegans (Longman et al., 2001; Kawano et al., 2000). I found that while rsp-3 is essential for viability of C. elegans, targeting rsp-3 in C. briggsae has little effect; conversely, targeting rsp-6 in C. briggsae has a strong RNAi phenotype, but in C. elegans rsp-6 has no obvious phenotype. In this example, while both worms require an rsp activity, in C. elegans the essential rsp is rsp-3 whereas in C. briggsae it is rsp-6, and I suggest this is a common feature of drift in gene function during DSD.

I note that all three key molecular drivers of gene functional change during DSD - changes in gene expression, the rapid evolution of novel genes, and subfunctionalisation among related family members - are also central molecular drivers of changes in gene function that result in adaptation (Chan et al., 2010; Dai et al., 2008; Hittinger and Carroll, 2007). One explanation for this is that DSD and adaptation are unrelated and unlinked phenomena: for example, some evolved alterations in gene expression have advantageous phenotypic outcomes while others have no impact on phenotype, and neither set of changes has any influence on the other. While this is completely plausible, there is an alternative view: that the molecular events underpinning the changes in gene function in DSD and in adaptation are so similar because DSD and adaptation are intimately linked evolutionary phenomena. One possible conceptual model for a link between DSD and adaptation comes from detailed studies of in vitro molecular evolution (Fontana and Schuster, 1998; Schuster and Fontana, 1999). In these studies, the evolution of a new phenotype (in this case, a new fold or activity) is rarely the result of a single adaptive mutation alone. Rather, a series of phenotypically neutral mutations (the molecular equivalent of DSD) results in a derived molecule that is phenotypically indistinguishable from the ancestor, but that is different with respect to its evolvability.
While a final adaptive mutation results in a new adaptive phenotype in the derived molecule, making the same mutation has no effect on the phenotype of the ancestral form. The derived and ancestral molecules are thus functionally equivalent, but a single base change has radically different phenotypic consequences for adaptation in these two molecular species. In this way, at least at the level of adaptation of in vitro molecular phenotypes, neutral drift and adaptation are often intimately linked.

I speculate that DSD and adaptation might be linked in an analogous manner at the level of whole organism phenotypes. While the widespread changes in gene function that occur during DSD do not appear to have any direct impact on phenotype, they might have profound consequences for the effect of additional subsequent changes. The effect of DSD, viewed in this way, is that while two species such as C. elegans and C. briggsae are phenotypically extremely similar at present, the possible evolutionary trajectories of the two species are very different, since the phenotypic outcomes of identical molecular changes can be very different in the two animals. Changes in gene function that would be deleterious in C. elegans might have no effect in C. briggsae (e.g. mutation or change in gene expression of sac-1) or, at the limit, might confer a selective advantage that would drive adaptation. This idea of a potential link between DSD and adaptation is still speculative, but the finding I report here that similar molecular events underlie the evolution of gene function in both processes is consistent with this notion.

In summary, then, I used RNAi to identify genes with different in vivo functions in two extremely phenotypically similar nematode worms, C. elegans and C. briggsae. This study is the first systematic survey of the outcome of DSD on the in vivo functions of orthologous genes in any closely-related animal species and my data suggest that DSD has major consequences for the evolution of gene function. I anticipate that the dataset from my RNAi screen will help to drive deeper characterization of the molecular events underlying DSD and, just as the public availability of the C. elegans RNAi library was key for the systematic analysis of gene function in C. elegans, so the availability of the C. briggsae RNAi library will drive extensive comparative screens in these two related nematodes.

2.5 Methods

2.5.1 Construction of the C. briggsae RNAi Library

I used InParanoid 6.1 (Berglund et al., 2008) to identify C. briggsae genes that are putative 1-1 orthologs of C. elegans genes with a reported RNAi phenotype (Kamath et al., 2003). To further validate these ortholog assignments, I also used orthology assignments from TreeFam, which use phylogenetic relationships, as well as synteny to resolve complex ortholog assignments. To design the C. briggsae clones, I used BLAST to identify the region of the C. briggsae genome orthologous to that targeted by the C. elegans RNAi clone and used this as a seed region. Predicted clones that had at least 80% identity over 200bp to additional C. briggsae genes were eliminated as having potential off target effects and manually redesigned. Secondary clones were designed by hand according to the principles above and were targeted to a separate group of exons from the first clone I used. For cloning I digested L4440 with EcoRV (Fermentas) and then dephosphorylated it with Shrimp Alkaline Phosphatase (Fermentas). PCR products were amplified from AF16 genomic DNA using Pfu (Fermentas) and then phosphorylated with Polynucleotide Kinase (NEB) for blunt end cloning. The vector and PCR products were ligated together overnight and then transformed into HT115 bacteria. The colonies were screened using a T7 colony PCR, positives were reassembled into the correct locations in 96 well plates, and these were finally verified using an insert specific colony PCR.
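The off-target filter described above can be sketched as follows. This is a minimal illustration, assuming the BLAST hits of each candidate clone against C. briggsae genes are already available as simple records; the `Hit` record, the function name and the clone identifiers are hypothetical, and only the 80% identity and 200bp thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    clone: str           # candidate RNAi clone identifier (hypothetical naming)
    gene: str            # C. briggsae gene matched by the clone sequence
    pct_identity: float  # percent identity of the alignment
    aln_length: int      # alignment length in bp


def off_target_clones(hits, intended, min_identity=80.0, min_length=200):
    """Return clones with a qualifying hit to a gene other than the intended target."""
    flagged = set()
    for h in hits:
        is_other_gene = h.gene != intended[h.clone]
        if is_other_gene and h.pct_identity >= min_identity and h.aln_length >= min_length:
            flagged.add(h.clone)
    return flagged
```

Clones returned by `off_target_clones` would be the ones sent for manual redesign; how the hits are produced (BLAST program and parameters) is not specified in the text and is left outside the sketch.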

2.5.2 Manual screening of the C. briggsae RNAi Library

Caenorhabditis species were maintained by feeding on OP50 on NGM plates at 20 ◦C. Screening was done on 12 well agar plates as previously described (Kamath et al., 2003). I screened for a list of visible phenotypes which have been previously reported (Kamath et al., 2003), listed here: Emb (embryonic lethal), Ste (sterile), Stp (sterile progeny), Gro (slow post-embryonic growth), Lva (larval arrest), Lvl (larval lethality), Adl (adult lethal), Bli (blistering of cuticle), Bmd (body morphological defect), Clr (clear), Dpy (dumpy), Egl (egg-laying defective), Him (high incidence of males), Lon (long), Mlt (moult defect), Muv (multivulva), Prz (paralysed), Pvl (protruding vulva), Rol (roller), Rup (ruptured), Sck (sick) and Unc (uncoordinated). Each ortholog pair was screened by 2 people in 2 fully independent experimental set-ups on separate weeks. My confidence score is the number of observations of a phenotype difference out of 4 possible observations. Genes with at least 3 out of 4 observations of a different phenotype in the two species were potential hits and were tested in secondary screens. For these, I designed additional RNAi clones which targeted a different region of the C. briggsae gene where possible and screened these secondary RNAi clones in an identical way to the first screen. Genes were called as final hits if I saw a consistent phenotype difference using both the primary and secondary RNAi clones.
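The confidence score and the threshold for potential hits can be written down directly. The sketch below is illustrative only: the function names are hypothetical, and each observation is assumed to be recorded as a simple true/false call of "phenotype differs between species".

```python
def confidence_score(observations):
    """Number of observations (2 people x 2 set-ups) that saw a phenotype difference."""
    observations = list(observations)
    if len(observations) != 4:
        raise ValueError("expected exactly 4 observations per ortholog pair")
    return sum(1 for seen in observations if seen)


def is_potential_hit(observations, threshold=3):
    """A gene is a potential hit if at least 3 of 4 observations show a difference."""
    return confidence_score(observations) >= threshold
```

For example, `is_potential_hit([True, True, False, True])` is true (score 3), whereas two of four observations would not pass the threshold.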

2.5.3 Fitness Assay

L1 animals were grown and filtered for purification as described above. RNAi clones were grown overnight at 37 ◦C in LB media with 1mM Carbenicillin and induced at a final concentration of 4 µM IPTG for one hour. After induction, bacterial cultures were spun down and resuspended in NGM containing 4 µM IPTG and 1mM Carbenicillin. 10 µl of a ∼1 worm/µl solution were put into each well of a 96 well plate and then 40 µl of the bacterial suspension was added. Each row of the 96 well plate had 5 replicates of each RNAi clone for each species and 2 blank wells. In each plate non-targeting dsRNA-expressing bacteria (GFP) were also present as negative controls. After growing at 20 ◦C with shaking at 200 rpm for 96 hours I quantified the number of progeny using a COPAS worm sorter; the length (measured as the Time of Flight - TOF) and optical darkness (measured as Extinction - EXT) of each counted animal were also recorded. From these data, I calculated the relative brood number following RNAi as the ratio between the worm number in the targeted cultures and the worm number in cultures grown with non-targeting GFP RNAi bacterial controls. To assess differences in relative brood size, I calculated the log ratio of the relative brood sizes for C. elegans and C. briggsae for each targeted gene, and used the empirical distribution of 60 independent non-targeting GFP RNAi bacterial controls to determine a cutoff for statistical significance. In order to identify embryonic lethal phenotypes I counted objects with TOF less than 50 and EXT less than 30 (which identifies embryos) and calculated the ratio of the number of embryos to non-embryos for each RNAi and control experiment. By comparing the empirical distribution of these ratios in the control experiments to the targeting RNAi I was able to identify genes that resulted in embryonic lethality when knocked down by RNAi.
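The quantities used in this assay can be sketched in a few lines. This is a hedged illustration rather than the analysis code used here: the function names are hypothetical, the base (2) and direction of the log ratio are assumptions, and the way the cutoff is derived from the controls (a quantile of the absolute control log ratios) is one plausible reading of "empirical distribution", not a detail given in the text. The TOF < 50 and EXT < 30 embryo gate is taken from the text.

```python
import math


def relative_brood(targeted_count, control_count):
    """Worm number under targeting RNAi divided by worm number under GFP control RNAi."""
    return targeted_count / control_count


def brood_log_ratio(rel_elegans, rel_briggsae):
    """Log2 ratio of relative brood sizes (base and direction are assumptions)."""
    return math.log2(rel_briggsae / rel_elegans)


def empirical_cutoff(control_log_ratios, quantile=0.95):
    """Assumed cutoff: the given quantile of absolute log ratios among controls."""
    ranked = sorted(abs(x) for x in control_log_ratios)
    index = min(len(ranked) - 1, math.ceil(quantile * len(ranked)) - 1)
    return ranked[max(index, 0)]


def is_embryo(tof, ext, tof_max=50, ext_max=30):
    """Gate from the text: objects with TOF < 50 and EXT < 30 are counted as embryos."""
    return tof < tof_max and ext < ext_max


def embryo_ratio(objects):
    """Ratio of embryos to non-embryos for an iterable of (TOF, EXT) measurements."""
    objects = list(objects)
    embryos = sum(1 for tof, ext in objects if is_embryo(tof, ext))
    others = len(objects) - embryos
    return embryos / others if others else float("inf")
```

A gene would then be called different when the absolute value of its `brood_log_ratio` exceeds the cutoff derived from the 60 GFP controls, under the assumptions stated above.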

2.5.4 qPCR

For each knock down, 50 L4 larvae were grown on a lawn of dsRNA-expressing bacteria on NGM plates containing 1mM IPTG and 1mM Carbenicillin for 72 hr. RNA was harvested using Trizol (Invitrogen) and was cleaned up using an RNeasy kit (QIAGEN). Following a DNase I digestion (Invitrogen) I carried out first strand cDNA synthesis using SuperScript II (Invitrogen). I calculated the efficiency of the primers by dilution curves and ensured they were between 1.85 and 2.05. The qPCR was done in a CFX96 (Bio-Rad) using SYBR Green (Clontech) according to the manufacturer's protocols. Relative expression was calculated using the Pfaffl efficiency correction (Pfaffl, 2001) where each sample was normalized to the expression of tbg-1.
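For reference, the Pfaffl efficiency-corrected ratio can be computed as below. This is a sketch with hypothetical function names; it assumes that primer efficiency is derived from the slope of Ct against log10 of template amount in the dilution curve, and that the "control" Ct values come from the non-targeting RNAi sample, neither of which is spelled out in the text. The reference gene here would be tbg-1.

```python
def efficiency_from_slope(slope):
    """Amplification efficiency from a dilution-curve slope (Ct vs log10 input)."""
    return 10 ** (-1.0 / slope)


def efficiency_acceptable(efficiency, low=1.85, high=2.05):
    """Acceptance window for primer efficiency stated in the text."""
    return low <= efficiency <= high


def pfaffl_ratio(e_target, ct_target_control, ct_target_sample,
                 e_ref, ct_ref_control, ct_ref_sample):
    """Pfaffl (2001): E_target^(dCt_target) / E_ref^(dCt_ref), dCt = control - sample."""
    delta_target = ct_target_control - ct_target_sample
    delta_ref = ct_ref_control - ct_ref_sample
    return (e_target ** delta_target) / (e_ref ** delta_ref)
```

With perfect efficiencies of 2.0, a target that crosses threshold two cycles later in the knockdown than in the control, with an unchanged reference gene, gives a ratio of 0.25, i.e. 25% of control expression.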

2.5.5 Examination of C. elegans and C. briggsae gene phylogenetic age

In order to define the phylogenetic position of genes I took curated lists of orthologs of each C. elegans gene from WormBase (WS233) (Harris et al., 2010). I downloaded a phylogenetic tree from the NCBI database (on the 8th of January 2013) for the species which have genomes available, and took the last common ancestor of the species containing an ortholog as the point of emergence of each gene.
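The last-common-ancestor assignment can be illustrated as follows. This is a minimal sketch under stated assumptions: each species is represented by its lineage from the root of the tree down to the species itself, the lineages used in any example are simplified and illustrative rather than the NCBI tree actually downloaded, and the function name is hypothetical.

```python
def point_of_emergence(focal_lineage, ortholog_lineages):
    """Return the deepest node of the focal lineage shared with every ortholog species.

    Lineages are lists ordered from root to species. With no orthologs outside
    the focal species, the focal species itself is returned.
    """
    depth = len(focal_lineage) - 1
    for lineage in ortholog_lineages:
        shared = 0
        for focal_node, other_node in zip(focal_lineage, lineage):
            if focal_node != other_node:
                break
            shared += 1
        if shared == 0:
            raise ValueError("lineages do not share a root")
        # The most distant ortholog-bearing species sets the age of the gene.
        depth = min(depth, shared - 1)
    return focal_lineage[depth]
```

A gene with an ortholog only in another Caenorhabditis species would be dated to the Caenorhabditis node, while a gene that also has an ortholog in a fungus would be dated back to the node shared by fungi and animals.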

2.5.6 GFP Stitching and Microscopy

PCR primers were designed to amplify 2kb upstream of the translation start site or up until the next gene. C. elegans promoters were combined through PCR stitching to the coding sequence of GFP and the unc-54 3’UTR from the vector pPD95.75, while C. briggsae promoters were stitched onto the coding sequence of mWormCherry and the unc-54 3’UTR from the vector pJH1774. Stitched PCR products were quantified on an agarose gel, diluted to the same concentration and injected into C. elegans (N2) worms with pRF4 as a co-injection marker. F2 animals were isolated and then imaged on a custom Quorum confocal microscope. For each expression pattern I imaged a minimum of 3 lines to ensure I had consistent expression patterns. Any genes with obvious expression differences were then validated by injection into C. briggsae (AF16) in order to ensure that I obtained a consistent expression pattern.

2.5.7 Transgenic rescue experiments

I created the rescue constructs shown in Figure 2.7 by first generating constructs that encode a C-terminal GFP fusion for each ORF to be expressed using the pPD95.75 vector. For each of these I cloned the region upstream of either the C. elegans or the C. briggsae ortholog to make a total of 4 constructs, 2 containing DNA specific to one species, and the other 2 being hybrids between species. The bli-4 and bli-5 constructs were injected at 15 ng/µl with pCFJ90 as a co-injection marker. I isolated F2 progeny which were positive for myo-2::mCherry and then I counted the proportion of RFP+ adult animals with blisters. The sac-1 constructs were injected at 1 ng/µl with pRF4 as a co-injection marker into sac-1(ok1602) animals. Rol positive F2s were isolated and the proportion of homozygous adult rescued animals was scored by the absence of myo-2 GFP signal from the hT2 balancer. A subset of animals were confirmed to be homozygous by single worm genotyping PCR.

2.5.8 Examination of C. elegans and C. briggsae protein similarity

Orthologs between C. elegans and C. briggsae were defined using InParanoid 6.1 and their CDS sequences were downloaded from WormBase (WS190). I translated these to protein sequences, aligned them using ClustalW 2.0 (Larkin et al., 2007) and then projected these alignments back to the CDS sequences. I then used the Yn00 program from PAML (4.3) (Yang, 2007) to calculate Ka, Ks and the Ka/Ks ratios for C. elegans and C. briggsae orthologs. I measured evolutionarily novel segments between C. elegans and C. briggsae by taking the protein alignments defined above and then identifying segments which did not align between the 2 species (minimum of 4 residues). I then counted the total number of such unique segments as well as the total number of residues involved.
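Counting the non-aligned segments can be sketched as below, assuming each pairwise protein alignment is available as two equal-length strings with `-` marking gap columns. The function name is hypothetical; the minimum segment length of 4 residues is taken from the text.

```python
import re


def unaligned_segments(aligned_a, aligned_b, min_len=4):
    """Count gap runs of at least min_len columns and the residues they span.

    A run of gaps in one sequence corresponds to a segment of the other
    sequence that has no aligned counterpart.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    count = 0
    residues = 0
    for seq in (aligned_a, aligned_b):
        for run in re.finditer(r"-+", seq):
            length = run.end() - run.start()
            if length >= min_len:
                count += 1
                residues += length
    return count, residues
```

The two returned values correspond to the number of unique segments and the total number of residues involved, respectively.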

2.5.9 Predictability of phenotype differences

In this procedure I ranked orthologs by the Ka metric. Then I randomly picked pairs of orthologs, one with a different phenotype and one with the same phenotype, and I asked whether the ortholog with a greater Ka was the ortholog with a different phenotype. If so I classified this as a positive prediction and put it into bins based on the rank difference of the Ka. This randomization procedure was repeated one million times and the results were plotted.
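The randomization procedure can be sketched as follows. This is an illustrative version only: the function name is hypothetical, ranks are expressed as fractions of the full ranked list (an assumption based on the 0 to 1 axis of Figure 2.8E), the number of bins is arbitrary, and the default number of iterations is reduced from the one million used here to keep the example fast.

```python
import bisect
import random


def ppv_by_rank_difference(ka_different, ka_same, n_iter=100_000, n_bins=10, seed=0):
    """Fraction of positive predictions per bin of fractional Ka rank difference.

    Each iteration draws one gene with a different phenotype and one with the
    same phenotype; the prediction is positive if the former has the greater Ka.
    Bins that receive no pairs are reported as None.
    """
    rng = random.Random(seed)
    ranked = sorted(list(ka_different) + list(ka_same))
    scale = max(len(ranked) - 1, 1)

    def fractional_rank(value):
        return bisect.bisect_left(ranked, value) / scale

    positives = [0] * n_bins
    totals = [0] * n_bins
    for _ in range(n_iter):
        diff_gene = rng.choice(ka_different)
        same_gene = rng.choice(ka_same)
        rank_gap = abs(fractional_rank(diff_gene) - fractional_rank(same_gene))
        bin_index = min(int(rank_gap * n_bins), n_bins - 1)
        totals[bin_index] += 1
        if diff_gene > same_gene:
            positives[bin_index] += 1
    return [p / t if t else None for p, t in zip(positives, totals)]
```

If Ka carried no information, the values would sit near 0.5 in every bin; values rising with rank difference indicate that Ka is predictive only when the two genes differ substantially in Ka.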

2.5.10 Identifying functionally related genes

I identified genes which have weaker RNAi phenotypes in C. briggsae and then searched for related C. briggsae genes by using BLASTP; I considered any gene with a BLASTP hit with an E-value less than 10^-5 as a possible related gene. These genes display some sequence similarity to both the C. elegans and C. briggsae copy of the gene but should not be considered orthologs since they are far more divergent than the true ortholog and do not cluster together on the gene tree (Figure 2.10). I then constructed RNAi clones for the sets of related genes but excluded families with greater than five related members as being too complex. All RNAi clones were screened in C. briggsae side by side with RNAi experiments in C. elegans using clones targeting the C. elegans orthologs. In this way I compared the RNAi phenotypes of C. elegans and C. briggsae orthologs for small gene families that contain at least one member that had a weaker phenotype in C. briggsae.
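The selection of related genes can be sketched as a simple filter over BLASTP results. The data layout (a dictionary from query gene to a list of subject gene and E-value pairs), the function name and any gene identifiers used with it are hypothetical; the E-value cutoff of 10^-5 and the maximum of five related members are from the text.

```python
def related_families(blastp_hits, evalue_cutoff=1e-5, max_family_size=5):
    """Map each query gene to its related genes, dropping overly large families.

    blastp_hits: dict of query gene -> list of (subject gene, E-value).
    Self hits are ignored; queries with no related gene or with more than
    max_family_size related genes are excluded.
    """
    families = {}
    for query, hits in blastp_hits.items():
        related = sorted({subject for subject, evalue in hits
                          if subject != query and evalue < evalue_cutoff})
        if 1 <= len(related) <= max_family_size:
            families[query] = related
    return families
```

Each retained family would then have RNAi clones constructed for its members and be screened side by side in the two species.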

[Figure 2.1, panel A: flowchart of gene selection.
1640 genes with an RNAi phenotype (Kamath et al., 2003)
  110 genes have no ortholog at all in C. briggsae
  57 genes are duplicated in C. elegans
  36 genes are duplicated in C. briggsae
1437 genes have a 1-1 ortholog in C. briggsae
  104 genes failed to clone
1333 genes are 1-1 and have an RNAi clone in both C. elegans and C. briggsae
Panel B: genome view of chrI_random (860k-861k) showing the gene model CBG24498 (Cbr-sac-1), the BLAST-mapped C. elegans sac-1 RNAi clone and the C. briggsae RNAi clone for CBG24498.]

Figure 2.1: Design of the C. briggsae RNAi library. A. Breakdown of how I arrived at the final set of genes in the C. briggsae RNAi library. 1640 genes with an RNAi phenotype in C. elegans were targeted and then filtered for having a 1-1 ortholog in C. briggsae. B. Molecular design of the C. briggsae RNAi library. C. elegans RNAi clones from the Ahringer library were mapped by BLAST to the C. briggsae genome and used as a seed region for RNAi clone design. Primers were designed around this region to target the maximum number of exonic bases.


[Figure 2.3: pie charts with segments labelled Identical, Stronger and Weaker. Values shown in panel A: 204, 639, 475. Values shown in panel B: 91, 697, 508.]

Figure 2.3: The breakdown of the final screening results I observed. A. Breakdown of primary RNAi screen. Genes were placed into three classes: those with an identical phenotype in both species (Identical); those with a stronger phenotype in C. briggsae (JU1018) (Stronger); and those with a weaker phenotype in C. briggsae (JU1018) (Weaker). B. Breakdown of the final result after rescreening with secondary RNAi constructs.

[Figure 2.4: bar chart. Y-axis: expression relative to control (0.0 to 1.4). Bars for C. elegans and C. briggsae; genes grouped as Identical or Weaker.]

Figure 2.4: Comparison of levels of knock-down achieved in C. elegans and C. briggsae using bacterial-mediated RNAi. 9 genes were individually targeted by RNAi in both C. elegans and C. briggsae, RNA was harvested after 72 hrs of treatment, and qPCR was used to examine levels of knockdown. One of the genes had an identical RNAi phenotype in the two nematodes; the other 8 had weaker RNAi phenotypes in C. briggsae, as indicated in the figure. The data in the graph represent the means of three independent biological replicates; each biological replicate had two independent technical replicates. The error bars shown are the standard error and expression levels are expressed relative to the expression of tbg-1. Genes are, from left to right, pqn-85, nekl-2, K04G7.1, csn-5, unc-62, mcm-7, sac-1, apr-1, tsr-1.

[Figure 2.5: bar charts. Panel A: y-axis, fraction of genes with a different phenotype (0.00 to 0.30); categories All, Prot., TF, Unk. Panel B: y-axis, fraction of genes with a different phenotype (0.00 to 0.25); categories Opis., Coel., Nem., Chro., Caen.]

Figure 2.5: A. Functional enrichment in genes with different in vivo functions in C. elegans and C. briggsae. All 1333 genes analyzed were manually placed into the functional categories described in Kamath et al. (2003). The graph shows the proportion of genes that have different RNAi phenotypes in several different functional classes: all genes analyzed (All), genes annotated to have roles in Protein Synthesis (Prot.), Transcription Factors (TF), or genes of Unknown function (Unk.). Classes with significantly fewer genes with different RNAi phenotypes are shown in blue; those with statistically increased numbers are shown in red. Enrichments are significant with an FDR of 0.05 (Hypergeometric test). B. RNAi phenotypes differ most for more recently evolved genes. All 1333 genes analyzed were placed into five classes based on their evolutionary age as described in Methods. The most ancient genes could be dated back to the emergence of the Opisthokonta lineage (Opis.); then, becoming progressively younger, I could date sets of genes back to the emergence of the Coelomata (Coel.), Nematoda (Nem.) and Chromadorea (Chro.); finally, some genes had arisen so recently that they were only detectable in Caenorhabditis species (Caen.). In each case, the graph shows the proportion of genes in each evolutionary class that had a different RNAi phenotype.

[Figure 2.6: micrograph panels for C03D6.1 (A), K04G7.1 (B), C27F2.7 (C) and sac-1 (D).]

Figure 2.6: In vivo expression of a subset of genes with different RNAi phenotypes. C03D6.1, K04G7.1, C27F2.7, and sac-1 had strongly different RNAi phenotypes in C. elegans and C. briggsae. I generated transgenic C. elegans strains (N2) expressing GFP under control of the promoter of the C. elegans ortholog or C. briggsae strains (AF16) expressing mWormCherry under control of the orthologous C. briggsae promoter for each gene. In each case, four panels are shown: DIC image of N2 worms transgenic for the C. elegans promoter driving GFP, fluorescence image of N2 worms transgenic for the C. elegans promoter driving GFP, DIC image of AF16 worms transgenic for the C. briggsae promoter driving mWormCherry expression, fluorescence image of AF16 worms transgenic for the C. briggsae promoter driving mWormCherry expression. Images are confocal projections at 200X magnification, and scale bars represent 100 µm, except for C. elegans K04G7.1 which is at 400X magnification with a scale bar representing 50 µm. Images are representative of 3 independent lines. A. Expression difference for C03D6.1. Arrowheads indicate tail cells (white), intestine (red) and hypodermis (blue). B. Expression difference for K04G7.1. Arrowheads indicate head neurons (white) and body wall muscle (red). C. Expression difference for C27F2.7. Arrowheads indicate head neurons (white), hypodermis (blue), intestine (yellow), vulva (green) and tail neurons (red). D. Expression difference for sac-1. Shown are 3 confocal projections along the body of the animals at 400X magnification. Scale bars represent 50 µm. Arrowheads indicate the intestine (blue), pharynx and pharyngeal neurons (white), spermatheca (yellow), and tail cells (red).

[Figure 2.7. Panel A: schematic of the transgenes used to distinguish a change in gene expression, a change in protein sequence and a change in pathway structure. Panel B: bar chart; y-axis, fraction of homozygous progeny which survive (0.0 to 0.4); legend, C. elegans and C. briggsae.]

Figure 2.7: Differences in sac-1 RNAi phenotype are due to differences in sac-1 promoter function. A. Schematic illustrating transgenic rescue approach. To determine whether the difference in in vivo function of sac-1 in C. elegans was due to changes in the sac-1 coding region, differences in sac-1 expression, or due to changes in other genes, I tested the ability of the hybrid transgenes illustrated to rescue the sac-1(ok1602) phenotype in C. elegans. B. The ability to rescue a sac-1 null allele requires the C. elegans sac-1 promoter. I generated C. elegans lines transgenic either for the C. elegans sac-1 ORF under control of the C. elegans sac-1 promoter; the C. briggsae sac-1 ORF under control of the C. briggsae sac-1 promoter; or for the two hybrid constructs shown. In each case, I examined the ability of the transgenic array to rescue the developmental arrest phenotype of sac-1(ok1602) homozygous animals - the graph shows the percentage of animals that reached the adult stage that are homozygous for the sac-1(ok1602) allele, indicating rescue. Either the C. elegans sac-1 ORF or the C. briggsae sac-1 ORF under control of the C. elegans sac-1 promoter could partially rescue; no rescue was seen for the remaining constructs, indicating that the sac-1 promoter has diverged in the two species, while the sac-1 coding regions appear to be functionally interchangeable.

[Figure 2.8. Panels A-D: plots comparing genes with a Different or Same phenotype for Ka (A), Ka/Ks (B), the number of non-aligned regions per alignment (C) and the sum of all the non-aligned regions per alignment (D). Panel E: positive predictive value (0.0 to 1.0) against rank difference (0 to 0.9), with positive and negative predictions indicated.]

Figure 2.8: A.,B. Enrichment of metrics relating to protein divergence for genes with a different phenotype. Ka (non-synonymous substitutions per non-synonymous site) and the Ka/Ks ratio were calculated with the Yn00 program of PAML 4.3. C.,D. Non-alignable differences in protein sequence between C. elegans and C. briggsae orthologs. The number of non-aligned regions is calculated by counting the number of gaps (minimum size of 4 residues) in the protein alignments, and the sum is the total number of residues in all of those gaps. E. Ka is predictive of a different phenotype when the difference is large. Genes were ranked by the Ka metric and random pairs were chosen, one with the same phenotype and one with a different phenotype. If the gene with the greater Ka was the gene with a different phenotype, then this was classified as a positive prediction.

[Figure 2.9. Panel A: images of C. elegans bli-5(RNAi) and C. briggsae bli-5(RNAi) animals. Panels B-D: bar charts of the fraction of adults with blisters (0.0 to 1.0). B: C. elegans bli-5(RNAi) and C. briggsae bli-5(RNAi). C: bli-5(e518), C. elegans rescue, C. briggsae rescue. D: bli-4(e937), C. elegans rescue, C. briggsae rescue.]

Figure 2.9: bli-5 and bli-4 have an identical gene function despite showing different RNAi phenotypes. A. RNAi phenotype of bli-5 in C. elegans (N2) and C. briggsae (JU1018). B. Quantification of the phenotype shown in panel A. C. Rescue of the bli-5(e518) phenotype by either C. elegans or C. briggsae bli-5 genes. I generated transgenic bli-5(e518) lines in which either the C. elegans bli-5 coding region was expressed under the control of the C. elegans bli-5 promoter (C. elegans rescue) or the C. briggsae bli-5 coding region was expressed under the control of the C. briggsae bli-5 promoter (C. briggsae rescue). I examined adult animals and assessed the proportion with blistered cuticles; the results were combined across lines, with a minimum of 3 lines. Error bars represent the standard error on the binomial proportion. D. Similar data to panel C but instead showing rescue of the bli-4(e937) allele with analogous C. elegans and C. briggsae bli-4 constructs.

[Figure 2.10: gene trees for small multigene families. Tip labels: Cbr-cyp-32B1, F53B1.4, CBG14085, CBG22228, Y71G12B.6, C01F1.3, CBG03656, CBG01773, CBG20414, CBG15138, CBG10243, CBG23414, mel-28. Phenotype labels at the tips: Wt, Gro, Lvl, Ste, 10% Emb, 40% Emb, 100% Emb.]

Figure 2.10: Comparison of phenotypes of individual members of multigene families in C. elegans and C. briggsae. I searched for putative related family members of genes that had weaker RNAi phenotypes in C. briggsae by identifying any other C. briggsae genes that had a BLAST E-value of 10^-5 or better, up to a maximum of 5 related proteins - larger families were excluded from the analysis due to their complexity. These genes and their C. elegans orthologs were aligned and a gene tree was constructed with the dnaml program in phylip. RNAi phenotypes are shown in bold at the tips of the tree; if no phenotype is indicated then the gene is wildtype. Abbreviations are as follows: Wt - wildtype, Gro - Growth defective, Emb - Embryonic Lethal, Lvl - Larval lethal.

[Table 2.1. Columns: Gene (WBID), Gene (Common), Ortholog, Confidence Score, E-value difference, BLASTP, Synteny, TreeFam, Functional Class. Genes listed (WBID, common name): WBGene00000255 bli-5; WBGene00000431 ceh-6; WBGene00000814 csn-2; WBGene00000817 csn-5; WBGene00001086 dpy-27; WBGene00001345 fos-1; WBGene00001465 flr-1; WBGene00001662 gop-3; WBGene00002152 iars-1; WBGene00003123 mag-1; WBGene00003159 mcm-7; WBGene00003209 mel-26; WBGene00003904 pabp-2; WBGene00003912 pal-1; WBGene00004197 prx-12; WBGene00004198 prx-13; WBGene00004201 prx-19; WBGene00004271 rab-7; WBGene00004374 rme-2; WBGene00004430 rpl-18; WBGene00004450 rpl-36; WBGene00004700 rsp-3; WBGene00004705 rsp-8; WBGene00004735 sbp-1; WBGene00004786 sex-1; WBGene00004857 sma-3; WBGene00006647 tsr-1; WBGene00007275 C03D6.1; WBGene00009259 F29G6.3; WBGene00009264 sac-1; WBGene00009504 F37B12.1; WBGene00009626 F42A8.1; WBGene00010941 M176.2; WBGene00011538 T06E6.1; WBGene00012235 W04A4.6; WBGene00012704 Y39C12A.1; WBGene00012885 Y45F10D.4; WBGene00013585 cyp-42A1; WBGene00014066 rev-1; WBGene00014229 ZK1128.3; WBGene00015146 abi-1; WBGene00015298 C01F1.3; WBGene00016169 C27F2.7; WBGene00016323 C32E8.5; WBGene00016442 C35D10.5; WBGene00017358 F10E9.7.]

Table 2.1: Validation of the orthology assignments. To determine if there were additional possible alternative BLAST hits in either genome that could confound correct ortholog assignments, I examined the E-values of the next best hits in either genome. If the E-values of next best hits in either genome were greater than 1020 higher (i.e. worse match) than the best hits, I called this an unambiguous hit and gave this a confidence value of 3, indicating that these orthologs are the sole similar genes in either genome (BLASTP column in table). Chapter 2. Evolution of ortholog gene function in Caenorhabditis spp. 80 Metabolism Small Molecule Transport Unknown Metabolism Unknown Unknown Protein Synthesis Protein Synthesis Unknown Unknown Unknown Metabolism Protein Synthesis NA Binding Unknown Cell Architecture Unknown Signalling RNA Synthesis Signalling Proteases Collagen Cell Architecture Transcription Factor Signalling Transcription Factor Cell Architecture Cell Architecture Cell Architecture Small Molecule Transport Unknown Protein Synthesis Unknown Metabolism Metabolism Unknown Metabolism Metabolism Chromatin Cell Architecture Transcription Factor Cell Architecture Neuro Unknown Unknown TreeFam Class Functional ---- +- +- +- + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - + + - - + - + + - - - - + + - + + + - - ++++ ++ ++ ++ + + + + + + + + + + - + + + + + + + + + + - + + + + + + - + + - + - + + + + + + + + + + 44 60 107 11.69897 133.30103 112.69897 106.30103 134.544068 107.3679768 27.79835464 151.1303338 96.60205999 102.4149733 86.18563658 76.52287875 178.7781513 13.36172784 10.82390874 19.47712125 33 3 3 3 3 3 Inf 3 3 33 3 3 33 3 Inf 3 3 22 Inf 22222 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 11 0 1 0 11 0 1 0 1 0 0 0 0 0 Gene(WBID)WBGene00017769 F25B4.6WBGene00017852 Gene(Common) F27C1.2WBGene00017916 Ortholog F29A7.6WBGene00017982 F32D1.2WBGene00018144 CBG09344 Score Confidence 
F37C4.4WBGene00018492 difference E-value CBG12308 F46E10.11WBGene00018793 BLASTP CBG19575 F54C4.1 Synteny WBGene00018961 CBG21867 F56D1.3WBGene00019126 CBG10738 CBG18960 F59E12.11WBGene00019400 K04G7.1WBGene00019455 CBG00465 K06H7.1WBGene00020149 CBG02407 CBG02460 T01D1.4WBGene00020705 T22H9.1WBGene00021365 CBG16609 smgl-2WBGene00021465 CBG09232 Y39G10AR.7WBGene00022027 CBG06970 vps-20WBGene00022117 CBG01378 Y71F9AL.12WBGene00022631 CBG03924 nekl-2WBGene00000079 CBG08291 adr-1WBGene00000156 CBG04229 apr-1WBGene00000254 CBG05144 bli-4WBGene00000675 col-101WBGene00001130 CBG12695 dyn-1WBGene00001824 CBG19440 hbl-1WBGene00001979 CBG08224 hmp-2WBGene00003651 nhr-61 CBG12593 CBG22542 WBGene00004194 prx-5WBGene00004855 CBG07725 sma-1WBGene00004951 CBG14758 spc-1WBGene00006914 CBG19745 vha-5WBGene00008166 CBG04240 saps-1WBGene00008670 CBG11123 F11A3.2WBGene00009880 CBG23326 F49C12.11WBGene00012803 CBG14139 Y43F4B.5WBGene00016020 CBG01592 sptl-1WBGene00017853 CBG02933 CBG19294 CBG21705 F27C1.3WBGene00021626 Y47D7A.14WBGene00022201 CBG03574 Y71H10B.1WBGene00001974 hmg-4WBGene00001980 CBG06999 CBG12311 CBG24582 hmr-1WBGene00003044 CBG02038 lir-1WBGene00003210 mel-28WBGene00006428 tag-49WBGene00016721 CBG09136 C46G7.1WBGene00021468 CBG07964 epg-2 CBG13028 CBG17984 CBG19991 CBG09471 CBG03878

Table 2.1: (cont). If the situation was more complex, I examined both tree-based (TreeFam column in table) and synteny-based methods (Synteny column in table) to re- solve true orthology - if both synteny and tree-based methods confirm the initial ortholog assignment, it has a confidence value of 2, if it is supported by only one line of evidence, it scores only 1. If I can find no support for the ortholog assignment from either synteny or tree-based approaches, I score this as a zero - I note that only 9 genes in total fell in this category. For the tree-based orthology, I used the precomputed data in TreeFam (Li et al., 2006). For synteny, I defined syntenic genes if the C. briggsae ortholog of either of the two upstream genes was found up to 2 genes adjacent to the C. elegans gene, or similarly for the downstream gene. Functional annotations are from the manual annotation in Kamath et al. (2003). Chapter 2. Evolution of ortholog gene function in Caenorhabditis spp. 81 44444 2 24 2 RNASynthesis 2 Signalling 4 2 Proteases Degradation 4 2 CellArchitecture 4 2 Signalling 4 2 TranscriptionFactor 44 2 Degradation 4 2 TranscriptionFactor 4 2 CellArchitecture 4 24 2 ProteinSynthesis 4 CellArchitecture 2 Neuro 4 24 2 Unknown 4 2 ProteinSynthesis 4 CellArchitecture 2 Unknown 4 24 2 Unknown 4 2 Metabolism 4 DNA/CellCycle 2 Metabolism 4 24 2 Unknown 4 2 Unknown 4 Metabolism 2 Unknown 4 24 2 ProteinSynthesis 2 Unknown 4 Unknown 2Binding NA 4 2 Unknown 2 Metabolism 1 Unknown TranscriptionFactor 4 2 Unknown Primary SecondaryClass Functional phenotype C.briggsae C.briggsae Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Red Wt Wt Wt Wt Wt Wt Wt Wt Wt Unc Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt phenotype C.elegans Ste;Lvl Unc;Bmd;Lvl Bli 50-80%_Emb;Gro 100%_Emb;Unc;Mlt;Clr Ste;Rup;Pvl100%_Emb;Unc;Dpy;Bmd 10%_Emb;Unc;Bmd;Dpy10%_Emb;Lvl;Unc;Mlt 90%_Emb;Gro;Dpy100%_Emb 10%_Emb100.00%_Emb20-40%_Emb;Gro;Pch;Sma Gro GroSte;Sck Gro;DpySte;Sck Sma Ste;Sck 100%_Emb 4 GroGro 20-40%_Emb;Gro Gro 
50-80%_Emb;Lva;Dpy;Bmd;uniq 2 420-40%_Emb;Gro;Clr;Thn;Pch 4 Gro90%_Emb;Gro;Unc;Bmd CellArchitecture 20-40%_Emb;WeakGro Gro;Lvl 100%_Emb 50%_Emb 2 4Gro;Lvl;Unc 2100%_Emb;1-May;Gro 4Gro RNASynthesis TranscriptionFactor Ste;Gro;Sck 20-40%_Emb;Gro;Lvl;Bmd;Unc 2Gro;Stp 2 20%_Emb20-40%_Emb;Gro Metabolism 4 410%_Emb;Gro CellArchitecture 100%_Emb 100%_Emb;Adl;Lvl;Pvl 100%_Emb;Unc;Lvl;Dpy 2 2100%_Emb;Lvl 20-40%_Emb 490%_Emb ProteinSynthesis CellArchitecture Gro;Unc;Prz20-40%_Emb;Ste Mlt;Dpy;Lvl 2Unc;Mlt 50-80%_Emb;Red;Gro 4 RNASynthesis WeakGro;Unc;Prz 2 Gro Mlt ProteinSynthesis 4 2 4 CellArchitecture 4 1 1 ProteinSynthesis Proteases 100%_Emb;1-May;Lva;Dpy;Sma;Un Gene(WBID)WBGene00000079 adr-1 WBGene00000156 apr-1 Gene(Common) WBGene00000255 bli-5 WBGene00000814 csn-2 WBGene00001130 dyn-1 WBGene00001345 fos-1 WBGene00001979 hmp-2 WBGene00001980 hmr-1 WBGene00003044 lir-1 WBGene00003123 mag-1 WBGene00003209 mel-26 WBGene00003210 mel-28 WBGene00003651 nhr-61 WBGene00004198 prx-13 WBGene00004430 rpl-18 WBGene00004450 rpl-36 WBGene00004857 sma-3 WBGene00006428 tag-49 WBGene00006647 tsr-1 WBGene00007275 C03D6.1 WBGene00008670 F11A3.2 WBGene00009264 sac-1 WBGene00009626 F42A8.1 WBGene00010941 M176.2 WBGene00012235 W04A4.6 WBGene00012803 Y43F4B.5 WBGene00014066 rev-1 WBGene00016020 sptl-1 WBGene00016323 C32E8.5 WBGene00016721 C46G7.1 WBGene00017358 F10E9.7 WBGene00017769 F25B4.6 WBGene00017853 F27C1.3 WBGene00018793 F54C4.1 WBGene00018961 F56D1.3 WBGene00019126 F59E12.11 WBGene00019455 K06H7.1 WBGene00021365 smgl-2 WBGene00021465 Y39G10AR.7 WBGene00021468 epg-2 WBGene00021626 Y47D7A.14 WBGene00022027 vps-20 WBGene00022117 Y71F9AL.12 WBGene00000254 bli-4 WBGene00000431 ceh-6 WBGene00002152 iars-1

Table 2.2: Genes identified by eye as having a different phenotype between C. elegans and C. briggsae. The ‘Primary’ column refers to the number of support- ing observations in the primary screen (out of 4) and the ‘Secondary’ column refers to the number of supporting observations in the secondary screen (out of 2). Functional annotations are from the manual annotation in Kamath et al. (2003). Chapter 2. Evolution of ortholog gene function in Caenorhabditis spp. 82 Collagen 44444 14 14 1 RNASynthesis 1 TranscriptionFactor 3 1 CellArchitecture 3 1 RNASynthesis 1 TranscriptionFactor CellArchitecture 2 Unknown 3 2 Degradation Chromatin 33 233 CellArchitecture 2 2 2 Signalling 3 2 Metabolism 3 Metabolism 3 SmallMolecule Transport 3 2 2 2 Metabolism 1 ProteinSynthesis Metabolism 333 133 1 CellArchitecture 33 1 TranscriptionFactor 3 13 1 Unknown 3 1 Unknown 1 Unknown 1 Signalling 1 Metabolism 1 Metabolism Unknown Unknown Primary SecondaryClass Functional phenotype Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt Wt C.briggsae C.briggsae phenotype C.elegans 90%_Emb;Lva;Unc 100%_Emb Gro;Clr;Thn Gro 20-40%_Emb;mult;Dpy;Bmd mult;Bmd;Dpy Gro;Unc;Rup;Pvl Gro;Stp;Pch;Pvl50-80%_Emb Dpy;Egl 100%_Emb;GroSte100%_EmbGro;Bmd;Sma;Rol Weak50-80%_Emb;Unc;Clr;StpGro Gro50-80%_Emb;Bmd;Adl GroLvl;Unc;Pch 50-80%_Emb20-40%_Emb;Gro Him 50-80%_Emb;Gro;Stp 50%_Emb50-80%_Emb 450-80%_Emb;Gro;Thn;Stp GroUnc;Dpy 3Gro;Pch;Stp GroGro;Clr;Thn;Stp Gro 1Gro;Clr;Sma WeakGro 3Gro;Clr 2 3 Unknown 20-40%_Emb;mult;Pvl;Dpy 50%_Emb50-80%_Emb;Gro;Unc;Prz;Lvl100%_Emb Unknown 2 50-80%_Emb;GroGro;Clr 20-40%_Emb 2Gro 320-40%_Emb;Prz;Bmd;Dpy Metabolism 36-Oct;Gro;Clr;Sck RNASynthesis 3 310%_Emb;Gro;mult;Unc;Clr 3 Prz;Bmd;DpyGro;Clr 3 2Pvl;Rup;Egl Weak 3Gro 10%_Emb 2Unc;Dpy 50-80%_Emb 2 CellArchitecture 2Unc;Lvl;Clr 1Gro 2 Unknown Unc Metabolism WeakGro;Clr 1 Unknown Gro;Bmd Chromatin 3 Unknown Gro;Lvl;Prz TranscriptionFactor Unc;Lvl;Rol 3 3 3 1 3 1 CellArchitecture 1 1 SmallMolecule 
Transport DNA/CellCycle 1 SmallMolecule Transport Unc,Sck CellArchitecture 3 1 Signalling mcm-7 Gene(WBID)WBGene00003904 pabp-2 WBGene00003912 pal-1 Gene(Common) WBGene00004194 prx-5 WBGene00004705 rsp-8 WBGene00004786 sex-1 WBGene00004951 spc-1 WBGene00014229 ZK1128.3 WBGene00017916 F29A7.6 WBGene00000817 csn-5 WBGene00001086 dpy-27 WBGene00001662 gop-3 WBGene00004374 rme-2 WBGene00004700 rsp-3 WBGene00004855 sma-1 WBGene00008166 saps-1 WBGene00011538 T06E6.1 WBGene00015146 abi-1 WBGene00015298 C01F1.3 WBGene00016442 C35D10.5 WBGene00017852 F27C1.2 WBGene00017982 F32D1.2 WBGene00018144 F37C4.4 WBGene00019400 K04G7.1 WBGene00020149 T01D1.4 WBGene00020705 T22H9.1 WBGene00022201 Y71H10B.1 WBGene00000675 col-101 WBGene00001465 flr-1 WBGene00001824 hbl-1 WBGene00001974 hmg-4 WBGene00003159 WBGene00004197 prx-12 WBGene00004201 prx-19 WBGene00004271 rab-7 WBGene00004735 sbp-1 WBGene00006914 vha-5 WBGene00009259 F29G6.3 WBGene00009504 F37B12.1 WBGene00009880 F49C12.11 WBGene00012704 Y39C12A.1 WBGene00012885 Y45F10D.4 WBGene00013585 cyp-42A1 WBGene00016169 C27F2.7 WBGene00018492 F46E10.11 WBGene00022631 nekl-2

Table 2.2: (cont). Chapter 3

Evolution of essential functions in novel genes

This work is unpublished. I performed all the work associated with this chapter.


3.1 Abstract

New genes, which are known as Taxonomically Restricted Genes (TRGs), are being created at a high rate in diverse lineages. While many of these genes are quickly lost due to a lack of negative selection, a subset will be preserved, and some of those have detectable essential phenotypes when queried experimentally. Here I try to understand which TRGs acquire essential functions using the wide array of functional genomics data available in model organisms, particularly protein-protein physical interactions, genetic interactions and gene-gene co-expression measurements. I find that TRGs are more likely to have protein-protein interactions or genetic interaction profile correlations with other TRGs, but when TRGs show interactions with older, more essential genes there can be significant consequences. Specifically, I built a logistic regression model using a set of features that correspond to the co-expression profile each TRG has with different molecular pathways. This model was highly accurate in C. elegans, and an analysis of model features showed that co-expression with ancient, essential pathways was most predictive of TRGs acquiring essential functions. This suggests that the pathways a novel gene joins determine whether it will acquire an indispensable role in the organism. Finally, I tested the role of TRGs in the prediction of drug response from genetic variation in expression levels and found that for some drugs knowledge of variation in TRGs is required for an accurate prediction of drug response.

3.2 Introduction

With the publication of every new genome, biologists discover a large suite of genes found in no other organism's genome, so-called ‘orphan’ genes (Wilson et al., 2005). A better term is Taxonomically Restricted Gene (TRG), which refers to genes found only in a restricted part of the phylogenetic tree (Khalturin et al., 2009), since many genes will be incorrectly identified as orphan genes if no closely related genomes are sequenced. These novel genes can be created through a variety of mechanisms, such as de novo evolution from non-coding DNA (which produces genes with no domains found anywhere else on the tree of life) or gene duplication (which often produces genes containing an existing domain) (Toll-Riera et al., 2009). Studies from diverse lineages suggest that such novel genes are being created at high rates (Tautz and Domazet-Lošo, 2011), and thus may be important for evolution.

In order to be preserved by negative selection, novel genes must acquire a function within the organism's biological processes. However, in the absence of sequence conservation we often have few clues to their function. How do TRGs acquire a function within organismal physiology and thus come to be preserved by natural selection? In a number of cases, experimental studies have been able to ascribe functions to such TRGs. For example, Sphynx has a function in courtship behaviour in D. melanogaster (Dai et al., 2008), BSC4 has a function in DNA damage in S. cerevisiae (Cai et al., 2008), and Hym301 has a function in tentacle development in H. oligactis (Khalturin et al., 2009). Furthermore, TRGs have been found to be associated with the evolution of lineage-specific traits. For example, genes upregulated during infection in the wasp Nasonia vitripennis are highly enriched for TRGs (Sackton et al., 2013), as are genes involved in the transition to eusociality in honey bees (Johnson and Tsutsui, 2011).

As a consequence of the genomics revolution, there have been many high-throughput efforts to look at gene function, particularly in model organisms. These include measurements of gene expression (Hibbs et al., 2007), protein-protein physical interactions (Gavin et al., 2002), and genetic interactions (Costanzo et al., 2010). These datasets contain a wealth of data on gene function, and while they do contain biases, they are much less biased than individual gene studies. Such data have been used in the past to assess TRG function: for example, a putative role in DNA damage was proposed for the S. cerevisiae gene BSC4 because it is synthetic lethal with other DNA damage proteins (Cai et al., 2008). Such datasets could be interrogated in a much more thorough way to paint a more complete picture of TRG function.

The questions that I seek to answer in this chapter concern the evolution of novel genes. In particular, why does a subset of novel genes develop essential biological functions for the organism, and can we predict which ones do? Furthermore, are there aspects of biology in which novel genes play a significant part? I will address these questions by examining TRG function using high-throughput datasets available in model organisms such as C. elegans and S. cerevisiae, which measure different aspects of molecular gene function. Previous work has shown that functional genomics data are highly predictive of biological gene function (McGary et al., 2007; Lee et al., 2008), and I intend to leverage such data in order to understand how recently evolved genes acquire functions within a biological process.

Here I will create a machine learning model to predict which TRGs will acquire an essential function. I show that co-expression with ancient, essential pathways is highly predictive of a TRG acquiring an essential function. These results suggest a model for how novel genes gain essential functions in important biological processes and are subsequently kept by evolution.

3.3 Results

In Chapter 2 of this thesis I found that the youngest genes were the most likely to show different RNAi phenotypes between C. elegans and C. briggsae and thus likely had the highest rate of evolution of in vivo gene function (Figure 2.5). This led me to ask what determines whether a TRG will acquire or change its biological function, and I decided to pursue this by trying to explain it using the functional genomics data available for each gene.

3.3.1 The number of TRGs in the worm genome

In order to define TRGs in the worm genome I employed a method similar to the phylostratum approach (Domazet-Lošo and Tautz, 2010). To do so I took the C. elegans proteome and used BLAST to find sequence similarity to genes in the NCBI non-redundant protein database. This is in contrast to what I did in Chapter 2 of this thesis, where I used curated orthologs. The reason for this change is consistency with the analysis I will do with S. cerevisiae (although the results of the analyses I will present are unaffected by this change). Using this method, I define 10469 TRGs out of 19983 genes queried, suggesting that there is a very high rate of ongoing novel gene evolution in Caenorhabditis, which is in line with my previous estimates. It is possible that some of these TRGs are artefacts of gene prediction and are not actually recognized by the cellular machinery of the organism. In order to evaluate this I looked for evidence of transcription in RNA-Seq data (data from Ramani et al. (2011)). I detected transcription of 85% of TRGs, with at least one read in either L4 or young adult animals, implying that the majority of these genes are unlikely to be artefacts of gene annotation.
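The two steps above (calling TRGs from similarity hits, then checking for transcription) can be sketched as follows. This is only a minimal illustration, assuming the BLAST output has already been reduced to (gene, taxon, E-value) tuples. The E-value cutoff, the clade used to define "restricted" (here the genus Caenorhabditis), and all function names and gene IDs are hypothetical and are not taken from the analysis in this chapter.

```python
def call_trgs(genes, hits, lineage, restricted_clade, max_evalue=1e-5):
    """Return genes whose significant BLAST hits all lie inside `restricted_clade`.

    genes: iterable of query gene IDs
    hits: iterable of (gene, taxon, evalue) tuples
    lineage: dict mapping each taxon to the tuple of clades that contain it
    max_evalue: hypothetical significance cutoff (not a value from the text)
    """
    has_outside_hit = set()
    for gene, taxon, evalue in hits:
        if evalue <= max_evalue and restricted_clade not in lineage[taxon]:
            has_outside_hit.add(gene)  # similarity outside the clade: not a TRG
    return {g for g in genes if g not in has_outside_hit}


def fraction_transcribed(trgs, read_counts, min_reads=1):
    """Fraction of TRGs with at least `min_reads` reads in any sampled stage."""
    detected = sum(
        1 for g in trgs
        if max(read_counts.get(g, {}).values(), default=0) >= min_reads
    )
    return detected / len(trgs)


# Toy data (hypothetical gene IDs and read counts)
lineage = {
    "C. briggsae": ("Caenorhabditis", "Nematoda"),
    "D. melanogaster": ("Arthropoda",),
}
hits = [
    ("g1", "C. briggsae", 1e-30),      # hit inside the clade only
    ("g2", "D. melanogaster", 1e-12),  # significant hit outside the clade
    ("g3", "D. melanogaster", 0.5),    # hit outside the clade, not significant
]
trgs = call_trgs(["g1", "g2", "g3"], hits, lineage, "Caenorhabditis")
reads = {"g1": {"L4": 0, "young_adult": 3}, "g3": {"L4": 0, "young_adult": 0}}
print(sorted(trgs), fraction_transcribed(trgs, reads))
```

In this toy example g2 is excluded because it has a significant hit outside the clade, and one of the two remaining TRGs has read support.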

3.3.2 Novel genes preferentially form functional links with other novel genes

The first question I asked was whether TRGs are more likely to have functional links with older, more established genes or with other young genes. I tested this using protein-protein physical interaction data from S. cerevisiae, since there is better coverage of TRGs, and found that TRGs are more likely than expected by chance to show interactions with other TRGs (Figure 3.1A, P < 10⁻¹¹⁰, χ² test). I also looked at S. cerevisiae genetic interaction data; while I find no difference from expected in the number of TRG-TRG genetic interactions (P = 0.32, χ² test), there is a statistically significant increase over expected in the number of TRG-TRG links when using genetic interaction profile correlations as defined by Costanzo et al. (2010) (Figure 3.1A, P < 0.001, χ² test).

Given this result, I tried to assess whether TRGs preferentially appear in coherent network clusters by applying a network clustering algorithm (Markov Cluster Algorithm, Enright et al. (2002)) to the yeast protein-protein physical interactions, and was able to find 2 clusters which are statistically enriched for TRGs (Figure 3.1B, P < 0.05, Bonferroni corrected hypergeometric test). I looked at the function of these clusters using GO enrichment and found that the first cluster is enriched for terms like “retrotransposon nucleocapsid” (Table 3.1), while the other cluster is enriched for terms like “chromosome segregation” (Table 3.2). This suggests that there are functionally coherent pathways in which TRGs are preferentially being established, and that we can use this information to understand TRG function.
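The two kinds of test used in this section can be illustrated with a short sketch. It makes assumptions that the text does not specify: the null expectation for TRG-TRG links is taken to be the fraction of all possible gene pairs that are TRG-TRG pairs (which ignores differences in degree between genes), and the function names and toy data are hypothetical. The χ² P-value uses the exact identity for one degree of freedom, P = erfc(√(χ²/2)).

```python
from math import comb, erfc, sqrt


def chi2_trg_trg(edges, trgs, n_genes):
    """Chi-square goodness-of-fit test (1 d.f.) for an excess of TRG-TRG edges.

    Assumed null: each edge joins a uniformly random pair of genes.
    """
    p_null = comb(len(trgs), 2) / comb(n_genes, 2)
    total = len(edges)
    observed = sum(1 for a, b in edges if a in trgs and b in trgs)
    expected = total * p_null
    chi2 = ((observed - expected) ** 2 / expected
            + ((total - observed) - (total - expected)) ** 2 / (total - expected))
    return observed, expected, chi2, erfc(sqrt(chi2 / 2))


def hypergeom_enrichment(n_genes, n_trg, cluster_size, trg_in_cluster, n_tests=1):
    """Bonferroni-corrected P(X >= trg_in_cluster) under the hypergeometric null."""
    upper = min(n_trg, cluster_size)
    tail = sum(
        comb(n_trg, k) * comb(n_genes - n_trg, cluster_size - k)
        for k in range(trg_in_cluster, upper + 1)
    ) / comb(n_genes, cluster_size)
    return min(1.0, tail * n_tests)


# Toy network: 4 genes, of which "a" and "b" are TRGs (hypothetical data;
# real use needs expected counts large enough for the chi-square approximation)
edges = [("a", "b"), ("c", "d"), ("a", "c")]
obs, exp_count, chi2, p_chi2 = chi2_trg_trg(edges, {"a", "b"}, n_genes=4)

# A cluster of 3 genes, all TRGs, drawn from 10 genes of which 4 are TRGs
p_cluster = hypergeom_enrichment(10, 4, 3, 3)
print(obs, exp_count, round(chi2, 3), p_cluster)
```

The `n_tests` argument applies the Bonferroni correction by multiplying the tail probability by the number of clusters tested and capping the result at 1.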

3.3.3 Essential TRGs are enriched for functional links

If a TRG is located in a network cluster enriched for chromosome segregation genes, then it is likely to also be involved in the chromosome segregation pathway. Clustering is not always robust, however, and thus I wanted to assess whether TRGs with a specific function have any statistical properties which might explain how they acquired such a function. I am using essentiality in yeast and the presence of a strong RNAi phenotype in worm (I will refer to both as essentiality) as a proxy for function. The label is artificial (Ramani et al., 2012), but it is still useful given that genes with strong RNAi phenotypes tend to experience higher levels of negative selection (Cutter et al., 2003).

As a first attempt I looked at the number of interactions that a TRG acquires after gene birth. When I looked at protein-protein physical interactions from S. cerevisiae, I found that essential TRGs have more interactions than non-essential TRGs (Figure 3.2, P < 5×10⁻²⁹, Wilcoxon rank sum test). I also found the same result for worm protein-protein interactions (P < 0.0002, Wilcoxon rank sum test). These results suggest that TRGs that interact with more genes are more likely to join an essential biological process of the organism.
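A comparison of this kind can be sketched with a two-sided Wilcoxon rank sum (Mann-Whitney U) test. The version below is a minimal illustration using the normal approximation with a tie correction and no continuity correction, so its P-values will differ slightly from exact implementations. The interaction counts are invented for the example.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, z score, P-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1                      # pooled[i:j] share the same value
        midrank = (i + 1 + j) / 2       # average of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 0.0, 1.0              # all values tied: no evidence of a difference
    z = (u - mean_u) / sqrt(var_u)
    return u, z, erfc(abs(z) / sqrt(2))


# Hypothetical interaction counts for essential and non-essential TRGs
essential_degree = [5, 6, 7, 8]
nonessential_degree = [1, 2, 3, 4]
u, z, p = rank_sum_test(essential_degree, nonessential_degree)
print(u, round(z, 3), round(p, 4))
```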

3.3.4 Prediction of novel gene function based on co-expression profiles

Given that essential TRGs have distinguishing statistical properties, I next asked how well the function of a TRG can be predicted from functional genomics datasets. From here I decided to focus on C. elegans co-expression, because such datasets are less biased than the protein-protein physical interaction or genetic interaction datasets, and because there are many more TRGs in the C. elegans genome than in the S. cerevisiae genome. I created a set of co-expression measurements, each of which is the average co-expression of a TRG with a defined set of genes (Figure 3.3). For gene sets I used KEGG pathways as well as genes at different phylogenetic levels (for example, genes dating to the emergence of Opisthokonta). These features represent the co-expression profiles to different molecular pathways, and together they form a rough measurement of where each TRG functions within the organism.

I tested how well these features can predict TRG essentiality using logistic regression regularized with an elastic net. I assessed prediction quality using the area under the ROC curve (Figure 3.4A), which shows that overall I have very good predictions, with an area under the curve of 0.85. Since the area under the ROC curve can be inflated when there is a substantially unbalanced number of positive and negative examples, as in this data set, I also examined the precision-recall curve, which shows high precision at low recall (Figure 3.4B), indicating that my top predictions are highly enriched for positives. As a final prediction metric I used the number of positives in the top predictions, which is highly enriched over random genes; while random predictions have 1-2% accuracy, my top 100 predictions have an accuracy of over 30% (Figure 3.4C). The advantage of this metric is that it is highly intuitive, and furthermore, most experimental follow-up would focus on validating the top predictions.
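The pipeline described above (pathway co-expression features, an elastic net logistic regression, and ranking-based metrics) can be sketched as follows. This is a self-contained illustration under stated assumptions rather than the implementation used for the results: co-expression is taken to be the Pearson correlation, the model is fitted by plain proximal gradient descent, and the regularization strength, learning rate, function names and toy data are all hypothetical.

```python
from math import exp, sqrt


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def pathway_features(expr, gene, pathways):
    """Mean co-expression of `gene` with the members of each pathway."""
    feats = []
    for members in pathways.values():
        rs = [pearson(expr[gene], expr[m]) for m in members if m != gene]
        feats.append(sum(rs) / len(rs))
    return feats


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z)) if z >= 0 else exp(z) / (1.0 + exp(z))


def soft_threshold(w, t):
    return max(w - t, 0.0) if w > 0 else min(w + t, 0.0)


def fit_elastic_net_logistic(X, y, lam=0.01, l1_ratio=0.5, lr=0.5, n_iter=2000):
    """Logistic regression with an elastic net penalty (proximal gradient descent)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        preds = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        errs = [pi - yi for pi, yi in zip(preds, y)]
        b -= lr * sum(errs) / n
        for j in range(p):
            # gradient of the log-loss plus the L2 part of the penalty
            grad = sum(e * row[j] for e, row in zip(errs, X)) / n
            grad += lam * (1 - l1_ratio) * w[j]
            # the L1 part is applied through soft-thresholding
            w[j] = soft_threshold(w[j] - lr * grad, lr * lam * l1_ratio)
    return w, b


def roc_auc(scores, labels):
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((a > c) + 0.5 * (a == c) for a in pos for c in neg)
    return wins / (len(pos) * len(neg))


def precision_at_k(scores, labels, k):
    top = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)[:k]
    return sum(l for _, l in top) / k


# Feature construction on toy expression profiles (hypothetical genes)
expr = {"trg": [1, 2, 3, 4], "a": [2, 4, 6, 8], "b": [4, 3, 2, 1]}
feats = pathway_features(expr, "trg", {"pathway_1": ["a"], "pathway_2": ["b"]})

# Model fitting on a toy feature matrix: column 0 is informative, column 1 is not
X = [[-2.0, 0.3], [-1.5, -0.2], [-1.0, 0.1], [-0.5, -0.4],
     [0.5, 0.2], [1.0, -0.1], [1.5, 0.4], [2.0, -0.3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_elastic_net_logistic(X, y)
scores = [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]
print(feats, roc_auc(scores, y), precision_at_k(scores, y, 4))
```

`precision_at_k` corresponds to the "fraction of positives in the top predictions" metric, with k = 100 in the text and k = 4 in this toy example.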
These results suggest that, using co-expression profiles to different molecular pathways, it is possible to make high-quality predictions as to whether a TRG will become essential or not.

From here, I wanted to assess which co-expression features tell us the most about TRG function. I approached this with a feature importance analysis, using the fraction of positives among the top 100 predictions as my prediction metric. I found that features corresponding to pathways with a high percentage of essential genes are the most important features for prediction (Figure 3.5, r = 0.53). Many of the pathways scoring highly are core, ancient, essential cellular complexes such as the proteasome, and it is worth noting that co-expression with genes dating to the emergence of Opisthokonta scores highly. These results suggest that when TRGs do develop functional links with older, essential genes, they are more likely themselves to become essential.

Many of these features are highly correlated, and in order to deal with this I used principal components analysis and found that a single principal component explained more than 50% of the variance in the dataset. Furthermore, unlike the untransformed features, which tend to increase in prediction quality in a linear way (Figure 3.6A), this principal component contributed substantially more to the predictions than any other principal component (Figure 3.6B). When I looked at which features were contributing to the dominant principal component, I found a positive correlation between the percentage of essential genes in a pathway and the contribution of its feature to this principal component (Figure 3.6C, r = 0.56), much like the positive correlation between the percentage of essential genes and prediction accuracy that I noted above.

Together these results suggest a model for which biological processes novel genes will take part in.
If a novel gene is co-expressed with essential, ancient pathways then it is likely to become essential itself, while if it is co-expressed with more recently evolved, less essential pathways then it is likely to remain non-essential. Put simply, the functional links that a novel gene acquires correlate with which biological process it will join.
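The principal components step mentioned in this section can be illustrated with a small sketch that extracts the dominant component of a feature matrix and the fraction of variance it explains. It uses power iteration on the covariance matrix, which is an implementation choice made here for brevity and not necessarily how the analysis was performed. The starting vector is assumed not to be orthogonal to the dominant component, and the feature values are invented.

```python
from math import sqrt


def first_principal_component(X, n_iter=500):
    """Return (loadings of PC1, fraction of total variance explained by PC1)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    centred = [[v - m for v, m in zip(row, means)] for row in X]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0 / sqrt(p)] * p  # assumed start; must overlap with PC1
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance along the unit vector v
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    return v, eigenvalue / total_variance


# Toy feature matrix: columns 0 and 1 are strongly correlated, column 2 is not
X = [[1.0, 1.1, 0.2],
     [2.0, 1.9, -0.1],
     [3.0, 3.2, 0.1],
     [4.0, 3.9, -0.2],
     [5.0, 5.1, 0.0]]
loadings, explained = first_principal_component(X)
print([round(x, 3) for x in loadings], round(explained, 3))
```

The absolute loadings play the role of each feature's "contribution" to the dominant component, which is the quantity compared with the percentage of essential genes in Figure 3.6C.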

3.3.5 Novel genes contribute to drug resistance predictions

If TRGs evolve quickly then perhaps an understanding of them will be useful in the genomic era. Previous work has shown that drug sensitivity in yeast recombinants is predictable from gene expression measurements for around 100 different drugs (Chen et al., 2009). I wanted to assess the contribution that TRGs make to this sort of prediction. In order to do this I first made predictions using non-TRGs, found the variance unexplained by this model, and then built a second model using TRGs to see how much of the previously unexplained variance could be accounted for once the TRGs are considered. In more detail, I created a regression model by first selecting genes using LASSO regularization from all non-TRG genes, and then calculating the statistical residuals, which represent the variation in the data that could not be explained by the first model. I then created another model, selecting genes from only the TRGs and attempting to explain the remaining variation (the residuals). I found that I was still getting a significant fit between the predicted and measured values (Figure 3.7A). In order to show that the additional genes I identified can increase prediction quality, I created new models using the genes identified from the non-TRGs as well as those TRGs identified on the residuals, and found that in many cases there was an increase in predictive accuracy (Figure 3.7B, average increase is larger than 0, P < 0.001, t-test). These results show that consideration of TRGs gives us additional information for predictions of drug resistance in yeast.

For some specific drugs the TRGs are the main contributing genes to my predictions, for example wiskostatin, which inhibits WASP protein activation of Arp2/3-dependent actin nucleation, though possibly with pleiotropic effects through a decrease in total cellular ATP concentration (Guerriero and Weisz, 2007). For this drug the correlation between predicted and measured values is 0.36 without the TRGs (Figure 3.7C), and rises to 0.86 after genes identified from the TRGs are added to the predictions (Figure 3.7D). Furthermore, many genes appear repeatedly in the models for different drugs, such as MST27, which contributes to the predictions for 9 drugs. Its function is unstudied, other than that it is part of the DUP240 gene family, which encodes non-essential membrane proteins specific to Saccharomyces sensu stricto (Poirey et al., 2002). These results suggest that an understanding of the function of TRGs can aid in an understanding of drug resistance in yeast, and that further study of such genes will help us understand the biology behind such phenomena.
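The two-stage procedure described in this section (a LASSO model on non-TRGs, followed by a second LASSO model on TRGs fitted to the residuals) can be sketched as follows. This is a simplified illustration under stated assumptions: the LASSO is solved by coordinate descent on centred data, the penalty value is arbitrary, and the final step simply adds the two stage predictions instead of refitting a combined model as described in the text. The expression values and drug response are invented.

```python
from math import sqrt


def soft_threshold(v, t):
    return max(v - t, 0.0) if v > 0 else min(v + t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - b - Xw||^2 + lam*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with feature j left out
            r = [yc[i] - sum(w[k] * Xc[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(Xc[i][j] * r[i] for i in range(n)) / n
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z
    b = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return w, b


def predict(X, w, b):
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def pearson(a, c):
    ma, mc = sum(a) / len(a), sum(c) / len(c)
    cov = sum((x - ma) * (y - mc) for x, y in zip(a, c))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mc) ** 2 for y in c))


# Toy data: drug response depends on one non-TRG and one TRG (hypothetical)
X_non_trg = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
X_trg = [[1.0], [-1.0], [1.0], [-1.0], [1.0], [-1.0]]
response = [2 * a[0] + 3 * t[0] for a, t in zip(X_non_trg, X_trg)]

# Stage 1: model from non-TRGs only, then compute the residuals
w1, b1 = lasso_cd(X_non_trg, response, lam=0.1)
pred1 = predict(X_non_trg, w1, b1)
residuals = [yi - pi for yi, pi in zip(response, pred1)]

# Stage 2: explain the residuals using TRGs only
w2, b2 = lasso_cd(X_trg, residuals, lam=0.1)
pred2 = predict(X_trg, w2, b2)

combined = [p1 + p2 for p1, p2 in zip(pred1, pred2)]
r_before = pearson(pred1, response)
r_after = pearson(combined, response)
print(round(r_before, 3), round(r_after, 3))
```

Comparing `r_before` with `r_after` mirrors the comparison between Figure 3.7C and Figure 3.7D, where the correlation between predicted and measured values is reported without and with the TRGs.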

3.4 Discussion

Every lineage has a large number of genes found nowhere else on the tree of life. In many cases they are associated with lineage specific biology, such as the transition to eusociality in honeybees (Johnson and Tsutsui, 2011), the evolution of tentacle development in Hydra (Khalturin et al., 2009) and the immune system in wasps (Sackton et al., 2013). Understanding what determines the function that these genes acquire will be critical to our understanding of lineage specific biology. Here I have attempted to address this question by predicting which novel genes gain an essential biological function by using functional interactions to different genes and pathways. Gene-gene co-expression measurements have previously been found to parse genes into functional modules (Eisen et al., 1998; Stuart et al., 2003). I was able to construct a set of co-expression measurements to different cellular pathways from published microarrays in C. elegans which can explain which TRGs acquire essential phenotypes. My finding that co-expressions to ancient, essential pathways predict TRGs with essential phenotypes is intuitive, since a gene with a function in a pathway which does an essential biological process would likely be essential itself. Unfortunately there are very few examples of how a novel gene can modify a gene module and become essential. One example is fog-2 in C. elegans, a novel F-box gene which recruits gld-1 to downregulate tra-2 to allow sperm production in hermaphrodite animals (Nayak et al., 2005). In this example, this Chapter 3. Evolution of essential functions in novel genes 93

novel gene is co-opted into a highly essential pathway (germ line development) by linking existing genes to produce lineage specific biology (hermaphroditism). If a novel gene gains an essential biological function, there are several possible ways to view this in terms of functional interaction networks. The first is the novel gene modifies network structure of an essential pathway in order to create new biology, and in the process becomes essential itself; this is the case with fog-2. Alternatively, it could join an essential pathway and optimize its function by allowing other genes to adapt to their specific role and thus making the novel gene essential. Finally, the novel gene could replace the function of an existing essential gene and then allow the established gene to fulfil other functions or be lost due to negative selection. In diverse lineages such as mouse, Drosophila, and Arabidopsis it has been found that there is a very high rate of gene emergence in the current evolutionary time period (Tautz and Domazet-Loˇso, 2011). Rather than reflect ongoing adaptive radiation, this is likely due to high rates of de novo gene creation, and that many of these genes are quickly lost in the absence of selection. Thus, being able to predict if a TRG acquires an essential function could also be predictive of TRGs being kept in the genome, since genes with stronger RNAi phenotypes experience stronger levels of negative selection (Cutter et al., 2003; Ramani et al., 2012). In this case, my finding that functional links to ancient, essential pathway can predict gene essentiality, may also predict which TRGs will be kept by evolution. I also found that essential TRGs tend to have more protein-protein interactions, which is consistent with what has been found for all genes in the yeast genome (Jeong et al., 2001). 
The result that yeast essential genes tend to have more protein-protein physical interactions has been explained in terms of network topology, but it could also be explained by the fact that many of the essential genes with a large number of physical interactions are part of ancient, essential protein complexes such as the ribosome, proteasome or spliceosome (Zotenko et al., 2008). TRGs are not ancient members of highly conserved complexes, but it could be that they interact with a number of genes in such complexes, given my finding that co-expression with ancient essential pathways can predict essentiality.

One concrete possibility is that essential TRGs could be integrating into the kinetochore, which could be part of the establishment of worm-specific biology, since nematode worms have holocentric chromosomes (Cheeseman et al., 2004). The kinetochore has been well studied in yeast: it contains over 70 proteins representing at least 14 multi-protein complexes (McAinsh et al., 2003), and these proteins are diverging very rapidly (Meraldi et al., 2006). In C. elegans, many of the kinetochore genes are taxonomically restricted, such as kbp-1, kbp-2 and knl-3, and the establishment of holocentric chromosomes from a monocentric ancestor could have depended on the evolution of these novel genes. Many of the features identified as being most predictive of TRG essentiality are in some way related to DNA and chromosome segregation, for example purine metabolism and pyrimidine metabolism in Figure 3.5. Identification of these pathways could reflect essential TRGs being established in the chromosome segregation pathway as a result of the evolution of holocentric chromosomes. Novel genes integrating into the kinetochore could be important in other lineages, since holocentric chromosomes have evolved repeatedly; they occur in 768 extant species, which represents at least 13 different evolutionary transitions (Melters et al., 2012).

The finding that TRGs are enriched for functional links with other TRGs, and that there are network clusters which are highly enriched for TRG-TRG links, suggests that the evolution of novel genes does not always simply modulate existing gene groups, but rather that novel gene groups are continuously being created.
Specifically, the finding of a yeast PPI cluster enriched for chromosome segregation genes, along with the evolution of holocentric chromosomes in many different lineages, suggests that the centromere could be an ongoing site where novel genes are integrating. In support of this, it was found that many S. pombe kinetochore proteins do not have an ortholog in S. cerevisiae (Liu et al., 2005). One possible explanation for the rapid evolution of the kinetochore is that centromeres are thought to be selfish drivers, and that they are subject to ‘arms races’ much like host-parasite conflicts (Henikoff and Malik, 2002). Centromeres are bound by a special H3-like histone, which is undergoing rapid adaptive evolution in different lineages, possibly to combat the meiotic drive of centromeres (Talbert et al., 2004). In this light, the cluster of chromosome segregation genes that I identified as being enriched for TRGs could be involved in such evolutionary ‘arms races’, with the evolution of novel genes used as ‘ammunition’ in such a process.

My finding that TRGs are important for explaining how genetic variation in gene expression can affect drug resistance in S. cerevisiae suggests that an understanding of these genes will be important for many future applications involving genetic variation. An understanding of TRGs could also be important for an understanding of human disease genes. It is true that human disease genes are under-enriched for TRGs (Domazet-Loso and Tautz, 2008), but this may contribute to a bias away from studying such diseases. Similarly in cancer cells, it may be that many essential TRGs are being upregulated in specific cancer cell lines, which would provide a druggable target. Since these genes do not have homology to the ancient, essential cellular machinery, drugs specific to them would be unlikely to have serious side effects.
In summary, I used co-expression profiles to try to understand what causes some TRGs to acquire essential phenotypes. I found that links with ancient essential pathways are highly predictive of a novel gene acquiring an essential phenotype and thus becoming indispensable to the cell. These results shed light on how novel genes become integrated into specific biological processes.

3.5 Methods

3.5.1 Finding Taxonomically restricted genes

In order to define TRGs in the worm genome I employed a method similar to the ‘phylostratum’ approach (Domazet-Lošo and Tautz, 2010). I took the C. elegans proteome (WS230) and used BLASTP to find genes with sequence similarity in the NCBI non-redundant protein database (downloaded on February 7th, 2013). Hits with an E-value <0.000001 which matched a minimum of 30% of the protein length were considered evidence of orthology. I downloaded a phylogenetic tree from the NCBI taxonomy database (Sayers et al., 2009; downloaded on the 8th of January 2013) for the species which have genomes available, and took the last common ancestor of C. elegans and all species with a hit as the point of emergence of each gene. Genes which dated to Chromadorea, Caenorhabditis or C. elegans were considered taxonomically restricted for the purposes of this project. For S. cerevisiae I used the same method but considered genes which date to the emergence of Saccharomycetaceae, Saccharomyces, or S. cerevisiae. I defined an essential gene in C. elegans as one with a Non-Viable or Growth defective phenotype in Kamath et al. (2003), since these are the most robust phenotypes. In S. cerevisiae I defined essential genes as those identified as essential by the Saccharomyces Genome Deletion Project (Giaever et al., 2002).
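The dating step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hit dictionaries, the `lineage` lists (ordered root to species) and the function names are hypothetical, and "30% of the protein length" is read here as aligned length divided by query length. Only the two cutoffs and the three TRG taxa come from the text.

```python
# Hypothetical sketch of phylostratum-style gene dating (not the thesis code).
E_VALUE_CUTOFF = 1e-6   # from the text: E-value < 0.000001
MIN_COVERAGE = 0.30     # from the text: at least 30% of the protein length
TRG_TAXA = {"Chromadorea", "Caenorhabditis", "Caenorhabditis elegans"}


def accepted(hit, query_length):
    """A hit counts if it passes both the E-value and the coverage cutoff."""
    coverage = hit["aligned_length"] / query_length  # assumed definition
    return hit["evalue"] < E_VALUE_CUTOFF and coverage >= MIN_COVERAGE


def gene_age(query_lineage, query_length, hits):
    """Return the oldest taxon of the query lineage shared with an accepted hit.

    Lineages are lists ordered from the root to the species, so the last
    common ancestor of two species is the end of their shared prefix.
    """
    oldest = len(query_lineage) - 1  # default: restricted to the species itself
    for hit in hits:
        if not accepted(hit, query_length):
            continue
        shared = 0
        for a, b in zip(query_lineage, hit["lineage"]):
            if a != b:
                break
            shared += 1
        if shared > 0:
            oldest = min(oldest, shared - 1)
    return query_lineage[oldest]


def is_trg(age):
    """Genes dated to one of the three youngest levels are treated as TRGs."""
    return age in TRG_TAXA
```

For example, a gene whose only accepted hit is in C. briggsae would be dated to Caenorhabditis and counted as a TRG, whereas a single accepted hit in a vertebrate would push its age back to Metazoa.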

3.5.2 TRG functional data

Worm protein-protein interaction data were downloaded from Worm Interactome 8 (Simonis et al., 2009), which contains high-throughput yeast two-hybrid interactions as well as low-throughput literature-curated interactions. In order to ensure that each gene was represented in the dataset, I restricted the analysis to genes with at least 1 interaction. Yeast protein-protein interaction data were affinity purification-mass spectrometry (AP-MS) data from the BioGRID database (Stark et al., 2006). Genetic interactions were taken from Costanzo et al. (2010), and genetic interaction profile correlations were defined as a correlation greater than 0.2 across the genetic interaction profile, as used by Costanzo et al. (2010). In order to assess how many TRG-TRG network links occur compared to what is expected by chance, I counted the number of interactions to genes at different phylogenetic levels (excluding self-self network links), compared this to the expected number of network links given the number of genes at each phylogenetic level, and assessed significance using a χ2 test. I clustered the network using the Markov Clustering Algorithm (MCL) (Enright et al., 2002) with an inflation factor of 1.5. I tested all clusters with more than 30 members for enrichment in the number of TRGs using a hypergeometric test and corrected for multiple tests using the Bonferroni correction.
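The cluster enrichment test described above can be illustrated with a minimal standard-library sketch. The function names and the input format (a dict of cluster name to gene set) are assumptions made for illustration, as is the choice of background size; the text specifies only the size filter (more than 30 members), the hypergeometric test and the Bonferroni correction.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are TRGs."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def trg_enrichment(clusters, trgs, background_size, min_size=30):
    """Bonferroni-adjusted enrichment p-value for each sufficiently large cluster.

    clusters: dict mapping cluster name -> set of gene names (hypothetical format)
    trgs: set of TRG gene names
    background_size: number of genes in the clustered network (assumed background)
    """
    tested = {name: genes for name, genes in clusters.items() if len(genes) > min_size}
    adjusted = {}
    for name, genes in tested.items():
        k = len(genes & trgs)
        p = hypergeom_upper_tail(k, len(genes), len(trgs), background_size)
        adjusted[name] = min(1.0, p * len(tested))  # Bonferroni correction
    return adjusted
```

Multiplying each p-value by the number of clusters actually tested, rather than by the total number of clusters, is one reasonable reading of the correction; the text does not say which was used.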

3.5.3 Essential TRG classification

For measuring co-expression profiles I used a curated and normalized set of microarray datasets from the WormSPELL database. For each TRG I calculated the average Pearson correlation coefficient across the microarray expression measurements to each other gene represented on the array. In order to create features which correspond to functional links with different cellular pathways I calculated co-expression scores as defined above for genes involved in different KEGG pathways, as well as genes in different phylogenetic groups. In order to improve the comparability of correlation coefficients between datasets I used Fisher's z-transformation, where r is the Pearson correlation coefficient (Hibbs et al., 2007):

$$Z = \frac{1}{2} \log\left(\frac{1 + r}{1 - r}\right)$$
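As a small worked illustration, Fisher's z-transformation, Z = ½ log((1 + r) / (1 − r)), and the averaging of transformed correlations across datasets could be written as follows. The function names are hypothetical and the input is assumed to be a plain list of Pearson correlations, one per microarray dataset.

```python
import math


def fisher_z(r):
    """Fisher's z-transformation of a Pearson correlation r, with -1 < r < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))


def mean_z(correlations):
    """Average of z-transformed correlations, e.g. one value per dataset."""
    zs = [fisher_z(r) for r in correlations]
    return sum(zs) / len(zs)
```

The transformation stretches values near ±1, so that correlations from different datasets are closer to normally distributed and can be averaged more sensibly than raw r values.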

I averaged the Z-transformed pathway correlation coefficients across the arrays to give a measurement for each TRG corresponding to the co-expression profile between the gene and the pathway of interest; this gave a set of features for use in machine learning. In order to create a classifier I used logistic regression regularized with an elastic net (α = 0.5). Since the classes are unbalanced, I trained and evaluated the classification schemes by splitting the data using stratified cross validation. I scored the predictions using the area under the ROC curve as well as precision-recall curves, since this is an unbalanced class problem. As a final prediction metric I calculated the fraction of true positives in the top N predictions, with N in {50, 100, 150, 200}.
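Two of the evaluation metrics described above are simple enough to sketch directly. The code below is illustrative only: the function names are hypothetical, scores are assumed to be classifier outputs where higher means more likely essential, and labels are 1 for essential and 0 otherwise. The AUC is computed through its rank interpretation rather than by tracing the ROC curve.

```python
def fraction_positive_in_top_n(scores, labels, n):
    """Fraction of true positives among the n highest-scoring predictions."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:n]
    return sum(label for _, label in top) / len(top)


def roc_auc(scores, labels):
    """Area under the ROC curve.

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, counting ties as one half.
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))
```

The top-N fraction is useful with unbalanced classes because it reflects what an experimenter would see when following up only the highest-ranked candidates.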

3.5.4 Feature importance analysis

In order to assess which features were most important for classification accuracy I tested each feature individually in the logistic regression model, using the fraction of true positives in the top N predictions as the classification metric. Since these co-expression features display a substantial amount of internal correlation, I also applied principal component analysis and then repeated the above analysis.

3.5.5 Drug resistance predictions

I downloaded yeast recombinant inbred line expression data from the Gene Expression Omnibus (accession GSE1990; Brem and Kruglyak, 2005), and growth measurements in different drugs from Perlstein et al. (2007). In order to create a set of predictions for growth levels I selected a set of gene expression traits using the LASSO algorithm, used those in a linear regression model, and evaluated the fit using the Pearson correlation coefficient. Predictions of growth levels for different recombinants were calculated in 10-fold cross validation. In order to find TRGs which increase the predictive capacity beyond what is possible using non-TRGs, I calculated a fit using the method described above, calculated the statistical residuals, and then used LASSO on the residuals to identify contributing TRGs. From these two sets of genes, I constructed a final model and compared its results to the first model, which used only the non-TRGs.
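One way to write the two-stage procedure compactly is given below. The notation is introduced here for illustration and is an assumption: y is the vector of growth measurements for one drug across n recombinants, X_N and X_T are the expression matrices for non-TRGs and TRGs, λ_N and λ_T are LASSO penalties, supp(·) is the set of genes with non-zero coefficients, and intercepts are omitted. It is a sketch of one reading of the text, not necessarily the exact implementation.

$$S_N = \operatorname{supp}\Big(\arg\min_{\beta}\ \tfrac{1}{2n}\lVert y - X_N\beta\rVert_2^2 + \lambda_N\lVert\beta\rVert_1\Big), \qquad \hat{y}_N = X_{S_N}\hat{\beta}_{\mathrm{OLS}}$$

$$e = y - \hat{y}_N, \qquad S_T = \operatorname{supp}\Big(\arg\min_{\gamma}\ \tfrac{1}{2n}\lVert e - X_T\gamma\rVert_2^2 + \lambda_T\lVert\gamma\rVert_1\Big)$$

$$\hat{y}_{N+T} = X_{S_N \cup S_T}\hat{\beta}'_{\mathrm{OLS}}, \qquad \Delta r = r(y, \hat{y}_{N+T}) - r(y, \hat{y}_N)$$

Here r(·,·) is the Pearson correlation between measured and cross-validated predicted growth, so a positive Δr means that adding the TRGs selected on the residuals improved the prediction.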

[Figure 3.1. Panel A: bar charts of the expected and observed number of interactions for the protein-protein interaction network and the genetic interaction profile network. Panel B: network diagram of two yeast gene clusters, with nodes labelled by systematic gene name and colored as TRG or non-TRG.]

Figure 3.1: TRG-TRG functional links are enriched over random. A. Observed and expected-by-chance number of TRG-TRG functional links for protein-protein interactions and genetic interaction profile correlations. The protein-protein interaction network is AP-MS data from BioGRID, and genetic interaction profile network links are correlations between genetic interaction profiles greater than 0.2 as defined by Costanzo et al. (2010). TRGs here are defined as genes which are dated to Saccharomycetaceae, Saccharomyces, or S. cerevisiae. B. A network showing 2 clusters identified using Markov clustering from the protein-protein interaction data. TRGs are colored in red, while other genes are colored in blue.

[Figure 3.2. Two boxplot panels, "S. cerevisiae TRGs" and "C. elegans TRGs". y-axis: protein-protein interaction degree; x-axis: Non-Essential, Essential.]

Figure 3.2: Boxplots of the number of protein-protein physical interactions for TRGs in S. cerevisiae and C. elegans. S. cerevisiae interactions are AP-MS data from BioGRID and the C. elegans interactions are from Worm Interactome 8. Both differences are statistically significant: P < 5×10^-29 (Wilcoxon rank sum test) for S. cerevisiae data and P < 0.0002 (Wilcoxon rank sum test) for C. elegans data.

[Figure 3.4. Panel A: true positive rate against false positive rate. Panel B: precision against recall. Panel C: fraction of positives for all genes and for the top 50, 100, 150 and 200 genes.]

Figure 3.4: Metrics for the predictions of C. elegans TRG essentials. A. Receiver operating characteristic (ROC) curve. This graph shows the fraction of true positives recovered against the fraction of false positives recovered for a ranked list of predictions. Area under the curve is 0.85. B. A precision-recall curve. This graph shows precision (#True Positives / (#True Positives + #False Positives)) against recall (#True Positives / (#True Positives + #False Negatives)) measured along a ranked list of predictions. C. Percentage of positives in the top 50, 100, 150 and 200 predictions. Shown for comparison is the percentage of positives in all genes in our set. Predictions were generated in 5-fold cross validation.

[Figure 3.7. Panel A: histogram of Pearson r (count) for the "All genes" and "Residuals" models. Panel B: histogram of the difference in correlations (count). Panels C and D: measured growth against predicted growth.]

Figure 3.7: Consideration of TRGs increases the predictive accuracy of drug resistance profiles. Expression and growth in media containing drug were both measured in a panel of yeast recombinants. Gene expression traits were selected for regression using LASSO and fits were measured using the Pearson correlation between predicted and measured drug resistance. Predicted drug resistance values for different recombinants were calculated in 10-fold cross validation. There are multiple measurements for each drug taken at different concentrations, and I reported the single best fit for each drug, for a total of 94 unique drugs. A. Histogram showing the correlations between predicted and measured values for models created using all available genes, and on the residuals from models created using the non-TRGs. B. Difference in correlation between models created using both the TRGs and non-TRGs and those using only the non-TRGs. Positive means an increase in prediction quality. C. Best prediction for wiskostatin from only the non-TRGs. D. Best prediction for wiskostatin using non-TRGs as well as TRGs.

Term        Name                          Adjusted p-value  Enrichment factor  Percentage of genes
GO:0003723  RNA binding                   3.97E-12          5.15               44.93
GO:0005737  cytoplasm                     1.46E-08          2.01               81.16
GO:0000943  retrotransposon nucleocapsid  0.00E+00          24.28              43.48
GO:0032197  transposition, RNA-mediated   0.00E+00          24.00              43.48

Table 3.1: Gene ontology (GO) enrichment analysis for the first cluster identified from yeast protein-protein physical interaction data. Shown are the GO ID, the GO name, and the Bonferroni-adjusted p-value. The enrichment factor is the fraction of genes in this cluster with the GO ID divided by the fraction of genes in the whole genome with that GO ID. The final column is the percentage of genes in this cluster which are annotated with this GO ID.

Term        Name                                                          Adjusted p-value  Enrichment factor  Percentage of genes
GO:0005634  nucleus                                                       6.36E-10          2.30               88.24
GO:0005816  spindle pole body                                             2.14E-04          11.61              17.65
GO:0000775  chromosome, centromeric region                                0.00E+00          50.12              64.71
GO:0007049  cell cycle                                                    0.00E+00          10.08              62.75
GO:0007067  mitosis                                                       0.00E+00          21.17              56.86
GO:0051301  cell division                                                 0.00E+00          15.27              58.82
GO:0007059  chromosome segregation                                        0.00E+00          37.09              50.98
GO:0000776  kinetochore                                                   0.00E+00          62.77              62.75
GO:0000777  condensed chromosome kinetochore                              0.00E+00          70.62              58.82
GO:0000818  nuclear MIS12/MIND complex                                    5.05E-05          94.16              7.84
GO:0005694  chromosome                                                    0.00E+00          26.79              64.71
GO:0034501  protein localization to kinetochore                           3.43E-03          47.08              7.84
GO:0005200  structural constituent of cytoskeleton                        4.97E-12          29.73              23.53
GO:0005856  cytoskeleton                                                  2.86E-07          9.62               27.45
GO:0007020  microtubule nucleation                                        2.28E-04          26.90              11.76
GO:0007126  meiosis                                                       1.96E-05          10.36              21.57
GO:0005819  spindle                                                       1.29E-11          34.52              21.57
GO:0005874  microtubule                                                   3.63E-09          22.04              21.57
GO:0005876  spindle microtubule                                           5.98E-04          36.21              9.80
GO:0034087  establishment of mitotic sister chromatid cohesion            1.34E-05          40.35              11.76
GO:0000817  COMA complex                                                  5.05E-05          94.16              7.84
GO:0008608  attachment of spindle microtubules to kinetochore             2.51E-04          75.33              7.84
GO:0008017  microtubule binding                                           0.00E+00          59.47              23.53
GO:0030472  mitotic spindle organization in nucleus                       0.00E+00          47.08              21.57
GO:0031110  regulation of microtubule polymerization or depolymerization  0.00E+00          94.16              19.61
GO:0042729  DASH complex                                                  0.00E+00          94.16              19.61
GO:0000778  condensed nuclear chromosome kinetochore                      0.00E+00          52.03              41.18
GO:0000780  condensed nuclear chromosome, centromeric region              2.21E-05          37.66              11.76
GO:0008623  chromatin accessibility complex                               5.05E-05          94.16              7.84
GO:0031262  Ndc80 complex                                                 2.01E-02          70.62              5.88

Table 3.2: Gene ontology (GO) enrichment analysis for the second cluster identified from yeast protein-protein physical interaction data. Shown are the GO ID, the GO name, and the Bonferroni-adjusted p-value. The enrichment factor is the fraction of genes in this cluster with the GO ID divided by the fraction of genes in the whole genome with that GO ID. The final column is the percentage of genes in this cluster which are annotated with this GO ID.

Chapter 4

Discussion and concluding remarks

4.1 Summary

The main goal of this thesis has been to understand the causes and consequences of the evolution of gene function. I have addressed two small parts of this goal, which I will summarize and expand on in the following sections.

In Chapter 2 I sought to find examples of orthologs with different biological functions between C. elegans and the related species C. briggsae by comparing RNAi phenotypes. I was able to detect 91 genes which had a different RNAi phenotype under stringent confidence metrics, although additional experiments with a lower false negative rate suggest the true number of genes with different RNAi phenotypes is significantly higher. I also found that transcription factors and recently evolved genes are overrepresented among the genes with differing RNAi phenotypes. In order to assess whether these genes also have different molecular functions I used cross-species rescue experiments and found that in some cases, such as bli-4 and bli-5, the orthologs had conserved molecular functions despite having different RNAi phenotypes. However, I was also able to find a gene, sac-1, for which I could identify the molecular basis of the different RNAi phenotype: changes in the promoter sequence yielded a different expression pattern. Given the high morphological and developmental similarity between C. elegans and C. briggsae, I argue that these differences in gene function represent changes in pathway structure with no effect on the organism, a model known as Developmental System Drift.

As a result of my finding that TRGs show high rates of evolution of their biological functions, I became interested in understanding how such genes acquire and change function after being born, the subject of Chapter 3. First I looked at which types of genes TRGs tend to interact with. I found that TRGs preferentially interact with other TRGs and that there are network clusters enriched for TRGs. Next I focused on trying to understand what causes some TRGs to acquire an important biological function; I used having a strong RNAi phenotype (essentiality) as a proxy for acquiring a specific function. I found that essential TRGs have an elevated number of total interactions, implying that pleiotropic TRGs are more likely to become essential. Next, I designed a set of features which measured the co-expression profile to different molecular pathways, and used these features to predict which C. elegans TRGs are essential. I found that pathways with a high percentage of essential genes have the most predictive power, implying that TRGs that join essential biological processes are more likely to become essential themselves. Finally, I provided some evidence that TRGs are important for understanding personalized medicine applications, using drug resistance in yeast as a model.

4.2 Turnover of genes within pathways

The model of Developmental System Drift argues that stabilizing selection has acted to preserve developmental phenotype despite divergent molecular pathways, and it is likely pervasive between C. elegans and C. briggsae because of how morphologically and developmentally similar they are (Baird et al., 1992; Zhao et al., 2008). The results of my RNAi screen support the hypothesis that developmental pathways have diverged significantly in these species, and I believe that in many cases there is no phenotype change associated with this. However, I cannot rule out that some of these differences in in vivo gene function correspond to some evolutionary change (adaptive or not) between these species that I have been unable to measure. It is possible that there are significant yet subtle developmental differences we have been unable to measure. It is likely that a subset of the 91 genes I identified can be explained by the DSD model and that some other subset corresponds to some unidentified phenotype change.

In this thesis I found that bli-4 and bli-5 have different RNAi phenotypes between C. elegans and C. briggsae despite having conserved molecular functions as established by rescue assays. This likely means that there is an overall difference in the structure of the cuticle development pathways between these species. However, establishing the nature of this difference is quite difficult. There could be alterations in the functions of a number of different genes, or the differences I have observed here could stem from a change in the function of a single pleiotropic gene. One possibility would be to do follow-up candidate gene experiments by comparing their RNAi phenotypes at a higher resolution or looking at some other aspect of gene function such as reporter expression. The collagen biosynthesis pathway has been well studied in vertebrates, and this forms the basis of our understanding of the C. elegans cuticle development pathway, in which these bli genes act. Bli-4 is a protease which acts on the N-terminus of collagen helices (Peters et al., 1991), while bli-5 is a protease inhibitor which is hypothesized to act on bli-4 (Page et al., 2006). Many other genes are known or hypothesized to act in this pathway, including prolyl-4-hydroxylases (dpy-18), protein disulphide isomerases, and dual oxidases (bli-3) (Page and Winter, 2003; Page and Johnstone, 2007).
Furthermore, there are many other mutants with cuticle phenotypes (dpy and bli), many of which are actual collagen genes (Page and Winter, 2003). A final possibility would be to look at the master regulators of the epidermal tissue (elt-1 and elt-3; Gilleard and McGhee, 2001) or a specific regulator of larval moulting (nhr-23; Kostrouchova et al., 2001). All of these genes could potentially have different functions between C. elegans and C. briggsae, and following that up experimentally could provide insight into the difference in in vivo function of bli-4 and bli-5.

If a given gene in a molecular pathway is required for development in one species but dispensable in another, it raises the question of how these pathways have changed. Under DSD we might expect the molecular activity of the gene in question to have shifted to another gene capable of filling that role. In support of this, I was able to detect a number of cases where an ortholog pair had a phenotype in C. elegans but not in C. briggsae, and where the missing phenotype had appeared in a related family member in C. briggsae. These cases could be examples of where the pathway requirement for a given biochemical activity has changed from one gene to another, but the output of the pathway remains conserved. Synthetic transcriptional circuits in E. coli support this possibility; it was found that some subnetworks have the same logical output despite having a different connectivity (Guet et al., 2002). In that sense the examples I identified in worms are pathway-level analogues of promoters changing their constituent parts without changing their transcriptional output, for example the even-skipped stripe 2 enhancer. In this example, the numbers and locations of the binding sites for 4 different transcription factors are not conserved, but stabilizing selection has conserved the output of this molecular machine: the second stripe of eve expression is generated correctly (Ludwig et al., 1998). One possible model for a change in transcription factor binding site location could be turnover through a redundant intermediate, in which acquisition of multiple sites allows previously essential sites to be lost. A similar model is thought to explain the transition from asgs being activated in a cells in C. albicans to asgs being repressed in α cells in S. cerevisiae (Tsong et al., 2003; Baker et al., 2012). For some of the genes I identified with different RNAi phenotypes in this thesis, I posit a similar mechanism in which an ancestral essential pathway has gone through an intermediate state with two redundant genes holding the same function (Figure 4.1).

To build support for this model it would be beneficial to identify redundant systems in intermediate species, such as C. remanei. The most productive line of experimental inquiry would be to take pairs of orthologs with apparent turnover (Figure 2.10) and to attempt double RNAi experiments to find instances of redundant gene function in intermediate species. Technical problems surrounding the false negative rate of double RNAi might be prevalent, especially in species such as C. remanei which have not undergone extensive testing of RNAi efficacy. It might be necessary to use the novel technology of TALENs or CRISPR/Cas9 to introduce deletions (Lo et al., 2013). Furthermore, in order to show that the circuitry I identified in C. elegans is ancestral, it would be necessary to test gene function by RNAi in an outgroup such as a member of the Japonica group (Kiontke et al., 2011). The major shortcoming of this line of inquiry is that intermediate species are not the same as ancestral species, and my model posits the existence of redundant states as a temporary state which can move to a single-requirement state in a neutral way. Thus, it is not clear that a redundant state which existed in the distant past would necessarily exist in organisms which are extant today. Consider that in Baker et al. (2012), in the intermediate species in which there was redundant regulation in a and α cells, only a small subset of the asgs were redundantly regulated.

4.3 Traversal of adaptive peaks

One model for evolutionary change that helps to illuminate the data I have presented here is that of a fitness landscape. In this model, the genotype space of the organism lies on the x-axis, and the corresponding fitness values are on the y-axis. It has been argued that there can often be multiple local maxima within an adaptive landscape, which could prevent a population from reaching the global optimum. The existence of multiple adaptive peaks is supported by the adaptation of φ6 phage to laboratory conditions after acquiring deleterious mutations during population bottlenecks (Burch and Chao, 1999). In these experiments, some phage isolates did not return to their original fitness level, implying that they had reached a local optimum in the adaptive landscape. The fact that replicates of the same isolate arrived at the same local optimum argued that this was not due to insufficient time to acquire the correct mutations.

It has been argued that traversing adaptive peaks could be accomplished through epistasis (Weinreich et al., 2005); this has been supported by an experimental study of the evolution of resistance to cefotaxime in E. coli by 5 amino acid changes in the TEM β-lactamase gene. Weinreich et al. (2006) found that while there is a single adaptive peak, there was epistasis in the transitions observed; the Met182Thr mutation reduced fitness when added to the wild-type gene, but increased resistance once the Gly238Ser replacement had occurred. This sort of epistasis is related to the ‘neutral networks’ hypothesized by Wagner (2008), in which phenotypically neutral mutations permit random exploration of a genotype space until a point where it is possible to climb to a higher point in the adaptive landscape. Another example would be the evolution of citrate metabolism during the experimental evolution of E. coli populations in laboratory environments, which was contingent on an unknown earlier mutation (Blount et al., 2008).
It is possible that pathway turnover could allow traversal of adaptive peaks. DSD could allow movement in genotype space to a point where climbing to a higher point in the fitness landscape is possible (Figure 4.2). Under this model, when a gene has an essential cellular function, mutations which would increase the overall fitness of the organism by introducing a novel function to that gene are selected against, because the gene sits at a local optimum in the adaptive landscape. When the pathway reconfigures and the gene is no longer essential, it is free to evolve towards the higher fitness peak. In order to make such a model convincing, an example must be found. However, finding such an example would be difficult, since small genome changes are required to measure a fitness landscape (which is impossible given the 20-100 million years by which C. elegans and C. briggsae are separated). One possibility would be to engineer redundancy into the genome of the organism and then test the outcomes of experimental evolution, although this would be a high-risk project.

4.4 Novel gene function at high resolution

One weakness of my work predicting the function of TRGs is the use of essentiality as a proxy for gene function in TRGs. In C. elegans the classification of genes as having a strong RNAi phenotype is an artificial cutoff (Ramani et al., 2012). While this can be justified because genes with stronger RNAi phenotypes tend to have higher levels of negative selection (Cutter et al., 2003; Ramani et al., 2012), for future work it might be helpful to try to predict quantitative RNAi phenotypes or levels of negative selection on TRGs based on the co-expression profiles to different pathways I designed in Chapter 3. Alternatively, it might be useful to define the biological function in a more specific way. It is very difficult to assign specific gene function to TRGs, since little is known about these genes. One possibility would be to apply high content microscopy to determine subcellular mutant phenotypes, for example, phenotypes relating to the microtubule spindle in S. cerevisiae (Vizeacoumar et al., 2010), or phenotypes in early development in C. elegans (Sönnichsen et al., 2005). Another possibility would be to look at genes upregulated during a biological process in which we expect heavy involvement of TRGs, for example in response to pathogen infection (Pukkila-Worley et al., 2011). Detailed studies of expression or localization of a subset of the TRGs with an RNAi phenotype would yield further support for my results. For example, if some of these genes have a function at the kinetochore then this could be assessed by measuring kinetochore localization in vivo using GFP translational fusion experiments. Which genetic changes allowed for a TRG to be expressed at the kinetochore? Were these changes neutral or positively selected for?
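
As an illustration of the co-expression-based prediction suggested earlier in this section, the sketch below scores one gene against a pathway as the mean Pearson correlation between its expression profile and the profiles of the pathway members. This is a generic scoring scheme assumed for illustration and is not the procedure from Chapter 3; the expression values and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def pathway_coexpression(gene_profile, pathway_profiles):
    """Mean correlation between one gene and every member of a pathway."""
    scores = [pearson(gene_profile, profile) for profile in pathway_profiles]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical expression values across four conditions.
    trg = [1.0, 2.0, 3.0, 4.0]
    pathway = [[2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 5.0]]
    print(pathway_coexpression(trg, pathway))
```

A continuous score of this kind could in principle be regressed against a quantitative RNAi phenotype rather than a binary essential/non-essential label, which is the extension proposed above.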

The major difficulty with this approach is that it is challenging to trace the evolution of novel genes in C. elegans, since it does not have a sister species; the most closely related clade is that which contains C. briggsae (Kiontke et al., 2011). The evolution of novel genes in the human or fly lineages can be traced to specific mutations because it is possible to compare to very closely related species (Wang et al., 2002; Knowles and McLysaght, 2009); this is impossible in C. elegans given the current phylogeny. An alternative could be to look at a closely related species pair such as C. briggsae and C. sp. 9.

4.5 How do novel genes change functional networks?

The best example of the evolution of a novel gene modifying an existing functional network is that of the emergence of fog-2 in C. elegans. In the ancestor, male germ cell fate was established by the secreted factor HER-1 binding to and inactivating TRA-2 to induce a signal transduction cascade to promote male germ cell development (Kuwabara and Kimble, 1995), while the RNA binding protein GLD-1 likely had an essential function in oocyte differentiation (Jones et al., 1996). The emergence of FOG-2 introduced novel protein-protein interactions through its F-box domain, which allowed GLD-1 to acquire an RNA-protein interaction with tra-2 in females, which initiates male germ cell development and produces hermaphrodites (Nayak et al., 2005). In this example, a novel set of functional interactions links the established pathway of male sex cell differentiation to the GLD-1 regulator expressed in female germline development, allowing male germ cell differentiation in the female germline; the result is a hermaphrodite. This example shows how the evolution of a novel gene can join existing pathways to create new biology. How many of the essential TRGs in C. elegans are creating novel biology? I raised alternative possibilities in Chapter 3: how many of these genes are optimizing an essential pathway or replacing the function of an essential gene? These questions are difficult to answer. Here I wish to discuss a few approaches which could measure how the evolution of novel genes would change genetic networks. It has long been thought that the evolution of novel genes through gene duplication has allowed protein-protein interaction network structure to diverge, since duplicates will rapidly lose their shared interactions (Wagner, 2001). It would be interesting to look at divergent gene duplicates which are now located in different network subgraphs and to see how often only one duplicate copy is essential.
The evolution of co-expression profiles could be highly predictive here, since duplicate pairs with high co-expression are much more likely to retain shared protein-protein interactions (e.g. Arabidopsis Interactome Mapping Consortium, 2011). Another potentially productive line of inquiry would be to take genes which have recently emerged in one species, express them transgenically in the closely related species without the gene, and then examine changes in the co-expression network with RNA-seq. This would reveal the degree to which expression levels are perturbed through the evolution of a novel gene, and would give a sense as to what extent the functional networks can change when novel genes are created. Transgenic expression has been previously used to show the effect of adding genes; for example, adding Ago1 and Dcr1 to S. cerevisiae created an RNAi competent strain which suppressed Ty1 transcription (Drinnenberg et al., 2009). To this point I have focused my discussion on genetic interactions, protein-protein interactions and co-expression profiles, but I have avoided metabolic reaction networks. I will briefly discuss them here since they have tremendous power in systems biology. Using the current state of knowledge of metabolic reactions, a mathematical model called Flux Balance Analysis attempts to solve for a steady state of fluxes between different metabolites to build an in silico model of the metabolism of the cell. This model is capable of providing ∼80% accurate predictions of which yeast genes are essential (Förster et al., 2003; Famili et al., 2003).
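
The logic of an in silico essentiality prediction can be sketched on a toy network. Flux Balance Analysis is in general a linear program over a stoichiometric matrix; the sketch below assumes the special case in which every reaction is irreversible and converts exactly one metabolite into one other, so that maximizing the growth flux at steady state reduces to a maximum-flow problem. The network, the reaction names and the flux bounds are hypothetical, and the code is not a substitute for the genome-scale models cited above.

```python
from collections import deque


def max_flux(reactions, source, sink):
    """Maximum steady-state flux from source to sink (Edmonds-Karp).

    reactions maps a name to (substrate, product, upper_bound).
    """
    cap, adj = {}, {}
    for substrate, product, bound in reactions.values():
        cap[(substrate, product)] = cap.get((substrate, product), 0) + bound
        cap.setdefault((product, substrate), 0)  # residual (reverse) edge
        adj.setdefault(substrate, set()).add(product)
        adj.setdefault(product, set()).add(substrate)
    total = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck along the path, then push that much flux.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[(parent[v], v)])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
            v = u
        total += bottleneck


def essential_reactions(reactions, source, sink):
    """Reactions whose deletion abolishes all flux to the sink."""
    essential = []
    for name in reactions:
        remaining = {k: v for k, v in reactions.items() if k != name}
        if max_flux(remaining, source, sink) == 0:
            essential.append(name)
    return sorted(essential)


# Hypothetical toy network: A reaches B directly (r1) or via C (r2, r3).
NETWORK = {
    "uptake": ("ext", "A", 10),
    "r1": ("A", "B", 10),
    "r2": ("A", "C", 5),
    "r3": ("C", "B", 5),
    "growth": ("B", "biomass", 10),
}

if __name__ == "__main__":
    print(max_flux(NETWORK, "ext", "biomass"))
    print(essential_reactions(NETWORK, "ext", "biomass"))
```

In this toy network the uptake and growth reactions are predicted to be essential, whereas r1 is not because the route through C can still carry flux. This is the kind of redundancy that a pathway-level model can capture and a single-gene view cannot.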

How could a novel gene modify an existing metabolic network? One way to assess this would be to look at the profile of metabolites in TRG deletions using mass spectrometry, as has been done previously for yeast gene deletions (Clasquin et al., 2011). Genes with a significant change in the measured spectra may be novel parts of metabolism. Alternatively, previously characterized metabolic genes which have evolved relatively recently could be studied in detail with the help of an in silico model: how much does their removal affect the cell compared to more ancient proteins?

4.6 Overall Significance

In this thesis I have tried to assess how gene function changes through evolutionary time. This problem is important for our efforts to understand what makes one species different from another, since these differences are due to changes in the genes encoded in the genome. While some of these changes can lead to adaptive evolution, I have been more interested in how changes in biological gene function, as assessed by different loss of function phenotypes, may not have any effect on the phenotype of the organism due to strong stabilizing selection to preserve the outputs of developmental pathways, a model known as Developmental System Drift. C. elegans and C. briggsae are a good model system for this, and I discovered several examples of orthologous genes which have altered gene function between these related species, and for a subset of such genes I was able to show the molecular basis behind such changes. My other major contribution was using the available functional genomics data to help understand why some recently evolved genes go on to acquire essential functions. I found that gene-gene co-expression with specific molecular pathways is associated with acquiring an essential function, which suggests that many of the high-throughput datasets have the potential to illuminate the evolution of novel gene function. While I have only addressed a small part of these large questions in Caenorhabditis nematodes, I hope that my work will help future investigators understand how genes and pathways can change between related species.

Figure 4.1: A model for how pathways can turn over throughout evolution (each of the three panels is labelled "Essential cellular function"). In this model a protein has an essential biochemical activity that is required for organism survival. Turnover between one protein doing the essential biochemical activity and the other protein having that role occurs by transition through a redundant intermediate in which both proteins redundantly hold the essential function.

Bibliography

Agrawal, A. F. and Whitlock, M. C. (2012). Mutation load: the fitness of individuals in populations where deleterious alleles are abundant. Annual Review of Ecology, Evolution, and Systematics, 43:115–135.

Arabidopsis Interactome Mapping Consortium (2011). Evidence for network evolution in an Arabidopsis interactome map. Science, 333(6042):601–7.

Aury, J.-M., Jaillon, O., Duret, L., Noel, B., Jubin, C., Porcel, B. M., Ségurens, B., Daubin, V., Anthouard, V., Aiach, N., Arnaiz, O., Billaut, A., Beisson, J., Blanc, I., Bouhouche, K., Câmara, F., Duharcourt, S., Guigo, R., Gogendeau, D., Katinka, M., Keller, A.-M., Kissmehl, R., Klotz, C., Koll, F., Le Mouël, A., Lepère, G., Malinsky, S., Nowacki, M., Nowak, J. K., Plattner, H., Poulain, J., Ruiz, F., Serrano, V., Zagulski, M., Dessen, P., Bétermier, M., Weissenbach, J., Scarpelli, C., Schächter, V., Sperling, L., Meyer, E., Cohen, J., and Wincker, P. (2006). Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature, 444(7116):171–8.

Baird, S. E., Sutherlin, M. E., and Emmons, S. W. (1992). Reproductive Isolation in (Nematoda: Secernentea); Mechanisms That Isolate Six Species of Three Genera. Evolution, 46(3):585–594.

Baker, C. R., Booth, L. N., Sorrells, T. R., and Johnson, A. D. (2012). Protein modularity, cooperative binding, and hybrid regulatory states underlie transcriptional network diversification. Cell, 151(1):80–95.


Baldi, C., Cho, S., and Ellis, R. E. (2009). Mutations in two independent pathways are sufficient to create hermaphroditic nematodes. Science, 326(5955):1002–5.

Barbosa-Morais, N. L., Irimia, M., Pan, Q., Xiong, H. Y., Gueroussov, S., Lee, L. J., Slobodeniuc, V., Kutter, C., Watt, S., Colak, R., Kim, T., Misquitta-Ali, C. M., Wilson, M. D., Kim, P. M., Odom, D. T., Frey, B. J., and Blencowe, B. J. (2012). The evolutionary landscape of alternative splicing in vertebrate species. Science, 338(6114):1587–93.

Barker, D. (1994). Copulatory plugs and paternity assurance in the nematode Caenorhabditis elegans. Animal Behavior, 48:147–156.

Barrick, J. E., Yu, D. S., Yoon, S. H., Jeong, H., Oh, T. K., Schneider, D., Lenski, R. E., and Kim, J. F. (2009). Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature, 461(7268):1243–7.

Barrière, A., Gordon, K. L., and Ruvinsky, I. (2011). Distinct Functional Constraints Partition Sequence Conservation in a cis-Regulatory Element. PLoS Genetics, 7(6):e1002095.

Barrière, A., Gordon, K. L., and Ruvinsky, I. (2012). Coevolution within and between regulatory loci can preserve promoter function despite evolutionary rate acceleration. PLoS genetics, 8(9):e1002961.

Beadell, A. V., Liu, Q., Johnson, D. M., and Haag, E. S. (2011). Independent recruitments of a translational regulator in the evolution of self-fertile nematodes. Proceedings of the National Academy of Sciences of the United States of America, 108(49):1–6.

Begun, D. J., Lindfors, H. A., Kern, A. D., and Jones, C. D. (2007). Evidence for de novo evolution of testis-expressed genes in the Drosophila yakuba/Drosophila erecta clade. Genetics, 176(2):1131–7.

Bell, M. A. (1987). Interacting evolutionary constraints in pelvic reduction of threespine ( Pisces , Gasterosteidae ). Biological Journal of the Linnean Society, 31:347–382.

Bendesky, A., Pitts, J., Rockman, M. V., Chen, W. C., Tan, M.-W., Kruglyak, L., and Bargmann, C. I. (2012). Long-range regulatory polymorphisms affecting a GABA receptor constitute a quantitative trait locus (QTL) for social behavior in . PLoS genetics, 8(12):e1003157.

Bendesky, A., Tsunozaki, M., Rockman, M. V., Kruglyak, L., and Bargmann, C. I. (2011). Catecholamine receptor polymorphisms affect decision-making in C. elegans. Nature, 472(7343):313–318.

Berglund, A.-C., Sjölund, E., Ostlund, G., and Sonnhammer, E. L. L. (2008). InParanoid 6: eukaryotic ortholog clusters with inparalogs. Nucleic acids research, 36(Database issue):D263–6.

Bernstein, B. E., Birney, E., Dunham, I., Green, E. D., Gunter, C., and Snyder, M. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489(7414):57–74.

Bloom, J. S., Ehrenreich, I. M., Loo, W. T., Lite, T.-L. V. o., and Kruglyak, L. (2013). Finding the sources of missing heritability in a yeast cross. Nature, 494(7436):234–7.

Blount, Z. D., Borland, C. Z., and Lenski, R. E. (2008). Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. Proceedings of the National Academy of Sciences, 105(23):7899–906.

Borneman, A. R., Gianoulis, T. A., Zhang, Z. D., Yu, H., Rozowsky, J., Seringhaus, M. R., Wang, L. Y., Gerstein, M., and Snyder, M. (2007). Divergence of transcription factor binding sites across related yeast species. Science, 317(5839):815–9.

Bowers, J. E., Chapman, B. A., Rong, J., and Paterson, A. H. (2003). Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature, 422(6930):433–8.

Bowmaker, J. K. (1981). Visual pigments and colour vision in man and monkeys. Journal of the Royal Society of Medicine, 74(5):348–56.

Bradley, R. K., Li, X.-Y., Trapnell, C., Davidson, S., Pachter, L., Chu, H. C., Tonkin, L. A., Biggin, M. D., and Eisen, M. B. (2010). Binding site turnover produces pervasive quantitative changes in transcription factor binding between closely related Drosophila species. PLoS biology, 8(3):e1000343.

Brem, R. B. and Kruglyak, L. (2005). The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proceedings of the National Academy of Sciences, 102(5):1572–7.

Buck, A. H. and Blaxter, M. (2013). Functional diversification of Argonautes in nematodes: an expanding universe. Biochemical Society transactions, 41(4):881–6.

Burch, C. L. and Chao, L. (1999). Evolution by small steps and rugged landscapes in the RNA virus phi6. Genetics, 151(3):921–7.

Cai, J., Zhao, R., Jiang, H., and Wang, W. (2008). De novo origination of a new protein-coding gene in Saccharomyces cerevisiae. Genetics, 179(1):487–96.

Carroll, S. B. (2008). Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell, 134(1):25–36.

Carvunis, A.-R., Rolland, T., Wapinski, I., Calderwood, M. A., Yildirim, M. A., Simonis, N., Charloteaux, B., Hidalgo, C. A., Barbette, J., Santhanam, B., Brar, G. A., Weissman, J. S., Regev, A., Thierry-Mieg, N., Cusick, M. E., and Vidal, M. (2012). Proto-genes and de novo gene birth. Nature, pages 1–5.

Chan, Y. F., Marks, M. E., Jones, F. C., Villarreal, G., Shapiro, M. D., Brady, S. D., Southwick, A. M., Absher, D. M., Grimwood, J., Schmutz, J., Myers, R. M., Petrov, D., Jónsson, B., Schluter, D., Bell, M. A., and Kingsley, D. M. (2010). Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science, 327(5963):302–5.

Cheeseman, I. M., Niessen, S., Anderson, S., Hyndman, F., Yates, J. R., Oegema, K., and Desai, A. (2004). A conserved protein network controls assembly of the outer kinetochore and its ability to sustain tension. Genes & development, 18(18):2255–68.

Chen, B.-J., Causton, H. C., Mancenido, D., Goddard, N. L., Perlstein, E. O., and Pe’er, D. (2009). Harnessing gene expression to identify the genetic basis of drug resistance. Molecular systems biology, 5(310):310.

Chiaromonte, F., Weber, R. J., Roskin, K. M., Diekhans, M., Kent, W. J., and Haussler, D. (2003). The share of human genomic DNA under selection estimated from human-mouse genomic alignments. Cold Spring Harbor symposia on quantitative biology, 68:245–54.

Chun, S. and Fay, J. C. (2011). Evidence for hitchhiking of deleterious mutations within the human genome. PLoS genetics, 7(8):e1002240.

Clasquin, M. F., Melamud, E., Singer, A., Gooding, J. R., Xu, X., Dong, A., Cui, H., Campagna, S. R., Savchenko, A., Yakunin, A. F., Rabinowitz, J. D., and Caudy, A. A. (2011). Riboneogenesis in yeast. Cell, 145(6):969–80.

Claycomb, J. M., Batista, P. J., Pang, K. M., Gu, W., Vasale, J. J., van Wolfswinkel, J. C., Chaves, D. A., Shirayama, M., Mitani, S., Ketting, R. F., Conte, D., and Mello, C. C. (2009). The Argonaute CSR-1 and its 22G-RNA cofactors are required for holocentric chromosome segregation. Cell, 139(1):123–34.

Clifford, R., Lee, M. H., Nayak, S., Ohmachi, M., Giorgini, F., and Schedl, T. (2000). FOG-2, a novel F-box containing protein, associates with the GLD-1 RNA binding protein and directs male sex determination in the C. elegans hermaphrodite germline. Development, 127(24):5265–76.

Coen, E. S. and Meyerowitz, E. M. (1991). The war of the whorls: genetic interactions controlling flower development. Nature, 353(6339):31–7.

Cooper, T. F., Rozen, D. E., and Lenski, R. E. (2003). Parallel changes in gene expression after 20,000 generations of evolution in Escherichia coli. Proceedings of the National Academy of Sciences, 100(3):1072–7.

Costanzo, M., Baryshnikova, A., Bellay, J., Kim, Y., Spear, E. D., Sevier, C. S., Ding, H., Koh, J. L. Y., Toufighi, K., Mostafavi, S., Prinz, J., St Onge, R. P., VanderSluis, B., Makhnevych, T., Vizeacoumar, F. J., Alizadeh, S., Bahr, S., Brost, R. L., Chen, Y., Cokol, M., Deshpande, R., Li, Z., Lin, Z.-Y., Liang, W., Marback, M., Paw, J., San Luis, B.-J., Shuteriqi, E., Tong, A. H. Y., van Dyk, N., Wallace, I. M., Whitney, J. A., Weirauch, M. T., Zhong, G., Zhu, H., Houry, W. A., Brudno, M., Ragibizadeh, S., Papp, B., Pál, C., Roth, F. P., Giaever, G., Nislow, C., Troyanskaya, O. G., Bussey, H., Bader, G. D., Gingras, A.-C., Morris, Q. D., Kim, P. M., Kaiser, C. A., Myers, C. L., Andrews, B. J., and Boone, C. (2010). The genetic landscape of a cell. Science, 327(5964):425–31.

Coyle, S. M., Huntingford, F. A., and Peichel, C. L. (2007). Parallel evolution of Pitx1 underlies pelvic reduction in Scottish threespine stickleback (Gasterosteus aculeatus). The Journal of heredity, 98(6):581–6.

Cutter, A. D. (2008). Divergence Times in Caenorhabditis and Drosophila Inferred from Direct Estimates of the Neutral Mutation Rate. Molecular Biology and Evolution, 25(4):778–786.

Cutter, A. D., Payseur, B. A., Salcedo, T., Estes, A. M., Good, J. M., Wood, E., Hartl, T., Maughan, H., Strempel, J., Wang, B., Bryan, A. C., and Dellos, M. (2003). Molecular correlates of genes exhibiting RNAi phenotypes in Caenorhabditis elegans. Genome research, 13(12):2651–7.

Cutter, A. D., Wasmuth, J. D., and Washington, N. L. (2008). Patterns of molecular evolution in Caenorhabditis preclude ancient origins of selfing. Genetics, 178(4):2093–104.

Dai, H., Chen, Y., Chen, S., Mao, Q., Kennedy, D., Landback, P., Eyre-Walker, A., Du, W., and Long, M. (2008). The evolution of courtship behaviors through the origination of a new gene in Drosophila. Proceedings of the National Academy of Sciences, 105(21):7478–7483.

Dalzell, J. J., McVeigh, P., Warnock, N. D., Mitreva, M., Bird, D. M., Abad, P., Fleming, C. C., Day, T. A., Mousley, A., Marks, N. J., and Maule, A. G. (2011). RNAi Effector Diversity in Nematodes. PLoS neglected tropical diseases, 5(6):e1176.

Davies, A. G., Bettinger, J. C., Thiele, T. R., Judy, M. E., and McIntire, S. L. (2004). Natural variation in the npr-1 gene modifies ethanol responses of wild strains of C. elegans. Neuron, 42(5):731–43.

de Bono, M. and Bargmann, C. I. (1998). Natural variation in a neuropeptide Y receptor homolog modifies social behavior and food response in C. elegans. Cell, 94(5):679–89.

Dehal, P. and Boore, J. L. (2005). Two rounds of whole genome duplication in the ancestral vertebrate. PLoS biology, 3(10):e314.

Demogines, A., Abraham, J., Choe, H., Farzan, M., and Sawyer, S. L. (2013). Dual host-virus arms races shape an essential housekeeping protein. PLoS biology, 11(5):e1001571.

Deng, C., Cheng, C.-H. C., Ye, H., He, X., and Chen, L. (2010). Evolution of an antifreeze protein by neofunctionalization under escape from adaptive conflict. Proceedings of the National Academy of Sciences, 107(50):21593–8.

Denver, D. R., Morris, K., Lynch, M., and Thomas, W. K. (2004). High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature, 430(7000):679–682.

Denver, D. R., Morris, K., Streelman, J. T., Kim, S. K., Lynch, M., and Thomas, W. K. (2005). The transcriptional consequences of mutation and natural selection in Caenorhabditis elegans. Nature genetics, 37(5):544–8.

Des Marais, D. L. and Rausher, M. D. (2008). Escape from adaptive conflict after duplication in an anthocyanin pathway gene. Nature, 454(7205):762–5.

Domazet-Lošo, T. and Tautz, D. (2008). An ancient evolutionary origin of genes associated with human genetic diseases. Molecular biology and evolution, 25(12):2699–707.

Domazet-Lošo, T. and Tautz, D. (2010). A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature, 468(7325):815–818.

Drinnenberg, I. A., Weinberg, D. E., Xie, K. T., Mower, J. P., Wolfe, K. H., Fink, G. R., and Bartel, D. P. (2009). RNAi in budding yeast. Science (New York, N.Y.), 326(5952):544–50.

Dujon, B. (1996). The yeast genome project: what did we learn? Trends in genetics, 12(7):263–70.

Ehrenreich, I. M., Torabi, N., Jia, Y., Kent, J., Martis, S., Shapiro, J. A., Gresham, D., Caudy, A. A., and Kruglyak, L. (2010). Dissection of genetically complex traits with extremely large pools of yeast segregants. Nature, 464(7291):1039–42.

Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences, 95(25):14863–8.

Enard, W., Khaitovich, P., Klose, J., Zöllner, S., Heissig, F., Giavalisco, P., Nieselt-Struwe, K., Muchmore, E., Varki, A., Ravid, R., Doxiadis, G. M., Bontrop, R. E., and Pääbo, S. (2002). Intra- and interspecific variation in primate gene expression patterns. Science, 296(5566):340–3.

Enright, A. J., Van Dongen, S., and Ouzounis, C. A. (2002). An efficient algorithm for large-scale detection of protein families. Nucleic acids research, 30(7):1575–84.

Euskirchen, G. and Snyder, M. (2004). A plethora of sites. Nature genetics, 36(4):325–6.

Famili, I., Forster, J., Nielsen, J., and Palsson, B. O. (2003). Saccharomyces cerevisiae phenotypes can be predicted by using constraint-based analysis of a genome-scale reconstructed metabolic network. Proceedings of the National Academy of Sciences of the United States of America, 100(23):13134–9.

Félix, M.-A. (2007). Cryptic quantitative evolution of the vulva intercellular signaling network in Caenorhabditis. Current biology, 17(2):103–14.

Félix, M.-A., Ashe, A., Piffaretti, J., Wu, G., Nuez, I., Bélicard, T., Jiang, Y., Zhao, G., Franz, C. J., Goldstein, L. D., Sanroman, M., Miska, E. A., and Wang, D. (2011). Natural and experimental infection of Caenorhabditis nematodes by novel viruses related to nodaviruses. PLoS biology, 9(1):e1000586.

Félix, M.-A. and Duveau, F. (2012). Population dynamics and habitat sharing of natural populations of Caenorhabditis elegans and C. briggsae. BMC biology, 10:59.

Fire, A., Xu, S., Montgomery, M. K., Kostas, S. A., Driver, S. E., and Mello, C. C. (1998). Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature, 391(6669):806–11.

Fontana, W. and Schuster, P. (1998). Continuity in evolution: on the nature of transitions. Science, 280(5368):1451–5.

Force, A., Lynch, M., Pickett, F. B., Amores, A., Yan, Y. L., and Postlethwait, J. (1999). Preservation of duplicate genes by complementary, degenerative mutations. Genetics, 151(4):1531–45.

Förster, J., Famili, I., Palsson, B. O., and Nielsen, J. (2003). Large-scale evaluation of in silico gene deletions in Saccharomyces cerevisiae. Omics : a journal of integrative biology, 7(2):193–202.

Fraser, A. G., Kamath, R. S., Zipperlen, P., Martinez-Campos, M., Sohrmann, M., and Ahringer, J. (2000). Functional genomic analysis of C. elegans chromosome I by systematic RNA interference. Nature, 408(6810):325–30.

Galant, R. and Carroll, S. B. (2002). Evolution of a transcriptional repression domain in an insect Hox protein. Nature, 415(6874):910–3.

Gaudet, J. and Mango, S. E. (2002). Regulation of organogenesis by the Caenorhabditis elegans FoxA protein PHA-4. Science, 295(5556):821–825.

Gavin, A.-C., Bösche, M., Krause, R., Grandi, P., Marzioch, M., Bauer, A., Schultz, J., Rick, J. M., Michon, A.-M., Cruciat, C.-M., Remor, M., Höfert, C., Schelder, M., Brajenovic, M., Ruffner, H., Merino, A., Klein, K., Hudak, M., Dickson, D., Rudi, T., Gnau, V., Bauch, A., Bastuck, S., Huhse, B., Leutwein, C., Heurtier, M.-A., Copley, R. R., Edelmann, A., Querfurth, E., Rybin, V., Drewes, G., Raida, M., Bouwmeester, T., Bork, P., Seraphin, B., Kuster, B., Neubauer, G., and Superti-Furga, G. (2002). Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415(6868):141–7.

Ghosh, R., Andersen, E. C., Shapiro, J. A., Gerke, J. P., and Kruglyak, L. (2012). Natural variation in a chloride channel subunit confers avermectin resistance in C. elegans. Science, 335(6068):574–8.

Giaever, G., Chu, A. M., Ni, L., Connelly, C., Riles, L., Véronneau, S., Dow, S., Lucau-Danila, A., Anderson, K., André, B., Arkin, A. P., Astromoff, A., El-Bakkoury, M., Bangham, R., Benito, R., Brachat, S., Campanaro, S., Curtiss, M., Davis, K., Deutschbauer, A., Entian, K.-D., Flaherty, P., Foury, F., Garfinkel, D. J., Gerstein, M., Gotte, D., Güldener, U., Hegemann, J. H., Hempel, S., Herman, Z., Jaramillo, D. F., Kelly, D. E., Kelly, S. L., Kötter, P., LaBonte, D., Lamb, D. C., Lan, N., Liang, H., Liao, H., Liu, L., Luo, C., Lussier, M., Mao, R., Menard, P., Ooi, S. L., Revuelta, J. L., Roberts, C. J., Rose, M., Ross-Macdonald, P., Scherens, B., Schimmack, G., Shafer, B., Shoemaker, D. D., Sookhai-Mahadeo, S., Storms, R. K., Strathern, J. N., Valle, G., Voet, M., Volckaert, G., Wang, C.-y., Ward, T. R., Wilhelmy, J., Winzeler, E. A., Yang, Y., Yen, G., Youngman, E., Yu, K., Bussey, H., Boeke, J. D., Snyder, M., Philippsen, P., Davis, R. W., and Johnston, M. (2002). Functional profiling of the Saccharomyces cerevisiae genome. Nature, 418(6896):387–91.

Giles, N. (1983). The possible role of environmental calcium levels during the evolution of phenotypic diversity in Outer Hebridean populations of the Three-spined stickleback, Gasterosteus aculeatus. Journal of Zoology, 199(4):535–544.

Gilleard, J. S. and McGhee, J. D. (2001). Activation of hypodermal differentiation in the Caenorhabditis elegans embryo by GATA transcription factors ELT-1 and ELT-3. Molecular and cellular biology, 21(7):2533–44.

Goldstein, B., Frisse, L. M., and Thomas, W. K. (1998). Embryonic axis specification in nematodes: evolution of the first step in development. Current biology, 8(3):157–60.

Gompel, N., Prud’homme, B., Wittkopp, P. J., Kassner, V. A., and Carroll, S. B. (2005). Chance caught on the wing: cis-regulatory evolution and the origin of pigment patterns in Drosophila. Nature, 433(7025):481–7.

Gönczy, P., Echeverri, C., Oegema, K., Coulson, A., Jones, S. J., Copley, R. R., Duperon, J., Oegema, J., Brehm, M., Cassin, E., Hannak, E., Kirkham, M., Pichler, S., Flohrs, K., Goessen, A., Leidel, S., Alleaume, A. M., Martin, C., Ozlü, N., Bork, P., and Hyman, A. A. (2000). Functional genomic analysis of cell division in C. elegans using RNAi of genes on chromosome III. Nature, 408(6810):331–6.

Gould, S. J. (1989). Wonderful life: The burgess shale and the nature of history. W. W. Norton, New York.

Graur, D., Zheng, Y., Price, N., Azevedo, R. B. R., Zufall, R. A., and Elhaik, E. (2013). On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. Genome biology and evolution, 5(3):578–90.

Gray, J. M., Karow, D. S., Lu, H., Chang, A. J., Chang, J. S., Ellis, R. E., Marletta, M. A., and Bargmann, C. I. (2004). Oxygen sensation and social feeding mediated by a C. elegans guanylate cyclase homologue. Nature, 430(6997):317–22.

Grenier, J. K. and Carroll, S. B. (2000). Functional evolution of the Ultrabithorax protein. Proceedings of the National Academy of Sciences, 97(2):704–9.

Grenier, J. K., Garber, T. L., Warren, R., Whitington, P. M., and Carroll, S. (1997). Evolution of the entire arthropod Hox gene set predated the origin and radiation of the onychophoran/arthropod clade. Current biology, 7(8):547–53.

Grishok, A. (2005). RNAi mechanisms in Caenorhabditis elegans. FEBS letters, 579(26):5932–9.

Gross, J. B., Borowsky, R., and Tabin, C. J. (2009). A novel role for Mc1r in the parallel evolution of depigmentation in independent populations of the cavefish Astyanax mexicanus. PLoS genetics, 5(1):e1000326.

Grueninger, D., Treiber, N., Ziegler, M. O. P., Koetter, J. W. A., Schulze, M.-S., and Schulz, G. E. (2008). Designed protein-protein association. Science, 319(5860):206–9.

Guerriero, C. J. and Weisz, O. A. (2007). N-WASP inhibitor wiskostatin nonselectively perturbs membrane transport by decreasing cellular ATP levels. American journal of physiology. Cell physiology, 292(4):C1562–6.

Guet, C. C., Elowitz, M. B., Hsing, W., and Leibler, S. (2002). Combinatorial synthesis of genetic networks. Science (New York, N.Y.), 296(5572):1466–70.

Guo, Y., Lang, S., and Ellis, R. E. (2009). Independent recruitment of F box genes to regulate hermaphrodite development during nematode evolution. Current biology, 19(21):1853–60.

Haag, E. S. (2006). Compensatory vs. pseudocompensatory evolution in molecular and developmental interactions. Genetica, 129(1):45–55.

Hammond, S. M., Bernstein, E., Beach, D., and Hannon, G. J. (2000). An RNA-directed nuclease mediates post-transcriptional gene silencing in Drosophila cells. Nature, 404(6775):293–6.

Hamosh, A., Scott, A. F., Amberger, J. S., Bocchini, C. A., and McKusick, V. A. (2005). Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic acids research, 33(Database issue):D514–7.

Hare, E. E., Peterson, B. K., Iyer, V. N., Meier, R., and Eisen, M. B. (2008). Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLoS genetics, 4(6):e1000106.

Harris, T. W., Antoshechkin, I., Bieri, T., Blasiar, D., Chan, J., Chen, W. J., De La Cruz, N., Davis, P., Duesbury, M., Fang, R., Fernandes, J., Han, M., Kishore, R., Lee, R., Müller, H.-M., Nakamura, C., Ozersky, P., Petcherski, A., Rangarajan, A., Rogers, A., Schindelman, G., Schwarz, E. M., Tuli, M. A., Van Auken, K., Wang, D., Wang, X., Williams, G., Yook, K., Durbin, R., Stein, L. D., Spieth, J., and Sternberg, P. W. (2010). WormBase: a comprehensive resource for nematode research. Nucleic acids research, 38(Database issue):D463–7.

Heinen, T. J. A. J., Staubach, F., Häming, D., and Tautz, D. (2009). Emergence of a new gene from an intergenic region. Current biology, 19(18):1527–31.

Henikoff, S. and Malik, H. S. (2002). Selfish drivers. Nature, 417(6886):227.

Hibbs, M. A., Hess, D. C., Myers, C. L., Huttenhower, C., Li, K., and Troyanskaya, O. G. (2007). Exploring the functional landscape of gene expression: directed search of large microarray compendia. Bioinformatics, 23(20):2692–9.

Hinas, A., Wright, A. J., and Hunter, C. P. (2012). SID-5 is an endosome-associated protein required for efficient systemic RNAi in C. elegans. Current biology, 22(20):1938–43.

Hittinger, C. T. and Carroll, S. B. (2007). Gene duplication and the adaptive evolution of a classic genetic switch. Nature, 449(7163):677–81.

Hobert, O. (2002). PCR fusion-based approach to create reporter gene constructs for expression analysis in transgenic C. elegans. BioTechniques, 32(4):728–30.

Hoekstra, H. E. and Coyne, J. A. (2007). The locus of evolution: evo devo and the genetics of adaptation. Evolution; international journal of organic evolution, 61(5):995–1016.

Hoekstra, H. E., Hirschmann, R. J., Bundey, R. A., Insel, P. A., and Crossland, J. P. (2006). A single amino acid mutation contributes to adaptive beach mouse color pattern. Science, 313(5783):101–4.

Horner, M. A., Quintin, S., Domeier, M. E., Kimble, J., Labouesse, M., and Mango, S. E. (1998). pha-4, an HNF-3 homolog, specifies pharyngeal organ identity in Caenorhabditis elegans. Genes & development, 12(13):1947–52.

Hoyos, E., Kim, K., Milloz, J., Barkoulas, M., Pénigault, J.-B., Munro, E., and Félix, M.-A. (2011). Quantitative Variation in Autocrine Signaling and Pathway Crosstalk in the Caenorhabditis Vulval Network. Current biology, 21(7):527–538.

Hunt-Newbury, R., Viveiros, R., Johnsen, R., Mah, A., Anastas, D., Fang, L., Halfnight, E., Lee, D., Lin, J., Lorch, A., McKay, S., Okada, H. M., Pan, J., Schulz, A. K., Tu, D., Wong, K., Zhao, Z., Alexeyenko, A., Burglin, T., Sonnhammer, E., Schnabel, R., Jones, S. J., Marra, M. A., Baillie, D. L., and Moerman, D. G. (2007). High-throughput in vivo analysis of gene expression in Caenorhabditis elegans. PLoS biology, 5(9):e237.

Hwang, J. S., Ohyanagi, H., Hayakawa, S., Osato, N., Nishimiya-Fujisawa, C., Ikeo, K., David, C. N., Fujisawa, T., and Gojobori, T. (2007). The evolutionary emergence of cell type-specific genes inferred from the gene expression analysis of Hydra. Proceedings of the National Academy of Sciences, 104(37):14735–40.

Itsara, A., Wu, H., Smith, J. D., Nickerson, D. A., Romieu, I., London, S. J., and Eichler, E. E. (2010). De novo rates and selection of large copy number variation. Genome research, 20(11):1469–81.

Jeong, H., Mason, S. P., Barabási, A. L., and Oltvai, Z. N. (2001). Lethality and centrality in protein networks. Nature, 411(6833):41–2.

Jeong, S., Rebeiz, M., Andolfatto, P., Werner, T., True, J., and Carroll, S. B. (2008). The evolution of gene regulation underlies a morphological difference between two Drosophila sister species. Cell, 132(5):783–93.

Johnson, B. R. and Tsutsui, N. D. (2011). Taxonomically restricted genes are associated with the evolution of sociality in the honey bee. BMC genomics, 12:164.

Jones, A. R., Francis, R., and Schedl, T. (1996). GLD-1, a cytoplasmic protein essential for oocyte differentiation, shows stage- and sex-specific expression during Caenorhabditis elegans germline development. Developmental biology, 180(1):165–83.

Jose, A. M., Kim, Y. A., Leal-Ekman, S., and Hunter, C. P. (2012). Conserved tyrosine kinase promotes the import of silencing RNA into Caenorhabditis elegans cells. Proceedings of the National Academy of Sciences, 109(36):14520–14525.

Kaessmann, H. (2010). Origins, evolution, and phenotypic impact of new genes. Genome research, 20(10):1313–26.

Kalinka, A. T. and Tomancak, P. (2012). The evolution of early animal embryos: conservation or divergence? Trends in ecology & evolution, 27(7):385–93.

Kalinka, A. T., Varga, K. M., Gerrard, D. T., Preibisch, S., Corcoran, D. L., Jarrells, J., Ohler, U., Bergman, C. M., and Tomancak, P. (2010). Gene expression divergence recapitulates the developmental hourglass model. Nature, 468(7325):811–814.

Kamath, R. and Martinez-Campos, M. (2001). Effectiveness of specific RNA-mediated interference through ingested double-stranded RNA in Caenorhabditis elegans. Genome biology, 2(1):1–10.

Kamath, R. S., Fraser, A. G., Dong, Y., Poulin, G., Durbin, R., Gotta, M., Kanapin, A., Le Bot, N., Moreno, S., Sohrmann, M., Welchman, D. P., Zipperlen, P., and Ahringer, J. (2003). Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature, 421(6920):231–7.

Kawano, T., Fujita, M., and Sakamoto, H. (2000). Unique and redundant functions of SR proteins, a conserved family of splicing factors, in Caenorhabditis elegans development. Mechanisms of development, 95(1-2):67–76.

Keeling, P. J., Burger, G., Durnford, D. G., Lang, B. F., Lee, R. W., Pearlman, R. E., Roger, A. J., and Gray, M. W. (2005). The tree of eukaryotes. Trends in ecology & evolution, 20(12):670–6.

Kellis, M., Birren, B. W., and Lander, E. S. (2004). Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature, 428(6983):617–24.

Khaitovich, P., Weiss, G., Lachmann, M., Hellmann, I., Enard, W., Muetzel, B., Wirkner, U., Ansorge, W., and Pääbo, S. (2004). A neutral model of transcriptome evolution. PLoS biology, 2(5):E132.

Khalturin, K., Hemmrich, G., Fraune, S., Augustin, R., and Bosch, T. C. G. (2009). More than just orphans: are taxonomically-restricted genes important in evolution? Trends in genetics, 25(9):404–13.

Kimura, M. (1968). Evolutionary rate at the molecular level. Nature, 217(5129):624–6.

Kiontke, K. C., Felix, M.-A., Ailion, M., Rockman, M. V., Braendle, C., Penigault, J.-B., and Fitch, D. H. (2011). A phylogeny and molecular barcodes for Caenorhabditis, with numerous new species from rotting fruits. BMC evolutionary biology, 11(1):339.

Knowles, D. G. and McLysaght, A. (2009). Recent de novo origin of human protein-coding genes. Genome research, 19(10):1752–9.

Kostrouchova, M., Krause, M., Kostrouch, Z., and Rall, J. E. (2001). Nuclear hormone receptor CHR3 is a critical regulator of all four larval molts of the nematode Caenorhabditis elegans. Proceedings of the National Academy of Sciences, 98(13):7360–5.

Kulkarni, M. M. and Arnosti, D. N. (2003). Information display by transcriptional enhancers. Development, 130(26):6569–75.

Kutter, C., Brown, G. D., Gonçalves, A., Wilson, M. D., Watt, S., Brazma, A., White, R. J., and Odom, D. T. (2011). Pol III binding in six mammals shows conservation among amino acid isotypes despite divergence among tRNA genes. Nature genetics, 43(10):948–55.

Kuwabara, P. E. and Kimble, J. (1995). A predicted membrane protein, TRA-2A, directs hermaphrodite development in Caenorhabditis elegans. Development, 121(9):2995–3004.

Kuzniar, A., van Ham, R. C. H. J., Pongor, S., and Leunissen, J. A. M. (2008). The quest for orthologs: finding the corresponding gene across genomes. Trends in genetics, 24(11):539–51.

Lai, C. S., Fisher, S. E., Hurst, J. A., Vargha-Khadem, F., and Monaco, A. P. (2001). A forkhead-domain gene is mutated in a severe speech and language disorder. Nature, 413(6855):519–23.

Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P., McKernan, K., Meldrim, J., Mesirov, J. P., Miranda, C., Morris, W., Naylor, J., Raymond, C., Rosetti, M., Santos, R., Sheridan, A., Sougnez, C., Stange-Thomann, N., Stojanovic, N., Subramanian, A., Wyman, D., Rogers, J., Sulston, J., Ainscough, R., Beck, S., Bentley, D., Burton, J., Clee, C., Carter, N., Coulson, A., Deadman, R., Deloukas, P., Dunham, A., Dunham, I., Durbin, R., French, L., Grafham, D., Gregory, S., Hubbard, T., Humphray, S., Hunt, A., Jones, M., Lloyd, C., McMurray, A., Matthews, L., Mercer, S., Milne, S., Mullikin, J. C., Mungall, A., Plumb, R., Ross, M., Shownkeen, R., Sims, S., Waterston, R. H., Wilson, R. K., Hillier, L. W., McPherson, J. D., Marra, M. A., Mardis, E. R., Fulton, L. A., Chinwalla, A. T., Pepin, K. H., Gish, W. R., Chissoe, S. L., Wendl, M. C., Delehaunty, K. D., Miner, T. L., Delehaunty, A., Kramer, J. B., Cook, L. L., Fulton, R. S., Johnson, D. L., Minx, P. J., Clifton, S. W., Hawkins, T., Branscomb, E., Predki, P., Richardson, P., Wenning, S., Slezak, T., Doggett, N., Cheng, J. F., Olsen, A., Lucas, S., Elkin, C., Uberbacher, E., Frazier, M., Gibbs, R. A., Muzny, D. M., Scherer, S. E., Bouck, J. B., Sodergren, E. J., Worley, K. C., Rives, C. M., Gorrell, J. H., Metzker, M. L., Naylor, S. L., Kucherlapati, R. S., Nelson, D. L., Weinstock, G. M., Sakaki, Y., Fujiyama, A., Hattori, M., Yada, T., Toyoda, A., Itoh, T., Kawagoe, C., Watanabe, H., Totoki, Y., Taylor, T., Weissenbach, J., Heilig, R., Saurin, W., Artiguenave, F., Brottier, P., Bruls, T., Pelletier, E., Robert, C., Wincker, P., Smith, D. R., Doucette-Stamm, L., Rubenfield, M., Weinstock, K., Lee, H. M., Dubois, J., Rosenthal, A., Platzer, M., Nyakatura, G., Taudien, S., Rump, A., Yang, H., Yu, J., Wang, J., Huang, G., Gu, J., Hood, L., Rowen, L., Madan, A., Qin, S., Davis, R. W., Federspiel, N. A., Abola, A. P., Proctor, M. J., Myers, R. M., Schmutz, J., Dickson, M., Grimwood, J., Cox, D. R., Olson, M. V., Kaul, R., Shimizu, N., Kawasaki, K., Minoshima, S., Evans, G. A., Athanasiou, M., Schultz, R., Roe, B. A., Chen, F., Pan, H., Ramser, J., Lehrach, H., Reinhardt, R., McCombie, W. R., de la Bastide, M., Dedhia, N., Blöcker, H., Hornischer, K., Nordsiek, G., Agarwala, R., Aravind, L., Bailey, J. A., Bateman, A., Batzoglou, S., Birney, E., Bork, P., Brown, D. G., Burge, C. B., Cerutti, L., Chen, H. C., Church, D., Clamp, M., Copley, R. R., Doerks, T., Eddy, S. R., Eichler, E. E., Furey, T. S., Galagan, J., Gilbert, J. G., Harmon, C., Hayashizaki, Y., Haussler, D., Hermjakob, H., Hokamp, K., Jang, W., Johnson, L. S., Jones, T. A., Kasif, S., Kaspryzk, A., Kennedy, S., Kent, W. J., Kitts, P., Koonin, E. V., Korf, I., Kulp, D., Lancet, D., Lowe, T. M., McLysaght, A., Mikkelsen, T., Moran, J. V., Mulder, N., Pollara, V. J., Ponting, C. P., Schuler, G., Schultz, J., Slater, G., Smit, A. F., Stupka, E., Szustakowski, J., Thierry-Mieg, D., Thierry-Mieg, J., Wagner, L., Wallis, J., Wheeler, R., Williams, A., Wolf, Y. I., Wolfe, K. H., Yang, S. P., Yeh, R. F., Collins, F., Guyer, M. S., Peterson, J., Felsenfeld, A., Wetterstrand, K. A., Patrinos, A., Morgan, M. J., de Jong, P., Catanese, J. J., Osoegawa, K., Shizuya, H., Choi, S., Chen, Y. J., and Szustakowki, J. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822):860–921.

Landry, C. R., Levy, E. D., and Michnick, S. W. (2009). Weak functional constraints on phosphoproteomes. Trends in genetics, 25(5):193–7.

Lang, G. I., Rice, D. P., Hickman, M. J., Sodergren, E., Weinstock, G. M., Botstein, D., and Desai, M. M. (2013). Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature, 500(7464):571–4.

Larkin, M. A., Blackshields, G., Brown, N. P., Chenna, R., McGettigan, P. A., McWilliam, H., Valentin, F., Wallace, I. M., Wilm, A., Lopez, R., Thompson, J. D., Gibson, T. J., and Higgins, D. G. (2007). Clustal W and Clustal X version 2.0. Bioinformatics, 23(21):2947–8.

Lee, I., Lehner, B., Crombie, C., Wong, W., Fraser, A. G., and Marcotte, E. M. (2008). A single gene network accurately predicts phenotypic effects of gene perturbation in Caenorhabditis elegans. Nature genetics, 40(2):181–8.

Lee, W., Tillo, D., Bray, N., Morse, R. H., Davis, R. W., Hughes, T. R., and Nislow, C. (2007). A high-resolution atlas of nucleosome occupancy in yeast. Nature genetics, 39(10):1235–44.

Lenski, R. E. and Travisano, M. (1994). Dynamics of adaptation and diversification: a 10,000-generation experiment with bacterial populations. Proceedings of the National Academy of Sciences, 91(15):6808–14.

Levine, M. T., Jones, C. D., Kern, A. D., Lindfors, H. A., and Begun, D. J. (2006). Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression. Proceedings of the National Academy of Sciences, 103(26):9935–9.

Levy, E. D., Landry, C. R., and Michnick, S. W. (2009). How perfect can protein interactomes be? Science signaling, 2(60):pe11.

Li, C.-Y., Zhang, Y., Wang, Z., Zhang, Y., Cao, C., Zhang, P.-W., Lu, S.-J., Li, X.-M., Yu, Q., Zheng, X., Du, Q., Uhl, G. R., Liu, Q.-R., and Wei, L. (2010). A human-specific de novo protein-coding gene associated with human brain functions. PLoS computational biology, 6(3):e1000734.

Li, H., Coghlan, A., Ruan, J., Coin, L. J., Hériché, J.-K., Osmotherly, L., Li, R., Liu, T., Zhang, Z., Bolund, L., Wong, G. K.-S., Zheng, W., Dehal, P., Wang, J., and Durbin, R. (2006). TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic acids research, 34(Database issue):D572–80.

Lin, K. T.-H., Broitman-Maduro, G., Hung, W. W. K., Cervantes, S., and Maduro, M. F. (2009). Knockdown of SKN-1 and the Wnt effector TCF/POP-1 reveals differences in endomesoderm specification in C. briggsae as compared with C. elegans. Developmental biology, 325(1):296–306.

Liu, Q., Stumpf, C., Thomas, C., Wickens, M., and Haag, E. S. (2012). Context-dependent function of a conserved translational regulatory module. Development, 139(8):1509–1521.

Liu, X., McLeod, I., Anderson, S., Yates, J. R., and He, X. (2005). Molecular analysis of kinetochore architecture in fission yeast. The EMBO journal, 24(16):2919–30.

Lo, T.-W., Pickle, C. S., Lin, S., Ralston, E. J., Gurling, M., Schartner, C. M., Bian, Q., Doudna, J. A., and Meyer, B. J. (2013). Precise and heritable genome editing in evolutionarily diverse nematodes using TALENs and CRISPR/Cas9 to engineer insertions and deletions. Genetics, 195(2):331–48.

Loewe, L. and Cutter, A. D. (2008). On the potential for extinction by Muller’s ratchet in Caenorhabditis elegans. BMC evolutionary biology, 8:125.

Longman, D., McGarvey, T., McCracken, S., Johnstone, I. L., Blencowe, B. J., and Cáceres, J. F. (2001). Multiple interactions between SRm160 and SR family proteins in enhancer-dependent splicing and development of C. elegans. Current biology, 11(24):1923–33.

Ludwig, M. Z., Bergman, C., Patel, N. H., and Kreitman, M. (2000). Evidence for stabilizing selection in a eukaryotic enhancer element. Nature, 403(6769):564–7.

Ludwig, M. Z., Patel, N. H., and Kreitman, M. (1998). Functional analysis of eve stripe 2 enhancer evolution in Drosophila: rules governing conservation and change. Development, 125(5):949–58.

Lynch, M. (2000). The Evolutionary Fate and Consequences of Duplicate Genes. Science, 290(5494):1151–1155.

Lynch, M. (2010a). Evolution of the mutation rate. Trends in genetics, 26(8):345–352.

Lynch, M. (2010b). Rate, molecular spectrum, and consequences of human mutation. Proceedings of the National Academy of Sciences, 107(3):961–8.

Lynch, M., Sung, W., Morris, K., Coffey, N., Landry, C. R., Dopman, E. B., Dickinson, W. J., Okamoto, K., Kulkarni, S., Hartl, D. L., and Thomas, W. K. (2008). A genome-wide view of the spectrum of spontaneous mutations in yeast. Proceedings of the National Academy of Sciences, 105(27):9272–7.

Martone, R., Euskirchen, G., Bertone, P., Hartman, S., Royce, T. E., Luscombe, N. M., Rinn, J. L., Nelson, F. K., Miller, P., Gerstein, M., Weissman, S., and Snyder, M. (2003). Distribution of NF-kappaB-binding sites across human chromosome 22. Proceedings of the National Academy of Sciences, 100(21):12247–52.

McAinsh, A. D., Tytell, J. D., and Sorger, P. K. (2003). Structure, function, and regulation of budding yeast kinetochores. Annual review of cell and developmental biology, 19:519–39.

McDonald, J. H. and Kreitman, M. (1991). Adaptive protein evolution at the Adh locus in Drosophila. Nature, 351(6328):652–4.

McGary, K. L., Lee, I., and Marcotte, E. M. (2007). Broad network-based predictability of Saccharomyces cerevisiae gene loss-of-function phenotypes. Genome biology, 8(12):R258.

McGrath, P. T., Rockman, M. V., Zimmer, M., Jang, H., Macosko, E. Z., Kruglyak, L., and Bargmann, C. I. (2009). Quantitative mapping of a digenic behavioral trait implicates globin variation in C. elegans sensory behaviors. Neuron, 61(5):692–9.

McGregor, A. P., Orgogozo, V., Delon, I., Zanet, J., Srinivasan, D. G., Payre, F., and Stern, D. L. (2007). Morphological evolution through multiple cis-regulatory mutations at a single gene. Nature, 448(7153):587–90.

Melters, D. P., Paliulis, L. V., Korf, I. F., and Chan, S. W. L. (2012). Holocentric chromosomes: convergent evolution, meiotic adaptations, and genomic analysis. Chromosome research, 20(5):579–93.

Mena, M., Ambrose, B. A., Meeley, R. B., Briggs, S. P., Yanofsky, M. F., and Schmidt, R. J. (1996). Diversification of C-function activity in maize flower development. Science, 274(5292):1537–40.

Meraldi, P., McAinsh, A. D., Rheinbay, E., and Sorger, P. K. (2006). Phylogenetic and structural analysis of centromeric DNA and kinetochore proteins. Genome biology, 7(3):R23.

Moses, A. M., Liku, M. E., Li, J. J., and Durbin, R. (2007). Regulatory evolution in proteins by turnover and lineage-specific changes of cyclin-dependent kinase consensus sites. Proceedings of the National Academy of Sciences, 104(45):17713–8.

Nathans, J., Thomas, D., and Hogness, D. S. (1986). Molecular genetics of human color vision: the genes encoding blue, green, and red pigments. Science, 232(4747):193–202.

Nayak, S., Goree, J., and Schedl, T. (2005). fog-2 and the evolution of self-fertile hermaphroditism in Caenorhabditis. PLoS biology, 3(1):e6.

Nei, M., Suzuki, Y., and Nozawa, M. (2010). The neutral theory of molecular evolution in the genomic era. Annual review of genomics and human genetics, 11:265–89.

Nóbrega, M. A., Zhu, Y., Plajzer-Frick, I., Afzal, V., and Rubin, E. M. (2004). Megabase deletions of gene deserts result in viable mice. Nature, 431(7011):988–93.

Nuez, I. and Félix, M.-A. (2012). Evolution of Susceptibility to Ingested Double-Stranded RNAs in Caenorhabditis Nematodes. PLoS ONE, 7(1):e29811.

Ohno, S. (1970). Evolution by gene duplication. Springer-Verlag, New York.

Ossowski, S., Schneeberger, K., Lucas-Lledó, J. I., Warthmann, N., Clark, R. M., Shaw, R. G., Weigel, D., and Lynch, M. (2010). The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science, 327(5961):92–4.

Page, A. P. and Johnstone, I. L. (2007). The cuticle. WormBook, pages 1–15.

Page, A. P., McCormack, G., and Birnie, A. J. (2006). Biosynthesis and enzymology of the Caenorhabditis elegans cuticle: identification and characterization of a novel serine protease inhibitor. International journal for parasitology, 36(6):681–9.

Page, A. P. and Winter, A. D. (2003). Enzymes involved in the biogenesis of the nematode cuticle. Advances in parasitology, 53:85–148.

Palopoli, M. F., Rockman, M. V., TinMaung, A., Ramsay, C., Curwen, S., Aduna, A., Laurita, J., and Kruglyak, L. (2008). Molecular basis of the copulatory plug polymorphism in Caenorhabditis elegans. Nature, 454(7207):1019–22.

Pan, Q., Shai, O., Lee, L. J., Frey, B. J., and Blencowe, B. J. (2008). Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nature genetics, 40(12):1413–5.

Paris, M., Kaplan, T., Li, X. Y., Villalta, J. E., Lott, S. E., and Eisen, M. B. (2013). Extensive divergence of transcription factor binding in Drosophila embryos with highly conserved gene expression. PLoS genetics, 9(9):e1003748.

Parker, S. C. J., Hansen, L., Abaan, H. O., Tullius, T. D., and Margulies, E. H. (2009). Local DNA topography correlates with functional noncoding regions of the human genome. Science, 324(5925):389–92.

Pénigault, J.-B. and Félix, M.-A. (2011). Evolution of a system sensitive to stochastic noise: P3.p cell fate in Caenorhabditis. Developmental biology, 357(2):419–27.

Perlstein, E. O., Ruderfer, D. M., Roberts, D. C., Schreiber, S. L., and Kruglyak, L. (2007). Genetic basis of individual differences in the response to small-molecule drugs in yeast. Nature genetics, 39(4):496–502.

Peters, K., McDowall, J., and Rose, A. M. (1991). Mutations in the bli-4 (I) Locus of Caenorhabditis elegans Disrupt Both Adult Cuticle and Early Larval Development. Genetics, 129:95–102.

Pfaffl, M. W. (2001). A new mathematical model for relative quantification in real-time RT-PCR. Nucleic acids research, 29(9):e45.

Piasecka, B., Lichocki, P., Moretti, S., Bergmann, S., and Robinson-Rechavi, M. (2013). The hourglass and the early conservation models–co-existing patterns of developmental constraints in vertebrates. PLoS genetics, 9(4):e1003476.

Poirey, R., Despons, L., Leh, V., Lafuente, M.-J., Potier, S., Souciet, J.-L., and Jauniaux, J.-C. (2002). Functional analysis of the Saccharomyces cerevisiae DUP240 multigene family reveals membrane-associated proteins that are not essential for cell viability. Microbiology, 148(Pt 7):2111–23.

Pollard, K. S., Salama, S. R., Lambert, N., Lambot, M.-A., Coppens, S., Pedersen, J. S., Katzman, S., King, B., Onodera, C., Siepel, A., Kern, A. D., Dehay, C., Igel, H., Ares, M., Vanderhaeghen, P., and Haussler, D. (2006). An RNA gene expressed during cortical development evolved rapidly in humans. Nature, 443(7108):167–72.

Ponting, C. P. and Hardison, R. C. (2011). What fraction of the human genome is functional? Genome research, 21(11):1769–76.

Prabhakar, S., Visel, A., Akiyama, J. A., Shoukry, M., Lewis, K. D., Holt, A., Plajzer-Frick, I., Morrison, H., Fitzpatrick, D. R., Afzal, V., Pennacchio, L. A., Rubin, E. M., and Noonan, J. P. (2008). Human-specific gain of function in a developmental enhancer. Science, 321(5894):1346–50.

Preuss, T. M., Cáceres, M., Oldham, M. C., and Geschwind, D. H. (2004). Human brain evolution: insights from microarrays. Nature reviews. Genetics, 5(11):850–60.

Pukkila-Worley, R., Ausubel, F. M., and Mylonakis, E. (2011). Candida albicans infection of Caenorhabditis elegans induces antifungal immune defenses. PLoS pathogens, 7(6):e1002074.

Ramani, A., Chuluunbaatar, T., Verster, A., Na, H., Vu, V., Pelte, N., Wannissorn, N., Jiao, A., and Fraser, A. (2012). The Majority of Animal Genes Are Required for Wild-Type Fitness. Cell, 148(4):792–802.

Ramani, A. K., Calarco, J. A., Pan, Q., Mavandadi, S., Wang, Y., Nelson, A. C., Lee, L. J., Morris, Q., Blencowe, B. J., Zhen, M., and Fraser, A. G. (2011). Genome-wide analysis of alternative splicing in Caenorhabditis elegans. Genome research, 21(2):342–8.

Rebeiz, M., Pool, J. E., Kassner, V. A., Aquadro, C. F., and Carroll, S. B. (2009). Stepwise modification of a modular enhancer underlies adaptation in a Drosophila population. Science, 326(5960):1663–7.

Reddy, K. C., Andersen, E. C., Kruglyak, L., and Kim, D. H. (2009). A polymorphism in npr-1 is a behavioral determinant of pathogen susceptibility in C. elegans. Science, 323(5912):382–4.

Reimchen, T. E. (1980). Spine deficiency and polymorphism in a population of Gasterosteus aculeatus: an adaptation to predators? Canadian Journal of Zoology, 58(7):1232–1244.

Rockman, M. V. (2012). The QTN program and the alleles that matter for evolution: all that’s gold does not glitter. Evolution, 66(1):1–17.

Ronshaugen, M., McGinnis, N., and McGinnis, W. (2002). Hox protein mutation and macroevolution of the insect body plan. Nature, 415(6874):914–7.

Rosenblum, E. B., Hoekstra, H. E., and Nachman, M. W. (2004). Adaptive reptile color variation and the evolution of the Mc1r gene. Evolution; international journal of organic evolution, 58(8):1794–808.

Sackton, T. B., Werren, J. H., and Clark, A. G. (2013). Characterizing the Infection-Induced Transcriptome of Nasonia vitripennis Reveals a Preponderance of Taxonomically-Restricted Immune Genes. PLoS ONE, 8(12):e83984.

Sawyer, S. A., Parsch, J., Zhang, Z., and Hartl, D. L. (2007). Prevalence of positive selection among nearly neutral amino acid replacements in Drosophila. Proceedings of the National Academy of Sciences, 104(16):6504–10.

Sayers, E. W., Barrett, T., Benson, D. A., Bryant, S. H., Canese, K., Chetvernin, V., Church, D. M., DiCuccio, M., Edgar, R., Federhen, S., Feolo, M., Geer, L. Y., Helmberg, W., Kapustin, Y., Landsman, D., Lipman, D. J., Madden, T. L., Maglott, D. R., Miller, V., Mizrachi, I., Ostell, J., Pruitt, K. D., Schuler, G. D., Sequeira, E., Sherry, S. T., Shumway, M., Sirotkin, K., Souvorov, A., Starchenko, G., Tatusova, T. A., Wagner, L., Yaschenko, E., and Ye, J. (2009). Database resources of the National Center for Biotechnology Information. Nucleic acids research, 37(Database issue):D5–15.

Scannell, D. R., Byrne, K. P., Gordon, J. L., Wong, S., and Wolfe, K. H. (2006). Multiple rounds of speciation associated with reciprocal gene loss in polyploid yeasts. Nature, 440(7082):341–5.

Schmidt, D., Wilson, M. D., Ballester, B., Schwalie, P. C., Brown, G. D., Marshall, A., Kutter, C., Watt, S., Martinez-Jimenez, C. P., Mackay, S., Talianidis, I., Flicek, P., and Odom, D. T. (2010). Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding. Science, 328(5981):1036–40.

Schmidt, R. J., Veit, B., Mandel, M. A., Mena, M., Hake, S., and Yanofsky, M. F. (1993). Identification and molecular characterization of ZAG1, the maize homolog of the Arabidopsis floral homeotic gene AGAMOUS. The Plant cell, 5(7):729–37.

Schuster, P. and Fontana, W. (1999). Chance and Necessity in Evolution: Lessons from RNA. Physica D: Nonlinear Phenomena, 133(1):427–452.

Sebat, J., Lakshmi, B., Malhotra, D., Troge, J., Lese-Martin, C., Walsh, T., Yamrom, B., Yoon, S., Krasnitz, A., Kendall, J., Leotta, A., Pai, D., Zhang, R., Lee, Y.-H., Hicks, J., Spence, S. J., Lee, A. T., Puura, K., Lehtimäki, T., Ledbetter, D., Gregersen, P. K., Bregman, J., Sutcliffe, J. S., Jobanputra, V., Chung, W., Warburton, D., King, M.-C., Skuse, D., Geschwind, D. H., Gilliam, T. C., Ye, K., and Wigler, M. (2007). Strong association of de novo copy number mutations with autism. Science, 316(5823):445–9.

Serber, Z. and Ferrell, J. E. (2007). Tuning bulk electrostatics to regulate protein function. Cell, 128(3):441–4.

Shapiro, M. D., Marks, M. E., Peichel, C. L., Blackman, B. K., Nereng, K. S., Jónsson, B., Schluter, D., and Kingsley, D. M. (2004). Genetic and developmental basis of evolutionary pelvic reduction in threespine sticklebacks. Nature, 428(6984):717–23.

Shih, J. D. and Hunter, C. P. (2011). SID-1 is a dsRNA-selective dsRNA-gated channel. RNA, 17(6):1057–65.

Siepel, A., Bejerano, G., Pedersen, J. S., Hinrichs, A. S., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L. W., Richards, S., Weinstock, G. M., Wilson, R. K., Gibbs, R. A., Kent, W. J., Miller, W., and Haussler, D. (2005). Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome research, 15(8):1034–50.

Simonis, N., Rual, J.-F., Carvunis, A.-R., Tasan, M., Lemmens, I., Hirozane-Kishikawa, T., Hao, T., Sahalie, J. M., Venkatesan, K., Gebreab, F., Cevik, S., Klitgord, N., Fan, C., Braun, P., Li, N., Yildirim, M. A., Lin, C., De Smet, A.-S., Kao, H.-L., Simon, C., Smolyar, A., Ahn, J. S., Tewari, M., Boxem, M., Milstein, S., Yu, H., Dreze, M., Vandenhaute, J., Gunsalus, K. C., Cusick, M. E., Hill, D. E., Tavernier, J., Roth, F. P., and Vidal, M. (2009). Empirically controlled mapping of the Caenorhabditis elegans protein-protein interactome network. Nature Methods, 6(1):1–7.

Skerker, J. M., Perchuk, B. S., Siryaporn, A., Lubin, E. A., Ashenberg, O., Goulian, M., and Laub, M. T. (2008). Rewiring the specificity of two-component signal transduction systems. Cell, 133(6):1043–54.

Sönnichsen, B., Koski, L. B., Walsh, A., Marschall, P., Neumann, B., Brehm, M., Alleaume, A.-M., Artelt, J., Bettencourt, P., Cassin, E., Hewitson, M., Holz, C., Khan, M., Lazik, S., Martin, C., Nitzsche, B., Ruer, M., Stamford, J., Winzi, M., Heinkel, R., Röder, M., Finell, J., Häntsch, H., Jones, S. J. M., Jones, M., Piano, F., Gunsalus, K. C., Oegema, K., Gönczy, P., Coulson, A., Hyman, A. A., and Echeverri, C. J. (2005). Full-genome RNAi profiling of early embryogenesis in Caenorhabditis elegans. Nature, 434(7032):462–9.

Stark, C., Breitkreutz, B.-J., Reguly, T., Boucher, L., Breitkreutz, A., and Tyers, M. (2006). BioGRID: a general repository for interaction datasets. Nucleic acids research, 34(Database issue):D535–9.

Stein, L. D., Bao, Z., Blasiar, D., Blumenthal, T., Brent, M. R., Chen, N., Chinwalla, A., Clarke, L., Clee, C., Coghlan, A., Coulson, A., D’Eustachio, P., Fitch, D. H. A., Fulton, L. A., Fulton, R. E., Griffiths-Jones, S., Harris, T. W., Hillier, L. W., Kamath, R., Kuwabara, P. E., Mardis, E. R., Marra, M. A., Miner, T. L., Minx, P., Mullikin, J. C., Plumb, R. W., Rogers, J., Schein, J. E., Sohrmann, M., Spieth, J., Stajich, J. E., Wei, C., Willey, D., Wilson, R. K., Durbin, R., and Waterston, R. H. (2003). The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS biology, 1(2):E45.

Struhl, K. (2007). Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nature structural & molecular biology, 14(2):103–5.

Stuart, J. M., Segal, E., Koller, D., and Kim, S. K. (2003). A gene-coexpression network for global discovery of conserved genetic modules. Science, 302(5643):249–55.

Sucena, E., Delon, I., Jones, I., Payre, F., and Stern, D. L. (2003). Regulatory evolution of shavenbaby/ovo underlies multiple cases of morphological parallelism. Nature, 424(6951):935–8.

Swanson, W. J. and Vacquier, V. D. (2002). The rapid evolution of reproductive proteins. Nature Reviews Genetics, 3(2):137–144.

Szeto, D. P., Rodriguez-Esteban, C., Ryan, A. K., O’Connell, S. M., Liu, F., Kioussi, C., Gleiberman, A. S., Izpisúa-Belmonte, J. C., and Rosenfeld, M. G. (1999). Role of the Bicoid-related homeodomain factor Pitx1 in specifying hindlimb morphogenesis and pituitary development. Genes & development, 13(4):484–94.

Tabara, H., Sarkissian, M., Kelly, W. G., Fleenor, J., Grishok, A., Timmons, L., Fire, A., and Mello, C. C. (1999). The rde-1 gene, RNA interference, and transposon silencing in C. elegans. Cell, 99(2):123–32.

Talbert, P. B., Bryson, T. D., and Henikoff, S. (2004). Adaptive evolution of centromere proteins in plants and animals. Journal of biology, 3(4):18.

Tautz, D. and Domazet-Lošo, T. (2011). The evolutionary origin of orphan genes. Nature reviews. Genetics, 12(10):692–702.

Tian, H., Schlager, B., Xiao, H., and Sommer, R. J. (2008). Wnt signaling induces vulva development in the nematode Pristionchus pacificus. Current biology, 18(2):142–6.

Timmons, L. and Fire, A. (1998). Specific interference by ingested dsRNA. Nature, 395(6705):854.

Tirosh, I., Reikhav, S., Levy, A. A., and Barkai, N. (2009). A yeast hybrid provides insight into the evolution of gene expression regulation. Science, 324(5927):659–62.

Tirosh, I., Sigal, N., and Barkai, N. (2010). Divergence of nucleosome positioning between two closely related yeast species: genetic basis and functional consequences. Molecular systems biology, 6(365):365.

Tirosh, I., Weinberger, A., Bezalel, D., Kaganovich, M., and Barkai, N. (2008). On the relation between promoter divergence and gene expression evolution. Molecular systems biology, 4:159.

Tischler, J., Lehner, B., Chen, N., and Fraser, A. G. (2006). Combinatorial RNA interference in Caenorhabditis elegans reveals that redundancy between gene duplicates can be maintained for more than 80 million years of evolution. Genome Biology, 7(8):1–13.

Toll-Riera, M., Bosch, N., Bellora, N., Castelo, R., Armengol, L., Estivill, X., and Albà, M. M. (2009). Origin of primate orphan genes: a comparative genomics approach. Molecular biology and evolution, 26(3):603–12.

True, J. R. and Haag, E. S. (2001). Developmental system drift and flexibility in evolutionary trajectories. Evolution & development, 3(2):109–19.

Tsankov, A. M., Thompson, D. A., Socha, A., Regev, A., and Rando, O. J. (2010). The role of nucleosome positioning in the evolution of gene regulation. PLoS biology, 8(7):e1000414.

Tsong, A. E., Miller, M. G., Raisner, R. M., and Johnson, A. D. (2003). Evolution of a Combinatorial Transcriptional Circuit: A Case Study in Yeasts. Cell, 115:389–399.

Tsong, A. E., Tuch, B. B., Li, H., and Johnson, A. D. (2006). Evolution of alternative transcriptional circuits with identical logic. Nature, 443(7110):415–20.

Van Blokland, R., Van Der Geest, N., Mol, J. N. M., and Kooter, J. M. (1994). Transgene-mediated suppression of chalcone synthase expression in Petunia hybrida results from an increase in RNA turnover. The Plant Journal, 6(6):861–877.

Vissers, L. E. L. M., van Ravenswaaij, C. M. A., Admiraal, R., Hurst, J. A., de Vries, B. B. A., Janssen, I. M., van der Vliet, W. A., Huys, E. H. L. P. G., de Jong, P. J., Hamel, B. C. J., Schoenmakers, E. F. P. M., Brunner, H. G., Veltman, J. A., and van Kessel, A. G. (2004). Mutations in a new member of the chromodomain gene family cause CHARGE syndrome. Nature genetics, 36(9):955–7.

Vizeacoumar, F. J., van Dyk, N., Vizeacoumar, F. S., Cheung, V., Li, J., Sydorskyy, Y., Case, N., Li, Z., Datti, A., Nislow, C., Raught, B., Zhang, Z., Frey, B., Bloom, K., Boone, C., and Andrews, B. J. (2010). Integrating high-throughput genetic interaction mapping and high-content screening to explore yeast spindle morphogenesis. The Journal of cell biology, 188(1):69–81.

Wagner, A. (2001). The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Molecular biology and evolution, 18(7):1283–92.

Wagner, A. (2008). Neutralism and selectionism: a network-based reconciliation. Nature reviews. Genetics, 9(12):965–74.

Wang, W., Brunet, F. G., Nevo, E., and Long, M. (2002). Origin of sphinx, a young chimeric RNA gene in Drosophila melanogaster. Proceedings of the National Academy of Sciences, 99(7):4448–53.

Wang, X. and Chamberlin, H. M. (2002). Multiple regulatory changes contribute to the evolution of the Caenorhabditis lin-48 ovo gene. Genes & development, 16(18):2345–9.

Wang, X. and Chamberlin, H. M. (2004). Evolutionary innovation of the excretory system in Caenorhabditis elegans. Nature genetics, 36(3):231–2.

Wang, X. and Sommer, R. J. (2011). Antagonism of LIN-17/Frizzled and LIN-18/Ryk in Nematode Vulva Induction Reveals Evolutionary Alterations in Core Developmental Pathways. PLoS Biology, 9(7):e1001110.

Weinreich, D. M., Delaney, N. F., DePristo, M. A., and Hartl, D. L. (2006). Darwinian evolution can follow only very few mutational paths to fitter proteins. Science, 312(5770):111–4.

Weinreich, D. M., Watson, R. A., and Chao, L. (2005). Perspective: Sign epistasis and genetic constraint on evolutionary trajectories. Evolution; international journal of organic evolution, 59(6):1165–74.

Weirauch, M. T. and Hughes, T. R. (2010). Conserved expression without conserved regulatory sequence: the more things change, the more they stay the same. Trends in genetics, 26(2):66–74.

Whittaker, R. H. (1969). New concepts of kingdoms of organisms. Science, 163(3863):150–160.

Williams, T. M., Selegue, J. E., Werner, T., Gompel, N., Kopp, A., and Carroll, S. B. (2008). The regulation and evolution of a genetic switch controlling sexually dimorphic traits in Drosophila. Cell, 134(4):610–23.

Wilson, G. A., Bertrand, N., Patel, Y., Hughes, J. B., Feil, E. J., and Field, D. (2005). Orphans as taxonomically restricted and ecologically important genes. Microbiology, 151(Pt 8):2499–501.

Wilson, M. D., Barbosa-Morais, N. L., Schmidt, D., Conboy, C. M., Vanes, L., Tybulewicz, V. L. J., Fisher, E. M. C., Tavaré, S., and Odom, D. T. (2008). Species-specific transcription in mice carrying human chromosome 21. Science, 322(5900):434–8.

Winston, W. M., Molodowitch, C., and Hunter, C. P. (2002). Systemic RNAi in C. elegans requires the putative transmembrane protein SID-1. Science, 295(5564):2456–9.

Winston, W. M., Sutherlin, M., Wright, A. J., Feinberg, E. H., and Hunter, C. P. (2007). Caenorhabditis elegans SID-2 is required for environmental RNA interference. Proceedings of the National Academy of Sciences, 104(25):10565–70.

Wittkopp, P. J., Williams, B. L., Selegue, J. E., and Carroll, S. B. (2003). Drosophila pigmentation evolution: divergent genotypes underlying convergent phenotypes. Proceedings of the National Academy of Sciences, 100(4):1808–13.

Woese, C. R. and Fox, G. E. (1977). Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proceedings of the National Academy of Sciences, 74(11):5088–90.

Xu, B., Roos, J. L., Levy, S., van Rensburg, E. J., Gogos, J. A., and Karayiorgou, M. (2008). Strong association of de novo copy number mutations with sporadic schizophrenia. Nature genetics, 40(7):880–5.

Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Molecular biology and evolution, 24(8):1586–91.

Yang, Z. and Bielawski, J. (2000). Statistical methods for detecting molecular adaptation. Trends in ecology & evolution, 15(12):496–503.

Yigit, E., Batista, P. J., Bei, Y., Pang, K. M., Chen, C.-C. G., Tolia, N. H., Joshua-Tor, L., Mitani, S., Simard, M. J., and Mello, C. C. (2006). Analysis of the C. elegans Argonaute family reveals that distinct Argonautes act sequentially during RNAi. Cell, 127(4):747–57.

Zhao, Z., Boyle, T. J., Bao, Z., Murray, J. I., Mericle, B., and Waterston, R. H. (2008). Comparative analysis of embryonic cell lineage between Caenorhabditis briggsae and Caenorhabditis elegans. Developmental biology, 314(1):93–9.

Zhao, Z., Flibotte, S., Murray, J. I., Blick, D., Boyle, T. J., Gupta, B., Moerman, D. G., and Waterston, R. H. (2010). New Tools for Investigating the Comparative Biology of Caenorhabditis briggsae and C. elegans. Genetics, 184(3):853–863.

Zhong, M., Niu, W., Lu, Z. J., Sarov, M., Murray, J. I., Janette, J., Raha, D., Sheaffer, K. L., Lam, H. Y. K., Preston, E., Slightham, C., Hillier, L. W., Brock, T., Agarwal, A., Auerbach, R., Hyman, A. A., Gerstein, M., Mango, S. E., Kim, S. K., Waterston, R. H., Reinke, V., and Snyder, M. (2010). Genome-wide identification of binding sites defines distinct functions for Caenorhabditis elegans PHA-4/FOXA in development and environmental response. PLoS genetics, 6(2):e1000848.

Zotenko, E., Mestre, J., O’Leary, D. P., and Przytycka, T. M. (2008). Why do hubs in the yeast protein interaction network tend to be essential: reexamining the connection between the network topology and essentiality. PLoS computational biology, 4(8):e1000140.