The Evolution of Gene Function in Caenorhabditis spp.
by
Adrian Verster
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Molecular Genetics
University of Toronto

© Copyright 2014 by Adrian Verster

Abstract
The sequence of the genome gradually evolves, and such changes can affect the function of the genes encoded within. Here I try to understand the causes and consequences of changes in gene function between related species, particularly in Caenorhabditis nematodes. The goal of the first section of my thesis is to compare the biological function of one-to-one orthologs by using loss-of-function phenotypes in C. elegans and C. briggsae. I did this by constructing and screening an RNA interference (RNAi) library in C. briggsae and comparing these RNAi phenotypes to those in C. elegans. This approach found 91 examples of orthologs with different in vivo functions, around 7% of the genes screened. For one of these examples, sac-1, I was able to explain the different biological function by a difference in molecular function, in this case a difference in gene expression. Given the extremely high phenotypic similarity of these two species, I hypothesize that many of these examples of different RNAi phenotypes represent changes in gene function which have preserved the developmental phenotypes of the animal due to high levels of stabilizing selection, a model known as Developmental System Drift. In support of this, I found several cases where a phenotype is not present for the C. briggsae copy of an ortholog but is present for another related family member, suggesting that this related gene has taken over the function of the original gene in C. briggsae.
I also found that recently evolved genes are enriched for having different RNAi phenotypes, which led me to consider what could explain their rapid rate of evolution of in vivo function. By measuring the co-expression of novel genes with genes in different molecular pathways, I was able to construct a series of features which could accurately predict which novel genes become essential. An analysis of this model showed that co-expression with ancient, essential pathways is highly predictive of a novel gene acquiring an essential function. These results support a picture of how novel genes become integrated into cellular networks and are subsequently preserved by evolutionary forces.
Acknowledgements

First and foremost I would like to thank my thesis advisor, Andy Fraser, who guided me through these intellectually challenging years. You were always available to set me on track when I was lost. I am also grateful to my thesis committee (Gary Bader, Asher Cutter and Sabine Cordes), who were always willing to give me advice on a multitude of issues. You each provided an important perspective on the work, and I could not have completed this without your help.

Members of the Fraser lab have provided tremendous help over the years: Arun Ramani, Nadege Pelte, Mark Spensley, Nattha Wannissorn, Victoria Vu, June Tan, Steve Van Doormaal, Tunga Chuluunbaatar and Mike Schertzberg. Many people helped me with difficult scientific issues, but I would particularly like to thank those who taught me computational biology, namely Leopold Parts, Azin Sayad, Traver Hart, Lee Zamparo and Carl de Boer.

I could not have made it through without my friends and others who are close to me. Kathleen Cook, my lovely fiancée, was always there to help me through difficult times. I will not exhaustively name you, but anyone else who was close to me, you know who you are.

I would also like to acknowledge Marie-Anne Félix for the gift of the SID-2 expressing JU1018 transgenic C. briggsae line; the pJH1774 plasmid containing mWormCherry was a gift from Arshad Desai's lab. Finally, I would like to thank Google and Wikipedia for providing me with a deep well of knowledge that I repeatedly drew from when I was missing pieces of the puzzle.
Contents

1 Introduction
1.1 Mechanisms for genome change
1.1.1 Mutation
1.1.2 Selection
1.2 Experimental Evolution
1.2.1 How reproducible is evolution?
1.3 Change in gene function of orthologous genes
1.3.1 Case studies from quantitative genetics
1.3.2 Case studies from the evolution of development
1.4 Evolution of novel genes
1.4.1 Evolution of gene function for taxonomically restricted genes
1.4.2 The evolution of novel genes by gene duplication
1.4.3 The evolution of novel genes from non-coding DNA
1.5 Pathway evolution
1.5.1 Convergent evolution in the C. elegans and C. briggsae sex development pathways
1.5.2 RNA interference and Argonaute proteins
1.6 High-throughput approaches to genome evolution
1.6.1 How much of evolution can single genes explain?
1.6.2 Gene expression divergence
1.6.3 Transcription factor binding divergence
1.7 Developmental System Drift
1.8 How much of molecular function is noise?
1.9 Open questions and thesis goals
2 Evolution of ortholog gene function in Caenorhabditis spp.
2.1 Abstract
2.2 Introduction
2.3 Results
2.3.1 Construction and screening of the C. briggsae RNAi library
2.3.2 Genes with different phenotypes are enriched for transcription factors and recently evolved novel genes
2.3.3 Changes in gene function during DSD are often the result of promoter evolution
2.3.4 Ortholog pairs encoding more divergent protein sequences are more likely to have different RNAi phenotypes
2.3.5 Orthologs may have different organismal roles due to changes in other genes
2.3.6 Conservation of function can be maintained at the level of gene family and not gene family members
2.4 Discussion
2.5 Methods
2.5.1 Construction of the C. briggsae RNAi Library
2.5.2 Manual screening of the C. briggsae RNAi Library
2.5.3 Fitness Assay
2.5.4 qPCR
2.5.5 Examination of C. elegans and C. briggsae gene phylogenetic age
2.5.6 GFP Stitching and Microscopy
2.5.7 Transgenic rescue experiments
2.5.8 Examination of C. elegans and C. briggsae protein similarity
2.5.9 Predictability of phenotype differences
2.5.10 Identifying functionally related genes
3 Evolution of essential functions in novel genes
3.1 Abstract
3.2 Introduction
3.3 Results
3.3.1 The number of TRGs in the worm genome
3.3.2 Novel genes preferentially form functional links with other novel genes
3.3.3 Essential TRGs are enriched for functional links
3.3.4 Prediction of novel gene function based on co-expression profiles
3.3.5 Novel genes contribute to drug resistance predictions
3.4 Discussion
3.5 Methods
3.5.1 Finding Taxonomically restricted genes
3.5.2 TRG functional data
3.5.3 Essential TRG classification
3.5.4 Feature importance analysis
3.5.5 Drug resistance predictions
4 Discussion and concluding remarks
4.1 Summary
4.2 Turnover of genes within pathways
4.3 Transversal of adaptive peaks
4.4 Novel gene function at high resolution
4.5 How do novel genes change functional networks?
4.6 Overall Significance
Bibliography
Chapter 1
Introduction
Evolution has generated the incredibly diverse array of life forms which inhabit the planet. Small changes in the genetic material, DNA, are passed down to offspring and these changes can eventually become fixed in the population. Improvements in genomic technologies have revolutionized our understanding of evolution. Changes in the heritable material are now measurable, and thus we can measure the changes that occur during evolution. For example, the old view of the tree of life was that of five kingdoms, Monera (bacteria), Protists, Plants, Animals and Fungi (Whittaker, 1969), which was based on our (naive) understanding of the biology of different organisms. However, the first attempts to measure the heritable material (rDNA sequences) seriously challenged this: they found that different types of bacteria were as divergent from each other as bacteria are from eukaryotes, and this led to a classification system of eukaryotes, bacteria, and a third group, archaebacteria (Woese and Fox, 1977). Since then, similar molecular taxonomy has changed our view of the eukaryotic tree of life as well. The emerging view is of five major groupings: Unikonts, which include fungi and animals; Plantae, which include land plants and algae; Excavates; Cercozoa; and Chromalveolates (Keeling et al., 2005). The exact relationships between these groups remain unclear.

In parallel to our ability to measure heritable changes in DNA, we have also acquired the technology to characterize the function of the genes encoded within the DNA sequence. Gene function exists at two levels; consider for example the Cyclin Dependent Kinases (CDKs). We can understand the biological function of a gene by characterizing mutant phenotypes (for example, CDKs are required for cell cycle progression), and we can also understand the molecular function of a gene by doing biochemical experiments (for example, CDKs have a kinase function). Evolutionary biology is very interested in understanding the phenotypic differences between related species, which are often the result of adaptive evolution. Many adaptive evolutionary changes can be understood in terms of changes in molecular gene function which also affect the biological gene function. For example, adaptive loss of spines in benthic forms of sticklebacks has occurred because of evolution at the pelvic enhancer of the Pitx1 homeobox gene (Chan et al., 2010). In this case, changes in the molecular regulation of the gene have eliminated a biological function in pelvic spine development, which in turn has led to adaptive evolution.

The goal of this thesis is to try to understand the causes and consequences of evolutionary differences in gene function between related species. I will address a small part of this by comparing loss-of-function phenotypes in Caenorhabditis spp. and by trying to explain which novel genes acquire essential functions using co-expression interactions with different molecular pathways.
1.1 Mechanisms for genome change
In the pre-genome era many evolutionary biologists thought that selection was the major force causing evolutionary change. However, this was before any molecular data were available, and this view changed with the first argument for what would become the Neutral Theory of Molecular Evolution: at the molecular level, there was far too much change for selection to account for, so most of it must be selectively neutral (Kimura, 1968).
Evolution is the process of genetic alleles changing in frequency through time, and the mechanisms by which this can occur are mutation, genetic drift, recombination and selection. While they are all important, the topic of this thesis is the evolution of gene function, and thus I will focus on mutation and selection which are the most likely processes to be consequential for changes in gene function.
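The idea of allele frequencies changing through time can be made concrete with a small simulation. The following is only a minimal sketch, assuming a haploid Wright-Fisher model with a constant population size and a single selection coefficient; the function name, parameter values and the omission of mutation and recombination are illustrative assumptions, not part of the analyses in this thesis.

```python
import random


def wright_fisher(p0, pop_size, generations, s=0.0, seed=0):
    """Track the frequency of one allele in a haploid Wright-Fisher population.

    p0          -- starting frequency of the focal allele (between 0 and 1)
    pop_size    -- number of individuals, held constant (hypothetical value)
    generations -- number of generations to simulate
    s           -- selection coefficient of the focal allele (s > -1; 0 is pure drift)
    seed        -- seed for the random number generator, for reproducibility
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Selection: reweight the focal allele by its relative fitness (1 + s).
        weighted = p * (1.0 + s)
        p_sel = weighted / (weighted + (1.0 - p))
        # Drift: the next generation is a binomial sample of pop_size individuals.
        count = sum(1 for _ in range(pop_size) if rng.random() < p_sel)
        p = count / pop_size
        trajectory.append(p)
    return trajectory


# Pure drift versus weak positive selection (illustrative parameters only).
neutral = wright_fisher(p0=0.5, pop_size=200, generations=100)
selected = wright_fisher(p0=0.05, pop_size=200, generations=100, s=0.1)
print(neutral[-1], selected[-1])
```

With s set to zero the trajectory wanders by drift alone, while a positive s biases the sampling toward the focal allele each generation; the relative importance of the two depends on the population size.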
1.1.1 Mutation
One of the major mechanisms for altering genome function is mutation. The best understood mechanism for this is single nucleotide errors introduced during replication or repair of DNA. One experimental technique for measuring this is the use of mutation accumulation lines, in which a single individual is propagated every generation. Application of this method has revealed that the rate of spontaneous point mutations is around 10^-8 to 10^-9 per base per generation in worms (Denver et al., 2004), yeast (Lynch et al., 2008) and Arabidopsis (Ossowski et al., 2010). There is increasing evidence that the mutation rate can be reduced by natural selection, and that the lower limit is determined by the strength of selection relative to drift in different populations (Lynch, 2010a). However, not all mutations are point mutations: there are also insertions, deletions and the recently discovered Copy Number Variants (CNVs). In human populations these sorts of mutations occur regularly, with a handful of insertions/deletions in every offspring (Lynch, 2010a). Additionally, nine novel CNVs greater than 100 kb were identified in a study of 772 offspring (Itsara et al., 2010).

Although most novel mutations are neutral with respect to gene function, some mutations can have serious effects on the function of genes. Many Mendelian diseases are caused by loss-of-function point mutations in protein coding genes with critical functions for organism physiology (Hamosh et al., 2005). Other types of mutation can also cause disease. For example, CNV deletions overlapping with the chromodomain-containing CHD7 gene are associated with CHARGE syndrome, whose patients experience congenital deformations in the heart, ear and retina (Vissers et al., 2004). There is growing evidence that large de novo CNVs are associated with complex diseases such as autism (Sebat et al., 2007) and schizophrenia (Xu et al., 2008).
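As a rough sense of scale for the point mutation rates quoted above, the expected number of new point mutations per genome per generation is simply the per-base rate multiplied by the genome size. The sketch below assumes a haploid genome of about 100 Mb (roughly the size of the C. elegans genome) and treats mutations as Poisson distributed; both the genome size constant and the Poisson assumption are illustrative and are not taken from the studies cited.

```python
import math

# Assumed haploid genome size in base pairs (about 100 Mb, roughly C. elegans sized).
GENOME_SIZE_BP = 100_000_000


def expected_new_mutations(rate_per_base, genome_size_bp):
    """Expected number of new point mutations per genome per generation."""
    return rate_per_base * genome_size_bp


def prob_at_least_one(rate_per_base, genome_size_bp):
    """Probability of one or more new mutations, assuming a Poisson process."""
    return 1.0 - math.exp(-expected_new_mutations(rate_per_base, genome_size_bp))


for rate in (1e-8, 1e-9):
    print(rate,
          expected_new_mutations(rate, GENOME_SIZE_BP),
          prob_at_least_one(rate, GENOME_SIZE_BP))
```

Under these assumptions the quoted range corresponds to somewhere between about 0.1 and 1 new point mutation per haploid genome per generation.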
Negative selection is thought to be far more prevalent in nature than positive selection, which reflects the fact that mutations are much more likely to cause a loss of function than a gain of function, since it is easier to break something than to create it; I will discuss this in more detail below.

One consequence of random mutation is that genes are continuously acquiring deleterious mutations which must be purged by natural selection. For example, any de novo mutation that causes sterility will be purged from the population by natural selection. Such loss-of-function mutations cause an overall reduction in the average fitness of the population, a phenomenon called the mutational load (reviewed in Agrawal and Whitlock (2012)). When the effects of selection are weak, populations can acquire a large number of mildly deleterious mutations, which collectively lower the overall fitness and thus health of a population (Lynch, 2010b). There is evidence that deleterious mutations can be pulled to high frequencies by genetic hitchhiking; consider the TYR gene, in which one polymorphism (S192Y) has been selected and is associated with a loss of freckles, while another polymorphism (R402Q) is deleterious and is associated with minor ocular albinism (Chun and Fay, 2011). It is possible that deleterious mutations fix in non-outcrossing species, which can eventually lead to their extinction. Consider C. elegans, which has recently evolved hermaphroditism; it has been suggested that C. elegans will go extinct within a million years (Loewe and Cutter, 2008).
1.1.2 Selection
At the molecular level, by far the most common form of selection is negative selection, which acts to preserve established function from the constant stream of random mutation to the genome. Considering that the human genome sequence revealed far fewer genes and far more transposable elements than expected (Lander et al., 2001), an ongoing question has been how much of the human genome sequence is under negative selection. Based on whole genome alignments to the mouse genome, Chiaromonte et al. (2003) estimated that 19.2% of 50 bp windows were under selection, and since 27.1% of the human genome is covered by the alignment, this analysis gives an estimate of around 5% of the human genome being under selection (0.192 × 0.271 ≈ 0.052). Siepel et al. (2005) used a more sophisticated statistical model, a phylogenetic Hidden Markov Model, and found that 3%-8% of the human genome is conserved. Studies which consider conservation at a level above that of sequence conservation, for example consideration of DNA topology, have estimated that closer to 12% of the human genome is under negative selection (Parker et al., 2009). These results have been criticized, however, since they do not take into consideration the evolutionary turnover which can occur in non-coding sequence (Ponting and Hardison, 2011); I will discuss these ideas in more detail in the section on Developmental System Drift.

Although it is less common, there is considerable interest in finding instances of positive selection, since these can illuminate the molecular basis of adaptation. The most widely used method looks at coding sequence by comparing the rate of non-synonymous changes to that of synonymous changes (Ka/Ks). Equal rates in a given gene (a ratio near 1) would indicate neutral evolution, while an elevated number of non-synonymous changes would be expected after a period of positive selection.
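The logic of the Ka/Ks comparison can be sketched in a few lines. This is a simplified illustration, assuming that the numbers of non-synonymous and synonymous sites and differences have already been counted for a pair of aligned coding sequences, and applying a Jukes-Cantor correction for multiple substitutions at the same site; the function names and the example counts are hypothetical, and published methods handle site counting and substitution models in considerably more detail.

```python
import math


def jukes_cantor(p):
    """Correct an observed proportion of differing sites for multiple hits."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75) for this correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from pre-computed site and difference counts."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("Ks is zero, so the ratio is undefined")
    return ka, ks, ka / ks


# Hypothetical counts for one gene: few non-synonymous changes per site
# relative to synonymous changes, the pattern expected under negative selection.
ka, ks, ratio = ka_ks(nonsyn_diffs=12, nonsyn_sites=700, syn_diffs=30, syn_sites=250)
print(ka, ks, ratio)
```

A ratio well below 1 is the signature of negative selection, a ratio near 1 is consistent with neutrality, and a ratio above 1 is taken as evidence of positive selection.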
The Ka/Ks approach has revealed that across many lineages the genes experiencing positive selection are those involved in the immune system and reproduction (Yang and Bielawski, 2000). Rapid evolution of proteins in the immune system likely represents the legacy of an ongoing evolutionary arms race between pathogen and host. Sometimes selection can only be detected in part of the gene. For example, detailed studies of co-evolution between the mouse TfR1 receptor and the viral GP1 protein suggest that only a few residues on the surface are actively changing under positive selection, and that these residues are divorced from the iron uptake function of TfR1 (Demogines et al., 2013). Similarly, rapid evolution of reproductive genes is thought to be due to a different type of evolutionary arms race, that between sperm cells in a process known as sperm competition, or to other processes like sexual selection or sexual conflict (Swanson and Vacquier, 2002).

One gene of great interest which shows positive selection based on the Ka/Ks metric is FoxP2, which may have contributed to the emergence of language in modern humans. This gene was originally identified as the genetic cause of a developmental disorder in which patients have pronounced problems with language (Lai et al., 2001). When the human sequence of this gene was compared to that of other great apes, it was found that there had been positive selection on the human branch from a conserved ancestor, suggesting that positive selection on this gene was involved in human language capabilities (Enard et al., 2002).

Ka/Ks methods are limited for a number of reasons, but a primary one is that they can only test for selection within coding sequences. Another approach took advantage of multiple species alignments and looked for regions with a statistically elevated number of substitutions along the human branch.
Several of these regions were shown to be putatively functionally important for aspects of human-specific biology. For example, HAR1 is an RNA gene expressed in the developing neocortex and is hypothesized to have a role in the evolution of human brain function (Pollard et al., 2006). Another region, HAR2, is an enhancer: the human version shows specific activity in the developing hindlimb in a transgenic mouse assay and may contribute to the evolution of bipedalism in modern humans (Prabhakar et al., 2008).

The publication of the Neutral Theory of Molecular Evolution left many evolutionary biologists believing that most molecular substitutions were neutral. This has been challenged recently by several results from a statistical method developed by McDonald and Kreitman (1991), originally applied to the Adh gene. This method compares the ratio of fixed differences between species to polymorphisms within species for non-synonymous and synonymous sites; the rationale is that positively selected variants will spread through the population rapidly and will lead to many fixed differences but little standing variation. The most dramatic results using this method came from a study of 91 genes in Drosophila, which suggested that 95% of fixed non-synonymous differences were positively selected, though with very small selective coefficients (Sawyer et al., 2007). A genome-scale analysis of polymorphisms in Drosophila at over 10,000 genes suggested that 54% of non-synonymous substitutions had been fixed by positive selection (Begun et al., 2007). These results remain controversial, however, because when the same methods are applied to other lineages such as yeast, humans or Arabidopsis, they detect primarily neutral evolution (Nei et al., 2010).
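The comparison at the heart of the McDonald and Kreitman approach can be written as a two-by-two table of counts. The sketch below computes two commonly used summaries of such a table, the neutrality index and the estimated proportion of non-synonymous fixed differences attributable to positive selection (often written as alpha). These summaries and the example counts are illustrative assumptions rather than the exact procedures used in the studies cited above, and a full analysis would also test the table for statistical significance.

```python
def mk_summary(dn, ds, pn, ps):
    """Summarise a McDonald-Kreitman table of counts.

    dn, ds -- fixed non-synonymous / synonymous differences between species
    pn, ps -- non-synonymous / synonymous polymorphisms within species
    Returns (neutrality_index, alpha).
    """
    if min(dn, ds, ps) <= 0:
        raise ValueError("dn, ds and ps must be positive for these ratios")
    # Under neutrality pn/ps and dn/ds are expected to be equal (index of 1).
    neutrality_index = (pn / ps) / (dn / ds)
    # alpha is the estimated fraction of non-synonymous fixed differences
    # in excess of the neutral expectation.
    alpha = 1.0 - neutrality_index
    return neutrality_index, alpha


# Hypothetical counts with an excess of fixed non-synonymous differences.
ni, alpha = mk_summary(dn=7, ds=17, pn=2, ps=42)
print(ni, alpha)
```

An index below 1 (alpha above 0) reflects more non-synonymous fixed differences than the within-species polymorphism would predict, which is the pattern interpreted as positive selection.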
1.2 Experimental Evolution
One of the methods that experimental biologists have used to connect genome change to phenotype evolution is experimental evolution, in which experimenters passage an organism for long periods of time and measure how its fitness changes. By using DNA sequencing, biologists can explain these changes in fitness in terms of functional alterations in the genome, and often they can understand what has happened at the resolution of changes in individual genes. One of the longest running studies is the evolution of 12 replicate populations of E. coli in laboratory conditions, which has been in progress for 20 years (Barrick et al., 2009). During this experiment E. coli fitness has increased substantially relative to the ancestor (Lenski and Travisano, 1994). There are on average fewer than 50 mutations in the genomes of these strains after propagation for 20,000 generations (Barrick et al., 2009). Many of the same mutations occurred in different experimental lines, which manifests as similar changes in gene expression between these lines (Cooper et al., 2003).

The experimental evolution experiments in E. coli have suggested that the majority of mutations which occur are adaptive (Barrick et al., 2009), but similar experiments in other organisms such as S. cerevisiae support a different picture. In yeast populations which were propagated for 1000 generations, Lang et al. (2013) found that mutations tend to move through populations in ‘cohorts’, which is suggestive of genetic hitchhiking.
1.2.1 How reproducible is evolution?
A famous debate in evolutionary biology surrounds the issue of “rewinding the tape”: if the whole of evolutionary history were to run again, would we get the same result (Gould, 1989)? Experimental evolution of replicate populations can shed some light on this question. In the experimental evolution of E. coli detailed above, there are twelve replicates and certain events only occur in a subset of these populations. For example, the hypermutability phenotype arises in only seven cultures. Certain adaptations, such as the evolution of citrate metabolism, occurred in only a single culture. This evolutionary transition was contingent upon another unknown mutation (Blount et al., 2008), suggesting that certain evolutionary events are contingent on other stochastic events and thus that certain evolutionary transitions are fundamentally random occurrences.

Conversely, some evolutionary changes are very reproducible, for example the convergent loss of stickleback pelvic spines in different populations in similar lake environments (Coyle et al., 2007). It is unclear what the general picture is, but even a small number of evolutionary transitions which are contingent upon other random mutations essentially guarantees that aspects of evolutionary history would come out differently if the process were repeated.
1.3 Change in gene function of orthologous genes
The most important functional parts of the genome are protein- or RNA-encoding genes. In many cases changes in the function of genes result in clear effects on the phenotype of the organism. Examples come to us from quantitative genetics and from studies of the evolution of development, and I will discuss each in turn here.
1.3.1 Case studies from quantitative genetics
There has been substantial interest in mapping polymorphic population traits to genomic loci (Quantitative Trait Loci, QTL). The locations of many of these traits have been narrowed down even further to a specific nucleotide and the gene whose function it alters. Since these population traits may eventually become fixed, they represent the first step in the evolution of gene function between related species.

One of the first polymorphisms in C. elegans that was mapped to a causative gene is the plugging phenotype. In some natural isolates males deposit a copulatory plug after mating, which reduces the ability of subsequent males to successfully mate with the hermaphrodite (Barker, 1994). Based on QTL mapping between the non-plugging Bristol strain and two separate plugging isolates, researchers were able to discover a loss-of-function mutation in the gene plg-1, which had been disrupted by the insertion of a transposable element (Palopoli et al., 2008). The plg-1 gene encodes a large glycoprotein, which is expressed in the tail cells of the male and likely forms a significant structural component of the copulatory plug. Hermaphroditism has evolved recently in C. elegans (Cutter et al., 2008), and evolution in the plg-1 gene is thought to have arisen due to reduced selection on male function (Palopoli et al., 2008).

Another mapped polymorphism in C. elegans is involved in drug susceptibility. Sensitivity to the drug avermectin was mapped to glc-1, a glutamate-gated chloride channel, whereby a four amino acid deletion in the gene confers resistance (Ghosh et al., 2012). This allele is thought to be maintained in the population because of balancing selection, as the deletion causes a sensitivity to the bacterium S. avermitilis.

Possibly the most well known example of the evolution of gene function from quantitative genetics in C. elegans is the gene npr-1. Several strains of C. elegans exhibit a social feeding phenotype, whereby they aggregate into groups on bacterial lawns. This phenotype is associated with a single non-synonymous mutation in the neuropeptide Y receptor npr-1, which phenocopies loss-of-function mutations (de Bono and Bargmann, 1998). The mechanism by which npr-1 affects this behaviour is thought to involve sensing the low-oxygen environment that C. elegans prefers, which is found in the social aggregations (Gray et al., 2004). This variation is thought to be an adaptation to laboratory conditions and not a natural polymorphism (McGrath et al., 2009). The polymorphism in npr-1 has very pleiotropic effects on the phenotypes of the animals; for example, it also affects ethanol tolerance (Davies et al., 2004) as well as pathogen susceptibility (Reddy et al., 2009).

While npr-1 is a major effect gene which has very pleiotropic effects on the behaviour and physiology of C. elegans, other genes with smaller effects have also been discovered for phenotypes that it affects. For example, QTL mapping for the rate of food patch abandonment identifies a peak around npr-1, and finer mapping with near isogenic lines (NILs) revealed that a second gene, tyra-3, was also contributing to this trait (Bendesky et al., 2011). The tyra-3 gene contains a number of different polymorphisms, and cross-isolate rescue experiments showed that mutations in the promoter of tyra-3 were responsible for the difference in food patch leaving behaviour. Another example is that social aggregation behaviour is influenced by non-synonymous substitutions in exc-1 (Bendesky et al., 2012).

These represent some of the examples that experimental biologists have identified of how the variation which exists in a population affects gene function at the molecular or biological level. There are also a number of examples of fixed differences between species, particularly affecting developmental pathways, which I will survey in the next section.
1.3.2 Case studies from the evolution of development
Studies of the evolution of development have provided numerous examples to illustrate the evolution of gene function. This viewpoint attempts to understand how evolutionary change of developmental genes and pathways can lead to the wide diversity of form observable in nature.

One of the successful lines of inquiry in the field has been on the evolution of pigmentation patterns in different species of Drosophila, for example at the yellow gene (Gompel et al., 2005). Yellow has pleiotropic functions and it is thought that protein coding mutations within the gene could be deleterious. Transcription factor binding sites present in D. biarmipes but not D. melanogaster drive yellow expression within the wing, which results in adult wing spots. This mechanism has been proposed to explain the diversity of wing spots that are observed throughout Diptera (Wittkopp et al., 2003). Evolution of gene function during the evolution of fly pigmentation patterns has occurred in many more genes, such as tan (Jeong et al., 2008), bab (Williams et al., 2008), and ebony (Rebeiz et al., 2009).

Another example of changes in orthologous gene function is in the evolution of trichome patterns. Different species of Drosophila have lost dorsal trichomes and a phylogenetic analysis shows this has occurred in three different lineages. The evolutionary loss of expression of Shavenbaby, a transcription factor which is genetically required for trichome development in D. melanogaster, is highly correlated with the loss of such trichomes in other species of Drosophila (Sucena et al., 2003). Furthermore, three enhancers were identified which control the expression of Shavenbaby in D. melanogaster, and the orthologous regions in D. sechellia showed modified expression patterns when tested transgenically (McGregor et al., 2007). Finally, based on interspecies recombinant mapping between D. sechellia and D. mauritiana, the researchers found that the phenotypic difference mapped to a region upstream of Shavenbaby which overlaps with these enhancers.

The clearest example of a change in gene function affecting developmental pathways in Caenorhabditis spp.
is the change in salt tolerance due to evolution in the lin-48 gene. C. elegans has a much greater tolerance of high salt conditions than related nematode species and this change correlates with a difference in the morphology of the excretory duct (Wang and Chamberlin, 2004). Mutations in lin-48 in C. elegans phenocopy the C. briggsae duct cell morphology and salt tolerance phenotypes, and rescue experiments with the C. elegans regulatory sequence combined with coding sequence from other species indicate that the evolution in this gene occurs because of changes in the promoter (Wang and Chamberlin, 2004). Promoter bashing experiments have detected a piece of DNA with several predicted ces-2 binding sites in the C. elegans promoter which are not present in the C. briggsae promoter, but transgenic experiments in C. briggsae suggest that changes in the trans expression environment are also important for the change in expression between species (Wang and Chamberlin, 2002).

A final case study that we will consider is the evolution of Pitx1 between marine and freshwater sticklebacks. A pelvic spine is present in marine stickleback species but has been reduced in several different populations of freshwater sticklebacks (Bell, 1987). This phenotypic difference is thought to be adaptive, due either to a change in the predatory environment (Reimchen, 1980) or to calcium availability (Giles, 1983). QTL mapping between marine and freshwater fish has found a large QTL which accounts for 13.5%-43.7% of the variation in pelvic size (Shapiro et al., 2004). Using a candidate gene approach, the authors identified Pitx1 as the causative gene behind the major QTL, given that it is known to have a role in hindlimb development in model vertebrate systems (Szeto et al., 1999). QTL mapping in other stickleback isolates has shown that Pitx1 is likely the causal gene in multiple populations which have lost pelvic spines (Coyle et al., 2007).

Different sticklebacks do not have non-synonymous changes in their Pitx1 coding sequence (Shapiro et al., 2004), and thus the functional difference is likely to be regulatory.
GFP promoter bashing experiments found a region upstream of Pitx1 that drove expression in the pelvic regions and that is absent in isolates of stickleback with reduced pelvic spines (Chan et al., 2010). Transgenic expression experiments using this enhancer and
Pitx1 were able to create pelvic spines in pelvic reduced isolates. Finally, the authors were able to find evidence of genomic selection at this locus based on an excess of derived alleles. Evolution at the Pitx1 locus provides us with an excellent model for understanding how adaptive evolution can occur because of positive selection on the expression pattern of a single gene. It has shown us that in certain circumstances there are specific genes that are mutated to cause large phenotypic changes for the organism, and that such mutations can occur repeatedly when different populations exist in similar environments.
Sean Carroll has argued extensively that the evolution of gene expression is a major force in the evolution of development because many of the developmental regulators controlling changes in form are highly pleiotropic, and evolutionary changes in their regulatory regions allow for a fine tuning of their function without disrupting other biological processes (Carroll, 2008). There has been some pushback to the idea that regulatory evolution is the dominant mechanism in genomic adaptation (Hoekstra and Coyne, 2007), since there are several examples of protein coding evolution having a significant effect on the evolution of form. For example, in D. melanogaster Ultrabithorax has the capacity to repress Distal-less expression in the abdomen of flies to prevent legs from developing, but the homologous protein from the onychophora phylum does not have this capacity (Grenier and Carroll, 2000). This difference arises from expansion of the QA motif next to the homeodomain (Galant and Carroll, 2002; Ronshaugen et al., 2002). Another example of protein evolution having an important role in the evolution of form is the MC1R gene. MC1R is a receptor for melanocortin, and mutations which reduce its function can explain part of the variation in pigmentation color in mice (Hoekstra et al., 2006). 
The same gene is at least partially responsible for variation in pigmentation in reptiles (Rosenblum et al., 2004) and catfish (Gross et al., 2009).
As with many competing models in the biological sciences, both regulatory and coding changes are important evolutionary forces. Clearly there are examples of both being required for adaptive evolutionary change.
1.4 Evolution of novel genes
Sequencing the yeast genome revealed that around one third of identified genes had no ortholog in any other known species and thus were called ‘orphan’ (Dujon, 1996). Orphan genes have been found in many different genomes in multiple lineages; however, their definition is problematic because it depends on the density of published genome sequences. Consider that with the completion of genomes of other species in the Saccharomyces genus, many of the originally identified orphan genes were found to have homologs. For this reason, a more descriptive name was proposed: Taxonomically Restricted Gene (TRG), which reflects the fact that many genes exist in restricted lineages and are likely important for lineage specific biology, even if they exist in multiple genomes (Wilson et al., 2005). TRGs are somewhat enigmatic because we often have no clues to their function, but they could be very important for the evolution of lineage specific traits.
1.4.1 Evolution of gene function for taxonomically restricted genes
A current model for the evolution of development is that a core set of developmental genes are re-used in different ways to produce the wide array of animal forms we observe today. Early studies supported this by demonstrating that there had been little increase in the number of Hox genes from the onychophoran outgroup to the radiation of arthropod species (Grenier et al., 1997). This supported a model in which core developmental genes are reused through novel cis-regulatory modules (Carroll, 2008). Despite this, there are many examples of how novel genes can be important for species specific biology.
TRGs are hypothesized to play an important part in lineage specific biology, but they are very difficult to study because of the lack of homology to genes in other lineages. For
example, the Sphinx gene has emerged within the past 2-3 million years in the Drosophila melanogaster genome from a retrotransposition of the ATPS-F gene with another exon and intron to create an RNA gene (Wang et al., 2002). The function of this gene would remain opaque except that mutations in Sphinx cause defects in proper mating behavior, with an increase in male-male courtship behaviors (Dai et al., 2008). Another example of the biological importance of such genes comes from species of Hydra. Nematocytes are cell types specific to Cnidarians which are used to sting and capture their prey; based on a large scale expression study of these cells in Hydra magnipapillata, 80% of genes expressed specifically in nematocytes were found to not have homologs outside the phylum (Hwang et al., 2007). A more specific example from Cnidarians is Hym301, which is found in H. oligactis and is expressed in two long tentacles which develop before the rest. Transgenic expression experiments in H. vulgaris phenocopy the developmental pattern, arguing for an important role in tentacle development (Khalturin et al., 2009).
There are a number of similar examples in model organisms. BSC4 has evolved recently in the S. cerevisiae genome and it is undetectable in related species of Saccharomyces by sequence searches or Southern blot. Based on synthetic lethal interactions and its coexpression with members of the DNA damage pathway, it is thought to be a novel member of that pathway (Cai et al., 2008). Another example is the mouse gene Poldi, which has arisen within the past 3.5 million years from non-coding DNA and is expressed in testis. Poldi knockout mice display a reduced sperm motility phenotype (Heinen et al., 2009). TRGs in the D. melanogaster genome also appear to be expressed preferentially in the testis and may play a role in the evolution of germ line specific functions (Levine et al., 2006). 
Similar testis biased expression patterns for evolutionarily novel genes were identified in D. yakuba ESTs (Begun et al., 2007).
Recently evolved genes are also thought to be important for human evolution. By looking for H. sapiens genes which are not present in the chimpanzee genome and supported by proteomic databases, Knowles and McLysaght (2009) were able to detect three
genes which are specific to the human genome. The sequence of the syntenic chimpanzee DNA to these genes is highly similar, but there are inactivating features, such as a premature stop, which prevent them from being genes. These inactivating features are found in other species such as gorilla and thus are ancestral; the derived removal of these inactivating features is required for the creation of these novel genes. Another example of a novel gene in the human genome is FLJ33706, which arose from non-coding DNA along with the insertion of an Alu sequence to form an RNA gene which is enriched in expression in the brain and associated with nicotine addiction in human populations (Li et al., 2010).
Novel genes are being created throughout evolution and in some cases it is possible to understand the function they have acquired. Next I will consider what sort of mechanisms can create novel genes.
1.4.2 The evolution of novel genes by gene duplication
One significant source of novel genes is the process of gene duplication, which had been hypothesized to be of great importance since long before genome sequences were available (Ohno, 1970). After many genomes were sequenced, scientists were surprised by how many duplicate genes were present: for example, nearly 10% of genes in C. elegans were found to be duplicates (Lynch, 2000). Duplicated genes can occur in tandem because of unequal crossing over during meiosis, which results in a duplicated pair sitting next to one another. An alternative mechanism is reverse transcription and transposition back into the genome. Since the processed gene copy is what ends up being duplicated, this method has been thought to produce pseudogenes. However, it has recently been discovered that reverse transcribed sequence can be used for more innovative forms of novel gene evolution, such as gene fusion (reviewed in Kaessmann, 2010).
Once gene duplicates are created there are two possible outcomes for the evolution of their function. One possibility is neofunctionalization, in which one copy retains the
ancestral function, while the other copy adopts a novel function. An example of this is the evolution of trichromatic vision in H. sapiens. The human red and green opsins are extremely similar in amino acid sequence and are thought to be tandem duplicates on the X-chromosome (Nathans et al., 1986). Despite the relatively small difference in amino acid sequence, the red and green opsin proteins have a difference in peak absorbance of about 30 nm (Bowmaker, 1981), which gives us the capacity to distinguish red from green.
An alternative model is that of subfunctionalization, in which a pleiotropic ancestral gene has its separate functions partitioned into different child genes. An example of this is ZAG1 and ZMM2 in maize, which are homologous to AGAMOUS in Arabidopsis, a gene involved in the development of both carpels and stamens (Coen and Meyerowitz, 1991). In maize this function appears to be subfunctionalized: ZAG1 is expressed in developing carpels, and knockout phenotypes confirm a necessary function in that tissue (Schmidt et al., 1993), while ZMM2 is highly expressed in stamens and likely required for the development of that tissue (Mena et al., 1996).
There are two models to explain how selection operates during subfunctionalization. The first argues for neutral evolution and is called the duplication, degeneration, complementation (DDC) model. Under this model, selection is relaxed on the duplicates and neutral mutations which degenerate different aspects of each duplicate can become fixed, until both are required for the full function of the ancestral gene (Force et al., 1999). Such a mechanism is thought to explain the subfunctionalization of ZAG1 and ZMM2 which I described above.
The alternative selective model for subfunctionalization of gene duplicates is the escape from adaptive conflict model. 
In this model, a multifunctional protein can have each function optimized separately in the duplicated genes and such changes are selected for; it was originally proposed to explain an example from yeast (Hittinger and Carroll, 2007). In Kluyveromyces lactis (a yeast species which diverged before the whole genome duplication), GAL1 is a multifunctional protein which acts as a coinducer of the galactose pathway, as well as the first enzyme in the pathway, a galactokinase. In species which have experienced a whole genome duplication, such as S. cerevisiae, GAL1 has the galactokinase activity, while GAL3 has the coinducer activity. In S. cerevisiae there has been apparent adaptation of each gene for its specific function; GAL1 is now induced over 1000-fold in response to galactose, compared to 3-5 fold in K. lactis. This has occurred because of evolution of GAL4 binding sites in the promoter of these genes (Hittinger and Carroll, 2007). In this example the pleiotropic ancestral gene has had each of its functions optimized post duplication. There are two other known examples of this model: divergence of the dihydroflavonol-4-reductase gene into separate enzymatic activities in Ipomoea (Des Marais and Rausher, 2008), and the split of sialic acid binding activity and ice binding activity in the evolution of the antifreeze gene in an Antarctic zoarcid fish (Deng et al., 2010).
An intriguing mechanism with significant consequences for gene duplication is where the entire genome duplicates, producing a duplication of every single gene. The first serious evidence for such a model came with the sequence of the Kluyveromyces waltii genome, which contained sequences that were syntenic to two separate regions of the S. cerevisiae genome, unlike the 1-1 relationship found for genomes of most closely related species (Kellis et al., 2004). 
In the post duplication lineage many duplicate pairs reverted to single copies, likely 50% before the divergence of S. castellii (Scannell et al., 2006). Yeast is not the only lineage in which there is evidence for whole genome duplication; detailed analysis of the growing number of genomes is providing more evidence. There has been a duplication in the Arabidopsis lineage (Bowers et al., 2003), two rounds in the H. sapiens lineage (Dehal and Boore, 2005), and at least three rounds in the ciliate Paramecium tetraurelia (Aury et al., 2006). Currently it is thought that Drosophila and C. elegans have not experienced a whole genome duplication, but it is likely a recurring phenomenon throughout evolution.
1.4.3 The evolution of novel genes from non-coding DNA
Another model for the creation of genes would be the transition of non-coding DNA into a bona fide gene by the acquisition of genic features such as transcription and translation start/stop sites. Searches of the yeast genome find 267,000 ORFs (start codon to in frame stop codon), far more than the legitimate 6,000 genes. The “proto-gene” model argues that these ORFs exist in a large continuum of gene-like entities. There is some evidence from ribosome footprinting data that a subset of these proto-genes are translated, representing intermediates in the continuum between intergenic and genic sequence (Carvunis et al., 2012). The three novel genes in the human genome I discussed earlier were likely created through such a mechanism, since the crucial mutations resulting in them becoming genes have removed “disablers”, for example by eliminating a premature stop in C22orf45 (Knowles and McLysaght, 2009).
These different mechanisms for the evolution of novel genes have likely had different functional outcomes. When a gene evolves from gene duplication, it will contain domains which give it an established molecular function that it can evolve from; for example, FOG-2 emerged with an F-box domain (discussed in the next section) which allowed it to participate in protein-protein interactions while its derived biological function in sex development was being established. However, if a non-coding sequence turns into a gene then it is unlikely to have any existing protein domains in it, and no previously existing molecular function can form the base for its evolution.
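The ORF definition used in this section (start codon to in frame stop codon) can be made concrete with a short sketch. This is only an illustration under stated assumptions: it scans the forward strand in three reading frames, treats ATG as the only start codon, and reports the longest ORF for each stop codon. The names find_orfs and min_codons are hypothetical, and this is not claimed to be the procedure used by Carvunis et al. (2012).

```python
# Minimal forward-strand ORF scan (illustrative sketch, not the published method).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return sorted (start, end) index pairs for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon.
    `end` is the index just past the stop codon. `min_codons` (a
    hypothetical parameter) is the minimum number of codons before
    the stop, counting the ATG itself.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                # Remember the first ATG since the last stop in this frame.
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

Because even a very short sequence can contain an ATG followed by an in frame stop, a scan of this kind over a whole genome returns many more ORFs than there are annotated genes, which is the observation the proto-gene model starts from.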
1.5 Pathway evolution
1.5.1 Convergent evolution in the C. elegans and C. briggsae sex development pathways
Since genes fit into molecular pathways, we could also understand their evolution in terms of how they affect these pathways. An excellent example of this is sex determination in Caenorhabditis, since three separate species have made the transition to a hermaphrodite-male system from a male-female ancestor. This transition is thought to have occurred recently based on an analysis of when the relaxation in selection occurred in those species (Cutter et al., 2008).
In C. elegans males, male germ cell identity comes from the soma, which secretes HER-1; this binds to and inactivates the receptor TRA-2, which induces a signalling cascade to initiate sperm production (Kuwabara and Kimble, 1995). In order for an ancestral female to become a hermaphrodite, a pathway must evolve to downregulate TRA-2 in females. In C. elegans hermaphrodites TRA-2 is downregulated post-transcriptionally by the RNA binding protein GLD-1, which complexes with FOG-2 (Nayak et al., 2005). FOG-2 is an F-box protein which is found only in C. elegans, and thus its recent creation could be central to the mechanism by which C. elegans has become a hermaphrodite. In C. briggsae, evolution of hermaphroditism does not depend on fog-2 since it is not present in the C. briggsae genome. Instead, SHE-1, which is a different F-box protein, acts genetically upstream of tra-2, but does not physically bind to GLD-1 (Guo et al., 2009). Thus the mechanisms by which C. elegans and C. briggsae repress TRA-2 are distinct; the only common theme is the existence of an F-box protein.
The idea that the evolution of hermaphroditism only required the downregulation of tra-2 has been partially recapitulated by experimental biologists in the male-female species C. remanei. Here, RNAi of tra-2 produces pseudohermaphrodites in some animals (Baldi et al., 2009). However, such an evolutionary transition must have required changes
in more genes, since such experiments also required inactivation of swm-1 in order to activate the sperm cells and produce viable hermaphrodites.
1.5.2 RNA interference and Argonaute proteins
RNA interference (RNAi) is a widely used technique to substantially reduce the expression of a gene of interest, simulating loss of function. It is of interest to this thesis because it is the central experimental technique that I use and because related pathways are experiencing ongoing evolutionary change in Caenorhabditis. For these reasons I will briefly discuss it here.
RNAi was originally observed in plants when researchers were attempting to increase the amount of pigmentation in petunias by transgenically expressing chalcone synthase, a central gene in the pathway (Van Blokland et al., 1994). However, instead of increasing, the amount of pigmentation was reduced, a phenomenon termed co-suppression. Although the mechanism behind this was not understood at the time, it is now known to be caused by RNAi. In C. elegans RNAi was originally observed in the form of double stranded RNA complementary to the gene of interest, which when experimentally introduced produced a robust knockdown in expression level (Fire et al., 1998). The authors remarked that there was likely an amplification mechanism involved, since only a few molecules of RNA were required to elicit knockdown. Using this system, a number of genetic mutants were identified (Tabara et al., 1999), which along with the purification of Dicer from Drosophila S2 cells (Hammond et al., 2000) led to a characterization of the mechanism behind RNAi. In outline, double stranded RNA is cleaved into small segments of RNA by the Dicer complex, this signal is amplified by RNA-dependent RNA polymerases, and it is then used by the RNA induced silencing complex (RISC) to cleave complementary endogenous RNA molecules (Grishok, 2005).
The biological role of RNAi and Argonaute genes in C. elegans is thought to be an
anti-viral or anti-transposable element system. There is only one virus known to infect C. elegans naturally, which is an RNA nodavirus, and critical RNAi machinery genes, such as rde-1, rde-2 and rde-4, are necessary for keeping the viral load and phenotypes low (Félix et al., 2011). There are 25 Argonaute genes in the C. elegans genome and many of these belong to the worm-specific ago (WAGO) family not found in other lineages (Buck and Blaxter, 2013). This diversification of Argonaute genes has led to their involvement in many different pathways (Yigit et al., 2006); for example, csr-1 is involved in chromosome segregation (Claycomb et al., 2009). Only a subset of these genes are conserved in other Caenorhabditis species, suggesting that they are a rapidly expanding family of proteins (Dalzell et al., 2011).
C. elegans has traits which make RNAi particularly useful for experimental biologists: it can take up double stranded RNA from the environment, and silencing signals will reach every tissue in the body (except sperm and neurons) once double stranded RNA is in the body wall cavity. This has led to the experimental technique of RNAi by feeding, which is the experimental basis of chapter 2 of my thesis. Several systemic RNA interference-deficient (sid) genes have been discovered which are required for these processes. The first characterized was sid-1, which encodes a transmembrane protein expressed on cell surfaces and required for uptake of the silencing signal in cells which are capable of RNAi; transgenic expression of sid-1 in neurons turns them from systemic RNAi resistant to systemic RNAi competent (Winston et al., 2002). A biochemical study of sid-1 expressed in Drosophila S2 cells led to a model in which sid-1 is a dsRNA gated, dsRNA specific channel which allows for passive diffusion (Shih and Hunter, 2011). 
Another gene is sid-2, which is an intestinal gene required for uptake of double stranded RNA from the environment via the gut lumen (Winston et al., 2007). The ability to take up environmental RNAi signals is uncommon in Caenorhabditis; however, transgenic expression of C. elegans sid-2 can render some of these other species capable of environmental RNAi (Nuez and Félix, 2012). Two other pathway members required for the
import of dsRNA into cells have been described: sid-3 is a cytoplasmic tyrosine kinase (Jose et al., 2012) and sid-5 is an endosome associated protein with a transmembrane domain (Hinas et al., 2012). Work to understand the rest of the pathway is ongoing, but it provides an interesting example of a C. elegans pathway which has evolved recently, since homology searches do not detect these genes in more divergent nematodes (Dalzell et al., 2011).
1.6 High-throughput approaches to genome evolution
1.6.1 How much of evolution can single genes explain?
Case studies suggest a picture in which phenotypic differences between related species occur through changes in individual genes with substantial effects on the development or physiology of the organism. However, it is unclear if this is a general picture, since researchers are drawn to what is likely to give positive results: large effect QTLs. Consider, for example, pelvic spine reduction in stickleback, which makes a very nice story about how selection at a single gene leads to adaptive evolution. Mapping experiments have identified a number of small effect QTLs in addition to the Pitx1 locus (Shapiro et al., 2004), so even in this case the story isn’t quite as simple as we would like to believe.
Rockman (2012) has argued that by focusing on large effect mutations to explain phenotype evolution, researchers have missed out on a large part of the picture, as there is no theoretical reason why evolution should proceed by changes in single large effect loci. Based on results from quantitative genetics, the genomic basis behind many different traits can be explained by a large number of small effect loci. For example, pooled QTL mapping experiments in yeast can identify upwards of 20 loci which can explain the majority of the variance behind chemical resistance (Ehrenreich et al., 2010). In addition, mapping loci for yeast growth under different environmental conditions yielded 591 QTLs for 46 traits, an average of nearly 13 per trait (Bloom et al., 2013), again explaining most of the variance in the trait.
Technology development in molecular biology is changing the perspective by giving us the ability to look at the function of large numbers of genes at once. 
Different types of experiments allow the measurement of different facets of gene function: for example, it is now possible to measure on a genomic scale the expression levels of different genes (Hibbs et al., 2007), protein-protein physical interactions (Gavin et al., 2002), protein-DNA interactions (Martone et al., 2003), genetic interactions (Costanzo et al., 2010), and gene loss of function phenotypes (Kamath et al., 2003). Many groups have been comparing these measurements between related species in order to try to understand the evolution of gene function.
1.6.2 Gene expression divergence
The first technology that opened the door to a high-throughput look at the evolution of gene function was the microarray. One of the early studies to make use of microarrays to understand the evolution of gene expression looked at expression in several different tissues in humans, chimpanzees, orangutans and macaques, and found that the brain was highly enriched for expression differences in the human lineage (Enard et al., 2002). Follow-up studies confirmed this general pattern by showing that human-specific expression differences are not observed in other tissues such as liver (Preuss et al., 2004). The brain genes which are upregulated in the human lineage come from a diverse set of functions such as regulation of transcription, signal transduction and lipid metabolism.
While there is substantial interest in identifying positive selection leading to gene expression divergence, it is likely that most transcriptome divergence is neutral or negatively selected. Khaitovich et al. (2004) found that expression divergence accumulated linearly with divergence time in primates, and that transcribed pseudogenes diverge at the same rate as real genes, suggesting that neutral evolution dominated the transcriptome. In contrast to this, a study in C. elegans compared expression divergence in mutation accumulation lines (where selection is inefficient) to natural isolates (where selection is efficient) and found that there was higher divergence in mutation accumulation lines, suggesting that negative selection dominates the transcriptome (Denver et al., 2005). Not all types of genes are equally constrained; it appears that signal transduction genes are under higher levels of negative selection than carbohydrate, amino acid and lipid metabolism genes.
One of the most intriguing results to come from applying microarrays to the study of gene expression divergence is the finding that most divergence results from changes in cis. S. cerevisiae and S. paradoxus can hybridize, and using custom designed microarrays, Tirosh et al. (2009) compared allele specific expression between the two parental species and the hybrid to determine how much of the divergence occurs in trans and how much occurs in cis. They found that the majority of the change could be explained by changes in cis, and furthermore, that changes in cis were independent of the condition the yeast were grown in while changes in trans were dependent upon condition. In another study, measurements of gene expression from human chromosome 21 carried in mouse hepatocytes confirmed this result by finding that the majority of the human expression was recapitulated within the trans environment of the mouse cell (Wilson et al., 2008). These results support a model in which the majority of the evolution of gene expression levels occurs due to changes in the cis regulatory sequence, such as transcription factor binding sites and sequence which influences promoter chromatin organization.
A model which was hypothesized prior to the development of high-throughput technologies is that genetic pathways should be most conserved at the middle stage of animal development, when the body plan of the animal is being established; the so called “hourglass” model (Kalinka and Tomancak, 2012). For example, observations of development in different nematode species have found that establishment of AP asymmetry and P granule segregation occurs differently than in C. elegans, despite similar development after this point (Goldstein et al., 1998). After correcting for differences in the timing of developmental progression in species of Drosophila, it was found that gene expression divergence supports the hourglass model of development, with the highest level of conservation during the middle arthropod phylotypic period (Kalinka et al., 2010). Further results from zebrafish have confirmed this pattern, not by looking at average expression over all genes in the genome but by attempting to build relevant gene modules (Piasecka et al., 2013). The hourglass model has been confirmed at a molecular level in multiple phyla and using multiple methods; it is an excellent example of how a molecular finding can support previously conceived models in evolutionary developmental biology.
By using genomic technologies evolutionary biologists have been able to understand aspects of what governs the evolution of gene expression between related species. Such genomic technologies are also useful in studying how the transcription factor binding sites which control gene expression levels change between related species, and I will survey some of this research in the next section.
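The cis/trans partition from the hybrid experiments discussed earlier in this section can be sketched numerically. The sketch below assumes a commonly used log-ratio decomposition: the expression ratio between the two parental species gives the total divergence, the ratio between the two alleles within the hybrid (which share one trans environment) gives the cis component, and the remainder is attributed to trans. It is not claimed to be the exact procedure of Tirosh et al. (2009); the function name partition_divergence and its argument names are hypothetical.

```python
import math


def partition_divergence(parent_a, parent_b, hybrid_a, hybrid_b):
    """Split expression divergence for one gene into cis and trans parts.

    parent_a, parent_b: expression of the gene in each parental species.
    hybrid_a, hybrid_b: allele specific expression of each species'
    allele measured within the hybrid.
    All values are assumed to be positive expression levels.
    """
    # Total divergence between the parental species (log2 fold change).
    total = math.log2(parent_a / parent_b)
    # In the hybrid both alleles see the same trans factors, so any
    # remaining difference between alleles is attributed to cis.
    cis = math.log2(hybrid_a / hybrid_b)
    # Whatever is not explained by cis is attributed to trans.
    trans = total - cis
    return {"total": total, "cis": cis, "trans": trans}
```

For instance, a gene expressed fourfold higher in one parent but only twofold higher from that parent's allele in the hybrid would be assigned equal cis and trans components under this assumed decomposition.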
1.6.3 Transcription factor binding divergence
The evolution of gene expression in cis is thought to be due to the evolution of transcription factor binding sites or of sequence controlling the organization of nucleosomes. Transcription factor binding sites can be studied using Chromatin ImmunoPrecipitation followed by microarray (ChIP-chip). One of the first studies to use such an experiment for an evolutionary comparison mapped the binding sites of the pseudohyphal regulators Ste12 and Tec1 in different yeast species (Borneman et al., 2007). The authors found that binding sites turn over far more rapidly than orthologous gene content, implying that such changes could be an important phenomenon in evolution.
Comparisons between multicellular organisms are more complicated due to the presence of multiple different cell types in differing ratios between different species. However, the liver is an excellent tissue for comparison because it is very homogeneous in cell type. A study looking at the binding sites of four key liver transcription factors (FOXA2, HNF1A, HNF4A and HNF6) found something very similar to yeast: there is a very high level of change in the genes whose promoters are bound by these transcription factors (Schmidt et al., 2010). Furthermore, when these factors do bind to the same promoter, the site often does not align, implying turnover of binding sites within genes.
Given how quickly transcription factor binding sites turn over, the question remains whether all of these changes result in corresponding changes in gene expression. If we examine yeast gene promoters with binding site divergence, there is no statistically significant increase in expression divergence (Tirosh et al., 2008). 
However, when binding site turnover is examined for a single transcription factor (Ste12) in a biological context that is well understood (mating response), about half the divergence in gene expression can be explained by divergence in Ste12 binding (Tirosh et al., 2008). These results highlight the potential difficulty in understanding evolution from a functional genomics perspective, especially in the context of transcription factor binding data in aggregate.
Another model for studying the evolution of transcription factor binding sites is the development of the Drosophila blastoderm embryo, which is a conserved process throughout the genus. In comparing binding of core transcription factors (BICOID, GIANT, HUNCHBACK and KRÜPPEL) in the early developing blastoderm, it was found that only one third of ChIP-chip peaks were conserved between D. melanogaster and D. pseudoobscura (Paris et al., 2013), and that such peaks are changing far faster than gene expression levels. Another study which compared D. melanogaster to the more closely related D. yakuba found there were significant changes in quantitative binding, even when there were peaks at the same locus (Bradley et al., 2010).
Another force that could explain evolutionary turnover in gene expression is evolution of the chromatin environment around genes. Nucleosome occupancy was originally
measured genome-wide in yeast using micrococcal nuclease followed by hybridization to a DNA microarray (Lee et al., 2007). Similar measurements in 12 Hemiascomycota yeast species suggest that the evolution of chromatin organization can affect the evolution of gene expression, since nucleosomes can obscure transcription factor binding sites (Tsankov et al., 2010). Another study looking at hybrid yeast found that the majority of divergence in nucleosome occupancy occurs in cis rather than in trans (Tirosh et al., 2010), similar to what is the case for expression levels (Tirosh et al., 2009).

Genomic technologies such as microarrays and high-throughput sequencing are greatly expanding our current views of evolution, moving us past thinking about finding the gene which underlies a specific trait, towards trying to explain how sets of genes and their biochemical activities can explain a given trait.
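The logic of the hybrid comparisons mentioned above can be made concrete with a minimal sketch. In a hybrid, both parental alleles share the same trans-acting environment, so a difference that persists between the two alleles is attributed to cis, and the remainder of the parental difference is attributed to trans. The function name, argument names and numbers below are hypothetical; this log-ratio decomposition is a commonly used simplification and is not presented as the exact analysis of Tirosh et al.

```python
import math

def cis_trans_decomposition(parent_a, parent_b, hybrid_a, hybrid_b):
    """Split divergence between two species into cis and trans parts (log2 scale).

    parent_a, parent_b: signal measured in each parental species.
    hybrid_a, hybrid_b: allele-specific signal from each species' allele in the hybrid.
    All four are hypothetical positive measurements (e.g. expression or occupancy).
    """
    total = math.log2(parent_a / parent_b)  # divergence between the parents
    cis = math.log2(hybrid_a / hybrid_b)    # persists in a shared trans environment
    trans = total - cis                     # remainder attributed to trans factors
    return {"total": total, "cis": cis, "trans": trans}

# Illustrative values only: a 4-fold parental difference, 2-fold of which
# is retained between the two alleles in the hybrid.
result = cis_trans_decomposition(parent_a=8.0, parent_b=2.0, hybrid_a=4.0, hybrid_b=2.0)
print(result)
```

Under this simplification a locus whose allelic ratio in the hybrid matches the parental ratio would be called purely cis, while a locus whose alleles are equal in the hybrid would be called purely trans.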
1.7 Developmental System Drift
I have discussed numerous examples of how a positively selected change in gene function can lead to adaptation, and I have also discussed how many polymorphisms do not affect gene function: they are selectively neutral and change in population frequency through the process of genetic drift. There is a third outcome, in which genome changes affect gene function, but due to strong stabilizing selection the phenotypic output of the genetic circuit remains conserved (Figure 1.1). This is known as Developmental System Drift (DSD) (True and Haag, 2001). It implies that molecular evolution is a continuously occurring process, even when species are in apparent periods of evolutionary stasis.

A significant mechanism of DSD is when changes in transcription factor binding sites occur which maintain the gene expression output of the promoter/enhancer (Weirauch and Hughes, 2010). A good model for this is the stripe 2 enhancer of even-skipped in Drosophila, whose activation is controlled by 12 binding sites for Bicoid, Hunchback, Giant and Krüppel. Ludwig et al. (1998) found that the locations and numbers of
these binding sites are not conserved in different species of Drosophila, but that those enhancers still produced the correct stripe 2 of even-skipped when tested transgenically in D. melanogaster. Furthermore, the locations and/or numbers of the binding sites within each species are critical, since a hybrid enhancer between D. melanogaster and D. pseudoobscura did not produce a functional stripe (Ludwig et al., 2000). Follow-up studies in Sepsid flies had similar findings: the function of even-skipped enhancers is conserved despite widely divergent sequence and divergent binding site motifs (Hare et al., 2008). These data are consistent with the billboard model of enhancer function, in which each binding site or group of binding sites is read independently by the molecular machinery of the cell, and the sum of their effects determines activation or repression (Kulkarni and Arnosti, 2003).

Another example is unc-47, which has a conserved low level of expression in SDQR and SDQL neurons in both C. elegans and C. briggsae. The regulatory circuitry controlling this has diverged, as hybrid promoters between the species do not maintain the conserved expression pattern (Barrière et al., 2012). Furthermore, there is apparent co-evolution between the cis and trans regulatory machinery to maintain unc-47 expression, possibly due to stabilizing selection.

The evolution of post-translational gene regulatory mechanisms such as phosphorylation could also be explained by DSD. Consider as an example the cell cycle in related species of yeast, which is an extremely highly conserved biological process. CDK phospho-sites in the human or yeast ORC1 linker region are often disrupted by a mutation in the critical S/T or P residue in closely related species; however, there are often other possible sites in the same linker region, which could be explained by a turnover model (Moses et al., 2007).
Consistent with this, these phospho-sites exist in loose clusters along pre-RC proteins and these phosphorylation events do not induce allosteric changes in the conformation of these proteins; rather, their change of function occurs through bulk electrostatics, thus allowing the number and location of these sites to change without affecting
the function of the interaction (Serber and Ferrell, 2007).

Experiments measuring RNA polymerase III binding to tRNA genes suggest that there can be little conservation at the level of the individual gene while conservation
exists at a higher-order level. Specifically, Pol III binding was divergent among ∼500 tRNA genes in 5 mammalian species, but more conservation was observed when the authors merged the tRNA genes into sets of aggregate anticodon isoacceptor or amino acid isotype (Kutter et al., 2011). These results are consistent with a changing binding profile among a redundant set of tRNA genes, as long as there is conservation of the levels of different tRNA-bound amino acids available to the cell.

Another excellent example is the regulation of a-specific genes (asgs) in S. cerevisiae and C. albicans. In both species, asgs are transcribed in a cells and repressed in α cells, but while in C. albicans this occurs by activation of the asgs in a cells, in S. cerevisiae it occurs via repression of the asgs in α cells (Tsong et al., 2003). This difference exists because of evolution on the S. cerevisiae branch of the tree; the circuitry of C. albicans is ancestral. Molecularly, this difference exists for two reasons. First, there was the loss of the interaction between a2 and Mcm1, which controlled activation of asgs in a cells. Second, there was a gain of an interaction between the α2 protein and Mcm1, which led to repression of asgs in α cells (Tsong et al., 2006). In order to avoid a state without regulation of the asgs, there was likely an intermediate state with both activation and repression, which still partially exists in intermediate extant species (Baker et al., 2012).

Beyond gene expression, there are examples of molecular pathways which can be fairly divergent with little effect on the phenotype of the organism. In the nematodes C. elegans and C. briggsae, the pattern and timing of cell division and cell death is the same up until the 350-cell stage of embryogenesis (Zhao et al., 2008). However, the genetic networks that regulate this are not the same: knocking down the Wnt-pathway effector pop-1 by RNAi in C. elegans causes the MS cell to adopt an E cell like fate, whereas in C. briggsae it causes the E cell to adopt an MS cell like fate (Lin et al., 2009; Zhao et al., 2010).
Another example of pathway evolution with a conserved phenotype is in the development of the vulva in C. elegans and Pristionchus pacificus. While in C. elegans vulval development is induced by EGF/Ras signaling from the anchor cell (with a minor role for Wnt signaling), in P. pacificus it is induced by the Wnt pathway from the anchor cell and gonad (Tian et al., 2008). This change in reliance on different pathways may have at least partially been driven by changes in protein architecture. In P. pacificus there are multiple SH3 interaction peptide motifs in the Wnt pathway gene lin-18 (Ryk), which are not present in C. elegans - these are thought to be negative regulators of the gene (Wang and Sommer, 2011).

Although these examples help demonstrate several mechanisms for DSD, it is currently not clear how promoters or molecular pathways can change from one state to another, because this would involve moving across a region of low fitness in which the molecular pathway would not produce the desired output. An attractive hypothesis is that there is a transition through a redundant intermediate such as an enhancer with multiple binding sites, or a developmental process with multiple parallel pathways or redundant genes (Haag, 2006).
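The billboard model discussed earlier in this section lends itself to a toy illustration of how binding sites can turn over while the output is conserved. The sketch below is not a model of the even-skipped stripe 2 enhancer: the site positions, effect sizes, threshold and function name are all hypothetical, and the only property taken from the text is that site effects are read independently and summed.

```python
def enhancer_output(sites, threshold=2.0):
    """Toy billboard readout: sum the independent effects of all bound sites.

    sites: list of (position, effect) pairs; effect > 0 for an activator site,
    effect < 0 for a repressor site. Position is carried along but never used,
    which is the defining simplification of this toy model.
    """
    total = sum(effect for _position, effect in sites)
    return "active" if total >= threshold else "inactive"

# Two hypothetical enhancers with different site numbers and locations
# but the same summed effect.
species_a = [(10, 1.0), (45, 1.0), (80, 1.0), (120, -0.5)]
species_b = [(5, 1.5), (60, 1.5), (200, -0.5)]

# A hypothetical chimera: first half of species_a joined to the last site of species_b.
chimera = species_a[:2] + species_b[2:]

print(enhancer_output(species_a))
print(enhancer_output(species_b))
print(enhancer_output(chimera))
```

In this toy setting the two diverged enhancers give the same readout, while the chimera falls below the threshold because it inherits an unbalanced set of sites. This is offered only as an intuition for how compensatory changes within a lineage can fail to combine across lineages, not as an explanation of any specific experiment.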
1.8 How much of molecular function is noise?
In the previous section I discussed how the mechanics of transcription factors activating gene expression or molecular pathways can change without any effect on the phenotype of the organism. This raises questions about how much of the molecular function high-throughput biologists are currently measuring has any effect on the phenotype of the organism at all.

The ENCODE project claimed that 80% of the human genome holds a function (Bernstein et al., 2012), in contrast to the current thought that it is mostly disposable (e.g. a megabase-size gene desert can be deleted from the mouse genome without obvious phenotypes; Nóbrega et al., 2004). Given ENCODE's definition of function as a DNA segment which produces a transcript or has a biochemical signature such as protein binding or histone modification, the statement that 80% holds a function is technically correct. However, this has been criticized because only 10% of the human genome shows any evidence of negative selection, and thus there is no evidence that most of this biochemical function manifests itself in a phenotype which is detectable by evolution (Graur et al., 2013).

If we think about function from a gene-centric rather than a DNA-centric point of view, then the molecular function of a transcription factor is the activation or repression of genes by binding to their promoters. When transcription factor binding sites were first mapped on the human genome using ChIP-chip, over 10,000 binding sites were estimated; it has been suggested that at least some of these are likely not functional, that is, binding events that do not activate or repress the transcription of any gene (Euskirchen and Snyder, 2004). Thus, many of these DNA elements may not appear to be functional when we consider their effect on gene expression.

Consider the example of pha-4, which is a FoxA transcription factor whose activity is necessary and sufficient for pharynx specificity in C. elegans (Horner et al., 1998). Microarray experiments have identified 240 genes which are expressed in the pharynx (Gaudet and Mango, 2002), but ChIP-Seq experiments have found 4800 genes whose promoters are bound by pha-4 (Zhong et al., 2010). These data are consistent with a model in which there is pervasive transcription factor binding across the genome due to a lack of selection against the random creation of spurious binding sites, and in which these spurious binding sites do not activate gene transcription.
In support of this, evolutionary comparisons of binding events of core developmental transcription factors in Drosophila species show that there is extremely little conservation of individual sites and this rapid turnover likely indicates that many binding events are undetectable by evolution (Paris et al., 2013).
Much like how transcription factor binding sites change rapidly due to a lack of negative selection, it has been suggested that a lack of negative selection against non-functional protein-protein interactions (PPIs), particularly in species with a low effective population size where selection is ineffectual, has led to a number of such non-functional interactions in PPI networks (Levy et al., 2009). As evidence for this hypothesis, Landry et al. (2009) found that S/T sites with evidence for phosphorylation, but without evidence of a functional role of that phosphorylation event, evolve at the same rate as non-phosphorylated S/T residues. While the random birth of a non-functional PPI intuitively seems like it should be a rare evolutionary occurrence, groups have found that PPIs can be formed by mutation of a single residue or a handful of residues (Skerker et al., 2008; Grueninger et al., 2008).

Along with protein-DNA or protein-protein interactions, another example of a molecular event which may be widespread with little functional consequence is alternative splicing. High-throughput techniques have identified that up to 95% of genes in the human genome are alternatively spliced (Pan et al., 2008). However, it is apparent that only a small subset of these events are conserved between related species; only 50% of human alternatively spliced exons are conserved in chimpanzees and only 20% are conserved in mouse (Barbosa-Morais et al., 2012). These results are consistent with a model in which a lack of negative selection against functionally irrelevant splice isoforms causes their proliferation, especially in species with very ineffectual selection such as humans.

It is difficult to convincingly show that a given molecular event does not have a phenotypic effect. However, our current understanding of the strength of selection suggests that many other molecular events are likely to represent evolutionary noise.
For example, neutral evolution could generate binding sites, which in turn lead to gene expression in new tissues, despite no functional importance in that tissue. It has been estimated that pol-II has only a 10⁴-fold higher specificity for maximally active promoters than for average sequence (compare this to 10⁸ for nucleotide addition during DNA replication), and based
on the size of the yeast genome 90% of pol-II initiation events in yeast are transcriptional noise (Struhl, 2007). Consistent with this, genes which are expressed in a single tissue are rare in C. elegans (Hunt-Newbury et al., 2007).

If we consider evolution as the mechanisms which can change population allele frequency, and if we accept how ineffectual negative selection can be, particularly in the human genome, then we should also accept that there is a neutral proliferation of molecular functions which are irrelevant for the organism-level phenotype. Evolution is not a process which generates optimal molecular pathways without any form of waste, but rather a process which creates and maintains pathways which are just barely good enough to get the population to the next generation.
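The pol-II specificity argument earlier in this section can be restated as a back-of-the-envelope calculation. The sketch below is a simplification under stated assumptions and is not Struhl's own calculation: every background position is given a relative initiation rate of 1, a maximally active promoter a rate equal to the specificity, and promoters are scaled by a mean activity relative to the maximum. The specificity of 10⁴ comes from the text and the background size approximates the yeast genome in base pairs; the promoter count and mean relative activity are hypothetical inputs chosen only for illustration.

```python
def noise_fraction(specificity, n_promoters, mean_relative_activity, n_background_sites):
    """Fraction of initiation events expected at non-promoter sequence.

    Each background site initiates at relative rate 1; a maximally active
    promoter initiates at relative rate `specificity`; promoters are scaled
    by their mean activity relative to the maximum.
    """
    specific = specificity * n_promoters * mean_relative_activity
    nonspecific = float(n_background_sites)
    return nonspecific / (nonspecific + specific)

frac = noise_fraction(
    specificity=1e4,              # from the text
    n_promoters=6000,             # hypothetical
    mean_relative_activity=0.02,  # hypothetical
    n_background_sites=1.2e7,     # approximate yeast genome size in bp
)
print(round(frac, 3))
```

With these illustrative inputs roughly nine in ten initiation events fall on background sequence. The general point is that a modest specificity ratio combined with a large genome and mostly weak promoters is enough to make nonspecific initiation the majority.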
1.9 Open questions and thesis goals
High-throughput technologies such as gene expression microarrays have been successfully used on related species for making evolutionary comparisons. However, there are other high-throughput technologies for looking at gene function which have not yet been used on closely related species to understand how gene function evolves. In particular, RNAi has been used extensively to measure biological gene function in C. elegans (Kamath et al., 2003), and recent developments which have made C. briggsae amenable to RNAi by feeding (Winston et al., 2007) have opened the door to a genome-scale comparison of loss of function phenotypes in order to understand how biological gene function differs between species.

While C. elegans and C. briggsae are a good pair of species in which to investigate changes in gene function using RNAi by feeding, for reasons that I have discussed, there do not appear to be many instances of adaptation between these species. They look nearly identical under the microscope, they have a highly conserved developmental lineage (Zhao et al., 2010), they are even capable of cross fertilizing one another although the embryos
die (Baird et al., 1992), and they occur in an overlapping ecological niche (Félix and Duveau, 2012). These points are highly suggestive that strong stabilizing selection has acted over the past tens of millions of years to preserve the biology of these species, which can be explained by the model of DSD. Thus, a high-throughput look at differences in biological gene function by using RNAi screening in the model pair of C. elegans and C. briggsae could shed substantial light on how much gene function changes during DSD - this is one of the goals I have for this thesis.

Another aspect of the evolution of gene function which is not well understood is how novel genes acquire functions after being born. There are some examples from the literature of an evolutionarily novel gene being created and then changing enough to acquire a biological function; for example, Poldi has acquired a necessary function for sperm motility (Heinen et al., 2009). However, there is no high-throughput look at which novel genes go on to develop essential functions for the organism. Given that large numbers of high-throughput datasets containing a large amount of information on gene function have been produced, it should be possible to use that information to predict which novel genes become essential, and to understand what kinds of functional features cause novel genes to become essential. This is the goal of the second section of my thesis.
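[Figure 1.1 schematic: an ANCESTRAL state, defined by its molecular phenotype and organism-level phenotype, undergoes genome evolution to reach one of three DERIVED outcomes.]

Outcome       Molecular phenotype   Organism-level phenotype
ADAPTATION    CHANGED               CHANGED
DSD           CHANGED               same
NEUTRAL       same                  same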
Figure 1.1: Possible outcomes of genome evolution. As a genome evolves, the accumulated mutations can be neutral, having no impact on the molecular phenotype (that is, the functions encoded in the genome and the ways that these are regulated), or they can lead to adaptation via changes in heritable phenotype due to changes in the molecular phenotype. Developmental System Drift (DSD) describes a third possibility: while the overall phenotype of the organism remains identical, the underlying genetic networks underpinning this phenotype have changed. A key outcome of this is that some orthologous genes play different in vivo roles in phenotypically identical, related species.

Chapter 2
Evolution of ortholog gene function in Caenorhabditis spp.
The work in this chapter was published in part in: Verster AJ, Ramani AK, McKay SJ, Fraser AG. (2014). Comparative RNAi Screens in C. elegans and C. briggsae Reveal the Impact of Developmental System Drift on Gene Function. PLoS Genetics, 10(2):e1004077.

I performed all work associated with this chapter, with the following exceptions: Arun K. Ramani provided a second set of eyes in the manual RNAi screening, Sheldon J. McKay wrote the C. briggsae RNAi library primer design pipeline, and Andrew G. Fraser and Arun K. Ramani contributed to writing the text.
2.1 Abstract
Although two related species may have extremely similar phenotypes, the genetic networks underpinning this conserved biology may have diverged substantially since they last shared a common ancestor. This is termed Developmental System Drift (DSD) and reflects the plasticity of genetic networks. One consequence of DSD is that some orthologous genes will have evolved different in vivo functions in two such phenotypically similar, related species and will therefore have different loss of function phenotypes. Here I report an RNAi screen in C. elegans and C. briggsae to identify such cases. I screened 1333 genes in both species and identified 91 orthologs that have different RNAi phenotypes. Intriguingly, I found that recently evolved genes of unknown function have the fastest evolving in vivo functions and, in several cases, I identify the molecular events driving these changes. I thus find that DSD has a major impact on the evolution of gene function and I anticipate that the C. briggsae RNAi library reported here will drive future comparative functional genomics screens in these nematodes.
2.2 Introduction
As genomes evolve, new genes are born and older genes may adopt novel functions, fuse, or disappear altogether. What are the phenotypic consequences of this continual molecular change? One striking consequence of the evolution of genomes is adaptation: novel genetic variants can underpin the evolution of novel organism-level phenotypes such as new anatomical structures or behaviors and, if these result in improved fitness, these can become fixed in the population through selection. At the molecular level, such novel organism-level phenotypes can arise through the evolution of entirely novel biochemical activities such as novel genes, new protein domains, or new classes of functional RNAs: for instance, metazoan genomes encode classes of proteins that are absent from single-celled eukaryotes and that participate in metazoan-specific processes (e.g. netrins in axon guidance, immunoglobulins and MHC complex subunits in the immune system). New organism-level phenotypes can also result from the rewiring of already existing activities such as the shuffling of existing domains into novel combinations (e.g. the rapidly evolving architectures of chromatin regulators; Lander et al., 2001) or through changes in the regulation of expression of otherwise conserved genes - for example, evolution of lin-48 expression affects salt tolerance in C. elegans (Wang and Chamberlin, 2004), evolution of the yellow gene alters wing spots in different Drosophila species (Gompel et al., 2005), and evolution at the Pitx1 locus causes adaptive loss of pelvic spines in sticklebacks (Chan et al., 2010). Adaptation is dependent on changes in the molecular phenotype of the organism - the functional activities encoded by the genome and the way they are regulated - which result in selectable changes in the phenotype of the organism.

At the other end of the spectrum from adaptation is neutral drift. Many genomic changes have no impact on the phenotype of the organism since they do not have any impact on the molecular phenotype, that is, on the functions encoded in the genome and their precise regulation. Such changes are therefore under no selection - while they may disappear or become fixed in a species, neither outcome is a consequence of their effect on phenotype.

All changes in organism-level phenotype (such as those that result in adaptation) are thus underpinned by changes in molecular phenotype and, conversely, genomic changes that do not affect molecular phenotype cannot alter organism-level phenotype and are therefore neutral. However, there is a third outcome, a phenomenon known as Developmental System Drift (DSD) (True and Haag, 2001).
In DSD two related species share an identical organism-level function that was also present in their last shared ancestor; however, since the species diverged, the genetic networks that underpin this function have drifted. Unlike in classical drift, molecular change in DSD is under strong stabilizing selection to preserve the phenotype of the organism. In DSD then, the molecular
phenotype has changed, while the organism-level phenotype has remained unaltered; this is a reflection of the plasticity of genetic networks.

One effect of the changes in molecular phenotype that accompany DSD is that some orthologs evolve different roles in related organisms - these will therefore have different loss of function phenotypes. If I knew the entire set of orthologous genes that have different loss of function phenotypes in two related species that have very similar phenotypes, this would provide a global view of how gene function can drift while maintaining the same organism-level phenotype - this is my goal here. Specifically, by examining how DSD affects gene function in a systematic manner, I would like to examine whether the in vivo function of certain classes of genes evolves faster than others and begin to explore the molecular changes which underpin the types of changes in gene function that nonetheless preserve the same overall organism-level phenotype.

C. elegans and C. briggsae are both free-living hermaphroditic nematodes that share the same ecological niche (Félix and Duveau, 2012). Their anatomical structures are strikingly similar and, up to the 350-cell stage of embryogenesis, the lineages and timings of cell division are nearly identical (Zhao et al., 2008). However, their genomes have diverged significantly in the ∼20 million years since they last shared a common ancestor (Cutter, 2008): only ∼60% of their genes have 1-1 orthologs, with many species-specific expansions, losses, and chromosomal rearrangements (Stein et al., 2003). There is already good evidence that while C. elegans and C. briggsae have very similar biology, the genetic networks that control this are not the same, since while they can fertilize each other, the resulting interspecific hybrids die as embryos (Baird et al., 1992). More specifically, a small number of genes are also known to play very different roles in otherwise identical processes - for example, while early embryogenesis is identical in both species, knocking down the Wnt-pathway effector pop-1 by RNA-mediated interference (RNAi) causes opposite cell fate transformations in the two nematodes (Zhao et al., 2010; Lin et al., 2009). Thus, while many of the organism-level phenotypes are highly conserved
between these two worms, the genetic networks underpinning these functions may have diverged considerably. A systematic comparison of loss of function phenotypes between orthologous genes in these two related nematodes might thus shed light on how DSD affects gene function.

RNAi-based screens have been used extensively in C. elegans to identify the in vivo (i.e. organism-level) functions of each gene (Fraser et al., 2000; Kamath et al., 2003; Sönnichsen et al., 2005; Gönczy et al., 2000). However, no analogous screens have been carried out in C. briggsae. In this chapter, I describe the construction of a C. briggsae RNAi library of 1333 dsRNA-expressing bacterial strains analogous to the well-characterized C. elegans RNAi library (Fraser et al., 2000; Kamath et al., 2003) - feeding any single bacterial strain to C. briggsae targets a single C. briggsae gene. The genes targeted in the library are the great majority of the C. briggsae 1-1 orthologs of C. elegans genes that have a well-characterized RNAi phenotype (see Methods). Comparing the RNAi phenotypes of the C. briggsae gene with the RNAi phenotypes of its C. elegans ortholog thus allows identification of orthologs that have different loss of function phenotypes in these two worms, indicating that they play different roles in the development and function of these anatomically highly similar animals.

In this section of my thesis I report the construction of a C. briggsae RNAi library and a screen to identify orthologs that have different RNAi phenotypes in C. elegans and C. briggsae. These data indicate that while these two species have very similar morphology and behavior, many orthologous genes have different in vivo functions, suggesting that DSD has a major impact on the evolution of gene function.
2.3 Results
2.3.1 Construction and screening of the C. briggsae RNAi library
RNAi is an extremely powerful tool for examining gene function in C. elegans (Fire et al., 1998). RNAi allows the knock-down of any gene in vivo and thus can be used to rapidly identify the role any gene plays in the development and function of the worm, that is, its organism-level function. In C. elegans, RNAi can be induced by feeding worms with bacteria expressing dsRNA complementary to a gene of interest (so-called RNAi by feeding; Timmons and Fire, 1998; Kamath and Martinez-Campos, 2001) and a library of dsRNA-expressing bacteria has been constructed that allows the researcher to individually target over 80% of all predicted C. elegans genes (Fraser et al., 2000; Kamath et al., 2003). I wished to construct an analogous library for C. briggsae and use it to compare RNAi phenotypes of orthologous genes between species.

Constructing and screening a genome-scale RNAi library for C. briggsae is a huge undertaking. Given that my principal goal was to identify genes that have different RNAi phenotypes in C. elegans and C. briggsae, the great majority of genes would be uninformative, since they will have no readily detectable RNAi phenotype in either worm (∼85% of genes have no readily detectable phenotype in C. elegans (Kamath et al., 2003), and this is likely to be broadly similar in C. briggsae). I thus decided to construct a library targeting only the set of 1437 C. briggsae genes that are direct 1-1 orthologs of genes among the 1640 previously shown to have a robust, readily detectable RNAi phenotype in C. elegans (Kamath et al., 2003) (see Methods, Figure 2.1A). Although this excludes a small number of genes that have no apparent phenotype in C. elegans but that have a phenotype in C. briggsae, this set will nonetheless cover the great majority of genes that have phenotypes in C. briggsae. I made the library according to the same design principles as the C. elegans RNAi library (Fraser et al., 2000; Kamath et al., 2003), and as
far as possible targeted a region of the C. briggsae gene orthologous to that targeted by the C. elegans RNAi fragment (Figure 2.1B). In total, I was able to construct targeting strains for 93% (1333) of the 1437 targeted genes in C. briggsae (Methods, Figure 2.1A).

The central goal of this project is to compare the loss of function phenotypes of orthologous genes in C. elegans and C. briggsae - accurate identification of orthologs is thus critical. I initially used InParanoid 6.1 (Berglund et al., 2008) to identify putative 1-1 orthologs - these candidates are similar to candidates that would be identified using reciprocal BLAST, and this is a reasonable place to start. To increase my confidence that the identified putative orthologs are indeed likely to be true orthologs, I carried out three sets of additional tests. First, I determined whether there are additional closely related genes in either genome, in which case orthology can be harder to assign, or whether the putative orthologs appear to be the sole related gene in either genome, in which case orthology is fairly unambiguous. For example, K04G7.1 and CBG16609 are reciprocal best hits and have a BLAST E-value of 0 in either direction; in C. briggsae, the next closest BLAST hit is CBG20138, with an E-value of 8×10⁻⁴, and in the other direction, the next closest C. elegans hit is C01H6.2 with an E-value of 4.3. When the difference in E-value between the best and next closest hit was greater than 20 on a log10 scale I called the pair unambiguous; 72% of my ortholog pairs fall into this class. Second, I checked whether the ortholog pairs identified via InParanoid, a graph-based method, were also identified using a tree-based method, which is a very different and complementary approach (Kuzniar et al., 2008). In this case I used TreeFam (Li et al., 2006) and I found that 90% of my putative orthologs are identified as orthologs in TreeFam. Finally, I used synteny to resolve harder assignments. Alignments of the C. elegans and C.
briggsae genomes indicate that considerable proportions of these genomes are syntenic (Stein et al., 2003), that is, many segments can be identified in which gene order has been preserved in both species since the last common ancestor. Synteny can be used to aid in identification of likely true orthologs in complex cases (e.g. large families of closely related genes, or cases where
orthologs have diverged greatly). I was able to find evidence of upstream or downstream synteny in 87% of cases. Together these results suggest that my ortholog identification is correct in the great majority of cases - 72% are unambiguous, and a further 27% of the putative orthologs can be confirmed either through TreeFam or synteny - thus 99% of my orthologs can be confirmed by other complementary approaches. These data are all summarized in Table 2.1.

To screen the C. briggsae library, I followed an identical screening protocol to that used in the first genome-scale screens in C. elegans (Fraser et al., 2000; Kamath et al., 2003) and assessed the same developmental and morphological phenotypes (see Methods for a complete list). However, while wild-type C. briggsae is capable of RNAi when the dsRNA is delivered by injection, RNAi by feeding is ineffective, at least in part because of the inability of the C. briggsae SID-2 to actively take up dsRNA (Winston et al., 2007). This defect can be rescued by transgenic expression of C. elegans sid-2 (Nuez and Félix, 2012), however, and thus all my screening was carried out not in wild-type C. briggsae but in a transgenic line expressing C. elegans sid-2. I note that this could produce some false positive results due to genetic interactions in the background I am using, such as synthetic lethality with the expression of SID-2, but this is likely to be only a minority of cases. To identify genes with different phenotypes in C. elegans and C. briggsae, I not only compared the phenotypes in C. briggsae to previously published data for C. elegans (Kamath et al., 2003) but I also screened C. elegans side by side with C. briggsae as shown schematically in Figure 2.2A. The RNAi phenotypes of each pair of orthologs were compared in the two species at two time points by two independent observers; three C. elegans replicates and six C. briggsae replicates were examined in any single experiment.
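The ortholog confidence criteria summarized earlier in this section (an E-value gap of more than 20 on a log10 scale, with TreeFam or synteny as supporting evidence for the remaining pairs) can be sketched as follows. This is a minimal illustration and not the pipeline used in the thesis: the function names, the class labels, the boolean inputs and the floor used for E-values reported as 0 are all assumptions, and the check is shown for one search direction only.

```python
import math

E_VALUE_FLOOR = 1e-180  # assumption: stand-in for BLAST E-values reported as 0

def log10_evalue(e_value):
    """log10 of an E-value, clamping reported zeros to a small floor."""
    return math.log10(max(e_value, E_VALUE_FLOOR))

def is_unambiguous(best_e, next_best_e, min_gap=20.0):
    """True if the next closest hit is more than `min_gap` log10 units worse."""
    return log10_evalue(next_best_e) - log10_evalue(best_e) > min_gap

def confidence_class(best_e, next_best_e, in_treefam, has_synteny):
    """Assign an ortholog pair to one of three hypothetical confidence labels."""
    if is_unambiguous(best_e, next_best_e):
        return "unambiguous"
    if in_treefam or has_synteny:
        return "supported"
    return "unconfirmed"

# K04G7.1 / CBG16609 example from the text: best hit E = 0, next closest E = 8e-4.
# The TreeFam and synteny flags here are placeholders, not reported values.
print(confidence_class(0.0, 8e-4, in_treefam=True, has_synteny=True))
```

Clamping zero E-values is needed because the logarithm of 0 is undefined; any floor far below the next closest hit gives the same classification for this example.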
Any differences were retested in an independent experiment, and genes where I detected a different phenotype in at least 3 out of 4 observations between the 2 observers and 2 experiments were considered as potential hits. Based on these criteria, I examined the loss of function phenotypes of 1333 orthologous genes by RNAi in C. elegans and C. briggsae and identified 679 orthologs that have different phenotypes in the two species (Figure 2.3A). There are two major sources of false positives in this initial screen, which I try to deal with using secondary filters and rescreening.
The first source of false positives in my primary screen is that RNAi is more efficient in the transgenic SID-2-expressing line of C. briggsae than in C. elegans - I generally get a stronger RNAi knockdown of C. briggsae genes as measured by qPCR (see Figure 2.4). Many genes thus have stronger RNAi phenotypes in C. briggsae (e.g. ytk-6 has a growth defect in C. elegans but is completely sterile in C. briggsae) but this does not reflect any true difference in in vivo function. To partly test this idea, I took a 111-gene subset of the 508 genes that have a stronger phenotype in C. briggsae and tested these in the lin-35(n745) C. elegans strain, which has increased RNAi efficiency compared with wild-type C. elegans. I found that a substantial proportion of these genes (36%; 40/111) also have stronger phenotypes in lin-35(n745) worms than in wild-type C. elegans, which provides some support for the view that the stronger phenotypes seen for many genes in C. briggsae may be due to a greater level of knockdown in C. briggsae than in C. elegans. Crucially, however, this increased RNAi efficiency in C. briggsae means that in the cases where the RNAi phenotype is weaker in C. briggsae, this is unlikely to be due to a weaker knockdown in C. briggsae (given that a majority of a tested set of genes showed a stronger knockdown in C. 
briggsae as shown by qPCR in Figure 2.4), and instead likely reflects a genuine difference in in vivo function. I thus focus the rest of this chapter on genes whose phenotypes are weaker in C. briggsae than in C. elegans and exclude all genes that had stronger RNAi phenotypes in C. briggsae from any downstream analysis.
The second source of false positives is that some of the C. briggsae RNAi library clones do not produce adequate knockdowns in C. briggsae - these genes will thus appear to have weaker phenotypes in C. briggsae than in C. elegans. To address this, I made independent RNAi clones targeting a different region of the gene to that used in the
primary screen (where possible) and screened these. I re-examined the RNAi phenotypes of all 204 genes that had weaker phenotypes in C. briggsae in this way and found that 91 genes still showed reproducibly weaker phenotypes in C. briggsae with the independent clones (final breakdown of hits is shown in Figure 2.3B, genes are shown in Table 2.2). I note that while rescreening with independent targeting clones is fairly rigorous, it is still possible that both independent clones failed to generate good knockdown in C. briggsae. To assess how often this may happen, I used qPCR to examine levels of knockdown in C. elegans and C. briggsae for genes that have weaker phenotypes in C. briggsae - of the 8 genes examined, 7 showed similar or stronger knockdown in the SID-2-expressing transgenic C. briggsae than in C. elegans (and thus are true positives) and only a single example had weaker knockdown in C. briggsae. This last example, tsr-1, is a false positive in my dataset. I thus estimate that around 80-90% of my hits are true positives, but acknowledge that a few rare examples are false positives due to poor knockdown in C. briggsae.
As a final confirmation of the differences in RNAi phenotype seen using the manual phenotyping described above, I retested 50 of the hits from my manual screen and a random subset of 324 additional genes using a fully automated phenotyping method (shown schematically in Figure 2.2B). This is highly complementary to manual screening. The manual screening described above has many advantages - multiple time-points are examined, many phenotypes are scored at once and, for the purposes of this screen, it allowed me to assess RNAi phenotypes in C. briggsae using the exact same methodology used for the initial screens in C. elegans. One disadvantage, however, is that it is not fully quantitative and this affects sensitivity in two ways. 
Firstly, there is a limit to what the eye can detect at high throughput: while differentiating between a sterile worm and one with a normal brood size is trivial, it is hard to tell the difference between a worm that has 50% of normal brood size and one that has 35% of normal brood size. Secondly, different worm strains and especially different worm species do not grow identically. The C. briggsae sid-2-expressing transgenic line that I use for all my experiments grows slightly more slowly than N2, and this inherent difference in growth rate can make identification of subtle differences in phenotype more difficult.
For these reasons, I also carried out a fully automated quantitative screen using a commercially available worm sorter (Union Biometrica) which addresses both the issues of sensitivity and normalization for different growth of the two species. In outline, RNAi experiments are set up in liquid culture in 96-well format. At the start of the experiment, each well contains a saturated culture of dsRNA-expressing bacteria and 10 L1 worms; phenotypes are examined after 96 hours by which time, in a normally growing culture, the initial L1 animals have grown to fertile adults, laid the next generation, and these will have hatched. Using the worm sorter, I quantify the number of worms in each well, as well as the sizes and optical densities of each worm in each well. These data allow us to precisely measure brood size as well as identify differences in growth rate, body size, and embryonic lethality (see Methods for more details on the analysis). Crucially, by comparing the phenotypes seen after targeting a specific gene with phenotypes of worms growing in bacteria expressing a control non-targeting dsRNA, all phenotypes are normalized for any inherent differences in worm growth between the two species. 
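The normalization against control wells described above can be illustrated with a minimal sketch. This is not the analysis pipeline used in this thesis (see Methods for that); the function name, the data layout, and all worm counts below are hypothetical and chosen only to show how expressing each species' brood size relative to its own non-targeting control removes inherent growth differences between the species.

```python
from statistics import mean


def normalized_brood(target_counts, control_counts):
    """Mean worm count in target-RNAi wells relative to control-RNAi wells.

    Both arguments are lists of per-well worm counts from the same species.
    (Hypothetical helper; the thesis Methods describe the real analysis.)
    """
    if not target_counts or not control_counts:
        raise ValueError("need at least one well per condition")
    control_mean = mean(control_counts)
    if control_mean == 0:
        raise ValueError("control wells contain no worms")
    return mean(target_counts) / control_mean


# Illustrative, made-up worm counts per well for one ortholog pair.
wells = {
    "C. elegans": {"target": [40, 44, 36], "control": [200, 210, 190]},
    "C. briggsae": {"target": [120, 130, 110], "control": [150, 160, 140]},
}

# Each species is compared only with its own control wells, so a slower
# growing strain is not mistaken for one with an RNAi phenotype.
relative = {
    species: normalized_brood(w["target"], w["control"])
    for species, w in wells.items()
}
```

In this made-up example the targeted gene reduces brood size to 20% of control in C. elegans but only to 80% of control in C. briggsae, which is the kind of weaker-in-C. briggsae difference the screen is designed to detect.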
Using this pipeline, I confirmed statistically significant differences in phenotype for 26 of the 50 tested manual phenotyping hits; 21 showed brood size differences and a further 5 showed differences in growth rate or embryonic lethality (see Methods for data processing details). I failed to see differences in phenotype for 24 - the majority of these show subtle phenotypic differences (e.g. cuticle defects, or movement defects) that are not readily detectable in the sorter and I believe this explains the difference in the two assays. Finally, I note that I see an additional 57 genes having significantly different effects on brood size in these two species using the automated pipeline, suggesting that the true number of genes with different phenotypes in these two species is significantly greater than was detected by manual phenotyping, which has few false positives but a
substantial false negative rate.
In summary, I constructed an RNAi library targeting 1333 C. briggsae genes. I used this library to compare the RNAi phenotypes of orthologs in C. elegans and C. briggsae using manual phenotyping and identified 91 genes that have different RNAi phenotypes in these two species that are likely to be due to a genuine difference in their in vivo function. The majority of these differences could be confirmed by a quantitative phenotyping method designed specifically to measure differences in brood size, lethality, and growth rate. This list of genes undoubtedly has some false positives due to inadequate RNAi knockdown in C. briggsae (e.g. the example of tsr-1 above, or pal-1, which has a detectable embryonic lethal phenotype in C. briggsae when using RNAi by soaking (Winston et al., 2002) but has no phenotype in my screen) - however, my qPCR analysis suggests that only ∼15% of my reported hits are such false positives and thus that the great majority of my hits are true positives. The rest of this chapter is concerned with examining this set of genes to explore the molecular changes that underlie this difference.
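The hit-calling rule used in the primary screen (a different phenotype in at least 3 of the 4 observations made by 2 observers across 2 experiments) can be sketched as follows. This is a minimal illustration only; the function name and the boolean encoding of observations are assumptions and are not taken from the screening records.

```python
def is_potential_hit(observations, min_different=3):
    """Apply the 3-out-of-4 rule to one ortholog pair.

    `observations` holds four booleans (2 observers x 2 experiments);
    True means that observer scored the C. elegans and C. briggsae
    phenotypes as different in that experiment.
    (Hypothetical encoding, for illustration only.)
    """
    if len(observations) != 4:
        raise ValueError("expected 4 observations (2 observers x 2 experiments)")
    return sum(bool(o) for o in observations) >= min_different


# One dissenting observation still counts as a potential hit ...
example_hit = is_potential_hit([True, True, True, False])
# ... but an even split does not.
example_non_hit = is_potential_hit([True, False, True, False])
```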
2.3.2 Genes with different phenotypes are enriched for
transcription factors and recently evolved novel genes
I identified 91 genes that have a different RNAi phenotype in C. elegans and C. briggsae - I refer to these from here on as the ‘Different Function’ genes. To begin to understand why these ‘Different Function’ genes have such differing in vivo roles, I initially assessed whether this set of genes was enriched for any specific molecular functions. I annotated genes into the functional categories previously used by Kamath et al. (2003) and found that transcription factors and genes of unknown function are enriched among the ‘Different Function’ genes, while genes involved in protein synthesis are under-represented (Figure 2.5A, p <0.01, Hypergeometric test). This indicates firstly that the basic machinery of the eukaryotic cell has changed very little in organismal function over time and, secondly, suggests that transcription factors appear to have more rapidly evolving
organismal roles than other classes of gene. These two findings are unsurprising. The individual genes that encode components of the basic eukaryotic cell machineries (e.g. DNA replication, transcription, translation etc.) are essential in organisms as divergent as worms and yeasts (Kamath et al., 2003; Tischler et al., 2006), so finding great similarity in these genes between two related species is expected. Likewise, transcription networks are well-known to be extremely plastic across evolution (Tsong et al., 2006) and thus finding an enrichment of transcription factors in the set of genes with different in vivo functions in C. elegans and C. briggsae is not unexpected. However, the finding that genes with different phenotypes are enriched for genes of unknown function is intriguing since many of these unknown function genes are nematode-specific (see analysis below). This suggested that more recently evolved genes may have the most rapidly changing in vivo roles and I examined this further.
To investigate more closely whether there was any correlation between the evolutionary age of a gene (i.e. when any such gene arose de novo from non-coding sequences) and the likelihood that it had a different in vivo function between C. elegans and C. briggsae, I carried out a phylogenetic analysis for each gene screened and dated the emergence of these genes to their last common ancestor using a method similar to the phylostratum approach (Domazet-Lošo and Tautz, 2010; see Methods). I found that the more recently a gene has arisen, the more likely it is to have a different phenotype between C. elegans and C. briggsae. 
Ancient genes (those that I was able to date to the emergence of the Opisthokont lineage) are the least likely to show a difference in phenotype (<5%, p <0.01 Hypergeometric test, Figure 2.5B) while extremely recently evolved genes (those which date to the emergence of the Caenorhabditis genus) are the most likely (>15%, p <0.01 Hypergeometric test, Figure 2.5B), suggesting that phylogenetically novel genes have a high rate of evolution of their in vivo functional roles. These bulk analyses thus reveal that just as changes in transcriptional networks and the ‘invention’ of entirely novel classes of gene are major forces driving the evolution of novel organismal functions in adaptive evolution (for example, Dai et al., 2008), these classes of gene are those that have the fastest-evolving in vivo functions during DSD.
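The enrichment p-values quoted in this section come from a hypergeometric test. As a minimal sketch of how such a test works, the upper-tail probability can be computed directly from binomial coefficients. The 1333 screened genes and 91 ‘Different Function’ genes are taken from the text; the category size and overlap below (100 and 15) are hypothetical placeholders, not values from Figure 2.5, and the function is an illustration rather than the code used for the thesis analysis.

```python
from math import comb


def hypergeom_upper_tail(k, population, category_size, sample_size):
    """P(X >= k) when `sample_size` genes are drawn without replacement
    from `population` genes, of which `category_size` belong to the
    category of interest (e.g. transcription factors)."""
    total = comb(population, sample_size)
    upper = min(category_size, sample_size)
    favourable = sum(
        comb(category_size, i) * comb(population - category_size, sample_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# 1333 screened genes and 91 'Different Function' genes are from the text;
# the category size (100) and overlap (15) are made-up illustrative numbers.
p_enriched = hypergeom_upper_tail(15, 1333, 100, 91)
```

Under-representation (as for the protein synthesis genes) would use the complementary lower tail, P(X <= k), computed in the same way by summing from 0 to k.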
2.3.3 Changes in gene function during DSD are often the result
of promoter evolution
I found that the 91 genes that have significantly different in vivo functions in C. elegans and C. briggsae are enriched for both transcription factors and for recently evolved genes of unknown function. However, this does not tell us why they have different in vivo functions (and thus different RNAi phenotypes). There are three possible reasons that orthologous genes could have a different RNAi phenotype in C. elegans and C. briggsae: they might encode the same molecular function but be expressed in different tissues; the coding sequences might have diverged such that they have different molecular functions; or, while the orthologs are functionally identical both in terms of expression and encoded functions, changes in some other genes may have altered the level at which these orthologs are required in these two worms. I examined each possibility in turn.
I initially focused on testing whether genes with different RNAi phenotypes in C. elegans and C. briggsae might have different expression patterns in these two species. This could be due to many different levels of gene regulation from transcriptional to post-transcriptional and translational control - for the purposes of these analyses, I focused on transcriptional control of gene expression since this is a major step of regulation of gene expression. In outline, I used PCR stitching (Hobert, 2002) to generate pairs of constructs in which either the promoter of the C. elegans gene drives GFP expression or the syntenic region of the orthologous C. briggsae promoter drives expression of mWormCherry. In this way, I could make C. elegans worms transgenic for both constructs and rapidly identify cells that were either exclusively GFP or mWormCherry positive, indicating that the C. elegans and C. briggsae orthologs might be expressed in different cell types. In any cases where I found differences in C. elegans, I repeated the experiment in C.
briggsae to test whether any differences in tissue expression were due to evolved changes in the promoters or any changes in trans-acting factors.
Since it would have been an impractical amount of work to do this analysis for all 91 ortholog pairs, I focused my effort on examining the expression patterns of the ‘Different Function’ genes of unknown function that are uniquely found in nematode genomes since this class of gene was enriched in my dataset. I analyzed expression patterns for 12 such worm-specific ‘Different Function’ genes; in addition, to sample other gene classes, I examined expression patterns of 10 random ‘Different Function’ genes in my dataset. I identified 3 worm-specific orthologs, C03D6.1, K04G7.1, and C27F2.7, that had clearly visible differences in expression pattern between C. elegans and C. briggsae (Figure 2.6A-C); in addition, one gene in my random set, sac-1, also had a different expression pattern in the two species (Figure 2.6D). In all four cases this was due to differences in the promoter and not to differences in trans-acting factors since the expression patterns seen in C. elegans could be faithfully recapitulated in C. briggsae (all data shown in Figure 2.6 are expression patterns in transgenic C. briggsae). Crucially, in all four cases, the difference in expression pattern is likely to explain the difference in phenotype since the tissue expression in C. briggsae, where the phenotype is weaker, is a restricted subset of the tissue expression in C. elegans. For example, C03D6.1 has a strong growth defective RNAi phenotype in C. elegans and is expressed in the gut, the hypodermis, and a small number of tail cells; in C. briggsae, where its expression is restricted to only a handful of cells in the tail, it has no obvious phenotype at all. These data strongly suggest that the reason for the differences in RNAi phenotypes between C. elegans and C. 
briggsae for the four genes examined here is that they are expressed in a very different set of tissues in these two animals, leading to a differential requirement for these genes for organismal viability.
To test this prediction directly, I took a cross-species rescue strategy. In outline, I examined the ability of a set of transgenes (shown schematically in Figure 2.7A) to rescue the phenotype of a null C. elegans mutant and designed these to be able to test which parts of the C. elegans and C. briggsae genes are functionally interchangeable - the promoter, the coding region, neither, or both. Of the four orthologs that I could have tested, there was only a suitable null mutant for one of these, sac-1, and I focused my attention on this gene. I found that transgenic expression of the C. elegans sac-1 ORF under control of the C. elegans sac-1 promoter gives robust rescue of the growth arrest phenotype of C. elegans homozygous for the null allele sac-1(ok1602), but that the C. briggsae sac-1 ORF under control of the syntenic region of the C. briggsae sac-1 promoter does not, indicating that these genes have indeed functionally diverged. When I used hybrid rescue constructs, I found that while the coding sequences are apparently functionally interchangeable, the promoters are not: only the C. elegans promoter drives expression in the correct tissues to rescue the sac-1(ok1602) phenotype (Figure 2.7B). These data show that at least in the case of sac-1 the difference in RNAi phenotype in C. elegans and C. briggsae is entirely due to promoter evolution.
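The logic of the hybrid rescue strategy can be written out as a small decision table. The sketch below is my own paraphrase of how the four promoter/ORF combinations are interpreted; the function, the "Cel"/"Cbr" labels, and the returned strings are hypothetical, and the example dictionary simply encodes the sac-1 outcome reported above rather than any additional data.

```python
def interpret_rescue(rescue):
    """Interpret a cross-species hybrid rescue experiment.

    `rescue` maps (promoter_species, orf_species) to True if that
    transgene rescues the C. elegans null mutant. Species labels
    "Cel" and "Cbr" are illustrative shorthand.
    """
    if not rescue[("Cel", "Cel")]:
        return "inconclusive: positive control does not rescue"
    if rescue[("Cbr", "Cbr")]:
        return "interchangeable: difference lies in other genes"
    cbr_orf_works = rescue[("Cel", "Cbr")]       # Cel promoter, Cbr ORF
    cbr_promoter_works = rescue[("Cbr", "Cel")]  # Cbr promoter, Cel ORF
    if cbr_orf_works and not cbr_promoter_works:
        return "promoter divergence"
    if cbr_promoter_works and not cbr_orf_works:
        return "coding sequence divergence"
    if not cbr_promoter_works and not cbr_orf_works:
        return "both promoter and coding sequence diverged"
    return "inconclusive: both hybrids rescue but the C. briggsae gene does not"


# Encoding of the sac-1 result described in the text.
sac_1 = {
    ("Cel", "Cel"): True,   # C. elegans promoter + C. elegans ORF rescues
    ("Cbr", "Cbr"): False,  # C. briggsae promoter + C. briggsae ORF does not
    ("Cel", "Cbr"): True,   # coding sequences are interchangeable
    ("Cbr", "Cel"): False,  # promoters are not
}
sac_1_call = interpret_rescue(sac_1)
```

A gene for which all four constructs rescue would fall into the "interchangeable" class, pointing to changes elsewhere in the genome rather than in the gene itself.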
2.3.4 Ortholog pairs encoding more divergent protein sequences
are more likely to have different RNAi phenotypes
I examined the expression patterns of 22 pairs of orthologs that have different RNAi phenotypes in C. elegans and C. briggsae and found that 4 of these have obviously different expression patterns, suggesting that promoter evolution underlies the differences in in vivo function that I observe for these genes. However, as shown in Figure 2.7A, differences in in vivo function might also be due to evolution of coding sequences - if the C. elegans and C. briggsae orthologs encode different enzymatic activities, for example, this could result in different in vivo functions. Using a similar hybrid transgene rescue strategy to that for sac-1 above, I tested whether coding sequences of bli-4, bli-5, vha-5, flr-1, sma-3, and sem-5 were functionally interchangeable, or whether there was evidence that they had evolved functional differences. I selected these 6 genes since for each gene
there was a null allele available in C. elegans that had a readily detectable phenotype; for most genes, there was no null allele available at the time, and thus I could not carry out similar tests for most of my dataset.
I found no clear examples where the difference in RNAi phenotype of orthologs in C. elegans and C. briggsae could be conclusively shown to be due to evolution of coding sequences. However, I only tested a very small number of cases and, in many of these cases, I failed to get strong enough rescue of the null phenotype by transgenic expression of the C. elegans coding sequence under the C. elegans promoter to allow me to distinguish between the ability of different hybrid transgenes to give different levels of rescue. These are therefore inconclusive experiments and, as more null alleles are being generated, it will be interesting to revisit this. I note however that bulk analyses of the protein sequences encoded by C. elegans and C. briggsae orthologs indicate that divergence of protein sequence between orthologs does appear to correlate with the likelihood that orthologs have different RNAi phenotypes. I compared the proteins encoded by orthologous ‘Different Function’ genes in C. elegans and C. briggsae and found that the ‘Different Function’ genes have drifted slightly more in sequence than the ‘Same Function’ genes, as would be expected if changes in protein function have in part driven the evolution of different organismal functions for these genes. I found that the alignable regions are more divergent (as measured by the Ka or Ka/Ks metrics; see Figure 2.8A,B, p <0.01 Mann Whitney U test) and that both the number and the total length of non-alignable regions are slightly increased (Figure 2.8C,D, p <0.01 Mann Whitney U test). This is consistent with a model in which drift in the proteins encoded by orthologous genes might contribute to DSD, but this effect is modest at this level of bulk analysis. 
It is nonetheless predictive: the orthologs that differ most in sequence are substantially more likely to have different RNAi phenotypes than more similar orthologs and this is shown in Figure 2.8E. I thus find that the greater the divergence in protein sequence between orthologs,
the greater the likelihood that they will have different in vivo functions, as identified by different RNAi phenotypes. However, I have no conclusive evidence to show that this is causative rather than correlative: it could simply be that genes with differing in vivo roles have more rapidly diverging coding sequences and this is still an open question from my data.
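The comparisons of sequence divergence between ‘Different Function’ and ‘Same Function’ genes above use the Mann Whitney U test. The following is a minimal standard-library sketch of that test (two-sided, normal approximation with tie correction and no continuity correction); it is an illustration of the technique, not the code used for Figure 2.8, and the Ka values are invented for the example. The normal approximation is only rough for very small samples such as these.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value) using the normal
    approximation with tie correction and no continuity correction."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        t = j - i                  # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sd_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / sd_u
    return u_x, erfc(abs(z) / sqrt(2))


# Invented Ka values, for illustration only.
different_function_ka = [0.21, 0.18, 0.25, 0.30, 0.16]
same_function_ka = [0.10, 0.12, 0.15, 0.09, 0.14]
u_stat, p_value = mann_whitney_u(different_function_ka, same_function_ka)
```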
2.3.5 Orthologs may have different organismal roles due to changes
in other genes
I tested whether changes in RNAi phenotype might be due either to changes in gene expression or to changes in the molecular functions of the encoded protein. I identified four genes with a different RNAi phenotype between C. elegans and C. briggsae which is likely to be due to changes in promoter sequence and for one of these, sac-1, I showed that to be the case. In addition, given the increased protein divergence between orthologs that have different RNAi phenotypes in the two worms, it appears that many of the molecular events that lead to changes in the level of requirement for a specific gene are likely to be linked to changes in the gene itself, either in its promoter or in its coding region. As shown in Figure 2.7A, there is a final possibility: that orthologs in the two species might encode identical proteins and be expressed in an identical manner, yet still have very different RNAi phenotypes due to changes in other genes that alter the level at which the orthologs are required. In such cases, both the coding regions and the regulatory sequences are functionally interchangeable between the orthologs, but the RNAi phenotypes in the two species still differ. Similar cross-species transgenic approaches have been used to great effect between C. elegans and C. briggsae. For example, a similar cross-species rescue experiment has been used to show that the different RNAi phenotype of gld-1 lies in the overall genetic context of C. elegans and C. briggsae and not in the molecular function of gld-1 (Beadell et al., 2011; Liu et al., 2012) and careful analysis of unc-47 has revealed extensive compensatory evolution in the regulation of gene expression in these two species
(Barrière et al., 2011, 2012).
I found two examples of orthologs that have differing in vivo functions in C. elegans and C. briggsae due to changes in other genes. bli-4 and bli-5 act together to regulate molting and have very different phenotypes in the two species studied - for example, bli-5 has a strong blistering phenotype in C. elegans but not in C. briggsae (Figure 2.9A,B). bli-4 encodes a subtilisin-like serine protease (Peters et al., 1991) whereas bli-5 encodes a Kunitz family serine protease inhibitor thought to act with BLI-4 (Page et al., 2006). Given that these genes are hypothesized to act together to affect cuticle development, I wondered whether the difference in requirement for these two genes in C. elegans and C. briggsae might not be due to independent functional changes in bli-4 and bli-5, but to changes in the requirement for this entire pathway between the two worms due to changes in other genes. Using transgenic rescue experiments I found that both the coding sequences and the promoters of C. elegans and C. briggsae are functionally interchangeable for both bli-4 and bli-5: expression of the C. briggsae bli-5 coding region under control of the C. briggsae bli-5 promoter gives as robust rescue of the C. elegans bli-5(e518) null mutant as expression of the C. elegans bli-5 coding region under control of the C. elegans bli-5 promoter (Figure 2.9C); the same is true for rescue of the C. elegans bli-4(e937) mutant by C. briggsae bli-4 (Figure 2.9D). Thus, at least in these two cases, I have found examples where the difference in the RNAi phenotype for orthologs in C. elegans and C. briggsae is not due to any difference in the genes themselves, but rather in the level of requirement for the pathway in which the genes act.
2.3.6 Conservation of function can be maintained at the level
of gene family and not gene family members
In the case of bli-4 and bli-5 above, these genes have differing RNAi phenotypes in the two species studied because of changes elsewhere in the genetic networks of these worms but one cannot trivially pinpoint these other changes. However, for a subset of genes
with differing phenotypes one can make an educated guess - the set of genes that are members of multigene families. In these cases, it is possible that both worms have an essential requirement for a specific gene activity but that this is carried out by different members of the same gene family in the two worms. Although I have not followed this in depth, I have data that are consistent with this. I first examined all 91 C. briggsae genes that had a weaker phenotype and searched for related genes in the C. briggsae genome (see Methods for details) that might instead be carrying out the required molecular function. If this is indeed the case, these related genes would be expected to have a stronger phenotype in C. briggsae. There are 49 genes with a weaker phenotype in C. briggsae for which I was able to find one or more related genes in the C. briggsae genome that might have a similar molecular function. When I compared RNAi phenotypes in C. elegans and C. briggsae for these related genes, I found 5 examples where the C. briggsae gene has a stronger phenotype than its C. elegans counterpart (Figure 2.10). The numbers of genes I examined here (5 out of 49 examples) are too low to support a statistical analysis of these findings; rather, they are exploratory. For example, rsp-3 is an SR protein which is 100% embryonic lethal in C. elegans, but not in C. briggsae, while rsp-6, a different SR protein, is 100% embryonic lethal in C. briggsae but not in C. elegans. The family of rsp genes is known to have multiple functional overlaps in C. elegans (Longman et al., 2001; Kawano et al., 2000) and I suggest that not only is this true in C. briggsae but, crucially, that the relative importance of each family member differs in the two species. This is consistent with a model in which both C. elegans and C. briggsae require a specific molecular function, but that this function can be carried out by different members of the same family of genes in the two species. 
In summary, I have generated an RNAi library targeting 1333 C. briggsae genes; each targeted gene is the direct ortholog of a C. elegans gene known to have a clear detectable RNAi phenotype. I screened for genes that have major differences in in vivo function between the two nematodes but clearly many more refined RNAi screens are
possible using this reagent and I anticipate that the availability of my library will help drive progress in this area of comparative evolutionary development. I identified 91 genes with obviously different in vivo functions and examining these genes reveals key features of the molecular events driving the changes in gene function that accompany DSD. In more focused studies, I showed that multiple genes with different in vivo functions have evolved different expression patterns and, in the case of sac-1, I showed that promoter evolution is indeed the cause of the change in in vivo function. This is only one example and I anticipate that my dataset, along with the RNAi library itself, will provide a rich source of other future detailed studies to pinpoint the molecular causes of the changes in in vivo function that I observe.
2.4 Discussion
C. elegans and C. briggsae are phenotypically extremely similar. They live in the same ecological niche (Félix and Duveau, 2012), they have near-identical development (Zhao et al., 2008), and are sufficiently morphologically close that they can be crossed and can fertilize each other (Baird et al., 1992). The resulting interspecies hybrids are not viable, however, indicating that while the biology of these two nematodes is nearly identical, the molecular pathways that underpin this conserved biology have diverged substantially, a phenomenon termed Developmental System Drift (DSD) (True and Haag, 2001). One of the consequences of DSD is that some orthologous genes play different in vivo roles in the two species and thus their loss of function phenotypes will be different.
My goal in this study was to investigate the consequences of DSD on gene function in C. elegans and C. briggsae. Rather than examine one specific process in great detail, as has been done successfully before in these two species (Félix, 2007; Pénigault and Félix, 2011; Hoyos et al., 2011), I chose instead to carry out a broad screen to identify as many cases as possible of genes that have different in vivo roles due to DSD and hence to gain insight
into the following questions. How many genes have changed their in vivo roles as these species diverged? Is this common or extremely rare? Do specific classes of genes change more frequently? Finally, can I identify any common features in the molecular events that underlie the changes in gene function that I identify? Addressing these questions gives insight both into how great an impact DSD has on the evolution of gene function and into how gene functions evolve during DSD. I used RNAi to target over 1300 genes in both C. elegans and C. briggsae. Each of these genes has a readily detectable RNAi phenotype in C. elegans and thus I could identify genes whose RNAi phenotypes (and hence whose in vivo functions) differ between these two species as the result of DSD. Using a manual phenotyping method designed to screen for a broad range of phenotypes, I identified 91 orthologs that have obviously different RNAi phenotypes in these two species (the ‘Different Function’ genes). In parallel to this, I also screened 374 genes using an automated quantitative phenotyping method which allows detection of more subtle differences in brood size and growth rate.
This more sensitive assay identified significant differences in phenotype for ∼21% of genes. Taken together, I estimate that over 25% of genes have different in vivo functions in C. elegans and C. briggsae as the result of DSD.
I note that this estimate is likely to be a substantial underestimate of the true rate at which gene functions are diverging during DSD for several reasons. Firstly, while I tried hard to eliminate false positives from my dataset, both through multiple rounds of rescreening and by re-designing additional RNAi clones for each potential hit, I have little means to estimate my false negative rate. This is likely to be significant: the screen was carried out at high throughput, the phenotypes examined were fairly crude and, at least in the case of the manual phenotyping, differences needed to be quite large for me to detect. All these factors will result in false negatives and thus the proportion of the genes that I screened which have truly different phenotypes is almost certainly higher than I report here. Secondly, because of the difference in RNAi efficacy in the two species, I
could only detect biologically meaningful differences in RNAi phenotype if the phenotype was weaker in C. briggsae than in C. elegans. In all likelihood, there are as many genes that have a weaker phenotype in C. elegans than in C. briggsae as vice versa; I simply cannot identify them in my screen. Finally, I screened an extremely selectively chosen gene set, i.e. the set of genes that have a readily detectable RNAi phenotype in C. elegans (<15% of all C. elegans genes (Kamath et al., 2003)) and that also have a 1-1 ortholog in
C. briggsae. While only ∼60% of genes have a 1-1 ortholog in C. elegans and C. briggsae
(Stein et al., 2003), my gene set is extremely highly conserved: ∼90% have 1-1 orthologs between these two species. Furthermore, many of the genes I screened are known to be functionally conserved over extremely long evolutionary distances: for example, 60% of the genes giving lethal or sterile phenotypes in C. elegans are also essential for viability in S. cerevisiae (Tischler et al., 2006). The set of genes I screened is likely to be the most functionally conserved between C. elegans and C. briggsae of any genes in the genome. Taking this all together, the proportion of these genes that I find to have evolved different functional roles in the two species during DSD, over 25%, is surprisingly high and suggests that DSD has a major impact on the evolution of gene function.
What are the underlying molecular causes of the differences in gene function that I observe as differences in RNAi phenotype? I found that three main types of molecular events explain many of the changes in gene function that I identified. Firstly, I found multiple examples in which orthologous genes that have different RNAi phenotypes also have different in vivo expression patterns. I examined the expression patterns of 22 such ‘Different Function’ genes in both species and found that 4 have a clearly different expression pattern in C. elegans and C. briggsae that is entirely due to promoter evolution. In all four cases, the species in which the RNAi phenotype is less penetrant (and thus the species which has a lower requirement for the function of that ortholog) is also the species in which the expression is far more restricted, suggesting that the difference in phenotype might indeed be explained by the difference in expression.
In one case, sac-1, I tested this explicitly and showed that this is indeed true. Changes in gene expression resulting from promoter evolution thus play a significant role in the way genes change in vivo functions during DSD.
Secondly, certain types of gene are more likely than others to evolve different in vivo functions as a result of DSD. While most of the core conserved components of the eukaryotic cell (the ribosome, the proteasome etc.) tend to have the same functions in both species, transcription factors and recently evolved genes of unknown function often have different phenotypes. In the case of transcription factors, this result is perhaps expected: transcriptional networks are known to be extremely plastic and can rewire extensively while still having similar outputs and responses (Baker et al., 2012). For the recently evolved genes, however, this is intriguing. None of them have orthologs outside nematodes and indeed many are specific to Caenorhabditis species, and few have any functional annotation. Why should a gene that is absolutely essential for C. elegans viability be more likely to be dispensable in C. briggsae if it evolved recently than if it is an ancient gene? What essential roles do these novel genes play in nematode biology and why do they seem to be changing so rapidly? Some carefully dissected examples already exist, such as fog-2 and she-1 in the independent evolution of hermaphroditism in these two species. fog-2 is a recently evolved gene which has evolved a specific function in sperm development in C. elegans, while the non-orthologous F-box protein she-1 plays the same role in C. briggsae (Clifford et al., 2000; Nayak et al., 2005; Guo et al., 2009). The roles of such novel recently evolved genes in nematode biology and evolution are intriguing open questions that will require extensive follow-up studies.
Finally, my data suggest that the individual members of multigene families frequently adopt different in vivo roles during DSD. There are often multiple redundancies among members of gene families and I suggest that this results in the requirement for any single family member to be extremely fluid over time. For example, there are well described redundancies in the SR family of splicing regulators in C. elegans (Longman et al., 2001;
Kawano et al., 2000). I found that while rsp-3 is essential for viability of C. elegans, targeting rsp-3 in C. briggsae has little effect; conversely, targeting rsp-6 in C. briggsae has a strong RNAi phenotype, but in C. elegans rsp-6 has no obvious phenotype. In this example, while both worms require an rsp activity, in C. elegans the essential rsp is rsp-3 whereas in C. briggsae it is rsp-6, and I suggest this is a common feature of drift in gene function during DSD.
I note that all three key molecular drivers of gene functional change during DSD - changes in gene expression, the rapid evolution of novel genes, and subfunctionalisation among related family members - are also central molecular drivers of changes in gene function that result in adaptation (Chan et al., 2010; Dai et al., 2008; Hittinger and Carroll, 2007). One explanation for this is that DSD and adaptation are unrelated and unlinked phenomena: for example, some evolved alterations in gene expression have advantageous phenotypic outcomes while others have no impact on phenotype, and neither set of changes has any influence on the other. While this is completely plausible, there is an alternative view: that the molecular events underpinning the changes in gene function in DSD and in adaptation are so similar because DSD and adaptation are intimately linked evolutionary phenomena.
One possible conceptual model for a link between DSD and adaptation comes from detailed studies of in vitro molecular evolution (Fontana and Schuster, 1998; Schuster and Fontana, 1999). In these studies, the evolution of a new phenotype (in this case, a new fold or activity) is rarely the result of a single adaptive mutation alone. Rather, a series of phenotypically neutral mutations (the molecular equivalent of DSD) results in a derived molecule that is phenotypically indistinguishable from the ancestor, but that is different with respect to its evolvability.
While a final adaptive mutation results in a new adaptive phenotype in the derived molecule, making the same mutation has no effect on the phenotype of the ancestral form. The derived and ancestral molecules are thus functionally equivalent, but a single base change has radically different phenotypic
consequences for adaptation in these two molecular species. In this way, at least at the level of adaptation of in vitro molecular phenotypes, neutral drift and adaptation are often intimately linked. I speculate that DSD and adaptation might be linked in an analogous manner at the level of whole organism phenotypes. While the widespread changes in gene function that occur during DSD do not appear to have any direct impact on phenotype, they might have profound consequences on the effect of additional subsequent changes. The effect of DSD, viewed in this way, is that while two species such as C. elegans and C. briggsae are phenotypically extremely similar at present, the possible evolutionary trajectories of the two species are very different since the phenotypic outcomes of identical molecular changes can be very different in the two animals. Changes in gene function that would be deleterious in C. elegans might have no effect in C. briggsae (e.g. mutation or change in gene expression of sac-1) or, at the limit, might confer a selective advantage that would drive adaptation. This idea of a potential link between DSD and adaptation is still speculative but the finding I report here that similar molecular events underlie the evolution of gene function in both processes is consistent with this notion.
In summary, then, I used RNAi to identify genes with different in vivo functions in two extremely phenotypically similar nematode worms, C. elegans and C. briggsae. This study is the first systematic survey of the outcome of DSD on the in vivo functions of orthologous genes in any closely related animal species and my data suggest that DSD has major consequences for the evolution of gene function. I anticipate that the dataset from my RNAi screen will help to drive deeper characterization of the molecular events underlying DSD and, just as the public availability of the C. elegans RNAi library was key for the systematic analysis of gene function in C.
elegans, so the availability of the C. briggsae RNAi library will drive extensive comparative screens in these two related nematodes.
2.5 Methods
2.5.1 Construction of the C. briggsae RNAi Library
I used InParanoid 6.1 (Berglund et al., 2008) to identify C. briggsae genes that are putative 1-1 orthologs of C. elegans genes with a reported RNAi phenotype (Kamath et al., 2003). To further validate these ortholog assignments, I also used orthology assignments from TreeFam, which use phylogenetic relationships, and synteny to resolve complex ortholog assignments. To design the C. briggsae clones, I used BLAST to identify the region of the C. briggsae genome orthologous to that targeted by the C. elegans RNAi clone and used this as a seed region. Predicted clones that had at least 80% identity over 200 bp to additional C. briggsae genes were eliminated as having potential off-target effects and manually redesigned. Secondary clones were designed by hand according to the principles above and were targeted to a separate group of exons from the first clone. For cloning, I digested L4440 with EcoRV (Fermentas) and then dephosphorylated it with Shrimp Alkaline Phosphatase (Fermentas). PCR products were amplified from AF16 genomic DNA using Pfu (Fermentas) and then phosphorylated with polynucleotide kinase (NEB) for blunt-end cloning. The vector and PCR products were ligated together overnight and then transformed into HT115 bacteria. The colonies were screened using a T7 colony PCR, and positives were reassembled into the correct locations in 96-well plates and finally verified using an insert-specific colony PCR.
2.5.2 Manual screening of the C. briggsae RNAi Library
Caenorhabditis species were maintained by feeding on OP50 on NGM plates at 20 °C. Screening was done on 12-well agar plates as previously described (Kamath et al., 2003). I screened for a list of previously reported visible phenotypes (Kamath et al., 2003), listed here: Emb (embryonic lethal), Ste (sterile), Stp (sterile progeny),
Gro (slow post-embryonic growth), Lva (larval arrest), Lvl (larval lethality), Adl (adult lethal), Bli (blistering of cuticle), Bmd (body morphological defect), Clr (clear), Dpy (dumpy), Egl (egg-laying defective), Him (high incidence of males), Lon (long), Mlt (moult defect), Muv (multivulva), Prz (paralysed), Pvl (protruding vulva), Rol (roller), Rup (ruptured), Sck (sick) and Unc (uncoordinated). Each ortholog pair was screened by 2 people in 2 fully independent experimental set-ups on separate weeks, giving 4 observations per pair. My confidence score is the number of those 4 observations in which a phenotype difference was seen. Genes with at least 3 out of 4 observations of a different phenotype in the two species were potential hits and were tested in secondary screens. For these, I designed additional RNAi clones which targeted a different region of the C. briggsae gene where possible and screened these secondary RNAi clones in an identical way to the first screen. Genes were called as final hits if I saw a consistent phenotype difference using both the primary and secondary RNAi clones.
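The hit-calling rule above can be sketched in a few lines of Python. This is only an illustration: the function names and the example gene labels are hypothetical, and each observation is assumed to be recorded as a simple True/False call of whether a phenotype difference was seen.

```python
def confidence_score(observations):
    """Number of observations in which a phenotype difference was recorded."""
    return sum(1 for differs in observations if differs)


def is_potential_hit(observations, min_hits=3, expected=4):
    """Apply the 'at least 3 out of 4 observations' rule described in the text."""
    if len(observations) != expected:
        raise ValueError("expected %d observations per ortholog pair" % expected)
    return confidence_score(observations) >= min_hits


# Hypothetical calls: 2 people x 2 independent set-ups = 4 observations per gene.
calls = {
    "geneA": [True, True, True, False],
    "geneB": [True, False, False, True],
}
potential_hits = sorted(g for g, obs in calls.items() if is_potential_hit(obs))
```

Genes passing this filter would then go on to the secondary-clone rescreen, which is not modelled here.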
2.5.3 Fitness Assay
L1 animals were grown and filtered for purification as described above. RNAi clones were grown overnight at 37 °C in LB media with 1 mM Carbenicillin and induced at a final concentration of 4 µM IPTG for one hour. After induction, bacterial cultures were spun down and resuspended in NGM containing 4 µM IPTG and 1 mM Carbenicillin. 10 µl of a ∼1 worm/µl solution was put into each well of a 96-well plate and then 40 µl of the bacterial suspension was added. Each row of the 96-well plate had 5 replicates of each RNAi clone for each species and 2 blank wells. Each plate also included bacteria expressing non-targeting dsRNA (GFP) as negative controls. After growing at 20 °C with shaking at 200 rpm for 96 hours, I quantified the number of progeny using a COPAS worm sorter; the length (measured as the Time of Flight, TOF) and optical darkness (measured as Extinction, EXT) of each counted animal were also recorded. From these data, I calculated the relative brood number following RNAi as the ratio
between the worm number in the targeted cultures and the worm number in cultures grown with non-targeting GFP RNAi bacterial controls. To assess differences in relative brood size, I calculated the log ratio of the relative brood sizes for C. elegans and C. briggsae for each targeted gene, and used the empirical distribution of 60 independent non-targeting GFP RNAi bacterial controls to determine a cutoff for statistical significance. To identify embryonic lethal phenotypes, I counted objects with TOF less than 50 and EXT less than 30 (which identifies embryos) and calculated the ratio of the number of embryos to non-embryos for each RNAi and control experiment. By comparing the empirical distribution of these ratios in the control experiments to that in the targeting RNAi experiments, I was able to identify genes that resulted in embryonic lethality when knocked down by RNAi.
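The calculations in this section can be sketched as follows. This is a minimal illustration, not the analysis code used for the thesis: the function names are hypothetical, the base-2 logarithm and the two-sided 5% quantile cutoff are assumptions (the text does not state either), and zero counts are not handled.

```python
import math


def relative_brood(targeted_count, control_count):
    """Worm number under targeting RNAi divided by worm number under GFP control RNAi."""
    return targeted_count / control_count


def log_ratio(rel_elegans, rel_briggsae):
    """Log ratio of relative brood sizes (base 2 is an assumption)."""
    return math.log2(rel_elegans / rel_briggsae)


def empirical_cutoffs(control_log_ratios, alpha=0.05):
    """Two-sided cutoffs taken from the empirical quantiles of control log ratios.

    The alpha level and the quantile rule are assumptions for illustration.
    """
    ordered = sorted(control_log_ratios)
    n = len(ordered)
    k = int(math.floor(n * alpha / 2))
    return ordered[k], ordered[n - 1 - k]


def embryo_ratio(objects, tof_max=50, ext_max=30):
    """Ratio of embryos (TOF < 50 and EXT < 30) to all other sorted objects."""
    embryos = sum(1 for tof, ext in objects if tof < tof_max and ext < ext_max)
    others = len(objects) - embryos
    return embryos / others if others else float("inf")
```

A gene would be flagged when its log ratio falls outside the interval returned by `empirical_cutoffs` for the control wells; the same empirical comparison could be applied to the embryo ratios.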
2.5.4 qPCR
For each knockdown, 50 L4 larvae were grown on a lawn of dsRNA-expressing bacteria on NGM plates containing 1 mM IPTG and 1 mM Carbenicillin for 72 hr. RNA was harvested using Trizol (Invitrogen) and cleaned up using an RNeasy kit (QIAGEN). Following a DNase I digestion (Invitrogen), I carried out first-strand cDNA synthesis using SuperScript II (Invitrogen). I calculated the efficiency of the primers from dilution curves and ensured that it was between 1.85 and 2.05. The qPCR was done in a CFX96 (Bio-Rad) using SYBR Green (Clontech) according to the manufacturer's protocols. Relative expression was calculated using the Pfaffl efficiency correction (Pfaffl, 2001), where each sample was normalized to the expression of tbg-1.
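As an illustration of the efficiency-corrected calculation, the sketch below implements the standard Pfaffl ratio, in which the target gene's efficiency raised to its Ct shift (control minus sample) is divided by the same quantity for the reference gene (here tbg-1). The function and argument names are hypothetical, and the Ct values in the usage line are invented for the example.

```python
def efficiency_ok(efficiency, low=1.85, high=2.05):
    """Check that a primer efficiency lies within the accepted range from the text."""
    return low <= efficiency <= high


def pfaffl_ratio(e_target, ct_target_control, ct_target_sample,
                 e_ref, ct_ref_control, ct_ref_sample):
    """Relative expression of the target in the sample versus the control,
    normalized to a reference gene, with per-primer efficiency correction."""
    target_term = e_target ** (ct_target_control - ct_target_sample)
    ref_term = e_ref ** (ct_ref_control - ct_ref_sample)
    return target_term / ref_term


# Invented example: target Ct rises by 2 cycles after RNAi, reference unchanged.
example = pfaffl_ratio(2.0, 20.0, 22.0, 2.0, 18.0, 18.0)
```

With perfect efficiencies of 2.0, a 2-cycle delay in the target with no change in the reference corresponds to one quarter of control expression.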
2.5.5 Examination of C. elegans and C. briggsae gene phylogenetic age
To define the phylogenetic position of genes, I took curated lists of orthologs of each C. elegans gene from WormBase (WS233) (Harris et al., 2010). I downloaded a phylogenetic tree from the NCBI taxonomy database (on the 8th of January 2013) for the species which have genomes available, and took the last common ancestor of the species with an ortholog as the point of emergence of each gene.
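One way to picture this dating step is the sketch below, which walks a taxonomy stored as child-to-parent links and returns the deepest clade shared by C. elegans and any species with an ortholog. The tree is a deliberately simplified toy that only mirrors the class names used in Figure 2.5, and the leaf names other than C. elegans are hypothetical placeholders, not species from the analysis.

```python
def lineage(parent, node):
    """Path from a node up to the root of the tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def gene_age(parent, focal, ortholog_species):
    """Most ancient clade that contains the focal species and a species with an ortholog."""
    focal_path = lineage(parent, focal)
    depth = {clade: i for i, clade in enumerate(focal_path)}
    oldest = 0  # index 0 is the focal species itself
    for species in ortholog_species:
        for clade in lineage(parent, species):
            if clade in depth:  # first shared clade is the last common ancestor
                oldest = max(oldest, depth[clade])
                break
    return focal_path[oldest]


# Toy taxonomy (an assumption for illustration only).
toy_parent = {
    "C. elegans": "Caenorhabditis",
    "other_Caenorhabditis": "Caenorhabditis",
    "Caenorhabditis": "Chromadorea",
    "other_chromadorean": "Chromadorea",
    "Chromadorea": "Nematoda",
    "other_nematode": "Nematoda",
    "Nematoda": "Coelomata",
    "coelomate_outgroup": "Coelomata",
    "Coelomata": "Opisthokonta",
    "opisthokont_outgroup": "Opisthokonta",
}
```

For example, a gene with orthologs only in `other_Caenorhabditis` would be dated to Caenorhabditis, while one that also has an ortholog in `coelomate_outgroup` would be dated to Coelomata.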
2.5.6 GFP Stitching and Microscopy
PCR primers were designed to amplify 2 kb upstream of the translation start site or up until the next gene. C. elegans promoters were combined through PCR stitching with the coding sequence of GFP and the unc-54 3’UTR from the vector pPD95.75, while C. briggsae promoters were stitched onto the coding sequence of mWormCherry and the unc-54 3’UTR from the vector pJH1774. Stitched PCR products were quantified on an agarose gel, diluted to the same concentration and injected into C. elegans (N2) worms with pRF4 as a co-injection marker. F2 animals were isolated and then imaged on a custom Quorum confocal microscope. For each expression pattern I imaged a minimum of 3 lines to ensure that the expression pattern was consistent. Any genes with obvious expression differences were then validated by injection into C. briggsae (AF16) to confirm that the expression pattern was consistent.
2.5.7 Transgenic rescue experiments
I created the rescue constructs shown in Figure 2.7 by first generating constructs that encode a C-terminal GFP fusion for each ORF to be expressed using the pPD95.75 vector. For each of these I cloned the region upstream of either the C. elegans or the C. briggsae ortholog to make a total of 4 constructs, 2 containing DNA specific to one species, and
the other 2 being hybrids between species. The bli-4 and bli-5 constructs were injected at 15 ng/µl with pCFJ90 as a co-injection marker. I isolated F2 progeny which were positive for myo-2::mCherry and then counted the proportion of RFP+ adult animals with blisters. The sac-1 constructs were injected at 1 ng/µl with pRF4 as a co-injection marker into sac-1(ok1602) animals. Rol-positive F2s were isolated and the proportion of homozygous adult rescued animals was scored by the absence of myo-2 GFP signal from the hT2 balancer. A subset of animals was confirmed to be homozygous by single-worm genotyping PCR.
2.5.8 Examination of C. elegans and C. briggsae protein similarity
Orthologs between C. elegans and C. briggsae were defined using InParanoid 6.1 and their CDS sequences were downloaded from WormBase (WS190). I translated these to protein sequences, aligned them using ClustalW 2.0 (Larkin et al., 2007) and then projected these alignments back to the CDS sequences. I then used the Yn00 program from PAML (4.3) (Yang, 2007) to calculate Ka, Ks and the Ka/Ks ratios for C. elegans and C. briggsae orthologs. I measured evolutionarily novel segments between C. elegans and C. briggsae by taking the protein alignments defined above and identifying segments which did not align between the 2 species (minimum of 4 residues). I then counted the total number of such unique segments as well as the total number of residues involved.
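The counting of non-aligned segments can be illustrated with the sketch below, which scans a pairwise protein alignment for runs of columns in which exactly one sequence has a gap and keeps runs of at least 4 residues. The function name and the toy alignment are hypothetical; one assumption to note is that adjacent gaps in opposite sequences are merged into a single run.

```python
def unaligned_segments(aln_a, aln_b, min_len=4, gap="-"):
    """Return (number of non-aligned segments, total residues in those segments)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    segments = []
    run = 0
    for a, b in zip(aln_a, aln_b):
        if (a == gap) != (b == gap):  # residue opposite a gap
            run += 1
        else:
            if run >= min_len:
                segments.append(run)
            run = 0
    if run >= min_len:  # run that reaches the end of the alignment
        segments.append(run)
    return len(segments), sum(segments)


# Toy alignment: one 4-residue and one 5-residue segment, plus two short gaps
# that fall below the minimum length and are ignored.
toy_a = "MKT----LLAG-PQRSTVWY"
toy_b = "MKTAAAAL-AGEPQ-----Y"
n_segments, n_residues = unaligned_segments(toy_a, toy_b)
```

The two returned values correspond to the two quantities described above: the number of unique segments and the total residues involved.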
2.5.9 Predictability of phenotype differences
In this procedure I ranked orthologs by the Ka metric. Then I randomly picked pairs of orthologs, one with a different phenotype and one with the same phenotype, and asked whether the ortholog with the greater Ka was the ortholog with a different phenotype. If so, I classified this as a positive prediction and put it into bins based on
the rank difference of the Ka. This randomization procedure was repeated one million times and the results were plotted.
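A minimal sketch of this randomization is given below. It is an illustration under stated assumptions rather than the code used here: ranks are normalized to the range 0 to 1, ten equal-width bins are used, ties in Ka count as negative predictions, and the function name, default number of draws and seed are all hypothetical.

```python
import random


def ppv_by_rank_difference(ka, is_different, n_draws=10000, n_bins=10, seed=0):
    """Positive predictive value of 'higher Ka' per bin of Ka rank difference.

    ka: dict mapping gene -> Ka value (at least 2 genes).
    is_different: dict mapping gene -> True if the RNAi phenotype differs.
    Returns a list with one PPV per bin, or None for bins with no pairs.
    """
    rng = random.Random(seed)
    genes = sorted(ka, key=ka.get)
    n = len(genes)
    rank = {g: i / (n - 1) for i, g in enumerate(genes)}  # normalized rank
    different = [g for g in genes if is_different[g]]
    same = [g for g in genes if not is_different[g]]
    positives = [0] * n_bins
    totals = [0] * n_bins
    for _ in range(n_draws):
        d = rng.choice(different)
        s = rng.choice(same)
        diff = abs(rank[d] - rank[s])
        b = min(int(diff * n_bins), n_bins - 1)
        totals[b] += 1
        if ka[d] > ka[s]:  # higher Ka picks out the different-phenotype gene
            positives[b] += 1
    return [p / t if t else None for p, t in zip(positives, totals)]
```

Plotting the returned values against the bin index gives a curve of the kind shown in Figure 2.8E; raising `n_draws` to one million would match the number of repetitions described above.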
2.5.10 Identifying functionally related genes
I identified genes which have weaker RNAi phenotypes in C. briggsae and then searched for related C. briggsae genes using BLASTP; I considered any gene with a BLASTP hit with an E-value less than 10^-5 as a possible related gene. These genes display some sequence similarity to both the C. elegans and C. briggsae copy of the gene but should not be considered orthologs since they are far more divergent than the true ortholog and do not cluster together on the gene tree (Figure 2.10). I then constructed RNAi clones for the sets of related genes but excluded families with greater than five related members as being too complex. All RNAi clones were screened in C. briggsae side by side with RNAi experiments in C. elegans using clones targeting the C. elegans orthologs. In this way I compared the RNAi phenotypes of C. elegans and C. briggsae orthologs for small gene families that contain at least one member that had a weaker phenotype in C. briggsae.
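The selection of related genes can be sketched as a simple filter over BLASTP hits. The data structure (a list of subject and E-value pairs), the function names and the gene identifiers in the example are hypothetical; only the E-value threshold and the five-member limit come from the text.

```python
def related_genes(blast_hits, ortholog, max_evalue=1e-5):
    """Subjects with an E-value below the threshold, excluding the ortholog itself."""
    return sorted({subject for subject, evalue in blast_hits
                   if evalue < max_evalue and subject != ortholog})


def keep_family(related, max_members=5):
    """Keep families with at least one and at most five related members."""
    return 0 < len(related) <= max_members


# Hypothetical BLASTP hits for one query: (subject gene, E-value).
example_hits = [("CBG_ortholog", 1e-50), ("CBG_rel1", 1e-8),
                ("CBG_rel2", 3e-6), ("CBG_weak", 0.01)]
example_related = related_genes(example_hits, "CBG_ortholog")
```

Families that pass `keep_family` would then have RNAi clones built for each related member.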
[Figure 2.1, panel A: gene filtering flowchart]
1640 genes with an RNAi phenotype (Kamath et al., 2003)
  - 110 genes have no ortholog at all in C. briggsae
  - 57 genes are duplicated in C. elegans
  - 36 genes are duplicated in C. briggsae
1437 genes have a 1-1 ortholog in C. briggsae
  - 104 genes failed to clone
1333 genes are 1-1 and have an RNAi clone in both C. elegans and C. briggsae
[Figure 2.1, panel B: genome browser view of chrI_random (860k to 861k) showing the gene model CBG24498 (Cbr-sac-1), the BLAST-mapped C. elegans sac-1 RNAi clone, and the C. briggsae RNAi clone for CBG24498]
Figure 2.1: Design of the C. briggsae RNAi library. A. Breakdown of how I arrived at the final set of genes in the C. briggsae RNAi library. 1640 genes with an RNAi phenotype in C. elegans were targeted and then filtered for having a 1-1 ortholog in C. briggsae. B. Molecular design of the C. briggsae RNAi library. C. elegans RNAi clones from the Ahringer library were mapped by BLAST to the C. briggsae genome and used as a seed region for RNAi clone design. Primers were designed around this region to target the maximum number of exonic bases.
[Figure 2.3, panels A and B: charts with the classes Identical, Stronger and Weaker. Values recovered from panel A: 204, 639, 475; from panel B: 91, 697, 508.]
Figure 2.3: The breakdown of the final screening results I observed. A. Breakdown of primary RNAi screen. Genes were placed into three classes: those with an identical phenotype in both species (Identical); those with a stronger phenotype in C. briggsae (JU1018) (Stronger); and those with a weaker phenotype in C. briggsae (JU1018) (Weaker). B. Breakdown of the final result after rescreening with secondary RNAi constructs.
[Figure 2.4: bar chart. Y-axis: expression relative to control (0.0 to 1.4). Bars for C. elegans and C. briggsae, with genes grouped as Identical or Weaker.]
Figure 2.4: Comparison of levels of knock-down achieved in C. elegans and C. briggsae using bacterial-mediated RNAi. 9 genes were individually targeted by RNAi in both C. elegans and C. briggsae, RNA was harvested after 72 hrs of treatment, and qPCR was used to examine levels of knockdown. One of the genes had an identical RNAi phenotype in the two nematodes; the other 8 had weaker RNAi phenotypes in C. briggsae, as indicated in the figure. The data in the graph represent the means of three independent biological replicates; each biological replicate had two independent technical replicates. The error bars shown are the standard error and expression levels are expressed relative to the expression of tbg-1. From left to right, the genes are pqn-85, nekl-2, K04G7.1, csn-5, unc-62, mcm-7, sac-1, apr-1, tsr-1.
[Figure 2.5, panel A: bar chart. Y-axis: fraction of genes with a different phenotype (0.00 to 0.30). Classes: All, Prot., TF, Unk.]
[Figure 2.5, panel B: bar chart. Y-axis: fraction of genes with a different phenotype (0.00 to 0.25). Classes: Opis., Coel., Nem., Chro., Caen.]
Figure 2.5: A. Functional enrichment in genes with different in vivo functions in C. elegans and C. briggsae. All 1333 genes analyzed were manually placed into the functional categories described in Kamath et al. (2003). The graph shows the proportion of genes that have different RNAi phenotypes in several different functional classes: all genes analyzed (All), genes annotated to have roles in Protein Synthesis (Prot.), Transcription Factors (TF), or genes of Unknown function (Unk.). Classes with significantly fewer genes with different RNAi phenotypes are shown in blue; those with significantly more are shown in red. Enrichments are significant with an FDR of 0.05 (Hypergeometric test). B. RNAi phenotypes differ most for more recently evolved genes. All 1333 genes analyzed were placed into five classes based on their evolutionary age as described in Methods. The most ancient genes could be dated back to the emergence of the Opisthokonta lineage (Opis.); then, becoming progressively younger, I could date sets of genes back to the emergence of the Coelomata (Coel.), Nematoda (Nem.) and Chromadorea (Chro.), and finally some genes had arisen so recently that they were only detectable in Caenorhabditis species (Caen.). In each case, the graph shows the proportion of genes in each evolutionary class that had a different RNAi phenotype.
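For readers unfamiliar with the hypergeometric test named in the caption, the sketch below computes the upper-tail probability used to assess enrichment. The function name and argument names are hypothetical, the numbers in the example are a toy case rather than values from the screen, and the FDR correction across classes is not shown; depletion would use the corresponding lower tail.

```python
from math import comb


def hypergeom_upper_tail(n_genes, n_in_class, n_hits, n_hits_in_class):
    """P(X >= n_hits_in_class) when n_hits genes are drawn without replacement
    from n_genes genes, of which n_in_class belong to the class of interest."""
    top = min(n_in_class, n_hits)
    total = comb(n_genes, n_hits)
    favourable = sum(
        comb(n_in_class, k) * comb(n_genes - n_in_class, n_hits - k)
        for k in range(n_hits_in_class, top + 1)
    )
    return favourable / total


# Toy case: 10 genes, 5 in the class, 5 hits, all 5 hits in the class.
toy_p = hypergeom_upper_tail(10, 5, 5, 5)
```

In the toy case only one of the 252 possible draws places all five hits in the class, so the probability is 1/252.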
[Figure 2.6: image panels A (C03D6.1), B (K04G7.1), C (C27F2.7) and D (sac-1).]
Figure 2.6: in vivo expression of a subset of genes with different RNAi phenotypes. C03D6.1, K04G7.1, C27F2.7, and sac-1 had strongly different RNAi phenotypes in C. elegans and C. briggsae. I generated transgenic C. elegans strains (N2) expressing GFP under control of the promoter of the C. elegans ortholog or C. briggsae strains (AF16) expressing mWormCherry under control of the orthologous C. briggsae promoter for each gene. In each case, four panels are shown: DIC image of N2 worms transgenic for the C. elegans promoter driving GFP, fluorescence image of N2 worms transgenic for the C. elegans promoter driving GFP, DIC image of AF16 worms transgenic for the C. briggsae promoter driving mWormCherry expression, fluorescence image of AF16 worms transgenic for the C. briggsae promoter driving mWormCherry expression. Images are confocal projections at 200X magnification, and scale bars represent 100 µm, except for C. elegans K04G7.1 which is at 400X magnification with a scale bar representing 50 µm. Images are representative of 3 independent lines. A. Expression difference for C03D6.1. Arrowheads indicate tail cells (white), intestine (red) and hypodermis (blue). B. Expression difference for K04G7.1. Arrowheads indicate head neurons (white) and body wall muscle (red). C. Expression difference for C27F2.7. Arrowheads indicate head neurons (white), hypodermis (blue), intestine (yellow), vulva (green) and tail neurons (red). D. Expression difference for sac-1. Shown are 3 confocal projections along the body of the animals at 400X magnification. Scale bars represent 50 µm. Arrowheads indicate the intestine (blue), pharynx and pharyngeal neurons (white), spermatheca (yellow), and tail cells (red).
[Figure 2.7, panel A: schematic with the labels Change in gene expression, Change in protein sequence, Change in pathway structure.]
[Figure 2.7, panel B: bar chart. Y-axis: fraction of homozygous progeny which survive (0.0 to 0.4). Legend: C. elegans, C. briggsae.]
Figure 2.7: Differences in sac-1 RNAi phenotype are due to differences in sac-1 promoter function. A. Schematic illustrating transgenic rescue approach. To determine whether the difference in in vivo function of sac-1 in C. elegans was due to changes in the sac-1 coding region, differences in sac-1 expression, or changes in other genes, I tested the ability of the hybrid transgenes illustrated to rescue the sac-1(ok1602) phenotype in C. elegans. B. The ability to rescue a sac-1 null allele requires the C. elegans sac-1 promoter. I generated C. elegans lines transgenic either for the C. elegans sac-1 ORF under control of the C. elegans sac-1 promoter; the C. briggsae sac-1 ORF under control of the C. briggsae sac-1 promoter; or for the two hybrid constructs shown. In each case, I examined the ability of the transgenic array to rescue the developmental arrest phenotype of sac-1(ok1602) homozygous animals - the graph shows the proportion of animals that reached the adult stage that are homozygous for the sac-1(ok1602) allele, indicating rescue. Either the C. elegans sac-1 ORF or the C. briggsae sac-1 ORF under control of the C. elegans sac-1 promoter could partially rescue; no rescue was seen for the remaining constructs, indicating that the sac-1 promoter has diverged in the two species, while the sac-1 coding regions appear to be functionally interchangeable.
[Figure 2.8, panels A to D: plots comparing genes with a Different or Same phenotype. Y-axes: A, Ka (0.0 to 0.6); B, Ka/Ks (0.00 to 0.30); C, number of non-aligned regions per alignment (0 to 10); D, sum of all the non-aligned regions per alignment (0 to 100).]
[Figure 2.8, panel E: positive predictive value (0.0 to 1.0) against rank difference (0 to 0.9). Legend: Positive, Negative.]
Figure 2.8: A.,B. Enrichment of metrics relating to protein divergence for genes with a different phenotype. Ka (non-synonymous substitutions per non-synonymous site) and the Ka/Ks ratio were calculated using the Yn00 program of PAML 4.3. C.,D. Non-alignable differences in protein sequence between C. elegans and C. briggsae orthologs. The number of non-aligned regions is calculated by counting the number of gaps (minimum size of 4 residues) in the protein alignments, and the sum is the total number of residues in all of those gaps. E. Ka is predictive of a different phenotype when the difference is large. Genes were ranked by the Ka metric and random pairs were chosen, one with the same phenotype and one with a different phenotype. If the gene with the greater Ka was the gene with a different phenotype, this was classified as a positive prediction.
[Figure 2.9, panel A: images labelled C. elegans bli-5 (RNAi) and C. briggsae bli-5 (RNAi).]
[Figure 2.9, panel B: bar chart. Y-axis: fraction of adults with blisters (0.0 to 1.0). Bars: C. elegans bli-5 (RNAi), C. briggsae bli-5 (RNAi).]
[Figure 2.9, panels C and D: bar charts. Y-axis: fraction of adults with blisters (0.0 to 1.0). Bars in C: bli-5(e518), C. elegans rescue, C. briggsae rescue. Bars in D: bli-4(e937), C. elegans rescue, C. briggsae rescue.]
Figure 2.9: bli-5 and bli-4 have an identical gene function despite showing different RNAi phenotypes. A. RNAi phenotype of bli-5 in C. elegans (N2) and C. briggsae (JU1018). B. Quantification of the phenotype shown in panel A. C. Rescue of the bli-5(e518) phenotype by either C. elegans or C. briggsae bli-5 genes. I generated transgenic bli-5(e518) lines in which either C. elegans bli-5 coding region was expressed under the control of C. elegans bli-5 promoter (C. elegans rescue) or the C. briggsae bli-5 under the control of the C. briggsae bli-5 promoter (C. briggsae rescue). I examined adult animals and assessed the proportion with blistered cuticles; the results were combined across lines, with a minimum of 3 lines. Error bars represent the standard error on the binomial proportion. D. Similar data to panel C but instead showing rescue of the bli-4(e937) allele with analogous C. elegans and C. briggsae bli-4 constructs.
[Figure residue: gene-tree panels annotated with RNAi phenotypes (Emb, Gro, Wt) and a "100%" label. Gene labels recovered: Cbr-cyp-32B1, F53B1.4, CBG14085, CBG22228, Y71G12B.6, C01F1.3.]