Evaluating Patterns of Selection in Reproductive and Digestive Protein Genes of Seed Beetles
A comparative approach

Konstantinos Papachristos

Degree project in biology, Master of science (2 years), 2021
Degree project in biology, 60 credits, for the master's degree, 2021
Biology Education Centre and Department of Ecology and Genetics, Uppsala University
Supervisor: Göran Arnqvist
External opponent: Philipp Kaufmann, Sanne Cornelia Everling

Abstract

Seminal fluid proteins (SFPs) have been shown to affect the physiology, behaviour and immune responses of mated females in some species. This opportunity to manipulate female fitness allows for complex evolutionary dynamics between the SFPs and the female proteins that counter their effects, the female reproductive proteins (FRPs). In addition, bean beetles of the subfamily Bruchinae are pests of their preferred host plant species. These hosts differ widely in their secondary defensive metabolites, and to detoxify those compounds each beetle species is expected to have an arsenal of digestive proteins well adapted to its specific host. I carried out a comparative study of four species of bean beetles with the aim of identifying patterns of selection in these proteins. Expression data for one of those species, Callosobruchus maculatus, made it possible to identify its SFPs, FRPs and digestive proteins, and with orthology inference I identified their orthologues in the other three species. I then estimated the ratio of non-synonymous to synonymous substitution rates (ω) for each protein using codeML from the PAML package and used these estimates as a proxy for selection. FRPs had about the same ω values as conserved genes found across the phylum Arthropoda, while the SFPs and digestive proteins had higher ω values, indicating more relaxed purifying selection.
I also performed tests of positive selection and identified 92 digestive proteins, 9 FRPs and 26 SFPs as potential targets for future functional work. Finally, I examined whether SFPs and FRPs co-evolve as a result of direct interaction. By correlating branch-specific ω values for every possible pair of proteins, I found that SFPs are on average more strongly associated with FRPs than with digestive or conserved genes, as expected. The same was true for the FRPs. I also examined factors that could contribute to this association, such as expression levels, sex-biased expression and protein function. Using linear regression models, I found that expression levels and protein function predict the ω estimates to some degree and could thus also affect the correlations examined. High gene expression levels reduce the overall ω values of genes, a pattern known as the E-R anticorrelation. Sex-biased expression does not affect the overall ω values, but it does affect the strength of the E-R anticorrelation, which is weaker in male-biased genes and stronger in female-biased genes.

To Katerina and my family. Thank you for all your love and support.

Contents

1 Introduction
    1.1 The species
    1.2 The digestive proteins
    1.3 The reproductive proteins
    1.4 Identifying orthologues
        1.4.1 Homology
        1.4.2 Orthology and Paralogy
        1.4.3 Orthogroups
    1.5 Identifying selection
        1.5.1 dN and dS ratios
        1.5.2 McDonald-Kreitman test
        1.5.3 Tajima’s D
    1.6 Identifying co-evolution
2 Methods
3 Results
    3.1 Orthofinder
    3.2 phylopypruner
    3.3 Omega ratios
    3.4 Tajima’s D estimates and McDonald-Kreitman tests
    3.5 Linear models on omega values
4 Discussion
    4.1 Proteins under positive selection
    4.2 Distributions of omegas for sites
    4.3 Linear model of ω values
    4.4 Correlations of branch omegas and protein coevolution
    4.5 Limitations of the study
5 Conclusions
References

1. Introduction

“Τὰ πάντα ῥεῖ καὶ οὐδὲν μένει.”
Everything flows and nothing stays the same.
Herakleitos of Ephesus

In his verse, Herakleitos sums up a true property of life: its only constant is change. Tectonic plates move constantly, ocean and air currents change direction, mountains rise, glaciers form and melt, all perpetually shaping the conditions in which organisms have to survive. Over the thousands to millions of years during which these changes take place, they create hardships to endure and opportunities to seize for all life. So life must evolve in order to keep up with a constantly changing world.

In this work I am mainly interested in one of the mechanisms of evolution, selection, and the patterns that it leaves behind when acting on proteins. Specifically, the proteins examined belong to four species of beetles that infest beans, which are introduced in the following sections. The proteins fall into three sets: proteins expressed in the male reproductive tract of the beetles, characterised as seminal fluid proteins (SFPs); proteins whose expression in females changes after mating, the female reproductive proteins (FRPs); and proteins expressed in the gastrointestinal tract of the beetles, the digestive enzymes. In a later section I explain why these sets of genes were chosen in the first place. I also examine the possibility of co-evolution between the SFPs and the FRPs. Why this would be expected and how it is tested are discussed when the SFPs and FRPs are introduced. Finally, I introduce the topics of homology and orthology, as they are vital for comparative work between species, and I describe the theory behind the methods used to identify and quantify selection.

1.1 The species

Three of the beetles in this work belong to the genus Callosobruchus: C. maculatus, C. analis and C. chinensis. The fourth species is Acanthoscelides obtectus.
They all belong to the subfamily Bruchinae, which includes the seed beetles. This subfamily was traditionally considered to be a separate family related to Chrysomelidae, the Bruchidae, but this view changed when the Bruchidae were found to have the Sagrinae, a subfamily of Chrysomelidae, as a sister group [1,2]. All four species therefore belong to the family Chrysomelidae, which includes seed and leaf beetles and is rich in diversity, with an estimated 50,000 or more species [3]. The phylogenetic relationships among the seed beetles, according to molecular data, are shown in figure 1.1 [4,5].

[Figure 1.1. The species tree as understood with molecular data. Tips: Callosobruchus maculatus, Callosobruchus analis, Callosobruchus chinensis, Acanthoscelides obtectus.]

The four species of interest are all seed beetles of legumes with a widespread distribution around the globe [6] (see figure 1.2). They usually infest stores of crop products, which are vital to the economies of developing countries and to the nutrition of their people [7]. In Ethiopia, A. obtectus and another pest, Zabrotes subfasciatus, were responsible for damage to up to 38% of stored beans [8,9]. Other sources estimate the yield reduction caused by A. obtectus to be between 50 and 60% [10]. The destructive potential of these pests is therefore great, which gives an incentive to study the evolutionary mechanisms that made them suited to this lifestyle.

The native and preferred plant hosts for each beetle species are [Arnqvist G., pers. communication, January 25, 2021]:
• Black eyed beans (Vigna unguiculata) for C. maculatus
• Mung beans (Vigna radiata) for C. analis
• Adzuki beans (Vigna angularis) for C. chinensis
• Common beans (Phaseolus vulgaris) for A. obtectus
although they may infest other types of beans, with some cost to