Genomic analysis of the genetic consequences of

haplodiploidy in eusocial Hymenoptera

Nicholas M. A. Smith

Faculty of Science

The University of Sydney

A thesis submitted in fulfilment of the requirements of the degree of

Doctor of Philosophy (2019)

Statement of originality

This is to certify that, to the best of my knowledge, the content of this thesis is my own work, except where specifically acknowledged. The work in this thesis has not been previously submitted for a degree at The University of Sydney or any other institution.

Nicholas Smith

December 2019

Authority of access

This thesis may be made available for loan and limited copying in accordance with the

Copyright Act (1968).

Nicholas Smith

December 2019

The cover image is The Beekeepers and the Birdnester by Pieter Bruegel the Elder, circa 1568


Authorship attribution statement

Chapter Two

Smith N. M. A., C. Wade, M. H. Allsopp, B. A. Harpur, A. Zayed, S. A. Rose, J. Engelstaedter, N. C. Chapman, B. Yagound and B. P. Oldroyd. (2019). Strikingly high levels of heterozygosity despite 20 years of inbreeding in a clonal honey bee. Journal of Evolutionary Biology. 32: 144 - 152. https://doi.org/10.1111/jeb.13397

NMAS, NCC, BPO Designed Research

BPO, MHA Collected Samples

NMAS, BPO, BAH Performed Research

NMAS, BPO, BAH, AZ, SAR, BY, CW Analysed Data

NMAS, BPO Wrote the paper


Chapter Three

Smith N. M. A., B. Yagound, M. H. Allsopp, B. A. Harpur, A. Zayed, C. Kent, G. Buchmann, E. J. Remnant, A. Ashe, M. Beekman, and B. P. Oldroyd. (2020). Paternally-biased gene expression follows kin-selected predictions in female honey bee embryos. Molecular Ecology. 29: 1523 - 1533. https://doi.org/10.1111/mec.15419

AA, BPO Designed Research

AA, BAH, BPO, BY, CSPF, EJR, NMAS Analysed Data

AZ, CK, EJR, KL, SAR Contributed analytical tools

BPO, MB, MHA, NMAS Performed field work

BAH, BPO, BY, GB, NMAS Performed Research

BPO, NMAS Wrote the paper


Chapter Four

Christmas, M. J., Smith N. M. A., Oldroyd B. P. and M. T. Webster. (2019). Social Parasitism in the Honeybee (Apis mellifera) Is Not Controlled by a Single SNP. Molecular Biology and Evolution. 36: 1764 – 1767. https://doi.org/10.1093/molbev/msz100

BPO, MJC, MW and NMAS designed research

BPO, MJC, MW and NMAS performed research

MJC, MW and NMAS analysed data

MJC and MW drafted the manuscript, after which all other authors contributed to revisions.


Chapter two of this thesis is published as ‘Strikingly high levels of heterozygosity despite 20 years of inbreeding in a clonal honey bee’ in the Journal of Evolutionary Biology.

Chapter three of this thesis is published as ‘Paternally-biased gene expression follows kin-selected predictions in female honey bee embryos’ in the journal Molecular Ecology.

Chapter four of this thesis is published as ‘Social Parasitism in the Honeybee (Apis mellifera) Is Not Controlled by a Single SNP’ in the journal Molecular Biology and Evolution.

In addition to the statements above, in cases where I am not the corresponding author of a published item, permission to include the published material has been granted by the corresponding author.

Nicholas Smith

As supervisor for the candidature upon which this thesis is based, I can confirm that the authorship attribution statements above are correct.

Benjamin Oldroyd


Other work produced during thesis period

2017

Harpur, B. A.* and Smith, N. M. A.* (2017). Digest: Gene duplication and social evolution—Using big, open data to answer big, open questions. Evolution 71: 2952-2953. https://doi.org/10.1111/evo.13390

*equal first author

Cole-Clark, M. P., Barton, D. A., Allsopp, M. H., Beekman, M., Gloag, R. S., Wossler, T. C., Ronai, I., Smith, N., Reid, R. J., and Oldroyd, B. P. (2017). Cytogenetic basis of thelytoky in Apis mellifera capensis. Apidologie 48: 623-634. https://doi.org/10.1007/s13592-017-0505-7

2018

Wilson, R. S.*, Smith, N. M. A.*, Santiago, P. R. P., Camata, T., Ramos, S. de P., Giuliano, F. C., Cunha, S. A., Souza, A. P. S. de, and F. A. Moura. (2018). Predicting the defensive performance of individual players in one vs. one soccer games. PLOS ONE 13: e0209822. https://doi.org/10.1371/journal.pone.0209822

*equal first author

2019

Wilson, R. S., Smith, N. M. A., Ramos, S. de P., Giuliano, F. C., Rinaldo, M. A., Santiago, P. R. P., Cunha, S. A., and F. A. Moura. (2019). Dribbling speed along curved paths predicts attacking performance in match-realistic one vs. one soccer games. Journal of Sports Sciences 37: 1072-1079.

Yagound, B., Smith, N. M. A., Buchmann, G., Oldroyd, B. P. and E. J. Remnant. (2019). Unique DNA methylation profiles are associated with cis-variation in honey bees. Genome Biology and Evolution 11: 2517-2630.

2020

Wilson, R. S., Smith, N. M. A., Bedo, B. L. S., Aquino, R., Moura, F. A. and P. R. P. Santiago. (2020). Technical skill not athleticism predicts an individual’s ability to maintain possession in small-sided soccer games. Science and Medicine in Football 4: 305-313.

Wilson, R. S., Smith, N. M. A., de Souza, N. M. and F. A. Moura. (2020). Dribbling speed predicts goal‐scoring success in a soccer training game. Scandinavian Journal of Medicine & Science in Sports 30(11): 2070-2077.

Wilson, R. S., Smith, N. M. A., Sandes de Souza, A. P. and P. R. P. Santiago. (2020). Individual performance in passing tests predicts age-independent success in small-sided soccer possession games. Translational Sports Medicine 3: 353-363.

Aquino, R., Carling, C., Maia, J., Vieira, L.H.P., Wilson, R.S., Smith, N., Almeida, R., Gonçalves, L.G.C., Kalva-Filho, C.A., and J. Garganta. (2020). Relationships between running demands in soccer match-play, anthropometric, and physical fitness characteristics: A systematic review. International Journal of Performance Analysis in Sport 20: 534–555.

Wilson, R. S., Smith, N. M. A., de Souza, N. M. and F. A. Moura. (2020). Rapid improvements in dribbling speed over an eight-week training program. Biology of Sport. (Accepted).


Acknowledgements

My first acknowledgment has to go to Ben Oldroyd. Your mitts are all over this thesis and anyone who is familiar with your work will immediately identify this. I have tried to test the ideas (slowly) you have had resting inside your skull for many years with sequencing data. I think the data supports most of the ideas you proposed. I hope I haven’t shaved too many years off your life. Enjoy life up at Mission Beach. Jun and I will have to come and visit regularly. Also, don’t worry, your secret love of Cabernet Sauvignon is safe with me (I’ll just remind the critics that it is from South Africa). Thank you also to Madeleine Beekman, my other supervisor. I’ve thoroughly enjoyed discussing all of our interests outside of evolutionary biology. Thank you, Brock Harpur. I’ll never forget the day you gave your social security details to a stranger over the phone while I was sitting next to you. That was incredibly dumb. However, you did redeem yourself that night. Lobster was an excellent choice. I wish you and Katey all the best. I still wish we lived on the same continent. Mike Allsopp, thanks for everything you’ve done. I loved driving your Bakkie. I still have untold stories about the things I did in that car. I owe you. The next ginger beer is on me. Also, thanks for introducing me to Zapiro. I owe a huge thank you to Emily Remnant. You are brilliant. I wish you all the best. I think you have agitated me more than anyone else during my time at Sydney University. Hilariously, you are also the kindest person I have met during my time here. You also make the best Tiramisu. Boris Yagound, I still remember that train trip to Pearl Beach you had to take alone with me in your first few weeks in this country. I’m still surprised you didn’t change offices or buy a ticket straight back to France the following day. I know I wouldn’t have finished this thesis without you in the lab. You’ll be annoyed at my melodramatic homage to you in this section but that only makes me happier. Hopefully we’ll be back in NYC soon. A massive thank you to Gabi Buchmann. If it wasn’t for you I’d still be in the lab learning how to extract DNA. I’ll miss the Como Hotel with you and Jan. Thank you to all those in the social lab who have come and gone. I hope we continue to cross paths and share the more romantic aspects of our lives. I’ll never forget the “Barnaby incident” with Josh Christie. I still believe I read somewhere that he had a PhD in cattle genetics. Jules, I’ve loved trying to drag you away from your sacred political beliefs. Our discussions about life, the universe and everything will be missed. Thank you, Amanda Norton, for in-depth discussions about figs. Jun Tong, I have enjoyed living with you in the final moments of this thesis and framing pictures of Margaret Thatcher together. I think there are only weeks left in the friendship but I’ll make sure I cherish them. Please send my lawyers all the money you can reasonably part with. They will need more than luck. Kit Abbott, thank you for introducing me to the gorgeous Leichhardt Oval. I am looking forward to seeing how things pan out for you. Robbie Wilson, this can be accessed by the public so I’ll just say thank you here. Thank you to Amanda Niehaus for all of the beautiful food and conversations about science and writing. I always felt so welcome coming to your place to escape Sydney and work on this thesis. Thank you, Amro Zayed and Clement Kent, for a splendid time at York University. I loved the night blooming cactus, Clement. Charles Foster, thank you for debugging a lot of my code and all of the lifts back to Mosman. I wish you all the best in your future endeavors. Your conscientiousness is inspiring. Thank you, Jan Engelstaedter, for all the help you gave with respect to the mathematics in this thesis. I’ll see you sometime soon at UQ. Thank you, Matt Crowther, for all the discussions about mixed-effects models. I hope we can share a nice bottle of Shiraz at that Uighur restaurant again soon. Thank you to Marty and Jules for all the great dinners and shortbread in Mosman. I think I had a better win/loss ratio than Marty on FIFA by the end of this thesis but definitely not at the start. Thank you to Miles and Becky for a superb time in South Africa. Becky, please remember to close the window next time we’re near baboons. Miles, one day we’ll go to Brazil. Thank you to my mother and father for all their support during my time at University (and beforehand). A special acknowledgment has to go to my mum, who actually enrolled me in my first University course. I was demented in my youth and your vision helped me get out of a self-righteous, idiotic funk. Few know just how odd my time at University has been. However, you two do. I love that you’ve never raised major concerns. I’ve appreciated your utterances of “that’s odd” and “are you sure the University will pay for that trip to X?”. May there be many years of Jamón Ibérico and Champagne to come with the two of you. Lastly, a doting thank you to Lucy Smith, whose tolerance and love are treasured. We’ll be back in Paris soon. To all those I’ve missed, I’m sorry.


Preface

This thesis contains four independent manuscripts that have been accepted by, submitted to, or will soon be submitted to international peer-reviewed journals. Some of the manuscripts concern the same study organism or a similar research area, so there is overlapping content in some of the chapters. Further, each manuscript has been prepared for submission to a different journal, and the formatting of each manuscript reflects that journal’s style.

Finally, it would have been impossible to complete all of the work in this thesis alone. However, stylistically I have chosen to use ‘I’ throughout. I hope my co-authors do not think I am trying to diminish the role they have played. Their contributions to each chapter are presented in the author contributions section.


Thesis Abstract

In this thesis I explore the reproductive biology of a subspecies of honey bee, Apis mellifera capensis (hereafter, Capensis). Capensis is unusual among honey bees in that unmated workers can produce female offspring by thelytokous parthenogenesis. Owing to this unique biology and the genomic resources available, Capensis has become an excellent system in which to examine evolutionary phenomena such as overdominance, genomic imprinting, and conflict and cooperation empirically.

In this thesis, I provide the first whole-genome study of a twenty-year-old clonal lineage of Capensis workers and show that, despite the expectation of complete homozygosity, the lineage retains heterozygosity across one third of its genome. This heterozygosity is maintained by selection against homozygotes, and demonstrates the importance of heterozygosity in this lineage and in honey bees more generally. I then focus on genomic conflict in honey bees. Although honey bee societies are often lauded for their cooperative behaviours, they are also subject to internal conflicts. Using a whole-genome and transcriptome approach, I reveal how sexual conflict between drones and queens is manifest at the genomic level. I report parent-specific gene expression patterns that closely follow the predictions of kin-selection theory. Finally, I focus on a genomic variant that had previously been identified as the cause of thelytokous reproduction in Capensis. I show that this genomic region is not in fact responsible for thelytoky.


Table of Contents

Statement of originality

Authorship attribution statement

Other work produced during thesis period

Acknowledgements

Preface

Thesis Abstract

Table of Contents

List of figures

List of tables

Chapter One

General Introduction

Thelytoky

Opportunity for cheating

Social parasitism and The Clone

Haplodiploidy and social evolution

Examples of conflict and selfishness in haplodiploids

Intragenomic conflicts in honey bee colonies

Summary and goals of the thesis

Chapter Two

Strikingly high levels of heterozygosity despite 20 years of inbreeding in a clonal honey bee

Abstract

Introduction

Materials and Methods

Results

Discussion

Acknowledgements

Supplementary Material

Chapter Three

Paternally-biased gene expression follows kin-selected predictions in female honey bee embryos

Abstract

Introduction

Methods

Results

Discussion

Acknowledgements

Supplementary Information

Chapter Four

Social Parasitism in the Honeybee (Apis mellifera) Is Not Controlled by a Single SNP

Abstract

Main Text

Materials and Methods

Supplementary Material

Acknowledgments

Chapter Five

General Discussion

The genetic basis of thelytoky in Capensis

Concluding remarks

References

List of figures

Figure 1.1 Map of Africa and South Africa showing the natural range of Capensis, Scutellata and the hybrid zone between the two.

Figure 1.2 Comparison of meiotic divisions with recombination in different fusion mechanisms of parthenogenesis.

Figure 1.3 The relatedness of two focal workers to other individuals in an arrhenotokous and thelytokous honey bee colony.

Figure 1.4 Pedigree showing the relatedness between individuals in a honey bee colony.

Figure 1.5 Relatedness between individuals within arrhenotokous and thelytokous honey bee colonies and the potential reproductive conflict between queens and drones.

Figure 2.1 Automixis central fusion with and without recombination.

Figure 2.2 Heterozygosity SNP density across each sub-lineage.

Figure 2.3 Assessing the effects of the centromere and overdominant genes along honey bee chromosomes.

Figure 3.1 Genes with parental bias in gene expression in embryos, displayed as the proportion of maternal and paternal reads identifiable by an informative SNP.

Figure 3.2 Relationship between the number of parentally-biased genes identified and the level of biological replication.

Figure 5.1 Backcross design to uncover the region responsible for thelytokous reproduction in Capensis.


List of tables

Table 2.1 Binomial logistic regression testing the effects of genic and intergenic windows, distance from the centromere and their interactions on the probability that a window would contain one or more heterozygous SNPs.

Table 3.1 Summary of the genes which exhibit a paternal bias in allele expression in S x C colonies.

Table 4.1 Genotype counts and allele frequencies of a single nucleotide polymorphism at position 509,225 on Group 1.23 of Amel_4.5 (Th-SNP) for worldwide populations of Apis mellifera.


Chapter One

General Introduction

In this thesis I explore the reproductive biology of a subspecies of honey bee, Apis mellifera capensis (hereafter, Capensis). Capensis is unusual among honey bees in that unmated workers can produce female offspring asexually by thelytokous parthenogenesis (Onions, 1912; Anderson, 1963). In contrast, in all other honey bee species, unmated workers can only produce males via arrhenotokous parthenogenesis. Capensis is confined to the Fynbos ecotone in Southern Africa (Figure 1.1). It is separated from another subspecies, A. m. scutellata (hereafter, Scutellata), by a stable hybrid zone; Scutellata occurs north of this zone, throughout the rest of South Africa and in the countries to its north (Figure 1.1).


Figure 1.1 Map of Africa and South Africa showing the natural range of Capensis, Scutellata and the hybrid zone between the two. The natural range of Capensis is in blue, the hybrid zone between the two is in black and the natural range of Scutellata is in yellow (modified from https://commons.wikimedia.org/wiki/File:Cape_and_African_Honey_Bee_range.svg).


Thelytoky

Thelytokous reproduction, which is ubiquitous in Capensis, is otherwise extremely rare. Only an estimated 0.1% of species reproduce thelytokously (White, 1984; Suomalainen et al., 1987). However, thelytoky is more common in haplodiploids (males haploid, females diploid), where it is present in more than 250 species (Normark, 2003; Engelstädter, 2008; Rabeling & Kronauer, 2013).

Under thelytoky, two of the four haploid pronuclei present in a newly laid egg fuse to restore diploidy. The genetic consequences of thelytokous parthenogenesis depend on which of the four haploid pronuclei fuse and on the presence or absence of recombination. Diploidy can be restored via central fusion or terminal fusion (Figure 1.2). Under central fusion, the two central pronuclei, one derived from each secondary oocyte, fuse. In the absence of recombination, offspring inherit the allelic state of their mother unchanged. However, when loci are free to recombine, there is a one-in-three chance that a locus that is heterozygous in the mother will become homozygous in her offspring (Pearcy et al., 2006; Oldroyd et al., 2011; Goudie & Oldroyd, 2014).
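The one-in-three figure can be checked by simple enumeration. The sketch below is an illustration only, not a model taken from the cited studies. It assumes a locus far enough from the centromere that recombination effectively randomises which meiotic products fuse, so that the two fusing pronuclei behave like a random pair drawn from two 'A' and two 'a' products. The function name and allele labels are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def prob_homozygous(products=("A", "A", "a", "a")):
    """Probability that a random pair of meiotic products carries the same allele.

    Assumption: the locus recombines freely, so every pair of the four
    products of meiosis is equally likely to fuse.
    """
    pairs = list(combinations(range(len(products)), 2))  # all unordered pairs
    same = sum(1 for i, j in pairs if products[i] == products[j])
    return Fraction(same, len(pairs))


print(prob_homozygous())  # 1/3
```

Of the six possible pairs, only A+A and a+a are homozygous, which gives 2/6 = 1/3.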


Diploidy can also be restored via terminal fusion (Figure 1.2). Here, after the second division of meiosis, the sister pronuclei from the same secondary oocyte fuse. In the absence of recombination, a heterozygous mother will produce only homozygous offspring. Terminal fusion without recombination is likely to be functionally lethal in many haplodiploids because sex is determined by a single locus: females must be heterozygous at the complementary sex determining locus (csd) to trigger normal female development (Beye et al., 2003). However, at loci that are free to recombine, the mother’s heterozygosity can be passed on to her offspring, which are then likely to be viable. In the Hymenoptera (ants, bees and wasps), diploidy is typically restored in thelytokous species via central fusion (Rabeling & Kronauer, 2013).


Figure 1.2 Comparison of meiotic divisions with recombination in different fusion mechanisms of parthenogenesis. The origin of the two pronuclei that fuse determines the type of automixis. Diploidy is restored in Capensis via automixis with central fusion. Figure reproduced from Cole-Clark et al. (2017).


Thelytoky with central fusion results in the erosion of heterozygosity at loci that are free to recombine, as discussed above. Therefore, obligately thelytokous lineages with central fusion should become completely homozygous after several generations. In this system, the probability that a locus that is heterozygous in generation 0 remains heterozygous after n generations is (1 − 1/3)^n. As a result, virtually all loci that are free to recombine should be homozygous after only ten generations (Baudry et al., 2004; Oldroyd et al., 2011; Goudie et al., 2014; Goudie & Oldroyd, 2014). Once heterozygosity is lost at a locus in these lineages, it cannot be restored. Therefore, thelytokous lineages that have retained heterozygosity via selection against homozygotes are an excellent system in which to assess the importance of heterozygosity at the genomic level and to identify loci that must be heterozygous.
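To give a feel for how quickly this decay proceeds, the short sketch below evaluates the expression for a few generation counts. It assumes the per-generation loss of one third stated in the text and no selection; the function and parameter names are hypothetical.

```python
def prob_still_heterozygous(n, loss_per_generation=1 / 3):
    """Probability that a freely recombining locus is still heterozygous
    after n generations of thelytoky with central fusion (no selection)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return (1 - loss_per_generation) ** n


for n in (1, 5, 10, 20):
    print(n, round(prob_still_heterozygous(n), 4))
```

After ten generations the expected proportion is (2/3)^10, or a little under 2%. Heterozygosity that persists well beyond this therefore calls for an explanation such as selection against homozygotes.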

Opportunity for cheating

Thelytoky greatly alters the kin structure of honey bee colonies (Figure 1.3) (Greeff, 1996). In thelytokous colonies, workers are related to their own female offspring by unity (r ≈ 1). Likewise, the thelytokous daughters of a worker’s half-sisters are as closely related to her as the half-sisters themselves, so workers share the same amount of genetic material with the queen’s daughters as they do with the daughters of their half-sisters (r ≈ 0.25). In contrast, in arrhenotokous colonies, workers share only 12.5% of their genes with the sons of their half-sisters. Consequently, Capensis workers are predicted to be more tolerant of worker reproduction by their half-sisters in colonies with a queen than are workers of Scutellata and other honey bee subspecies (Greeff, 1996). Indeed, about 10% of the workers produced are daughters of workers (Beekman et al., 2011). However, laying workers become very active when queen cells are present (Jordan et al., 2008).

Thelytoky profoundly alters the relationship between workers and the queen over the production of future queens (Figure 1.3). In arrhenotokous colonies, workers can only compete with the queen over the production of males. However, in thelytokous colonies, workers can also compete with the queen over the production of daughter queens (Greeff, 1996). In relatedness terms, a worker that produces the next queen is genetically reincarnated as the queen (r ≈ 1). Therefore, the potential fitness payoff for a worker that can produce the next queen is vast. As a result, conflict between subfamilies of workers is predicted to be rife during the period of queen production, with each subfamily selected to thwart the reproduction of the others. It is important to note that, in thelytokous colonies, the queen is equally related to her own offspring and the offspring of her daughters. Therefore, from the queen’s perspective, it is inconsequential whether she or her daughters produce the next queen.

Aggression between workers over reproduction arises within 24 hours of the queen’s removal from a Capensis colony (Anderson, 1968). Within each colony only particular subfamilies become reproductively dominant (Moritz et al., 1996).

[Figure 1.3, panels A (arrhenotokous colony, focal worker X) and B (thelytokous colony, focal worker Y)]

Figure 1.3 The relatedness of two focal workers, X and Y (red dashed circles), to other individuals in an arrhenotokous (A) and a thelytokous (B) honey bee colony. Females in each colony are represented by a circle and males by a semi-circle. The queen is represented with a crown. In colonies where workers can only reproduce arrhenotokously (A), X is more closely related to the sons of her mother (r = 0.25) than to the sons of her half-sisters (r = 0.125). Therefore, selection favours behaviours such as the removal (policing) of worker-laid eggs to thwart the reproductive efforts of other workers. In contrast, in colonies where workers can reproduce thelytokously (B), Y can produce daughters that are related to her by unity (r = 1). Y can use this ability to produce daughters that might become queens, reincarnating herself as the queen. Further, Y is as closely related to her sisters as she is to their thelytokous daughters. In thelytokous colonies, the queen is equally related to her own offspring and the offspring of her daughters, so from her perspective it is inconsequential whether she or her daughters produce the next queen. Selection is therefore relaxed with respect to behaviours that suppress the reproductive efforts of other workers in thelytokous colonies.


Social parasitism and The Clone

Thelytoky also predisposes Capensis workers to social parasitism (Beekman et al., 2008). In an extraordinary example of a 'social cancer' (Oldroyd, 2002), an asexual lineage of Capensis workers presently parasitises the strictly arrhenotokous population of commercial Scutellata colonies in South Africa (Allsopp, 1992, 1993; Allsopp & Crewe, 1993).

This parasitic lineage, hereafter the Clone, arose during the 1990s when a beekeeper moved an estimated 200 Capensis colonies across the hybrid zone between Capensis and Scutellata into the region where only Scutellata colonies exist (Beekman et al., 2008). The current lineage, which comprises millions of workers infesting thousands of Scutellata colonies, descends entirely from a single worker that lived in 1990 (Kryger, 2001; Baudry et al., 2004; Oldroyd et al., 2011). This ancestral worker was derived from the Capensis population that is extant in southern South Africa (Figure 1.1). Once the Clone emerged, it started to invade Scutellata colonies. Clone workers enter a Scutellata colony, activate their ovaries, and produce eggs that are raised by Scutellata workers. Eggs produced by the Clone avoid policing by Scutellata workers through chemical mimicry of the host queen (Martin et al., 2002). Once a Scutellata colony is infected it eventually dies, and the Clone workers disperse to infect new hosts.


Haplodiploidy and social evolution

Historically, haplodiploidy has been linked to the evolution of the highest level of sociality that has yet evolved: eusociality (Hamilton, 1964a; b; Maynard Smith & Szathmary, 1995). Eusocial species can live in colonies comprising tens to hundreds of thousands of related individuals. Eusocial species are characterized by cooperative brood care, overlapping adult generations, and reproductive division of labour (Michener, 1974; Wilson, 1975). Typically, reproductive individuals in eusocial societies (queens) produce all offspring, and non-reproductive individuals (workers) rear the queen's brood (the workers’ sisters and brothers) and take care of all ergonomic tasks.

Charles Darwin found it difficult to understand how the phenotype of worker ants could differ so strongly from that of fertile females (queens) and males when natural selection cannot operate in the sterile caste. Darwin could not comprehend how structural changes to the thorax of workers and the loss of flight and sight could evolve if workers could not ‘propagate their kind’. Darwin referred to this problem as the “one special difficulty” for his theory of evolution by natural selection (Darwin, 1859). This problem remained unresolved until W. D. Hamilton reasoned that:


“the ultimate criterion which determines whether G will spread is not whether the behavior is to the benefit of the behaver but whether it is to the benefit of the gene G” (Hamilton, 1963).

Hamilton showed how altruistic traits such as the loss of reproduction in workers could evolve: non-reproductive individuals could pass their genes on by assisting their relatives in the colony instead of reproducing themselves. This phenomenon became known as ‘kin selection’ (Maynard Smith, 1964). Hamilton (1964a, 1964b) developed a formula, known as ‘Hamilton’s Rule’ (Charnov, 1977), which specifies the conditions under which altruism can evolve:

$rb > c$

where r is the relatedness of the actor to the recipient, b is the fitness benefit the recipient gains from the altruistic behavior and c is the fitness cost to the actor. Put simply, altruism can evolve when the benefit, b, to a related individual, weighted by the relatedness between the two, r, exceeds the cost, c, to the altruist.
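As a concrete illustration, the rule can be written as a one-line check. The benefit and cost values below are hypothetical numbers chosen only to show how relatedness changes the outcome; they are not estimates from any study.

```python
def hamiltons_rule(r, b, c):
    """Return True when r * b > c, i.e. when altruism is favoured."""
    return r * b > c


# Hypothetical benefit and cost, in the same fitness units.
print(hamiltons_rule(r=0.75, b=2.0, c=1.0))  # full sister: 1.5 > 1.0
print(hamiltons_rule(r=0.25, b=2.0, c=1.0))  # half-sister: 0.5 is not > 1.0
```

With the same benefit and cost, help directed at a full sister satisfies the rule while help directed at a half-sister does not.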

Haplodiploid insects have occupied a central place in the development of theory about the role of kinship in the evolution of cooperation (Hamilton, 1964a; b; Haig, 1992; Crozier & Pamilo, 1996; Queller, 2003). An important consequence of haplodiploidy is that females share more genes with their full sisters than they do with their own offspring. This asymmetrical relatedness was hypothesised to predispose sisters to cooperation and altruism, and consequently to promote the evolution of eusociality (Hamilton, 1964a; b; Crozier & Pamilo, 1996; Boomsma, 2007, 2009). Hamilton (1964a, b) proposed that sisters were more likely to cooperate and forego their own reproduction to help their mother produce more sisters, to which they are more closely related (r = 0.75) than they are to their own offspring (r = 0.5). Therefore, haplodiploid females may evolve towards helping their mother produce more sisters at great personal cost (foregoing personal reproduction), if doing so increases the number of copies of a particular allele in the population. However, Trivers & Hare (1976) pointed out that although the relatedness between sisters is 75%, females share only a quarter of their genes with their brothers. Therefore, the average relatedness between siblings is 50%, which is identical to that in diplodiploid species.
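The averaging step behind this argument can be made explicit. The sketch below uses the two relatedness values given in the text and assumes, for illustration, that a female's siblings are half sisters-by-sex and half brothers (an even sex ratio). The function name and the `proportion_sisters` parameter are hypothetical.

```python
R_SISTER = 0.75   # haplodiploid female to a full sister (from the text)
R_BROTHER = 0.25  # haplodiploid female to a brother (from the text)


def mean_sibling_relatedness(proportion_sisters=0.5):
    """Average relatedness of a female to her siblings for a given
    proportion of sisters among them (assumed 0.5 by default)."""
    return proportion_sisters * R_SISTER + (1 - proportion_sisters) * R_BROTHER


print(mean_sibling_relatedness())  # 0.5
```

The high relatedness to sisters is offset exactly by the low relatedness to brothers when the two sexes are reared in equal numbers.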

More recently, Boomsma (2007) showed that Hamilton’s rule could be simplified when assessing the evolution of eusociality in social insects. This is because the ancestor of each independent origin of eusociality was monogamous (Boomsma, 2007, 2009; Hughes et al., 2008). Monogamy is an important promoter of altruism between siblings because the average relatedness between siblings is high (r = 0.5). Under lifetime monogamy, Boomsma (2007) showed that the transition to eusociality could occur whenever the benefit, b, is greater than the cost, c, in Hamilton’s rule. Lifetime monogamy would ensure that the benefits outweigh the costs throughout the altruist’s existence. Therefore, offspring would be selected to help produce more siblings and forego personal reproduction. Boomsma (2007) realized that in the evolution of eusociality Hamilton’s Rule should be reformulated as

$rb > 0.5c$

because the cost is paid as a reduction in the altruist’s own progeny, to which the altruist is related by 0.5. We can now simplify Hamilton’s rule to

$b > c$

because the average relatedness between siblings, r, is also 0.5, so the relatedness term cancels. As a result, under lifetime monogamy, individuals that forego personal reproduction and help produce more siblings will arise whenever the benefits, b, of doing so exceed the costs, c.
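The cancellation can be seen by writing the inequality with both relatedness terms explicit. This is a minimal sketch: the function and parameter names are hypothetical, and the 0.25 used in the second call is an illustrative lower sibling relatedness, not a value from Boomsma (2007).

```python
def altruism_favoured(b, c, r_siblings=0.5, r_offspring=0.5):
    """Return True when r_siblings * b > r_offspring * c.

    With both relatedness terms equal to 0.5 (lifetime monogamy),
    the condition reduces to b > c.
    """
    return r_siblings * b > r_offspring * c


# Lifetime monogamy: any benefit that exceeds the cost is enough.
print(altruism_favoured(b=1.1, c=1.0))
# Lower sibling relatedness (illustrative): the same benefit no longer suffices.
print(altruism_favoured(b=1.1, c=1.0, r_siblings=0.25))
```

Written this way, the role of monogamy is clear: it holds sibling relatedness equal to offspring relatedness, so only the ratio of benefit to cost matters.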

Eusocial insects have become a paradigmatic example of animal cooperation. Eusociality has arisen independently at least ten times in haplodiploid species (Crespi, 1992; Hughes et al., 2008; Rehan & Toth, 2015; Toth & Rehan, 2017). However, even the most advanced, altruistic and cooperative eusocial species have not eradicated conflict and selfishness completely (Ratnieks et al., 2006; Galbraith et al., 2016).

Examples of conflict and selfishness in haplodiploids

Egg laying workers in haplodiploids provide a compelling example of selfish behavior in a cooperative society. Male-destined eggs produced by workers in haplodiploids are common

(Bourke, 1988). However, in honey bees, worker reproduction is rare (Oldroyd et al., 1994;

Barron et al., 2001). In honey bee colonies, queens monopolize reproduction and workers diligently raise her offspring. However, workers have not lost the ability to produce their own offspring. In fact, in several unique colonies around the world, workers (hereafter,

Anarchists) have done just that (Oldroyd et al., 1994; Montague & Oldroyd, 1998; Barron et al., 2001; Chaline et al., 2002; Niu et al., 2016). Anarchist colonies can be detected by the presence of eggs and brood in a section of the colony that the queen cannot access

(beekeepers exclude queens from certain parts of the colony using a queen excluder that her larger body cannot pass through). Therefore, the eggs in these sections must be worker laid.

These anarchists cannot mate, so their eggs develop into haploid males (drones). Anarchist workers produce their own offspring because they are more closely related to their own offspring than to offspring produced by the queen or by their half-sisters. Because the queen mates with more than ten males (Palmer & Oldroyd, 2000; Tarpy et al., 2004) most of the workers in

a colony are half-sisters. Each worker shares half her genes with her own sons, a quarter with the queen’s sons and an eighth with the sons of her half-sisters (Figure 1.6). Therefore, a worker that produces her own offspring has a higher fitness than a worker that fails to do so, especially if the majority of workers remain sterile (Montague & Oldroyd, 1998). Despite the benefits associated with personal reproduction, colonies with anarchistic workers are rare (Oldroyd et al., 1994; Barron et al.,

2001). The reason is that each worker shares more genes with the queen’s sons compared to the sons of her half-sisters. Therefore, workers remove (police) eggs which are laid by their half-sisters and rear those which are laid by their queen (Ratnieks, 1988; Ratnieks & Visscher,

1989). Eggs which are not marked by the queen are eaten by the workers. In contrast, anarchists can lay eggs that evade policing, thereby hijacking the cooperative reproductive system that has evolved in honey bees.

[Figure 1.4: pedigree diagram]

Male                      Relatedness to A
Own son                   0.5
Son of full sister        0.375
Son of queen (brother)    0.25
Son of half sister        0.125

Figure 1.4 Pedigree showing the relatedness between individuals in a honey bee colony. Here, the queen has mated with only two males (the queen usually mates with more than ten males; Palmer & Oldroyd, 2000; Tarpy et al., 2004). The table shows the relatedness of individual ‘A’ to all of the possible males that can be produced in this colony.
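The relatedness values in the table can be combined to illustrate why policing is expected under polyandry. The following is a minimal sketch, not part of the original analysis. It assumes that the queen's mates are equally represented among workers, so that a randomly chosen fellow worker is a full sister with probability 1/n; the function name and this simplification are illustrative assumptions.

```python
from fractions import Fraction

# Relatedness of a worker to each class of male (values from the table above).
R_FULL_SISTER_SON = Fraction(3, 8)
R_QUEEN_SON = Fraction(1, 4)
R_HALF_SISTER_SON = Fraction(1, 8)


def mean_relatedness_to_worker_sons(n_mates):
    """Average relatedness of a worker to the son of another worker.

    Assumption (illustrative): n_mates fathers contribute equally, so a
    random fellow worker is a full sister with probability 1 / n_mates.
    """
    p_full = Fraction(1, n_mates)
    return p_full * R_FULL_SISTER_SON + (1 - p_full) * R_HALF_SISTER_SON


for n in (1, 2, 10):
    r = mean_relatedness_to_worker_sons(n)
    # A worker gains from policing when worker-laid sons are, on average,
    # less related to her than the queen's sons are.
    print(n, float(r), r < R_QUEEN_SON)
```

Under these assumptions the two relatedness values are equal when the queen mates twice, and worker-laid sons become the less related class at higher mating frequencies.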

Rebel worker bees are another example of individual selfishness in a cooperative system.

Rebel bees are only produced in queenless honey bee colonies (Woyciechowski & Kuszewska,

2012). Due to the absence of the queen and her pheromones during larval development, rebels have large ovaries and a greater number of ovarioles (the thread-like sections of the ovary in which eggs develop) than ordinary workers (Woyciechowski & Kuszewska, 2012).

Rebel workers are more likely to lay eggs in the colony in which they were reared and are more likely to reproductively parasitize other colonies compared to normal workers

(Woyciechowski & Kuszewska, 2012; Kuszewska et al., 2018). As a result, rebel bees produce more male offspring than normal workers. This suggests that rebel bees can shift their reproductive strategy depending on their social environment to increase their own reproductive success. For example, when the queen is present the best strategy for a worker is to help raise offspring produced by the queen. However, when the colony is queenless it can be beneficial for rebels to reproduce in their own colony and parasitize other colonies. This shift in reproductive strategy is also an excellent example of the selfish behavior present in honey bee societies (for further examples, see Chapman et al. (2010) and Yagound et al.

(2017)).

Intragenomic conflicts in honey bee colonies

In honey bees, queens are highly polyandrous; queens mate with more than ten males

(Winston, 1987; Palmer & Oldroyd, 2000; Tarpy et al., 2004). The queen stores the spermatozoa of each drone in an organ known as the spermatheca and utilises this sperm to fertilize queen- or worker-destined eggs (Kerr et al., 1980; Schlüns et al., 2004).

As discussed above, insect societies with a single polyandrous queen, such as those of the honey bee, provide compelling systems in which to study reproductive conflict and cooperation at the genomic level (see Chapter Two). The family structure of honey bee colonies means that a father that can alter the expression of genes in his offspring, so that they are more likely to develop as a queen or a reproductive worker, has much greater reproductive success than a male that fails to do so, particularly because males never father sons (Figure 1.6) (Queller, 2003; Drewell et al., 2012). In response, the queen should be selected to reduce the effect of the male’s modifications that influence worker reproduction, and so maintain her monopoly on reproduction (Queller, 2003; Drewell et al., 2012).

However, the queen must ensure that her alterations decrease the fecundity of workers only, and not that of her future fertile male and female offspring. As a result, alterations of genes affecting worker reproduction are more likely to evolve in males than in queens (Kronauer, 2008).

Figure 1.5 Relatedness between individuals within arrhenotokous and thelytokous honey bee colonies and the potential reproductive conflict between queens and drones. Queens (full circle and crown) mate with multiple males (semicircles, brown and orange). Colonies are composed of subfamilies of workers, each of which shares the same father. This generates the potential for conflict between males to increase the reproductive success of their female offspring.

A. Normal arrhenotokous subspecies (e.g. Scutellata). A father that is able to influence the expression of genes in his offspring, so that his daughters are more likely to develop as a reproductive worker or queen, has a greater probability of reproductive success than a male that fails to do so.

B. Thelytokous subspecies (Capensis). The evolutionary consequences of thelytoky are profound because Capensis workers have the potential to be genetically reincarnated as a queen. The unique ability of Capensis workers to produce both queens and workers without mating enhances the evolutionary incentive of Capensis drones to modify genes that increase the reproductive success of their female offspring. Capensis fathers have the potential to ‘father’ queens if their worker offspring produce the next queen thelytokously. Capensis drones also maintain the same evolutionary incentive as their arrhenotokous counterparts to have their immediate offspring develop into the next queen.

The first evidence for parent-specific effects in social insects came from a study of the reproductive phenotypes of workers produced from reciprocal crosses between Capensis and Scutellata (Oldroyd et al., 2014). On average the workers of these colonies had genetically similar nuclear genomes. Workers whose father was Capensis had 1/3 more ovarioles than workers whose father was Scutellata (Oldroyd et al., 2014). This study suggested that honey bee males modify the expression of genes related to female reproduction in their worker daughters.

More recently, reciprocal crosses between honey bee subspecies showed strong parent-specific effects in the worker transcriptome. Parentally-biased genes were plausibly related to worker reproduction (Kocher et al., 2015; Galbraith et al., 2016). This phenomenon has not been assessed in the transcriptome of the social insect that is arguably most likely to demonstrate parent-specific effects: the Cape honey bee. Capensis is more likely to show parentally-biased gene expression because a male’s worker daughters have the unique ability to produce, without mating, female offspring that can develop into reproductive workers or the next queen (Goudie & Oldroyd, 2014).

Summary and goals of the thesis

In this thesis I present a series of manuscripts that examine the reproductive biology of Capensis and the consequences of thelytoky. Chapters two to five are formatted for submission to a journal, and these chapters are already published.

In Chapter Two, I examine the level of homozygosity in the Clone. As explained above, generations of thelytokous (asexual) reproduction are expected to erode heterozygosity and eventually result in a totally inbred organism. Despite this prediction, heterozygosity is preserved in the Clone at several independent microsatellite loci (reviewed in Goudie &

Oldroyd, 2014). In this chapter, I examine genome-wide levels of heterozygosity in the Clone.

I show that more than a third of the genes are still heterozygous, suggesting that there are several genes on almost all chromosomes that show heterozygote advantage. This means that even though thelytoky results in a 1/3 loss of heterozygosity with each generation, selection acts against individuals that have lost heterozygosity at these overdominant loci.

The Clone provides an ideal system to assess the importance of heterozygosity because we can directly observe the erosion and maintenance of heterozygosity in this lineage. This study identifies regions that have retained heterozygosity independently in three sub-lineages,

strongly suggesting that these regions contain genes that must be heterozygous. I conclude that heterozygote advantage at thousands of loci is an important factor in the genetic architecture of honey bees, and most likely other insects.

In Chapter Three, I report on gene expression in a reciprocal cross between Capensis and

Scutellata. I use a whole-genome and transcriptome approach to demonstrate that, as predicted by theory, male honey bees influence gene expression in their daughters, particularly for genes that are related to reproduction. I identified 21 genes that show a bias in expression towards the Capensis father’s allele in colonies with a Capensis father, with no such bias in the reciprocal cross. Further, six genes showed a consistent bias towards expression of the father’s allele across all eight colonies examined, regardless of the direction of the cross. Consistent with kin-selected predictions, two of the six genes are associated with female reproduction and one with gametogenesis.

I convincingly show that the parent-specific gene expression patterns I uncovered follow the predictions of David Haig’s Kinship Theory of Genomic Imprinting (KTGI), and kin selection theory more broadly. This study is also technically sound with appropriate sequencing depth and biological replication.

Chapter four is a rebuttal of a paper that recently claimed to have located the genetic locus that controls thelytoky in Capensis (Aumer et al., 2019). Thelytoky in Capensis has a genetic basis (Chapman et al., 2015) and is most likely controlled by a single gene (Lattorff et al., 2007;

Aumer et al., 2019). Three studies have tried to locate the genomic region associated with thelytoky (Lattorff et al., 2007; Jarosch et al., 2011; Aumer et al., 2019). Each study has identified a different genomic location, on chromosome 1 (Aumer et al., 2019) or chromosome 13 (Lattorff et al., 2007; Jarosch et al., 2011). In Chapter Four, I show that the region most recently identified by Aumer et al. (2019) is in fact not associated with thelytokous reproduction in honey bees.

My contribution to this study is a whole-genome analysis of three Capensis queens produced by thelytokous workers. These queens should have the thelytoky-inducing non-synonymous

SNP identified by Aumer et al. (2019). We showed that this SNP was absent in every queen.

Additionally, genomic sequences from multiple A. mellifera populations revealed that the alleged thelytoky-inducing SNP is widespread in populations where thelytoky has never been described. I find that the SNP proposed by Aumer et al. (2019) is neither sufficient nor required to produce thelytoky in honey bees and conclude that the region responsible is yet to be discovered.

Finally, in chapter five I review and summarise the findings from this thesis before suggesting directions for future research in social insects.

Chapter Two

Strikingly high levels of heterozygosity despite 20 years of inbreeding in a clonal honey bee

Abstract

Inbreeding (the mating between closely related individuals) often has detrimental effects that are associated with loss of heterozygosity at overdominant loci, and the expression of deleterious recessive alleles. However, determining which loci are detrimental when homozygous, and the extent of their phenotypic effects, remains poorly understood. Here, we utilise a unique inbred population of clonal (thelytokous) honey bees, Apis mellifera capensis, to determine which loci reduce individual fitness when homozygous. This asexual population arose from a single worker ancestor approximately 20 years ago, and has persisted for at least

100 generations. Thelytokous parthenogenesis results in a loss of 1/3 of heterozygosity with each generation. Yet, this population retains heterozygosity throughout its genome due to selection against homozygotes. Deep sequencing of one bee from each of the three known sub-lineages of the population revealed that 3,766 of 10,884 genes (34%) have retained heterozygosity across all sub-lineages, suggesting that these genes have heterozygote advantage. The maintenance of heterozygosity in the same genes and genomic regions in all three sub-lineages suggests that nearly every chromosome carries genes that show sufficient heterozygote advantage to be selectively detrimental when homozygous.

Key words: inbreeding, heterozygosity, fitness, honey bees

Introduction

Inbreeding (the mating between closely related individuals) has detrimental effects across a diverse range of taxa (Charlesworth & Willis, 2009). These effects include decreased individual fitness and reduced population growth rates (Kardos et al., 2016). Inbreeding depression primarily arises from loss of heterozygosity at loci that are overdominant, and from the expression of deleterious recessive alleles (Charlesworth & Willis, 2009; Kardos et al., 2016). However, determining which loci show heterozygote advantage and the extent of their phenotypic effects remains poorly understood (Kardos et al., 2016).

High-throughput sequencing technologies can determine genome-wide levels of heterozygosity (Kardos et al., 2016). If heterozygosity is measured in several individuals from an inbred population, the relationship between heterozygosity and fitness can be assessed.

Areas of the genome where heterozygosity is important to fitness retain heterozygosity, despite inbreeding, whereas areas that are either neutral or subject to directional selection are expected to be more homozygous. Guo et al. (2016) sequenced the genome of an inbred laboratory population of the planarian Schmidtea mediterranea and showed that heterozygosity was maintained at 37.5% of the genome after ten generations of selfing. The expected amount of heterozygosity after ten generations of selfing is almost zero. Kardos et al. (2018) sequenced the genomes of 97 grey wolves (Canis lupus) from a semi-isolated and bottlenecked population in Scandinavia, and also discovered that heterozygosity had been maintained throughout the genome. Combined, these findings suggest that the maintenance of heterozygosity across much of the genome is important for the reproductive success and survival of both S. mediterranea (a hermaphrodite worm) and C. lupus (a mammal).

Apis mellifera capensis (hereafter Capensis) is a subspecies of honey bee whose natural range is the Western and Eastern Cape provinces of South Africa. Capensis is atypical among honey bees because unfertilized worker-laid eggs usually develop into diploid female offspring (Onions, 1912; Anderson, 1963) via thelytokous parthenogenesis with central fusion

(Verma & Ruttner, 1983; Cole-Clark et al., 2017). (In contrast, in all other honey bee populations unfertilized eggs result in haploid males via arrhenotokous parthenogenesis).

Thelytoky predisposes Capensis workers to social parasitism (Beekman et al., 2008). In an extraordinary example of a 'social cancer' (Oldroyd, 2002) an asexual lineage of Capensis workers presently parasitises the strictly arrhenotokous population of the African honey bee

(A. m. scutellata, hereafter Scutellata) that is found throughout northern South Africa

(reviewed in Beekman et al., 2008). This parasitic lineage, hereafter the Clone, arose after commercial Capensis colonies were transported across the stable hybrid zone into northern South Africa where only Scutellata colonies exist (Beekman et al., 2008). The current lineage, consisting of millions of workers that infest thousands of host colonies, descends entirely from a single worker that lived in 1990 (Kryger, 2001; Baudry et al., 2004; Oldroyd et al., 2011). This ancestral worker was derived from the sexual Capensis population that is

extant in southern South Africa. It was almost certainly produced sexually, and was therefore heterozygous throughout much of its genome.

Under thelytoky with central fusion, meiosis proceeds as usual resulting in four haploid pronuclei. Diploidy is then restored by the fusion of the two central pronuclei

(Suomalainen et al., 1987; Stenberg & Saura, 2009). Due to the fusion of the central products, the pronuclei involved are descended from the two alternate products of meiosis I (Cole-Clark et al., 2017) (Fig. 2.1). In the absence of meiotic recombination, the maternal allelic state is maintained in the offspring. However, where loci are free to recombine, there is a one third chance that a locus that is heterozygous in the mother will become homozygous in the offspring (Pearcy et al., 2006; Oldroyd et al., 2011). The probability that a locus that is heterozygous at generation 0 remains heterozygous declines at a rate of $(1 - 1/3)^n$, where n is the number of generations. Thus, after 20 years (even with a conservative assumption of one generation per year), the probability that thelytokous lineages such as the Clone will be heterozygous at regions that are free to recombine is less than 0.0004 (0.04%) (Baudry et al., 2004; Oldroyd et al., 2008). It is important to recognize that loss of heterozygosity is a one-way street. Once heterozygosity is lost in a worker’s offspring, all subsequent descendants are also homozygous. As a consequence, every member of the Clone population carries one or both of the alleles that were present in the common ancestor at every locus. Some loci will be homozygous in a particular individual because heterozygosity has been lost with respect to

the common ancestor. With sufficient sampling of the current Clone population, the genotype of the common ancestor can be inferred by parsimony (Baudry et al., 2004; Goudie &

Oldroyd, 2014).
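The expected decay of heterozygosity at a freely recombining, selectively neutral locus can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, and the choices of 20 and 100 generations correspond to the one-generation-per-year and five-generations-per-year assumptions, respectively.

```python
def prob_still_heterozygous(generations, loss_per_generation=1 / 3):
    """Probability that a locus heterozygous at generation 0 is still
    heterozygous after the given number of thelytokous generations,
    assuming free recombination and no selection."""
    return (1 - loss_per_generation) ** generations


# One generation per year for 20 years (conservative assumption).
p20 = prob_still_heterozygous(20)
# Five generations per year for 20 years.
p100 = prob_still_heterozygous(100)

print(f"after 20 generations:  {p20:.2e}")
print(f"after 100 generations: {p100:.2e}")
```

Either value is small enough that any heterozygosity observed in recombining regions calls for an explanation other than chance.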

The only genetic change that can occur in the Clone over time is loss of heterozygosity.

Gene flow into the Clone population from the Scutellata population is impossible because workers cannot mate. There can potentially be gene flow from the Clone population into the

Scutellata population if a Clone larva is raised as a queen by its Scutellata hosts. But this situation is likely to be rare or absent. The resulting queen would mate (Beekman et al., 2011), and its offspring would not be recognised as Clones because they would carry alleles absent from other Clones. Because the Clone population is large and asexual, genetic drift (i.e. changes in allele frequency due to stochastic sampling in a sexual population) cannot occur.

In reality, despite the expectation that all members of the current Clone should be completely homozygous for all loci that are free to recombine, heterozygosity is maintained throughout the genome of every extant Clone (Baudry et al., 2004; Härtel et al., 2006; Oldroyd et al., 2011). Two mechanisms have been proposed to explain the high levels of heterozygosity.

First, a reduction in recombination frequency could help maintain heterozygosity (Moritz & Haberl, 1994; Baudry et al., 2004). However, Goudie et al. (2012) showed that for rates of recombination observed in the field, all heterozygosity should be lost within a few generations. Second, heterozygosity could be maintained by selection against homozygous

recombinants at regions of the genome that are subject to heterozygote advantage

(overdominance) (Oldroyd et al., 2011; Goudie et al., 2012, 2014b). A well-characterised example of this type of selection is the maintenance of heterozygosity at the complementary sex-determining locus (csd). The sex of a diploid honey bee is determined by the allelic state at this single locus (Beye et al., 2003). Honey bees must be heterozygous at the csd for the female phenotype to be expressed. Homozygosity at the csd results in the production of a diploid male, which is inviable and cannibalised by nurse workers at the early larval stage

(Woyke, 1963). Therefore, csd is a prime example of a gene that exerts strong selection to maintain heterozygosity.

Goudie et al. (2014) mapped patterns of heterozygosity and homozygosity along chromosomes III and IV of the Clone using microsatellites and argued that it is selection for overdominant genes that maintains heterozygosity in the Clone. They showed that there are at least three overdominant genes maintaining heterozygosity on chromosome IV, and four genes (including the csd) maintaining heterozygosity on chromosome III. Goudie et al. (2014) therefore argued that heterozygosity is maintained via relatively rare selectively-overdominant loci. Thus, while heterozygosity has been completely lost in some regions of chromosomes III and IV, heterozygosity is retained at other regions by these putative overdominant loci.

Honey bees have extremely high rates of recombination resulting in more than five recombination events per chromosome per meiosis (Beye et al., 2006; Solignac et al., 2007; Liu et al., 2015; Wallberg et al., 2015) and low levels of positive interference (Solignac et al., 2004).

Outside of centromeric regions, recombination occurs at high frequencies along all chromosomes (Nachman, 2002; Solignac et al., 2007; Wallberg et al., 2015). There is currently no evidence that recombination events consistently occur in the same regions along a chromosome (hot spots) within the honey bee genome (Wallberg et al., 2015).

The probability of a crossover event increases as the distance from the centromere increases. A single recombination event during a thelytokous meiosis results in the complete loss of heterozygosity at all regions in the telomeric direction from the point of the chiasma because the two proximate chromosomes are brought together (Goudie & Oldroyd, 2014) (Fig. S2.1). Heterozygosity can be restored towards the telomere by a second recombination event that must occur during the same meiotic division as the first recombination (i.e. a double crossover) (Fig. S2.1). In sum, Clone chromosomes are expected to retain heterozygosity near the centromere and be increasingly homozygous towards the telomere (Fig. S2.1). The proportion of the chromosome that is homozygous should increase with each generation. In agreement with this prediction, Goudie et al. (2014) showed that centromeric interference maintains heterozygosity for ca. 40 cM along chromosomes III and IV.

The Clone has evolved into three major sub-lineages (A, B and C), which are all found in varying proportions throughout north-eastern South Africa. They can be characterised by their microsatellite genotype at several loci (Oldroyd et al., 2011). Here, we assess genome-wide heterozygosity in each of the sub-lineages at single-base resolution. If one or two genes with a large effect are important for honey bee fitness, we would expect to see a small number of similar heterozygous regions per chromosome across all sub-lineages (Fig. S2.2), and potentially some chromosomes that are completely homozygous. In contrast, if heterozygosity is maintained by a large number of genes with a small effect, we would expect to see heterozygosity maintained throughout the genome. Loss and maintenance of heterozygosity would likely be inconsistent across sub-lineages because selection for heterozygosity would be relatively weak at some loci (Fig. S2.2). By this means we aim to explain the pervasiveness of heterozygosity within the honey bee genome and to assess the extent to which heterozygote advantage has shaped the genomic architecture of honey bees, and potentially of insects more broadly.

Materials and Methods

We identified one individual from each of the three sub-lineages of the Clone (A, B and C) based on their characteristic microsatellite genotypes (Oldroyd et al., 2011). These bees were

sampled in 2010, approximately 20 years after the common ancestor of the Clone. We sequenced the whole genome of each individual using Illumina HiSeq2000 in paired-end mode (2 by 150-bp reads). Each bee was sequenced in a single lane. Raw sequence data can be downloaded at the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra),

BioProject ID (Submission ID: xxxx).

To assess heterozygosity throughout the Clone’s genome we developed the following pipeline (available at: https://github.com/Social-Insect-Genomics/JEB.heterozygosity.fitness.pipeline). FASTQ files were aligned to the Apis mellifera genome assembly AMELv4.5 (Elsik et al., 2014) using the default parameters of BWA 0.7.12 after adapter sequences had been removed (Li & Durbin, 2009). These alignments were then imported into SAMtools 1.2 (Li et al., 2009) and converted into BAM format. PCR duplicates were marked using Picard tools (http://picard.sourceforge.net, version 1.130). We realigned sequences with the Genome Analysis Tool Kit 3.5-0 (GATK) RealignerTargetCreator followed by IndelRealigner to reduce any potential erroneous alignments near insertions and deletions

(McKenna et al., 2010). To improve the base quality scores, these scores were recalibrated using GATK (McKenna et al., 2010). Following this, we detected single nucleotide polymorphisms (SNPs) and created variant calling files (VCF) using GATK HaplotypeCaller

(McKenna et al., 2010).

We excluded SNPs within each individual with a quality score (QUAL) less than 30 and a Fisher strand (FS) bias value greater than 60 using GATK VariantFiltration (McKenna et al., 2010). We further excluded SNPs within each individual using VCFtools 0.1.15 (Danecek et al., 2011) by the following criteria: i) mapping quality (MQ) less than 40, quality by depth value (QD) less than two, genotype quality (GQ) less than 20, read depth (DP) less than 10 and heterozygous SNPs that had an allele ratio less than 0.15 (Fig. S2.3); ii) those that fell within ten base pairs of insertions and deletions. We removed SNPs located within unplaced and non-nuclear scaffolds, centromeres and repeat regions using VCFtools (Danecek et al., 2011). Centromere positions were determined by performing a BLASTn match of centromeric microsatellite primer sequences from Solignac et al. (2007) back to the A. mellifera reference genome with an E-value of 0.5 and percent identity greater than 95%. Except for chromosome 1, the centromere is located at the beginning of each honey bee chromosome (Elsik et al., 2014). We also removed SNPs within potentially duplicated regions using VCFtools 0.1.15 (Danecek et al., 2011). Duplicated regions were determined to be sites where the read depth was twice the average read depth at a SNP in each sub-lineage. The total number of SNPs removed at each filtering step and the number and density of SNPs along each chromosome are tabulated in Tables S2.1, S2.2 and S2.3.
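The per-SNP thresholds listed above can be summarised as a single predicate. This is a minimal sketch of the filtering logic only, not the GATK or VCFtools commands actually used. The record type and function name are hypothetical; the sketch assumes that a SNP is excluded if it fails any one criterion, and that the allele ratio is the fraction of reads supporting the less common allele.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """Hypothetical per-SNP record holding the annotations used for filtering."""
    qual: float         # QUAL, variant quality score
    fs: float           # FS, Fisher strand bias
    mq: float           # MQ, mapping quality
    qd: float           # QD, quality by depth
    gq: float           # GQ, genotype quality
    dp: int             # DP, read depth
    ref_reads: int      # reads supporting the reference allele
    alt_reads: int      # reads supporting the alternate allele
    heterozygous: bool  # genotype call


def passes_filters(snp):
    """Return True if the SNP survives every threshold given in the text.

    Assumption: failing any single criterion excludes the SNP.
    """
    if snp.qual < 30 or snp.fs > 60:
        return False
    if snp.mq < 40 or snp.qd < 2 or snp.gq < 20 or snp.dp < 10:
        return False
    if snp.heterozygous:
        total = snp.ref_reads + snp.alt_reads
        if total == 0:
            return False
        # Assumption: allele ratio = reads for the rarer allele / all reads.
        if min(snp.ref_reads, snp.alt_reads) / total < 0.15:
            return False
    return True


balanced = SnpCall(50, 5, 60, 10, 99, 40, 20, 20, True)
skewed = SnpCall(50, 5, 60, 10, 99, 40, 38, 2, True)
print(passes_filters(balanced), passes_filters(skewed))
```

The allele-ratio check targets heterozygous calls supported by very few reads of one allele, which are more likely to reflect sequencing or mapping error than true heterozygosity.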

We used Sanger sequencing to validate a subset of the SNPs identified by next-generation sequencing, thereby ensuring that our SNP calling was not an artefact of the next-generation sequencing approach. We selected four exons for sequencing based on the high level of polymorphism in these regions. We sequenced 528-666 bases from each of these exons from the three Clones used for the next-generation sequencing using the primers given in Table S2.4. PCR was performed using KAPA2G Robust reagents and conditions (Sigma-Aldrich). PCR products were cleaned up with the GF-1 PCR Clean-up Kit (Vivantis) and cloned into the pCR4-TOPO vector using a TOPO TA Cloning and PureLink Quick Plasmid Miniprep Kit (ThermoFisher Scientific). All 70 SNPs identified within these regions by next-generation sequencing were also present in the Sanger sequences. Further, we did not identify any SNPs in the Sanger sequences that were absent from the next-generation sequencing data (Table S2.4).

We removed the unplaced and non-nuclear scaffolds, centromere and repetitive regions from the honey bee reference genome to determine the number of accessible sites. We then calculated the size of the genic and intergenic regions and the entire genome using

Bedtools 2.24.0 (Quinlan & Hall, 2010). The size of the accessible honey bee genome is approximately 110 Mb and contains 10,884 genes. The size of genic and intergenic regions is

58,176,338 and 51,967,713 bp, respectively. The sizes of unplaced and non-nuclear scaffolds, centromeric regions and repetitive regions for each chromosome are provided in Table S2.3.

To calculate the breadth of coverage of the accessible genome we subtracted from the accessible genome size all sites at which a SNP was removed, or would have been removed, by our SNP filtering criteria. We divided the remaining number of sites by the accessible genome size to determine the percentage of the accessible genome covered. We also calculated the average read depth at callable sites in each sub-lineage.

We created 10kb windows across the entire accessible genome to estimate the heterozygous SNP density in each of the sub-lineages. We subtracted the number of sites where a SNP could not be called within each 10kb window to obtain the total number of callable sites within a particular window. We divided the number of heterozygous SNPs by the total number of callable sites within a window to estimate the heterozygous SNP density of each window. We repeated this analysis using window sizes of 100kb and 250kb and found that the window size selected does not materially influence the patterns of heterozygosity observed (Fig. S2.4, S2.5).
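The windowed density calculation can be sketched as follows. This is an illustrative reimplementation rather than the code in the published pipeline; the function name, the use of 1-based positions and the representation of uncallable sites as a list of positions are all assumptions.

```python
from collections import Counter


def het_snp_density(het_positions, uncallable_positions, chrom_length, window=10_000):
    """Heterozygous SNPs per callable site in consecutive fixed-size windows.

    Positions are 1-based (assumption). Returns one value per window, or
    None for a window that contains no callable sites.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    het = Counter((p - 1) // window for p in het_positions)
    bad = Counter((p - 1) // window for p in uncallable_positions)

    densities = []
    for w in range(n_windows):
        # The final window may be shorter than the others.
        size = min(window, chrom_length - w * window)
        callable_sites = size - bad[w]
        densities.append(het[w] / callable_sites if callable_sites > 0 else None)
    return densities


# Toy chromosome of 25 kb with three heterozygous SNPs and 5 kb of uncallable sites.
example = het_snp_density(
    het_positions=[5, 100, 12_000],
    uncallable_positions=range(5_001, 10_001),
    chrom_length=25_000,
)
print(example)
```

Dividing by callable sites rather than by window length prevents poorly covered windows from appearing artificially homozygous.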

We created 10kb windows outside of centromeric regions and determined whether each window was heterozygous or homozygous. For a window to be called heterozygous, one or more heterozygous SNPs needed to be present. We then fitted a binomial generalised linear mixed-effects model (GLMM) with a logistic link function to assess the effects of genic and intergenic regions and distance from the centromere on the probability that a window would contain one or more heterozygous SNPs (Table 2.1). Here, a genic region is defined as a window that overlaps a protein-coding region. Intergenic regions were defined as those that did not meet this criterion. We modelled log(distance of a window from the centromere), genic vs intergenic region and their interaction as fixed effects. We modelled sub-lineage as a random effect. We used the ‘glmmPQL’ function within the library ‘MASS’ in R (R Core Team, 2013) to carry out this analysis. The raw data and files relating to this statistical model are archived at Dryad: doi:10.5061/dryad.4pv2ft0.

If there are a few key genes on each chromosome that are crucial to the fitness of the honey bee, one or more overdominant loci must exist within each of the heterozygous regions reported here (Fig. 2.2). An important issue is the genetic distance over which a single overdominant gene or centromere can exert its influence over linked but selectively neutral regions. Based on Engelstädter (2017), we derive in the Supplementary Information the rate at which heterozygosity is lost at a neutral locus that is linked to an overdominant region. This rate is a function of the distances of the neutral locus from its centromere and the overdominant region, and the number of generations that have elapsed (Fig. 2.3). If we assume that there are five generations per year in the Clone population (honey bee workers generally live four weeks, with a minimum generation time of five weeks; Winston, 1987), then, conservatively, one hundred generations had elapsed between the common ancestor and our 2010 sample of the sub-lineages.

Results

The average sequencing depth across the accessible genome in sub-lineages A, B and C was 46, 69 and 50, respectively. The breadth of coverage (the percentage of the genome that is covered) across the accessible genome in sub-lineages A, B and C was 89%, 82% and 90%, respectively. The total number of heterozygous SNPs in sub-lineages A, B and C was 482,252, 307,470 and 456,275, respectively.

Of the 10,884 genes present in the genome, 3,766 (34%) have retained heterozygosity at one or more SNPs across all three sub-lineages. Furthermore, heterozygosity has been maintained across the same extensive regions (> 200 kb) in all three sub-lineages (Fig. 2.2, S2.4, S2.5). However, there are also large regions (> 200 kb) where heterozygosity has been lost or is reduced (Fig. 2.2, S2.4 and S2.5), and regions that have retained heterozygosity are not always shared across the three sub-lineages. While sub-lineages A and C show similar patterns of heterozygosity across the genome, they differ from sub-lineage B (Fig. 2.2). There are also regions where sub-lineages A and C differ; for example, towards the end of chromosomes 4 and 5 and the beginning of chromosome 13. Within sub-lineage B, significant stretches (> 200 kb) of the genome were consistently punctuated by short (< 100 kb) and long (> 200 kb) regions where heterozygosity has been reduced or lost altogether (Fig. 2.2). In sub-lineage B, heterozygosity has been lost along different segments of the chromosomes but is retained towards the telomere. This suggests that a few telomeric loci have a large effect on fitness. In general, this phenomenon was not observed in sub-lineages A and C.

Whether a window was genic or intergenic had a significant effect on the probability that it was heterozygous, but distance from the centromere did not (Table 2.1). There was a significant interaction between genomic region and distance from the centromere (Table 2.1). The number of heterozygous windows within intergenic regions is significantly greater than the number of heterozygous windows within genic regions (t(48303) = -3.58, P = 0.0003, Table 2.1, Fig. 2.2). The number of heterozygous windows does not decrease significantly as the distance from the centromere increases (t(48303) = -1.81, P = 0.0691, Table 2.1, Fig. 2.2). These results do not appear to depend on windows in which only one SNP is present: fewer than 5% of heterozygous windows across the three sub-lineages contained a single SNP.

Our mathematical model of how an overdominant locus influences heterozygosity in adjacent regions (see Supplementary Material) illustrates that as the distance from the overdominant region and centromere increases, the probability that a neutral region retains heterozygosity decreases (Fig. 2.3). More specifically, after 100 generations, the probability that a neutral region that is 500 kb or 1,000 kb from an overdominant region and is located between the overdominant region and the telomere will have retained heterozygosity is less than 1% (Fig. 2.3E and 2.3F). Further, the probability that a neutral region that is 500 kb centromeric from an overdominant region will have retained heterozygosity after 100 generations is also less than 1% (Fig. 2.3F).

Discussion

Without selection for heterozygosity, all heterozygosity should have been lost in the Clone within approximately ten generations, whereas in fact, heterozygosity has been maintained throughout the genome (Baudry et al., 2004; Goudie et al., 2014). Our findings suggest that heterozygosity in the Clone is maintained by heterozygote advantage at multiple genes along most chromosomes. However, the heterogeneity in the degree of heterozygosity among sub-lineages suggests that many overdominant loci have only small fitness benefits.

Sub-lineage B is able to tolerate loss of heterozygosity in substantial regions of its genome, even though selection appears to maintain heterozygosity at these same regions in sub-lineages A and C (Fig. 2.2). Surprisingly, we found that intergenic regions are more likely to be heterozygous than genic regions. This indicates that the maintenance of heterozygosity within intergenic regions is also crucial for the survival and reproduction of honey bees. Further, heterozygosity within these regions could also influence the expression and function of genic regions, which may be important for the fitness of honey bees. Heterozygosity is also maintained towards the telomeres on many chromosomes due to multiple overdominant loci within these regions (for example, chromosome 3). The genic regions that have retained heterozygosity across all three sub-lineages most likely contain genes that are overdominant, although these genes appear to vary in the extent of their advantage. For example, within sub-lineage B, heterozygosity is retained in a small number of regions along some of the chromosomes. Each heterozygous region is followed by regions where heterozygosity has been substantially lost. This pattern strongly suggests that each heterozygous region is retained in its heterozygous state by one or more overdominant genes within that region, but that much of the genome can be homozygous with minimal adverse effects. Our findings suggest an overdominant locus or centromere can only maintain heterozygosity across a small proportion of the chromosome (less than 250 kb) over 100 generations (Fig. 2.3). The pattern of heterozygosity across the Clone genome is therefore consistent with the presence of several genes with heterozygote advantage per chromosome.

In contrast to sub-lineage B, sub-lineages A and C are heterozygous across most chromosomes, which suggests that a far greater number of loci are overdominant in A. mellifera than sub-lineage B might suggest. Although sub-lineages A and C have substantial regions of homozygosity, more than 34% of genes have retained heterozygosity (Fig. 2.2).

Stochastic events that are not lethal could explain the differences between the sub-lineages. For example, following a single recombination event in a region of one sub-lineage where overdominant effects are not severe, heterozygosity would be permanently lost in that sub-lineage. Eventually one might predict the demise of a more homozygous lineage in competition with other, more heterozygous lineages like A and C. Nonetheless, the patterns of heterozygosity are generally similar across many of the chromosomes in all three sub-lineages, indicating that there are areas where the negative effects of homozygosity are more severe.

The csd locus on chromosome III is an overdominant gene that is responsible for the expression of the female phenotype (Beye et al., 2003). This locus must remain heterozygous for a Clone worker to be viable. It is therefore unsurprising that heterozygosity has been maintained around this genomic location. Our findings confirm previous results based on microsatellites in this region (Oldroyd et al., 2011; Goudie et al., 2014). It is also notable that there is no clear signal of csd (GB47022) in Fig. 2.2: the region surrounding csd is no more heterozygous than many other regions of the genome. Thus, along with csd, which is known to be homozygous lethal (Woyke, 1963; Beye et al., 2003), these other regions throughout the genome must also retain heterozygosity.

Our results show a phenomenon similar to that seen in some other populations (wild wolves and laboratory planarians), which retain heterozygosity throughout the genome despite many generations of inbreeding (Kardos et al., 2018) or selfing (Guo et al., 2016), though not clonal propagation as in the Clone. However, it is important to note that the mating system of the honey bee differs markedly from those of these previously studied species. The extreme polyandry of the honey bee promotes outbreeding (Winston, 1987; Palmer & Oldroyd, 2000), whereas wolves are monogamous (Milleret et al., 2017), which is more likely to lead to inbreeding. Despite the differences, heterozygosity appears to be crucial for survival and reproduction in all of these organisms.

Genomic rearrangements such as inversions and translocations can potentially suppress recombination events (Coyne et al., 1991) and thereby reduce the rate of loss of heterozygosity. However, an inversion will not prevent loss of heterozygosity arising from any recombination event that takes place outside of the inversion. Given that inversions, should they exist, are likely to be infrequent and short, inversions and their consequential reduced recombination are an unlikely explanation for the maintenance of heterozygosity in the Clone. Furthermore, translocations and gene duplications are likely to be present on both homologous chromosomes and therefore cannot interfere with recombination. We note that empirically, recombination rates in the Clone are similar to those seen in other honey bees (Goudie et al., 2012).

A final potential mechanism by which heterozygosity could be maintained in the Clone is mutation of a base back to one present in the ancestor of the Clone. The estimated frequency of mutation per base per generation in the honey bee genome is 6.9 × 10⁻¹⁰ (Yang et al., 2015). Therefore, the expected frequency of de novo mutations per 10 kb window is 6.9 × 10⁻⁶ per generation. As a result, de novo mutations have a minimal effect on the patterns of heterozygosity observed here.
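The arithmetic behind this estimate can be written out as a short worked calculation. It assumes the 10 kb window used for the heterozygosity analysis and the approximately 100 generations estimated in the Methods; the symbols are introduced here for illustration only.

```latex
\begin{align*}
\mu_{\text{window}} &= \mu_{\text{base}} \times L_{\text{window}}
  = (6.9 \times 10^{-10}) \times 10^{4}
  = 6.9 \times 10^{-6} \ \text{per window per generation} \\
\mu_{\text{window}} \times 100 \ \text{generations} &= 6.9 \times 10^{-4} \ \text{per window}
\end{align*}
```

Even accumulated over 100 generations, fewer than one window in a thousand would be expected to carry a de novo mutation.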

Honey bees are haplodiploid, meaning that males are hemizygous and are therefore never in a heterozygous state (Crozier & Pamilo, 1996). It is therefore curious that heterozygosity plays such an important role in honey bees when its importance must be limited to females. Nonetheless, crosses show strong non-additive effects for both colony-level (for example, Cale Jr & Gowen, 1956; Oldroyd et al., 1985; Oldroyd & Goodman, 1988; Nonacs & Kapheim, 2008) and individual-level (Roberts, 1961; Brückner, 1979; Clarke et al., 1992) traits. Thus, while heterotic effects are limited to females, they are nonetheless extremely important in honey bees.

In conclusion, this study provides a whole-genome assessment of the importance of heterozygosity in a wild (albeit parasitic) insect population. The Clone provides an ideal system for such an analysis because the common ancestor existed recently, allowing direct observation of loss and maintenance of heterozygosity in a tractable natural system. The Clone has revealed the critical importance of genome-wide heterozygosity in the honey bee, while also revealing regions, such as those surrounding csd, that are under strong overdominant selection. These phenomena are likely to be present in many outbreeding insect species.

Acknowledgements

We thank Emily Remnant for assistance with Sanger sequencing. We also thank members of the Behaviour and Genetics of Social Insects lab for comments on the manuscript. This work was supported by Australian Research Council grant DP150101985 to BP Oldroyd and R Gloag.

Figure 2.1 Automixis with central fusion, with and without recombination. (A) In the absence of recombination, the maternal allelic state is maintained in the offspring. (B) However, where loci are free to recombine, there is a one-third chance that a locus that is heterozygous in the mother will become homozygous in the offspring (for further details, see Suomalainen et al., 1987; Stenberg & Saura, 2009). This figure was modified from Engelstädter et al. (2011) and Goudie & Oldroyd (2014).

Figure 2.2 Heterozygous SNP density across each sub-lineage. Heterozygous SNP density per window (10 kb) along chromosomes 1-16 in each of the three sub-lineages of the Clone (A - C). For example, 1A is chromosome 1 in sub-lineage A. The position of the centromere on each chromosome is shown by gray boxes. The black regions are regions of low coverage (where the read depth is less than ten). The scale bar numbers at the bottom of the figure represent the heterozygous SNP density per 10 kb, range-standardised to lie between zero and one.

Figure 2.3 Assessing the effects of the centromere and overdominant genes along honey bee chromosomes. The probability that a neutral region heterozygous at generation 0 will remain heterozygous after 100 generations as a function of distance from the centromere and overdominant region (A = 10 kb, B = 50 kb, C = 100 kb, D = 250 kb, E = 500 kb and F = 1000 kb). The position of the neutral region is shown on the x-axis, with negative values indicating that the neutral region is between the centromere and the overdominant region. The lowest negative value indicates that the neutral region is located at the centromere. At position 0 the neutral region shares the same position as the overdominant locus. Positive values indicate that the neutral region is located between the overdominant region and the telomere.

Table 2.1 Binomial logistic regression testing the effects of genic and intergenic windows, distance from the centromere and their interactions on the probability that a window would contain one or more heterozygous SNPs.

| Source of Variation | Value | Standard Error | Degrees of Freedom | t | P |
|---|---|---|---|---|---|
| Distance from centromere (log) | -0.03 | 0.02 | 48303 | -1.81 | 0.0691 |
| Genic window | -0.43 | 0.12 | 48303 | -3.58 | 0.0003 |
| Distance from centromere (log) x Genic window | 0.05 | 0.02 | 48303 | 2.48 | 0.0128 |

Supplementary Material

Table S2.1 SNP filtering pipeline developed, the parameters used and the number of SNPs filtered at each stage of the analysis.

| Filter, sub-lineage (A) | Removed SNPs | Number of SNPs removed |
|---|---|---|
| Removing Unplaced Scaffolds | Scaffold 17 and 18 | 171,861 |
| Quality | < 30 | 18,905 |
| Fisher Strand | FS > 60 | 875 |
| Mapping Quality | MQ < 40 | 50,598 |
| Quality by Depth | QD < 2 | 41,87 |
| Genotype Quality | GQ < 20 | 61,703 |
| Depth | DP > 10 | 64,795 |
| Indels (+ / - 10 base pairs) | | 291,287 |
| Repetitive regions | | 549,534 |
| Centromeric regions | | 295,488 |
| Maximum of Two Alleles per site | > 2 | 2,062 |
| Minimum Allele Frequency | MAF < 0.15 | 1,268 |
| Potential Duplications | Twice the average SNP depth | 931 |

| Filter, sub-lineage (B) | Removed SNPs | Number of SNPs removed |
|---|---|---|
| Removing Unplaced Scaffolds | Scaffold 17 and 18 | 63,933 |
| Quality | < 30 | 7,290 |
| Fisher Strand | FS > 60 | 389 |
| Mapping Quality | MQ < 40 | 54,408 |
| Quality by Depth | QD < 2 | 4,404 |
| Genotype Quality | GQ < 20 | 12,980 |
| Depth | DP > 10 | 9,808 |
| Indels (+ / - 10 base pairs) | | 357,260 |
| Repetitive regions | | 670,708 |
| Centromeric regions | | 341,915 |
| Maximum of Two Alleles per site | > 2 | 3,253 |
| Minimum Allele Frequency | MAF < 0.15 | 1,008 |
| Potential Duplications | DP > | 1,127 |

| Filter, sub-lineage C (45) | Removed SNPs | Number of SNPs removed |
|---|---|---|
| Removing Unplaced Scaffolds | Scaffold 17 and 18 | 224,586 |
| Quality | < 30 | 6,477 |
| Fisher Strand | FS > 60 | 546 |
| Mapping Quality | MQ < 40 | 55,927 |
| Quality by Depth | QD < 2 | 5,473 |
| Genotype Quality | GQ < 20 | 11,135 |
| Depth | DP > 10 | 8,072 |
| Indels (+ / - 10 base pairs) | | 351,149 |
| Repetitive regions | | 666,342 |
| Centromeric regions | | 342,057 |
| Maximum of Two Alleles per site | > 2 | 3,084 |
| Minimum Allele Frequency | MAF < 0.15 | 1,140 |
| Potential Duplications | Twice the average SNP depth | 997 |

Table S2.2 The number of SNPs, the average SNP density for each chromosome and the transition and transversion ratio after filtering.

| Chromosome | Sub-lineage A: number of SNPs | Sub-lineage A: density per bp | Sub-lineage B: number of SNPs | Sub-lineage B: density | Sub-lineage C: number of SNPs | Sub-lineage C: density |
|---|---|---|---|---|---|---|
| 1 | 149,733 | 158 | 140,566 | 169 | 141,738 | 158 |
| 2 | 60,711 | 183 | 61,111 | 182 | 60,798 | 183 |
| 3 | 93,773 | 132 | 80,091 | 154 | 93,855 | 131 |
| 4 | 62,575 | 141 | 55,663 | 158 | 58,778 | 150 |
| 5 | 47,687 | 153 | 33,317 | 218 | 39,896 | 182 |
| 6 | 109,102 | 131 | 100,292 | 142 | 109,129 | 131 |
| 7 | 13,720 | 186 | 10,308 | 247 | 8,590 | 245 |
| 8 | 60,688 | 207 | 63,969 | 196 | 60,933 | 206 |
| 9 | 70,340 | 120 | 49,808 | 169 | 70,423 | 120 |
| 10 | 64,677 | 165 | 60,224 | 177 | 64,762 | 164 |
| 11 | 83,825 | 164 | 75,871 | 182 | 84,046 | 164 |
| 12 | 77,835 | 134 | 65,023 | 161 | 77,921 | 134 |
| 13 | 52,387 | 174 | 49,246 | 185 | 50,379 | 181 |
| 14 | 32,665 | 207 | 29,101 | 232 | 34,399 | 196 |
| 15 | 50,468 | 165 | 50,587 | 164 | 50,502 | 164 |
| 16 | 25,172 | 255 | 24,137 | 266 | 25,134 | 255 |
| Total number of SNPs | 1,055,358 | | 949,314 | | 1,031,283 | |
| Ts/Tv | 4.9328 | | 4.8961 | | 4.9357 | |

Table S2.3 Number of base pairs removed that occur in unplaced and non-nuclear scaffolds, centromeric and repetitive regions in the honey bee genome.

| Chromosome | Centromere Size (bp) |
|---|---|
| 1 | 4804353 |
| 2 | 3965060 |
| 3 | 402076 |
| 4 | 3225465 |
| 5 | 6414266 |
| 6 | 2680404 |
| 7 | 10413356 |
| 8 | 534281 |
| 9 | 2195529 |
| 10 | 987881 |
| 11 | 88985 |
| 12 | 723496 |
| 13 | 695747 |
| 14 | 3829620 |
| 15 | 854883 |
| 16 | 2039791 |
| Total Centromere Size | 43855193 |
| Repetitive Regions | 51223037 |
| Unplaced and non-nuclear scaffolds | 30657388 |
| Total Centromere Size + Repetitive Regions + Unplaced and non-nuclear scaffolds | 125735618 |

Table S2.4 Summary of the primers, samples and fragment length from the four genes used to validate SNPs with Sanger sequencing. The position and number of SNPs discovered using Sanger sequencing were identical to those uncovered using next-generation sequencing.

| Gene | Scaffold | Position | Forward primer | Reverse primer | Length (bp) | SNPs per sub-lineage |
|---|---|---|---|---|---|---|
| Odorant Receptor 55 (GB52413) | 2.17 | 862463, 862544, 862559, 862598, 862637, 862717, 863008 | CCGCACATAAATTGAACACC | TGAACGAAGGCTGGAAACTT | 666 | A: 7; B: 7; C: 7 |
| Foraging (GB49908) | 13.12 | 608782, 609086, 609191, 609196, 609251 | ACACAGGCCTCGAAAGAGAA | CAGGACGAAACAGAAGCACA | 569 | A: 5; B: 5; C: 5 |
| Vitellogenin (GB49544) | 4.6 | 420169, 420213, 420216, 420221, 420231, 420257, 420258, 420327, 420361, 420372, 420410, 420444 | CGGCGTTTATTACGTCCACT | TTGGAAGAGGGATCGAATTG | 625 | A: 12; B: 12; C: 12 |
| Ebony (GB46429) | 1.23 | 484209, 484246, 484269, 484270, 484294, 484419, 484434, 484607, 484617, 484619 | CCGGCATTAACCACTGTTCT | AGGCGAAAGTTCTGTTTCCA | 528 | A: 10; B: Absent; C: 10 |

Figure S2.1 The relationship between recombination and heterozygosity and homozygosity in the Clone. (A) A single recombination event (black X) will result in the loss of heterozygosity at all markers in the telomeric regions (c, d, e and f) and is not reversible in following generations. If locus f is overdominant, recombinant genotypes of this type will be selected against. (B) A double recombination event (two black Xs). Here, the first recombination event will result in the loss of heterozygosity at markers c and d but retention of heterozygosity at markers telomeric to d. (Het = heterozygous region; Hom = homozygous region).

Figure S2.2 Illustration of a large number of genes with a small effect on fitness (A) and a small number of genes with a large effect on fitness (B) across a chromosome. Alleles are shown as upper and lower-case letters. Regions with different colours on each chromosome are heterozygous. In the Clone, if many genes need to be heterozygous, most of the chromosome will be heterozygous (A). However, if very few genes must be heterozygous, there will be homozygous segments (h, t and z) and a number of key overdominant loci along a chromosome (a, b and x).

Figure S2.3 Frequency of reference and alternate allele ratio in each of the three sub-lineages.

Figure S2.4 Heterozygous SNP density across each sub-lineage (100 kb). Heterozygous SNP density per window (100 kb) along chromosomes 1-16 in each of the three sub-lineages of the Clone (A - C). For example, 1A is chromosome 1 in sub-lineage A. The position of the centromere on each chromosome is shown by gray boxes. The black lines are regions of low coverage (where the read depth is less than ten). The scale bar numbers at the bottom of the figure represent the range-standardised number of heterozygous SNPs per 100 kb.

Figure S2.5 Heterozygous SNP density across each sub-lineage (250 kb). Heterozygous SNP density per window (250 kb) along chromosomes 1-16 in each of the three sub-lineages of the Clone (A - C). For example, 1A is chromosome 1 in sub-lineage A. The position of the centromere on each chromosome is shown by gray boxes. The black lines are regions of low coverage (where the read depth is less than ten). The scale bar numbers at the bottom of the figure represent the range-standardised number of heterozygous SNPs per 250 kb.

Loss of heterozygosity for loci linked to overdominant loci

We consider a locus A that is maintained heterozygous under overdominant selection that counterbalances the erosion of heterozygosity caused by central fusion automixis. We are then interested in the maintenance of heterozygosity at another, neutral locus B that is linked to A. If A is maintained heterozygous by overdominant selection, this means that offspring lineages that have become homozygous at A will eventually (or even instantly) be purged from the population and will not leave any long-term descendants. Therefore, in order to understand erosion of heterozygosity at locus B, we need to obtain the fraction of offspring that have become homozygous at B but not at A among those offspring that have remained heterozygous at A.

Let $\gamma_{\bar{A}B}$ be the fraction of offspring that have become homozygous at locus B but remained heterozygous at locus A among all offspring, and let $\bar{\gamma}_A$ be the total fraction of offspring that have remained heterozygous at locus A. Then, our quantity of interest can be expressed as:

$$\gamma_{B|\bar{A}} = \frac{\gamma_{\bar{A}B}}{\bar{\gamma}_A} \qquad (1)$$

Starting with a population that is heterozygous at A and B, the fraction of the population that is still heterozygous at B after n generations can then be estimated as:

$$\rho_B(n) = \left(1 - \gamma_{B|\bar{A}}\right)^n \qquad (2)$$

What remains is to express $\gamma_{B|\bar{A}}$ as a function of the genetic distances of the two loci. Let $d_{CA}$ be the genetic distance of locus A from the centromere and let $d_{AB}$ be the distance between A and B, both given in centimorgans (where 1 cM = 25,000 bp in honey bees; see Liu et al., 2015). We can then distinguish two cases depending on whether locus B is situated between A and the telomere or between A and the centromere.

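Case 1: B between A and telomere. Previous work (e.g., Engelstädter et al., 2011) has shown that for central fusion automixis, and assuming a Poisson distribution of crossover events,

$$\bar{\gamma}_A = 1 - \gamma_A = 1 - \frac{1}{3}\left(1 - e^{-3 d_{CA}/100}\right) = \frac{1}{3}\left(2 + e^{-3 d_{CA}/100}\right) \qquad (3)$$

where $\gamma_A$ is the fraction of offspring that have become homozygous at A. To obtain the numerator in Eq. 1, we first note that

$$\gamma_{\bar{A}B} = \gamma_B - \gamma_{AB} \qquad (4)$$

where $\gamma_B$ is the fraction of offspring that have become homozygous at B and $\gamma_{AB}$ is the fraction that have become homozygous at both loci. Substituting a result from (Engelstädter, 2017) (see Eq. 5), we get

$$\gamma_{\bar{A}B} = \frac{1}{3}\left(1 - e^{-3(d_{CA}+d_{AB})/100}\right) - \frac{1}{3}\left(1 - e^{-3 d_{CA}/100}\right) \times \frac{1}{3}\left(1 + 2e^{-3 d_{AB}/100}\right) = \frac{1}{9}\left(2 + e^{-3 d_{CA}/100}\right)\left(1 - e^{-3 d_{AB}/100}\right) \qquad (5)$$

After some simplifications, inserting the expressions from Eqs. 3 and 5 into Eq. 1 then yields

$$\gamma_{B|\bar{A}} = \frac{1 - e^{-3 d_{AB}/100}}{3} \qquad (6)$$

Here, $\gamma_{B|\bar{A}}$ does not depend on $d_{CA}$ because locus A acts like a centromere from locus B’s perspective.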
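The simplification leading from Eq. 5 to Eq. 6 can be spelled out as follows. The shorthand $a$ and $b$ is introduced here purely for readability and does not appear in the original derivation; a bar over A marks offspring that remain heterozygous at A.

```latex
\begin{align*}
a &= e^{-3 d_{CA}/100}, \qquad b = e^{-3 d_{AB}/100} \\
\gamma_{\bar{A}B} &= \tfrac{1}{3}(1 - ab) - \tfrac{1}{9}(1 - a)(1 + 2b) \\
  &= \tfrac{1}{9}\left[\,3 - 3ab - (1 + 2b - a - 2ab)\,\right] \\
  &= \tfrac{1}{9}\left(2 + a - 2b - ab\right)
   = \tfrac{1}{9}(2 + a)(1 - b) \\
\gamma_{B|\bar{A}} &= \frac{\gamma_{\bar{A}B}}{\bar{\gamma}_A}
   = \frac{\tfrac{1}{9}(2 + a)(1 - b)}{\tfrac{1}{3}(2 + a)}
   = \tfrac{1}{3}(1 - b)
\end{align*}
```

The factor $(2 + a)$, which carries all dependence on $d_{CA}$, cancels between numerator and denominator, which is why Eq. 6 depends on $d_{AB}$ alone.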

Case 2: B between A and centromere. We can proceed in a similar manner for this case. The formula for $\bar{\gamma}_A$ is still given by Eq. 3. However, since the order of the two loci is now reversed relative to the centromere, we now have

$$\gamma_{\bar{A}B} = \frac{1}{3}\left(1 - e^{-3(d_{CA}-d_{AB})/100}\right) - \frac{1}{3}\left(1 - e^{-3(d_{CA}-d_{AB})/100}\right) \times \frac{1}{3}\left(1 + 2e^{-3 d_{AB}/100}\right) = \frac{2}{9}\left(1 - e^{-3(d_{CA}-d_{AB})/100}\right)\left(1 - e^{-3 d_{AB}/100}\right) \qquad (7)$$

After inserting the expressions from Eqs. 3 and 7 into Eq. 1 and simplifying, we get

$$\gamma_{B|\bar{A}} = \frac{2\left(1 - e^{-3(d_{CA}-d_{AB})/100}\right)\left(1 - e^{-3 d_{AB}/100}\right)}{3\left(2 + e^{-3 d_{CA}/100}\right)} \qquad (8)$$

In this case, $\gamma_{B|\bar{A}}$ is affected by both genetic distances. We can interpret the formulae in Eqs. 6 and 8 as follows. For loci B that are distant from A towards the telomere, $\gamma_{B|\bar{A}}$ will be close to 1/3 because linkage to A is so loose that B behaves like any other locus that is not linked to the centromere. For loci situated increasingly closer to A, $\gamma_{B|\bar{A}}$ decreases towards zero as crossovers between A and B become increasingly unlikely. For loci between A and the centromere, $\gamma_{B|\bar{A}}$ will initially increase with increasing distance $d_{AB}$, but when the half-distance between A and the centromere is reached it will start decreasing again, and zero will be reached for loci tightly linked to the centromere. In essence, both A and the centromere take on the role of preserving heterozygosity of B, and for loci that sit in between, both of these forces are combined. Also note that for loci A that are not tightly linked to the centromere (large $d_{CA}$), the conditional loss of heterozygosity is approximately symmetrical for loci on either side of A, i.e. in this case the formulae in Eqs. (6) and (8) are approximately the same for small $d_{AB}$.
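The formulae in Eqs. 2, 6 and 8 can be evaluated numerically with a short sketch such as the one below. It is an illustration under stated assumptions rather than the code used to produce Fig. 2.3: the function names are hypothetical, distances are taken in centimorgans, and physical distances are converted using the 1 cM = 25,000 bp figure quoted above.

```python
import math

BP_PER_CM = 25_000  # 1 cM = 25,000 bp in honey bees, as quoted in the text


def gamma_telomeric(d_ab_cm):
    """Eq. 6: conditional loss rate when B lies between A and the telomere."""
    return (1 - math.exp(-3 * d_ab_cm / 100)) / 3


def gamma_centromeric(d_ca_cm, d_ab_cm):
    """Eq. 8: conditional loss rate when B lies between A and the centromere."""
    if not 0 <= d_ab_cm <= d_ca_cm:
        raise ValueError("B must lie between the centromere and A")
    numerator = (2
                 * (1 - math.exp(-3 * (d_ca_cm - d_ab_cm) / 100))
                 * (1 - math.exp(-3 * d_ab_cm / 100)))
    denominator = 3 * (2 + math.exp(-3 * d_ca_cm / 100))
    return numerator / denominator


def retained_heterozygosity(gamma, generations):
    """Eq. 2: fraction of the population still heterozygous at B."""
    return (1 - gamma) ** generations


# Example: a neutral locus 500 kb telomeric of an overdominant locus.
d_ab = 500_000 / BP_PER_CM  # 20 cM
p_retained = retained_heterozygosity(gamma_telomeric(d_ab), 100)
```

For this example, `p_retained` falls well below 0.01, in line with the statement in the Results that such a region has less than a 1% probability of retaining heterozygosity after 100 generations.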

Chapter Three

Paternally-biased gene expression follows kin-selected predictions in female honey bee embryos

Abstract

The Kinship Theory of Genomic Imprinting (KTGI) posits that, in species where females mate with multiple males, there is selection for a male to enhance the reproductive success of his offspring at the expense of other males and his mating partner. Reciprocal crosses between honey bee subspecies show parent-of-origin effects for reproductive traits, suggesting that males modify the expression of genes related to female function in their female offspring. This effect is likely to be greater in the Cape honey bee (Apis mellifera capensis), because a male’s daughters have the unique ability to produce female offspring that can develop into reproductive workers or the next queen without mating. We generated reciprocal crosses between Capensis and another subspecies and used RNA-seq to identify transcripts that are over- or under-expressed in the embryos, depending on the parental origin of the gene. As predicted, 21 genes showed expression bias towards the Capensis father’s allele in colonies with a Capensis father, with no such bias in the reciprocal cross. A further six genes showed a consistent bias towards expression of the father’s allele across all eight colonies examined, regardless of the direction of the cross. Consistent with predictions of the KTGI, six of the 21 genes are associated with female reproduction. No gene consistently showed over-expression of the maternal allele.

Keywords: genomic imprinting, gene expression, paternal effects, transcriptomics, kin selection

Introduction

The Kinship Theory of Genomic Imprinting suggests that, in polyandrous species (i.e. those in which females mate with multiple males), the reproductive interests of a father can be favoured if his offspring can secure additional resources from their mother, at the expense of the offspring of the other males (Burt & Trivers, 2006; Haig, 2002; Haig & Westoby, 1989). In contrast, the reproductive interests of a mother are generally favoured if her maternal resources are partitioned equally among her present and future offspring. This means that the reproductive interests of mothers and fathers are sometimes in conflict (Burt & Trivers, 2006). Therefore, selection can act divergently on fathers and mothers so that they modify certain genes in their offspring in ways that favour their offspring’s reproductive success. Since ‘imprinting’ often implies epigenetic mechanisms like DNA methylation and histone modification, we will refer to such genes as being ‘parentally manipulated’ to include RNA-mediated manipulations such as RNA interference, which do not involve physical changes to inherited DNA. Parentally manipulated genes can be completely silenced when transmitted from one sex but expressed when transmitted from the other sex, or simply have an expression bias towards the allele transmitted by one parent or the other (Burt & Trivers, 2006; Haig, 2002).

Colonies of hymenopteran insects (ants, bees and wasps) have long served as model systems for examining how cooperative behaviour can evolve (Hamilton, 1964a, 1964b; Trivers & Hare, 1976). More recently, insect colonies have also emerged as important systems for understanding the evolutionary interplay between conflict and cooperation (Hamilton, 1964a, 1964b; Trivers & Hare, 1976).

Insect colonies headed by polyandrous queens provide a situation that is rife with potential for conflicts over reproduction (Figure S1). In honey bees, for example, queens mate early in life with an average of 12 males (Tarpy, Nielsen, & Nielsen, 2004; Winston, 1987). The queen stores spermatozoa from each drone in a specialised organ, the spermatheca, and utilises this stored sperm to fertilize female- (queen or worker) destined eggs. Honey bee colonies are, therefore, composed of subfamilies of females (mostly workers), each of which shares the same father. Thus, a father that can modify the expression of genes in his offspring so that his daughters are more likely to develop as a queen or a reproductive worker has a much greater probability of reproductive success than another male that fails to do so (Drewell, Lo, Oxley, & Oldroyd, 2012). Therefore, we predict there will be an expression bias towards paternal alleles in genes that are related to worker reproduction.

Apis mellifera capensis (hereafter, Capensis) and A. m. scutellata (hereafter, Scutellata) are two honey bee subspecies from South Africa. Capensis and Scutellata have remarkably similar genomes, but strikingly different reproductive phenotypes (Wallberg, Pirk, Allsopp, & Webster, 2016). Capensis is atypical among honey bees because unfertilized worker-laid eggs usually develop into diploid female offspring (Anderson, 1963; Onions, 1912) via thelytokous parthenogenesis (Cole-Clark et al., 2017; Verma & Ruttner, 1983). In contrast, in Scutellata and all other honey bee populations, unfertilized eggs almost always develop as males via arrhenotokous parthenogenesis (Winston, 1987). The evolutionary consequences of thelytoky are profound because Capensis workers have evolved to compete with each other and their queen over the production of future queens (Goudie & Oldroyd, 2014) (see Figure S3.1 for details). The unique ability of Capensis workers to produce both queens and workers without mating greatly enhances the evolutionary incentive of Capensis drones to modify genes that increase the reproductive success of their worker offspring. In particular, Capensis fathers can increase their potential to sire queens, albeit indirectly, if their worker offspring produce the next queen thelytokously (Goudie & Oldroyd, 2014). The enhanced reproductive value of workers makes Capensis an ideal system to investigate parental manipulation of genes within social insects.

The first evidence of parent-of-origin effects on the reproductive physiology of social insects came from reciprocal crosses between Capensis and Scutellata (Oldroyd et al., 2014). Although the F1 progeny of this cross were genetically identical on average, irrespective of the direction of the cross, offspring with a Capensis father had one-third more ovarioles (the thread-like sub-units of the insect ovary in which eggs develop) than female offspring with a Scutellata father (Oldroyd et al., 2014). This result suggested that Capensis males epigenetically modify their spermatozoa to increase the fecundity of their worker offspring.

Additional evidence for parent-of-origin effects in honey bees comes from reciprocal crosses between Africanised and European honey bees (Galbraith et al., 2016; Kocher et al., 2015). Examination of the transcriptome of worker offspring of these crosses suggested parent-specific gene expression. Surprisingly, Kocher et al. (2015) found a maternal bias in gene expression throughout the worker transcriptome, extending across all stages of development and across multiple tissues. This result was unexpected because genes with paternally-biased expression are more likely to evolve (Kronauer, 2008; Queller, 2003). In contrast, Galbraith et al. (2016) reported a paternal bias in gene expression in worker ovaries in genes related to worker reproduction.

The mechanisms by which honey bees apparently manipulate gene expression in offspring according to the sex of the parent are unknown (Galbraith, Kocher, et al., 2016; Kocher et al., 2015), but there are two plausible candidates: DNA methylation (Drewell et al., 2012) and the transmission of RNAs to offspring embryos via semen (Krawetz et al., 2011). Both mechanisms could potentially alter gene expression in the embryo or early larval stages, thereby altering development and enhancing the reproductive phenotype of the resulting adult worker. It is also possible that cyto-nuclear interactions (Wolf, 2009) could cause parentally-biased gene expression in offspring. However, under this scenario the expectation is that gene expression should be biased towards maternal alleles (Wolf, 2009).

Here we determine whether there are parent-specific effects on gene expression in reciprocal crosses between the thelytokous Capensis and the arrhenotokous Scutellata. We predict, based on the KTGI, that some genes transmitted from Capensis fathers would be over-expressed in female offspring relative to expression levels of the same gene when it is transmitted from a Scutellata

father, and that these over-expressed genes will tend to be involved in female reproduction and development. To test this prediction, we generated replicate reciprocal crosses between the two subspecies using artificial insemination to generate offspring embryos that had, on average, identical nuclear genomes. The nuclear and mitochondrial genomes of Capensis and Scutellata are very similar, with only 60 fixed SNPs between the two populations (Wallberg et al., 2016). Indeed, the major phenotypic differences between the two subspecies are likely to be based on just five loci that relate to superficial body colour traits and to the reproductive mode of the workers (Wallberg et al., 2016). Based on whole-genome sequencing of each parent, we identified single nucleotide polymorphisms (SNPs) that could be used to identify the parent-of-origin of transcripts in offspring embryos. We then examined the gene expression profiles of the embryos using RNA-sequencing, and identified the parent-of-origin of each transcript for which we had found a parent-specific SNP. This allowed us to identify genes that were over-expressed from the genome of one parent relative to the genome derived from the other parent. Finally, we assessed previously-published DNA methylation data sets to determine whether genes that exhibited parent-specific gene expression are also methylated.

Methods

Crosses performed

We used instrumental insemination (Harbo, 1986) to generate reciprocal crosses between Capensis and Scutellata. The queens and drones from both subspecies were from colonies located in Stellenbosch, Western Cape, South

Africa (33.9321° S, 18.8602° E). The Scutellata parents were colonies originally brought from Groblershoop, Northern Cape (28.8730° S, 21.9790° E), which is located well north of the hybrid zone that separates the two subspecies (Beekman, Allsopp,

Wossler, & Oldroyd, 2008). For each subspecies we generated four replicate reciprocal crosses (Figure S2) using standard procedures for instrumental insemination (Harbo, 1986). Four half-sister virgin Scutellata queens were each inseminated with the semen of a different brother Capensis drone (replicates A, B, C and D, hereafter S x C, Figure S3.2). Four unrelated virgin Capensis queens were each inseminated with the semen of a different brother Scutellata drone (replicates E,

F, G and H, hereafter C x S, Figure S3.2).

Sources of biological material

We retained the carcass of each drone used for insemination for later genotyping and whole genome sequencing. We also retained a portion of each queen’s wing for later genotyping. When the colonies were checked seven days after insemination, each queen was laying eggs. We used five DNA microsatellites

(Estoup, Solignac, & Cornuet, 1994) to confirm that the diploid offspring within each colony were produced by the queen we inseminated and that she had not mis-mated

(see Dataset S1, Supplementary Material online). We confirmed that every worker offspring carried both parental alleles: one of the queen’s alleles and the father’s allele (Oldroyd et al., 2018). We collected between 150 and 300 diploid eggs into TRIzol Reagent (Invitrogen) for RNA-sequencing. The eggs were 24 – 72 hours old, so that the embryos were expressing zygotic transcripts (Pires, Freitas, Cristino, Dearden, & Simões, 2016). We chose to sequence embryos over other life stages because DNA methylation and small RNA molecules, putative mechanisms by which parents might influence gene expression in offspring, either degrade or are removed after the early stages of larval development (Drewell et al., 2014; Remnant et al., 2016). We therefore assumed that embryos have the greatest likelihood of showing parentally-biased gene expression. Following egg collection, we collected the queen from each colony into propanol (some queens were missing at the time of collection). Queens were stored at -20 °C for later whole-genome DNA-sequencing. For the colonies with missing queens we collected between 12 and 20 emerging workers for later sequencing, so that the genotype of the missing queen could be inferred.

We isolated genomic DNA from the thorax of all eight fathers and the thorax of three Capensis queens and one Scutellata queen for sequencing. For the missing queens we isolated genomic DNA from the hind leg of each worker and pooled the samples to generate a composite for DNA-sequencing. We then inferred the missing queen’s genotype by subtracting the drone’s genome from the genome of the pooled workers.

For genomic sequencing, we used an Illumina HiSeq2000 machine at the Australian

Genome Research Facility (Melbourne) in paired-end mode (2 x 150-bp reads).

Drones were sequenced in two lanes and queens were sequenced in a single lane. The pooled workers for each colony where a queen was not available were also sequenced in a single lane. For RNA-sequencing we used an Illumina HiSeq2000 in paired-end mode (2 x 100-bp reads). RNAs from each egg pool were also sequenced in a single lane. Raw sequence data can be downloaded at the NCBI Sequence Read

Archive (http://www.ncbi.nlm.nih.gov/sra) under BioProject ID PRJNA591427.

Genome and transcriptome analyses and the identification of differentially expressed genes

We used the whole genome sequence of the queens and drones from each colony to determine the genetic material that had been transmitted from each parent to offspring. The average sequencing depth across each queen and drone parental genome was 64 and 59 reads per base, respectively. To determine whether parent-specific and lineage-specific allele expression occurs in the offspring of C x S and S x

C crosses we aligned the pooled F1 embryo transcriptomes (FASTQ files) to the newly-created parent-specific reference genomes using the default settings in

TopHat2 (Trapnell, Pachter, & Salzberg, 2009). We developed the following pipeline to determine the genetic material that had been transmitted from each parent to

offspring. (This pipeline is available at: https://github.com/Social-Insect-Genomics/).

FASTQ files were aligned to the A. mellifera genome assembly AMELv4.5 using the default parameters of BWA 0.7.12 (Li & Durbin, 2009). These alignments were then imported into SAMtools (Li et al., 2009) and converted into BAM format. PCR duplicates were marked using Picard tools (http://picard.sourceforge.net, version

1.130). Following this, we called variants using Freebayes

(https://github.com/ekg/freebayes). We set the ploidy to diploid for each queen and haploid for each drone. We excluded SNPs using VCFtools 0.1.15 (Danecek et al., 2011) by the following criteria: i) mapping quality (MQ) less than 40, quality (QUAL) less than 30, genotype quality (GQ) less than 20 and read depth (DP) less than 10; ii) those that fell within non-nuclear and unplaced scaffolds and repetitive regions. We removed SNPs that were present in both parents using VCFtools 0.1.15 (Danecek et al., 2011). Identified SNPs were considered to be diagnostic and used to create a specific reference genome for each parent. We used VCFtools 0.1.15 (Danecek et al.,

2011) to replace the reference allele with the variant we detected with Freebayes

(https://github.com/ekg/freebayes). We annotated SNPs using the Apis mellifera official gene set amel_OGSv3.2 (Elsik et al., 2014).
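The filtering and diagnostic-SNP steps described above can be sketched in a few lines of Python. This is an illustration only and is not the pipeline in the repository cited above: the `Variant` record, the function names and the toy values are hypothetical. It assumes that the calls have already been parsed from a VCF file, that a SNP is excluded if it fails any one of the listed criteria, that "present in both parents" means the same alternate allele at the same site, and that all variants are single-nucleotide substitutions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One SNP call (hypothetical record; fields mirror the filters in the text)."""
    chrom: str
    pos: int                 # 1-based position on the scaffold
    ref: str
    alt: str
    mq: float                # mapping quality
    qual: float              # variant quality
    gq: float                # genotype quality
    dp: int                  # read depth
    excluded_region: bool = False   # non-nuclear, unplaced scaffold or repeat


def passes_filters(v):
    """Keep a call only if it clears every threshold given in the text."""
    return (v.mq >= 40 and v.qual >= 30 and v.gq >= 20
            and v.dp >= 10 and not v.excluded_region)


def diagnostic_snps(mother_calls, father_calls):
    """Return the filtered SNPs that are unique to the mother and to the father."""
    def site(v):
        return (v.chrom, v.pos, v.alt)

    mother = {site(v): v for v in mother_calls if passes_filters(v)}
    father = {site(v): v for v in father_calls if passes_filters(v)}
    shared = mother.keys() & father.keys()
    mother_only = [v for k, v in mother.items() if k not in shared]
    father_only = [v for k, v in father.items() if k not in shared]
    return mother_only, father_only


def parent_reference(reference_seq, chrom, snps):
    """Substitute each diagnostic allele into a copy of the reference sequence."""
    seq = list(reference_seq)
    for v in snps:
        if v.chrom != chrom:
            continue
        if seq[v.pos - 1] != v.ref:
            raise ValueError(f"reference mismatch at {v.chrom}:{v.pos}")
        seq[v.pos - 1] = v.alt
    return "".join(seq)


# Toy example: an 8 bp "scaffold" and a handful of invented calls.
reference = "ACATCAGT"
mother_calls = [
    Variant("Group1", 3, "A", "G", mq=60, qual=50, gq=30, dp=20),
    Variant("Group1", 5, "C", "T", mq=60, qual=50, gq=30, dp=20),
]
father_calls = [
    Variant("Group1", 5, "C", "T", mq=60, qual=50, gq=30, dp=20),  # shared
    Variant("Group1", 7, "G", "A", mq=60, qual=50, gq=30, dp=5),   # low depth
    Variant("Group1", 8, "T", "C", mq=60, qual=50, gq=30, dp=20),
]
mother_only, father_only = diagnostic_snps(mother_calls, father_calls)
mother_ref = parent_reference(reference, "Group1", mother_only)
father_ref = parent_reference(reference, "Group1", father_only)
```

In the toy data the shared site and the low-depth call are discarded, so each parent-specific sequence differs from the reference at a single position.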

For the pooled F1 workers (this pipeline is available at: https://github.com/Social-Insect-Genomics/) we removed all homozygous SNPs because these sites were uninformative. We only assessed heterozygous SNPs that were also present in each father and passed our variant filtering criteria above. We calculated the nucleotide ratio at each heterozygous site within the pooled F1

workers and used this ratio to determine the genotype of the missing queen. For example, consider a site at which the haploid reference genome is A and the father, which is hemizygous, has a different nucleotide, T. At this stage, we cannot determine which nucleotide the mother has. Assume that the pooled sequences of offspring are heterozygous A / T. We can now determine whether the mother was heterozygous (A / T) or homozygous (A / A) by assessing the ratio of nucleotides in the pooled offspring sequences. We determined that the mother was heterozygous if the percentage of reads with A was between 37.5% and 62.5%, and homozygous A / A if more than 62.5% of reads were A. Identified SNPs were then used to create the alternate reference genome for the queen. We used VCFtools 0.1.15 (Danecek et al.,

2011) to replace the reference allele with the variant detected with Freebayes

(https://github.com/ekg/freebayes). We annotated SNPs using the Apis mellifera official gene set amel_OGSv3.2 (Elsik et al., 2014).
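The decision rule for a missing queen, exactly as stated in the text, can be written as a small function. This is a sketch with hypothetical names. It assumes that the interval bounds are inclusive, and it returns `None` for sites where fewer than 37.5% of reads carry the reference allele, because the text does not say how such sites were treated.

```python
def infer_queen_genotype(ref_reads, alt_reads, ref="A", alt="T",
                         het_low=0.375, het_high=0.625):
    """Classify the missing queen at one site from pooled-offspring read counts.

    ref_reads / alt_reads are the numbers of pooled reads carrying the
    reference allele and the father's allele, respectively.
    """
    total = ref_reads + alt_reads
    if total == 0:
        return None                      # no coverage, no call
    ref_fraction = ref_reads / total
    if het_low <= ref_fraction <= het_high:
        return (ref, alt)                # called heterozygous
    if ref_fraction > het_high:
        return (ref, ref)                # called homozygous for the reference
    return None                          # case not described in the text


calls = {
    "balanced": infer_queen_genotype(50, 50),
    "mostly_ref": infer_queen_genotype(80, 20),
    "mostly_alt": infer_queen_genotype(10, 90),
}
```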

To determine whether parent-specific and lineage-specific allele expression occurs in the offspring of C x S and S x C crosses we aligned the pooled F1 egg transcriptomes (FASTQ files) to the newly-created parent-specific reference genomes using the default settings in TopHat2 (Trapnell et al., 2009) (the potential allelic expression patterns within the F1 generation at an informative polymorphic site are presented in Figure S3). We removed all reads with mismatches. Removing mismatches ensures that reads only align to the parental genome where the SNP is present or absent. We calculated the number of reads overlapping each informative site. This pipeline is available at: https://github.com/Social-Insect-Genomics/.

From the transcriptome alignment we imported the read counts at each SNP into R (R Core Team, 2013). We used an existing statistical approach to identify parent-specific and lineage-specific allele expression in honey bees and other organisms (Galbraith et al., 2016; Kocher et al., 2015; Wang & Clark, 2014; Wang,

Werren, & Clark, 2016). We used generalized linear mixed effects models (GLMMs) to identify parent-specific and lineage-specific expression at each gene using the read count at each SNP. Genes that were significantly biased after false discovery corrections and passed our stringent filtering approach were declared as parentally-biased.

To assess the function of genes that showed significant parent-specific effects we uploaded the gene IDs to DAVID Bioinformatics Resources 6.7 (Huang, Sherman, & Lempicki, 2009) and InsectBase (Yin et al., 2016).

To determine whether DNA methylation facilitates parent-specific expression patterns, we assessed methylation patterns at these genes. We assessed whether genes with parent-specific expression were more likely to be methylated than the genome-wide average using published embryo methylomes (Remnant et al., 2016).

Methylated genes were defined as those that had any region within a gene (3’ UTR, 5’ UTR, introns and exons) containing at least one methylated site, and non-methylated genes as those that had no methylation. We removed methylated sites with a read depth less than four and required genes to be methylated across both datasets (Remnant et al., 2016).
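The definition of a methylated gene given above can be expressed as a set operation. The sketch below is illustrative: the input layout (one list of `(gene_id, read_depth)` pairs per methylome, one pair per methylated site) and the gene identifiers are invented, and it assumes that each site has already been assigned to a gene.

```python
def methylated_genes(datasets, min_depth=4):
    """Return genes with at least one adequately covered methylated site
    in every dataset.

    datasets: list of iterables of (gene_id, read_depth) pairs.
    """
    per_dataset = [
        {gene for gene, depth in sites if depth >= min_depth}
        for sites in datasets
    ]
    if not per_dataset:
        return set()
    return set.intersection(*per_dataset)


# Two toy methylomes (invented gene IDs and depths).
methylome_1 = [("geneA", 5), ("geneB", 3), ("geneC", 10)]
methylome_2 = [("geneA", 4), ("geneB", 8), ("geneC", 2), ("geneC", 6)]
methylated = methylated_genes([methylome_1, methylome_2])
```

Here `geneB` is dropped because its only site in the first methylome has a depth below four, whereas `geneC` is retained because one of its sites in the second methylome passes the depth filter.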

From the whole genome sequencing data, we assessed whether interactions between the nuclear and mitochondrial genomes could cause differential nuclear gene expression, depending on the direction of the cross. We compared the alignments of the mitochondrial genomes between the two directions to identify unique SNPs between the two. We annotated non-synonymous SNPs in the mitochondrial genomes using SnpEff (Cingolani et al., 2012).

Statistical analyses

To assess parent-specific allele expression patterns that were derived from

Capensis drones, Capensis queens, Scutellata drones and Scutellata queens we used the read counts obtained from the transcriptome alignment and analysed S x C

(replicates A, B, C and D) and C x S (replicates E, F, G and H) colonies separately.

We used the ‘glmmPQL’ function from the MASS library within R (R Core Team, 2013) to fit a Poisson, non-linear, mixed-effects model with a logistic link function. This analysis assessed the effects of the parent-of-origin, replicate colony and SNP location on the read count at each SNP that we identified from sequencing each parental genome. Queen and drone subspecies were modelled as fixed effects.

Colony number and SNP location were modelled as random effects.

We used a false-discovery threshold of p < 0.05 to correct for multiple testing

(Benjamini & Hochberg, 1995). We required the parental bias to be consistent across each SNP and replicate. For example, consider a transcript that has a significant

paternal bias with one informative SNP. This transcript can only be considered paternally biased if the SNP is paternally biased within each colony (replicates A, B, C and D, or E, F, G and H). Now, consider a transcript with two informative SNPs that has a significant paternal bias. This transcript can only be considered paternally biased if each SNP exhibits the same paternal bias within each colony. We followed the same method for n SNPs within a transcript. Further, for a transcript to be considered parentally biased, the percentage of maternal or paternal reads overlapping each SNP within each transcript had to be greater than 60% in the direction of the significant bias (Wang & Clark, 2014). This threshold is based on an evaluation of alternative methods for identifying parent-of-origin effects in multiple organisms (Wang & Clark, 2014).
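The two filters described above, false discovery rate correction followed by the requirement for a consistent bias of more than 60% at every SNP in every colony, can be sketched as follows. The function names and data layout are hypothetical, the p-values and read fractions are invented, and the GLMM that produces the p-values is not reproduced here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def consistently_biased(paternal_fraction, direction, threshold=0.60):
    """Check that every SNP in every colony is biased in the same direction.

    paternal_fraction: dict mapping (snp_id, colony) to the fraction of
    reads at that SNP that carry the father's allele.
    """
    values = paternal_fraction.values()
    if direction == "paternal":
        return all(f > threshold for f in values)
    if direction == "maternal":
        return all(f < 1.0 - threshold for f in values)
    raise ValueError("direction must be 'paternal' or 'maternal'")


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])

# One transcript with two informative SNPs in colonies A to D.
fractions = {(snp, colony): 0.7
             for snp in ("snp1", "snp2") for colony in "ABCD"}
keep = consistently_biased(fractions, "paternal")

# The same transcript fails if a single SNP in a single colony is unbiased.
fractions_one_unbiased = dict(fractions)
fractions_one_unbiased[("snp2", "C")] = 0.55
drop = consistently_biased(fractions_one_unbiased, "paternal")
```

A transcript would be declared parentally biased only if its adjusted p-value is below 0.05 and `consistently_biased` returns `True` for the direction of the significant effect.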

In a second analysis, we again fitted a Poisson, non-linear, mixed-effects model with a logistic link function to assess the effects of parent, subspecies, parent x subspecies interaction, colony-ID and SNP location on the read count at each informative SNP. We followed the same methods as above. This analysis is designed to detect parent-specific effects irrespective of the subspecies of the father.

Subsampling biological replication

Studies on the effects of parent-of-origin on gene expression can be prone to false positives arising from individual-specific gene expression that is unrelated to parent-

of-origin (Forstmeier, Wagenmakers, & Parker, 2017; Libbrecht, Oxley, Keller, &

Kronauer, 2016). To control for individual-specific false positives we required that a gene showed differential expression in the same direction when we assessed C x S and S x C colonies separately and when we assessed all eight colonies together. To explore the effect of biological replication on the number of genes found to be differentially expressed we generated new data sets by removing one C x S and one

S x C colony (resulting in three S x C and C x S colonies) or two C x S and two S x C colonies. We then re-evaluated the number of genes that showed parent-of-origin gene expression differences.
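The subsampled data sets described above can be enumerated with the standard library. This sketch lists every possible way of keeping three, or two, colonies from each cross direction; the text does not state how many of these combinations were actually analysed, so the full enumeration is an assumption. Colony labels follow the replicate names used in the Methods.

```python
from itertools import combinations, product

S_X_C = ["A", "B", "C", "D"]   # Scutellata queen x Capensis drone
C_X_S = ["E", "F", "G", "H"]   # Capensis queen x Scutellata drone


def subsampled_datasets(n_keep):
    """List every data set that keeps n_keep colonies from each cross direction."""
    return [
        sxc + cxs
        for sxc, cxs in product(combinations(S_X_C, n_keep),
                                combinations(C_X_S, n_keep))
    ]


three_per_direction = subsampled_datasets(3)   # one colony removed per direction
two_per_direction = subsampled_datasets(2)     # two colonies removed per direction
```

Each element is a tuple of colony labels on which the parent-of-origin analysis would be repeated.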

Results

We performed reciprocal crosses between Capensis and Scutellata. Each queen was inseminated with the semen of a single drone of the alternate subspecies

(four Capensis queen x Scutellata drone crosses (C x S) and four Scutellata queen x

Capensis drone crosses (S x C)) resulting in eight colonies of hybrid workers that were genetically similar and carried an intact parental chromosome set from the

Capensis parent and another from the Scutellata parent (Figure S2). We sequenced the genomes of each parent to identify parent-specific alleles, and the transcriptome of a pool of embryos (eggs) from each colony to determine whether there was an expression bias towards one or other parent’s allele in early development, and whether the direction of the cross altered expression.

Parent-Specific Allele Expression

From the genomes of the parents, we identified 7,138 SNPs that allowed us to identify each parent’s allele in each cross (Figure S2). These SNPs allowed us to identify the parent-of-origin of 2,251 genes (Dataset S1). Since there are 13,285 genes in the honey bee genome (Elsik et al., 2014), we were able to assess differential expression of parental alleles for 17% of genes. Of these genes, 96% were expressed in embryos.

We identified 32 genes that showed evidence of differential expression of one or the other parent’s allele consistently in at least one cross direction (corrected p-values following the false discovery rate control method (Benjamini & Hochberg 1995) are denoted by p’, generalized linear mixed effects models (GLMMs): all p’ < 0.05, Table

1, Figure 3.1). Where a gene showed differential expression, there was a strong bias towards expression of the father’s allele. Twenty-one genes showed increased expression of the paternal allele relative to the maternal allele in S x C colonies (i.e. where the father was Capensis), and four were paternally biased in C x S colonies

(Figure 3.1). Only three genes were maternally-biased in S x C colonies and two were maternally biased in C x S colonies (GLMMs: all p’ < 0.05, Figure 3.1). The putative functions of the 21 paternally-biased genes in Apis and other organisms are given in

Table 3.1. Six of the 21 genes (29%) are plausibly associated with female reproduction.

Six genes showed a consistent bias towards expression of the father’s allele across all eight colonies (GLMMs: all p’ < 0.05, Table S3.1, Figure 3.1), providing

evidence for paternal bias in gene expression regardless of the subspecies of the father. The putative functions of these six genes in Apis and other organisms are given in Table 1. Two of the six genes are associated with female reproduction and one with gametogenesis. No gene showed over-expression of the maternal allele across all eight colonies (GLMMs: all p’ > 0.05, Figure 3.1). We found no evidence of lineage-specific differential gene expression in embryos (GLMMs: all p’ > 0.05,

Figure 3.1). That is, no gene was over-expressed simply because it was derived from

Scutellata or Capensis, regardless of the sex of the parent.

Table 3.1 Summary of the genes which exhibit a paternal bias in allele expression in S x C colonies.

Gene ID | Gene Name | Role in reproduction
GB40217 | Cuticlin-1 | Y (Jovine et al., 2002). Associated with sperm receptors.
GB40614 | Vitamin K-dependent gamma-carboxylase | N
GB41915 | Dopamine N-acetyltransferase | Y (Cheng et al., 2012). Linked to the production of melatonin. Melatonin initiates physiological processes such as reproduction and diapause.
GB41989 | Midasin | Y (Chantha et al., 2010; Waiho et al., 2019). Essential for female gametogenesis progression in Arabidopsis thaliana. Linked to reproduction pathways in mud crabs.
GB42426 | Aminopeptidase N | N
GB43271 | Acetylcholine receptor subunit alpha-like | N
GB44009 | Biotin protein ligase | Y (Watts et al., 2018). Crucial for early embryonic development in C. elegans.
GB44884 | Splicing factor 3B subunit 3 | Y. Linked to spermatogenesis in Drosophila pseudoobscura.
GB46020 | E3 ubiquitin-protein ligase Praja-2 | N
GB46687 | Uncharacterised Protein | ?
GB47086 | Max-interacting protein 1 | N
GB47518 | Protein tipE | N
GB47947 | Titin | N
GB48159 | Multiple C2 and transmembrane domain-containing protein 1 | N
GB48331 | GTP-binding protein REM 1 | N
GB50573 | Pyridoxal 5'-phosphate synthase subunit PdxS | N
GB50825 | Odorant receptor 22c | Y (An et al., 2016). Associated with host-seeking behaviour, mating, and oviposition.
GB53310 | Glutaminase kidney isoform, mitochondrial | N
GB53407 | Translation initiation factor IF-2 | N
GB53538 | Nischarin | N
GB53596 | Choline transporter-like protein 1 | N

Figure 3.1 Genes with parental bias in gene expression in embryos, displayed as the proportion of maternal and paternal reads identifiable by an informative SNP.

Each dot represents one SNP. The x-axis represents the maternal or paternal bias for each SNP within embryos from S x C colonies (no bias = 0.5). The y-axis represents the maternal or paternal bias for each SNP within embryos from C x S colonies. Red circles represent genes that were paternally biased across all colonies. Yellow circles represent genes that were paternally biased within S x C crosses. Orange circles represent genes that were paternally biased within C x S crosses. Blue circles represent genes that were maternally biased within S x C crosses. Green circles represent genes that were maternally biased within C x S crosses. Grey circles represent SNPs from genes that were not significantly biased. Note: There can be

more coloured dots than the number of parentally-biased genes because a gene can contain more than one SNP.

Cyto-nuclear Interactions

We assessed whether interactions between the nuclear and mitochondrial genomes could cause differential nuclear gene expression, depending on the direction of the cross. The mitochondrial genomes of the two parents of our crosses differed by just six non-synonymous SNPs (in a 16,343 bp genome). These minor differences are unlikely to explain the differential expression of paternally-inherited nuclear genes. No mitochondrial genes were differentially expressed in embryos.

Biological replication

For the transcriptome analyses above we used a stringent approach that required a gene to show the same parentally-biased expression pattern towards the same sex in all replicate colonies. For the 21 genes that showed paternally-biased expression in S x C colonies only, we explored the effects of biological replication on the number of false positives and the robustness of our conclusion that there is a bias towards the expression of paternal alleles by determining the number of differentially expressed genes when we considered different subsets of colonies. The average number of parentally-biased genes was 28 (SD ± 17.70) when we considered just two S x C colonies, and 14 genes (SD ± 22.34) when we considered three S x C colonies (Figure 3.2). There were significantly more paternally-biased than maternally-biased genes when we assessed only two (t8 = 3.36, P = 0.0065, Figure 3.2)

and three S x C colonies (t3 = 3.98, P = 0.0044, Figure 3.2). Therefore, although the number of parentally-biased genes changed with the number of colonies assessed, there was consistent over-expression of paternal alleles (Figure 3.2), which supports the conclusion that this effect is real. We also found the same pattern for C x S colonies and all eight colonies (see Supplementary Information results section for C x S colonies (Figure S3.4) and all eight colonies (Figure S3.5)).

Figure 3.2 Relationship between the number of parentally-biased genes identified and the level of biological replication. Each point represents the number of genes with parent-specific effects for a different combination of replicate comparisons using S x C colonies. Red dots denote paternally-biased genes in S x C colonies and blue dots denote maternally-biased genes in S x C colonies.

No Evidence That DNA Methylation Regulates Parent-Specific Allele Expression

We used published embryo methylomes (Remnant et al., 2016) to ascertain whether genes showing parent-of-origin expression bias were consistently methylated and were more likely to be methylated than the genome-wide average.

The number of genes for which we could determine the presence or absence of methylation was 12,843. Of these, 4,019 (31%) are methylated in embryos. Genes that showed parent-of-origin expression bias in one cross direction showed a similar frequency of methylation to the genome-wide frequency: nine of 21 (43%) paternally-biased genes in S x C colonies (Fisher’s exact test, p = 0.274) and zero out of four in C x S colonies. None of the maternally-biased genes in C x S and S x C colonies are methylated. Of the six genes that were paternally-biased regardless of cross direction, only one (17%) is methylated, which indicates again that DNA methylation is unlikely to play an essential role in biasing gene expression in a parent-of-origin manner (Fisher’s exact test, p = 0.851).
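A comparison of this kind can be illustrated with a two-sided Fisher's exact test on a 2 x 2 table, written here in plain Python. This is a sketch rather than a reconstruction of the analysis: the function name is hypothetical, and the way the table is built for the 21 paternally-biased genes (biased genes in one row, all remaining assessed genes in the other) is an assumption, so the resulting value need not match the p-value reported above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def weight(k):
        # Unnormalised probability of a table with k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    extreme = sum(w for w in (weight(k) for k in range(k_min, k_max + 1))
                  if w <= observed)
    return extreme / comb(total, col1)


# Counts taken from the text: 9 of 21 biased genes are methylated, and
# 4,019 of 12,843 assessed genes are methylated overall.
biased_meth, biased_total = 9, 21
all_meth, all_total = 4019, 12843
p_value = fisher_exact_two_sided(
    biased_meth,
    biased_total - biased_meth,
    all_meth - biased_meth,
    (all_total - all_meth) - (biased_total - biased_meth),
)
```

Working with integer weights avoids floating-point comparisons when deciding which tables count as at least as extreme as the observed one.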

Discussion

In keeping with the predictions of the Kinship Theory of Genomic Imprinting

(Haig, 1992; Queller, 2003), the offspring of Capensis fathers exhibit paternal biases in gene expression much more strongly than offspring of Scutellata fathers (Figure

3.1). Biased expression was not seen when the same gene was transmitted by a

Capensis queen. This is the first evidence of parent-specific gene expression that is unique to a particular social insect subspecies, and provides the most compelling evidence for paternal manipulation of allele expression in social insects to date. We argue that this is because Capensis fathers, with their capacity to sire workers that reproduce thelytokously, experience stronger selection to influence the reproductive success of their offspring than males of a normal arrhenotokous subspecies such as

Scutellata (Figure S3.1). A laying worker of Capensis can potentially lay hundreds of clonal eggs that can develop as females, both workers (that will themselves be reproductive) and queens. In contrast, an arrhenotokous Scutellata worker can only produce males that have poor mating prospects (Berg, Koeniger, Koeniger, & Fuchs,

1997).

In addition, there was a consistent paternal bias in gene expression for six genes irrespective of the subspecies of the father. It may be that these genes have been selected for paternal manipulation in all honey bee populations, not just Capensis.

Thus, although we aimed to use Capensis as a model system to demonstrate parent-of-origin effects, we confirm that parent-specific effects are a widespread phenomenon in honey bees (Galbraith et al., 2016; Kocher et al., 2015), and that some genes are consistently modified by honey bee males (Table S3.1, Figure 3.1). It is also important to recognize that due to the similarity of the Scutellata and Capensis genomes, we were only able to identify SNPs for 17% of genes. Therefore, the paternally-biased genes that we have identified are likely to be a fraction of the overall number of genes with a paternal bias in expression in honey bees.

Consistent with the KTGI (Drewell et al., 2012), genes that showed biased expression of the paternal allele are associated with female reproduction (Table 3.1, Table S3.1).

Prostaglandins are associated with female reproduction and oviposition behaviour

(Stanley & Kim, 2011). We show here that Prostaglandin E2 receptor EP3 subtype is upregulated when transmitted from a male, suggesting that the paternal allele is increasing the sensitivity of the embryo to prostaglandin. Dopamine N-acetyltransferase is linked to the production of melatonin (Cheng, Liao, & Lyu,

2012). In insects, melatonin initiates physiological processes such as reproduction and diapause (Cheng et al., 2012). The Cuticlin-1 protein has a Zona pellucida (ZP) domain which is associated with sperm receptors (Jovine, Qi, Williams, Litscher, &

Wassarman, 2002). The other genes showing paternally-biased gene expression

(Table 3.1, Table S3.1) may also be indirectly involved in reproductive functions, but it is unclear how their over-expression in embryos might function to increase worker reproduction.

Cyto-nuclear interactions cannot explain our results. Only six non-synonymous SNPs differentiated the mitochondrial genomes of the parents of our crosses. It is unlikely that these differences produce cyto-nuclear interactions that influence parent-specific gene expression in honey bees. Further, because mitochondrial DNA is maternally inherited, cyto-nuclear interactions are likely to produce parent-specific effects in one cross direction only. We observed a preponderance of paternally-biased gene expression differences, regardless of the direction of the cross.

DNA methylation is often suggested as a prime candidate for parent-specific modifications in the honey bee (Drewell et al., 2012). However, we found no evidence for a relationship between DNA methylation and parent-specific allele expression. Indeed, five of the six genes with parental bias in gene expression are not methylated (Remnant et al., 2016). Further, only nine of the 21 genes which exhibited parent-specific effects in Capensis males were methylated. Galbraith et al.

(Galbraith, Yi, & Grozinger, 2016) also found that there is minimal overlap between parentally-biased genes and known methylated regions within honey bees.

The low level of methylation in paternally-biased genes and surrounding regions and the lack of cyto-nuclear interactions suggest that these processes are unlikely to be involved in parent-specific effects in male honey bees. This raises the question of how males are able to influence gene expression in their offspring embryos. Long non-coding RNAs (lncRNAs) are often associated with parental manipulation (see Monk, Mackay, Eggermann, Maher, & Riccio, 2019, for a recent review) and could be responsible for maintaining a paternal bias in the honey bee.

Alternatively, we note that small RNAs (sRNAs) are an abundant component of the seminal fluid exosome across a broad range of taxa (Chen, Yan, & Duan, 2016). Small

RNAs are likely responsible for transgenerational epigenetic inheritance in

Caenorhabditis elegans (Rechavi & Lev, 2017; Woodhouse et al., 2018), and for maternal control of offspring sex in jewel wasps (Verhulst, Beukeboom, & van de Zande, 2010), which belong to the same taxonomic order as the honey bee. We therefore suggest that both lncRNAs and small RNAs provide a plausible mechanism

underlying parental manipulation of gene expression in honey bees, though this is clearly an open question.

Parent-specific allele expression patterns are heterogeneous across two published honey bee studies (Galbraith et al., 2016; Kocher et al., 2015) and this one. This is unsurprising given that the three studies assessed allele expression in different tissues, subspecies and life stages. Therefore, the differences are likely due to tissue-specific or age-specific allele-expression effects. Further, the colony conditions were different between each experiment. The presence or absence of the queen from a colony affects gene expression in workers (Holman, Helanterä, Trontti, & Mikheyev, 2019).

Kocher et al. (2015) reared offspring produced from each reciprocal cross in queenright colonies with brood, conditions where workers rarely reproduce

(Winston, 1987). In contrast, Galbraith et al. (2016) reared offspring from each reciprocal cross in colonies without a queen or brood, conditions that trigger workers to lay eggs. These disparities could potentially explain some of the variation in parent-specific allele expression patterns between these studies. Alternatively, these differences may reflect the different patterns of parent-specific allele expression between different honey bee subspecies. In this experiment we used two

African subspecies (Capensis and Scutellata). In contrast, Kocher et al. (2015) and

Galbraith et al. (2016) utilised European (mostly A. m. ligustica) and Africanized bees, a hybrid strain of mostly Scutellata origin that has emerged in the New World

(Wallberg et al., 2014; Whitfield et al., 2006).

Our stringent approach required that genes showed differential expression in the same direction in all colonies in a sample. This approach likely excluded some genes that typically do show parent-of-origin expression differences, but did not do so in particular colonies. Thus, with greater replication, genes showing evidence of parent-specific effects become rarer. Nonetheless, across all levels of biological replication and each analysis (C x S colonies and S x C colonies separately and all eight colonies together), our data show a consistent pattern of bias towards the expression of paternal alleles over maternal alleles (Figure 3.2, Figure S3.4, Figure

S3.5).

We identified six genes that showed parent-specific expression differences regardless of cross direction, fewer than the 273 (Galbraith et al., 2016) and 46 (Kocher et al., 2015) previously reported for honey bees. However, we found similarly high numbers of parentally-biased genes to previous studies when we assessed only two S x C and C x S colonies or three S x C and C x S colonies via subsampling. We propose that some of the differences between studies are due to the different levels of replication.

Conclusion

Crosses involving Capensis fathers are more likely to reveal parent-of-origin effects than crosses between other sub-species because of the unique ability of Capensis workers to produce diploid offspring (thelytoky), thereby providing strong selective pressures for males to modify genes related to female reproduction. Our whole-genome and transcriptome approach provides evidence that supports this prediction. This clear-cut example suggests that parent-specific effects are likely to be present in many other social insect species and possibly across a broader range of insects.


Acknowledgements

This work was funded by the Australian Research Council projects DP1501101985 to BP Oldroyd and A Ashe and DP180101696 to BP Oldroyd and A Zayed. We thank Theresa Wossler and Chris Fransman for support with field work.


Supplementary Information

Supplementary Methods

Subsampling biological replication

Studies on the effects of parent-of-origin on gene expression can be prone to false positives arising from individual-specific gene expression that is unrelated to parent-of-origin (Forstmeier et al., 2017; Libbrecht et al., 2016). To control for individual-specific false positives, we required that a gene showed differential expression in the same direction when we assessed C x S and S x C colonies separately and when we assessed all eight colonies together. To explore the effect of biological replication on the number of genes found to be differentially expressed, we generated new data sets by removing one C x S and one S x C colony (resulting in three S x C and three C x S colonies) or two C x S and two S x C colonies (resulting in two of each). We then re-evaluated the number of genes that showed parent-of-origin gene expression differences.

When we assessed C x S and S x C colonies separately, there were four different ways to select the colonies to determine the mean number of parentally-biased genes when we assessed only three replicates in each direction. We further replicated this analysis by removing two replicates from each direction. There were six different ways to select the colonies to determine the mean number of parentally-biased genes when we assessed only two replicates in each direction.

When we assessed all eight colonies together, there were sixteen different ways to select the colonies to determine the mean number of parentally-biased genes when we assessed only three replicates in each direction (Matrix S3.1). We further replicated this analysis by removing two replicates from each direction. There were thirty-six different ways to select the colonies to determine the mean number of parentally-biased genes when we assessed only two replicates in each direction (Matrix S3.2).
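The counts of colony subsets given above follow from simple combinatorics: choosing three or two colonies out of four within one cross direction, and pairing every subset from one direction with every subset from the other. A minimal Python sketch of that enumeration is shown below. The colony labels are hypothetical placeholders, and the sketch only enumerates the subsets; it does not reproduce the differential expression analysis itself.

```python
from itertools import combinations, product

# Hypothetical labels: four replicate colonies per cross direction.
cxs = ["CxS_1", "CxS_2", "CxS_3", "CxS_4"]
sxc = ["SxC_1", "SxC_2", "SxC_3", "SxC_4"]


def subsets_one_direction(colonies, k):
    """All ways of keeping k colonies from a single cross direction."""
    return list(combinations(colonies, k))


def subsets_both_directions(first, second, k):
    """All pairings of a k-colony subset of `first` with a k-colony subset of `second`."""
    return list(product(combinations(first, k), combinations(second, k)))


for k in (3, 2):
    separate = len(subsets_one_direction(cxs, k))
    together = len(subsets_both_directions(cxs, sxc, k))
    print(f"{k} replicates per direction: {separate} subsets separately, {together} together")
```

With four colonies per direction this yields 4 and 16 subsets for three replicates, and 6 and 36 subsets for two replicates, matching the numbers stated in the text.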

Matrix S3.1 Identifying the number of parentally-biased genes assessing only three biological replicates. There are sixteen different ways in which we can randomly shuffle the eight colonies to determine the number of parentally-biased genes when we assess only three replicates in each direction. Here, Ci denotes the colonies compared.

Matrix S3.2 Identifying the number of parentally-biased genes across each possible combination when assessing only two biological replicates. There are thirty-six different ways in which we can shuffle the eight colonies to determine the number of parentally-biased genes when we assess only two replicates in each direction. Here, Ci denotes the colonies we compared.


Supplemental Results

Biological replication

For the transcriptome analyses we used a stringent approach that required a gene to show the same parentally-biased expression pattern towards the same sex in all replicate colonies. For the four genes that showed paternally-biased expression in C x S colonies, we explored the effects of biological replication on the number of false positives and false negatives by determining the number of differentially expressed genes when we considered different subsets of colonies. The average number of parentally-biased genes was five (SD ± 5.14) when we considered just two C x S and two S x C colonies, and six (SD ± 5.82) when we considered three C x S and three S x C colonies (Figure S3.4). Thus, the number of parentally-biased genes decreased as the number of colonies assessed increased (Figure S3.4). Further, the variance in the number of paternally and maternally-biased genes between colonies also increased as the number of colonies considered decreased (Figure S3.4). The number of parentally-biased genes ranged from zero to 15 across all possible combinations of three S x C colonies and from zero to 14 across all possible combinations of two colonies. Importantly, we found vastly more paternally-biased than maternally-biased genes when we assessed only two (t4 = 2.21, P = 0.0045, Figure S3.4) and three C x S and three S x C colonies (t3 = 2.91, P = 0.023, Figure S3.4). This data set therefore shows a similar pattern of over-expression of paternal alleles rather than maternal alleles (Figure S3.4), giving further confidence that this effect is real.

For the six genes that showed paternally-biased expression in all eight colonies, the average number of parentally-biased genes was 28 (SD ± 17.70) when we considered just two S x C colonies, and 14 genes (SD ± 22.34) when we considered three S x C colonies (Figure S3.5). Thus, the number of parentally-biased genes decreased as the number of colonies assessed increased (Figure S3.5). The variance in the number of paternally and maternally-biased genes between colonies decreased as the number of colonies considered increased (Figure S3.5). The number of parentally-biased genes ranged from zero to 33 across all possible combinations of three S x C colonies and from zero to 115 across all possible combinations of two colonies. Importantly, we found vastly more paternally-biased than maternally-biased genes when we assessed only two (t8 = 3.36, P < 0.0065, Figure S3.5) and three C x S and three S x C colonies (t3 = 3.98, P < 0.0044, Figure S3.5), giving further confidence this paternal effect is real.


Supplementary Tables

Table S3.1 Summary of the genes which exhibit a paternal bias in allele expression in all crosses.

Gene ID | Scaffold | SNP position | Gene name | Length of gene (bp) | Number of exons in gene | Number of SNPs within gene
GB40701 | 12.13 | 2174392, 2184524 | Cuticlin-1 | 28,293 | 9 | 2
GB41915 | 8.9 | 1473275 | Dopamine N-acetyltransferase | 1,088 | 2 | 1
GB47947 | 7.19 | 296232 | Titin | 14,806 | 8 | 1
GB49727 | 1.30 | 139220 | Prostaglandin E2 receptor EP3 subtype (LOC724237) | 11,926 | 6 | 1
GB50970 | 10.26 | 1131609 | 3-ketoacyl-CoA thiolase, mitochondrial | 2,799 | 8 | 1
GB54806 | 8.12 | 76662 | Facilitated trehalose transporter Tret1-like (LOC551553) | 12,348 | 5 | 1


Supplementary Figures

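[Figure S3.1 diagram. Panel A: Arrhenotokous colony (Scutellata and majority of honey bee subspecies). Panel B: Thelytokous colony (unique to Capensis). In each panel, the labels 1 and 2 mark the reproductive outcomes described in the caption.]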

Figure S3.1 Relatedness between individuals within an arrhenotokous (A) and thelytokous (B) honey bee colony and the potential reproductive conflict between queens and drones.

Queens (full circle and crown) mate with multiple males (semicircle, pink and purple). Colonies are comprised of subfamilies of workers, each of which shares the same father. This generates the potential for conflict between males to increase the reproductive success of their female offspring. A. Normal arrhenotokous subspecies (Scutellata). A father that is able to influence the expression of genes in offspring so that his daughters are more likely to develop as a reproductive worker (1) or queen (2) has a greater probability of reproductive success than another male that fails to do so. B. Thelytokous subspecies (Capensis). The evolutionary consequences of thelytoky are profound because Capensis workers have the potential to be genetically reincarnated as a queen (1). The unique ability of Capensis workers to produce both queens and workers without mating enhances the evolutionary incentive of Capensis drones to modify genes that increase the reproductive success of their female offspring. Capensis fathers have the potential to ‘father’ queens if their worker offspring produce the next queen thelytokously (see Capensis section of Introduction for further details). Capensis drones also maintain the same evolutionary incentive as their arrhenotokous counterparts to have their immediate offspring develop into the next queen (2).


Figure S3.2 Illustration of the reciprocal crosses and experimental design.

We generated four replicate reciprocal crosses (four colonies with Capensis fathers = replicate A, B, C, D and four colonies with Scutellata fathers = E, F, G, H). The different colors and shapes distinguish the queen (blue and diploid) and the drone (pink and haploid). The letters (G and A) represent a polymorphic region between the two parental genomes. We can determine the genetic material transmitted from each parent to their offspring using this single region. In this example, F1 workers from Capensis and Scutellata fathers will be genotypically identical (G / A), but have inherited the alleles from parents of different sex. By combining this data with the gene expression data, we can determine regions of the Capensis and Scutellata genome which exhibit parent-of-origin and lineage-of-origin patterns in allele expression.
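The logic of using reciprocal crosses to separate parent-of-origin from lineage-of-origin effects can be illustrated with a short Python sketch. It takes, for one informative SNP, the fraction of F1 reads carrying the paternal allele in each cross direction. The function name, the category labels and the 0.6 threshold are hypothetical choices made for illustration only; the analysis in this chapter relied on statistical tests across replicate colonies rather than a fixed cut-off.

```python
def classify_site(pat_frac_capensis_father, pat_frac_scutellata_father, threshold=0.6):
    """Classify one SNP from the paternal-allele read fraction in each cross direction.

    pat_frac_capensis_father: fraction of F1 reads from the paternal allele in
        colonies with Capensis fathers (the paternal allele is the Capensis allele).
    pat_frac_scutellata_father: the same fraction in colonies with Scutellata
        fathers (the paternal allele is the Scutellata allele).
    threshold: hypothetical cut-off above which an allele counts as over-expressed.
    """
    low = 1.0 - threshold
    c, s = pat_frac_capensis_father, pat_frac_scutellata_father
    if c > threshold and s > threshold:
        return "paternal bias"            # father's allele favoured in both directions
    if c < low and s < low:
        return "maternal bias"            # mother's allele favoured in both directions
    if c > threshold and s < low:
        return "Capensis lineage bias"    # Capensis allele favoured whichever parent carries it
    if c < low and s > threshold:
        return "Scutellata lineage bias"  # Scutellata allele favoured whichever parent carries it
    return "no consistent bias"


print(classify_site(0.80, 0.75))
print(classify_site(0.80, 0.20))
```

The first call returns a parent-of-origin (paternal) classification because the father's allele dominates in both directions; the second returns a lineage classification because the Capensis allele dominates regardless of which parent transmitted it.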


[Figure S3.3 diagram. Columns: S x C colonies (cross G G x A) and C x S colonies (cross A A x G). Vertical axis: F1 read count for the G and A alleles. Upper panels: paternal bias, maternal bias, no bias. Lower panels: Capensis paternal bias, Scutellata maternal bias, Scutellata paternal bias, Capensis maternal bias.]

Figure S3.3 Hypothetical allelic expression patterns within the F1 generation at an informative polymorphic site (G / A). At each site there could be a paternal bias, maternal bias, no bias, a Capensis paternal or maternal bias, or a Scutellata paternal or maternal bias.


Figure S3.4 Relationship between the number of parentally-biased genes identified and the level of biological replication in C x S colonies.

Each point represents the number of genes with parent-specific effects for a different combination of replicate comparisons using C x S colonies. Red dots denote paternally-biased genes in C x S colonies and blue dots denote maternally-biased genes in C x S colonies.


Figure S3.5 Relationship between the number of parentally-biased genes identified and the level of biological replication in all eight colonies.

Each point represents the number of genes with parent-specific effects for a different combination of replicate comparisons. Blue dots denote maternally-biased genes and red dots denote paternally-biased genes.


Chapter Four

Social Parasitism in the Honeybee (Apis mellifera) Is Not Controlled by a Single SNP


Abstract

The Cape bee (Apis mellifera capensis) is a subspecies of the honeybee, in which workers commonly lay diploid unfertilized eggs via a process known as thelytoky. A recent study aimed to map the genetic basis of this trait in the progeny of a single capensis queen where workers laid either diploid (thelytokous) or haploid (arrhenotokous) eggs. A nonsynonymous single nucleotide polymorphism (SNP) in a gene of unknown function was reported to be strongly associated with thelytoky in this colony. Here, we analyze genome sequences from a global sample of A. mellifera and identify populations where the proposed thelytoky allele at this SNP is common but thelytoky is absent. We also analyze genome sequences of three capensis queens produced by thelytoky and find that, contrary to predictions, they do not carry the proposed thelytoky allele. The proposed SNP is therefore neither sufficient nor required to produce thelytoky in A. mellifera.

Key words: thelytoky, social insects, genetic mapping, genome variation.


Main Text

In colonies of the honeybee, Apis mellifera, workers are female but do not usually lay eggs. The queen dominates reproduction and inhibits the activation of the workers’ ovaries using pheromones (Butler, 1957; Butler et al., 1962). However, when colonies are queenless, workers typically lay unfertilized haploid eggs, which develop into males due to haplodiploidy (Winston, 1987). The Cape bee, A. m. capensis, which is found in southern South Africa, is an exception to this rule. In this subspecies, workers lay diploid eggs via an abnormal form of meiosis called thelytoky, which then develop into females (Onions, 1912; Anderson, 1963; Ruttner, 1988; Neumann & Moritz, 2002; Goudie & Oldroyd, 2014). In addition, capensis workers exhibit a set of traits that can turn them into social parasites, whereby they invade foreign colonies and reproduce. These traits include thelytokous reproduction, larger and more readily activated ovaries, and reproductive dominance (Martin et al., 2002).

The genetic basis of social parasitism in capensis has been investigated by focusing specifically on the trait of thelytokous worker reproduction. This has been done by performing crosses between capensis and another sub-species where thelytoky is absent. The segregation of the trait in progeny can then be analyzed by determining whether worker bees lay diploid (thelytokous) or haploid (arrhenotokous) eggs. One set of studies identified patterns of segregation consistent with a single recessive locus in a test cross between capensis and A. m. carnica, and mapped the trait to a locus on chromosome 13 (Lattorff et al., 2007; Jarosch et al., 2011). However, a subsequent study identified the proposed variant at high frequency in populations of A. mellifera without thelytoky (Chapman et al., 2015). The genetic basis of thelytoky was therefore unclear from these studies.

Another study aimed to identify candidate loci for capensis-specific traits, which include thelytoky and other traits associated with social parasitism, by identifying genes with evidence for the action of natural selection in capensis (Wallberg et al., 2016). This study searched for genomic regions with elevated genetic divergence in capensis compared with populations of other subspecies of A. mellifera, and high homozygosity in capensis. It utilized 30 whole-genome sequences from capensis workers, whose mode of parthenogenesis was not assessed, and two other African subspecies (A. m. scutellata and A. m. adansonii). Despite very low genome-wide divergence between capensis and the other subspecies (FST = 0.051–0.056), this study identified 12 genomic regions of extreme FST (FST > 0.8), indicating near fixation of otherwise rare variants in capensis, which were also associated with signals of selection in capensis. It is likely that these signals underlie genetic variants involved in capensis-specific traits. However, connections between these loci and phenotype could not be determined from this study. The strongest signal of selection was identified in the Ethr gene, where several single nucleotide polymorphisms (SNPs) were fixed in capensis compared with other subspecies and there was a strong signal of a selective sweep.
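To give a sense of what an FST above 0.8 means at a single SNP, the following Python sketch computes Wright's FST for one biallelic site from the allele frequencies in two populations. This is a textbook definition used here as an assumption; the estimator actually used by Wallberg et al. (2016) is not stated in this chapter, and the allele frequencies in the example are hypothetical.

```python
def wright_fst(p1, p2):
    """Wright's FST at one biallelic SNP for two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    FST = (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity
    within populations and H_T is the expected heterozygosity of the pooled
    population.
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # site is monomorphic overall, so there is no divergence to measure
    return (h_t - h_s) / h_t


# Hypothetical frequencies: an allele near fixation in one population and rare in the other.
print(round(wright_fst(0.95, 0.05), 3))
```

Under this definition, the hypothetical frequencies 0.95 and 0.05 give an FST of 0.81, which illustrates why values above 0.8 imply near fixation of alternative alleles in the two populations.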


More recently, Aumer et al. (2019) analyzed segregation of the mode of parthenogenesis in the progeny of a single naturally mated capensis queen that had mated with at least 25 males in the wild. In this colony, it was found that about half of the workers reproduced thelytokously and the others arrhenotokously, a finding potentially consistent with a single polymorphic locus governing this trait (Aumer et al., 2017, 2019). By sequencing the genomes of 21 thelytokous and 21 arrhenotokous workers from this colony, the authors identified a genomic region in genetic linkage with Ethr in which high genetic divergence was observed between the groups with different modes of parthenogenesis. This region, designated Th, overlaps the 3’-untranslated region (UTR) and coding sequence of an uncharacterized gene, LOC409096, and was homozygous in arrhenotokous workers, but heterozygous in thelytokous workers. One haplotype present in the thelytokous workers was identical to the arrhenotokous haplotype, whereas the other was unique to thelytokous workers. Arrhenotokous capensis workers therefore had genotype ThAr / ThAr and thelytokous workers were ThTh / ThAr. In addition, a high level of homozygosity consistent with a selective sweep was observed in a linked neighbouring region encompassing the Ethr gene, consistent with the findings of Wallberg et al. (2016). Aumer et al. (2019) report that only a single SNP was consistently associated with the mode of parthenogenesis: a nonsynonymous SNP in Th (509,225 on group 1.23), which we refer to here as Th-SNP. They conclude that thelytoky in A. mellifera is controlled by this SNP.


A standard QTL (quantitative trait locus) mapping approach involves using a genetic cross to generate a segregating mapping population from which one can determine the potential contribution of a large number of genetic variants to variation in phenotype (Lynch & Walsh, 1998). The degree to which a genetic association identified in a single colony from a wild population, as in Aumer et al. (2019), can be extrapolated to the rest of the species is unclear. We therefore aimed to test whether the association between Th and mode of parthenogenesis is restricted to a particular genetic background. We first analyzed the genome sequences of 180 unrelated worker bees from a large number of populations of A. mellifera sampled globally (Table 4.1; supplementary tables S1 and S2, Supplementary Material online).

In total, the Th-SNPTh allele was present in 43 samples and reaches frequencies > 50% in populations of A. m. scutellata and A. m. monticola from Kenya. In these populations, the Th-SNPTh allele is also mainly found linked to the same SNPs in the ThTh haplotype on the 3’-UTR of Th as found in the capensis colony studied by Aumer et al. (2019). Thelytoky and social parasitism are absent from these Kenyan populations (Ruttner, 1988), which indicates that the presence of Th-SNPTh (or its associated ThTh haplotype) is not sufficient to generate these traits. We also find the Th-SNPTh allele in one A. m. scutellata worker from South Africa, as was also reported by Aumer et al. (2019), and two Africanized bees from Brazil. These populations are also not thelytokous. The Th-SNPTh allele occurs on a different Th haplotype in the Africanized bees. One haplotype appears to be fixed in capensis at Ethr, which we do not observe in any other population.

We next examined the genome sequences of three capensis queens that were derived from thelytokously laid eggs (supplementary tables S1 and S2, Supplementary Material online). These queens were produced by first removing the queen from genetically pure capensis colonies from Stellenbosch, South Africa. Workers in these colonies then produced new queens by laying thelytokously. These new queens therefore only contain genetic material from the thelytokous workers from which they are derived. In all cases, these queens were homozygous for the Th-SNPAr allele and its associated haplotype ThAr, and the capensis haplotype at Ethr. They therefore possessed identical alleles to the arrhenotokous workers in Aumer et al. (2019). These results demonstrate that ThTh is not required to generate thelytoky. Seven out of ten randomly sampled capensis workers also have the same homozygous genotypes at these loci (Wallberg et al., 2016), indicating that these genotypes are commonly found in capensis.

Based on the absence of ThTh / ThTh from their data set, Aumer et al. (2019) suggest that the ThTh allele is lethal or strongly deleterious in homozygous form. However, we observed 12 ThTh / ThTh homozygotes in Kenya, and 1 in capensis, indicating that this genotype is not lethal in workers. We do not find any deviations from Hardy–Weinberg equilibrium at this locus in any of the populations where it is present (chi-square test, all P > 0.2; supplementary table S3, Supplementary Material online). Our data therefore do not support the hypothesis that this genotype is deleterious. Based on the frequencies of the Th-SNPTh allele observed in capensis by Aumer et al. (2019) (5.3%) and Wallberg et al. (2016) (20%), thelytoky should be relatively rare. However, the frequency at which workers lay eggs thelytokously in capensis has been estimated as > 99% (Hepburn & Radloff, 2002; Goudie et al., 2015), which suggests that this allele cannot be strongly associated with thelytoky in the capensis population generally.

Taken together, these observations indicate that presence of the Th-SNPTh allele is neither necessary nor sufficient to generate thelytokous reproduction in A. mellifera. The haplotype linked to this allele at Th is also not strongly associated with mode of parthenogenesis. It is therefore likely that the action of the Th-SNPTh allele is strongly dependent on genetic background. Although this SNP was reported to be strongly associated with mode of parthenogenesis in the colony of Aumer et al. (2019), this association was not found in the other honeybee workers examined here, which have a different set of genotypes at other loci across the genome. Wallberg et al. (2016) identified SNPs in at least 12 candidate loci across the genome with FST > 0.8, and capensis tend to be homozygous for haplotypes at these loci that are rare or absent in other subspecies. These loci could potentially have important effects on the mode of parthenogenesis, or on other capensis traits, but are unlikely to be variable in a pure capensis colony such as the one studied by Aumer et al. (2019). Hence, that study could not test the effect of variation at those loci. One such locus is the Ethr gene, which is fixed for a haplotype in capensis that we do not observe in other subspecies. It is possible that the capensis haplotype at this locus is required for thelytoky, as no thelytokous workers have been observed without this haplotype so far. However, it is also plausible that a number of other loci are involved. It is therefore clear that although variation at Th appears to be associated with thelytoky in the colony studied by Aumer et al. (2019), the genetic control of thelytoky in A. mellifera is more complex.

Materials and Methods

Sequencing and Variant Calling of Queens Produced by Thelytoky

We collected three capensis queens produced by thelytoky from different colonies. They were stored in propanol at -20 °C. Genomic DNA was isolated from the abdomen of each queen for sequencing. We sequenced each sample in a single lane of Illumina HiSeq2000 in paired-end mode (2 x 150-bp reads). We mapped the FASTQ reads to the A. mellifera genome assembly Amel_4.5 using the default parameters of BWA 0.7.12 (Li & Durbin, 2009). These alignments were then imported into SAMtools (Li et al., 2009) and converted into BAM format. Polymerase chain reaction duplicates were marked using Picard tools (http://picard.sourceforge.net, version 1.130). Following this, we called variants using Freebayes v 1.2.0 (Garrison & Marth, 2012) for each sample individually, with the same settings as in Aumer et al. (2019) (--min-alternate-fraction 0.25, --min-alternate-total 2, --min-coverage 2, ploidy = diploid). We excluded SNPs using VCFTools 0.1.15 (Danecek et al., 2011) by the following criteria: mapping quality (MQ) < 40, quality (QUAL) < 30, genotype quality (GQ) < 20, and read depth (DP) < 10. We confirmed the genotypes of all SNPs within the mycC–Ethr region by manually inspecting pileup files made by SAMtools. This was important to confirm that sites where the reference allele was inferred (i.e., where no variation is detected) were not false negatives.
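The exclusion criteria above amount to four minimum thresholds that a SNP call must meet. As an illustration only, they can be written as a small Python predicate applied to a record already parsed into a dictionary. The dictionary layout and the function name are hypothetical; the filtering in this study was performed with VCFtools, not with this code.

```python
def passes_filters(record, min_mq=40, min_qual=30, min_gq=20, min_dp=10):
    """Return True if a SNP call survives the four exclusion criteria.

    `record` is a hypothetical dict holding the mapping quality (MQ),
    site quality (QUAL), genotype quality (GQ) and read depth (DP) of one call.
    A call is excluded when any value falls below its threshold.
    """
    return (
        record["MQ"] >= min_mq
        and record["QUAL"] >= min_qual
        and record["GQ"] >= min_gq
        and record["DP"] >= min_dp
    )


# Two made-up calls: the second fails on read depth alone.
calls = [
    {"MQ": 60, "QUAL": 250.0, "GQ": 99, "DP": 35},
    {"MQ": 60, "QUAL": 250.0, "GQ": 99, "DP": 8},
]
kept = [c for c in calls if passes_filters(c)]
print(len(kept))
```

Writing the thresholds as keyword arguments keeps the four cut-offs visible in one place and makes it easy to check how sensitive a result is to each of them.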

Analysis of Genotypes at the Putative Thelytoky Locus

We assessed variation within the mycC–Ethr region in 180 A. mellifera worker samples from published data (Wallberg et al., 2014, 2017; Fuller et al., 2015; Nelson et al., 2017) and from the three newly sequenced queens (see Table 4.1 and supplementary Table S3.1, Supplementary Material online, for details). We utilized SNP genotype files in Beagle format from the published data and VCF files for the queens. We then extracted genotypes for each sample for all nonsynonymous SNPs in the coding sequences and in the 3’- and 5’-UTR of the Th, Ethr, and mycC loci as detailed in supplementary Table S3.1 of Aumer et al. (2019) (total of 134 SNPs) from the Beagle files using a custom perl script (available at https://github.com/MattChristmas/capensis-scripts) and from the A. m. capensis queen VCF files using the --positions flag in VCFTools 0.1.13 (Danecek et al., 2011). Genotypes at the 134 loci were further confirmed independently of the SNP calling using mpileup on the BAM files with SAMtools v1.9 (Li et al., 2009). Genotype frequencies per population at the putative thelytoky-inducing SNP (Th-SNP, position 509,225 on Group1.23) were then calculated. Where two alleles were present at this locus in a population, deviation from Hardy–Weinberg equilibrium was assessed using a chi-square test.
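The allele-frequency and Hardy–Weinberg calculations described above can be sketched in a few lines of Python. The example uses the genotype counts reported in Table 4.1 for A. m. scutellata from Kenya (5 homozygous reference, 15 heterozygous, 5 homozygous alternate). The use of one degree of freedom and of the standard library's complementary error function to obtain the P value are assumptions of this sketch; the exact test settings used in the study are not given in the text.

```python
import math


def hwe_chi_square(n_ref_hom, n_het, n_alt_hom):
    """Chi-square test for deviation from Hardy-Weinberg proportions at a biallelic SNP.

    Returns (reference allele frequency, chi-square statistic, P value).
    Assumes both alleles are present, so every expected count is above zero,
    and one degree of freedom (three genotype classes, one estimated frequency).
    """
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)   # reference allele frequency
    q = 1 - p                               # alternate allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


ref_freq, chi2, p_value = hwe_chi_square(5, 15, 5)
print(f"ref allele freq = {ref_freq:.2f}, chi2 = {chi2:.2f}, P = {p_value:.3f}")
```

For these counts the reference allele frequency is 0.50, the statistic is 1.00 and the P value is about 0.317, so there is no evidence of deviation from Hardy–Weinberg proportions in this population.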


Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

This research was supported by the Swedish Research Council Formas (2013-722 to M.T.W.), the Swedish Research Council Vetenskapsrådet (2014-5096 to M.T.W.), and the Australian Research Council (DP180101696 to B.P.O. and Amro Zayed). We thank Andreas Wallberg for help with SNP data and Mike Allsopp for help with breeding the thelytokously produced queens.


Table 4.1 Genotype counts and allele frequencies of a single nucleotide polymorphism at position 509,225 on Group 1.23 of Amel_4.5 (Th-SNP) for worldwide populations of Apis mellifera.

Subspecies | Location | Number of bees | Homozygous reference (count) | Homozygous alternate (count) | Heterozygous (count) | Reference allele frequency | Alternate allele frequency
A. m. capensis | South Africa* | 10 | 7 | 1 | 2 | 0.80 | 0.20
A. m. scutellata | South Africa* | 10 | 9 | 0 | 1 | 0.95 | 0.05
A. m. scutellata | Kenya∆ | 25 | 5 | 5 | 15 | 0.50 | 0.50
A. m. monticola | Kenya∆ | 21 | 1 | 7 | 13 | 0.36 | 0.64
A. m. jemenitica | Kenya∂ | 3 | 3 | 0 | 0 | 1.00 | 0.00
A. m. litorea | Kenya∂ | 1 | 1 | 0 | 0 | 1.00 | 0.00
A. m. adansonii | Nigeria* | 10 | 10 | 0 | 0 | 1.00 | 0.00
A. m. iberiensis | Spain* | 9 | 9 | 0 | 0 | 1.00 | 0.00
A. m. mellifera | Sweden/Norway* | 19 | 19 | 0 | 0 | 1.00 | 0.00
A. m. carnica | Germany* | 10 | 10 | 0 | 0 | 1.00 | 0.00
A. m. ligustica | Italy* | 10 | 10 | 0 | 0 | 1.00 | 0.00
A. m. anatoliaca | Turkey* | 10 | 10 | 0 | 0 | 1.00 | 0.00
A. m. syriaca | Jordan* | 10 | 10 | 0 | 0 | 1.00 | 0.00
Africanised | Brazilº | 32 | 30 | 0 | 2 | 0.97 | 0.03

Samples from *(Wallberg et al., 2014); ∆(Wallberg et al., 2017); ∂(Fuller et al., 2015); º(Nelson et al., 2017).


Chapter Five

General Discussion

In this thesis I have explored the evolutionary and social consequences of the reproductive biology of the Cape Honey Bee, Apis mellifera capensis. Capensis is atypical among honey bees because unmated workers can produce female offspring asexually by thelytokous parthenogenesis (Onions, 1912; Anderson, 1963). I utilised the unique biology of Capensis and genomic data to examine fundamental questions in evolutionary biology. In this chapter, I reflect on the principal findings from my research and suggest some future research questions.

The evaluation of heterozygosity and its contribution to fitness in Capensis

In Chapter Two, I showed that heterozygosity is maintained at many hundreds of genes along large blocks of most chromosomes in the Capensis Clone (Apis mellifera capensis). This finding was surprising since thelytokous parthenogenesis leads to a one-third loss in heterozygosity per generation. I argued that heterozygosity is retained because many genes show heterozygote advantage, or, putting it another way, suffer a decline in fitness when homozygous. Some genes, like the sex locus, are likely to be homozygous lethal, whereas for others the decline in fitness is likely to be marginal. In sum, Clones that have undergone recombination and become homozygous along a chromosome are less likely to contribute to the next generation than their less homozygous peers, and thus the population retains high levels of heterozygosity over time. Variability in the regions that have retained heterozygosity among three sub-lineages of the Clone suggested that selection against homozygosity is generally weak for most genes. I therefore suggest that many overdominant loci have only minimal fitness benefits when heterozygous. However, the genomic regions that have retained heterozygosity across all sub-lineages most likely contain genes that are strongly overdominant and must be heterozygous for viable female development.

I also showed that an overdominant locus can only maintain heterozygosity across a small fragment of a chromosome. I developed a mathematical model that revealed that an overdominant gene can only retain heterozygosity at regions within 250 kb of the gene after 100 generations. Therefore, the heterozygosity patterns I discovered are consistent with the presence of several genes with significant heterozygote advantage on each chromosome. In the absence of selection for heterozygosity, the Clone should be homozygous after approximately 10 generations. Therefore, the genomic patterns uncovered illustrate the crucial importance of genome-wide heterozygosity in the honey bee. I suggest that this phenomenon is also likely to be present in many outbreeding insect species.
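The expectation that the Clone should be essentially homozygous after about 10 generations without selection follows from compounding the one-third loss of heterozygosity per generation. A minimal Python sketch of that arithmetic is given below. It assumes a constant, neutral loss of one third at every locus in every generation, which is a simplification; it is not the mathematical model developed in Chapter Two.

```python
def retained_heterozygosity(generations, loss_per_generation=1 / 3):
    """Fraction of initial heterozygosity left after a number of generations,
    assuming the same proportional loss in every generation and no selection."""
    return (1 - loss_per_generation) ** generations


for g in (1, 5, 10, 20):
    print(f"generation {g:>2}: {retained_heterozygosity(g):.4f} of initial heterozygosity")
```

Under this assumption less than 2% of the initial heterozygosity remains after 10 generations, which is why widespread heterozygosity after roughly 20 years of clonal reproduction points to selection in its favour.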

Several questions remain unanswered about the Clone. For example, are all Capensis workers capable of becoming the ancestor of a vast clonal population or is there something inherently unique about the lineage I studied? We know that at least three clonal lineages have emerged independently in Capensis (two historical and one contemporary) (Lundie, 1954; Johannsmeier, 1983; Kryger, 2001). However, we do not know whether the three lineages shared common genomic attributes. Therefore, we do not know whether there are multiple genomic routes to becoming a successful Clone or a specific genomic architecture is required. To determine this, an experiment could be performed to recreate multiple clonal lineages. If the genome of each lineage is similar, this would suggest that there are very few ways to become a successful Clone. In contrast, if there are large genomic differences between each of these lineages, this would indicate that there are multiple ways to become a successful Clone. In this study, numerous Capensis workers would be introduced into Scutellata colonies. At the time of introduction and after each generation, Capensis workers could be collected and then sequenced to determine whether there are multiple lineages present or whether the same lineages arise in different host Scutellata colonies. The clonal lineages arising from this study could also be compared to the contemporary lineage to assess the differences between them. Additionally, extensive field surveys of Scutellata colonies infected with the Clone could be carried out to determine whether there are any novel lineages that have emerged independently from the lineage I studied. If there are new lineages, genomic comparisons between the lineages could be performed to assess the disparities between them.

In the future, sequencing more Clones would help identify the genomic regions that must always be heterozygous for the Clone to survive and reproduce. Aumer et al. (2019) sequenced four Clones to identify the genomic variation associated with thelytoky in Capensis. These sequences could be combined with the sequence data I generated to identify the regions that must always retain heterozygosity. Additionally, Aumer et al. (2019) collected the samples they sequenced almost a decade after the samples I used. Therefore, future studies could compare the levels of heterozygosity between the two collection periods and determine how heterozygosity is eroding through time.
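The comparison between collection periods could be sketched as follows. This is a minimal illustration, not the pipeline used in this thesis: it assumes genotype calls written as VCF-style `GT` strings, and the function name `het_fraction` and the toy genotypes are hypothetical.

```python
def het_fraction(genotypes):
    """Return the fraction of called diploid genotypes that are heterozygous.

    `genotypes` is an iterable of VCF-style GT strings such as "0/1" or "1|1".
    Missing calls (for example "./.") and non-diploid calls are skipped.
    """
    called = 0
    het = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # skip missing or non-diploid calls
        called += 1
        if alleles[0] != alleles[1]:
            het += 1
    return het / called if called else float("nan")


# Toy genotypes at the same five sites in an earlier and a later sample (invented).
early_sample = ["0/1", "0/1", "1/1", "0/1", "./."]
late_sample = ["0/1", "1/1", "1/1", "0/1", "0/0"]

early = het_fraction(early_sample)  # 3 heterozygous of 4 called sites
late = het_fraction(late_sample)    # 2 heterozygous of 5 called sites
print(f"early: {early:.2f}, late: {late:.2f}, change: {late - early:+.2f}")
```

In practice the two samples would need to be restricted to sites called in both data sets, so that differences in coverage are not mistaken for loss of heterozygosity.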

The relationship between heterozygosity and fitness in honey bees could also be assessed in arrhenotokous (non-thelytokous) populations. For example, Scutellata could be inbred from an outbred population for multiple generations. This study could reveal whether heterozygosity is maintained throughout the genome or whether only a few regions of the genome remain heterozygous after the inbreeding program. Given the results from the Clone, I predict that heterozygosity would be widespread in a Scutellata lineage after multiple generations of inbreeding. This finding would support the view that overdominant genes are crucial for the survival and reproduction of honey bees. This study could also determine whether the same genes that are heterozygous in the Clone are also heterozygous in Scutellata after the inbreeding program.

Future studies should also assess whether structural variants that are unique to the Clone contribute to the maintenance of heterozygosity. I used short-read sequencing in this study; therefore, I could only assess SNPs in the Clone. Long-read sequencing could determine whether structural variants such as inversions are also associated with the maintenance of heterozygosity in the Clone. For example, inversions may block recombination events (Coyne et al. 1991) and thereby maintain the same heterozygous genotype in the Clone’s offspring. However, inversions are usually rare and short and cannot prevent loss of heterozygosity arising from any recombination event that takes place outside the inversion. Therefore, inversions are unlikely to explain most of the heterozygosity present in the Clone. In spite of this, they could still play a minor role in the maintenance of heterozygosity.

Genome-wide heterozygosity has also been assessed in planarians (Guo et al. 2016), wolves (Kardos et al. 2018) and Drosophila (Mackay et al. 2012). Guo et al. (2016) sequenced the genome of an inbred laboratory population of the planarian Schmidtea mediterranea and found that one-third of the genome retained heterozygosity after ten generations of selfing. In this population, the expected amount of heterozygosity remaining after ten generations of selfing is extremely low. Kardos et al. (2018) sequenced the genomes of 97 grey wolves (Canis lupus) from a semi-isolated and bottlenecked population in Scandinavia. Kardos et al. (2018) also discovered that heterozygosity has been maintained along most chromosomes in spite of frequent inbreeding. Collectively, these findings suggest that the retention of heterozygosity throughout the genome is important for the reproductive success and survival of both planarians and wolves. In contrast, Mackay et al. (2012) inbred Drosophila melanogaster from an outbred population for 20 generations. This study revealed that very few regions of the genome remained heterozygous. This indicates that natural selection for heterozygote advantage within this inbred line was unable to counter the effects of the inbreeding program, leading to the conclusion that overdominance is of little consequence in Drosophila and that there are very few genes that must be heterozygous for survival, at least in the laboratory. However, this finding may not reflect the importance of heterozygosity more broadly. Firstly, laboratory populations enjoy a more stable environment than do wild populations. Secondly, Drosophila populations are subject to frequent population bottlenecks. Therefore, the effects of inbreeding in this species are not catastrophic (Nei et al., 1975). Future studies should assess genome-wide heterozygosity in a broad range of taxa to determine whether it is crucial for the survival and reproductive success of most organisms.
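To illustrate why the planarian result discussed above is striking, the neutral expectation can be worked through in a few lines. This is a minimal sketch that takes the description of strict selfing at face value and assumes no selection: each heterozygous locus then has a one-half chance of becoming homozygous every generation, so expected heterozygosity halves per generation. The function name is hypothetical.

```python
def expected_het_after_selfing(h0, generations):
    """Expected heterozygosity after repeated selfing, assuming no selection.

    h0 is the starting heterozygosity; it is halved once per generation.
    """
    return h0 * 0.5 ** generations


remaining = expected_het_after_selfing(1.0, 10)
print(f"Expected fraction of initial heterozygosity after 10 generations: {remaining:.4%}")
```

Under these assumptions less than 0.1% of the initial heterozygosity would be expected to remain after ten generations, far below the one-third reported for the planarian genome.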

Genomic conflict in honey bee societies

In Chapter Three, I performed reciprocal crosses between Capensis and Scutellata to test the pervasiveness of inter-parent genomic conflict in honey bee workers. In doing so, I tested David Haig’s Kinship Theory of Genomic Imprinting in honey bees. I sequenced the genomes of each parent to identify parent-specific alleles, and the transcriptome of a pool of embryos (eggs) from each cross to determine whether there is an expression bias towards one or other parent’s allele in early development, and whether the direction of the cross altered gene expression.
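The logic of testing for an allelic expression bias in each direction of a reciprocal cross can be sketched as follows. This is an illustrative example only and is not the statistical procedure used in Chapter Three: it assumes that reads have already been assigned to the father's or mother's allele, uses an exact two-sided binomial test against equal expression, and applies no correction for multiple testing. All function names, the threshold `alpha` and the read counts are hypothetical, and the exact test as written is only practical for modest read counts.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of outcomes no more likely than k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small tolerance so that outcomes tied with the observed one are included.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


def call_bias(paternal_reads, maternal_reads, alpha=0.05):
    """Return 'paternal', 'maternal' or 'none' for one gene in one cross direction."""
    n = paternal_reads + maternal_reads
    if n == 0 or binom_two_sided_p(paternal_reads, n) >= alpha:
        return "none"
    return "paternal" if paternal_reads > maternal_reads else "maternal"


def classify_gene(cross_a, cross_b, alpha=0.05):
    """Combine the calls from both directions of a reciprocal cross.

    cross_a and cross_b are (paternal_reads, maternal_reads) tuples.
    """
    a = call_bias(*cross_a, alpha=alpha)
    b = call_bias(*cross_b, alpha=alpha)
    if a == b:
        return "no bias" if a == "none" else f"{a} bias in both directions"
    if "none" in (a, b):
        return "bias in one direction only"
    # Paternal in one direction and maternal in the other means the same
    # lineage's allele is favoured regardless of which parent carried it.
    return "opposite bias (tracks lineage rather than parent)"


print(classify_gene((80, 20), (75, 25)))
print(classify_gene((80, 20), (20, 80)))
print(classify_gene((80, 20), (50, 50)))
```

Comparing the two directions is what separates a parent-of-origin effect from a lineage effect: a bias that follows the father in both directions points to the former, whereas a bias that follows the same subspecies' allele in both directions points to the latter.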

I demonstrated that gene expression in the worker is not always equal between the two alleles inherited from the parents, and identified 28 genes, distributed across multiple chromosomes, whose expression pattern is influenced by the parental origin of the allele. Crosses with Capensis fathers exhibited the greatest number of genes with parent-specific expression patterns. I identified 21 genes that showed expression bias towards the Capensis father’s allele in colonies with a Capensis father. Importantly, I did not observe this expression pattern in offspring with a Capensis mother. A high proportion of the genes altered by Capensis fathers are related to worker reproduction. I propose that the difference in the number of genes manipulated between Capensis and Scutellata is due to the unique ability of Capensis fathers to sire workers that reproduce thelytokously. Capensis workers can lay hundreds of unfertilized, diploid eggs that can develop into queens and reproductive workers. Therefore, Capensis fathers experience greater selection to influence the reproductive success of their offspring relative to fathers from arrhenotokous populations such as Scutellata.

I identified several genes along multiple chromosomes that exhibit parent-specific effects. The expression pattern of these genes should also be assessed in other insect populations. Currently, the phylogenetic distribution of genes that exhibit parent-specific effects is poorly characterized in insects. I assessed whether the genes that I identified were also discovered in two previous reciprocal cross experiments assessing parent-specific effects in honey bees (Kocher et al., 2015; Galbraith et al., 2016a). The regions I identified were not detected in the previous studies. However, each study assessed allele expression in a different tissue, subspecies and life stage. The different expression patterns could be driven by tissue-specific or age-specific allele-expression effects. Future studies should aim to target the same tissue and life stage across studies.

The genes I identified do not display parent-specific effects in the wasps Nasonia vitripennis and N. giraulti (Wang et al., 2016). In fact, parent-specific effects appear to be absent in N. vitripennis and N. giraulti (Wang et al., 2016). This is unsurprising because Nasonia is neither polyandrous nor social. Therefore, Nasonia fathers do not experience the same selection pressures to influence the reproductive success of their offspring as fathers from social and polyandrous populations. Nonetheless, future work should assess parent-specific expression across a greater number of insects. Such comparisons will be crucial for determining whether genes that exhibit parent-specific effects in honey bees are also present in other social insect genomes, and whether the expression patterns are conserved across the same genes in multiple lineages.

The mechanism that underpins parent-specific expression remains elusive in honey bees. I identified sparse methylation patterns in paternally-biased genes and surrounding regions. Therefore, it is unlikely that DNA methylation is involved in parent-specific effects in male honey bees. Galbraith et al. (2016b) also found that there is minimal overlap between parentally-biased genes and known methylated regions within honey bees. Indeed, it is becoming increasingly clear that methylation patterns in the honey bee are reflective of the underlying genomic sequence, and are likely to be non-functional in determining phenotype (Harris et al., 2019; Yagound et al., 2019), despite earlier indications to the contrary (Schaefer & Lyko, 2007; Lyko et al., 2010).

The following question remains unanswered: how do males influence gene expression in their offspring? Long non-coding RNAs (lncRNAs) are one plausible mechanism. lncRNAs are often associated with parent-specific effects (Monk et al., 2019) and could be associated with the paternal bias in gene expression I identified in the honey bee. Future studies should sequence, identify and characterize lncRNAs and their relationship with parent-specific effects in the honey bee (Sciamanna et al., 2019). Further, how parent-specific effects interact with lncRNAs remains to be determined. Future studies linking parent-specific effects and their underlying mechanisms to phenotype will need to be carried out. The relationship between parent-specific effects and reproductive fitness also needs to be assessed. Galbraith et al. (2016a) showed that parent-specific effects were present in workers with a greater number of ovarioles (workers with greater numbers of ovarioles have higher fecundity). However, it is unclear, at least in Capensis, whether individuals with a greater number of ovarioles have increased fecundity. Indeed, ovary activation seems to be most common in workers with an intermediate number of ovarioles (Roth et al., 2014).

Small RNAs (sRNAs) are another plausible mechanism underlying parent-specific effects. sRNAs are an abundant component of the seminal fluid exosome across multiple taxa (Chen et al., 2016). sRNAs are likely to be responsible for the transgenerational epigenetic inheritance that is present in Caenorhabditis elegans (Rechavi & Lev, 2017; Woodhouse et al., 2018). Further, sRNAs are associated with maternal control of offspring sex in jewel wasps (Verhulst et al., 2010), which belong to the same taxonomic order as the honey bee. I therefore suggest that both lncRNAs and sRNAs provide plausible mechanisms underlying parent-specific gene expression in honey bees. However, whether they do in fact alter these patterns remains to be determined in insects.

The relationship between parent-specific effects in embryos and the consequences for the reproductive fitness of adult workers remains poorly understood. I focussed on early embryos because I hypothesised that this life stage would be the most malleable to parental conflicts. During this period of development an embryo can become a fecund queen, a highly reproductive worker or a typical female worker, each with very different fitness benefits. Further, it remains poorly understood how parent-specific effects differ among developmental stages and tissues and how these changes affect fitness. Parent-specific expression appears to be dynamic in honey bees because each study assessing parent-specific effects has identified different regions (Kocher et al. 2015; Galbraith et al. 2016a). Future studies should therefore assess parent-specific effects across multiple tissues and developmental stages to determine how conserved or dynamic parent-specific expression is in honey bees.

Finally, these findings suggest that parental influence on gene expression is likely to be present in many other social insect species and possibly across a broader range of insects.

The genetic basis of thelytoky in Capensis

In Chapter Four, we refuted the claim that the genetic locus that controls thelytoky in Capensis has been identified (Aumer et al., 2019). Thelytokous reproduction in Capensis has a genetic basis (Chapman et al., 2015) and is most likely to be controlled by a single gene (Lattorff et al., 2007; Aumer et al., 2019). Aumer et al. (2019) identified a single nonsynonymous heterozygous SNP on chromosome I that they claim causes thelytokous reproduction in Capensis.

My contribution to this study was to determine whether this SNP does in fact cause thelytoky. I examined the genome sequences of three Capensis queens that were derived from thelytokously laid eggs. Under the Aumer et al. (2019) model, these queens should all have the thelytoky-inducing nonsynonymous SNP they identified. I discovered that this SNP was absent in every queen. Additionally, my co-authors on this study analysed genomic sequences from multiple A. mellifera populations worldwide and found that the alleged thelytoky-inducing SNP is widespread in multiple populations where thelytoky has never been described. We concluded that the SNP proposed by Aumer et al. (2019) is neither sufficient nor required to produce thelytoky in honey bees. I conclude that the region responsible is yet to be discovered.

Future studies could uncover the genomic region responsible for thelytoky by performing a backcross between Capensis and Scutellata (Figure 5.1). In this study, Capensis queens would initially be derived from thelytokously laid eggs (these queens would be homozygous for the thelytoky-inducing allele). Each thelytokously produced queen would be artificially inseminated (Harbo, 1986) with the sperm from one Scutellata drone. The F1 queens produced (which would be heterozygous for the thelytoky-inducing allele) would then be inseminated by a Capensis drone (hemizygous for the thelytoky-inducing allele). Half of the workers produced from these F1 queens would produce arrhenotokous offspring and the other half would produce thelytokous offspring. The offspring produced from the thelytokous workers would then be collected. The F1 queens, the Capensis fathers and the offspring of thelytokous workers would be sequenced to determine the allele frequency at the region underpinning thelytokous reproduction. The frequency of the thelytoky-inducing allele should be 100% in the thelytokous worker offspring. As a result, the genomic region associated with thelytoky in Capensis could be determined.
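The expected genotype proportions in this backcross can be checked with a short enumeration. This is a minimal sketch under the assumptions of the design above, namely a single locus at which workers must be homozygous for the thelytoky-inducing allele to reproduce thelytokously. The allele labels `th` and `+` and all function names are hypothetical.

```python
from collections import Counter

TH, WT = "th", "+"  # hypothetical labels: thelytoky-inducing and alternative allele


def daughters(queen, drone_allele):
    """Diploid daughters of a queen (two alleles) mated to one haploid drone."""
    return [tuple(sorted((egg_allele, drone_allele))) for egg_allele in queen]


def th_frequency(genotypes):
    """Frequency of the thelytoky-inducing allele in a list of diploid genotypes."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    return alleles.count(TH) / len(alleles)


founder_queen = (TH, TH)                   # thelytokously derived Capensis queen
f1_queens = daughters(founder_queen, WT)   # inseminated by a Scutellata drone
workers = [w for queen in f1_queens for w in daughters(queen, TH)]  # Capensis drone

# Assumption: only workers homozygous for the allele reproduce thelytokously.
thelytokous_workers = [w for w in workers if w == (TH, TH)]
# A mother homozygous at the locus can only transmit the th allele.
offspring = [(TH, TH) for _ in thelytokous_workers]

print(Counter(workers))
print("fraction of thelytokous workers:", len(thelytokous_workers) / len(workers))
print("th frequency in their offspring:", th_frequency(offspring))
```

At loci unlinked to the thelytoky locus, the corresponding allele frequency in the same offspring is expected to fall below 100%, which is the contrast that a genome-wide scan of the sequenced samples would exploit.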

Figure 5.1 Backcross design to uncover the region responsible for thelytokous reproduction in Capensis.

Concluding remarks

In this thesis I have revealed the extraordinary costs and consequences of thelytokous reproduction in the Cape honey bee, Apis mellifera capensis. Social thelytokous species provide a superb system to assess models of overdominance, conflict and cooperation. The Capensis genome is rife with conflict and many genes must retain heterozygosity for this organism to flourish. Capensis is and will continue to be an excellent organism to answer fundamental questions in social and evolutionary biology. I hope research continues on this awesome creature.

References

Allsopp, M.H. 1993. Summarized overview of the Capensis problem. South African Bee Journal 65: 127–136.

Allsopp, M.H. 1992. The capensis calamity. South Journal 64: 52–55.

Allsopp, M.H. & Crewe, R.M. 1993. The cape honey bee as a Trojan horse rather than the hordes of Genghis Khan. American bee journal.

An, X.-K., Sun, L., Liu, H.-W., Liu, D.-F., Ding, Y.-X., Li, L.-M., et al. 2016. Identification and expression analysis of an olfactory receptor gene family in green plant bug Apolygus lucorum (Meyer-Dür). Scientific Reports 6.

Anderson, R.H. 1968. The effect of queen loss on colonies of Apis mellifera capensis. S. Afric. J. Agric. Sci 368–388.

Anderson, R.H. 1963. The laying worker in the Cape Honeybee, Apis mellifera capensis. Journal of Apicultural Research 2: 85–92.

Aumer, D., Allsopp, M.H., Lattorff, H.M.G., Moritz, R.F.A. & Jarosch-Perlow, A. 2017. Thelytoky in Cape honeybees (Apis mellifera capensis) is controlled by a single recessive locus. Apidologie 48: 401–410.

Aumer, D., Stolle, E., Allsopp, M., Mumoki, F., Pirk, C.W.W. & Moritz, R.F.A. 2019. A single SNP turns a social honey bee (Apis mellifera) worker into a selfish parasite. Mol Biol Evol 36: 516–526.

Barron, A., Oldroyd, B. & Ratnieks, F.L. 2001. Worker reproduction in honey-bees (Apis) and the anarchic syndrome: A review. Behavioral Ecology and Sociobiology 50: 199–208.

Baudry, E., Kryger, P., Allsopp, M., Koeniger, N., Vautrin, D., Mougel, F., et al. 2004. Whole-genome scan in thelytokous-laying workers of the Cape honeybee (Apis mellifera capensis): central fusion, reduced recombination rates and centromere mapping using half-tetrad analysis. Genetics 167: 243–252.

Beekman, M., Allsopp, M.H., Wossler, T.C. & Oldroyd, B.P. 2008. Factors affecting the dynamics of the honeybee (Apis mellifera) hybrid zone of South Africa. Heredity 100: 13–18.

Benjamini, Y. & Hochberg, Y. 1995. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57: 289–300.

Berg, S., Koeniger, N., Koeniger, G. & Fuchs, S. 1997. Body size and reproductive success of drones (Apis mellifera L). Apidologie 28: 449–460.

Beye, M., Gattermeier, I., Hasselmann, M., Gempe, T., Schioett, M., Baines, J.F., et al. 2006. Exceptionally high levels of recombination across the honey bee genome. Genome research 16: 1339–1344.

Beye, M., Hasselmann, M., Fondrk, M.K., Page, R.E. & Omholt, S.W. 2003. The gene csd is the primary signal for sexual development in the honeybee and encodes an SR-type protein. Cell 114: 419–429.

Boomsma, J.J. 2007. Kin Selection versus Sexual Selection: Why the Ends Do Not Meet. Current Biology 17: R673–R683.

Boomsma, J.J. 2009. Lifetime monogamy and the evolution of eusociality. Philos Trans R Soc Lond B Biol Sci 364: 3191–3207.

Brady, S.G., Sipes, S., Pearson, A. & Danforth, B.N. 2006. Recent and simultaneous origins of eusociality in halictid bees. Proceedings of the Royal Society B: Biological Sciences 273: 1643–1649.

Brückner, D. 1979. Effects of inbreeding on worker honeybees. Bee World 60: 137–140.

Burt, A. & Trivers, R. 2006. Genes in Conflict: The Biology of Selfish Genetic Elements. Harvard University Press.

Butler, C.G. 1957. The control of ovary development in worker honeybees (Apis mellifera). Experientia 13: 256–257.

Butler, C.G., Callow, R.K. & Johnston, N.C. 1962. The isolation and synthesis of queen substance, 9-oxodec-trans-2-enoic acid, a honeybee pheromone. Proceedings of the Royal Society of London. Series B. Biological Sciences 155: 417–432.

Cale Jr, G.H. & Gowen, J.W. 1956. Heterosis in the honey bee (Apis mellifera L.). Genetics 41: 292.

Chaline, N., Ratnieks, F.L.W. & Burke, T. 2002. Anarchy in the UK: Detailed genetic analysis of worker reproduction in a naturally occurring British anarchistic honeybee, Apis mellifera, colony using DNA microsatellites. Molecular Ecology 11: 1795–1803.

Chantha, S.-C., Gray-Mitsumune, M., Houde, J. & Matton, D.P. 2010. The MIDASIN and NOTCHLESS genes are essential for female gametophyte development in Arabidopsis thaliana. Physiol Mol Biol Plants 16: 3–18.

Chapman, N.C., Beekman, M., Allsopp, M.H., Rinderer, T.E., Lim, J., Oxley, P.R., et al. 2015. Inheritance of thelytoky in the honey bee Apis mellifera capensis. Heredity 114: 584–592.

Chapman, N.C., Beekman, M. & Oldroyd, B.P. 2010. Worker reproductive parasitism and drift in the western honeybee Apis mellifera. Behavioral Ecology and Sociobiology 64: 419–427.

Charlesworth, D. & Willis, J.H. 2009. The genetics of inbreeding depression. Nature Reviews Genetics 10: 783–796.

Charnov, E.L. 1977. An elementary treatment of the genetical theory of kin-selection. Journal of Theoretical Biology 66: 541–550.

Chen, Q., Yan, W. & Duan, E. 2016. Epigenetic inheritance of acquired traits through sperm RNAs and sperm RNA modifications. Nature Reviews Genetics 17: 733–743.

Cheng, K.-C., Liao, J.-N. & Lyu, P.-C. 2012. Crystal structure of the dopamine N-acetyltransferase–acetyl-CoA complex provides insights into the catalytic mechanism. Biochem J 446: 395–404.

Clarke, G.M., Oldroyd, B.P. & Hunt, P. 1992. The genetic basis of developmental stability in Apis mellifera: heterozygosity versus genic balance. Evolution 753–762.

Cole-Clark, M.P., Barton, D.A., Allsopp, M.H., Beekman, M., Gloag, R.S., Wossler, T.C., et al. 2017. Cytogenetic basis of thelytoky in Apis mellifera capensis. Apidologie 48: 623–634.

Coyne, J.A., Aulard, S. & Berry, A. 1991. Lack of underdominance in a naturally occurring pericentric inversion in Drosophila melanogaster and its implications for chromosome evolution. Genetics 129: 791–802.

Crespi, B.J. 1992. Eusociality in Australian gall thrips. Nature 359: 724.

Crozier, R.H. & Pamilo, P. 1996. Evolution of Social Insect Colonies. Oxford University Press, Oxford, UK.

Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A., et al. 2011. The variant call format and VCFtools. Bioinformatics 27: 2156–2158.

Darwin, C. 1859. The origin of species by means of natural selection or the preservation of favoured races in the struggle for life. Murray, London.

Drewell, R.A., Bush, E.C., Remnant, E.J., Wong, G.T., Beeler, S.M., Stringham, J.L., et al. 2014. The dynamic DNA methylation cycle from egg to sperm in the honey bee Apis mellifera. Development 141: 2702–2711.

Drewell, R.A., Lo, N., Oxley, P.R. & Oldroyd, B.P. 2012. Kin conflict in insect societies: a new epigenetic perspective. Trends in Ecology & Evolution 27: 367–373.

Elsik, C.G., Worley, K.C., Bennett, A.K., Beye, M., Camara, F., Childers, C.P., et al. 2014. Finding the missing honey bee genes: lessons learned from a genome upgrade. BMC Genomics 15: 86.

Engelstädter, J. 2017. Asexual but Not Clonal: Evolutionary Processes in Automictic Populations. Genetics 206: 993–1009.

Engelstädter, J. 2008. Constraints on the evolution of asexual reproduction. BioEssays 30: 1138–1150.

Engelstädter, J., Sandrock, C. & Vorburger, C. 2011. Contagious parthenogenesis, automixis, and a sex determination meltdown. Evolution 65: 501–511.

Estoup, A., Solignac, M. & Cornuet, J.-M. 1994. Precise assessment of the number of patrilines and of genetic relatedness in honeybee colonies. Proceedings of the Royal Society of London. Series B: Biological Sciences 258: 1–7.

Forstmeier, W., Wagenmakers, E.-J. & Parker, T.H. 2017. Detecting and avoiding likely false-positive findings – a practical guide. Biological Reviews 92: 1941–1968.

Fuller, Z.L., Niño, E.L., Patch, H.M., Bedoya-Reina, O.C., Baumgarten, T., Muli, E., et al. 2015. Genome-wide analysis of signatures of selection in populations of African honey bees (Apis mellifera) using new web-based tools. BMC Genomics 16: 518.

Galbraith, D.A., Kocher, S.D., Glenn, T., Albert, I., Hunt, G.J., Strassmann, J.E., et al. 2016a. Testing the kinship theory of intragenomic conflict in honey bees (Apis mellifera). Proceedings of the National Academy of Sciences 113: 1020–1025.

Galbraith, D.A., Yi, S.V. & Grozinger, C.M. 2016b. Evaluation of possible proximate mechanisms underlying the Kinship Theory of Intragenomic Conflict in social insects. Integr Comp Biol 56: 1206–1214.

Garrison, E. & Marth, G. 2012. Haplotype-based variant detection from short-read sequencing. arXiv:1207.3907 [q-bio].

Goudie, F., Allsopp, M.H., Beekman, M., Oxley, P.R., Lim, J. & Oldroyd, B.P. 2012. Maintenance and loss of heterozygosity in a thelytokous lineage of honey bees (Apis mellifera capensis). Evolution 66: 1897–1906.

Goudie, F., Allsopp, M.H. & Oldroyd, B.P. 2014. Selection on overdominant genes maintains heterozygosity along multiple chromosomes in a clonal lineage of honey bee. Evolution 68: 125–136.

Goudie, F., Allsopp, M.H., Solignac, M., Beekman, M. & Oldroyd, B.P. 2015. The frequency of arrhenotoky in the normally thelytokous Apis mellifera capensis worker and the Clone reproductive parasite. Insect. Soc. 62: 325–333.

Goudie, F. & Oldroyd, B.P. 2014. Thelytoky in the honey bee. Apidologie 45: 306–326.

Greeff, J.M. 1996. Effects of thelytokous worker reproduction on kin-selection and conflict in the Cape honeybee, Apis mellifera capensis. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, doi: 10.1098/rstb.1996.0060.

Guo, L., Zhang, S., Rubinstein, B., Ross, E. & Alvarado, A.S. 2016. Widespread maintenance of genome heterozygosity in Schmidtea mediterranea. Nature Ecology & Evolution 1: 0019.

Haig, D. 2002. Genomic Imprinting and Kinship. Rutgers University Press.

Haig, D. 1992. Intragenomic conflict and the evolution of eusociality. Journal of Theoretical Biology 156: 401–403.

Haig, D. & Westoby, M. 1989. Parent-specific gene expression and the triploid endosperm. The American Naturalist 134: 147–155.

Hamilton, W.D. 1963. The evolution of altruistic behavior. The American Naturalist 97: 354–356.

Hamilton, W.D. 1964a. The genetical evolution of social behaviour. I. Journal of Theoretical Biology 7: 1–16.

Hamilton, W.D. 1964b. The genetical evolution of social behaviour. II. Journal of Theoretical Biology 7: 17–52.

Harbo, J.R. 1986. Propagation and instrumental insemination. In: Bee Genetics and Breeding (T. E. Rinderer, ed), pp. 361–389. Academic Press, Orlando.

Harris, K.D., Lloyd, J.P.B., Domb, K., Zilberman, D. & Zemach, A. 2019. DNA methylation is maintained with high fidelity in the honey bee germline and exhibits global non-functional fluctuations during somatic development. Epigenetics & Chromatin 12.

Härtel, S., Neumann, P., Raassen, F.S., Moritz, R.F. & Hepburn, H.R. 2006. Social parasitism by Cape honeybee workers in colonies of their own subspecies (Apis mellifera capensis Esch.). Insectes sociaux 53: 183–193.

Hepburn, R. & Radloff, S.E. 2002. Apis mellifera capensis: an essay on the subspecific classification of honeybees. Apidologie 33: 105–127.

Holman, L., Helanterä, H., Trontti, K. & Mikheyev, A.S. 2019. Comparative transcriptomics of social insect queen pheromones. Nature Communications 10: 1593.

Huang, D.W., Sherman, B.T. & Lempicki, R.A. 2009. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols 4: 44–57.

Hughes, W.O.H., Oldroyd, B.P., Beekman, M. & Ratnieks, F.L.W. 2008. Ancestral monogamy shows kin selection is key to the evolution of eusociality. Science 320: 1213–1216.

Jarosch, A., Stolle, E., Crewe, R.M. & Moritz, R.F. 2011. Alternative splicing of a single transcription factor drives selfish reproductive behavior in honeybee workers (Apis mellifera). Proc Natl Acad Sci U S A 108: 15282–15287.

Johannsmeier, M.F. 1983. Experiences with the Cape bee in the Transvaal. South African Bee Journal 55: 130–138.

Jovine, L., Qi, H., Williams, Z., Litscher, E. & Wassarman, P.M. 2002. The ZP domain is a conserved module for polymerization of extracellular proteins. Nature Cell Biology 4: 457–461.

Kardos, M., Åkesson, M., Fountain, T., Flagstad, Ø., Liberg, O., Olason, P., et al. 2018. Genomic consequences of intensive inbreeding in an isolated wolf population. Nature Ecology & Evolution 2: 124–131.

Kardos, M., Taylor, H.R., Ellegren, H., Luikart, G. & Allendorf, F.W. 2016. Genomics advances the study of inbreeding depression in the wild. Evolutionary Applications 9: 1205–1218.

Kerr, W.E., Martinho, M.R. & Goncalves, L.S. 1980. Kinship selection in bees. Revista Brasileira de Genetica 3: 339–344.

Kocher, S.D., Tsuruda, J.M., Gibson, J.D., Emore, C.M., Arechavaleta-Velasco, M.E., Queller, D.C., et al. 2015. A search for parent-of-origin effects on honey bee gene expression. G3 5: 1657–1662.

Krawetz, S.A., Kruger, A., Lalancette, C., Tagett, R., Anton, E., Draghici, S., et al. 2011. A survey of small RNAs in human sperm. Hum Reprod 26: 3401–3412.

Kronauer, D.J.C. 2008. Genomic imprinting and kinship in the social Hymenoptera: What are the predictions? Journal of Theoretical Biology 254: 737–740.

Kryger, P. 2001. The Capensis pseudo-clone, a social parasite of African honey bees. In: Proceedings of the 2001 Berlin Meeting of the European Section of IUSSI.

Kuszewska, K., Miler, K., Rojek, W., Ostap‐Chęć, M. & Woyciechowski, M. 2018. Rebel honeybee workers have a tendency to become intraspecific reproductive parasites. Ecology and Evolution 8: 11914–11920.

Lattorff, H.M.G., Moritz, R.F.A., Crewe, R.M. & Solignac, M. 2007. Control of reproductive dominance by the thelytoky gene in honeybees. Biology Letters 3: 292–295.

Li, H. & Durbin, R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., et al. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25: 2078–2079.

Libbrecht, R., Oxley, P.R., Keller, L. & Kronauer, D.J.C. 2016. Robust DNA methylation in the Clonal Raider Ant brain. Current Biology 26: 391–395.

Liu, H., Zhang, X., Huang, J., Chen, J.-Q., Tian, D., Hurst, L.D., et al. 2015. Causes and consequences of crossing-over evidenced via a high-resolution recombinational landscape of the honey bee. Genome biology 16: 15.

Lundie, A.E. 1954. Laying worker bees produce worker bees. African Bee Journal 29: 10–11.

Lyko, F., Foret, S., Kucharski, R., Wolf, S., Falckenhayn, C. & Maleszka, R. 2010. The Honey Bee Epigenomes: Differential Methylation of Brain DNA in Queens and Workers. PLoS Biology 8: e1000506.

Lynch, M. & Walsh, B. 1998. Genetics and Analysis of Quantitative Traits. Sinauer.

Mackay, T.F., Richards, S., Stone, E.A., Barbadilla, A., Ayroles, J.F., Zhu, D., et al. 2012. The Drosophila melanogaster genetic reference panel. Nature 482: 173–178.

Martin, S.J., Beekman, M., Wossler, T.C. & Ratnieks, F.L.W. 2002. Parasitic Cape honeybee workers, Apis mellifera capensis, evade policing. Nature 415: 163.

Maynard Smith, J. 1964. Group Selection and Kin Selection. Nature 201: 1145–1147.

Maynard Smith, J. & Szathmary, E. 1995. The Major Transitions in Evolution. Oxford University Press, Oxford.

McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research 20: 1297–1303.

Michener, C.D. 1974. The Social Behavior of the Bees: A Comparative Study. Harvard University Press.

Milleret, C., Wabakken, P., Liberg, O., Åkesson, M., Flagstad, Ø., Andreassen, H.P., et al. 2017. Let’s stay together? Intrinsic and extrinsic factors involved in pair bond dissolution in a recolonizing wolf population. Journal of Animal Ecology 86: 43–54.

Misof, B., Liu, S., Meusemann, K., Peters, R.S., Donath, A., Mayer, C., et al. 2014. Phylogenomics resolves the timing and pattern of insect evolution. Science 346: 763–767.

Monk, D., Mackay, D.J.G., Eggermann, T., Maher, E.R. & Riccio, A. 2019. Genomic imprinting disorders: lessons on how genome, epigenome and environment interact. Nature Reviews Genetics 20: 235–248.

Montague, C.E. & Oldroyd, B.P. 1998. The evolution of worker sterility in honey bees: an investigation into a behavioral mutant causing failure of worker policing. Evolution 52: 1408–1415.

Moore, T. & Haig, D. 1991. Genomic imprinting in mammalian development: a parental tug-of-war. Trends in Genetics 7: 45–49.

Moritz, R.F. & Haberl, M. 1994. Lack of meiotic recombination in thelytokous parthenogenesis of laying workers of Apis mellifera capensis (the Cape honeybee). Heredity 73: 98–102.

Moritz, R.F.A., Kryger, P. & Allsopp, M.H. 1996. Competition for royalty in bees. Nature 384: 31–31.

Nachman, M.W. 2002. Variation in recombination rate across the genome: evidence and implications. Current opinion in genetics & development 12: 657–663.

Nei, M., Maruyama, T. & Chakraborty, R. 1975. The bottleneck effect and genetic variability in populations. Evolution 29: 1–10.

Nelson, R.M., Wallberg, A., Simões, Z.L.P., Lawson, D.J. & Webster, M.T. 2017. Genomewide analysis of admixture and adaptation in the Africanized honeybee. Molecular Ecology 26: 3603–3617.

Neumann, P. & Moritz, R. 2002. The Cape honeybee phenomenon: the sympatric evolution of a social parasite in real time? Behavioral Ecology and Sociobiology 52: 271–281.

Niu, D.-F., Pirk, C.W.W., Zheng, H.-Q., Ping, S., Shi, J.-H., Cao, L.-F., et al. 2016. Reproductive traits and mandibular gland pheromone of anarchistic honey bee workers Apis mellifera occurring in China. Apidologie 47: 515–526.

Nonacs, P. & Kapheim, K.M. 2008. Social heterosis and the maintenance of genetic diversity at the genome level. Journal of Evolutionary Biology 21: 631–635.

Normark, B.B. 2003. The evolution of alternative genetic systems in insects. Annual Review of Entomology 48: 397–423.

Oldroyd, B.P. 2002. The Cape honeybee: an example of a social cancer. Trends in Ecology & Evolution 17: 249–251.

Oldroyd, B.P., Aamidor, S.E., Buchmann, G., Allsopp, M.H., Remnant, E.J., Kao, F.F., et al. 2018. Viable triploid honey bees (Apis mellifera capensis) are reliably produced in the progeny of CO2 narcotised queens. G3: Genes, Genomes, Genetics 8: 3357–3366.

Oldroyd, B.P., Allsopp, M.H., Gloag, R.S., Lim, J., Jordan, L.A. & Beekman, M. 2008. Thelytokous parthenogenesis in unmated queen honeybees (Apis mellifera capensis): central fusion and high recombination rates. Genetics 180: 359–366.

Oldroyd, B.P., Allsopp, M.H., Lim, J. & Beekman, M. 2011. A thelytokous lineage of socially parasitic honey bees has retained heterozygosity despite at least 10 years of inbreeding. Evolution 65: 860–868.

Oldroyd, B.P., Allsopp, M.H., Roth, K.M., Remnant, E.J., Drewell, R.A. & Beekman, M. 2014. A parent-of-origin effect on honeybee worker ovary size. Proceedings of the Royal Society B: Biological Sciences 281.

Oldroyd, B.P. & Goodman, R.D. 1988. Inbreeding and heterosis in queen bees in relation to brood area and honey production. Crop and Pasture Science 39: 959–964.

Oldroyd, B.P., Moran, C. & Nicholas, F.W. 1985. Diallel crosses of honeybees. 1. A genetic analysis of honey production using a fixed effects model. Journal of Apicultural Research 24: 243–249.

Oldroyd, B.P., Smolenski, A.J., Cornuet, J.-M. & Crozier, R.H. 1994. Anarchy in the beehive. Nature 371: 749.

Onions, G.W. 1912. South African fertile-worker bees. Agricultural Journal of the Union of South Africa 3: 720.

Palmer, K. & Oldroyd, B. 2000. Evolution of multiple mating in the genus Apis. Apidologie 31: 235–248.

Pearcy, M., Hardy, O. & Aron, S. 2006. Thelytokous parthenogenesis and its consequences on inbreeding in an ant. Heredity 96: 377–382.

Peters, R.S., Krogmann, L., Mayer, C., Donath, A., Gunkel, S., Meusemann, K., et al. 2017. Evolutionary History of the Hymenoptera. Current Biology 27: 1013–1018.

Pires, C.V., Freitas, F.C. de P., Cristino, A.S., Dearden, P.K. & Simões, Z.L.P. 2016. Transcriptome Analysis of Honeybee (Apis mellifera) Haploid and Diploid Embryos Reveals Early Zygotic Transcription during Cleavage. PLoS One 11.

Queller, D.C. 2003. Theory of genomic imprinting conflict in social insects. BMC Evolutionary Biology 3: 15.

Quinlan, A.R. & Hall, I.M. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26: 841–842.

R Core Team. 2013. R: A language and environment for statistical computing.

Rabeling, C. & Kronauer, D.J.C. 2013. Thelytokous parthenogenesis in eusocial Hymenoptera. Annual Review of Entomology 58: 273–292.

Ratnieks, F.L.W. 1988. Reproductive Harmony via Mutual Policing by Workers in Eusocial Hymenoptera. The American Naturalist 132: 217–236.

Ratnieks, F.L.W., Foster, K.R. & Wenseleers, T. 2006. Conflict resolution in insect societies. Annual Review of Entomology 51: 581–608.

Ratnieks, F.L.W. & Visscher, P.K. 1989. Worker policing in the honeybee. Nature 342: 796.

Rechavi, O. & Lev, I. 2017. Principles of transgenerational small RNA inheritance in Caenorhabditis elegans. Current Biology 27: R720–R730.

Rehan, S.M. & Toth, A.L. 2015. Climbing the social ladder: the molecular evolution of sociality. Trends in Ecology & Evolution 30: 426–433.

Remnant, E.J., Ashe, A., Young, P.E., Buchmann, G., Beekman, M., Allsopp, M.H., et al. 2016. Parent-of-origin effects on genome-wide DNA methylation in the Cape honey bee (Apis mellifera capensis) may be confounded by allele-specific methylation. BMC Genomics 17.

Roberts, W.C. 1961. Heterosis in the honey bee as shown by morphological characters in inbred and hybrid bees. Annals of the Entomological Society of America 54: 878–882.

Roth, K.M., Beekman, M., Allsopp, M.H., Goudie, F., Wossler, T.C. & Oldroyd, B.P. 2014. Cheating workers with large activated ovaries avoid risky foraging. Behavioral Ecology 25: 668–674.

Ruttner, F. 1988. Biogeography and taxonomy of honeybees. Springer-Verlag.

Schaefer, M. & Lyko, F. 2007. DNA methylation with a sting: An active DNA methylation system in the honeybee. BioEssays 29: 208–211.

Schlüns, H., Koeniger, G., Koeniger, N. & Moritz, R.F.A. 2004. Sperm utilization pattern in the honeybee (Apis mellifera). Behavioral Ecology and Sociobiology 56: 458–463.

Sciamanna, I., Serafino, A., Shapiro, J.A. & Spadafora, C. 2019. The active role of spermatozoa in transgenerational inheritance. Proceedings of the Royal Society B: Biological Sciences 286: 20191263.

Solignac, M., Mougel, F., Vautrin, D., Monnerot, M. & Cornuet, J.-M. 2007. A third-generation microsatellite-based linkage map of the honey bee, Apis mellifera, and its comparison with the sequence-based physical map. Genome Biology 8: R66.

Solignac, M., Vautrin, D., Baudry, E., Mougel, F., Loiseau, A. & Cornuet, J.-M. 2004. A microsatellite-based linkage map of the honeybee, Apis mellifera L. Genetics 167: 253–262.

Stanley, D. & Kim, Y. 2011. Prostaglandins and their receptors in insect biology. Frontiers in Endocrinology 2.

Stenberg, P. & Saura, A. 2009. Cytology of asexual animals. In: Lost sex, pp. 63–74. Springer.

Suomalainen, E., Saura, A. & Lokki, J. 1987. Cytology and evolution in parthenogenesis. CRC Press.

Tarpy, D.R., Nielsen, R. & Nielsen, D.I. 2004. A scientific note on the revised estimates of effective paternity frequency in Apis. Insectes Sociaux 51: 203–204.

Toth, A.L. & Rehan, S.M. 2017. Molecular evolution of insect sociality: an eco-evo-devo perspective. Annual Review of Entomology 62: 419–442.

Trapnell, C., Pachter, L. & Salzberg, S.L. 2009. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25: 1105–1111.

Trivers, R.L. & Hare, H. 1976. Haploidploidy and the evolution of the social insect. Science 191: 249–263.

Verhulst, E., Beukeboom, L.W. & van de Zande, L. 2010. Maternal control of haplodiploid sex determination in Nasonia. Science 328: 620–623.

Verma, S. & Ruttner, F. 1983. Cytological analysis of the thelytokous parthenogenesis in the Cape honeybee (Apis mellifera capensis ESCHOLTZ). Apidologie 14: 41–57.

Waiho, K., Fazhan, H., Zhang, Y., Li, S., Zhang, Y., Zheng, H., et al. 2019. Comparative profiling of ovarian and testicular piRNAs in the mud crab Scylla paramamosain. Genomics, doi: 10.1016/j.ygeno.2019.02.012.

Wallberg, A., Glémin, S. & Webster, M.T. 2015. Extreme Recombination Frequencies Shape Genome Variation and Evolution in the Honeybee, Apis mellifera. PLOS Genetics 11: e1005189.

Wallberg, A., Han, F., Wellhagen, G., Dahle, B., Kawata, M., Haddad, N., et al. 2014. A worldwide survey of genome sequence variation provides insight into the evolutionary history of the honeybee Apis mellifera. Nature Genetics 46: 1081–1088.

Wallberg, A., Pirk, C.W., Allsopp, M.H. & Webster, M.T. 2016. Identification of multiple loci associated with social parasitism in honeybees. PLOS Genetics 12: e1006097.

Wallberg, A., Schöning, C., Webster, M.T. & Hasselmann, M. 2017. Two extended haplotype blocks are associated with adaptation to high altitude habitats in East African honey bees. PLOS Genetics 13: e1006792.

Wang, X. & Clark, A.G. 2014. Using next-generation RNA sequencing to identify imprinted genes. Heredity 113: 156.

Wang, X., Werren, J.H. & Clark, A.G. 2016. Allele-specific transcriptome and methylome analysis reveals stable inheritance and cis-regulation of DNA methylation in Nasonia. PLOS Biology 14: e1002500.

Watts, J.S., Morton, D.G., Kemphues, K.J. & Watts, J.L. 2018. The biotin-ligating protein BPL-1 is critical for lipid biosynthesis and polarization of the Caenorhabditis elegans embryo. Journal of Biological Chemistry 293: 610–622.

White, M.J.D. 1984. Animal Cytology and Evolution. Cambridge University Press.

Whitfield, C.W., Behura, S.K., Berlocher, S.H., Clark, A.G., Johnston, J.S., Sheppard, W.S., et al. 2006. Thrice out of Africa: Ancient and recent expansions of the honey bee, Apis mellifera. Science 314: 642–645.

Wilkins, J.F., Úbeda, F. & Van Cleve, J. 2016. The evolving landscape of imprinted genes in humans and mice: Conflict among alleles, genes, tissues, and kin. BioEssays 38: 482–489.

Wilson, E.O. 1975. Sociobiology: The New Synthesis. Belknap Press of Harvard University Press.

Wilson, E.O. 1971. The Insect Societies. Belknap Press of Harvard University Press.

Winston, M.L. 1987. The Biology of the Honey Bee. Harvard University Press.

Woodhouse, R.M., Buchmann, G., Hoe, M., Harney, D.J., Low, J.K.K., Larance, M., et al. 2018. Chromatin modifiers SET-25 and SET-32 are required for establishment but not long-term maintenance of transgenerational epigenetic inheritance. Cell Reports 25: 2259–2272.e5.

Woyciechowski, M. & Kuszewska, K. 2012. Swarming Generates Rebel Workers in Honeybees. Current Biology 22: 707–711.

Woyke, J. 1963. What happens to diploid drone larvae in a honeybee colony. Journal of Apicultural Research 2: 73–75.

Yagound, B., Duncan, M., Chapman, N.C. & Oldroyd, B.P. 2017. Subfamily-dependent alternative reproductive strategies in worker honeybees. Molecular Ecology 26: 6938–6947.

Yagound, B., Smith, N.M.A., Buchmann, G., Oldroyd, B.P. & Remnant, E.J. 2019. Unique DNA methylation profiles are associated with cis-variation in honey bees. Genome Biology and Evolution, doi: 10.1093/gbe/evz177.

Yang, S., Wang, L., Huang, J., Zhang, X., Yuan, Y., Chen, J.-Q., et al. 2015. Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature 523: 463–467.

Yin, C., Shen, G., Guo, D., Wang, S., Ma, X., Xiao, H., et al. 2016. InsectBase: a resource for insect genomes and transcriptomes. Nucleic Acids Research 44: D801–D807.
