
Invitation

You are cordially invited to attend the public defense of my PhD thesis entitled:

Genetic mapping in polyploids

on Friday 15th June 2018 at 11:00 in the Aula of Wageningen University, Generaal Foulkesweg 1, Wageningen.

Peter M. Bourke
[email protected]

Paranymphs

Michiel Klaassen
[email protected]

Jordi Petit Pedro
[email protected]

Propositions

1. Ignoring multivalent pairing during polyploid meiosis simplifies and improves subsequent genetic analyses. (this thesis)

2. Classifying polyploids as either autopolyploid or allopolyploid is both inappropriate and imprecise. (this thesis)

3. The environmental credentials of electric cars are more grey than green.

4. ‘Gene drive’ technologies display once again that humans act more like ecosystem terrorists than ecosystem managers.

5. Heritability is a concept that promises much but delivers little.

6. Awarding patents to cultivars or crop traits is patently wrong.

7. The term “air miles” should refer to one’s lifetime allowance of fossil-fuelled air travel.

8. The Netherlands’ most effective educational tool comes on two wheels with a bell.

Propositions belonging to the thesis, entitled

“Genetic mapping in polyploids”

Peter M. Bourke
Wageningen, 15th June 2018

Genetic mapping in polyploids

Peter M. Bourke

Thesis committee

Promotor
Prof. Dr R.G.F. Visser
Professor of Plant Breeding
Wageningen University & Research

Co-promotors
Dr C.A. Maliepaard
Associate professor, Plant Breeding
Wageningen University & Research

Dr R.E. Voorrips
Senior scientist, Plant Breeding
Wageningen University & Research

Other members
Dr J. Endelman, University of Wisconsin-Madison, USA
Prof. Dr M.E. Schranz, Wageningen University & Research
Dr J.W. van Ooijen, Kyazma B.V., Wageningen
Prof. Dr B.J. Zwaan, Wageningen University & Research

This research was conducted under the auspices of the Graduate School of Experimental Plant Sciences (EPS).

Genetic mapping in polyploids

Peter M. Bourke

Thesis
submitted in fulfilment of the requirements for the degree of doctor at Wageningen University by the authority of the Rector Magnificus, Prof. Dr A.P.J. Mol, in the presence of the Thesis Committee appointed by the Academic Board to be defended in public on Friday 15 June 2018 at 11 a.m. in the Aula.

Peter M. Bourke
Genetic mapping in polyploids, 306 pages.

PhD thesis, Wageningen University, Wageningen, the Netherlands (2018)
With references, with summary in English

ISBN 978-94-6343-846-9
DOI 10.18174/444415

Table of contents

Chapter 1 General Introduction
Chapter 2 Tools for genetic studies in experimental populations of polyploids
Chapter 3 The double reduction landscape in tetraploid potato as revealed by a high-density linkage map
Chapter 4 Integrating haplotype-specific linkage maps in tetraploid species using SNP markers
Chapter 5 Partial preferential chromosome pairing is genotype dependent in tetraploid rose
Chapter 6 polymapR - linkage analysis and genetic map construction from F1 populations of outcrossing polyploids
Chapter 7 An ultra-dense integrated linkage map for hexaploid chrysanthemum enables multi-allelic QTL analysis
Chapter 8 Quantifying the power and precision of QTL analysis in autopolyploids under bivalent and multivalent genetic models
Chapter 9 Multi-environment QTL analysis of plant and flower morphological traits in tetraploid rose
Chapter 10 polyqtlR – an R package to analyse quantitative trait loci in autopolyploid populations
Chapter 11 General Discussion

References
Summary
Acknowledgements
About the author
List of publications
Education certificate

Chapter 1

General Introduction


Molecular markers and Mendel’s missing caveat

It is only relatively recently that the terms “marker” and “molecular marker” have been taken to be synonymous. In his pioneering experiments with peas, Mendel was already studying the transmission and inheritance of a now-famous set of genetic “markers” (Mendel, 1866), with the physical expression of seven different traits (what we term “phenotypes”) being the marker set itself. It took almost fifty years before it was realised that these morphological markers also represented specific locations on chromosomes which were either transmitted together (due to genetic linkage) or independently (due to lack of linkage) (Morgan, 1911). Curiously, Mendel had selected only unlinked traits for his study of peas (whether this was by accident or by design we do not know). His second law, the law of independent assortment, states that alleles of one gene sort into gametes independently of the alleles of another gene. However, because chromosomes and their connection to inheritance had not yet been discovered, Mendel missed one vital caveat to this law, namely that independence only occurs if such genes are located on separate chromosomes (or perhaps at opposite ends of a single long chromosome). Most of what follows in this thesis is based on this non-independent segregation of linked loci. One of the major milestones in genetics was the realisation that a collection of linked markers could be arranged in a linear fashion, with distances between their positions estimated from the counts of co-inherited markers (Sturtevant, 1913). In his demonstration of this fact using six linked morphological markers of the common fruit fly Drosophila melanogaster, Sturtevant created the world’s first genetic linkage map (Sturtevant, 1913;Van Ooijen and Jansen, 2013).

Nowadays, we generally rely on DNA markers (a.k.a. molecular markers) to identify positions on chromosomes. In this thesis we exclusively present data on single nucleotide polymorphism (SNP) markers. These are nucleotide positions which generally differ in the individuals being screened and are usually selected to be “bi-allelic” (i.e. alternating between two possible nucleotides in the material under study). However, the methods we develop are general to any bi-allelic marker system for which marker “dosage” counts can be accurately estimated. The term “dosage” is less commonly used in diploid studies, but is a very important concept in genetic analyses of polyploids. Dosage is generally understood to be the number of copies of the alternative allele carried by an individual at a particular locus. An illustration of the possible marker dosage scores in a tetraploid and a hexaploid is shown in Figure 1.


Linkage mapping and QTL analysis – part I

The principal aim of this thesis is the development and exploration of methods and tools to create linkage maps and perform quantitative trait locus (QTL) analysis in polyploids. Both of these activities combined can conveniently be termed “genetic mapping” as the title of this thesis suggests. These are not new activities – as already mentioned, the first linkage map was constructed in 1913, and the first QTL analysis was arguably conducted a decade later (Sax, 1923). The novelty of this work lies in its application to polyploids, a group of organisms that possess much more complex genomes than their diploid counterparts, and in its use of modern genotyping data which often consists of many tens of thousands of markers.

Figure 1. Representation of the possible marker dosage scores, for a. Tetraploid (ploidy = 4), with dosage classes 0 to 4, and b. Hexaploid (ploidy = 6), with dosage classes 0 to 6. In general, there are ploidy + 1 possible dosage classes at a bi-allelic marker position, here shown in blue / red. The convention for assigning dosage scores is to count the number of copies of the alternative allele (coloured red), with the reference allele count (in blue) given by ploidy – dosage.
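The dosage convention described in the caption of Figure 1 can be sketched in a few lines of Python. This is only an illustrative sketch and not code from the thesis or its R packages; the function names `dosage_classes` and `allele_counts` are hypothetical.

```python
def dosage_classes(ploidy):
    """Return all possible dosage scores at a bi-allelic marker.

    There are ploidy + 1 classes, from 0 (no copies of the alternative
    allele) up to ploidy (only copies of the alternative allele).
    """
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    return list(range(ploidy + 1))


def allele_counts(ploidy, dosage):
    """Split a dosage score into alternative and reference allele counts.

    Dosage counts the alternative allele; the reference allele count is
    then ploidy - dosage.
    """
    if not 0 <= dosage <= ploidy:
        raise ValueError("dosage must lie between 0 and ploidy")
    return {"alternative": dosage, "reference": ploidy - dosage}


if __name__ == "__main__":
    for ploidy in (4, 6):  # tetraploid and hexaploid, as in Figure 1
        print(ploidy, dosage_classes(ploidy))
    print(allele_counts(6, 2))
```

For a tetraploid this yields five classes (0 to 4) and for a hexaploid seven classes (0 to 6), matching the two panels of Figure 1.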

Modern plant and animal breeding has benefitted from the use of molecular markers, allowing specific traits of interest to be predicted without having to necessarily grow a plant to maturity and test the trait directly. This can have many advantages, for example by allowing a greater number of traits to be more cost-effectively combined, allowing selection among multiple progenies, or in material for which phenotyping is difficult or expensive etc. At least, that is the intention of marker-assisted selection (MAS). In practice, MAS has met with mixed success in plant breeding programs (Collard and Mackill, 2008;Hospital, 2009). In the autotetraploid crop potato (Solanum tuberosum L.), markers have been demonstrated to be useful for certain disease resistances (e.g. potato cyst nematode (Schultz et al., 2012)) and have been shown to be a cost-effective proposition if deployed appropriately in a breeding program (Slater et al., 2013). However, in order to use markers effectively, we need to develop models to capture the relationship between a plant’s genotype and its phenotype. For polyploids, modelling these relationships is still relatively new and is fraught with added complications not encountered in diploids. Polyploid inheritance patterns are by nature more complex than those of diploids. Furthermore, many polyploids are outcrossing species that resist the genetic simplification achieved through inbreeding. We will first examine some of the challenges that polyploids pose to the molecular geneticist, and in particular the idiosyncrasies of polyploid meiosis, before returning to the topic of linkage mapping and QTL analysis.

Polyploidy

A polyploid is any organism that carries more than two copies of each chromosome, from the Greek “poly-” meaning much or many, and “–ploid” from “ploos”, meaning fold, thus “many-fold”. Polyploidy was discovered more than a century ago (Strasburger, 1910) and since then has been a topic of continued interest and debate. Diploidy, the state of possessing two copies of each chromosome, is considered the chromosomal “ground state” or norm for complex organisms. In diploids, parents transmit a single copy of each chromosome to each gamete, thereby re-establishing diploidy in the offspring. In polyploid organisms, more than one copy of each chromosome is transmitted, which is one of the main contributing factors to the complexity of polyploid genetics. There is a tendency for polyploid lineages to return to a diploid conformation over evolutionary timescales, a process termed “diploidisation” or “re-diploidisation” (Ohno, 1970;Le Comber et al., 2010). In the timescales of interest to breeders and researchers however, polyploidy is effectively a permanent condition. Although the definition of polyploidy is quite unequivocal, there can be some confusion over the classification of a species as polyploid or not, particularly as many complex life-forms were polyploid at some point in the past (Van de Peer et al., 2017). “Paleopolyploids” are species that were true polyploids millions of years ago but have since re-diploidised, and the term “neopolyploids” refers to newly-formed polyploids (Ramsey and Schemske, 2002;Lloyd and Bomblies, 2016), possibly artificially generated for research purposes to understand how polyploids deal with the initial “genomic shock” of having an extra genome (McClintock, 1984). Neopolyploidy may also refer to recently-formed wild polyploid populations such as Spartina anglica, an allopolyploid that arose when Spartina alterniflora was introduced outside its native range and hybridised with local Spartina species (Soltis and Soltis, 2009). Some authors further distinguish “mesopolyploids” as re-diploidised species that underwent whole-genome duplication (WGD) at a less ancient timescale than paleopolyploids and which can be detected by genetic or cytogenetic analyses (Mandáková et al., 2010;Franzke et al., 2011). In this thesis we are principally concerned with extant polyploid species that have not re-diploidised, yet have already passed the (presumed bumpy) early generations i.e. they are no longer considered neopolyploid.

Among polyploids, two distinct types are generally recognised – autopolyploids and allopolyploids. These terms can distinguish or emphasise two features, namely the origin of the polyploid (also termed the “taxonomic” definition), or how its chromosomes behave during meiosis (the “genetic” definition) (Ramsey and Schemske, 2002;Doyle and Egan, 2010). Autopolyploids are generally-speaking derived from a single species and exhibit polysomic inheritance. Polysomic inheritance means that all possible combinations of alleles are equally likely to end up in a gamete – although we will return to this question in more detail in the next section as it is one of the fundamental points of interest in this thesis. Allopolyploids on the other hand are derived from at least two species and exhibit disomic inheritance (where disomic means diploid-like inheritance, the result of exclusive pairing and recombination between homologous chromosomes and an absence of pairing and recombination between homoeologous chromosomes). Although also important, they are not the focus of study here. It should be noted that classifying a species as either autopolyploid or allopolyploid is not always straightforward, as demonstrated for example in the debate about the correct classification of the polyploid ancestor of soybean (Glycine spp.) (Barker et al., 2016;Doyle and Sherman‐Broyles, 2016). In other words, the taxonomic and genetic definitions do not always neatly overlap, particularly in species with a long history of inter-specific hybridisation among progenitor species of varying relatedness. A large body of polyploid research is aimed at understanding how different polyploid lineages arose, and how these newly-wed genomes adapted and evolved to accommodate each other and their changing environment.
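The contrast between polysomic and disomic inheritance described above can be illustrated with a small enumeration. The sketch below is a simplified illustration, not the thesis's method: it assumes a single bi-allelic locus, bivalent pairing only (no double reduction), and that each gamete receives half of the parental homologues. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def polysomic_gamete_dosages(homologues):
    """Gamete dosage distribution under polysomic inheritance.

    `homologues` holds one allele per homologue (1 = alternative,
    0 = reference). Every combination of ploidy / 2 homologues is
    treated as equally likely to end up in a gamete.
    """
    gamete_size = len(homologues) // 2
    gametes = list(combinations(range(len(homologues)), gamete_size))
    counts = Counter(sum(homologues[i] for i in g) for g in gametes)
    return {d: Fraction(c, len(gametes)) for d, c in sorted(counts.items())}


def disomic_gamete_dosages(pair_a, pair_b):
    """Gamete dosage distribution under disomic inheritance (tetraploid).

    Each pair holds the two alleles of one exclusively-pairing homologue
    pair; a gamete receives exactly one allele from each pair.
    """
    counts = Counter(a + b for a in pair_a for b in pair_b)
    return {d: Fraction(c, 4) for d, c in sorted(counts.items())}


if __name__ == "__main__":
    # Duplex tetraploid parent (dosage 2) under polysomic inheritance
    print(polysomic_gamete_dosages((1, 1, 0, 0)))
    # Same dosage under disomic inheritance, both copies on one pair
    print(disomic_gamete_dosages((1, 1), (0, 0)))
```

Under these assumptions a duplex tetraploid parent gives gamete dosages 0, 1 and 2 in a 1:4:1 ratio under polysomic inheritance, whereas under disomic inheritance with both alternative alleles on the same homologue pair every gamete carries exactly one copy.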

There is a third category of polyploid, namely the “segmental allopolyploid” as it was originally termed (Stebbins, 1947). Again this category can be defined from a taxonomic perspective or a genetic perspective – as a hybridisation between two very closely-related species or subspecies, or as a polyploid which demonstrates a meiotic pairing behaviour that cannot be classified as fully disomic or fully polysomic (recently termed “mixosomic” (Soltis et al., 2016)). Throughout this thesis we rely on the genetic definition, as it is the pairing behaviour that influences how homologues recombine, upon which our methods to study inheritance are ultimately based. Although it is interesting to speculate upon how or why such differences arose, in the end we are primarily interested in understanding what happens, as this is the most solid ground upon which to build a model.

Polyploidy occurs in animals, plants and fungi, with the ancestors of all angiosperms and vertebrates thought to have experienced at least two whole-genome duplications (Putnam et al., 2008;Jiao et al., 2011). In the plant kingdom there are numerous examples of extant polyploids. There are fewer known examples of polyploid animals, which some suggest is due to difficulties in re-establishing a balance in chromosomal sex-determination systems following genome duplication (Muller, 1925;Orr, 1990). However, examples do exist, particularly in amphibians and fish (but much less so in other vertebrates) (Mable et al., 2011). Amphibian examples include the Bluespotted-Jefferson salamander complex (Uzzell, 1964), the grey tree-frog Hyla versicolor (Ptacek et al., 1994), the American ground frog Odontophrynus americanus (Beçak et al., 1966) and the African clawed frog Xenopus laevis (Session et al., 2016). Among fish, it is now well-established that a whole-genome duplication (WGD) occurred in the ancestor of all salmonids (e.g. salmon, trout etc.) between 50 and 100 million years ago (Allendorf et al., 2015). Polyploidy is also sometimes artificially induced in animals, for example in Pacific oysters (Crassostrea gigas) (Benabdelmouna and Ledu, 2015) or the silkmoth (Bombyx mori L.) (Rasmussen and Holm, 1979). There continues to be debate about whether any polyploid mammals exist, with the initial claim that the Argentinian red vizcacha rat (Tympanoctomys barrerae) is tetraploid (Gallardo et al., 1999) being more recently challenged in light of new data (Svartman et al., 2005;Evans et al., 2017).

Polyploidy occurs widely among plant species, with recent advances in whole-genome sequencing allowing a detailed analysis of recent and ancient polyploidisation events in an increasingly large number of plant lineages (Vanneste et al., 2014;Van de Peer et al., 2017). In natural populations of plants there is always a small possibility of a new polyploid species arising (usually although not exclusively through unreduced (2n) gametes (Harlan and De Wet, 1975)). In the case of autotetraploids, one possible path for their establishment is through the initial formation of a triploid bridge (from a fusion of n + 2n gametes) (Ramsey and Schemske, 1998;Schinkel et al., 2017). These triploids, although generally infertile, may also produce 2n gametes and hence through selfing or pollination by 2n gametes from diploids, tetraploids may be formed (Ramsey and Schemske, 1998;Comai, 2005). Note that for a new polyploid lineage to establish and diversify, an even-numbered ploidy is required. An exception to this is when plants exclusively reproduce vegetatively or apomictically, thereby avoiding the disruptions that odd-numbered ploidies pose to balanced meiotic division. However, such lineages would be expected to evolve more slowly than sexually-reproducing ones (McDonald et al., 2016). The fusion of an unreduced male gamete with an unreduced egg cell, both from diploid parents, is also theoretically possible (“one-step” tetraploids (Ramsey and Schemske, 1998)). Induced polyploidy (man-made) can also occur through somatic chromosome doubling, although its status as a means to polyploid formation in natural populations is uncertain (Harlan and De Wet, 1975). An interesting alternative pathway that has only recently been explored is the possibility of polyspermy, where more than one sperm cell fertilises an ovule (Dresselhaus and Johnson, 2018). Interestingly, stressed plants are found to produce more unreduced gametes (such stresses may relate to environmental variables like extreme temperatures, wounding, drought or nutrient deficiency (Ramsey and Schemske, 1998)). Unreduced gametes are thought to arise as a result of defective spindle fibres or cell-plate formation, both of which have been shown to regularly occur at extreme (particularly higher) temperatures (Pécrix et al., 2011;De Storme and Geelen, 2014;Bomblies et al., 2015).

In most cases, neopolyploid plants are at an immediate disadvantage, being unadapted and reproductively isolated (what is termed “minority cytotype disadvantage” (Levin, 1975;Husband, 2000)). The speed at which newly-established polyploid lineages prosper and diversify varies, with indications that there may be a significant time lag before this occurs (Schranz et al., 2012). One of the fascinating hypotheses that has emerged is the possibility that a disproportionately-high number of WGDs occurred close to the Cretaceous-Paleogene boundary (around 66 million years ago) (Van de Peer et al., 2017). Non-avian dinosaurs are thought to have disappeared around the same time, along with 60-70% of all plant and animal life, coinciding with a number of catastrophic phenomena including perturbations in the global climate, increased volcanic activity and the impact of a large meteor near Chicxulub, Mexico (Renne et al., 2013). Clearly this was a difficult time to be alive on planet Earth. Nevertheless, in times of severe environmental stress the increased genomic plasticity of polyploids (te Beest et al., 2011) coupled with their propensity to vegetatively propagate (Herben et al., 2017), as well as reduced competition from severely-stressed or dying diploids may have contributed to their success.


Polyploids are particularly common among domesticated crops (Salman-Minkov et al., 2016), a fact that has helped drive interest to better understand these species. In many cases polyploidy is deliberately induced – for example modern ornamental breeding often relies on inter-specific hybrids to create novel varieties, which are often “polyploidised” (through colchicine treatment for mitotic polyploidisation, or through selection of 2n gametes) to overcome sterility in the F1 (Van Tuyl and Lim, 2003). Fruit breeders also generate seedless fruit by crossing parents of different ploidy levels, resulting in sterile fruit-bearing (usually triploid) offspring (Bradshaw, 2016). Many of the native attributes of polyploids may also have endeared them to the early agriculturalists, e.g. larger organs (tubers, fruits, flowers etc.), also known as the “gigas” effect (Sattler et al., 2016), or their ability to be clonally propagated. We therefore find ourselves in the position of relying on some of the most genetically-complex species to provide us with the basic necessities for life. Examples of some globally-important polyploid crops include allopolyploids such as wheat (Triticum aestivum L.), cotton (Gossypium hirsutum L.), coffee (Coffea arabica L.), oilseed rape (Brassica napus L.), oats (Avena sativa L.), peanut (Arachis hypogaea L.), strawberry (Fragaria × ananassa L.) and autopolyploids such as potato (Solanum tuberosum L.), alfalfa (Medicago sativa L.), sweetpotato (Ipomoea batatas L.), leek (Allium ampeloprasum L.), blueberry (Vaccinium corymbosum L.), chrysanthemum (Chrysanthemum spp.) and rose (Rosa × hybrida L.). Despite their importance, polyploids, and in particular autopolyploids, remain an understudied and poorly-understood group.
One may attribute this to their genetic complexity and the fact that few software tools are developed to analyse autopolyploid data (noting that due to the diploid-like inheritance of allopolyploids it is possible to use many of the diploid software tools for those crops). It is also only relatively recently that datasets containing sufficient marker numbers to allow investigations into their genetic properties have become available. In this thesis we aim to tackle this deficit by developing tools and methods to analyse autopolyploid genotype and phenotype data. Although the methods we develop are general, we have so far only applied them to plant datasets. Before going into more detail on how this was done, we first need to understand the process of meiosis.

Meiosis

The topic of meiosis in polyploids is central to this thesis. Indeed, one of our primary interests in markers is that they provide a detailed picture of meiosis when deployed over a population of related individuals. The inheritance information from the population allows us not only to organise these markers into ordered clusters representing chromosomes, but also to track the pairing and recombination that occurred in each parental meiosis that gave rise to a particular offspring individual. Meiosis can be defined as “two successive nuclear divisions that produce gametes carrying one half of the genetic material of the original cell” (Griffiths et al., 2012). In plants and fungi as opposed to animals, the direct product of meiosis is technically not a gamete, but rather a spore which gives rise to a gametophyte which later produces the gamete cells. However, from the perspective of inheritance, all the interesting phenomena occur during meiosis, and not in any later intermediate (mitotic) cell divisions which carry the haploid chromosomes towards their final destiny as gametes. Therefore, in this thesis the term “gamete” is often loosely used to denote the products of meiosis, when “spore” would have been more appropriate in plant-specific contexts.

Meiosis in polyploids

In describing polyploid meiosis, one can either choose to give a very detailed picture or to sketch the most important features and hope that in so doing the audience are still following. In this thesis we try to take the middle road, avoiding unnecessary terminology or technical detail where possible. Apart from lucidity, we are also only interested in meiosis from a practical perspective – it is simply a process that we model in order for us to make sense of marker data. We leverage this data to come to a better understanding of the chromosomal composition of our experimental organisms (in the fullest sense – from gene order and position to gene function and effect). One clear example of our (over)simplification is the use of the term “pairing” in meiosis. In the early stages of meiosis I (leptotene) homologues pair up – they physically align. This is followed by the formation of synaptonemal complexes during zygotene. However, in cases where this pairing is aberrant (e.g. between homoeologues) any synaptonemal complexes that may have formed are dissolved or corrected in late zygotene and pachytene, and only bivalents between homologues persist into metaphase I. In this thesis we avoid terms such as leptotene, zygotene and pachytene, and use “pairing” to describe the conformation of (recombining) homologues (or homoeologues even) that have persisted into metaphase I. If this pairing behaviour is not random but preferentially occurs between certain homologues, we call this behaviour “preferential pairing”, which is one of the main distinguishing characteristics of a segmental allopolyploid. However technically-speaking it would be better to talk about “preferential crossover formation” instead (Lloyd and Bomblies, 2016). In fact, the literature on the subject of preferential pairing appears to be written by two groups – those who are comfortable with terms like “leptotene” and those who are not.
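The idea of preferential pairing introduced above can be made concrete with a toy model for a tetraploid. The sketch below is an illustration under stated assumptions and not necessarily the parameterisation used later in this thesis: it assumes bivalent pairing only, a single locus with no recombination between locus and centromere, and a hypothetical preference parameter `pref` (between 0 and 2/3) that raises the probability of homologues 0 and 1 pairing (and hence 2 and 3) above the random expectation of 1/3.

```python
from fractions import Fraction
from itertools import product

# The three ways in which four homologues (labelled 0 to 3) can form two bivalents
PAIRINGS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def pairing_probabilities(pref):
    """Probabilities of the three bivalent configurations.

    pref = 0 gives random pairing (1/3 each); pref = 2/3 gives fully
    preferential pairing of 0 with 1 and of 2 with 3.
    """
    pref = Fraction(pref)
    if not 0 <= pref <= Fraction(2, 3):
        raise ValueError("pref must lie between 0 and 2/3")
    third = Fraction(1, 3)
    return [third + pref, third - pref / 2, third - pref / 2]


def gamete_probabilities(pref):
    """Probability of each pair of homologues being transmitted together.

    A gamete receives one homologue from each bivalent, so homologues
    that paired with each other never end up in the same gamete.
    """
    probs = {}
    for (biv1, biv2), p in zip(PAIRINGS, pairing_probabilities(pref)):
        for a, b in product(biv1, biv2):
            key = tuple(sorted((a, b)))
            probs[key] = probs.get(key, Fraction(0)) + p / 4
    return probs


if __name__ == "__main__":
    for pref in (0, Fraction(1, 3), Fraction(2, 3)):
        print(pref, gamete_probabilities(pref))
```

With `pref = 0` all six homologue combinations are equally likely (polysomic behaviour), while with `pref = 2/3` gametes carrying homologues 0 and 1 together, or 2 and 3 together, are never produced (disomic behaviour). Intermediate values give the in-between situation associated with segmental allopolyploids.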

Figure 2. Simplified scheme of the early stages of polyploid meiosis, in this case for an autotetraploid. By the beginning of Metaphase I, pairs of sister chromatids will have associated and come to lie on the equatorial plane, either in bivalent or multivalent pairing structures. Panels show Interphase I, Prophase I, Metaphase I (with bivalents and a multivalent) and Anaphase I.

From a simplified perspective, therefore, there are a number of aspects that are relevant for this thesis. The first concerns the early stages of meiosis which lead to the formation of different types of pairing structures (Figure 2). As already mentioned, the precise details of Prophase I are beyond the scope of this introduction, but the interested reader is directed to any number of excellent reviews on the subject of meiosis in general (Harrison et al., 2010;Da Ines et al., 2014;Hunter, 2015;Mercier et al., 2015;Zickler and Kleckner, 2015), and in particular those that look specifically at polyploid meiosis (Cifuentes et al., 2010;Zielinski and Scheid, 2012;Grandont et al., 2013;Moore, 2013;Lloyd and Bomblies, 2016).

At the onset of meiosis each chromosome condenses and is replicated into pairs of sister chromatids (shown in Figure 2, Prophase I). These are held together by cohesion complexes that persist into the later stages of meiosis (Hunter, 2015). These pairs of sister chromatids then associate either as pairs, in which case they are described as bivalents, or in groups of three or more, described as multivalents (Figure 2, Metaphase I). It is now known that in order for meiosis to proceed in an orderly fashion, there must be formation of chiasmata leading to at least one cross-over per chromosome (the number of cross-overs per chromosome tends to also have an upper bound, with rarely more than three observed) (Mercier et al., 2015) which occur through the formation and subsequent repair of double-strand breaks in the DNA (Henderson and Keeney, 2004). The most commonly-observed multivalents involve four homologues (also called a quadrivalent), with on average 27% of all pairing structures found to be quadrivalents in a meta-analysis of autopolyploid meiosis (Ramsey and Schemske, 2002). By contrast, trivalents made up only 2% of the observed pairing structures. Uneven numbers of pairing homologues are more likely to lead to aneuploid gametes (carrying an unbalanced number of chromosomes) which rarely survive. Multivalents are more likely to be observed in autopolyploids than allopolyploids (because in the latter case there are barriers to pairing and recombination between homoeologues), but even in allopolyploids they may occur (Ramsey and Schemske, 2002).

Double reduction

It has long been suggested that multivalents are aberrant pairing structures that can often lead to aneuploidy and reduced fertility (Darlington, 1937;Kostoff, 1940;Hazarika and Rees, 1967;Lloyd and Bomblies, 2016), but this view likely stems from a bias towards neopolyploids which have been shown to produce abnormally-high numbers of multivalents in early generations, stabilising to lower frequencies in later generations (Ramsey and Schemske, 1998;2002;Santos et al., 2003;Bomblies et al., 2016). There are numerous reports of multivalents being formed in a wide range of established autopolyploids, without necessarily negatively impacting on fertility (Swaminathan and Howard, 1953;Morrison and Rajhathy, 1960;Jones and Vincent, 1994;Khawaja et al., 1997;Ramsey and Schemske, 2002). As this thesis is concerned with stable, established autopolyploids, we are primarily interested in the genetic consequences of multivalents and in particular the phenomenon of double reduction (Figure 3). Double reduction has been known about for almost a century and theories of how it should be modelled have been considered by some of the early pioneers of statistical genetics (Haldane, 1930;Mather, 1935;Fisher, 1947). It is quite common nowadays for authors to assert that double reduction is required for a complete and accurate description of autopolyploid inheritance (and by extension, that it be ignored at great peril) (Luo et al., 2004;Wu et al., 2004;Wu and Ma, 2005;Li et al., 2010;Lu et al., 2012;Xu et al., 2013). On the other hand, a number of recent reviews of allo- and autopolyploid meiosis completely fail to mention it (Bomblies et al., 2016;Lloyd and Bomblies, 2016). In this thesis double reduction is not ignored, but in many cases we do omit it from our models. However, we take particular care that if so doing, we do not introduce unnecessary bias.

Figure 3. Steps leading to double reduction in an autotetraploid. 1. During late Prophase I, pairs of sister chromatids associate. Multivalents involving four pairs of sister chromatids are also called quadrivalents, as shown. 2. For quadrivalents to be maintained from Prophase I into Metaphase I, cross-overs need to form between at least three of the four pairing partners. Here, four cross-overs are shown between all partners, leading to a ring-quadrivalent. 3. Resolved double-strand breaks leading to cross-overs, as seen in late Metaphase I. 4. During Anaphase I, spindle fibres originating from the centrosomes attach to the centromeres. 5. For proper chromosomal segregation, two of the four pairing homologues must migrate to each pole. This mechanism appears to be unregulated in neopolyploids but established polyploids often exhibit correct segregation leading to no abnormalities. For double reduction to occur, a homologue and its pairing recombination partner must migrate to the same pole, a situation that is impossible with only bivalent pairing. 6. Following interkinesis (which includes Telophase I with the division of the cell in two, and Prophase II which is rather uneventful) pairs of sister chromatids carrying recombinant homologues come to lie again at the equatorial plane in Metaphase II. 7. Anaphase II subsequently follows, with spindle fibres again attaching to the centromeres. At this stage, cohesion complexes that held chromatids together are dissolved. 8. In late Anaphase II, cell division again occurs. 9. Based on how sister homologues segregate, it is possible for gametes (spores) carrying two identical copies of part of a homologue to occur. In this example, the upper two gametes carry double reduction products (e.g. homozygous blue section), whereas the lower two gametes do not.

Preferential pairing

One other aspect of polyploid meiosis that requires our attention is the phenomenon of “preferential pairing” (or as pointed out already, perhaps more accurately termed “preferential crossover formation” (Lloyd and Bomblies, 2016)). Allopolyploids consist of both homologues (closely related chromosomes presumed to have derived from the same species) and homoeologues (less closely-related chromosomes, presumed to be from distinct species). During meiosis, homologues are found to pair and recombine whereas homoeologues do not. This is often under the direct control of specific genes or loci, such as the Ph1 locus in wheat (Triticum aestivum L.) (Okamoto, 1957; Riley and Chapman, 1958) or the PrBn locus in oilseed rape (Brassica napus L.) (Jenczewski et al., 2003; Nicolas et al., 2009). In such instances we refer to “fully preferential pairing” between homologues, leading to disomic inheritance. In autopolyploids on the other hand there is an equal chance of pairing and recombination between all homologues during meiosis, leading to polysomic inheritance. However, these states represent the two extremes of a supposed continuum between random and non-random pairing, with the term “segmental allopolyploid” (Stebbins, 1947) used to gather together everything in between (including the possibility of certain chromosomes behaving disomically and others polysomically). Examples where a mixture of disomic and polysomic inheritance has been observed include rainbow trout (Oncorhynchus mykiss) (Allendorf and Danzmann, 1997), peanut (Arachis hypogaea L.) (Leal-Bertioli et al., 2015; Nguepjop et al., 2016), sugarcane (Saccharum officinarum × S. spontaneum) (Jannoo et al., 2004) and birdsfoot trefoil (Lotus corniculatus L.) (Fjellstrom et al., 2001). In these cases it is not always clear whether we are dealing with an autopolyploid that is gradually becoming re-diploidised, or an allopolyploid from two very closely-related species (Soltis et al., 2016).

Figure 4. Gametes of two hypothetical tetraploid parents (displaying homologues of a single chromosome). In the maternal meiosis (left), the red and violet-coloured homologues (and the yellow and blue homologues) pair and recombine more often than would be expected by chance. Almost all gametes sampled carry red/violet and yellow/blue homologue mosaics, with far fewer red/yellow & violet/blue or red/blue & violet/yellow combinations. In this case, we say there is preferential pairing between red and violet homologues (and also yellow and blue homologues – one pairing implies the other). In the paternal meiosis (right), no such preference exists, leading to balanced numbers of gametes from each of the three possible (bivalent) pairing combinations.

Diagnosing the mode of inheritance in polyploids (disomic / polysomic / mixosomic) is a challenging endeavour, but the task has been made considerably easier by the use of markers. By following the inheritance of specific types of markers in a population, it is possible to detect whether there are statistically significant deviations from the expected proportions of marker alleles. These can be tested against an initial hypothesis of disomy or polysomy, usually informed by a study of the existing literature on the subject. Suppose we cross two heterozygous tetraploid plants and consider the segregation of markers in the resulting F1. It is possible for us to uncover the parental origin of each of the homologues inherited in the population, and in this way reconstruct the parental gametes that contributed to each offspring (Figure 4). This gives us even greater clarity regarding the parental meiosis as we are in a position to map all the recombination events (and therefore also know between which pairing partners they occurred).
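A test of marker segregation against a hypothesised mode of inheritance can be sketched as follows. This is an illustration under stated assumptions, not a method taken from this thesis: the marker is taken to be duplex in one parent and nulliplex in the other, pairing is assumed to be bivalent without double reduction, and the offspring counts and the function name are hypothetical. Under these assumptions tetrasomic inheritance predicts offspring dosages 0, 1 and 2 in a 1:4:1 ratio, whereas disomic inheritance with the two allele copies on different (non-pairing) homoeologues predicts 1:2:1.

```python
import math

def chi_square_three_classes(observed, ratio):
    """Goodness-of-fit test of three observed counts against an expected ratio."""
    if len(observed) != 3 or len(ratio) != 3:
        raise ValueError("this sketch handles exactly three dosage classes")
    n = sum(observed)
    expected = [n * r / sum(ratio) for r in ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Three classes give 2 degrees of freedom, for which the chi-square
    # survival function has the exact closed form exp(-x / 2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value

# Hypothetical F1 counts of offspring with dosage 0, 1 and 2
counts = [30, 125, 25]
print("tetrasomic 1:4:1:", chi_square_three_classes(counts, (1, 4, 1)))
print("disomic    1:2:1:", chi_square_three_classes(counts, (1, 2, 1)))
```

With these made-up counts the tetrasomic hypothesis is not rejected while the disomic one is. In practice many markers would be tested, so some correction for multiple testing would be needed.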

Linkage mapping and QTL analysis – part II

We return to the topic of linkage maps and QTL analyses now that some of the more important eccentricities surrounding autopolyploids have been laid bare. At its core, the process of linkage mapping in polyploids is analogous to that of a diploid (Figure 5.a). The first step is to genotype a set of related individuals, either using a pre-defined set of markers or by sequencing the individuals directly and determining the marker set subsequently. Based on the co-segregation of markers in the population, the markers can be clustered together into linkage groups (ideally corresponding to chromosomes), ordered within these linkage groups and spaced according to the genetic distances between them (Figure 5.a).
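The spacing step converts pairwise recombination frequencies into additive map distances through a mapping function. The text above does not say which function is used, so the sketch below simply shows the two textbook choices (Haldane and Kosambi) as an assumption for illustration; the function names are ours.

```python
import math

def haldane_cM(r):
    """Haldane map distance in centiMorgans (assumes no crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination frequency must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cM(r):
    """Kosambi map distance in centiMorgans (allows for positive interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination frequency must lie in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

for r in (0.01, 0.1, 0.3):
    print(f"r = {r:.2f}: Haldane {haldane_cM(r):6.2f} cM, Kosambi {kosambi_cM(r):6.2f} cM")
```

For small r both functions give roughly 100 × r cM; they diverge as r approaches 0.5.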

Figure 5. Linkage mapping in polyploid species. a. Principal stages in linkage map construction. Firstly, genotype (marker) data are generated for a population. These markers are clustered into groups (this can be at the level of both chromosome and homologue, not shown here). Markers are subsequently ordered within these clusters. In the final step, map positions are assigned to the markers, usually on a centiMorgan scale. Markers may be lost in the clustering stage due to lack of linkage, possibly due to poor data quality. Some markers may also map to the same position, as indicated by the side branches in the right-hand figure. b. Example of an integrated autohexaploid linkage map (centre), with phasing of the markers on the maternal homologues h1 – h6 in red, and paternal homologues h7 – h12 in blue. Marker positions in centiMorgans (cM) are shown on the y-axis.

However, polyploid linkage mapping, and in particular autopolyploid linkage mapping, adds a further level of complexity by distinguishing between homologue linkage groups and chromosomal linkage groups. The terminology often used to describe the process of classifying a marker across homologues is phasing, that is, determining whether the segregating marker alleles are on the same homologue (for which we say markers are linked in coupling phase) or whether they reside on different homologues of the same chromosome (for this we say they are linked in repulsion phase). Phase considerations in diploids are trivial if inbred parents are used; an exception is when parents are outbreeding (so-called cross-pollinating or CP populations (Van Ooijen, 2006)), for which linkage phase must also be carefully considered (Maliepaard et al., 1997). The first main challenge of polyploid linkage mapping therefore is to assign a marker not just to a linkage group and a position, but to determine precisely how its alleles are distributed across the parental genomes in relation to all the other markers (Figure 5.b). The second challenge is how to best order the markers. In this thesis we do not tackle marker ordering ourselves but rely on the work of others; in particular, we apply a recent algorithm that uses multi-dimensional scaling to solve the problem efficiently (Preedy and Hackett, 2016). As with diploid CP populations, recombination frequency estimates can be of varying precision based on the informativeness of a particular marker combination (Maliepaard et al., 1997). This non-constant variance is reflected in variable LOD scores, resulting in the need for a marker ordering algorithm that performs weighted optimisation (Stam, 1993; Preedy and Hackett, 2016).
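To see why the LOD score can serve as a weight, consider the simplest possible case as a sketch: two markers whose recombinant and non-recombinant offspring can be counted directly (for example two simplex markers in coupling phase). This is an assumption made for illustration; the likelihoods for other polyploid marker combinations are more involved and are not reproduced here, and the function name is ours.

```python
import math

def two_point_lod(recombinants, n):
    """Maximum-likelihood recombination frequency and LOD for directly counted classes."""
    if not 0 <= recombinants <= n or n == 0:
        raise ValueError("need 0 <= recombinants <= n and n > 0")
    r_hat = min(recombinants / n, 0.5)  # estimates above 0.5 are truncated

    def log10_likelihood(rate):
        value = 0.0
        if recombinants:
            value += recombinants * math.log10(rate)
        if n - recombinants:
            value += (n - recombinants) * math.log10(1.0 - rate)
        return value

    # LOD compares the likelihood at r_hat with that under no linkage (r = 0.5)
    lod = log10_likelihood(r_hat) - n * math.log10(0.5)
    return r_hat, lod

print(two_point_lod(10, 100))  # same estimate of r ...
print(two_point_lod(1, 10))    # ... but far less supporting information
```

Both calls estimate r = 0.1, yet the LOD from 100 informative offspring is ten times that from 10, which is the kind of difference in precision that a weighted ordering algorithm can exploit.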

Once a linkage map has been created it is possible to test for associations between genetic positions on the parental homologues and phenotypic traits of interest. Similar to the central challenge of linkage mapping, one of the main challenges in polyploid QTL analysis is to correctly predict the parental origin of QTL alleles (Figure 6). In the case of a QTL with a single allele of positive effect, this can usually be achieved by single-marker approaches (given sufficient marker coverage in the QTL region). However, in situations where QTL are influenced by multiple allelic variants at the same locus it may not be possible to track all positive alleles using single-marker approaches. Instead, QTL methods that move away from single marker information to instead use multi-locus identity-by-descent (IBD) probabilities (Hackett et al., 2013; Zheng et al., 2016) may provide additional diagnostic power (Kempthorne, 1957; Hackett et al., 2014).

Figure 6. Locating and phasing QTL in an autotetraploid. As with diploids, a logarithm of odds (LOD) profile gives an indication of the strength of association between a genetic position and a particular trait. Significant associations are usually taken as those that exceed a certain threshold, shown here as a dashed red line. QTL positions are often displayed beside a linkage map with various support intervals (e.g. a LOD-1 and LOD-2 support interval) using software such as MapChart (Voorrips, 2002). However, in a polyploid we are also interested in the configuration of a QTL. In this example, two alleles with positive (+) effect originated from parent 2, and both a positive and negative (-) allele originated from parent 1. In this way, offspring carrying optimal combinations of alleles, potentially at multiple loci, can be identified.
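A single-marker QTL test can be sketched as a regression of phenotype on marker dosage, with the LOD score computed from the residual sums of squares of the null and marker models. This is a generic additive model chosen for illustration, not the IBD-based approach referred to above; the function name and the data are hypothetical.

```python
import math

def single_marker_lod(dosages, phenotypes):
    """LOD score for a linear regression of phenotype on marker dosage."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    rss_null = sum((y - mean_y) ** 2 for y in phenotypes)  # intercept-only model
    if sxx == 0:
        return 0.0  # a non-segregating marker carries no information
    rss_marker = rss_null - sxy ** 2 / sxx                 # intercept + dosage
    return (n / 2) * math.log10(rss_null / rss_marker)

# Hypothetical data: the trait increases by 2 units per allele copy
dosages = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
noise = [0.4, -0.3, 0.1, -0.2, 0.0, -0.4, 0.3, -0.1, 0.2, 0.0]
phenotypes = [10 + 2 * d + e for d, e in zip(dosages, noise)]
unlinked = [2, 0, 4, 1, 3, 1, 3, 0, 4, 2]

print("linked marker LOD:  ", round(single_marker_lod(dosages, phenotypes), 2))
print("unlinked marker LOD:", round(single_marker_lod(unlinked, phenotypes), 2))
```

Repeating such a test along the map gives a LOD profile; the significance threshold would normally be set by permutation, which is omitted here.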

The polyploid revolution

Up until relatively recently, breeding in polyploid crops has relied completely on phenotypic selection. For simple traits this may well have been sufficient, but the effectiveness of phenotypic selection for complex traits like yield is questionable (Jansky, 2009). There is increased interest among polyploid breeders in the use of molecular and genomic tools to assist with their breeding programs. The reasons why this has not so far happened were technological, methodological and financial. The technology to be able to efficiently and cheaply dissect polyploid genomes in detail is only now becoming available. The methodology to interpret this data and the software tools to run these analyses (such as described in this thesis) are following suit. We therefore stand at the dawn of a revolution in polyploid genomics and breeding, brought about by a favourable combination of these three factors.

Layout of this thesis

The central aim of this thesis is to develop and test methods to perform genetic mapping in polyploid populations. This can constitute a significant computational challenge, given the complexity and sheer size of modern genotyping datasets. We begin therefore with a review of the available software tools for the genetic analysis of polyploids in Chapter 2. This chapter explores the current options for assigning marker dosages and assembling haplotypes, performing linkage mapping and QTL analyses, genome-wide association studies and genomic selection, as well as looking at the availability of physical maps and tools for simulating polyploid populations.

The rest of the thesis can be broken down into two main subjects – linkage mapping and QTL analysis, with linkage mapping methods developed in the first half of the thesis, and QTL mapping in the second half. However, we are also interested in understanding the meiotic behaviour of polyploids (which linkage mapping can help to clarify). In Chapter 3 we generate homologue-specific linkage maps in an autotetraploid potato population and use them to investigate the frequency of double reduction events across the population. A denser set of linkage maps, consisting of all marker segregation types, is integrated in Chapter 4 using information from bridging markers (markers with alleles which reside on multiple homologues). We also perform a simulation study to test the effects of double reduction and partial preferential chromosome pairing on the accuracy of recombination frequency estimates upon which our maps are based.

In Chapter 5 we go one step further by developing a high-density SNP map for a tetraploid rose population. We perform an in-depth exploration of the meiotic pairing and recombination behaviour and in the process, develop a method to correct for mixosomic inheritance in the creation of autotetraploid linkage maps. These initial chapters culminate in Chapter 6, where we present a new software package for the creation of integrated and phased linkage maps in polyploid species, called polymapR.

In Chapter 7, we apply our mapping methodology to a large population of the popular autohexaploid cut-flower Chrysanthemum × morifolium. The integrated linkage map and marker phase information are used to detect QTL for a number of flower-related traits. Chapter 8 explores the power of QTL mapping methods in autopolyploids, comparing models which allow for double reduction with those that do not. In Chapter 9, these methods are used to perform a QTL analysis in tetraploid rose, exploring the genetic architecture of a number of economically-important physiological traits measured across multiple growing environments. In Chapter 10, we present polyqtlR, a novel software package for performing QTL analyses in autopolyploid populations, building on the work of the previous chapters. Finally, the thesis is rounded off with a discussion in Chapter 11 on how the findings presented in this thesis can contribute to improvements in polyploid breeding as well as deepening our understanding of these fascinating species.


Chapter 2

Tools for genetic studies in experimental populations of polyploids

Peter M. Bourke1, Roeland E. Voorrips1, Richard G. F. Visser1, Chris Maliepaard1

1 Plant Breeding, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.

Published (with modifications) as Bourke, P.M., Voorrips, R.E., Visser, R.G.F. and Maliepaard, C. (2018). “Tools for genetic studies in experimental populations of polyploids”, Frontiers in Plant Science, 9:513. doi: 10.3389/fpls.2018.00513

27 ______Polyploid genetic tools

Abstract

Polyploid organisms carry more than two copies of each chromosome, a condition rarely tolerated in animals but which occurs relatively frequently in the plant kingdom. One of the principal challenges faced by polyploid organisms is to evolve stable meiotic mechanisms to faithfully transmit genetic information to the next generation, upon which the study of inheritance is based. In this review we look at the tools available to the research community to better understand polyploid inheritance, many of which have only recently been developed. Most of these tools are intended for experimental populations (rather than natural populations), facilitating genomics-assisted crop improvement and plant breeding. This is hardly surprising given that a large proportion of domesticated plant species are polyploid. The current polyploid analytic toolbox includes software for assigning marker genotypes (and in particular, estimating the dosage of marker alleles in the heterozygous condition), establishing chromosome-scale linkage phase among marker alleles, constructing (short-range) haplotypes, generating linkage maps, performing genome-wide association studies (GWAS) and quantitative trait locus (QTL) analyses, and simulating polyploid populations. These tools can also help elucidate the mode of inheritance (disomic, polysomic or a mixture of both as in segmental allopolyploids) or reveal whether double reduction and multivalent chromosomal pairing occur. An increasing number of polyploids (or associated diploids) are being sequenced, leading to publicly-available reference genome assemblies. Much work remains in order to keep pace with developments in genomic technologies. However, such technologies also offer the promise of understanding polyploid genomes at a level which hitherto has remained elusive.

Key words

Polyploid genetics, polyploid software tools, autopolyploid, allopolyploid, segmental allopolyploid


Introduction

One of the most fundamental descriptions of any organism is its ploidy level and chromosome number, generally written in the form 2n = 2x = 10 (here, for the ubiquitous model plant species Arabidopsis thaliana L.). Plant scientists in particular will be familiar with this representation of the chromosomal constitution of the sporophyte generation (i.e. the adult plant). The second term in this seemingly simple equation describes the normal complement of chromosomal copies possessed by a member of that species, which is generally 2x (“two times”) for diploids. Species where this number exceeds two are collectively referred to as polyploids. Not unexpectedly, each polyploid individual is the product of the fusion of gametes from two parents, just like their diploid counterparts. In other words, polyploids can also be defined as individuals derived from non-haploid gametes (in the case of triploids derived from diploid × tetraploid crosses, only one gamete satisfies this condition). The transmission of non-haploid gametes is one of the main “complexifying” features of polyploidy, leading to a whole range of implications for the genetic analysis of these “hopeful monsters” (Goldschmidt, 1933).

The ongoing genomics revolution can be seen as a rising tide which has also lifted the polyploid genetics boat, although not quite to the same level as for diploids. Most genetic advances are made in model organisms, among which self-fertilising diploid species predominate. It is therefore not surprising that most tools and techniques for molecular genetic studies are specific to diploids. However, polyploid species are particularly important to mankind in the provision of food, fuel, feed and fibre (not to mention “flowers”, if ornamental plant species are also included), making the genetic analysis of polyploid species an important avenue of research for crop improvement.

Although a collective term such as “polyploidy” has its uses, it tends to obscure some fundamental differences between its members. For example, polyploids are generally subdivided into autopolyploids and allopolyploids (Kihara and Ono, 1926). Autopolyploids arise through genomic duplication within a single species, generally through the production of unreduced gametes (Harlan and De Wet, 1975) and exhibit polysomic inheritance, meaning pairing and recombination can occur between all homologous copies of each chromosome during meiosis. One of the most well-studied examples is autotetraploid potato (Solanum tuberosum L.). Allopolyploids, on the other hand, are the product of genomic duplication between species (usually through hybridisation involving unreduced gametes (Harlan and De Wet, 1975)) and display disomic inheritance, where more-related chromosome copies (“homologues”) may pair and recombine during meiosis, whilst less-related chromosome copies (“homoeologues”, also spelled “homeologues” (Glover et al., 2016)) do not. Among allopolyploids, allohexaploid wheat (Triticum aestivum L.) is probably the most well-studied. If pairing and recombination between homoeologues occurs to a limited extent, the species may be referred to as “segmental allopolyploid” (Stebbins, 1947), traditionally deemed to have arisen from hybridisation between very closely-related species (Stebbins, 1947; Chester et al., 2012) but which may also be the result of partially diploidised autopolyploidy (Soltis et al., 2016). In many cases, a species cannot be clearly designated as one type or another, leading to uncertainty or debate on the subject (Barker et al., 2016; Doyle and Sherman-Broyles, 2017).

From the perspective of genetics and inheritance, allopolyploids behave much like diploid species and therefore many of the tools developed for diploids can be directly applied. The main challenge that faces allopolyploid geneticists is in distinguishing between homoeologous gene copies carried by sub-genomes within an individual (Kaur et al., 2012; van Dijk et al., 2012; Rothfels et al., 2017). Autopolyploids (and segmental allopolyploids) do not behave like diploids, and are therefore in most need of specialised methods and tools for subsequent genetic studies. In this review we focus primarily on the availability of tools and resources amenable to polysomic (and “mixosomic” (Soltis et al., 2016)) species, with less emphasis on allopolyploid-specific solutions. Although the development of novel methodologies for the genetic analysis of polyploids is interesting, without translation into a software tool for use by the research community they remain purely conceptual and with limited impact. We therefore try to limit our attention to the tools currently available rather than cataloguing descriptions of unimplemented methods.

Experimental populations, in use since Mendel’s ground-breaking work (Mendel, 1866), are traditionally derived from a controlled cross between two parental lines of interest (either directly studying the F1 or some later generation). We use the term here to distinguish our subject matter from “wild” or “natural” populations, which would necessitate sampling individuals from an extant population in the wild. Quantitative genetics, particularly the genetics of human pathology, has greatly benefitted from the use of large panels of individuals to perform so-called “genome-wide association studies” (GWAS). The use of such panels offers to complement the experimental toolbox of polyploid geneticists as well, and although perhaps not strictly-speaking an “experimental” population, we consider them relevant to the current discussion.

Here, we review the current possibilities for polyploid genotyping, including the scoring of marker dosage (allele counts) and generation of haplotypes, the availability of reference sequences, the possibilities for linkage mapping, quantitative trait locus (QTL) analysis and GWAS, as well as tools to simulate polyploid organisms for in silico studies. We also reflect on current and future developments, and the tools that will be needed to keep pace with the innovations we are witnessing in genomic technologies.


Polyploid genotyping

One of the most crucial aspects in the study of polyploid genetics is the generation of accurate genotypic data. However, it is also fraught with difficulties, not least the detection of multiple loci when only a single locus is targeted (Mason; Limborg et al.). Various technologies exist, with almost all current applications aimed at identifying single nucleotide polymorphisms (SNPs). Although many genomic “service providers” (e.g. companies or institutes that offer DNA sequencing) have their own tools to analyse and interpret raw data, these tools are not always suitable for use with polyploid datasets.

Genotyping technologies

Although gel-based marker technologies continue to be used and have certain advantages (e.g. low costs associated with small marker numbers, requiring only basic laboratory facilities, multi-allelism etc.), most studies now rely on SNP markers for genotyping due to their great abundance over the genome, their high-throughput capacity and their low cost per data point. Targeted genotyping such as SNP arrays (a.k.a. “SNP chips”) relies on previously-identified and selected polymorphisms, usually identified from a panel of individuals chosen to represent the gene pool under investigation. In contrast, untargeted genotyping generally uses direct sequencing of individuals, albeit after some procedure to reduce the amount of DNA to be sequenced (e.g. by exome sequencing (Ng et al.) or target enrichment (Mamanova et al.)). The disadvantages of targeted approaches have been well explored (particularly regarding ascertainment bias, where the set of targeted SNPs on an array poorly represents the diversity in the samples under investigation due to biased methods of SNP discovery) (Albrechtsen et al.; Moragues et al.; Didion et al.; Lachance and Tishkoff), although there are advantages and disadvantages to both methods (Mason et al.). Apart from costs, differences exist in the ease of data analysis following genotyping, with sequencing data requiring greater curation and bioinformatics skills (Spindel et al.; Bajgain et al.) as well as potentially containing more erroneous and missing data (Spindel et al.; Jones et al.).

In polyploids, SNP arrays have been developed in numerous species, which include both autopolyploid (or predominantly polysomic polyploids) and allopolyploid species. Examples of the former include alfalfa (Li et al., b), chrysanthemum (van Geest et al., c), potato (Hamilton et al.; Felcher et al.; Vos et al.), rose (Koning-Boucoiran et al.) and sour cherry (Peace et al.). Examples of allopolyploid SNP arrays include cotton (Hulse-Kemp et al.), oat (Tinker et al.), oilseed rape (Dalton-Morgan et al.; Clarke et al.), peanut (Pandey et al.), strawberry (Bassil et al.) and wheat (Akhunov et al.; Cavanagh et al.; Wang et al., b; Winfield et al.). Untargeted approaches such as genotyping-by-sequencing have also been applied, for example in autopolyploids such as alfalfa (Zhang et al.; Yu et al.), blueberry (McCallum et al.), bluestem prairie grass (Andropogon gerardii) (McAllister and Miller), cocksfoot (Dactylis glomerata) (Bushman et al.), potato (Uitdewilligen et al.; Sverrisdóttir et al.), sugarcane (Balsalobre et al.; Yang et al.) and sweet potato (Shirasawa et al.), and in allopolyploids such as coffee (Moncada et al.), cotton (Islam et al.; Reddy et al.), intermediate wheatgrass (Thinopyrum intermedium) (Kantarski et al.), oat (Chaffin et al.), prairie cordgrass (Spartina pectinata) (Crawford et al.), shepherd’s purse (Capsella bursa-pastoris) (Cornille et al.), wheat (Poland et al.; Edae et al.) and zoysiagrass (Zoysia japonica) (McCamy et al.) (noting that the precise classification of some of these species as auto- or allopolyploids has yet to be conclusively determined). Whatever the technology used, it is clear that we are currently witnessing an explosion of interest in polyploid genomics. However, the critical issue of how to make sense of this data remains, starting with the assignment of marker dosage, a.k.a. “genotype calling”.

Assignment of dosage

One of the key distinguishing features of polysomic polyploidy is the fact that there are multiple heterozygous conditions possible in genotyping data. We use the term marker “dosage” to denote the minor allele count of a marker; a species of ploidy q possesses q + 1 distinct dosage classes in the range 0 to q (Figure 1). Of course, the concept of marker dosage could also be used in diploid species, but coding systems such as the <lmxll>, <nnxnp>, <hkxhk> system (Van Ooijen, 2006) predominate. Marker dosage is generally understood to apply to bi-allelic markers (such as single SNPs), although it is conceivable to score marker dosage at multi-allelic loci. If marker dosage cannot be accurately assessed, genotypes would likely have to be dominantly-scored (i.e. all heterozygous classes would be grouped with one of the homozygous classes), resulting in a loss of information (Piepho and Koch).

Figure 1. Dosage possibilities in an autotetraploid. In a tetraploid, five distinct dosages are possible at a bi-allelic marker position, ranging from 0 copies of the alternative allele through to 4 copies. Here, the alternative allele is coloured red, with the reference allele coloured blue.
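The dosage classes, and the information lost when markers are scored dominantly, can be made concrete with a short sketch. The function names are ours, and dominant scoring is assumed here to mean recording only presence or absence of the counted allele.

```python
def dosage_classes(ploidy):
    """All possible dosages at a bi-allelic marker: 0, 1, ..., ploidy."""
    return list(range(ploidy + 1))

def dominant_score(dosage):
    """Presence/absence scoring: every class carrying the allele is scored 1."""
    return 0 if dosage == 0 else 1

for ploidy in (2, 4, 6):
    classes = dosage_classes(ploidy)
    collapsed = sorted({dominant_score(d) for d in classes})
    print(f"ploidy {ploidy}: {len(classes)} dosage classes {classes}, "
          f"{len(collapsed)} classes after dominant scoring")
```

In a tetraploid the five dosage classes collapse to two, so simplex, duplex, triplex and quadruplex genotypes can no longer be told apart.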


All available dosage-calling tools rely on a population in order to determine marker dosage. In other words, calibration between the various dosage classes is performed across the population, for which we are not implying any degree of relatedness in the population other than coming from the same species. All current tools are designed to process genotyping data from SNP arrays, in other words using the relative strength of two allele-specific (fluorescent) signals to assign a discrete dosage value. With increasing interest in genotyping-by-sequencing (GBS) data, we anticipate that tools which use read-counts (of potentially multiple SNPs or multi-SNP haplotypes) will soon be developed, although these have yet to appear. One of the current challenges under investigation regarding GBS-based genotype calling is the accurate determination of dosage (Kim et al.), which may require relatively deep sequencing (e.g. coverage estimated in autotetraploid potato (Uitdewilligen et al.)).

Returning to the array-based tools, the two main service providers for high-density SNP arrays, Illumina and Affymetrix, both offer proprietary software solutions for analysing polyploid datasets. Affymetrix’s Power Tools and Illumina’s GenomeStudio (with its Polyploid Genotyping Module) were developed with both diploid and polyploid datasets in mind. However, there have also been a number of genotyping tools that have been put into the public domain. One of the first of these to be released was fitTetra (Voorrips et al.), a freely-available R package (R Core Team), designed to assign genotypes to autotetraploids that were genotyped on either Illumina’s Infinium or Affymetrix’s Axiom arrays. fitTetra fits mixture models to bi-allelic intensity ratios either under the constraint of Hardy-Weinberg equilibrium within the population, or as an unconstrained fit, using an expectation-maximisation algorithm in fitting. This can have the drawback of requiring significant computational resources for high-density marker datasets, although it is automated and can therefore process large datasets in a single run. The original release was specific to tetraploid data only. However, an updated version (fitPoly) can process genotyping data of all ploidy levels and is soon to be made available as a separate R package (R. E. Voorrips, personal communication). The SuperMASSA application (Serang et al.) can also process data from all ploidy levels, as it was initially developed to dosage-score sugarcane data (notorious for its cytogenetic complexity), and is currently hosted online by the Statistical Genetics Laboratory in the University of São Paulo, Brazil. One of the interesting features of SuperMASSA is that prior knowledge of the exact ploidy level is not needed (useful for a crop like sugarcane). Instead, the genotype configuration which maximises the posterior probability across all specified ploidy levels is chosen. In practice, most researchers will already know the ploidy of their samples (although aneuploid progeny in some species may occur) and can constrain the model search. A major drawback of the current implementation is that markers are analysed one-by-one, and results need to be copied from the webpage each time (there is currently no downloadable version of the software available). For high-density datasets with tens of thousands of markers, this is clearly impractical.
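The idea of fitting a mixture model to allele intensity ratios by expectation-maximisation can be sketched in a few lines. This is a deliberately simplified illustration and not the algorithm of fitTetra or any other package named above: it assumes Gaussian components with a shared standard deviation, starting means at d / ploidy, unconstrained mixing weights and simulated (hypothetical) ratios, and all names are ours.

```python
import math
import random

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fit_dosage_mixture(ratios, ploidy=4, n_iter=50):
    """Fit a (ploidy + 1)-component Gaussian mixture by EM and call dosages."""
    k = ploidy + 1
    means = [d / ploidy for d in range(k)]  # assumed ideal cluster centres
    weights = [1.0 / k] * k
    sd = 0.05
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each dosage class for each sample
        resp = []
        for x in ratios:
            dens = [w * normal_pdf(x, m, sd) for w, m in zip(weights, means)]
            total = sum(dens) or 1e-300
            resp.append([v / total for v in dens])
        # M-step: update mixing weights, component means and the shared sd
        mass = [sum(r[j] for r in resp) for j in range(k)]
        weights = [m / len(ratios) for m in mass]
        means = [
            sum(r[j] * x for r, x in zip(resp, ratios)) / mass[j]
            if mass[j] > 1e-12 else means[j]
            for j in range(k)
        ]
        var = sum(r[j] * (x - means[j]) ** 2
                  for r, x in zip(resp, ratios) for j in range(k)) / len(ratios)
        sd = max(math.sqrt(var), 1e-3)
    calls = [max(range(k), key=lambda j: r[j]) for r in resp]
    return calls, means, weights

# Simulated ratios for 40 individuals in each of the five tetraploid classes
random.seed(1)
true_dosages = [d for d in range(5) for _ in range(40)]
ratios = [d / 4 + random.gauss(0, 0.02) for d in true_dosages]
calls, means, weights = fit_dosage_mixture(ratios)
print("fitted means:", [round(m, 3) for m in means])
print("calls matching the simulated dosage:",
      sum(c == t for c, t in zip(calls, true_dosages)), "of", len(calls))
```

Real signal ratios are rarely centred exactly on d / ploidy and clusters can overlap or be absent, which is where constraints such as Hardy-Weinberg proportions on the mixing weights become useful.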

The R package polySegratio (Baker et al.) generates marker dosages for dominantly-scored markers using the JAGS software (Plummer) for Markov Chain Monte Carlo generation. Fully polysomic behaviour is assumed, and segregation ratios of marker data are used to derive the most likely parental scores. Although able to process data from all even ploidy levels, the software only considers a subset of marker types (markers that are nulliplex in one parent or simplex in both parents). Nowadays there is a move away from dominantly-scored markers to co-dominant marker technologies like SNPs, and parental samples are usually included in multiple replicates and so can be genotyped directly with offspring rather than imputed from the offspring. The package is therefore of questionable use for modern genotyping datasets. An unrelated R package, beadarrayMSV (Gidskehaug et al.), was developed to handle Illumina Infinium SNP array data from “diploidising” tetraploid species such as the Atlantic salmon. The software was designed to score markers which target multiple loci (so-called multi-site variants or MSVs) as well as single-locus markers displaying disomic inheritance. In a comparison with fitTetra, beadarrayMSV was unable to accurately genotype autotetraploid data from potato, although conversely fitTetra performed poorly on salmon data (Voorrips et al.). This demonstrates that appropriate software is needed for specific situations; indeed, in many cases specific scenarios have motivated the development of specialised software.

Having prior knowledge about the expected meiotic behaviour of the species is always advantageous when it comes to analysing any polyploid data. This is especially true for the latest dosage-calling software to be released, the ClusterCall package for R (Schmitz Carley et al.). Here, prior knowledge of the meiotic behaviour of the species is required, since the expected segregation ratios of an autotetraploid population are used to assign dosage scores to the clusters identified through hierarchical clustering. In well-behaved autotetraploids such as potato (Swaminathan and Howard; Bourke et al.) this is arguably not a problem (as long as skewed segregation does not occur) and indeed can lead to increased accuracy in genotype calling (Schmitz Carley et al.). However, in less well-characterised species such as leek, alfalfa or many ornamental species, the precise meiotic behaviour may not always follow the expected tetrasomic model, causing potential problems with fitting. The authors are aware of this and suggest that alternatives like fitTetra or SuperMASSA be used in circumstances where a tetrasomic model no longer holds. Unfortunately, such prior knowledge is not always available before genotyping takes place – meiotic behaviour can even differ between individuals of a species that was thought to display meiotic homogeneity (e.g. complete tetrasomy) (Bourke et al.).

Haplotype assembly

Although bi-allelic SNP markers have many practical advantages, they carry less inheritance information than multi-allelic markers. Crop researchers and breeders often wish to develop a simple diagnostic marker test for a trait of interest. Unfortunately, the chance of having a single SNP in complete linkage disequilibrium with a favourable or causative allele of a gene of interest is very small. Markers which have been found to uniquely “tag” a favourable allele in one population may not do so in another. For more than a decade the increased power of haplotype-based associations has been known and reported in human genetic studies (Zhang et al.; de Bakker et al.), with the term “haplotype” denoting a unique stretch of sequence. Translating haplotyping approaches from diploid to polyploid species has been a non-trivial exercise, requiring novel algorithms to handle the overwhelming range of possibilities that can arise, especially when allowing for sequencing errors and possible recombinations. Multi-SNP haplotypes can be assembled from single dosage-scored SNPs originating from SNP array data, although haplotypes are more commonly generated using overlapping sequence reads (Figure 2). A number of different polyploid haplotyping tools for sequence reads have been developed in recent years, including polyHap (Su et al.), SATlotyper (Neigenfind et al.), HapCompass (Aguiar and Istrail), HapTree (Berger et al.), SDhaP (Das and Vikalo), SHEsisPlus (Shen et al.) and TriPoly (Motazedi et al., a). Three of these tools, HapCompass, HapTree and SDhaP, were recently compared and evaluated over a range of different simulated read depths, ploidy levels and insert sizes for paired-end reads (Motazedi et al.). The authors found that each of these software programs had particular advantages; for example, HapTree was found to produce more accurate haplotypes for triploid and tetraploid data, whilst HapCompass performed best at higher ploidies (Motazedi et al.). Both SHEsisPlus and TriPoly have yet to be independently tested. For allopolyploid species, the user-friendly Haplotag software has been designed to identify both single SNPs and multi-SNP haplotypes from GBS data (Tinker et al.). An interesting feature is the use of a simple “heterozygosity filter” that excludes haplotypes with higher than expected heterozygosity across a population, suggesting paralogous loci. Currently, however, data from outcrossing or autopolyploid species is not suitable for this software.

Figure 2. Generation of multi-SNP haplotypes. a. In this example three possible haplotypes exist, spanning the polymorphic positions. b. Single SNP genotyping cannot distinguish between the “A” allele originating from different haplotypes, combining them into a single allele as illustrated in the second call. c. In a haplotyping approach, overlapping reads are used to re-assemble and phase single SNP genotypes. Here the known ploidy level of the species is used to impute the dosage of the two haplotypes identified in this individual, given the ratio between the assembled haplotype read-depths.

The input data of haplotyping software can be grouped into two types. Individual SNP genotyping data with a known marker order was used by the first wave of polyploid haplotyping implementations such as polyHap and SATlotyper. More recently, haplotyping tools use sequence reads as their input, although some pre-processing is required: reads must first be aligned, followed by extraction of their SNPs (i.e. masking of non-polymorphic sites) to generate a fragment matrix with individual reads as rows and SNP positions as columns, as described for HapCompass (Aguiar and Istrail). In other words, all haplotyping tools, apart perhaps from Haplotag (Tinker et al.), require that users possess a certain level of bioinformatics skills. In terms of applications and usage, there appears to be some hesitation among the polyploid research community to take up these tools (Figure 3), perhaps because of the required bioinformatics qualifications. Although we expect polyploid haplotypes to become increasingly used in the future, the development of user-friendly and computationally efficient tools is first needed before haplotype-based genotypes become truly mainstream.
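The pre-processing step described for sequence-read haplotyping can be illustrated with a minimal sketch. It assumes a deliberately simplified input that is not the format of any of the tools named here: each aligned read is a hypothetical dictionary mapping reference positions to observed bases, and alleles are coded 0 (reference) or 1 (alternative). Dropping reads that cover fewer than two SNPs is also an assumption of this sketch, made because such reads carry no phasing information.

```python
from typing import Dict, List, Optional


def build_fragment_matrix(
    reads: List[Dict[int, str]],
    snp_positions: List[int],
    ref_alleles: Dict[int, str],
) -> List[List[Optional[int]]]:
    """Build a fragment matrix: one row per read, one column per SNP.

    Cells are 0 (reference allele), 1 (any other allele) or None
    (the read does not cover that SNP). Non-polymorphic positions are
    masked simply by never being looked up.
    """
    matrix = []
    for read in reads:
        row: List[Optional[int]] = []
        for pos in snp_positions:
            base = read.get(pos)
            if base is None:
                row.append(None)
            else:
                row.append(0 if base == ref_alleles[pos] else 1)
        # Assumption of this sketch: a read spanning fewer than two SNPs
        # cannot link alleles together, so it is dropped.
        if sum(cell is not None for cell in row) >= 2:
            matrix.append(row)
    return matrix


# Hypothetical toy data: three SNP positions and four aligned reads.
snps = [100, 105, 120]
reference = {100: "A", 105: "C", 120: "T"}
toy_reads = [
    {100: "A", 105: "C", 110: "G", 120: "G"},  # covers all three SNPs
    {105: "T", 120: "G"},                      # covers two SNPs
    {100: "A"},                                # covers one SNP only
    {300: "C"},                                # covers no SNP
]
fragments = build_fragment_matrix(toy_reads, snps, reference)
```

Real pipelines would read alignments from BAM files and variants from VCF files; the dictionary input is used here only to keep the sketch self-contained.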


One interesting development is the application of haplotyping to whole genome assemblies, as opposed to genotyping a population. This has recently been attempted in the tuberous hexaploid crop sweet potato, Ipomoea batatas (Yang et al., a). The authors first produced a consensus assembly to which reads were re-mapped for variant calling, followed by a phasing algorithm which resolved the six haplotypes of the sequenced cultivar for part of the assembly (Yang et al., a). Ultimately, about half of the assembled genome could be haplotype-resolved. Future sequencing or re-sequencing efforts in polyploid species should produce more phased genomes, which will no doubt be useful for haplotyping applications, for example in validating predicted haplotypes.

Figure 3. Cumulative number of Web of Science citations (core collection) by January 1st 2018 for the most popular polyploid haplotyping tools. Year of first citation is highlighted by a star.

Physical maps

Arguably, one of the most important “tools” in current genomics studies is access to a high-quality reference genome assembly. Species for which a reference genome assembly exists have even been classified as “model organisms” (Seeb et al.), such is the importance and impact a genome can bring to research on that species. Without a reference sequence available, the scope of genomic research remains limited. For example, genome-wide association studies rely on knowledge of the relative position of markers, usually on a physical map, and many sequencing applications rely on a reference assembly on which to map reads. A reference genome also facilitates the development of molecular markers (e.g. primer development) and the comparison of results between different genetic studies by providing a single reference map, as well as allowing comparisons of specific sequences such as genes, enabling prediction of gene function across related species.

Polyploid genomes are by definition more complex than diploid genomes, having multiple copies of each homologous chromosome. Many polyploid species are also outbreeding, leading to increased heterozygosity, which is problematic in de novo assemblies and necessitates specialised approaches (Kajitani et al.). The most common solution until now has been to sequence a representative diploid species. For example, in highly-heterozygous autotetraploid potato, a completely homozygous doubled monoploid Solanum tuberosum group Phureja was sequenced (Potato Genome Sequencing Consortium), which still represents the primary reference sequence today (http://solanaceae.plantbiology.msu.edu). In the case of allopolyploids, multiple diploid progenitor species are often sequenced instead, e.g. peanut (Bertioli et al.). The emergence of the pan-genome concept, originally proposed for microbial species (Tettelin et al.), has interesting implications for how highly-heterozygous polyploid genomes will be presented in future. We have already mentioned the arrival of phased genomics with the sweet potato genome, which aimed to generate six chromosome-length phased assemblies for each of its chromosomes (Yang et al., a). In future, both pan-genomes and phased genomes are likely to play a bigger role in polyploid reference genomics. Examples of polyploid species that have so far been “sequenced” are listed in Table 1.
This is by no means an exhaustive list, nor does it describe all developments for the listed species. For example, the sequence of allotetraploid Coffea arabica (which accounts for a large share of all coffee production) has recently been assembled, with a draft assembly available on the Phytozome database (phytozome.net). What Table 1 highlights is that at the time of writing, there was already a wide range of polyploid crop species with well-developed genomic resources, despite the fact that in many cases these are from closely related or progenitor diploid species. In time, just like for coffee, we predict that direct sequencing of polyploid species themselves will gradually replace the haploidised reference sequences in importance and application, leading to more insights of direct relevance to polyploids.

Linkage maps

Although the first genetic linkage map was developed over a hundred years ago (Sturtevant), the use of linkage maps in genetic and genomic studies has persisted into the “next generation” era. This can be attributed to a number of factors. A linkage map is a description of the recombination landscape within a species, usually from a single experimental cross of interest. For breeders, knowledge of genetic distance is arguably more important than physical distance, as it reflects the recombination frequencies in inheritance studies as well as describing the extent of linkage drag around loci of interest. Much of the software for performing quantitative trait locus (QTL) analysis requires linkage maps of the markers, not physical maps. This is because co-inheritance of markers and phenotypes within a population is assumed to be coupled; a physical map gives less precise information about the co-inheritance of markers than a linkage map does, since physical distances do not directly translate to recombination frequencies, particularly in the pericentromeric regions. Another reason why linkage maps continue to be developed is that they are often the first genomic representation of a species, upon which more advanced representations can be built. They provide useful long-range linkage information over the whole chromosome, which is often missing from assemblies of short sequence reads. This fact has been repeatedly exploited in efforts at connecting and correctly orientating scaffolds during genome assembly projects (Bartholomé et al.; Fierst).

As mentioned in the Introduction, polyploids can be divided into disomic or polysomic species, with the additional possibility of a mixture of both inheritance types in the case of segmental allopolyploids. Many linkage maps in polyploids have been based exclusively on 1:1 segregating markers, also known as simplex markers because the segregating allele is in simplex condition (one copy) in one of the parents only. These markers possess a number of advantages over other marker segregation types, but also some distinct disadvantages. In their favour, coupling-phase simplex markers in polyploid species behave just like they would in diploid species, regardless of the mode of inheritance involved (repulsion-phase recombination frequency estimates are not invariant across ploidy levels or modes of inheritance, but exert less influence on map construction due to lower LOD scores). The advantage of this is clear: in unexplored polyploid species for which the mode of inheritance is uncertain, simplex markers allow an “assumption-free” linkage map to be created, following which the mode of inheritance can be further explored. The only exception to this is if double reduction occurs, i.e. when a segment of a single chromosome gets transmitted with its sister chromatid copy to an offspring, a consequence of multivalent pairing and a particular sequence of chromatid segregation and division during meiosis (Haldane; Mather). Double reduction occurs randomly in polysomic species and only introduces a small bias into recombination frequency estimates (Bourke et al.). This means that, ignoring the possible influence of double reduction, diploid mapping software can generally be used for simplex marker sets at any ploidy level and for any type of meiotic pairing behaviour (Figure 4), opening up a very wide range of diploid-specific software options (Cheema and Dicks).
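The diploid-like behaviour of coupling-phase simplex markers can be made concrete with a minimal sketch. It assumes two simplex x nulliplex markers scored as 0 or 1 in the offspring (None for a missing score), both in coupling phase on the same parental homologue, and no double reduction; the function name is illustrative and not taken from any mapping package. The recombination frequency is then simply the proportion of offspring in which the two scores differ, and the LOD score compares that estimate against free recombination (r = 0.5).

```python
import math
from typing import Optional, Sequence, Tuple


def coupling_rf_and_lod(
    marker1: Sequence[Optional[int]],
    marker2: Sequence[Optional[int]],
) -> Tuple[float, float]:
    """Estimate r and LOD for two coupling-phase simplex x nulliplex markers."""
    pairs = [
        (a, b) for a, b in zip(marker1, marker2)
        if a is not None and b is not None
    ]
    n = len(pairs)
    if n == 0:
        raise ValueError("no offspring scored for both markers")

    n_rec = sum(a != b for a, b in pairs)  # recombinant offspring
    r = n_rec / n

    def count_times_log10(count: int, prob: float) -> float:
        # Treat 0 * log10(0) as 0 so that r = 0 and r = 1 are handled.
        return 0.0 if count == 0 else count * math.log10(prob)

    # log10 of L(r) / L(0.5), where L(0.5) = 0.5 ** n.
    lod = (
        count_times_log10(n_rec, r)
        + count_times_log10(n - n_rec, 1.0 - r)
        + n * math.log10(2.0)
    )
    return r, lod
```

An estimate above 0.5 under this coding would suggest that the pair is actually in repulsion phase, where, as noted above, the estimator is no longer invariant to ploidy level or mode of inheritance.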


Table 1. Some examples of currently-available reference sequences for polyploid species.

Target species | Sequenced species | Genome browser | References

AUTOPOLYPLOIDS
Alfalfa, Medicago sativa | Medicago truncatula | medicagogenome.org; plants.ensembl.org | Young et al.; Tang et al.
Kiwifruit, Actinidia chinensis | Actinidia chinensis | bdg.hfut.edu.cn/kir; bioinfo.bti.cornell.edu/cgi-bin/kiwi/home.cgi | Huang et al.
Potato, Solanum tuberosum | Solanum tuberosum | solanaceae.plantbiology.msu.edu; plants.ensembl.org | Potato Genome Sequencing Consortium
Sweet potato, Ipomoea batatas | Ipomoea batatas | public-genomes-ngs.molgen.mpg.de/SweetPotato; ipomoea-genome.org | Yang et al., a

ALLOPOLYPLOIDS
Banana, Musa acuminata | Musa acuminata | banana-genome-hub.southgreen.fr; plants.ensembl.org | D’Hont et al., 2012
Coffee, Coffea arabica | Coffea canephora | coffee-genome.org | Denoeud et al.
Cotton, Gossypium hirsutum | Gossypium arboreum; Gossypium raimondii | cottongen.org | Li et al., a; Wang et al.
Oilseed rape, Brassica napus | Brassica napus | genoscope.cns.fr/brassicanapus; plants.ensembl.org | Chalhoub et al.
Peanut, Arachis hypogaea | Arachis duranensis; Arachis ipaensis | peanutbase.org | Bertioli et al.
Quinoa, Chenopodium quinoa | Chenopodium quinoa | cbrc.kaust.edu.sa/chenopodiumdb | Jarvis et al.
Strawberry, Fragaria × ananassa | Fragaria vesca | rosaceae.org | Shulaev et al.
Wheat, Triticum aestivum | Triticum aestivum | wheat-urgi.versailles.inra.fr; plants.ensembl.org | International Wheat Genome Sequencing Consortium

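However, simplex marker sets have some limitations. Firstly, in selecting only simplex markers, a large proportion of markers with different segregation patterns are not used, which reduces the map coverage (while increasing the per-marker cost of the final set of mapped markers). More importantly, simplex markers give limited information about linkage in repulsion phase, particularly at higher ploidy levels (van Geest et al., a). This means that homologue-specific maps can be produced, but they are unlikely to be well-integrated between homologues in a single parent, and impossible to integrate across parents. In other words, the chromosomal numbering will most likely be inconsistent between parental maps if only simplex markers are used. Producing a consensus or fully integrated map is desirable for many reasons, including being able to detect and model more complex QTL configurations than just simplex QTL. Therefore, a truly polyploid linkage mapping tool should be able to include all marker segregation types, not just 1:1 segregating markers.

Figure 4. Simplex markers. These markers carry a single copy of the segregating marker allele and inherit similarly across all ploidy levels and pairing behaviours, allowing diploid mapping software to be used. Here, the (simplex) allele is coloured red.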

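Polyploid linkage mapping software

Linkage mapping can be broken into three steps: linkage analysis, marker clustering and marker ordering. There are still relatively few software tools that can perform all three of these steps for polyploid species. Perhaps the most well-known and widely-used software tool is TetraploidMap for Windows (Hackett and Luo; Hackett et al.). As well as producing linkage maps for autotetraploid species, this software also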


Figure 5. Theoretical rate of decrease in heterozygosity in polyploid species from repeated rounds of inbreeding / selfing, using expressions derived by Haldane (1930).


Genome-wide association studies

Polyploid GWAS


Arabidopsis thaliana Arabidopsis were estimated, using a ‘hideSNP’ simulation for Arabidopsis versus a ‘rule of thumb’ calculation for potato, but mainly from the difference in the extent of LD between A. thaliana S. tuberosum

polyploids, which would complement the design of future GWAS studies in polyploid species.

QTL analysis

The term “QTL analysis” usually refers to studies that aim to detect regions of the genome, so-called quantitative trait loci (Geldermann), that have a significant statistical association with a trait in specifically-constructed experimental populations. These populations are most often created by crossing two contrasting parental lines (“bi-parental” populations), although there is increasing interest in using more complex population designs in order to increase the range of alleles and genetic backgrounds being studied, e.g. “MAGIC” populations (Huang et al.). As already discussed, there is great difficulty in developing inbred lines by repeatedly selfing polyploids, due to the sampling of alleles during polyploid gamete formation: in a diploid this sampling generates $\binom{2}{1} = 2$ combinations, for a tetraploid this rises to $\binom{4}{2} = 6$ and in a hexaploid $\binom{6}{3} = 20$, resulting in protracted heterozygosity (Figure 5), not to mention the problem of inbreeding depression associated with many outcrossing polyploid species. Therefore, most QTL analyses in polyploid species have been performed using the directly-segregating F1 progeny of a cross between heterozygous parents (a “full sib” population). This leads to poor resolution of QTL positions when compared to the more popular diploid inbred populations like RILs etc., and also means that F1 populations must be vegetatively propagated if replication over years or different growing environments is desired. For many polyploid species, vegetative propagation is indeed possible (Herben et al.) and F1 populations have the added advantage of being relatively quick and simple to develop, while, because of a generally high level of heterozygosity, many loci will be segregating in the F1. Therefore, despite their drawbacks, F1 populations remain the bi-parental population of choice for mapping studies.
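The gamete counts quoted for diploids, tetraploids and hexaploids follow from choosing half of the homologues at a locus. A minimal sketch, assuming that all alleles at the locus are distinct and that no double reduction occurs (the function name is illustrative):

```python
from math import comb


def gamete_combinations(ploidy: int) -> int:
    """Number of distinct allele combinations a gamete can carry at one locus.

    Assumes an even ploidy level, all alleles distinct and no double reduction,
    so a gamete is any choice of ploidy / 2 alleles out of ploidy.
    """
    if ploidy < 2 or ploidy % 2 != 0:
        raise ValueError("ploidy must be an even number of at least 2")
    return comb(ploidy, ploidy // 2)


# Diploid, tetraploid and hexaploid, as in the text above.
counts = {ploidy: gamete_combinations(ploidy) for ploidy in (2, 4, 6)}
```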

The methods for QTL analysis in diploid species have become increasingly convoluted (van Eeuwijk et al.); in polyploid species such theoretical complexities have yet to be attempted, given the more immediate difficulties in accurately genotyping as well as modelling polyploid inheritance. Just like for linkage mapping and GWAS, the range of software tools available for QTL analysis in polyploids remains rather limited, although there are a number of recent developments that are helping transform the field.

One of the only dedicated software tools for tetraploid QTL analysis is the already-mentioned TetraploidMap software (Hackett et al.). This software enables interval mapping to be performed in autotetraploid populations (as well as a simple single-marker ANOVA test), using a restricted range of marker types, each defined by the marker dosage in one parent and in the other. Although still available, it has been superseded by the TetraploidSNPMap software (Hackett et al.). TetraploidSNPMap (TPM) uses SNP dosage data to either construct a linkage map (as already described) or perform QTL interval mapping. In contrast to its predecessor, TPM can analyse all marker segregation types, and allows the user to explore different QTL models at detected peaks. At its core is an algorithm to determine identity-by-descent (IBD) probabilities for the offspring of the population, which are then used in a weighted regression performed across the genome.

An independent software tool that has been developed to determine IBD probabilities in tetraploids is TetraOrigin (Zheng et al.), implemented in the Mathematica programming language. TetraOrigin relaxes the assumption of random bivalent pairing during meiosis (which TetraploidSNPMap, TPM, employs) to allow for both preferential chromosomal pairing as well as multivalent formation and the possibility of double reduction. Although not programmed in a user-friendly format like TPM, it is relatively straightforward to use, taking an integrated linkage map and marker dosage matrix as input. It does not perform QTL analysis directly, but the resulting IBD probabilities can then be used to model genotype effects in a QTL scan, either using a weighted regression approach like TPM, or in a linear mixed model setting. IBD probabilities allow interval mapping since they can be interpolated at any desired intervals on the linkage map.
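How IBD probabilities might be evaluated between markers can be sketched as follows. The sketch assumes simple linear interpolation of one individual's probability of carrying one parental homologue, given probabilities at mapped marker positions in centiMorgans; this is an illustrative simplification and is not claimed to be the scheme used by TetraOrigin or TetraploidSNPMap. Positions outside the mapped range take the value of the nearest marker.

```python
from bisect import bisect_right
from typing import List, Sequence


def interpolate_ibd(
    marker_cm: Sequence[float],
    marker_prob: Sequence[float],
    grid_cm: Sequence[float],
) -> List[float]:
    """Linearly interpolate IBD probabilities onto a grid of map positions.

    marker_cm must be strictly increasing and the same length as marker_prob.
    """
    if len(marker_cm) != len(marker_prob) or not marker_cm:
        raise ValueError("marker positions and probabilities must match")

    result = []
    for x in grid_cm:
        if x <= marker_cm[0]:
            result.append(marker_prob[0])
            continue
        if x >= marker_cm[-1]:
            result.append(marker_prob[-1])
            continue
        j = bisect_right(marker_cm, x)  # first marker strictly right of x
        x0, x1 = marker_cm[j - 1], marker_cm[j]
        p0, p1 = marker_prob[j - 1], marker_prob[j]
        result.append(p0 + (p1 - p0) * (x - x0) / (x1 - x0))
    return result


# Hypothetical example: three markers, probabilities evaluated on a 5-point grid.
grid_values = interpolate_ibd([0.0, 10.0, 30.0], [1.0, 0.8, 0.2],
                              [0.0, 5.0, 10.0, 20.0, 30.0])
```

The interpolated values at each grid position could then serve as regressors in a genome scan.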

For ploidy levels other than tetraploid, there are currently no dedicated software tools available for QTL analysis or IBD probability estimation. Single-marker approaches such as ANOVA on the marker dosages (assuming additivity; various dominant models could also be explored, see e.g. Rosyara et al.) are of course possible and require access to basic statistical software packages such as R (or even Excel). However, such approaches are not ideal: they are only effective if marker alleles are closely linked in coupling with QTL alleles, and offer no ability to predict the QTL segregation type or mode of gene action, as is done for example in TetraploidSNPMap (Hackett et al.). As interest increases in the genetic dissection of important traits in polyploid species, we anticipate that it is only a matter of time before more flexible cross-ploidy solutions are developed. Methodologies developed for tetraploid species often claim that “extension to higher ploidy levels is straightforward”. These sorts of disingenuous claims attempt to mark new research territory as already solved. If extensions to higher ploidy levels were indeed straightforward, we would already be reporting on a wider range of tools available for them; as far as we can tell, so far there are none.
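A single-marker test under additivity amounts to regressing the phenotype on the marker dosage. The following is a minimal sketch using ordinary least squares with one regressor, written with the standard library only; the function name and the returned statistics are choices made for illustration, and a P-value would still need to be obtained from an F distribution with 1 and n - 2 degrees of freedom.

```python
from typing import Sequence, Tuple


def additive_marker_test(
    dosages: Sequence[float],
    phenotypes: Sequence[float],
) -> Tuple[float, float, float, float]:
    """Regress phenotype on marker dosage (additive model).

    Returns (slope, intercept, r_squared, f_statistic).
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least three paired observations")

    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("marker or phenotype shows no variation")

    slope = sxy / sxx                 # estimated effect per allele copy
    intercept = mean_y - slope * mean_x
    ss_regression = slope * sxy
    ss_residual = syy - ss_regression
    r_squared = ss_regression / syy
    if ss_residual <= 0:
        f_statistic = float("inf")    # perfect fit
    else:
        f_statistic = ss_regression / (ss_residual / (n - 2))
    return slope, intercept, r_squared, f_statistic
```

As the text notes, such a test is only informative when the marker allele is closely linked in coupling with the QTL allele.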


Returning to the topic of population types, we also anticipate that more powerful QTL analyses can be performed by combining information over multiple populations. Approaches such as pedigree-informed analyses, implemented for diploids in the FlexQTL software (Bink et al.), could overcome some of the limitations imposed by the restrictions on population types in software for polyploids. However, it may take some time before such tools become translated to the polyploid level.

Genomic prediction and genomic selection

There has been much attention given to the advantages of using all marker data to help predict phenotypic performance, rather than focusing on single markers or haplotypes that are linked to QTL as was previously advocated. The motivation behind this is clear: many of the most important traits in domesticated animal and plant species are highly quantitative, with far too many small-effect loci present to be able to tag them all with single markers (Bernardo). One of the most important traits in any breeding program is also a famously quantitative trait: yield. It has been suggested that despite many years of phenotypic selection, crop yield in tetraploid potato has essentially remained unchanged (Jansky; Slater et al.). This is a remarkable indictment of traditional selection methods, yet offers much-needed impetus for the development and deployment of new paradigms in breeding for quantitative traits.

Genomic prediction first arose in animal breeding circles (Meuwissen et al.), where the concept of estimating breeding values from known pedigrees was already well-established. However, the estimation of breeding values in polyploid species requires special consideration due to the complexity of polysomic inheritance and the possibility of double reduction. In practice, breeding values are usually estimated using restricted maximum likelihood (REML) to solve mixed model equations, requiring the generation of an inverse additive relationship matrix $A^{-1}$, where $A$ is also called the numerator relationship matrix. The form of $A^{-1}$ depends on, among other things, whether the inheritance is polysomic or disomic and whether double reduction occurs (Kerr et al.; Hamilton and Kerr). Recently the R package polyAinv was released, which computes the appropriate $A^{-1}$ as well as the kinship matrix K and the inbreeding coefficients F (Hamilton and Kerr). However, in one study of nine common traits in autotetraploid potato, the inclusion of double reduction, or even the adoption of an autotetraploid-appropriate relationship matrix, was found to have a minimal impact on the results (Slater et al.). Studies which ignore the specific complexities of autopolyploids may still benefit from genomic prediction and selection, as for example was demonstrated in tetraploid potato (Sverrisdóttir et al.). Commonly-used


Mode of inheritance

The term “mode of inheritance” refers to the randomness of meiotic pairing processes. Some “traditional” approaches to predict the mode of inheritance are summarised in

and Arruda) or in ornamentals such as Alstroemeria (Buitendijk et al.), which makes the diagnosis of the mode of inheritance even more difficult. As more polyploid species begin to be genotyped, the issue of unknown mode of inheritance will likely exert more influence, further necessitating the development of software tools that can provide an accurate assessment of the inheritance mode using marker data, and that can accommodate the full spectrum of polyploid meiotic behaviours.

Simulation software

As with any software tool, developing standards and scenarios upon which the performance of the tool can be judged is vital to ensure reliable results. In this final section we consider the range of simulation tools currently available for polyploids. Probably the most widely-used polyploid simulation software currently available is PedigreeSim (Voorrips and Maliepaard). Originally developed to generate diploid and tetraploid populations, the current release of PedigreeSim can simulate populations of any even ploidy level. What makes PedigreeSim particularly attractive is its ability to simulate a diversity of meiotic pairing conditions, including quadrivalents (which can result in double reduction) or preferential chromosome pairing. It takes four input files, which are relatively simple to generate, that provide a description of the desired simulation parameters and the input marker data. The software then creates dosage-scored genotype data for any pedigreed population, e.g. an F1 population of specified size (Voorrips and Maliepaard). Some authors have used PedigreeSim to simulate multiple generations of random mating, allowing an investigation of population structure and linkage disequilibrium in polyploid species (e.g. Rosyara et al.; Vos et al.), which can be implemented quite easily with some basic programming knowledge. PedigreeSim is written in Java and can run on all major operating systems. The Windows-based software Polylink, which originally performed two-point linkage analysis and simulation of tetraploid populations (He et al.), is no longer available. The R package polySegratio (Baker) simulates dominantly-scored marker data in autopolyploids of any even ploidy level. Generating the dosage data is straightforward: only the expected proportion of marker types (simplex, duplex, triplex) as well as the ploidy is required. However, the markers are essentially completely random, with no connection to any linkage map, which is arguably of limited use for any application that requires some degree of linkage between markers. The simulation capacities of polySegratio therefore appear to be most useful for testing functions within the package itself, namely those designed to impute parental dosages given the observed segregation ratios in offspring scores.
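The expected segregation ratios against which observed offspring scores are compared can be derived by enumeration. The sketch below is not how PedigreeSim or polySegratio work internally; it assumes fully polysomic inheritance with random pairing and no double reduction, so that every choice of half of a parent's homologues is an equally likely gamete. Function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from typing import Dict


def gamete_distribution(dosage: int, ploidy: int) -> Dict[int, Fraction]:
    """Probability of each gamete dosage for a parent with the given dosage."""
    if ploidy < 2 or ploidy % 2 != 0:
        raise ValueError("ploidy must be an even number of at least 2")
    if not 0 <= dosage <= ploidy:
        raise ValueError("dosage must lie between 0 and the ploidy")
    homologues = [1] * dosage + [0] * (ploidy - dosage)
    # Each subset of ploidy / 2 homologues is one equally likely gamete.
    counts = Counter(sum(g) for g in combinations(homologues, ploidy // 2))
    total = sum(counts.values())
    return {d: Fraction(c, total) for d, c in counts.items()}


def offspring_distribution(p1: int, p2: int, ploidy: int = 4) -> Dict[int, Fraction]:
    """Expected offspring dosage distribution for a cross p1 x p2."""
    result: Dict[int, Fraction] = {}
    for d1, prob1 in gamete_distribution(p1, ploidy).items():
        for d2, prob2 in gamete_distribution(p2, ploidy).items():
            result[d1 + d2] = result.get(d1 + d2, Fraction(0)) + prob1 * prob2
    return dict(sorted(result.items()))


# Tetraploid examples: simplex x nulliplex and duplex x nulliplex.
sxn = offspring_distribution(1, 0)
dxn = offspring_distribution(2, 0)
```

Exact fractions are used so that the ratios can be compared directly with textbook expectations such as 1:1 or 1:4:1.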


A final polyploid simulation tool that has recently been developed is the HaploSim pipeline, which includes the HaploGenerator function (Motazedi et al.). HaploGenerator is designed to generate sequence-based haplotypes in a polyploid of any even ploidy, taking the fasta file it is provided with as a reference from which haplotypes are built. The software generates random SNP mutations at a specified distribution before simulating next-generation sequencing (NGS) reads in formats corresponding to a number of current sequencing technologies such as Illumina or Pacific Biosciences (PacBio). The pipeline was originally developed to compare the performance of a number of haplotype assembly algorithms (Motazedi et al.), but could also be useful for testing the performance of any other tool which uses NGS reads as genotypes.

Future perspectives

In this review we have attempted to describe the most important software tools that are currently available to the polyploid genetics community. There are likely to be tools that were missed and tools that have subsequently been released; this is the danger of such a review. However, we have tried where possible to also discuss the gaps that are apparent in the current set of available tools, which will hopefully help guide their development in future. Polyploid genotyping arguably remains the most critical step, as without accurate genotype data there is little point in building models for polyploid inheritance. However, we are now witnessing the slow emergence of tools that take polyploid genotypes and use them to make inferences on the transmission of alleles and the effects of such alleles in polyploid populations. As genotyping technologies continue to evolve, so too should the suite of tools developed to analyse those genotypes. Tools for analysing SNP dosage data from SNP arrays are well-established, with extensions of current tools to higher ploidy levels planned (i.e. fitPoly). The coming decade will likely see a move away from SNP array-based genotyping to the use of sequence-read based genotypes, although this will require that all tools heretofore developed be updated to accommodate the new type of data. Information on the mode of inheritance from marker data is also needed for each population studied, which deserves more attention than it currently receives. A move from diploid-based reference genomes to fully polyploid and haplotype-resolved reference genomes would also help broaden the boundaries of polyploid genetics, away from the diplo-centric view of genomics which currently dominates. Although there have been many exciting discoveries and developments in polyploid genetics in the past decade or more, we feel its golden age has yet to arrive, an age which will be heralded all the sooner by the provision of robust and user-friendly tools for the genetic dissection of this fascinating group of organisms.

Chapter 3

The double reduction landscape in tetraploid potato as revealed by a high-density linkage map

Peter M. Bourke1, Roeland E. Voorrips1, Richard G. F. Visser1, Chris Maliepaard1

1 Plant Breeding, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.

Published as Bourke, P. M., Voorrips, R. E., Visser, R. G. F. and Maliepaard, C. (2015). “The Double Reduction Landscape in Tetraploid Potato as Revealed by a High-Density Linkage Map”, Genetics 201 (3), 853-863 and reproduced here with permission of the Genetics Society of America


Abstract

Key words


Introduction


Materials and Methods

Plant material

tetraploid potato varieties, cultivars ‘Altus’ (hereafter referred to as parent one, P1) and ‘Colomba’ (P2).

DNA extraction and genotyping

’s instructions (Thermo Scientific). The concentration of DNA was

manufacturer’s protocol at ServiceXS, Leiden, the Netherlands. Each parent was genotyped in duplicate using two biological replicates. Other tetraploid accessions were sampled in a similar fashion, as well as diploid accessions, both for use in another study and to help marker dosage fitting.

Assignment of dosages

The X and Y allele signal intensities were imported from the Illumina data output into the R programming environment (R Core Team). SNPs were initially filtered on the average of their total signal intensity (the sum of the X and Y allele signal intensities) over all samples, which had to exceed a minimum threshold. The marker intensities were converted into allele dosages using the fitTetra package for R (Voorrips et al.). Three default settings of the saveMarkerModels function of fitTetra were changed: p.threshold was decreased, peak.threshold was increased and sd.target was set explicitly, where p.threshold is the “minimum P-value required to assign a genotype to a sample”, peak.threshold is the “maximum allowed fraction of the scored samples that are in one peak” and sd.target is used to specify the maximum non-penalised standard deviation of the fit on a transformed scale (Voorrips et al.). All diploid and tetraploid samples were included in the fitting because this generally results in a better fit of the dosage classes. Following fitting with fitTetra, the marker dosage scores were screened to ensure consistency between parental and offspring genotypes. Markers with a limited number of invalid scores (scores that were not expected based on the parental genotypes and bivalent chromosome pairing) were allowed. A high frequency of invalid scores suggests that either the marker performed poorly, there was some consistent error in dosage assignment, or one or both of the parents had been incorrectly genotyped. Highly-skewed markers were also removed at this stage.
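The consistency screen between parental and offspring dosages can be sketched as follows. This is an illustration only, not the code used in the study: it assumes bivalent pairing without double reduction, so that a parent with dosage d in a tetraploid can contribute between max(0, d - 2) and min(2, d) copies, and it counts offspring scores falling outside the resulting range. Function names are hypothetical, and the number of invalid scores tolerated per marker is left to the user.

```python
from typing import Optional, Sequence, Tuple


def valid_dosage_range(p1: int, p2: int, ploidy: int = 4) -> Tuple[int, int]:
    """Smallest and largest offspring dosage expected without double reduction."""
    half = ploidy // 2
    lowest = max(0, p1 - half) + max(0, p2 - half)
    highest = min(half, p1) + min(half, p2)
    return lowest, highest


def count_invalid_scores(
    p1: int,
    p2: int,
    offspring: Sequence[Optional[int]],
    ploidy: int = 4,
) -> int:
    """Count offspring dosages outside the expected range (None = missing)."""
    lowest, highest = valid_dosage_range(p1, p2, ploidy)
    return sum(
        1 for d in offspring
        if d is not None and not lowest <= d <= highest
    )


# Hypothetical triplex x nulliplex marker: dosages 0 and 3 are unexpected.
n_invalid = count_invalid_scores(3, 0, [1, 2, 0, 2, None, 1, 3])
```

Note that some scores flagged here, such as dosage 0 from a triplex x nulliplex cross, are exactly those that can arise through double reduction.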

Table 1. Breakdown of SNP marker numbers after quality filtering

Steps in SNP filtering | #SNPs | %
SolSTW Infinium array total SNPs | |
Dosages assigned by fitTetra (a) | |
Both parents ok | |
F1 pattern acceptable (b) | |
monomorphic | |
polymorphic | |

a Markers not scored were either monomorphic or not clearly resolved.
b Criteria for lack of fit: presence of null alleles, invalid scores, highly-skewed segregation.


Marker conversion

Markers that segregated in a 1:1 fashion were re-labelled as simplex x nulliplex or nulliplex x simplex for mapping and double reduction analysis. Considering markers whose segregating allele is inherited from P1, these consisted of triplex x nulliplex, triplex x quadruplex and simplex x quadruplex markers. For example, a triplex x nulliplex marker is expected to produce 50% dosage ‘1’ and 50% dosage ‘2’ among the offspring, with observable double reduction scores appearing as dosage ‘0’ (a double copy of the ‘0’ allele from P1). Re-labelling ‘2’ as ‘0’ and ‘0’ as ‘2’ (with the parents re-labelled as simplex and nulliplex) achieves the desired result of marker conversion.
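The triplex x nulliplex example can be written as a short sketch. Only the conversion spelled out in the text is covered (swapping offspring dosages ‘0’ and ‘2’); the function name is illustrative, missing scores are assumed to be coded as None, and any other dosage is passed through unchanged.

```python
from typing import List, Optional, Sequence


def relabel_triplex_x_nulliplex(
    offspring: Sequence[Optional[int]],
) -> List[Optional[int]]:
    """Convert triplex x nulliplex offspring scores to simplex x nulliplex coding.

    Dosage 2 becomes 0 and dosage 0 becomes 2, so that double reduction scores
    appear as dosage 2 after conversion. Dosage 1 and missing scores are kept.
    """
    swap = {0: 2, 2: 0}
    return [None if d is None else swap.get(d, d) for d in offspring]


# Hypothetical offspring scores, including one double reduction score (0).
converted = relabel_triplex_x_nulliplex([1, 2, 2, 1, 0, None])
```

After conversion the parents would be recorded as simplex (P1) and nulliplex (P2), as described above.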

Table 2. Tetraploid marker segregation types by number

Parental dosage                Segregation    #SNP a
Simplex x nulliplex (SxN)      [...]          [...]
Nulliplex x simplex            [...]          [...]
Duplex x nulliplex (DxN)       [...]          [...]
Nulliplex x duplex             [...]          [...]
Simplex x simplex (SxS)        [...]          [...]
Simplex x triplex (SxT)        [...]          [...]
Duplex x simplex (DxS)         [...]          [...]
Simplex x duplex (SxD)         [...]          [...]
Duplex x duplex (DxD)          [...]          [...]
                                              7214

a Number of SNP markers after simplifying marker conversions have been performed.

Linkage map construction

Simplex x nulliplex marker data were recoded to JoinMap [...] cross-pollinator format (lm x ll). ‘Impossible’ genotypes (invalid scores) were made missing before importation into JoinMap. One pair of identical individuals was identified in the dataset (similarity of [...]); we therefore removed individual [...]. Markers were assigned to linkage groups with a minimum LOD of [...] (a higher LOD was used if clusters broke into large subclusters at a higher LOD). Marker clusters were assigned to physical chromosomes based on the position of markers on the physical sequence (Potato Genome Sequencing Consortium, 2011). Mapping was first performed using the groupings from the Groupings Tree using maximum likelihood (ML). Homologues were then identified by large gaps in the estimated map distances (≥ 60 cM), which were also often accompanied by a transition in estimated marker phase. Marker data for separate homologues were exported from JoinMap in .loc files and re-imported for creation of the homologue maps. After an initial mapping of the homologues, individual [...] was found to contribute unrealistic numbers of recombinations in many linkage groups across both parents and was therefore removed, resulting in a final mapping population of 25 individuals. Mapping was performed using ML with three rounds of map optimisation using the default settings for spatial sampling thresholds. Haldane’s mapping function was used to convert recombination frequency estimates to map distances, as has previously been used for linkage map construction in tetraploid potato (Meyer et al. 1998; Hackett et al. 2013). In a number of cases we used linkage information from the duplex x nulliplex and simplex x simplex markers to connect sub-homologue linkage groups that had poor internal linkage among simplex x nulliplex markers. Map data were exported from JoinMap as text files and imported into MapChart 2 (Voorrips 2002) for further plotting.

Comparison of genetic and physical maps

The genetic positions of markers were compared with their physical positions as defined in Vos et al. (2015). Some markers did not map to the same chromosome as expected from the physical map; a list of such markers is included in Supplementary Table 1. The physical positions of the centromere boundaries were initially adopted from previously published values (Sharma et al. 2013). These were not found to coincide precisely with the points of inflection on the genetic-physical map, following which the approximate centromere bounds were redefined by examination of the aligned genetic-physical plots (also referred to as Marey maps (Chakravarti 1991)) and calculating an approximate physical position between marker pairs flanking the points of inflection on these plots. The order of the genetic map was reversed in cases where the genetic maps were found to be inversely ordered with respect to the physical map.

Conversion rate of physical to genetic distance

The conversion rate between genetic and physical distance was determined by regressing the genetic positions on their physical positions per homologue arm. The slopes of the regression lines for each homologue arm were tested for equality in an analysis of covariance by introducing, where necessary, up to three dummy variables (to code for the presence or absence of a homologue) per chromosome arm per parent (Andrade and Estévez-Pérez [...]). An average genome-wide estimate of the genetic to physical conversion rate was calculated after excluding a single outlying value from the northern arm of homologue 2 of chromosome 1 in parent 1. This genome-wide recombination rate was used to convert the physical map to a pseudo-integrated genetic map for use in the simulation studies.
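A minimal Python sketch of the per-arm regression and the subsequent conversion is given below. It assumes positions in Mb and cM, uses ordinary least squares for the slope, and converts physical positions by simple multiplication with the average rate; the function names are hypothetical, and the analysis of covariance on the slopes and any special treatment of the centromeric regions are not reproduced.

```python
def regression_slope(physical_mb, genetic_cm):
    """Least-squares slope (cM per Mb) of genetic on physical position."""
    n = len(physical_mb)
    if n != len(genetic_cm) or n < 2:
        raise ValueError("need two equally long vectors with at least two markers")
    mean_x = sum(physical_mb) / n
    mean_y = sum(genetic_cm) / n
    sxx = sum((x - mean_x) ** 2 for x in physical_mb)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(physical_mb, genetic_cm))
    return sxy / sxx


def mean_rate(slopes):
    """Average conversion rate over homologue arms (outliers removed beforehand)."""
    return sum(slopes) / len(slopes)


def pseudo_genetic_positions(physical_mb, rate_cm_per_mb):
    """Convert physical positions to approximate genetic positions."""
    return [p * rate_cm_per_mb for p in physical_mb]
```

In use, regression_slope would be called once per homologue arm, the outlying arm dropped, and mean_rate of the remaining slopes passed to pseudo_genetic_positions.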

Rates of double reduction

After recoding the 1:1 segregating marker data, duplex marker scores in the offspring were taken as possible evidence for double reduction. Duplex scores can also arise as a result of genotyping errors. We therefore used a relatively strict criterion to decide whether such scores were evidence of double reduction: a string of three consecutive duplex-scored markers on a homologue map was required in order to be considered strong enough evidence for double reduction. This could theoretically lead to some underestimation of the rates of double reduction, but the simplex marker density was sufficient that in most cases a double reduction region would contain at least three segregating simplex x nulliplex markers.

A routine was written in R to identify strings of three or more duplex scores. The rate of double reduction was determined for each marker by counting the number of times it formed part of a double reduction segment and dividing this by the number of non-missing values scored for that marker across the population. We then derived the average rate of double reduction per homologue for windows north and south of the centromeric bounds by calculating the mean rate of double reduction over all markers within that window. These means were aggregated to give a single average rate of double reduction per homologue for each window distance from the centromeres across all chromosomes and both parents. The average rate per chromosome was estimated by multiplying the homologue rates by a factor of four.
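The counting rule can be sketched in Python as follows (the original routine was written in R and is not reproduced here). The sketch assumes that each individual is represented by its scores in map order along one homologue, that missing values are coded as None, and that a missing value interrupts a string of duplex scores; the function names are hypothetical.

```python
def double_reduction_flags(scores, min_run=3):
    """Flag markers lying in a run of at least `min_run` consecutive duplex scores."""
    flags = [False] * len(scores)
    start = None
    # A trailing sentinel closes a run that reaches the end of the map.
    for i, s in enumerate(list(scores) + [None]):
        if s == 2:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                for j in range(start, i):
                    flags[j] = True
            start = None
    return flags


def marker_dr_rates(population, min_run=3):
    """Per-marker rate: times in a double reduction segment / non-missing scores."""
    n_markers = len(population[0])
    hits = [0] * n_markers
    scored = [0] * n_markers
    for individual in population:
        flags = double_reduction_flags(individual, min_run)
        for m, (s, f) in enumerate(zip(individual, flags)):
            if s is not None:
                scored[m] += 1
            if f:
                hits[m] += 1
    return [h / s if s else None for h, s in zip(hits, scored)]
```

Window averages per homologue would then be means of these per-marker rates, and the per-chromosome rate four times the homologue rate, as described above.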

Simulation of double reduction and prediction of quadrivalent formation

An approximate “integrated” genetic linkage map was produced using the average cM to Mb conversion rate and physical positions of the simplex markers. Only markers for which the assigned linkage group and physical chromosome corresponded were considered. Marker phase was determined according to the homologue assignment of all markers. Phased marker genotypes and a consensus genetic map position are the basic input for the simulation software PedigreeSim (Voorrips and Maliepaard 2012), which simulates diploid or polyploid populations with specified levels of multivalents and/or preferential pairing. One thousand separate populations of [...] individuals were generated using the same simplex marker data and approximated map under a range of different fractions of quadrivalents. The algorithm for estimating double reduction was applied to the simulated datasets, allowing us to deduce the relationship between the rate of double reduction and the frequency of multivalents underlying meiosis.

Estimation of the rate of preferential pairing

Repulsion-phase simplex marker data can be used to investigate whether preferential pairing occurs, as the estimates for recombination frequency in repulsion are expected to differ under disomic and tetrasomic inheritance (Qu and Hancock 2001). We have adapted the approach of Qu and Hancock (2001) to correct for multiple testing using the false-discovery rate (FDR; Benjamini and Hochberg 1995), confining our analysis to within chromosomes to reduce the overall number of tests (coupling or repulsion linkage have no meaning when marker pairs from separate linkage groups are considered).

For two markers A and B we define $n_{00}$ as the number of individuals with dosage 0 at both markers, $n_{01}$ as the number of individuals with dosage 0 at marker A and dosage 1 at marker B, and so on. The explicit ML estimator for the recombination frequency r in coupling phase,

\[ r = \frac{n_{01}+n_{10}}{n_{00}+n_{01}+n_{10}+n_{11}}, \]

is invariant under both disomic and tetrasomic inheritance, whereas in repulsion phase the ML estimator under disomic inheritance is

\[ r_{disom} = \frac{n_{00}+n_{11}}{n_{00}+n_{01}+n_{10}+n_{11}} \]

and under tetrasomic inheritance

\[ r_{tetra} = \frac{2(n_{00}+n_{11})-(n_{01}+n_{10})}{n_{00}+n_{01}+n_{10}+n_{11}}. \]

If the mode of inheritance is tetrasomic, $r_{disom}$ should never fall below the value of 1/3, whereas in the case of disomic inheritance $r_{disom} \in [0, 0.5)$. This forms the basis of an exact binomial test with $H_0: r_{disom} = 1/3$ and $H_1: r_{disom} < 1/3$. Correction for multiple testing was performed using the FDR procedure with α = [...], as described in Benjamini and Hochberg (1995).
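The estimators and the test can be illustrated with a small Python sketch. It is one reading of the procedure rather than the authors' code: the test treats the count n00 + n11 as binomial with success probability 1/3 under the null hypothesis, the Benjamini-Hochberg step-up rule is implemented directly, and the significance level is passed in by the caller. All function names are hypothetical.

```python
from math import comb


def r_coupling(n00, n01, n10, n11):
    """ML estimate of r for a coupling-phase simplex marker pair."""
    return (n01 + n10) / (n00 + n01 + n10 + n11)


def r_repulsion_disomic(n00, n01, n10, n11):
    """ML estimate of r in repulsion phase under disomic inheritance."""
    return (n00 + n11) / (n00 + n01 + n10 + n11)


def r_repulsion_tetrasomic(n00, n01, n10, n11):
    """ML estimate of r in repulsion phase under tetrasomic inheritance."""
    return (2 * (n00 + n11) - (n01 + n10)) / (n00 + n01 + n10 + n11)


def preferential_pairing_pvalue(n00, n01, n10, n11):
    """One-sided exact binomial p-value for H0: r_disom = 1/3 vs H1: r_disom < 1/3."""
    k = n00 + n11
    n = n00 + n01 + n10 + n11
    p0 = 1 / 3
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k + 1))


def benjamini_hochberg(pvalues, alpha):
    """Return a list of booleans marking the hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            cutoff = rank  # largest rank meeting the step-up criterion
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[i] = True
    return rejected
```

Note that r_repulsion_tetrasomic equals 3 * r_repulsion_disomic - 1, which is why a disomic estimate below 1/3 corresponds to a negative tetrasomic estimate.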

Simulation of mapping under different rates of quadrivalent formation and preferential pairing

One of the hypotheses we wanted to test was whether bivalent formation predominates in tetraploid potato, as is commonly assumed. We also wanted to see the effect that deviations from this assumption could have on recombination frequency estimates that are based on a bivalent model. In this study we limited our focus to 1:1 segregating markers. We used PedigreeSim to simulate new mapping populations of [...] individuals with the fraction of quadrivalents varying from zero to one in increments of [...]. For each setting, one thousand simulated populations were generated. The simulated genome had a single chromosome of [...] cM with simplex x nulliplex markers randomly distributed at positions no closer than [...] cM apart and the centromere at [...] cM. The true and estimated recombination frequencies between the first marker and the other markers on the chromosome were recorded, as well as the LOD and assigned phase (“coupling” or “repulsion”). Recombination frequencies between marker pairs were estimated using ML, for which explicit estimators can be derived in the case of simplex marker pairs (c.f. previous section). Phase was determined by choosing the lowest estimate for the recombination frequency in the range [...], which we term phasing by the minimum recombination frequency (MRF). This differs from previous studies, where the maximum of the log-likelihood (MLL) was used to assign the most likely phase (Luo et al. [...]; Hackett et al. [...]). Negative estimates for r can occur due to Mendelian sampling variation under weak repulsion linkage. For strongly negative values (r < [...]) a recombination frequency of [...], a LOD of zero and phase “Unknown” were assigned, and in the case −0.05 ≤ r < 0 the recombination frequency was set to zero and the LOD and phase were left unchanged. The recombination frequency estimates were regressed on their true values for both coupling and repulsion phase in order to evaluate how close to the true value the estimates fell for each pairing scenario. The proportion of correctly assigned phases for coupling- and repulsion-phase markers was also recorded.

Results

Genotyping and dosage assignment

Of the [...] SNPs assayed, only [...] were found to be acceptable and segregating in this population (Table 1). Acceptable markers were those for which dosages could be assigned by fitTetra, for which parental dosages were scored consistently between replicates and for which parental dosages and offspring segregation patterns were consistent. Approximately [...] of the markers could be assigned dosages by fitTetra, after which a further [...] were rejected for having inconsistent parental-offspring dosages or for being too highly skewed (χ² test with p < [...]). 1:1 segregating markers formed the largest group among the segregating markers in our population (Table 2), accounting for over [...] of useable markers.

Mapping of the 1:1 segregating markers

Almost no simplex x nulliplex markers dropped out during the mapping stage. Of the [...] simplex x nulliplex markers in P1, [...] were mapped, and [...] out of the [...] P2 markers were mapped (Table 3). The unmapped markers were lost due to poor linkage (either no chromosome assignment or extremely weak linkage within a linkage group) or large numbers of missing values. Marker coverage over all chromosomes was well spaced, with on average over [...] markers per chromosome. Only chromosomes [...] and [...] had fewer than [...] mapped markers ([...] and [...] markers respectively), with chromosomes [...] and [...] having the highest marker coverage ([...] and [...] markers mapped respectively). A number of homologues were split up over more than one linkage group due to insufficient linkage information. In these cases duplex x nulliplex and simplex x simplex markers were used to provide linkage information between homologue fragments. An example of the four homologue maps of chromosome 1 in parent 2 is shown in Figure 1.


Figure 1. Homologue linkage maps of potato chromosome 1 for parent 2

n o e rker were on o e ren eween er nen o nke ro n oon n er ne roooe on e eene eer e o e ee wo o rker on n on were on o e oon wo oon e o ne ene oon rer e rker

Figure 2. Comparison of genetic to physical distance with homologue maps of potato chromosome 1. roe enroere on re own e ne orreonn o e neon on n e re ere oer n oooe were ne ror o rn reenn e oon neer


Table 3. Composition of parental homologue maps

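         Parent 1                  Parent 2
Chm      h1–h4 a     Total b       h1–h4 a     Total b
1        [...]       158           [...]       183
2        [...]       155           [...]       235
3        [...]       109           [...]       225
4        [...]       144           [...]       134
5        [...]       192           [...]       196
6        [...]       76            [...]       153
7        [...]       105           [...]       134
8        [...]       186           [...]       104
9        [...]       117           [...]       107
10       [...]       83            [...]       43
11       [...]       137           [...]       129
12       [...]       82            [...]       86
Total                1544                      1729

a h1 = homologue 1, h2 = homologue 2 etc. Homologue length in centiMorgans using Haldane’s mapping function, with number of simplex markers in brackets.
b Total number of simplex markers.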

were on o e n nknown oon ro e e e o rker oon eer e o e e roe o ee rker w er e oon n eenr e one o e rker w owe nkero rene were ne n e n o oneron re or oe reon e were ne on e n ene e o er no ene oon


Position of the centromeres

A graphical comparison of the aligned genetic and physical maps allowed an estimation of the centromeric bounds (Figure 2). When compared to previously published centromere boundaries (Sharma et al., 2013), the results do not correspond precisely for chromosomes [...]. It is possible that the discrepancies are due to the fact that our estimates are based on a tetraploid population rather than a diploid one (Felcher et al., 2012; Sharma et al., 2013), since the method used to determine the boundaries was essentially the same. Supplementary Table [...] provides our estimates for the centromere bounds, used in the calculation of relative distance from the centromere for the double reduction analysis.

Conversion rate between genetic and physical distance

The cM/Mb conversion rate was determined per homologue arm across all chromosomes in both parents by linear regression of genetic distance on the physical distance (Figure 3). Apart from one clearly outlying value due to insufficient marker coverage, the recombination rate was found to be relatively constant across all chromosomes, with an average value of [...] (standard error of the mean [...]).

Figure 3. Average recombination rate across homologous chromosome arms. Rates calculated per homologue arm (north or south of the centromere) by linear regression of marker positions on the physical versus genetic distance plots. Points are coloured by chromosome, with upward-pointing triangles denoting north (p) arms and downward-pointing triangles denoting south (q) arms. [...] data are shown by filled triangles, [...] data by empty triangles.


Double reduction

Double reduction events were identified on all twelve chromosomes, suggesting that multivalent pairing structures can form among all potato chromosomes. Of the [...] individuals in the mapping population, [...] showed evidence of double reduction coming from P1 meioses and [...] showed double reduction segments from P2. Forty-six individuals showed evidence of having inherited a double reduction segment from both parents (but not necessarily from the same chromosome), which corresponds well with the [...] individuals expected under independence of parental meioses. The distribution of duplex string lengths shows that singleton duplex scores predominate in this dataset (Supplementary Figure 1). Here we have chosen to consider singleton duplex scores as unsupported evidence for double reduction, which cannot be distinguished from errors in dosage estimation.

Figure 4. Average rate of double reduction versus distance from the centromeres. Shaded areas represent [...] confidence regions around the simulated mean rate of DR arising from a fraction of quadrivalents of [...] and [...]. The standard deviation of the simulated mean rate of DR increases towards the telomeres, coinciding with greater fluctuations in the true rate of DR in these regions.

We also used an algorithm which allowed for possible missing scores within a string of duplex values. Using this approach, we were able to reveal the relationship between double reduction and the average distance from the centromere (Figure 4) by pooling the estimates from all homologue maps, giving the average rate of double reduction as a function of distance from the centromere. The rate of double reduction close to the centromeres approaches zero, while towards the telomeres it increases substantially. Within the centromeres themselves there were twenty-two [...] markers and five [...] markers with duplex scores in the offspring. Of these, [...] were single occurrences which were probable errors (for example, the centromeric marker “PotVar0014900”, which mapped to chromosome [...], homologue [...] in [...], gave five separate duplex scores. This marker was also found to have [...] missing values, suggesting a lower reliability). The isolated cases would require a double recombination at both sides of the markers, which is highly unlikely to have occurred. There remained five cases of longer strings of duplex scores which partially entered the boundaries of the centromeric regions (Supplementary Table [...]), suggesting that in a very limited number of cases, recombination may occur within what is considered to be a non-recombining region.

Figure 5. True versus estimated r (using maximum likelihood) for coupling-phase simplex markers with fraction quadrivalents 0.2. The straight line y = x shows the line of perfect correspondence between true and estimated values.

PedigreeSim has previously been used to determine the rate of double reduction in simulated populations and to visualise the relationship between genetic distance from the centromere and double reduction (Voorrips and Maliepaard, 2012). In this study, we simulated phased marker data and a mapping population size of [...] in order to empirically fit a pairing model to the observed data. The observed rates of double reduction and those predicted by simulation overlap well when the fraction of quadrivalents was simulated in the range [...] – [...]. Towards the telomeres the average rate of double reduction exceeded the expected rates within a [...] confidence interval, although the confidence intervals were found to widen greatly in these regions. This may be due to the limited number of markers at these distances from the centromeres, causing greater uncertainty in the estimates.

Evidence for preferential pairing

Using the repulsion-phase marker data, we investigated whether there was any evidence for preferential pairing in this population. We found almost no evidence for preferential pairing after correcting for multiple testing using the FDR correction. On chromosomes [...] and [...] in [...] there were four marker pairs (out of [...] and [...] pairs, respectively) which did show possible evidence of disomic pairing, but this was not considered strong enough evidence to support a hypothesis of preferential pairing. In [...], no markers displayed disomic-like behaviour. It was therefore concluded that potato follows tetrasomic inheritance, as is generally assumed.

Effect of quadrivalents on mapping of simplex markers

Our analysis of double reduction suggests that quadrivalents may account for between 20 and 30% of all meiotic pairing configurations in this population. Given that previous mapping studies in potato have assumed that the rate of quadrivalent formation is negligible, we wanted to examine what effect quadrivalents have on recombination frequency estimates (and hence on linkage mapping). We compared pairwise estimators for r to their true underlying value (Figure 5) for different rates of quadrivalents. Overall, the effect of quadrivalents on coupling-phase estimates for simplex marker pairs was relatively minor, as shown by the gradual decrease in the slope of the regression between the true and estimated values (Figure 6b). Correct phasing in the coupling phase was also unaffected by quadrivalents (Figure 6a). For a quadrivalent rate between 20 and 30%, the effect on coupling-phase estimates can likely be ignored. For repulsion-phase marker pairs a greater effect was found, although, remarkably, the assignment of marker phasing actually improves slightly with higher numbers of quadrivalents (Figure 6a). Of the [...] incorrect repulsion-phase assignments in the purely bivalent situation, only 14 had an associated LOD score greater than one. This suggests that, as a precaution against incorrect phase assignment within a linkage group, an “unknown” phase be assigned in cases where the LOD falls below a certain threshold (e.g. a LOD of one).
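The suggested precaution can be sketched in Python for a pair of simplex x nulliplex markers. The sketch is illustrative only: it picks the phase with the lowest admissible recombination frequency estimate, computes a LOD score under a random bivalent pairing model (coupling: probability 1 − r of a 00 or 11 offspring class; repulsion: probability (1 + r)/3, which follows from the tetrasomic repulsion estimator), and returns an unknown phase when the LOD falls below the threshold. The function name, the default threshold of one and the tolerance of 0.05 for slightly negative estimates are assumptions chosen for illustration.

```python
from math import log10


def _klog(count, ratio):
    """count * log10(ratio), taking 0 * log10(0) as zero."""
    return 0.0 if count == 0 else count * log10(ratio)


def phase_with_lod_guard(n00, n01, n10, n11, min_lod=1.0, neg_tol=0.05):
    """Return (phase, r, LOD) for a simplex marker pair.

    Phase is 'coupling', 'repulsion' or 'unknown'.
    """
    same = n00 + n11
    diff = n01 + n10
    total = same + diff
    estimates = {
        "coupling": diff / total,
        "repulsion": (2 * same - diff) / total,
    }
    # Keep estimates within [-neg_tol, 0.5]; slightly negative values become zero.
    admissible = {phase: max(r, 0.0)
                  for phase, r in estimates.items()
                  if -neg_tol <= r <= 0.5}
    if not admissible:
        return ("unknown", None, 0.0)
    phase = min(admissible, key=admissible.get)
    r = admissible[phase]
    # LOD compares the likelihood at r with that at r = 0.5 (both classes 1/2).
    if phase == "coupling":
        lod = _klog(same, 2 * (1 - r)) + _klog(diff, 2 * r)
    else:
        lod = _klog(same, 2 * (1 + r) / 3) + _klog(diff, 2 * (2 - r) / 3)
    if lod < min_lod:
        return ("unknown", r, lod)
    return (phase, r, lod)
```

With counts close to independence, both estimates approach 0.5 and the LOD approaches zero, so the pair is reported as unknown rather than being forced into a phase.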

Effect of preferential pairing on mapping of simplex markers

Our study on the effect of preferential pairing on estimates of r revealed that preferential pairing has no effect on these estimates in coupling phase but has a dramatic impact in repulsion phase (Figure 7b). This fact has already been reported (Qu and Hancock, 2001; Koning-Boucoiran et al., 2012) and forms the basis for a test of preferential pairing which we also exploit in this study. It is evident that preferential pairing can have a severe impact on the correct assignment of repulsion phase (Figure 7a), regardless of whether MRF or MLL is used for phase assignment (data not shown). Since we found no evidence to suggest that any systematic preferential pairing occurred, we can be fairly confident that the estimates for recombination frequency and phase were accurately performed, as confirmed by the simulation study.


Discussion

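Linkage maps

A recent publication describes the methods used to produce a high-density linkage map of a well-studied tetraploid mapping population (Hackett et al. 2013) using the Infinium SolCAP array (Felcher et al. 2012). Although we have not attempted to include all marker types in the current linkage maps, we have mapped a large number of markers in a tetraploid population, which to the best of our knowledge is the highest reported marker density of a tetraploid potato map.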

Figure 6. Effect of quadrivalents on linkage analysis. a. Proportion of incorrectly-phased marker pairs under different levels of quadrivalent formation. b. Effect of quadrivalents on accuracy of r estimates for coupling- and repulsion-phase simplex marker pairs.


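This has given us adequate coverage to resolve all homologous chromosomes and develop an accurate picture of the double reduction landscape in this tetraploid species. We have presented separate homologue maps rather than a single consensus integrated map per chromosome as achieved by Hackett et al. ([...]). Separate homologue maps give one the ability to infer the phasing of markers, [...] haplotyping without recourse to hidden Markov models (Hackett et al. [...]), although ultimately integrated maps and genotype probabilities estimated using the integrated map will lead to greater power in subsequent studies.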

Figure 7. Effect of preferential pairing on linkage analysis. a. Proportion of incorrectly-phased marker pairs with different levels of preferential pairing. b. Effect of preferential pairing on accuracy of r estimates for coupling- and repulsion-phase simplex marker pairs.


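Our finding that the average conversion rate between genetic and physical distance is essentially constant outside the centromeric regions (genome-wide recombination rate [...]) has shown that the prospects of integrating maps across homologues and between parents are good and should not pose undue stress on the underlying homologue maps. We also found little evidence of recombination “hotspots” or “coldspots” outside the centromeres, as evidenced by the high [...] values associated with our genetic-physical distance regressions (Supplementary Table [...]).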

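Potato cytology

Information on the pairing behaviour of polyploids has traditionally been generated from cytological studies. One of the more influential publications on potato cytology has been the review of Swaminathan and Howard (1953), who summarised the findings of previous researchers such as Cadman ([...]), Lamm ([...]) and Bains ([...]) on the mean number of multivalents per cell at diakinesis and first metaphase in tetraploid S. tuberosum, ranging from [...] to [...] (Swaminathan and Howard 1953). This cytological evidence has been used to support the use of a simple pairing model in potato mapping and QTL analysis since then (Hackett et al. [...]; Luo et al. [...]; Hackett et al. [...]; Bradshaw et al. [...]; Hackett et al. [...]). In our study, we have used marker data to estimate the rate of double reduction and from this to extrapolate the likely frequency of multivalents (we only considered quadrivalents). [...] fraction of quadrivalents translates to between [...] and [...] quadrivalents per cell, consistent with the original cytological findings of Lamm performed on the cultivar ‘Deodara’ and the line ‘36/209’ from the cross [...].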

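General polyploid model

Attempts have previously been made to develop a general theory of linkage mapping in tetraploids which simultaneously considers the possibility of preferential pairing and multivalent formation (Wu et al. [...]). According to the authors, if the preferential pairing factor is set to [...] for the case of random pairing, the model implies that the fraction of quadrivalents will equal [...] and that of bivalents [...]. This is consistent with the random-end pairing model, which assumes that pairing initiation occurs at one set of telomeres, with probability of [...] that the pairing at the other telomeres will result in a separation into bivalents (John and Henderson [...]). Our data shows that preferential pairing does not occur in potato, yet we have not found a fraction of quadrivalents as high as [...]. Our findings on multivalent pairing are in line with a previous review of autopolyploid meiosis which found a mean multivalent frequency ([...] bivalents and [...] quadrivalents) over [...] different studies (Ramsey and Schemske 2002). It has also been shown that low numbers of multivalents do not necessarily suggest that preferential pairing behaviour occurs (Sybenga [...]; Sybenga [...]).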


Identification of double reduction

We decided to take a more stringent approach than studies which consider two or even a single locus as sufficient evidence for double reduction (Luo et al. 2006; Hackett et al. 2013). This is likely to have led to an underestimation of DR on our part. However, all quantifications of DR using marker data are likely to underestimate the true rate of double reduction to some extent. For instance, DR segments can be hidden (no segregating allele carried on the segment), or, due to limited numbers of markers, one might only recover part of a double reduction segment. Higher-density linkage maps, where all homologue parts are covered by segregating markers, will lead to more accurate estimates of the rate of double reduction, unless a strong bias exists in how markers are distributed or where DR occurs. In this study, with over 3000 well-distributed simplex x nulliplex markers, we feel we have sufficient marker coverage for a detailed understanding of the double-reduction landscape.

Simplex x nulliplex markers give the most unambiguous information about the presence of double reduction when compared with other marker segregation types. Other marker classes could have been used as well, for example simplex x simplex markers, which are expected to show triplex scores in [...] of the cases of double reduction involving one of the simplex alleles. However, no marker class other than simplex x nulliplex allows DR scores to be distinguished directly as a double-reduction product. Maximum likelihood approaches that estimate the rate of double reduction, such as that described in Luo et al. (2006), may be useful for the identification of double reduction in cases where it is not clear, although we feel that flanking simplex x nulliplex marker information which supports the duplex score should be used, as we have done here.

Double reduction increases towards the telomeres

It has been widely reported that the rate of double reduction is expected to increase towards the telomeres (Mather 1936; Fisher 1947; Butruille and Boiteux 2000; Stift et al. 2008; Nemorin et al. 2012; Zielinski and Scheid 2012), given that the probability of a crossover occurring between the centromere and a locus should increase as that locus is situated further from the centromere. Nevertheless, it has only rarely been experimentally verified. The clearest evidence we found in the literature came from an analysis of tetraploid potato using isozyme markers, although the numbers of markers used were rather few, with fewer than [...] loci considered (Haynes and Douches 1993). In our study we have clearly shown, using high-density marker data of over 3000 markers, that the rate of double reduction steadily increases with distance from the centromere. We have furthermore been able to visualise this phenomenon, which has not previously been reported.


The fact that the frequency of double reduction increases towards the telomeres is perhaps cause for some concern, as this could be considered a systematic source of error in the marker data. Nevertheless, with dense marker data it is now possible to accurately estimate the rate of double reduction in a mapping population. In cases where the rate of double reduction is low and marker number high, it is questionable whether highly complicated models with many parameters to be estimated are actually useful, particularly if they do not distinguish between singleton double reduction scores and genotyping errors. Our simulations have shown that even with full quadrivalent pairing, pairwise estimators for recombination frequency between coupling-phase simplex x nulliplex markers under a bivalent pairing model are close to being exact, and as these are the most informative pairing scenario, they are the most important estimates for linkage map construction. We look forward to comparing our estimates for double reduction in tetraploid potato with other polyploid species and to gaining a deeper understanding of why these rates differ in what are otherwise classified collectively as autopolyploids.

Double reduction in mapping

Some authors claim that double reduction should be included in map estimation and QTL analysis to increase the power and accuracy of the analysis (Li et al. [...]). Our findings show that quadrivalents have little effect on the mapping of simplex markers in the highly informative coupling phase. In potato at least, our data show that the level of quadrivalent formation and preferential pairing is very low and is therefore not likely to be of serious concern for linkage mapping. However, confirmation of this finding for other marker types is still needed.

It is also worth pointing out that quadrivalent formation not only leads to double reduction but can also result in the formation of homologue combinations of more than two parental homologues (Sved 1964), which can result from pairing-partner switches (Jones and Vincent 1994) along the chromosome. The fact that this is already part of the simulation process in PedigreeSim (Voorrips and Maliepaard 2012) increases the accuracy of our approach, not only in terms of modelling double reduction but also in our study of the effect of quadrivalents on map estimation.

Double reduction in breeding

Double reduction has many implications for polyploid breeding. One consequence that has been described is its potential to lead to a higher inbreeding coefficient in dihaploids derived from tetraploid lines (Haynes and Douches 1993). Given the efforts currently underway towards hybrid potato breeding (Lindhout et al. 2011), DR may have unwanted impacts on genetic diversity at the diploid level if future diploid founder material is derived from tetraploid lines. On the other hand, hybrid breeding is dependent on the production of highly homozygous inbred lines. Tetraploid potato breeding might welcome greater levels of homozygosity in a crop that is often complicated by high heterozygosity (Uitdewilligen et al. 2013), as well as the potential purging effect that DR can have by exposing deleterious alleles to selection (Butruille and Boiteux 2000). DR could also speed up the accumulation of rare but favorable alleles through marker-assisted selection. Here we have developed the tools for the identification of DR in a segregating population, which could be applied by breeders in the selection of founder parents for subsequent crossings or for confirmation studies of QTL positions.

Conclusions

In this study we constructed 96 separate homologue linkage maps of tetraploid potato using 1:1 segregating simplex markers. We estimated the approximate rate of double reduction ([...] or more at the distal regions) and predicted by simulation that a fraction of quadrivalents of 20 – 30% is required to account for this level of double reduction. We found no evidence of preferential pairing in our data, consistent with previous reports on the mode of inheritance in potato. Simulation studies using simplex x nulliplex markers revealed that marker phasing and recombination frequency estimation under a simplifying bivalent-pairing model are relatively robust, even when some level of multivalent pairing occurs.

Acknowledgements

The authors would like to acknowledge Peter Vos for assistance with genotype calling, and [...] and Averis for providing potato varieties, as well as all partners involved in the TKI polyploids project “A genetic analysis pipeline for polyploid crops” (project number 260300200), which helped fund this research. The authors would also like to thank Jeffrey Endelman for helpful comments leading to a correction in the original manuscript. The development of the SolSTW array was financially supported by a grant from the Dutch technology foundation STW (project 926).

Supplementary data is available online at: http//geneticsorg/content/20/3/3supplemental


Supplementary Figure 1. Histogram showing distribution of lengths of strings of duplex scores in the dataset.


Chapter 4

Integrating haplotype-specific linkage maps in tetraploid species using SNP markers

Peter M. Bourke1, Roeland E. Voorrips1, Twan Kranenburg1, Johannes Jansen2, Richard G. F. Visser1, Chris Maliepaard1

1 Plant Breeding, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.

2 Biometris, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands.

Published as Bourke, P.M., Voorrips, R.E., Kranenburg, T., Jansen, J., Visser, R.G., and Maliepaard, C. (2016). “Integrating haplotype-specific linkage maps in tetraploid species using SNP markers”, Theoretical and Applied Genetics 129 (11), 2211-2226

77 ______Integrating haplotype-specific maps

Abstract ,

Solanum tuberosum , ,

Key words , , , , ,


Introduction , , , ,

Gossypium hirsutum Triticum durum , , Medicago sativa , , , Solanum tuberosum , , Rosa hybrida , ,

e.g , , , , , , , , , , , , , , , , , , , , , , , e.g. , , , , , ,


Here, ‘phase’ or ‘phasing’ means determining whether linked markers are physically on the same homologous chromosome within a parent. In our approach, we first develop separate maps for every parental homologous chromosome (termed ‘homologue’ here) using all marker segregation types, integrating them afterwards into one chromosomal map for each set of eight homologues. Marker phasing is thus an essential aspect of our approach, which is of importance in the development of marker haplotypes consisting of more than a single marker.

Although methods to include double reduction in a linkage analysis have already been developed (either using two-point estimation (Luo et al., 2006) or multi-point estimation (Leach et al., 2010)), linkage analysis can be considerably simplified in autopolyploid mapping populations if it is assumed that only random bivalent pairing occurs. A review of metaphase I of autopolyploid meiosis found that bivalents accounted for approximately [...] of the pairing structures observed (Ramsey and Schemske, 2002), with quadrivalents accounting for approximately [...] (there were relatively few univalents or trivalents observed; more complex multivalents were not recorded at higher ploidy levels). Comparable rates have also been reported for potato (Swaminathan and Howard, 1953; Bourke et al., 2015). In the computations of this study, we have assumed a complete absence of preferential pairing and multivalent formation. For example, in the case of 1:1 markers, a duplex score in the offspring would effectively be treated as a missing value, as it is not an expected score according to our model. However, we took care to examine what effect both preferential pairing and multivalents might have on the pairwise estimation of recombination frequency, as well as the effects on the accuracy of marker phasing.

Although broadly similar, our mapping approach differs from that of Hackett et al. (2013) in the following respects:

• The initial clustering of all marker segregation types into linkage groups is defined by their linkage to SxN or NxS (1:1 segregating) markers, enabling automatic marker phasing during mapping
• Homologue maps are first created (per parent) and then integrated into a single consensus map per chromosome using linear programming
• We include the results of a comparison study between two different methods for deciding the most likely phase
• Criteria are determined for binning markers together before map ordering
• All marker segregation types are included in the mapping (in particular, we also include [...] and [...] marker types)


In this study, we describe a method to perform linkage mapping in an autotetraploid species under the assumption of random bivalent pairing, and apply it to a genotyped mapping population of tetraploid potato. We explore some of the potential complications involved in polyploid mapping and discuss the implication of these for future mapping efforts.

Materials and methods

Plant material and genotyping
An F1 population of [...] individuals from the cross between two outbred tetraploid cultivars ‘Altus’ (parent 1 or P1) and ‘Columba’ (P2) was genotyped using the SolSTW Infinium array, which assays [...] SNPs (Vos et al.). Markers were assigned dosages using the fitTetra package (Voorrips et al.) as previously described (Bourke et al.). Highly-skewed markers (using a χ² test with p < [...]) as well as markers with more than 10% missing values were removed from the dataset. In a previous study, a pair of duplicate offspring individuals were identified in this population as well as an individual which showed unrealistic numbers of recombinations (Bourke et al.). The suspect individual was removed as well as the duplicate with most missing values, leaving a mapping population size of [...] individuals.

Marker dosage conversion
A small number of markers for which one of the parental dosages was missing were examined and the likely parental dosage imputed using the observed offspring segregation (if possible), after which marker dosages were converted to their most fundamental form. In a tetraploid species genotyped using bi-allelic markers, the possible marker dosage classes are 0 (nulliplex), 1 (simplex), 2 (duplex), 3 (triplex) or 4 (quadruplex) depending on the number of copies of the ‘reference’ allele carried by that individual. Marker conversion simplifies the analysis by reducing the number of marker types that need to be considered. For example, simplex x nulliplex, triplex x quadruplex, triplex x nulliplex and simplex x quadruplex markers all segregate in a 1:1 fashion and all carry a segregating allele inherited from parent 1 (P1). They can therefore be recoded as SxN markers using suitable score conversions in the offspring (Supplementary Table [...]). Ultimately, this results in nine fundamental marker segregation types as previously mentioned. In determining linkage between [...] markers and other markers in [...], the set of [...] markers were recoded by symmetry into [...] to facilitate the calculations. The physical distribution of the segregating markers was visualised in MapChart 2 (Voorrips 2002) using the previously published centromere boundaries (Sharma et al. 2013).
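The recoding of 1:1 markers described above can be sketched as follows. This is only an illustration in Python (the thesis used R and its own conversion table in the Supplementary material); the function name `recode_to_sxn` and the lookup table are hypothetical, and the table is derived here from the expected offspring dosages of each cross rather than copied from the author's conversion rules.

```python
# For each (P1, P2) parental dosage pair that segregates 1:1 from parent 1,
# list the offspring dosage that signals presence of the single segregating
# allele ("carrier") and the dosage that signals its absence ("non-carrier").
# Derived from the expected gametes of each parent; hypothetical helper table.
EXPECTED_SCORES = {
    (1, 0): (1, 0),  # simplex x nulliplex: offspring are 1 or 0
    (3, 4): (3, 4),  # triplex x quadruplex: offspring are 3 or 4
    (3, 0): (1, 2),  # triplex x nulliplex: offspring are 1 or 2
    (1, 4): (3, 2),  # simplex x quadruplex: offspring are 3 or 2
}


def recode_to_sxn(p1, p2, offspring):
    """Recode offspring dosages to simplex x nulliplex form (1 = carrier, 0 = not).

    Scores that are not expected under the model become None (missing).
    """
    carrier, non_carrier = EXPECTED_SCORES[(p1, p2)]
    recoded = []
    for dosage in offspring:
        if dosage == carrier:
            recoded.append(1)
        elif dosage == non_carrier:
            recoded.append(0)
        else:
            recoded.append(None)  # unexpected score, treated as missing
    return recoded
```

For instance, a triplex x nulliplex marker with offspring scores 1, 2, 2, 1 would be recoded to 1, 0, 0, 1, and any other score would be set to missing.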

Linkage analysis
The maximum likelihood framework for determining pairwise estimators for recombination frequency (r) and their significance (LOD scores) under the assumption of random bivalent pairing has already been described in Hackett et al. (2013). We independently derived the likelihood functions for all possible marker pairs and phases (of which we counted [...] possible combinations between the nine fundamental marker types mentioned) using routines written in Mathematica 10 (Wolfram Research Inc. 2014). We describe the procedure through a worked example in Appendix 1 (Supplementary File S1). The maximum likelihood functions (for each of these [...] cases) were coded in R (R Core Team) for use in the linkage analysis.
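As a minimal illustration of a pairwise estimator and its LOD score, the simplest case is two simplex x nulliplex markers in coupling phase, where under random bivalent pairing a gamete is recombinant with probability r. The sketch below is in Python rather than R, the function name `simplex_coupling` is hypothetical, and it covers only this one marker pair and phase, not the full set of likelihood functions derived in the thesis.

```python
import math


def simplex_coupling(n_nonrec, n_rec):
    """Estimate r and LOD for two simplex markers in coupling phase.

    n_nonrec: offspring carrying both or neither of the two alleles
    n_rec:    offspring carrying exactly one of the two alleles
    The estimate is not constrained here, so values above 0.5 are possible.
    """
    n = n_nonrec + n_rec
    r_hat = n_rec / n  # closed-form maximum likelihood estimate

    def log10_likelihood(r):
        # Binomial log-likelihood; terms with a zero count are skipped so
        # that log10(0) is never evaluated at r = 0 or r = 1.
        value = 0.0
        if n_nonrec:
            value += n_nonrec * math.log10(1.0 - r)
        if n_rec:
            value += n_rec * math.log10(r)
        return value

    # LOD compares the likelihood at r_hat with that at r = 0.5 (no linkage).
    lod = log10_likelihood(r_hat) - n * math.log10(0.5)
    return r_hat, lod
```

With 90 non-recombinant and 10 recombinant offspring this gives r = 0.1; with equal numbers of both classes the LOD is zero, as expected for unlinked markers.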

Marker phasing
To explain the concept of phasing by way of example, there are three possible phases between a DxN and a DxS marker: ‘coupling’, ‘mixed’ and ‘repulsion’, with ‘mixed’ implying only one pair of duplex alleles is in coupling phase in P1 (there is no phase consideration in P2 since one of the markers is nulliplex in that parent). For pairs of markers with segregating alleles from both parents, such phases are combined (e.g. ‘coupling mixed’ refers to coupling phase in P1 and mixed phase in P2). One criterion for selecting the correct phase (and hence the correct estimator for r) is to use the maximum of the log-likelihood function between phases for which r ≤ 0.5 (Hackett et al. 2013), which we refer to as MLL. Another possibility is to choose the minimum estimate of r over all phases (which we term [...]). We performed a simulation study to determine which of these criteria was optimal across all possible marker pair combinations using the simulation software PedigreeSim (Voorrips and Maliepaard 2012). One hundred separate populations were generated for each of three population sizes (F1 = 100, 200 and 400). Each simulated individual carried a single chromosome with [...] marker positions spaced [...] cM apart. All possible marker types were assigned to each of these loci, with a random assignment of the segregating alleles across homologues. For each simulated population, phasing accuracy was determined by recording the proportions of correctly-phased pairs using both the MLL and the minimum-r phasing strategies. In a few cases it was not possible to distinguish between phases (e.g. SxS with DxD ‘coupling-repulsion’ and ‘repulsion-coupling’ phases produce precisely the same r estimates and LOD scores). In diploid species an analogous situation can arise in cross-pollinated species where certain marker type combinations cannot be phased (e.g. [...] x [...] with [...] x [...]), in which case other marker segregation types are needed to complete the phasing (Maliepaard et al. 1997). We dealt with such instances by considering both phases to be equally correct (since we do not use these particular phase assignments themselves, only their r and LOD values).

Linkage group identification and marker clustering
Preliminary identification of linkage groups was performed by clustering the SxN (and NxS) markers, based on the LOD of the recombination frequency estimate between marker pairs. A routine for marker clustering was written in R using a grouping algorithm analogous to that employed by the JoinMap software (Stam 1993; Van Ooijen 2006; Van Ooijen and Jansen 2013). Of the [...] NxS markers (Table 1), three did not have any strong linkage to other NxS markers and were therefore removed at this stage (we were later able to ‘rescue’ one of them when more markers were assigned to chromosomes).

Table 1. Tetraploid marker segregation types after filtering (adapted from Chapter 3)

Parental dosage        Segregation    Amount a
Simplex x nulliplex    1:1            [...]
Nulliplex x simplex    1:1            [...]
Duplex x nulliplex     1:4:1          [...]
Nulliplex x duplex     1:4:1          [...]
Simplex x simplex      1:2:1          [...]
Simplex x triplex      1:2:1          [...]
Duplex x simplex       1:5:5:1        [...]
Simplex x duplex       1:5:5:1        [...]
Duplex x duplex        1:8:18:8:1     [...]
Total                                 6912

a Number of SNP markers after marker conversions have been made
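The segregation ratios in Table 1 follow from tetrasomic inheritance without double reduction, where each parent transmits two of its four homologues with every pair equally likely. A short Python sketch that enumerates the gametes (the function names `gamete_dosages` and `segregation_ratio` are illustrative, not part of the thesis software) is:

```python
from collections import Counter
from functools import reduce
from itertools import combinations
from math import gcd


def gamete_dosages(parent_dosage, ploidy=4):
    """Count gametes by allele dosage when any two homologues may be transmitted."""
    homologues = [1] * parent_dosage + [0] * (ploidy - parent_dosage)
    return Counter(sum(pair) for pair in combinations(homologues, ploidy // 2))


def segregation_ratio(p1_dosage, p2_dosage):
    """Expected offspring dosage ratio, listed from lowest to highest dosage."""
    g1 = gamete_dosages(p1_dosage)
    g2 = gamete_dosages(p2_dosage)
    counts = Counter()
    for d1, n1 in g1.items():
        for d2, n2 in g2.items():
            counts[d1 + d2] += n1 * n2
    divisor = reduce(gcd, counts.values())  # reduce to the simplest ratio
    return [counts[d] // divisor for d in sorted(counts)]
```

Calling `segregation_ratio(2, 0)` gives the 1:4:1 ratio of a duplex x nulliplex marker, and `segregation_ratio(2, 2)` the 1:8:18:8:1 ratio of a duplex x duplex marker.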

At a clustering threshold of LOD [...], the NxS marker data divided into 12 clusters. We visualised how clusters split across different LOD values for these 12 putative chromosomes in R, allowing the identification of tightly-linked sub-clusters (putative homologues or fragments thereof). In the majority of cases, these large clusters split into four sub-clusters at higher LOD values (as expected for a tetrasomic species). However, one cluster (of [...] markers) could not be further subdivided at higher LOD values. We therefore assumed that this cluster represented (part of) one homologue and used the repulsion linkage information to assign it to one of the other clusters. Another cluster broke down into two clear sub-clusters at LOD [...] (i.e. it contained two chromosomal groups), which further subdivided into [...] sub-clusters at LOD [...]. Clustering in P1 was more straightforward, with 12 clear chromosomal clusters emerging at LOD [...].


Cluster numbers were replaced with chromosome numbering for consistency with the reference physical map using marker positions given by Vos et al. In [...], chromosomes [...] and [...] contained five sub-clusters with the rest having four. In [...], chromosomes [...] and [...] were found to contain five sub-clusters at this stage, the rest having four. In cases where more than four sub-clusters are identified, a visualisation of cross-cluster phase assignments allowed us to quickly identify which sub-clusters were (albeit distantly) linked in coupling phase, resolving the SxN and NxS marker data into linkage groups with the expected number of homologues. Following this, the vast majority of markers within the complete dataset were unambiguously assigned to homologue clusters using coupling-phase linkage with SxN (or NxS) markers (a single linkage above a threshold of LOD [...] was used as evidence of linkage, although in practice there were often hundreds of such linkages identified). SxN markers are extremely useful for this step as they unambiguously tag a single homologue. Where multiple assignments were possible, assignments with the greatest number of significant coupling linkages ([...]) were chosen as the most likely linkage groups. For those markers which could not be completely assigned to the expected number of homologues in both parents due to poor linkage with SxN markers, linkage analysis was performed between these markers and all other marker segregation types to identify their most likely chromosome and homologue assignment. Finally, the marker data was split into twelve subsets (one for each chromosome) and a complete pairwise linkage analysis was run between all markers within each chromosome.

Map construction

Homologue mapping
Per chromosome there are eight homologues that can be mapped separately in an autotetraploid. Markers which appear on these homologues are already identified through their linkage with SxN (or NxS) markers. In addition to the coupling-phase linkages considered, we also included some repulsion-phase linkages in our homologue maps. Markers with at least one duplex allele are completely symmetrical between the ‘reference’ and ‘alternative’ alleles in the duplex parent. For example, DxN markers are initially assigned to two homologues, these being the homologues on which the reference allele can be found. However, it is equally informative to consider the pair of ‘alternative’ alleles from the same parent, as these carry the same linkage information as the reference alleles. We therefore used DxN markers in the mapping of four homologues, DxS and SxD markers for the mapping of five homologues and DxD markers for all eight. All of these marker types as well as SxS and SxT are extremely useful as “bridging” markers for the integration of homologue maps.

There remained some linkages that we did not exploit, e.g. [...]xN and [...]xN in repulsion. The variance of these repulsion-phase estimates is high and hence LOD values are low; the added computation time from including these estimates is therefore not worth the marginal increase in linkage information that they yield (a similar conclusion was arrived at in previous studies, e.g. Ripol et al.).

Marker binning
Linkage information per homologue was first assembled into two pairwise matrices, one for r estimates and one for LOD scores, after which the strength of linkage was tested to determine whether marker binning was possible – i.e. markers with a small recombination frequency estimate r of low variance (thus high LOD) were binned together. The minimum non-zero number of recombinations that can be observed in a mapping population of size N (and hence 2N gametes) is one, in which case the smallest non-zero r estimate should be 1/(2N), ignoring the influence of errors. Given an average missing value rate of µ, a population size adjusted for missing values is approximately $N_a = (1 - \mu)N$ and therefore $r_{min} \approx 1/(2(1 - \mu)N)$. Estimates of r that were smaller than this value were taken as being below the threshold of minimum resolution $r_{min}$.

Not all estimates of the recombination frequency are equally accurate. We therefore determined criteria for binning markers together with a high degree of confidence. To achieve this, we ran simulations using PedigreeSim (Voorrips and Maliepaard) and recorded the LOD scores as well as the range of true recombination frequencies for r estimates below the threshold of minimum resolution over a wide range of population sizes (F1 = [...] to [...] in steps of [...]) and rates of missing values ([...] to [...] in steps of [...]). We chose a maximum allowable deviation between the true and estimated r as [...] (approximately [...] cM) and determined the corresponding LOD score to ensure this over all possible marker pair combinations (hence we took the most stringent LOD threshold to cover all cases). For each population size and rate of missing values, we examined the distribution of LOD scores for those recombination frequency estimates for which the deviation was less than [...] (Supplementary Figure [...]). From this we could determine a suitable LOD threshold for marker binning as a function of mapping population size and rate of missing data.
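The threshold of minimum resolution defined above is simple to compute. The following Python sketch uses a hypothetical function name, `min_resolution`, and the population size and missing rate in the usage note are illustrative values only, not those of the potato dataset.

```python
def min_resolution(n_offspring, missing_rate):
    """Smallest non-zero recombination frequency that can be observed.

    n_offspring:  mapping population size N
    missing_rate: average proportion of missing values (mu), between 0 and 1
    """
    n_adjusted = (1.0 - missing_rate) * n_offspring  # N_a = (1 - mu) N
    return 1.0 / (2.0 * n_adjusted)                  # r_min = 1 / (2 N_a)
```

For an illustrative population of 200 offspring without missing values, `min_resolution(200, 0.0)` returns 0.0025; with 10% missing values the threshold rises to roughly 0.0028.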

Given a set of markers that have been binned together, we chose the SxN marker with the fewest missing values for mapping (SxN recombination frequency estimates with other marker types are exact, as opposed to being numerically approximated). In bins with no SxN markers, the marker with the fewest missing values was selected.

Marker ordering
The remaining marker data after binning were converted into pairwise-data file format (.pwd) for each homologue and imported into JoinMap (Van Ooijen) for ordering. Three rounds of mapping using the weighted least squares algorithm were used (using the default settings with a “jump threshold” of 5), and with Haldane’s mapping function used for distance conversion. Map files were subsequently exported and all binned markers were re-added to the maps.
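For reference, Haldane's mapping function, which converts a recombination frequency into a map distance under the assumption of no crossover interference, can be written as a short Python sketch. The function names are illustrative; JoinMap performs this conversion internally.

```python
import math


def haldane_cm(r):
    """Map distance in cM for a recombination frequency r (0 <= r < 0.5)."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(distance_cm):
    """Inverse: recombination frequency for a map distance in cM."""
    return 0.5 * (1.0 - math.exp(-distance_cm / 50.0))
```

A recombination frequency of 0.1, for example, corresponds to about 11.2 cM, slightly more than the 10 cM that a naive one-to-one conversion would give.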

Map integration
The homologue maps were first re-orientated if necessary before integrating. Map re-orientation was achieved by locating bridging markers between the maps and determining the correlation between the positions of these markers. A negative correlation suggests that maps are orientated in reverse order relative to one another. Since not all homologues necessarily share bridging markers, the R package igraph (Csardi and Nepusz) was used to find an order of comparison through the eight homologue maps per chromosome to allow stepwise correlations to be calculated (for example [...] might be one such order). An example of this is shown in Figure 1 for chromosome 7. In this example, three separate re-orientations were required to ensure consistency in orientation. When all eight maps were similarly orientated, the R package LPmerge (Endelman and Plomion) was used to integrate them. LPmerge eliminates the minimum number of constraints in the marker order of the underlying maps (conflicts in order between maps) in order to generate what is called a “feasible system”, and then uses linear programming to find the solution with the minimum error between the underlying maps and the integrated map (Endelman and Plomion).
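The re-orientation step can be sketched in a few lines of Python. Note the substitutions: a plain breadth-first traversal stands in for the igraph-based search used in the thesis, and the function names, the `min_shared` cut-off and the data layout (one dictionary of marker positions per homologue) are all assumptions made for illustration.

```python
from collections import deque


def pearson(xs, ys):
    """Pearson correlation of two equally long lists (assumes non-constant input)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def find_flips(maps, start, min_shared=3):
    """Decide which homologue maps must be reversed to match `start`.

    maps: dict of homologue name -> dict of marker name -> position (cM).
    Maps are visited breadth-first through shared (bridging) markers; a
    negative correlation with an already-visited map means the two maps
    run in opposite directions. Unreachable maps are left out of the result.
    """
    flipped = {start: False}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in maps:
            if other in flipped:
                continue
            shared = sorted(set(maps[current]) & set(maps[other]))
            if len(shared) < min_shared:
                continue  # not enough bridging markers for a comparison
            r = pearson([maps[current][m] for m in shared],
                        [maps[other][m] for m in shared])
            # Flip `other` if it is reversed relative to `current`,
            # taking into account whether `current` itself is flipped.
            flipped[other] = flipped[current] != (r < 0)
            queue.append(other)
    return flipped
```

Homologues that share too few bridging markers with the starting map are reached indirectly through intermediate maps, which mirrors the stepwise comparison described in the text.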

Map quality checks
We checked the quality of our linkage maps using three different approaches:

• Consistency between the integrated maps and the underlying homologue maps
• Comparison with the reported physical position of the mapped markers
• Comparison to other published tetraploid potato maps


Figure 1. Visualisation of map connections on chromosome 7. In this example, there were sufficient bridging markers to provide connections between all homologues. Three reversals were needed in order to ensure all homologues were consistently orientated before map integration.

Consistency between the integrated maps and the underlying haplotype-specific homologue maps
We compared the marker positions on the underlying homologue maps and the integrated maps in order to identify possible map distortion. Map distortion is partly reflected in the absolute error (δ) between the underlying maps and the integrated maps. However, in cases where the telomeres of all homologues are not equally covered by markers, shifting the position may occur to align the maps, contributing to an apparent increase in δ without implying any map distortion. We ran a simple linear regression between the integrated map positions and the underlying homologue positions and recorded the slope and adjusted R² values of the fit, as well as visually inspecting each chromosome to identify potential distortion.
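The regression check described above needs only the slope and the adjusted R² of a simple linear fit. A self-contained Python sketch follows; the function name `regress` is hypothetical, and the choice of which map serves as predictor is an assumption here, since it is not stated in the text.

```python
def regress(x, y):
    """Simple linear regression of y on x.

    Returns (slope, intercept, adjusted R squared). Requires at least three
    points and non-constant x and y.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    # Adjustment for one predictor: penalise by (n - 1) / (n - 2).
    r_squared_adj = 1.0 - (1.0 - r_squared) * (n - 1) / (n - 2)
    return slope, intercept, r_squared_adj
```

A slope close to 1 with an adjusted R² close to 1 indicates that the integrated map preserves both the order and the spacing of the homologue map; a constant offset between the maps only affects the intercept.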

Comparison with the physical position of the mapped markers
The physical positions of the markers were taken as described in Vos et al. (2015). Plotting the genetic positions against the physical positions allowed us to identify whether our maps were correctly orientated (i.e. [...] corresponding to the lowest bp value) and we re-orientated our maps if necessary. In cases where a discrepancy was found between the reported chromosome and that found through our linkage analysis, we BLASTed the marker sequences (provided in the Supplementary Material of Vos et al. (2015) and reproduced here) to the potato pseudomolecules reference genome version [...] (Hirsch et al.) to check the marker positions (website http://solanaceae.plantbiology.msu.edu/blast.shtml, accessed [...]).

Comparison to other published tetraploid potato maps
The SolCAP Infinium array (Felcher et al.) has been used to genotype at least two other published tetraploid potato mapping populations (Hackett et al. 2013; Massa et al. 2015). We compared the genetic map positions of the common markers as a further check on the validity of our maps.

Simulation study to check mapping assumptions
Two crucial assumptions were made prior to mapping: that there is no preferential pairing behaviour between any homologues, and that all pairing is between bivalents (as opposed to trivalents or quadrivalents), i.e. there is no double reduction. It had previously been established that the rate of quadrivalent pairing in this population was between [...] (Bourke et al. 2015). In contrast to previous studies which also rely on these assumptions, we wanted to test what effect deviations from these assumptions might have on our ability to produce unbiased and accurate estimates for r between marker pairs, as well as the effects on phasing accuracy.

We simulated mapping populations using different degrees of quadrivalents and preferential pairing in PedigreeSim (Voorrips and Maliepaard). The simulation parameters we chose were population sizes of [...], [...] and [...] offspring, levels of preferential pairing from [...] (fully random pairing or tetrasomic behaviour) to [...] (fully preferential pairing or disomic behaviour, associated with allopolyploidy) in steps of [...], or fraction quadrivalents from [...] to [...] in steps of [...]. For each population size we generated [...] separate populations. Each simulated individual carried a single chromosome with [...] marker positions spaced [...] apart. All possible marker types were assigned to each of these loci, with a random assignment of the segregating alleles to homologues at all marker positions. In total, we simulated [...] populations to cover our chosen range of quadrivalents and preferential pairing for these population sizes and number of repetitions.

After the populations were generated (i.e. their dosage genotypes known), we ran our linkage analysis functions across all populations. Given that the datasets were simulated, the true recombination frequency and correct phasing between all marker positions was known, allowing us to test whether our estimation of recombination frequency and marker phasing was robust against these deviations from the random bivalent model. To generate average results from the repeat populations per setting, we ran a simple linear regression on the estimated r versus true r values, recording the slope, intercept, $R^2_{adj}$ and residual standard deviation of the regression. We also recorded the proportion of situations not estimated (e.g. due to undefined numbers in the likelihood equation) and the proportion of pairing situations that were correctly phased.

Results

Genotypes
The number of markers that were available for mapping after marker filtering and quality checks was 6912, as outlined in Table 2. The breakdown of marker segregation types after marker conversions were performed is provided in Table 1.

Figure 2. Distribution of 6836 of the 6912 segregating markers used in this study for which a physical assignment was available. Distances shown in bp. Shaded regions indicate centromeres, as previously defined (Sharma et al.).

Approximately [...] of the markers segregate in a 1:1 fashion and these were mapped in a previous study (Bourke et al. 2015). The physical distribution of all marker types for which physical positions were available is given in Figure 2, highlighting the difference in marker distribution between telo- and centromeric regions.

Table 2. Breakdown of SNP marker numbers after quality filtering (adapted from Chapter 3)

Steps in SNP filtering                  Amount    %
SolSTW Infinium array total SNPs        [...]     [...]
Dosages assigned by fitTetra a          [...]     [...]
F1 pattern acceptable b                 [...]     [...]
Monomorphic                             [...]     [...]
Polymorphic                             [...]     [...]
Polymorphic and ≤ 10% NA values         6912      [...]

a Markers not scored were monomorphic or not clearly resolved. Markers with a single missing parental dosage score which was imputed have been included.
b Criteria for lack of fit: presence of null alleles, invalid scores, highly-skewed segregation (P < [...]).


Linkage analysis and marker clustering

Maximum likelihood pairwise estimates for r
We counted [...] separate marker type and phase combinations for which we derived maximum likelihood functions, although this may contain scenarios that can be counted together (previously, [...] situations have been reported (Hackett et al.)). For clarity we provide a table of all the possible marker type and phase combinations we considered (Supplementary Table [...]). In many cases, the maximum likelihood equation cannot be solved analytically, in which case we used Brent’s algorithm (Brent) to numerically estimate the recombination frequency with the highest likelihood, constrained to the interval [0, 0.5]. It is also possible that a negative estimate for r could be the most likely (this can occur in low-information situations involving repulsion phases). We examined the true values of r underlying such cases using simulated data and found that a wide range of true r values were possible. Whenever r < 0 was found, we artificially set r = 0.499, LOD = 0 and phase “unknown”, thereby excluding these estimates from the map ordering step.
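To make the constrained estimation and the handling of negative estimates concrete, consider two simplex alleles on different homologues of the same parent (repulsion phase). Under random bivalent pairing, a gamete carries both or neither allele with probability (1 + r)/3, so the unconstrained estimate 3a/n − 1 becomes negative when fewer than a third of the offspring fall in that class. The Python sketch below rests on several assumptions: the likelihood is derived here for this single case rather than taken from the thesis appendix, a golden-section search replaces Brent's algorithm, and the function names are hypothetical.

```python
import math


def loglik_repulsion(r, n_same, n_single):
    """log10-likelihood for two simplex alleles in repulsion in one parent.

    n_same:   offspring carrying both alleles or neither allele
    n_single: offspring carrying exactly one of the two alleles
    """
    p_same = (1.0 + r) / 3.0
    return n_same * math.log10(p_same) + n_single * math.log10(1.0 - p_same)


def estimate_repulsion(n_same, n_single, tol=1e-8):
    """Return (r, LOD, phase) with r constrained to the interval [0, 0.5]."""
    n = n_same + n_single
    if 3.0 * n_same / n - 1.0 < 0.0:
        # The unconstrained maximum is negative: follow the rule in the text
        # and flag the estimate so that it carries no weight in map ordering.
        return 0.499, 0.0, "unknown"
    lower, upper = 0.0, 0.5
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while upper - lower > tol:  # golden-section search for the maximum
        x1 = upper - ratio * (upper - lower)
        x2 = lower + ratio * (upper - lower)
        if loglik_repulsion(x1, n_same, n_single) < loglik_repulsion(x2, n_same, n_single):
            lower = x1
        else:
            upper = x2
    r_hat = (lower + upper) / 2.0
    lod = loglik_repulsion(r_hat, n_same, n_single) - loglik_repulsion(0.5, n_same, n_single)
    return r_hat, lod, "repulsion"
```

With 40 of 100 offspring in the both-or-neither class the search converges on r = 0.2, matching the closed form; with 30 of 100 the estimate would be negative and is replaced by the placeholder values. The small LOD in the first case also illustrates why repulsion-phase estimates carry little information.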

Optimal phasing strategy
Our simulations revealed the optimum phasing strategy to use for different marker combinations (Supplementary Figure [...]). The maximum log likelihood strategy (MLL) as proposed by Hackett et al. (2013) proved in general to be a very good method of selecting the correct phase (and hence the correct estimate for r). In only one situation did we find the minimum-r strategy to outperform MLL, namely with [...] markers. The improvement was, however, marginal: at a population size of [...] individuals, for example, the minimum-r strategy gave [...] accuracy versus [...] accuracy using MLL. Whenever phase was incorrectly assigned, we found that the LOD score was also low (and the recombination frequency estimates tended to be high) – in other words, incorrect phasing only occurred in poorly informative situations which would have little or no impact on the subsequent ordering step (especially since our mapping strategy favours coupling-phase estimates, which tend to be more informative than those of repulsion phase). In all situations involving at least one [...] marker there was no difference between the two methods. For the case of SxS paired with SxT markers, the accuracy of the minimum-r strategy appeared at first glance to be higher. However, this particular combination of markers contains an essentially unestimable phase with extremely high variance (repulsion-coupling phase, see also Results section “Simplex x Triplex markers”). Removing this phase from the accuracy calculation, MLL was found to perform significantly better among the phases that actually matter in this combination. Finally, phasing accuracy was found to slightly increase as a function of population size (with [...] accuracy on average for a mapping population of size [...], compared with [...] accuracy for a population of [...], and [...] for a population of [...]). A breakdown of the phasing accuracy rates is provided in Supplementary File [...].

Map construction and integration

Simplex x Triplex markers
SxT markers have previously been reported as problematic (where they are termed SS markers (Hackett et al. 2013)). When we examined the issue, we found that SxT in combination with SxS produce highly variable estimates for r in repulsion-coupling phase, where the estimates for r are essentially random. They are the only marker combination (and phase) that exhibit such behaviour. We therefore artificially set LOD = 0 in this phase (it would be small but non-zero otherwise), which automatically excludes these estimates from exerting any influence on map ordering.

Marker binning
Once all linkages between markers had been estimated, we were in a position to identify co-segregating markers. The r and LOD estimates give a convenient measure of linkage which can be applied across all marker segregation types in a binning procedure. An example of the relationship between r and LOD for pairs of DxD markers is visualised in Figure 3. As higher LOD values correspond to a lower standard error in r, we wanted to define thresholds for r and LOD which would identify markers which co-segregate with a high degree of confidence.

Figure 3. Estimated recombination frequencies (r) versus associated LOD for duplex x duplex marker pairs on potato chromosome 1. Only [...] of the 9 possible phases were identified for marker pairs on this chromosome.

We previously introduced the concept of the threshold of minimum resolution for recombination frequency, $r_{min}$. Given our mapping population size and missing error rate in the filtered dataset, we estimated $r_{min}$ to be approximately [...], the smallest non-zero recombination frequency we should be able to observe. From our simulation of different population sizes and rates of missing data, we observed a clearly linear relationship between the mapping population size and the LOD threshold needed to ensure a margin of error of less than [...] in the estimation of r (Supplementary Figure [...]). By performing a linear regression between the LOD thresholds and the adjusted mapping population sizes ($N_a$), we were able to derive an empirical relationship between a binning LOD threshold and the adjusted population size which ensures this margin of error in r is not exceeded:

$LOD \approx 23.43 + 0.1158 N_a$

Given our dataset, we binned markers together if we found that the pairwise r estimate was less than [...] and the LOD for that estimate exceeded [...]. In total, [...] markers were used in the map ordering step across 96 separate homologue maps (note that some markers were present multiple times because they are present on multiple homologues), after which [...] binned markers were re-assigned a position to give a total of [...] map positions (Supplementary Table [...]). As binning was performed using a nearest-neighbour clustering, there is a danger that binned markers might have a non-negligible distance between them. When we examined this, we found that the maximum recombination frequency estimate between binned markers was [...], or approximately 3.3 cM using Haldane’s mapping function (Supplementary File [...]). However, the mean inter-marker distance within bins was only [...] cM, and almost [...] of binned markers were less than [...] cM from each other. In other words, our binning strategy rarely appears to have falsely binned markers together. These map positions represented 6910 unique marker loci, i.e. only two of the 6912 markers available for mapping were not mapped. We suspect that the two unmapped markers which showed no linkage may have harboured abnormally high numbers of errors, although we cannot verify this. A full list of all marker positions per chromosome is provided in Supplementary File [...].
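The binning rule combines the threshold of minimum resolution with the empirical LOD threshold. A minimal Python sketch, using the regression coefficients quoted in the text and hypothetical function names, could look like this; the population size in the usage note is illustrative and not the size of the potato population.

```python
def binning_lod_threshold(n_adjusted):
    """Empirical LOD threshold for binning: LOD = 23.43 + 0.1158 * N_a."""
    return 23.43 + 0.1158 * n_adjusted


def can_bin(r, lod, n_offspring, missing_rate):
    """True if a marker pair satisfies both binning criteria.

    The pair must have an r estimate below the threshold of minimum
    resolution and a LOD score above the empirical threshold.
    """
    n_adjusted = (1.0 - missing_rate) * n_offspring
    r_min = 1.0 / (2.0 * n_adjusted)
    return r < r_min and lod > binning_lod_threshold(n_adjusted)
```

For an illustrative population of 200 offspring with no missing data, the LOD threshold is about 46.6, so a pair with r = 0 and LOD = 60 would be binned, whereas the same pair with LOD = 40 would not.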

Map integration
LPmerge (Endelman and Plomion 2014) was run for all linkage groups to determine the integrated maps with lowest absolute error δ between the homologue maps and the integrated map per linkage group (referred to as RMSE by the authors). Although the LPmerge algorithm is deterministic (J. Endelman, personal communication), we found the resulting maps differed when the input maps were flipped. This suggests the current version of LPmerge should be run twice to identify the best integrated map. The maximum interval size (see Endelman and Plomion (2014) for a description) was set at default ([...]) and the map with the minimum error was saved for the final selection of the “globally” optimal integrated maps. LPmerge currently reports the error associated with each possible solution but does not save this information automatically. We therefore altered part of the source code to create an output file of the errors and map lengths per maximum interval, allowing us to identify the best results. The altered source code of the LPmerge function is provided in Supplementary File [...].

Map quality

Consistency between the integrated maps and the underlying haplotype-specific homologue maps
We visualised the relationship between the homologue maps and the integrated map for each chromosome (Figure 4). Apart from a small number of ‘kinks’, there was a very high level of linearity observed between the component maps and the integrated maps as well as an acceptable correspondence in map lengths (Supplementary Table [...]), demonstrating that the integration step did not create undue distortion. This linearity was also reflected in the high $R^2_{adj}$ values associated with the regression analysis – with a minimum of [...] and a mean of [...] (i.e. essentially collinear). The slopes and adjusted correlation coefficients of the different maps are provided in Supplementary Table 4.

Comparison with the physical position of the mapped markers
One of the advantages of developing mapping theory and software using data from potato is the availability of physical maps which provide a reference marker position (Potato Genome Sequencing Consortium; Felcher et al.; Vos et al.). Re-orienting the integrated genetic maps if necessary, we found the expected profiles for all chromosomes (Figure [...]) and could also clearly identify markers for which the chromosome assignment on the physical map appears to be incorrect. The physical location of other markers was previously unknown (recorded as [...] bp on chromosome [...]), for which we can now provide an approximate physical position based on these plots (Supplementary File 4). These plots also provided information on the location of the pericentromeric regions. We found differences between our identified pericentromeric boundaries and those previously reported for potato chromosomes [...] (Sharma et al.). In all these cases we noted that the published regions were too large, i.e. some stretches of the pericentromeric regions of these chromosomes show little or no suppression of recombination. One issue that did arise on chromosome [...] was the mapping of what we suspect is a centromeric marker to a non-centromeric position (Figure [...]). All markers binned with this marker were therefore also at a non-centromeric position. When we re-ran the mapping without binning in this homologue, we saw essentially the same result – i.e. marker binning was not to blame. The integration of information from telocentric chromosomes may result in such minor ordering errors, given that multi-point estimates (from which the map is ultimately derived) are one-sided at the telomeres.

Comparison to other published tetraploid potato maps
A subset of the SolSTW SNPs came directly from the SolCAP SNP array, with [...] of these having an acceptable F1 pattern after fitting, of which [...] segregated. We mapped [...] of these SolCAP markers, allowing a direct comparison of our genetic maps with previously published tetraploid maps which use these markers (Hackett et al.; Massa et al.). Only a single marker from this set was found to have been assigned to a different linkage group between the three studies (solcap_snp_c[...]), which was mapped on chromosome [...] by Hackett et al. and which both we and Massa et al. mapped on chromosome 4. We double-checked the physical position by BLASTing the marker sequence against the potato genome (Hirsch et al. 2014) and found it produced a single hit on chromosome 4.

A comparison of the maps is shown in Supplementary Figure [...]. In general, our map positions correspond well with those of previous studies, apart from chromosome [...], where we found that a group of markers most likely to be centromeric (based on a comparison with the physical map) were mapped at [...] cM by Hackett et al. (2013) even though the pericentromeric region of chromosome [...] is positioned at approximately [...] cM in their map. There also appear to be several cases where markers were binned together by Hackett et al. (2013) which we assigned to different genetic positions (Supplementary Figure [...], chromosomes [...], 4, and [...]). However, the binning strategy employed by Hackett et al. (2013) differs considerably from our approach, in that markers are not automatically binned before map ordering, but only end up in a “bin” if they fail to map after two rounds of JoinMap’s weighted regression ordering algorithm. Despite these discrepancies, the mapping performed in previous studies is broadly consistent with our results.

94 ______Chapter 4

Figure 4. Comparison between marker positions on underlying homologue maps and integrated map positions for potato chromosomes 1-12. Different colours denote different homologues. δ denotes the absolute error between the eight homologue maps and the integrated map, as calculated by LPmerge.

Effect of quadrivalents and preferential pairing on mapping

Quadrivalents
One notable effect of quadrivalent pairing in meiosis is the phenomenon known as double reduction, where two copies of the same parental homologue segment are transmitted to an offspring. It has previously been shown that quadrivalents have a relatively minor impact on recombination frequency estimates of pairs of … markers (Bourke et al.). Here we extend the analysis to all possible marker segregation types of a tetraploid cross. In general, we can confirm our previous finding that quadrivalents have a minor impact on r estimates for most marker pairs and phases, but lead to an underestimation of r when the proportion of quadrivalents approaches one (e.g. … and … markers in coupling-repulsion phase, Supplementary Figure 4). However, as far as we know, no observations of such high proportions of quadrivalents have yet been reported in an autopolyploid species (Swaminathan and Howard; Ramsey and Schemske; Bourke et al.). In Supplementary File …a, we provide full details of the results of this study.

Preferential pairing
Preferential pairing constitutes a much greater deviation from the assumption of random bivalent pairing, and this was reflected in the results of the simulation study. We again saw a downward bias in r with greater levels of disomic behaviour, which was accompanied by a drop in the correlation between the true and estimated values. A higher population size can help to mitigate these effects, but when the rate of preferential pairing p exceeds … this makes little difference. Correct phase estimation was surprisingly robust against preferential pairing, although in a fully disomic situation (p = …) it was not possible to estimate r in certain cases. Specifically, the combination between a duplex and simplex allele in either parent, when both alleles are present on the same bivalent, leads to, on average, … inestimable values. Nevertheless, recombination frequency estimates showed high levels of stability and robustness, even when significant levels of preferential pairing occur. For identifying linkage groups, preferential pairing has almost no impact on the accuracy of coupling linkage estimates with … markers, which we use for marker clustering, with the possible exception of … markers. An example of the results for … and … markers in coupling-coupling phase is given in Supplementary Figure 5. In Supplementary File …b, full details of the results of this study can be found.

Figure 5. Comparison between physical and integrated genetic maps for 6872 mapped markers on potato chromosomes 1-12. Different colours denote different marker segregation types. Centromeres as defined in Sharma et al. are shown with dashed lines. … markers for which the chromosome assignment differed were removed before plotting. Outlying markers positioned at … Mb had no physical position, for which we suggest approximate positions based on these plots (c.f. Supplementary File 4).

Discussion

Homologue mapping

The necessity of simplex x nulliplex markers

Marker binning

Figure 6. Distribution of binned markers on eight homologue maps of potato chromosome 7.


Map integration

Application to other tetraploid species various reports of “segmental allopolyploidy” e.g.

Conclusions

of map computation as well as providing long-range haplotype information, with marker phase being automatically assigned prior to mapping without the need for manual intervention. The time-limiting step remains marker ordering, but we have found that our binning approach offers a substantial speed-up in computational time without adversely affecting map quality.

Acknowledgements
The authors would like to acknowledge Peter Vos and Herman van … for sharing their expertise, Paul Arens for critically reading the manuscript and Johan van Ooijen (Kyazma) for helpful suggestions. The authors also wish to acknowledge the editor and two anonymous reviewers whose comments helped improve the manuscript. Funding for this research was provided through the TKI polyploids project “A genetic analysis pipeline for polyploid crops”, project number BO

Supplementary data is available online at: ttpslinspringeromartilesupplementaryaterial

Supplementary Figure 1. a. Plot of true recombination frequencies versus their LOD scores when the estimated r value fell below the threshold of minimum resolution (r_min). In this example, the simulated population size was … and there were … missing values in the data (thus the adjusted population size was 595). The red line shows the “acceptable” deviation of … adopted in this study. b. Distribution of LOD values for recombination frequency estimates that fall below the threshold of minimum resolution (r_min) and have a deviation of less than … from the true value. The approximate LOD threshold for this population size is shown by the arrow. c. Relationship between LOD threshold and adjusted population size to guarantee a maximum deviation of … between an estimate of zero recombination frequency and its true value, for all possible tetraploid marker pair combinations, derived using simulated data. The fitted line was used to determine an appropriate binning threshold given the population size and average rate of missing data used in this study.

Supplementary Figure 2. Heatplots showing results of the simulation to determine the overall phasing accuracy in all pairs of marker types using maximum log likelihood (…) or minimum r (…) for various mapping population sizes.

Supplementary Figure 3. Comparison between previously published tetraploid potato linkage maps (Hackett et al.; Massa et al.) and the integrated map described here.

Supplementary Figure 4. Visualisation of the effect of quadrivalents on pairwise r estimates for … and … markers in coupling-repulsion phase. q denotes rate of quadrivalent pairing.

Supplementary Figure 5. Visualisation of the effect of preferential pairing on pairwise r estimates for … and … markers in coupling-coupling phase. p denotes strength of preferential pairing, from … (polysomic) to … (disomic).

Chapter 5

Partial preferential chromosome pairing is genotype dependent in tetraploid rose

Peter M. Bourke1, Paul Arens1, Roeland E. Voorrips1, G. Danny Esselink1, Carole F. S. Koning-Boucoiran1, Wendy P. C. van ‘t Westende1, Tiago Santos Leonardo1, Patrick Wissink1, Chaozhi Zheng2, Geert van Geest1,3, Richard G. F. Visser1, Frans A. Krens1, Marinus J. M. Smulders1, Chris Maliepaard1

1 Plant Breeding, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.

2 Biometris, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.

3 Horticulture and Product Physiology, Dept. of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.

Published as Bourke, P.M., Arens, P., Voorrips, R.E., Esselink, G.D., Koning-Boucoiran, C.F.S., Van ‘T Westende, W.P.C., et al. (2017), “Partial preferential chromosome pairing is genotype dependent in tetraploid rose”, The Plant Journal 90 (2), 330-343

Abstract
It has long been recognised that polyploid species do not always neatly fall into the categories of auto- or allopolyploid, leading to the term “segmental allopolyploid” to describe everything in between. The meiotic behaviour of such intermediate species is not fully understood, nor is there consensus as to how to model their inheritance patterns. In this study, we used a tetraploid cut rose (Rosa hybrida) population, genotyped using the WagRhSNP array, to construct an ultra-high density linkage map of all homologous chromosomes, using methods previously developed for autotetraploids. Using the predicted bivalent configurations in this population, we quantified differences in pairing behaviour among and along homologous chromosomes, leading us to correct our recombination frequency estimates to account for this behaviour. This resulted in the re-mapping of … SNP markers across all homologues of the seven rose chromosomes, tailored to the pairing behaviour of each chromosome in each parent. We confirmed the inferred differences in pairing behaviour among chromosomes by examining repulsion-phase linkage estimates, which also carry information about preferential pairing and recombination. Currently, the closest sequenced relative to rose is Fragaria vesca. Aligning the integrated ultra-dense rose map with the strawberry genome sequence provided a detailed picture of the synteny, confirming overall co-linearity but also revealing new genomic rearrangements. Our results suggest that pairing affinities may vary along chromosome arms, which broadens our current understanding of segmental allopolyploidy.

Key words
High-density integrated map, segmental allopolyploid, polyploid genetic linkage map, Rosa hybrida, meiotic chromosomal pairing behaviour.

Introduction
Polyploids are generally divided into two types: autopolyploids and allopolyploids. There continues to be debate about the definition of these categories, namely whether they should be defined by their presumed or known mode of origin or their observed mode of inheritance (Ramsey and Schemske), otherwise described as the taxonomic or genetic definitions (Doyle and Egan). According to the taxonomic definition, polyploids are distinguished by their number of founder species (one in the case of autopolyploids, two or more in the case of allopolyploids), whereas the genetic definition distinguishes polysomic inheritance resulting from random pairing of chromosomes during meiosis (autopolyploid) from disomic inheritance resulting from non-random or preferential pairing (allopolyploid) (Doyle and Sherman-Broyles).

Theoretically at least, it has long been recognised that there may also be intermediate forms of polyploidy, variously termed segmental allopolyploidy (Stebbins), partial preferential pairing (… et al.), incomplete polysomy (Guimarães et al.), heterosomy (Roux and Pannell) or mixosomy (Soltis et al.). In these intermediate categories, the genetic definition (i.e. pairing behaviour during meiosis) takes precedence (Figure 1a). This pairing behaviour is primarily important as it determines the extent of recombination between homologues or homoeologues, providing a diagnostic of the type of polyploidy (Parisod et al.; Doyle and Sherman-Broyles). Furthermore, it may influence our ability to produce linkage maps and subsequently perform accurate quantitative trait locus analysis, of importance for both fundamental and applied plant research. Understanding the pairing behaviour of polyploid species is also relevant for the study of genome evolution upon genome duplication.

Various ornamental crops, Rosa species in particular, have a long history of hybridisation and polyploid formation between and within species to generate the diversity of cultivars we have today (Smulders et al.). This breeding and selection is likely to have contributed to differences in homology between various chromosomes, thought to play a role in meiotic pairing behaviour (Bingham and Gillies; ent et al.). Pairing in other species has been shown to be under genetic control, for example the Ph1 locus of hexaploid wheat (Okamoto; Riley and Chapman) or the PrBn locus in Brassica napus (Jenczewski et al.; Nicolas et al.). It may also be influenced by environmental factors (Bomblies et al.). Despite some advances in

Figure 1. The meiotic pairing behaviour of tetraploid rose is not fully understood, yet exhibits many features of a segmental allopolyploid; mapping populations and high-resolution maps can shed light on such questions.

Genetic mapping in rose

Identifying and quantifying preferential pairing


Experimental Procedures

Plant material, DNA isolation and genotyping
The tetraploid “K5” cut rose mapping population, consisting of 172 individuals of the cross between “P540” (mother) and “P867” (father), was used in this study. This population has previously been used in studies on powdery mildew (Podosphaera pannosa) resistance (Yan et al., 2006), a range of morphological traits (Gitonga et al., 2014; Gitonga et al., 2016), stomatal functioning (Carvalho et al., 2015), and linkage map construction using AFLP and SSR markers (Koning-Boucoiran et al., 2012). Genomic DNA was extracted from freeze-dried young leaves, using the DNeasy Plant Mini Kit (Qiagen, http://www.qiagen.com) following the protocol of Esselink et al. Samples were sent to Affymetrix for genotyping using the WagRhSNP 68K Axiom SNP array (Koning-Boucoiran et al., 2015). This array targets more than 68,000 SNPs, with every SNP targeted with two probes (which we refer to as the P and Q probes). Both parents were genotyped in triplicate (two biological replicates and one technical replicate).

Genotype calling and data preparation
The SNP array data were converted into dosage scores using the fitTetra package (Voorrips et al., 2011). Quality checks were subsequently performed to ensure consistency between the parental scores and those of the offspring, and to check for the possibility of “shifts” in dosage assignments, which can occur when fewer than the five possible dosage classes occur. As each SNP was targeted with two probes (P and Q), the genotype calls for these probe pairs had to be compared, and merged if they were found to be consistent (no more than 10 conflicts, where missing values were not considered conflicting), with conflicting scores made missing. Probes with more than 10 conflicting scores (including parental scores) were kept as separate markers, by appending the letters “P” and “Q” to the marker names. Markers with more than 10% missing values were removed from the dataset, as well as those markers showing highly skewed segregation patterns under either a tetrasomic or disomic model (P < 0.001).
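The probe-merging rule described above can be illustrated with a minimal Python sketch. The function name `merge_probe_calls`, the use of `None` for a missing call, and the choice to carry over the non-missing call when only one probe is scored are assumptions made for illustration; they are not taken from the fitTetra package or from the pipeline used in this study.

```python
from typing import List, Optional

Dosage = Optional[int]  # None marks a missing dosage call (an assumption of this sketch)


def merge_probe_calls(p_calls: List[Dosage],
                      q_calls: List[Dosage],
                      max_conflicts: int = 10) -> Optional[List[Dosage]]:
    """Merge the P and Q probe calls of one SNP.

    Returns the merged calls, or None when the probes conflict too often
    and should therefore be kept as two separate markers.
    """
    # A conflict needs two non-missing calls that differ;
    # missing values are not counted as conflicting.
    conflicts = sum(
        1 for p, q in zip(p_calls, q_calls)
        if p is not None and q is not None and p != q
    )
    if conflicts > max_conflicts:
        return None

    merged: List[Dosage] = []
    for p, q in zip(p_calls, q_calls):
        if p is None:
            merged.append(q)      # assumed: fall back on the other probe
        elif q is None or p == q:
            merged.append(p)
        else:
            merged.append(None)   # conflicting scores are made missing
    return merged
```

With the default threshold, a probe pair is merged unless it shows more than 10 conflicts, mirroring the rule in the text.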

In a tetraploid genotyped using biallelic SNP markers, the possible dosage classes range from nulliplex (0 copies of the alternative allele, coded ‘N’) to quadruplex (4 copies, coded ‘Q’), with marker segregation types defined by their parental scores. Marker dosage scores were recoded as described in Bourke et al. (2016), resulting in nine marker classes. Duplicate individuals were identified by genotype pairs with an unusually high correlation (… 5 similar scores); these individuals were merged (if both scores were non-missing and conflicting, the merged score was made missing). Individuals with more than 10% missing values were also removed.

Marker clustering and linkage group assignment
Initially, the simplex x nulliplex (SxN) and nulliplex x simplex (NxS) markers were clustered using the LOD for linkage. A LOD value of 4 was found to split the data evenly, with clusters of fifteen or more markers selected as candidate homologue groups. In P1 this resulted in 29 clusters, one more than the 28 expected. Each cluster was then tested over a range of LOD thresholds (from … to 10) to ensure the markers remained clustered together. In P2, at a LOD threshold of …, … clusters were identified. Two of these clusters split at LOD …, resulting in … P2 clusters. These clusters were subsequently assigned to linkage groups using their linkage to DxN (or NxD) markers, which provide cross-homologue linkage. Where more than four putative homologues were present, the predicted phasing across clusters was used to merge these into single homologues. In total, 28 P1 homologues across 7 chromosomes were identified, and 27 homologues across 7 chromosomes in P2 (i.e. we missed one P2 homologue). Chromosomes were renumbered according to the ICM numbering previously introduced (Spiller et al., 2011), through linkage with markers from a previous study (Koning-Boucoiran et al.). All other marker segregation types were subsequently assigned to both chromosomes and homologues based on their linkage to SxN markers (LOD …).

Linkage analysis and map construction under a tetrasomic model
Pairwise linkage analysis between all marker types was performed per chromosome and per parent. The maximum likelihood framework used to estimate the (phased) recombination frequency and LOD score under the assumption of random bivalent pairing (i.e. no double reduction and no preferential pairing) is already described elsewhere (Hackett et al.; Bourke et al.). Pairwise marker phase was primarily based on likelihood maximisation (Hackett et al.), although minimum recombination frequency was used to phase … x … marker pairs (as described in Bourke et al.).

A prototype version of the MDSMap software was kindly made available by its authors for map ordering (Preedy and Hackett, 2016). Maps were produced using unconstrained weighted metric multidimensional scaling (with … as weights and using Haldane’s mapping function) followed by principal curve fitting in two dimensions. Poorly mapping markers were identified either as outliers in the plots (judged by eye), or if their nearest-neighbour fit exceeded …. Such markers were removed, and up to three rounds of MDSMap were performed until stable maps free of outlying markers were produced. The final map positions of the … pairs of (unmerged) P and Q probes were compared. In … cases, one of the probes was lost (either during the clustering stage or later when outliers were removed from the linkage maps) and the remaining probe was taken as the consensus marker. In … cases where both probes were mapped, only … mapped less than 1 cM apart and had the same parental dosages; these probes could be merged while the rest were discarded. The mapping procedure was repeated using the amended dataset.

Estimation of a preferential pairing parameter
We modelled preferential pairing in the context of bivalent pairing, considering deviations $\rho$ from the expected probabilities under a random model, where the probability of the preferred pairing configuration is $\frac{1}{3} + \rho$ and that of each of the other two configurations is $\frac{1}{3} - \frac{\rho}{2}$. We avoided simultaneously estimating a preferential pairing factor and a recombination frequency, which was found to lead to an overestimation of the level of preferential pairing (… et al.), in favour of a two-step procedure that first estimates a map using two-point tetrasomic linkage analysis and then revises those estimates when a preferential pairing factor has been determined using a multi-point hidden Markov model (HMM) approach (Zheng et al.).

The marker data was simplified by rounding marker positions to the nearest centiMorgan, after which one of each marker segregation type was selected at each position. Where more than one marker of a particular type was present at a locus, the marker with the fewest missing values was chosen. TetraOrigin provides the most likely bivalent pairing in each individual, which falls into one of three pairing classes. We renumbered homologues such that the pairing configuration with the highest predicted count in TetraOrigin was …, etc. We then applied a χ² test on the counts of each class to test for deviations from $P = \frac{1}{3}$. If the numbers of structures predicted in each of the three pairing classes are $n_1$, $n_2$ and $n_3$, and assuming $n_1 \geq n_2$ and $n_1 \geq n_3$, the likelihood function given the observed counts is

$$\mathcal{L}(\rho) \propto \left(\frac{1}{3} + \rho\right)^{n_1} \left(\frac{1}{3} - \frac{\rho}{2}\right)^{n_2 + n_3}$$

Solving the likelihood equation leads to the maximum likelihood estimate

$$\hat{\rho} = \frac{\frac{2}{3} n_1 - \frac{1}{3}(n_2 + n_3)}{n_1 + n_2 + n_3}$$

TetraOrigin was re-run allowing the possibility of quadrivalent pairing in the offspring, to investigate the level of double reduction and to see whether preferential pairing was still predicted under a model that included quadrivalents. The rate of double reduction was determined per locus as the average of the haplotype probabilities that exceeded … across the population.
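As an illustration of the estimator for the preferential pairing parameter from bivalent pairing counts, the following Python sketch computes ρ̂ together with the χ² test against random pairing. The function names are hypothetical. The χ² P-value uses the closed form exp(-x/2), which holds for two degrees of freedom (three classes with a fixed total); this is a sketch under those assumptions, not the code used in this study.

```python
import math
from typing import Tuple


def rho_from_bivalent_counts(n1: int, n2: int, n3: int) -> float:
    """Maximum likelihood estimate of rho from the three pairing-class counts.

    n1 is the count of the most frequent (preferred) configuration, so that
    its probability is 1/3 + rho and the other two are each 1/3 - rho/2.
    """
    total = n1 + n2 + n3
    if total == 0:
        raise ValueError("no pairing structures were counted")
    return (2.0 * n1 / 3.0 - (n2 + n3) / 3.0) / total


def chi_square_random_pairing(n1: int, n2: int, n3: int) -> Tuple[float, float]:
    """Chi-square statistic and P-value for H0: all classes have probability 1/3."""
    total = n1 + n2 + n3
    if total == 0:
        raise ValueError("no pairing structures were counted")
    expected = total / 3.0
    stat = sum((n - expected) ** 2 / expected for n in (n1, n2, n3))
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value
```

Under random pairing the estimate is zero, and it reaches its maximum of 2/3 when every offspring shows the same pairing configuration (fully disomic behaviour).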


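Preferential pairing estimated from repulsion-phase linkages
We estimated the error rate in the genotype data from pairs of duplicated individuals, of which nineteen pairs were identified. The approximate error rate was taken as the average conflict rate over all duplicate pairs. We followed usti et al. in their definition of the correspondence between the true ($\theta_0$) and apparent ($\varphi$) rates of recombination given an error rate $s$ (usti et al.; Hackett and Broadfoot):

$$\varphi = \theta_0 (1 - s) + (1 - \theta_0)\, s$$

The minimum resolution of recombination frequency in a mapping population of size $N$ and missing value rate $\mu$ is $r_{min} \approx 1/(2(1 - \mu)N)$ (Bourke et al.), assuming error-free data. In the presence of errors, substituting this value for $\theta_0$ in the above equation then implies

$$r_{min} \approx \frac{1 - 2s}{2(1 - \mu)N} + s$$

We identified all pairs of SxN markers that mapped within this distance ($r_{min}$) on different homologues (repulsion pairs). Under a purely disomic model, the repulsion-phase maximum likelihood estimate of the recombination frequency is given by

$$r_{disom} = \frac{n_{00} + n_{11}}{n_{00} + n_{01} + n_{10} + n_{11}}$$

where $n_{01}$ is the number of offspring with a dosage of 0 at the first marker and a dosage of 1 at the second marker, etc. If inheritance is tetrasomic, this estimate never falls below $\frac{1}{3}$ (Qu and Hancock; Bourke et al.). This forms the basis of a binomial test ($H_0$: $r_{disom} \geq 1/3$), corrected for multiple testing using FDR with α = … (Benjamini and Hochberg). This identified cross-homologue pairs that showed significant deviations from a tetrasomic model. We then estimated a preferential pairing parameter for the homologue combinations that showed significant evidence of preferential pairing. For two repulsion-phase markers located at the same genetic position, the recombination frequency estimate between them is a function of independent assortment, with no contribution from crossovers (Qu and Hancock). Given a preferential pairing parameter $\rho$, the probabilities of observing each of the classes $n_{00}, n_{01}, n_{10}, n_{11}$ in the offspring are $\frac{1}{6} - \frac{\rho}{4}$, $\frac{1}{3} + \frac{\rho}{4}$, $\frac{1}{3} + \frac{\rho}{4}$ and $\frac{1}{6} - \frac{\rho}{4}$, respectively. This leads to the following equation:

$$\frac{\partial}{\partial \rho} \left(\ln \mathcal{L}(\rho)\right) \propto \frac{-(n_{00} + n_{11})}{\frac{1}{6} - \frac{\rho}{4}} + \frac{n_{01} + n_{10}}{\frac{1}{3} + \frac{\rho}{4}} = 0$$

Solving yields the maximum likelihood estimate of the preferential pairing parameter $\rho$:

$$\hat{\rho} = \frac{2(n_{01} + n_{10}) - 4(n_{00} + n_{11})}{3(n_{00} + n_{01} + n_{10} + n_{11})}$$

We recorded the mean and standard deviation of the non-negative estimates of $\hat{\rho}$ to generate pairing-specific estimates.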
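The repulsion-phase quantities can be sketched in a few lines of Python. The estimator of ρ below is obtained by setting the derivative of the log-likelihood to zero with class probabilities 1/6 - ρ/4 (classes 00 and 11) and 1/3 + ρ/4 (classes 01 and 10), as given in the text. All function names are hypothetical, and this is an illustrative sketch rather than the implementation used in this study.

```python
def apparent_recombination(theta0: float, s: float) -> float:
    """Apparent recombination rate given the true rate theta0 and error rate s."""
    return theta0 * (1.0 - s) + (1.0 - theta0) * s


def min_resolution(pop_size: int, missing_rate: float) -> float:
    """Minimum resolution of r for error-free data: 1 / (2 (1 - mu) N)."""
    return 1.0 / (2.0 * (1.0 - missing_rate) * pop_size)


def r_disomic(n00: int, n01: int, n10: int, n11: int) -> float:
    """Repulsion-phase estimate of r under a purely disomic model."""
    total = n00 + n01 + n10 + n11
    if total == 0:
        raise ValueError("no offspring were counted")
    return (n00 + n11) / total


def rho_from_repulsion_counts(n00: int, n01: int, n10: int, n11: int) -> float:
    """Estimate of rho for two co-located repulsion-phase SxN markers."""
    total = n00 + n01 + n10 + n11
    if total == 0:
        raise ValueError("no offspring were counted")
    return (2.0 * (n01 + n10) - 4.0 * (n00 + n11)) / (3.0 * total)
```

For counts in the tetrasomic proportions 1:2:2:1 the disomic estimate equals 1/3 and the estimate of ρ is zero, while counts of 0:1:1:0 give the fully disomic value of 2/3. Negative estimates can arise by chance; in the text only non-negative estimates were averaged.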

Linkage analysis of a segmental allopolyploid
Having estimated the strength of preferential pairing for all affected linkage groups, we re-estimated the pairwise recombination frequencies for these cases. The maximum likelihood estimates of recombination frequency were derived in Mathematica (Wolfram Research Inc.) and exported for use in R. A description of the likelihood model including preferential pairing is provided in Supplementary Methods. Markers were subsequently re-assigned to chromosomes and homologues given the updated information, and the maps were re-calculated using MDSMap (Preedy and Hackett).

Visualisation and quantification of haplotype diversity
A subset of markers for which there were at least … significant linkages to each of the expected number of homologue clusters was used to define haplotypes. These allowed us to examine pairwise diversity between homologues. To investigate the relationship between homologue similarity and pairing behaviour, we defined a simple dissimilarity measure between homologues A and B within a window centred at marker position j as

$$\mathcal{D}_{A-B,j} = \frac{1}{N_j} \sum_{i=1}^{N_j} \left| d_{A,i} - d_{B,i} \right|$$

where $d_{A,i}$ is the allele score of homologue A at marker position i (either ‘0’ or ‘1’) and $N_j$ is the number of markers within a sliding window around each marker position j.

Integration with SSR and AFLP map
Previously, a genetic linkage map of the same population using SSR and AFLP markers was published (Koning-Boucoiran et al.). We initially attempted to map all markers together, but found that the … markers tended to cluster together at the ends of a linkage group, suggesting high tension between the two marker datasets. Linkage between … markers and the set of SNP markers was evaluated, with the three most significant linkages recorded, giving these markers approximate positions on the map.
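The sliding-window dissimilarity measure between two homologues, defined above under haplotype diversity, can be sketched as follows. The function name and the definition of the window as a fixed number of neighbouring markers on either side of position j (rather than a genetic distance in cM) are assumptions made for this illustration.

```python
from typing import Sequence


def homologue_dissimilarity(hom_a: Sequence[int],
                            hom_b: Sequence[int],
                            j: int,
                            half_window: int) -> float:
    """Mean absolute allele difference between two homologues around marker j.

    hom_a and hom_b hold the allele scores (0 or 1) of each homologue at the
    ordered marker positions; the window is truncated at the map ends.
    """
    if len(hom_a) != len(hom_b):
        raise ValueError("homologues must be scored at the same markers")
    start = max(0, j - half_window)
    stop = min(len(hom_a), j + half_window + 1)
    n_markers = stop - start  # N_j, the number of markers in the window
    if n_markers <= 0:
        raise ValueError("marker position j lies outside the map")
    diffs = sum(abs(a - b) for a, b in zip(hom_a[start:stop], hom_b[start:stop]))
    return diffs / n_markers
```

A value of 0 means the two homologues carry identical alleles across the window, while a value of 1 means they differ at every marker in it.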


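Synteny analysis with Fragaria vesca
The expressed sequence tags (ESTs) from which the WagRhSNP Axiom array markers were derived were used in a BLAST search against the woodland strawberry genome (Fragaria vesca … pseudomolecules (downloaded on 19/09/2016 from http://www.rosaceae.org)) with ‘N’ used at the SNP position, as provided in Supplementary Table … of Koning-Boucoiran et al. A … threshold of … was used to retain only highly homologous hits, with multiple hits filtered out to avoid treating multigene families. The rose linkage maps were orientated according to the order of the Fragaria genome sequence (as in … et al.) and the resulting synteny visualised in the Circos software (Krzywinski et al.). The most likely positions of a set of non-segregating markers were also determined based on the position of their hits in the strawberry genome.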

roii itio iitio or ioi eretio Results

Marker filtering and dosage assignment o the h rr ere e to roe eie to e to the i eee t oth tr (oiooir et 01) roe ere ieeet ore i itetr (oorri et 011) ere itere ori to er o it riteri h otii t ot ii e (i ter iteri te ter eri roe thi ree to 10) ei oee (P 0001) otii eer th ii or eete ore (e 1) itere roe ere ere here oie ith oiti roe ei et erte i the i it he thoh ret ere eote i triite to o the ret 1 (1) reite hoe hih er o ii e o oore ith the ori (e et oe eote ere iorretee ri reio tiitio te) hereore the reite o 1 ith the et oore ith the ori ore eete here or to o the three reite ere o to he e oit (0) o e ere here ere 10 ereti rer to ei i ith oo re oer the ie rer e (e ) ietee ir o ite iii ere ietiie (irie eote orretio 0) o iii ith ore th 10 ii e ere o reoe reti i i otio ie o 11 iii

Linkage map construction under a tetrasomic model
We followed the mapping procedure described for potato in Bourke et al., although it was not possible to unite homologue (i.e. SxN or NxS) clusters into chromosomal groups on the basis of repulsion-phase linkages due to more noise in the data. Instead, coupling linkages with duplex x nulliplex (DxN) markers were used to provide these associations. We used new marker ordering software (MDSMap; Preedy and Hackett, 2016), which greatly facilitated the creation of integrated chromosomal linkage maps in this study. Rose is a predominantly tetrasomic species and thus we can identify four homologous copies of each chromosome from each parent, assuming sufficient SxN and NxS markers exist. Over both parents and seven linkage groups there are therefore 2 x 4 x 7 = 56 homologue groups expected. Fifty-five of the expected 56 homologues were identified, following which we assigned every other marker type to both its most probable linkage group (chromosomes, termed integrated consensus map (ICM) groups 1 to 7, c.f. Spiller et al. (2011)) and homologue(s) using a linkage logarithm of odds (LOD) threshold of ….

Estimation of preferential pairing parameters
Given an integrated linkage map and a population with dosage scores, TetraOrigin (Zheng et al., 2016) can be used to infer the identity-by-descent (IBD) probabilities for all offspring, useful for subsequent analyses (Hackett et al.). Here, we used it to estimate preferential pairing between chromosomes. Using the assumption of bivalent pairing, we applied a χ² test to the predicted bivalent counts under the hypothesis of random pairing (Table 1). At a significance level of 0.001 three chromosomes exhibited unusual behaviour (ICM 1 of P1 and ICM 3 and 4 of P2), with an over-representation of one bivalent configuration and an under-representation of the other two, which was most extreme on chromosome 1 of P1 (96 out of 151 counts were of the same pairing configuration). For ICM 5 in P1, there was a near-significant departure from random pairing (P = 0.002) but this arose from an under-representation of one configuration, not two, which we have not attempted to model.

Table 1. Predicted pairing behaviour according to TetraOrigin, showing significance of the deviation from random pairing and the estimated preferential pairing parameter, ρ
P1 (a): ICM (b) | 12_34 (c) | 13_24 | 14_23 | χ² | P-value (d) | ρ
1 96 20 6. 1.00E-14 0.02 2 0 6 0. 0. 6 1.0 0.6 60 6 .2 0.0 6 0 12.2 0.002 6 61 6 .2 0.02 66 .1 0.02
P2: ICM | 56_78 | 57_68 | 58_67 | χ² | P-value | ρ
1 6 1 6. 0.0 2 .6 0.1 1 22.6 0.00001 0.1 0 0 1 2.0 1.3E-06 0.196 6 6 9. 0.01 6 2 2 0. 0.9 6 2 2 . 0.02
a P1 = Parent 1, P2 = Parent 2. b ICM = integrated consensus map. c The predicted number of offspring with bivalent pairing between homologues 1 and 2, and the second bivalent pairing between 3 and 4. d P-value of the χ² test of the hypothesis of random pairing; significant values (P < 0.001) in bold.

Preferential pairing estimated from repulsion-phase linkages
Preferential pairing has previously been deduced from repulsion-phase estimates between “nearby” SxN marker pairs in other studies. We estimated the minimum resolution of recombination frequency (r_min) as approximately 0.011 and found … repulsion SxN marker pairs in P1 and … pairs in P2 mapped to within this distance on the integrated map (Table S…). Of these, 296 had significant evidence for non-tetrasomic behaviour (using a False Discovery Rate (FDR)-corrected threshold of 0.0001, where α = …). Strong preferential pairing was identified on ICM 1 in P1, confirming our previous findings. The negative log10(P) values of these tests are visualised in Figure 2. The strength of significance was not constant across the chromosome, partly due to uneven marker distribution. However, in the case of ICM 1 of P1 this appears to also be due to differences in pairing behaviour between chromosome arms. A possible model explaining this behaviour is represented in Figure 1, in which variable pairing affinities are realised through a mixture of bivalent and non-random quadrivalent formation. Our estimated preferential pairing parameters corresponded well with those from TetraOrigin (Table 2), although ICM … of P1 also shows a region of preferential pairing, which was not predicted by TetraOrigin. The strongest evidence for preferential pairing from repulsion-phase linkages remains that of ICM 1 in P1 and ICM … in P2.

Table 2. Significant preferential pairing identified using closely-mapping SxN marker pairs in repulsion
Parent | ICM (a) | h_a | h_b | ρ | sd (b) | N (c) | Ns (d)
1 1 1 0. 0.1 612 60 1 1 2 0.2 0.02 16 2 1 2 0.22 0.0 66 2 2 2 0.1 0.0 106 2 6 1 2 0.1 0.02 660 1 2 0.16 0.0 21 12 2 1 2 0.06 0.0 2
a ICM = integrated consensus map. b The standard deviation of the estimate for ρ, the preferential pairing parameter between homologues h_a and h_b. c The number of non-negative estimates on which the estimate of ρ was based. d The number of repulsion pairs that showed significant evidence for non-tetrasomic behaviour.

Figure 2. Evidence for both disomic and tetrasomic behaviour from pairs of closely-mapped repulsion-phase SxN markers. The -log10(P) values of a binomial test against the hypothesis r_disom ≥ 1/3 for SxN marker pairs in repulsion are plotted against their approximate genetic position. Regions of disomic behaviour (high values of -log10(P)) are not uniformly distributed, suggesting differences in pairing affinities along chromosomes. The significance threshold is shown as a dashed red line. “h…-h…” refers to marker pairs from homologues … and …. For simplicity, homologues in parent 2 have been labelled ….

Ultra-high density Rosa hybrida linkage maps, tailored for segmental allopolyploidy
We modified our pairwise recombination frequency framework to include preferential pairing and recalculated the recombination frequency, LOD score and phase for all affected linkage groups, and then re-mapped these chromosomes using the amended estimates. For the vast majority of markers in the dataset there was no change in the assignment to chromosomes or homologues, i.e. marker phasing was unaffected. We

Table 3. Numbers of markers mapped per linkage group on the ultra-high density Rosa hybrida map. Rosa ICM a length / cM N markers Max. gap (cM)

Prediction of quadrivalents and double reduction


Figure 3. Comparison between the integrated genetic map (in centiMorgans) under a random bivalent model (x-axis) and a bivalent model that takes account of preferential pairing (y-axis) for rose chromosomes 1, 3 and 4.

Haplotype diversity and pairing behaviour


Synteny analysis with Fragaria vesca

Figure 4. Overview of the synteny between Rosa hybrida and Fragaria vesca.

Discussion

Non-uniform distribution of preferential pairing . Indeed, in keeping with Stebbins’ description, Rosa chromosomal “mixosomy”; in rainbow trout it has been proposed that there may be “residual tetrasomy” Rosa e.g. Rosa


Bruneau et al. found the same chloroplast gene haplotypes in multiple species. A more precise answer may therefore come from genome sequences of various diploid and polyploid accessions.

It is also worth noting here that preferential pairing and marker skewness are different phenomena that may be confused. Marker skewness can be caused by selection and can cause problems in linkage map construction (Van Ooijen and Jansen). For example, lethality or reduced fitness is unlikely to be caused by a combination of alleles present in one of the parents (since that parent is alive) and therefore, cross-parental lethal combinations are more probable, which will not result in a preferential pairing signal in one of the parents. Similarly, if selection is positive (e.g. at a resistance locus) then the skewness is not in favour of a particular combination of parental homologues but rather of specific alleles; in cases where there are two functional copies in a single parent the effect would appear as selection against a particular homologue combination (i.e. “unpreferential” pairing). In any case, in this study we only mapped non-skewed markers, so that all our evidence for preferential pairing comes from markers with no significant skewness. This is also visualised in Figure S…, showing that skewness levels among mapped markers were essentially uniform across the rose genome.

Predicting pairing behaviour through homology
We were interested to see whether pairing behaviour might be predicted from the homology as defined by our haplotype-specific linkage maps. We chose a simple dissimilarity measure to analyse the diversity between homologues in this study, similar to the most recent common ancestor measure used in population genetic studies (Joly et al.). We did not observe any clear relationship between homology as estimated from the mapped markers and pairing behaviour. However, as in all mapping studies, our data was biased towards higher diversities since only segregating markers can be included on a genetic map. We should therefore treat these results with caution as they do not give a complete picture of homology, which only sequencing can fully reveal. Telomeric homology, where pairing initiation is thought to occur (Sved; Sybenga; Cifuentes et al.), might have more influence than chromosome-length homology and could be a target of future sequencing efforts to clarify this.

Tailored genetic linkage maps

In this study we generated genetic linkage maps tailored to the particular pairing behaviour of each chromosome. We could have further refined our maps by allowing the estimate for the rate of preferential pairing to vary along the chromosome arms if this was deemed necessary. However, in this population the rate of preferential pairing on the affected chromosomes was low enough that we could have ignored it in map construction. We can confirm our prediction that linkage mapping can be safely performed under the assumption of random pairing up to a level of preferential pairing as high as ρ = […], after which estimates become seriously biased (Bourke et al.). This finding will likely be welcomed by researchers in Rosa, for which the assumption of tetrasomic inheritance has generally been used. In this study, we used markers to provide links between putative homologue clusters of markers (helping to identify chromosomal linkage groups). However, these may not always be informative in cases of extreme preferential pairing, where simplex x simplex (1x1) markers may need to be used to provide cross-homologue linkages instead.

Synteny between Rosa and Fragaria

In this study we used the close relationship between Rosa and Fragaria to confirm the order and organisation of the linkage maps produced, as well as allowing us to position a set of non-segregating markers which were indicative of extreme preferential pairing in a distal region of maternal chromosome I. It also yielded a detailed picture of the high level of genomic conservation between these species (Figure […]). Both Rosa and Fragaria belong to the Rosoideae clade (in the Roseae and Potentilleae tribes, respectively), whose divergence is estimated to have occurred approximately […] mya (million years ago) (Xiang et al.). High levels of gene conservation have also been reported in the grass family, for example (Bennetzen). On the other hand, despite an estimated divergence time of […] mya, there is a much more striking difference in the genomic configuration of Brassica rapa and Arabidopsis thaliana (Murat et al.), although this could be due to the fact that Brassicaceae possess an exceptionally high rate of biological radiation and diversification (Franzke et al.). It is tempting to speculate why rates of genomic divergence differ between plant families (for example, the level of activity of transposable elements within the genome (Bennetzen and Wang)), although such a meta-analysis using data across multiple plant families has yet to appear in the literature. What our study does make clear is that the published reference sequence of Fragaria vesca continues to be a very useful genomic resource for the rose research community, at least until such time as a rose sequence becomes publicly available.

Concluding remarks

In this study we have applied a novel approach for detecting and quantifying the level of preferential pairing in a tetraploid biparental population of cut rose, and used this information to refine our estimates of recombination to produce a tailored genetic map of this economically important species. We found conclusive evidence for parent-specific preferential pairing behaviour on three chromosomes whereas the other chromosomes showed tetrasomic behaviour. The picture that emerges is that the meiotic behaviour in polyploids may be difficult to predict without detailed analysis. We found that meiotic pairing behaviour can vary from chromosome to chromosome and from genotype to genotype, and perhaps even along chromosome arms. Tools which utilise chromosome-wide inheritance information across all homologues are very powerful in revealing underlying meiotic mechanisms and confirm our findings using repulsion-phase linkage information as well as information from syntenic mapping of non-segregating duplex markers to infer inheritance type. We uncovered a detailed picture of conserved synteny with the closely-related woodland strawberry, highlighting both telomeric inversions and reciprocal translocations which occurred somewhere in the lineages of one or both of these species. Our ultra-high-density linkage map will facilitate downstream applications such as current efforts at genome assembly as well as providing a basis for detailed analysis in the future.

Acknowledgements

The authors would like to thank Dr. Christine Hackett and Dr. Katherine Preedy (BioSS) for sharing a developmental version of the […]ap software, which was used in this study. The authors also wish to acknowledge the two anonymous reviewers, whose comments helped improve the manuscript. Funding for this research was provided through the TKI polyploids project “A genetic analysis pipeline for polyploid crops”, project number BO […].

Supplementary data is available online at: http://onlinelibrary.wiley.com/doi/[…]/full

131 ______Partial preferential chromosome pairing

Supplementary Figure S1. Average estimated probability of double reduction per parental meiosis over the seven rose chromosomes.

132 ______Chapter 5

Supplementary Figure S2. Visualisation of meiotically-tailored high-density linkage maps, showing the distribution of segregating marker alleles across all eight parental homologues for each of the seven rose linkage groups ([…] – […]), also listed in Supplementary Table […] using original marker dosage coding.

Supplementary Figure S3. Average within-parent and between-parent dissimilarity (D) for each of the seven rose chromosomes. Multiple points correspond to each set of homologue pairs within these categories.


Supplementary Figure S4. D


Supplementary Figure S5. i.e. corrected for multiple testing using FDR with α = 0.05) assuming a χ


Chapter 6 polymapR – linkage analysis and genetic map construction from F1 populations of outcrossing polyploids

Peter M. Bourke1, Geert van Geest1,2, Roeland E. Voorrips1*, Johannes Jansen3, Twan Kranenburg1, Arwa Shahin1,4, Richard G. F. Visser1, Paul Arens1, Marinus J. M. Smulders1, Chris Maliepaard1

1 Plant Breeding, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.

2 Deliflor Chrysanten B.V., Korte Kruisweg 163, 2676 BS Maasdijk, The Netherlands.

3 Biometris, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.

4 Van Zanten Breeding B. V., Lavendelweg 15, 1435 EW, Rijsenhout, The Netherlands.

Published (with modifications) as Bourke, P.M., van Geest, G., Voorrips, R.E., Jansen, J., Kranenburg, T., Shahin, A. et al. (2018). “polymapR – linkage analysis and genetic map construction from F1 populations of outcrossing polyploids”, Bioinformatics, doi: 10.1093/bioinformatics/bty371

137 ______polymapR - linkage mapping software

[…] many of the world’s most important crops. Genetic mapping in polyploids is more […]


In recent years there has been an acceleration of progress in the understanding of the genetics underlying important traits in autopolyploid species. This has been to a large extent due to developments in high-density genotyping platforms for single nucleotide polymorphism (SNP) markers, which have found increasing application in polyploids. For example, high-density SNP arrays have been developed in potato (Felcher et al.; Vos et al.), rose (Koning-Boucoiran et al.), alfalfa (Li et al.) and chrysanthemum (Van Geest et al., 2017c), enhancing the scope for genetic studies in these species.

In polyploid species, as opposed to diploids, co-dominantly scored markers can possess multiple classes in the heterozygous condition, usually termed marker “dosage”. In a tetraploid there are five possible dosage classes of a bi-allelic marker, namely nulliplex (with a dosage 0 for one of the alleles), simplex (with dosage 1), duplex (with dosage 2), triplex (with dosage 3) and quadruplex (with dosage 4). In a hexaploid the number of dosage classes at a bi-allelic locus rises to seven. Various software packages have been developed to convert the signal from e.g. SNP arrays into these discrete dosage calls for polyploids, such as fitTetra (Voorrips et al.) or ClusterCall (Schmitz Carley et al.).
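The counting argument above can be illustrated with a short sketch. It is not part of any package mentioned in the text; the function name `dosage_classes` and the `NAMES` lookup are hypothetical helpers introduced only for illustration. A bi-allelic marker at ploidy p can carry between 0 and p copies of the counted allele, giving p + 1 dosage classes.

```python
# Hypothetical helper, for illustration only: enumerate the dosage classes
# of a bi-allelic marker at a given ploidy level.
def dosage_classes(ploidy):
    """Return all possible dosages (copies of the counted allele), 0..ploidy."""
    return list(range(ploidy + 1))

# Conventional names for the tetraploid classes described in the text.
NAMES = {0: "nulliplex", 1: "simplex", 2: "duplex", 3: "triplex", 4: "quadruplex"}

tetraploid = dosage_classes(4)
hexaploid = dosage_classes(6)

print([NAMES[d] for d in tetraploid])
print(len(tetraploid), len(hexaploid))
```

The tetraploid case yields five classes and the hexaploid case seven, in line with the paragraph above.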

Genetic linkage maps have traditionally been used for both exploratory trait mapping (often termed QTL analysis) and the subsequent fine mapping of traits, as well as for assisting genome assembly efforts by guiding the integration and orientation of contigs. High-density linkage maps may also improve our understanding of the chromosomal composition and genetics of polyploid species, uncovering such phenomena as double reduction or partially-preferential chromosome pairing. In many polyploid species which lack reference genome sequences, linkage maps are also a vital first genomic description of that species.

Despite the importance of both linkage maps and polyploid species, there are still relatively few software tools available for polyploid linkage map construction. Allopolyploid species showing disomic inheritance can be treated, genetically speaking, as diploids, with a wide range of software options available. In the case of polysomic polyploids (autopolyploids and segmental allopolyploids) the options available to the research community are limited. Probably the most well-known autopolyploid mapping software is TetraploidMap (Hackett and Luo; Hackett et al.), which has been used in studies of various autotetraploid species such as potato, alfalfa, rose and blueberry (e.g. Bradshaw et al.; Robins et al.; Gar et al.; McCallum et al.). Recently its successor TetraploidSNPMap has been released to accommodate high-density marker data from SNP arrays (Hackett et al.). However, it can only handle autotetraploid datasets and provides a graphical user interface for the Windows platform only. Linkage studies in species exhibiting strong preferential chromosomal pairing or other ploidy levels are not currently possible using this software. An alternative polyploid mapping software is the PERGOLA package in R (Grandke et al.). However, this software has been developed for use with F2 or backcross populations from homozygous parents only. In many cases, either due to inbreeding depression or the difficulties imposed by polysomic inheritance, populations from two heterozygous parents are typically used instead.

In short, there is currently no software which can perform linkage mapping at various ploidy levels under a variety of inheritance models for outcrossing species using dosage-scored marker data. Here we present polymapR, an R package (R Core Team) for linkage mapping in outcrossing polyploid species which can generate linkage maps for polysomic triploids, tetraploids and hexaploids, accommodating either fully tetrasomic or mixed meiotic pairing behaviour (segmental allopolyploidy) at the tetraploid level. Its modularity will facilitate its adaptation to other marker genotyping technologies or ploidy levels in the future.

The polymapR pipeline consists of four parts – data inspection, linkage analysis, linkage group assignment and marker ordering, which are detailed below. The functions within polymapR are described in the vignette which accompanies the package, going through all the steps in a typical mapping project. For consistency and simplicity, all examples mentioned here describe a tetraploid cross.

The input data for polymapR is dosage-scored marker data, available from a number of packages such as fitTetra (Voorrips et al.) or ClusterCall (Schmitz Carley et al.). Both fitTetra and ClusterCall are limited to tetraploid data (extensions to fitTetra to accommodate multiple ploidy levels are underway). Regardless of how it is generated, the input dosage-scored marker data should consist of a column of marker dosages for the mother, one for the father, followed by a column for each of the offspring of the cross. Checks for marker skewness and shifted markers (when dosage scores are shifted by a fixed amount) are currently provided in polymapR from a suite of tools developed for the fitTetra package (Voorrips et al.).

Figure 1. […] considered for a pair of 2x0 markers, from left to right, “coupling”, “mixed” and “repulsion”. […] could also be termed “subgenome-specific” versus “subgenome-straddling”. […]

High-quality data facilitates the generation of high-quality maps. One indication of poor data quality is a high proportion of missing values. The user may choose to screen out markers or individuals with more than a desired rate of missing values (by default up to […] is tolerated), or duplicate individuals. Identical markers, which often occur in high-density marker datasets with limited population sizes (and hence a limited number of recombination events), can be identified and reduced to one representative marker for the mapping steps, and reintegrated later. A principal component analysis can also be performed and visualised, which may highlight some unwanted structure in the population (for example due to pollination from an unknown external pollen parent or from self-pollination) or outlying individuals (for example because of admixture).
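The two screening steps described above (filtering on missing values and binning identical markers) can be sketched as follows. This is a minimal illustration and not the polymapR implementation: the function names, the use of `None` for a missing score, the toy data and the threshold of 0.2 are all assumptions made for the example.

```python
# Illustrative sketch only; names, data layout and threshold are hypothetical.
def screen_markers(dosages, max_missing):
    """Keep markers whose fraction of missing scores (None) is at most max_missing."""
    kept = {}
    for marker, scores in dosages.items():
        missing = sum(score is None for score in scores) / len(scores)
        if missing <= max_missing:
            kept[marker] = scores
    return kept

def bin_identical_markers(dosages):
    """Group markers with identical score vectors.

    Returns {representative: [other markers with the same scores]}, so that the
    binned markers can be re-integrated after mapping the representatives.
    """
    bins = {}
    for marker, scores in dosages.items():
        bins.setdefault(tuple(scores), []).append(marker)
    return {members[0]: members[1:] for members in bins.values()}

# Toy offspring dosage scores for four markers (made-up values).
offspring_scores = {
    "m1": [0, 1, 1, 0, 1],
    "m2": [0, 1, 1, 0, 1],        # identical to m1
    "m3": [1, None, None, 0, 1],  # 40% missing
    "m4": [1, 0, 0, 1, 1],
}

screened = screen_markers(offspring_scores, max_missing=0.2)
print(sorted(screened))
print(bin_identical_markers(screened))
```

Keeping the list of binned markers alongside each representative is what allows them to be placed back on the map at the representative's position afterwards.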

2.1 Linkage analysis under a polysomic model

In autopolyploid species with polysomic inheritance, it is possible to model meiotic pairing structures as random bivalents or multivalents. In practice, both pairing structures tend to occur, with a relatively low frequency of multivalents in stable autopolyploids (Santos et al.; Bomblies et al.). The main consequence of multivalent formation from a genetic perspective is the phenomenon of double reduction, where two segments of a particular homologue can end up in the same gamete and become transmitted together to offspring. It has been demonstrated that double reduction introduces some bias in recombination frequency estimates under a random bivalent model. This can be safely ignored if the rate of quadrivalent pairing is low (Bourke et al., […]; Bourke et al., […]).

Under a random bivalent model, there are three possible bivalent pairing conformations in a tetraploid. In general, for an even ploidy $p = 2n$ there are $c = \frac{(2n)!}{2^{n}\, n!}$ possible bivalent pairing conformations to be considered. Given any pair of marker loci with unknown recombination frequency $r$, we consider the contribution of recombinant homologues with a within-bivalent probability of $\frac{1}{2} r$ and non-recombinant homologues with a within-bivalent probability of $\frac{1}{2}(1 - r)$. In cases where recombinants and non-recombinants cannot be distinguished, both are assigned a probability of $\frac{1}{2}$. Assuming random pairing, the probability of any particular pairing configuration is $\frac{1}{c}$; in the case of preferential pairing, we introduce a preferential pairing factor to model deviations from randomness here.
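As a quick check on the number of pairing conformations, the sketch below evaluates the closed-form count and also enumerates the conformations explicitly. The function names are hypothetical and the homologue labels 1, 2, 3, … are arbitrary.

```python
from math import factorial

def n_bivalent_conformations(ploidy):
    """Closed-form count c = (2n)! / (2^n * n!) for an even ploidy p = 2n."""
    if ploidy % 2 != 0:
        raise ValueError("bivalent pairing requires an even ploidy")
    n = ploidy // 2
    return factorial(ploidy) // (2 ** n * factorial(n))

def enumerate_pairings(homologues):
    """List every way of splitting the homologues into unordered bivalents."""
    if not homologues:
        return [[]]
    first, rest = homologues[0], homologues[1:]
    pairings = []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_pairings(remaining):
            pairings.append([(first, partner)] + tail)
    return pairings

print(n_bivalent_conformations(4), n_bivalent_conformations(6))
for pairing in enumerate_pairings([1, 2, 3, 4]):
    print(pairing)
```

For a tetraploid this gives the three conformations 12/34, 13/24 and 14/23; for a hexaploid there are 15, so each has probability 1/15 under random pairing.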

The expected frequency of each offspring class $n_{ij}$ ($0 \le i, j \le 2n$) is first summed over all $c$ bivalent conformations:

$$E(n_{ij}) = \sum_{k=1}^{c} \frac{1}{c}\, f_k(r, 1-r)$$

where $f_k(r, 1-r)$ denotes a function of $r$ and $1-r$, dependent on the marker combination considered. Given these expected frequencies, we relate them to the observed counts of individuals in each class, $O(n_{ij})$, to yield the likelihood function $\mathcal{L}(r)$:

$$\mathcal{L}(r) \propto \prod_{i,j=0}^{2n} E(n_{ij})^{O(n_{ij})}$$

The likelihood equation results from equating the first derivative of the log of the likelihood function with zero:

$$\sum_{i,j=0}^{2n} \left( O(n_{ij}) \cdot \frac{d}{dr} \ln \left( \sum_{k=1}^{c} \frac{1}{c}\, f_k(r, 1-r) \right) \right) = 0$$

In cases where no analytical solution exists, we use Brent’s algorithm (Brent) to numerically maximise the log likelihood function in the bounded interval $0 \le r < 0.5$. For any pair of markers there are a number of possible phases between these markers to consider, which describe the physical linkage between marker alleles. In the case of a pair of duplex x nulliplex (2x0) markers, these phases are termed “coupling”, “mixed” and “repulsion” (Figure 1.a). As the phase between markers is initially unknown, we must compute expressions for each of the possible phases, and select the most likely as the phase for which $0 \le r < 0.5$ and which maximises the log of the likelihood (Hackett et al.). Finally, we also compute the logarithm of odds (LOD) score, which provides a useful measure of the confidence in the estimate and is used for both marker clustering and marker ordering:

$$LOD = \log_{10} \left( \frac{\mathcal{L}(r = \hat{r})}{\mathcal{L}(r = 0.5)} \right)$$

where $\hat{r}$ is the maximum likelihood estimate of $r$.
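To make the estimation and LOD calculation concrete, here is a minimal sketch for the simplest possible case, in which recombinant and non-recombinant offspring can be counted directly, so that the likelihood is binomial. This is an assumption chosen for illustration; the marker-pair-specific functions $f_k$ are not reproduced. A plain golden-section search stands in for Brent's algorithm to keep the example dependency-free, and all function names are hypothetical.

```python
from math import log, sqrt

def log_likelihood(r, n_rec, n_nonrec):
    """ln L(r) up to a constant, for directly countable recombinant classes."""
    eps = 1e-12
    r = min(max(r, eps), 1 - eps)  # guard against log(0) at the boundary
    return n_rec * log(r) + n_nonrec * log(1 - r)

def maximise(f, lo=0.0, hi=0.5, tol=1e-9):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

def estimate_r_and_lod(n_rec, n_nonrec):
    """Maximum likelihood estimate of r on [0, 0.5] and its LOD score."""
    ll = lambda r: log_likelihood(r, n_rec, n_nonrec)
    r_hat = maximise(ll)
    lod = (ll(r_hat) - ll(0.5)) / log(10)  # log10 of the likelihood ratio
    return r_hat, lod

r_hat, lod = estimate_r_and_lod(n_rec=10, n_nonrec=90)
print(round(r_hat, 4), round(lod, 2))
```

When the observed recombinant fraction exceeds 0.5 the bounded search returns an estimate at the upper boundary with a LOD close to zero, which is how an unlinked (or wrongly phased) pair shows up in this simplified setting.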

2.2 Linkage analysis in the presence of preferential chromosomal pairing

In certain polyploid species the meiotic pairing is neither fully random nor fully partitioned into exclusively-pairing subgenomes, a situation described as segmental allopolyploid (Stebbins). Regardless of the underlying mechanism, the result of preferential pairing is that both the segregation ratios and the co-inheritance of marker alleles are affected. In the example of a 2x0 marker introduced earlier, the expected segregation ratio in a polysomic autotetraploid is 1:4:1. With increasing preferential pairing, this ratio will approach 1:2:1 in the case of subgenome-straddling markers (Figure 1.b right), or approach non-segregation in the case of subgenome-specific markers (Figure 1.b left).

In order to model this behaviour, we introduce a preferential pairing parameter ρ, such that (in the case of a tetraploid) the probability of the chromosome pairing configuration 12/34 is $\frac{1}{3} + \rho$ and the probability of pairing configurations 13/24 and 14/23 is $\frac{1}{3} - \frac{\rho}{2}$. Attempting to model preferential pairing at higher ploidy levels introduces further complications; Zhu et al. have proposed a solution for hexaploids by introducing three preferential pairing parameters θ1, θ2 and θ3 to model deviations in bivalent configurations 12/34/56, […] and […] respectively, with all other configurations having a probability of $\frac{1}{15} - \frac{1}{12}(\theta_1 + \theta_2 + \theta_3)$. In our software, we have not yet attempted to model segmental allohexaploidy, and confine our attention to the tetraploid level for now.
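The effect of ρ on the segregation of a duplex (2x0) marker can be worked through numerically. The sketch below assumes bivalent pairing only (no double reduction) and labels the preferentially pairing partners as homologues 1 and 2 (and hence 3 and 4); these labels and the function names are illustrative assumptions, not part of polymapR.

```python
from fractions import Fraction

def pairing_probabilities(rho):
    """Probabilities of the three tetraploid bivalent configurations.

    Configuration 12/34 is taken to be the preferred one, so it gains rho
    while the other two configurations each lose rho / 2.
    """
    rho = Fraction(rho)
    if not 0 <= rho <= Fraction(2, 3):
        raise ValueError("rho must lie between 0 and 2/3")
    third = Fraction(1, 3)
    return {
        ((1, 2), (3, 4)): third + rho,
        ((1, 3), (2, 4)): third - rho / 2,
        ((1, 4), (2, 3)): third - rho / 2,
    }

def duplex_segregation(allele_homologues, rho):
    """Gamete dosage distribution [P(0), P(1), P(2)] for a duplex marker whose
    two marker alleles sit on the given pair of homologues."""
    a, b = allele_homologues
    dist = [Fraction(0)] * 3
    for config, prob in pairing_probabilities(rho).items():
        paired = any(set(bivalent) == {a, b} for bivalent in config)
        if paired:
            # Both marked homologues share a bivalent: exactly one is transmitted.
            dist[1] += prob
        else:
            # Marked homologues sit in different bivalents and are each
            # transmitted independently with probability 1/2.
            dist[0] += prob / 4
            dist[1] += prob / 2
            dist[2] += prob / 4
    return dist

print(duplex_segregation((1, 2), 0))               # random pairing
print(duplex_segregation((1, 3), Fraction(2, 3)))  # straddling, fully preferential
print(duplex_segregation((1, 2), Fraction(2, 3)))  # subgenome-specific, fully preferential
```

Under random pairing the distribution is 1/6 : 2/3 : 1/6 (that is, 1:4:1); at the maximum ρ = 2/3 a subgenome-straddling marker segregates 1:2:1, while a subgenome-specific marker no longer segregates at all.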

We do not simultaneously estimate ρ and r, which can lead to an overestimation of the preferential pairing parameter (Wu et al., 2002). Instead, we estimate the chromosome-wide strength of preferential pairing after map construction and thereafter correct the pairwise recombination frequency estimates to revise the maps. A robust method of preferential pairing detection and estimation is to use inheritance probability estimates such as those provided by TetraOrigin (Zheng et al.); in polymapR we offer a simpler likelihood-based approach which uses closely-linked repulsion marker pairs to test for deviations from random pairing and simultaneously estimate the strength of this deviation:

$$\hat{\rho} = \frac{2(n_{01} + n_{10}) - 4(n_{00} + n_{11})}{3(n_{00} + n_{01} + n_{10} + n_{11})}$$

where $n_{01}$ is the number of offspring with a dosage 0 at marker A and 1 at marker B, etc.

Given a parent- and chromosome-specific estimate for the preferential pairing factor ρ, we modify the expression for the expected frequency of individuals in marker class $n_{ij}$ of a tetraploid as follows:

$$E(n_{ij}) = \left( \frac{1}{3} + \rho \right) f_1(r, 1-r) + \left( \frac{1}{3} - \frac{\rho}{2} \right) f_2(r, 1-r) + \left( \frac{1}{3} - \frac{\rho}{2} \right) f_3(r, 1-r)$$
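A worked sketch of estimating ρ from a closely-linked pair of simplex markers in repulsion follows. It rests on stated assumptions: recombination between the two markers is negligible, pairing is in bivalents only, and the two marked homologues are the preferential partners, so they pair with probability 1/3 + ρ. Under those assumptions the classes 00 and 11 arise only when the two homologues do not pair, with joint expected frequency 1/3 − ρ/2, and inverting that relation gives the estimator below. This is an illustrative derivation, not necessarily the computation performed inside polymapR, and the function names are hypothetical.

```python
def expected_class_fractions(rho):
    """Expected offspring fractions for classes 00, 01, 10, 11 (marker A, marker B)
    for a tightly linked simplex pair in repulsion."""
    paired = 1 / 3 + rho    # marked homologues share a bivalent: one of them is transmitted
    unpaired = 2 / 3 - rho  # marked homologues segregate independently
    return {
        "00": unpaired / 4,
        "01": paired / 2 + unpaired / 4,
        "10": paired / 2 + unpaired / 4,
        "11": unpaired / 4,
    }

def estimate_rho(n00, n01, n10, n11):
    """Estimate rho by inverting P(00 or 11) = 1/3 - rho/2."""
    total = n00 + n01 + n10 + n11
    if total == 0:
        raise ValueError("no offspring counts supplied")
    return (2 * (n01 + n10) - 4 * (n00 + n11)) / (3 * total)

print(estimate_rho(100, 200, 200, 100))  # counts in a 1:2:2:1 ratio, as under random pairing
print(estimate_rho(140, 460, 460, 140))  # excess of 01 and 10 classes
print(estimate_rho(0, 300, 300, 0))      # exclusive pairing of the two homologues
```

The three calls correspond to random pairing (estimate 0), moderate preferential pairing (estimate 0.2) and fully disomic behaviour (estimate 2/3, the upper limit of ρ).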


Due to the lack of symmetry, we must consider all possible conformations within each phase, an example of which is shown in Figure 1.b. The procedure for estimating r and LOD remains otherwise the same. The inclusion of preferential pairing imposes an extra computational burden as each phase can have up to four sub-phase conformations, all of which are calculated prior to selection of the most likely phase and its associated r and LOD score.

Finally, in both the case of random and preferential pairing, linkage calculations can be run in parallel (using doParallel (Microsoft Corporation and Weston)) on any Windows or Unix-like multi-core desktop computer, resulting in significant time-savings. High-density marker datasets with tens of thousands of markers can be processed in a few hours.

In diploid studies, the term linkage group is loosely synonymous with the term chromosome. In autopolyploids two levels of linkage group exist – homologue groups and integrated chromosomal groups. The first step in linkage group assignment is to cluster the 1x0 linkage data into homologue groups, for which we currently use the R package igraph (Csardi and Nepusz). Clustering is performed using the pairwise linkage LOD scores, although the LOD for independence can be used if desired, which may be more robust with skewed marker data (Van Ooijen and Jansen).

A number of visual aids are provided to assist in clustering (Figure 2). In general, clustering should be performed over a suitable range of LOD thresholds (e.g. from […] to 10) in order to inform the choice of LOD score to partition the data into both homologues and chromosomes (Figure 2.a, b). If chromosome and homologue clusters cannot be readily identified using 1x0 markers alone, coupling-phase homologue clusters are first identified at a high LOD and later reconnected into chromosomal clusters using a higher-dose marker type (Figure 1.c). Visualisations help display the strength of associations between homologues (Figure 2.c). Occasionally homologues may split apart; various possibilities to merge these fragments are provided (Figure 2.d, e, f).
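The idea of clustering markers over a range of LOD thresholds can be sketched with a small union-find routine that treats every marker pair at or above the threshold as connected. This is a simplified stand-in for graph-based clustering, not the polymapR or igraph implementation; the marker names and LOD values are made up for illustration.

```python
def cluster_markers(markers, pairwise_lod, threshold):
    """Group markers into clusters: two markers share a cluster if they are
    connected by a chain of pairwise LOD scores at or above the threshold."""
    parent = {marker: marker for marker in markers}

    def find(marker):
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[marker] != marker:
            parent[marker] = parent[parent[marker]]
            marker = parent[marker]
        return marker

    for (a, b), lod in pairwise_lod.items():
        if lod >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for marker in markers:
        clusters.setdefault(find(marker), []).append(marker)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical pairwise LOD scores between six 1x0 markers.
markers = ["A", "B", "C", "D", "E", "F"]
pairwise_lod = {
    ("A", "B"): 12.0,
    ("B", "C"): 9.0,
    ("C", "D"): 4.0,
    ("D", "E"): 11.0,
    ("E", "F"): 3.5,
}

for threshold in (3, 5, 10):
    print(threshold, cluster_markers(markers, pairwise_lod, threshold))
```

Sweeping the threshold shows how clusters fragment as the LOD requirement is raised: one cluster at a threshold of 3, three at 5 and four at 10 in this toy example, which is the kind of pattern used to choose a threshold separating chromosomes from homologues.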

In the case of triploid populations, the phasing approach differs between the diploid and tetraploid parents; for the diploid parent, phasing can be achieved directly using the phase assignment from the linkage analysis. Following the definition of the chromosome and homologue structure using the 1x0 markers, all other marker segregation classes are assigned to both homologues and chromosomes using their linkage to these markers, generating the final phase assignment of all marker types.

[…]

Citrullus lanatus […] Vitis vinifera […]

TetraploidSNPMap (TSNPM) […]

PERGOLA […]

[…] labours under no such “generalist” difficulties. Phase considerations are trivial and […] function to 400 cM using Haldane’s […] lengths of approximately 90 cM using Haldane’s and […]

work was supported through the TKI projects “A genetic analysis pipeline for polyploid crops” (project 001) and “Novel genetic and genomic tools for polyploid crops” (project number BO


Comparison of final linkage maps produced by polymapR (y-axis) and TetraploidSNPMap (x-axis) using the sample dataset provided with the polymapR package. For all linkage groups, the marker order, map length and the number of mapped markers were almost identical between both software packages.


Chapter 7

An ultra-dense integrated linkage map for hexaploid chrysanthemum enables multi- allelic QTL analysis

Geert van Geest1,2,3, Peter M. Bourke1, Roeland E. Voorrips1, Agnieszka Marasek-Ciolakowska4, Yanlin Liao1, Aike Post2, Uulke van Meeteren3, Richard G.F. Visser1, Chris Maliepaard1, Paul Arens1

1 Plant Breeding, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.

2 Deliflor Chrysanten B.V., Korte Kruisweg 163, 2676 BS, Maasdijk, the Netherlands

3 Horticulture & Product Physiology, Department of Plant Sciences, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands

4 Research Institute of Horticulture, Konstytucji 3 Maja 1/3, 96-100 Skierniewice, Poland

Published (with modifications) as Van Geest, G., Bourke, P.M., Voorrips, R.E., Marasek-Ciolakowska, A., Liao, Y., Post, A., et al. (2017), “An ultra-dense integrated linkage map for hexaploid chrysanthemum enables multi-allelic QTL analysis”, Theoretical and Applied Genetics 130, 2527-2541

153 ______Ultra-dense chrysanthemum map

Abstract

Construction and use of linkage maps is challenging in hexaploids with polysomic inheritance. Full map integration requires calculations of recombination frequency between markers with complex segregation types. In addition, detection of QTL in hexaploids requires information on all six alleles at one locus for each individual. We describe a method that we used to construct a fully integrated linkage map for chrysanthemum (Chrysanthemum x morifolium, 2n = 6x = 54). A bi-parental F1 population of […] individuals was genotyped with an […] SNP genotyping array. The resulting linkage map consisted of […] segregating SNP markers of all possible marker dosage types, representing nine chromosomal linkage groups and 107 out of 108 expected homologues. Synteny with lettuce (Lactuca sativa) showed local colinearity. Overall, it was high enough to number the chrysanthemum chromosomal linkage groups according to those in lettuce. We used the integrated and phased linkage map to reconstruct inheritance of parental haplotypes in the F1 population. Estimated probabilities for the parental haplotypes were used for multi-allelic QTL analyses on four traits with different underlying genetic architectures. This resulted in the identification of major QTL that were affected by multiple alleles having a differential effect on the phenotype. The presented linkage map sets a standard for future genetic mapping analyses in chrysanthemum and closely related species. Moreover, the described methods are a major step forward for linkage mapping and QTL analysis in hexaploids.

Keywords: autohexaploid, marker dosage, identity-by-descent, haplotype, polysomic polyploid


Introduction

A linkage map is a starting point for localisation of genomic regions that are associated with agriculturally important traits. This makes it an important tool for DNA-informed breeding (Peace, 2017). For polyploids, DNA-informed breeding has lagged behind compared to diploids, because genotyping of co-dominant markers and linkage map construction in polyploids require specialised methods. Such methods need to be able to handle diverse markers. As opposed to diploids, polyploids have multiple configurations of heterozygous genotypes. On a locus with two alleles, a hexaploid can have five different heterozygous genotypes ranging from a dosage of one to a dosage of five. Together with the two homozygous configurations, this adds up to seven different dosage scores.

Many linkage maps of polyploids are constructed with single-dose (presence/absence) markers using methods developed for diploids. Those kinds of maps are limited to representing only individual homologues. Integration of separate maps of homologous chromosomes is needed for transferability of results between mapping studies and mapping of traits with a complex genetic architecture. In an integrated map, all markers are located relative to each other, resulting in one representation of the positions of all mapped loci, irrespective of the phase of their alleles. This enables comparisons of linkage maps based on different populations. Map integration requires estimation of linkage between single-dose markers in repulsion or linkage between higher dose markers. Estimating linkage of markers in repulsion is different from diploids and can only be done with very low confidence, especially in a hexaploid (Wu et al.). Segregation ratios of higher dose markers are fairly complex, and calculation of recombination frequency needs specific statistical methods (Hackett et al.). Because of the complicated nature of recombination frequency estimation between higher dose markers, dedicated software is required.

In an outcrossing species, the number of alleles that can affect a trait in a single individual is the same as the ploidy level (Figure 1a). For QTL detection in an outcrossing full-sib population without any prior knowledge on the involved alleles, all twelve possible alleles that can be inherited from the parents should therefore be taken into account (Figure 1b). With use of the positions of markers on a non-integrated linkage map of homologues, only information on the presence or absence of one out of twelve parental alleles is available (Figure 1c). If the other eleven alleles are ignored, any QTL that does not have underlying alleles with major effect on the trait will be missed. Multi-allelic QTL mapping therefore needs information of all alleles per locus.

Figure 1. Marker allele considerations in an autohexaploid. a. Example of a cross of two parents with each six different alleles. b. An F1 progeny is composed of two gametes with recombinations from both parents. c. By considering a single 1:1 segregating marker, we can only characterise absence or presence of one allele in the F1 offspring (coloured band indicates observed presence of a marker allele). However, the actual situation is much more complex. On that same locus there are five other alleles that all could have specific effects on the phenotype. Genotype probability models try to reconstruct the actual situation by estimating absence and presence for each allele per locus.

Most progress in linkage mapping in an outcrossing hexaploid with polysomic inheritance has been reported in sweet potato (Ipomoea batatas). In this species, several non-integrated maps have been published (Ukoskit and Thompson; Kriegner et al.; Cervantes-Flores et al.; Chang et al.; Monden et al.; Shirasawa et al., 2017), with three publications reporting information on homologous chromosomes without actually integrating the maps. In two publications, this information was based on markers that had a dosage of two (duplex) in one parent and zero (nulliplex) in the other (2x0 markers), and markers with a dosage of three (triplex) in one parent and zero in the other (3x0 markers) (Ukoskit and Thompson; Cervantes-Flores et al.). Others have identified homologous chromosomes based on alignment to a reference genome (Shirasawa et al., 2017). Similar to sweet potato, chrysanthemum is an outcrossing hexaploid with polysomic inheritance (Van Geest et al., 2017c). Pairing at meiosis is primarily through bivalents, but multivalents occur (Roxas et al.; Chen et al.). Reported methods for linkage map construction (Zhang et al., 2010a; Zhang et al., 2011) and QTL analysis (Zhang et al., 2012a; Zhang et al., 2012b; Zhang et al., […]) for chrysanthemum have been limited to methods developed for diploids, and constructed maps are therefore not integrated.

In a hexaploid, an integrated linkage map is most preferably constructed by estimating linkage with higher dose markers. Those multi-dose markers can connect homologous chromosomes within parents and between parents and can therefore be used to integrate them. For tetraploids, methods to estimate linkage between higher dose dominant markers have been developed (Hackett et al.), and applied to construct integrated linkage maps (Meyer et al.; Luo et al., 2001; McCallum et al.). Later, these methods have been extended and applied to bi-allelic SNP markers (Hackett et al.; Bourke et al., […]; Bourke et al., 2017). Such methods would need to be extended to hexaploids in order to generate integrated linkage maps with use of higher dose markers.

An integrated linkage map can be used to reconstruct inheritance of parental haplotypes to approach a representation as in the example in Figure 1. The two alleles of bi-allelic SNPs can be in linkage disequilibrium with multiple haplotypes, each having a different effect on the phenotype. Such haplotypes can be identified based on the configuration of neighbouring alleles. Methods for reconstructing haplotype inheritance by estimating probabilities of identity-by-descent (IBD) in tetraploid bi-parental populations have been developed (Hackett et al.; Bourke; Zheng et al.). Although all methods are theoretically extendible to hexaploids, the method developed by Bourke is ploidy-level independent and is therefore directly applicable to hexaploids.

In this paper, we describe the construction of an integrated linkage map for all possible marker dosage types in hexaploid chrysanthemum. We are setting the standard for transferability of results by chromosomal linkage group numbering based on synteny with lettuce (Lactuca sativa) and by generating a core set of markers that can be used to anchor future maps. With the integrated linkage map, we reconstruct haplotypes based on parental origins using a relatively simple procedure. We demonstrate the usefulness for QTL mapping for four traits for which information of all twelve segregating alleles was taken into account.


Materials and methods

Plant material and phenotyping

We analysed the segregation of markers in a bi-parental population that consisted of […] individuals originating from a cross between […] (P1) and […] (P2), two daisy-type, white chrysanthemum cultivars. Phenotyping took place in the same experiment as described by Van Geest et al. (2017). In short, the offspring and parents were grown in three randomised blocks in each of three seasons: summer (May to July […]), late summer (August to October […]), and autumn (September to November […]). A replicate consisted of a field containing […] plants. Plants were grown in […], […] and […] days of […], […] and […] h photoperiods for the summer, late summer and autumn, respectively. To induce flowering, they were subsequently grown in 12 h photoperiods for the plants grown in summer and late summer and in 11 h photoperiods for the plants grown in autumn. Flower colour was recorded based on visual observation. If flowers were completely white, they were scored as 0, if they were slightly pink as a 1, and pink flowers were scored as 2. Flowering time was recorded as the number of days (at short photoperiod) needed to reach commercial maturity for at least […] of the plants grown in a single field. The number of ray florets was counted for the third flower head from the top of one flower stem for each replicate. The phenotypic scores obtained for disk floret degreening are described by Van Geest et al. (2017). Heritability was calculated by dividing the estimated genotypic variance by phenotypic variance. Variances were estimated using an analysis of variance (ANOVA) with trial and genotype as fixed effects.

Mitotic chromosome counting

For mitotic metaphase chromosome analysis, 1 cm long roots were collected from […] (P1) and […] (P2) and incubated in Eppendorf tubes in ice water for […] h and then fixed in ethanol–acetic acid ([…]) solution for […] h. Roots were stored in fixative at […] until use. For chromosome preparations, the root tips were washed […] times in […] min in enzyme buffer ([…] citric acid–sodium citrate, pH […]) and incubated in an enzyme mixture containing […] (w/v) pectolyase […], […] (w/v) cellulase […] at […] for about […] h. Squash preparations were made in a drop of […] acetic acid and frozen in liquid nitrogen. The cover slips were removed by using a razor blade. The slides were then dehydrated in absolute ethanol, air dried and stained with […] 4′,6-diamidino-2-phenylindole (DAPI, Sigma) in Vectashield (Vector Laboratories). Images of fluorescently stained chromosomes were acquired using a Canon digital camera attached to an Axiophot microscope with an appropriate filter and then processed using […]

Genotyping and marker quality filtering

[…] p […] χ² […]

Linkage map construction

[…] r in the range 0 ≤ r […] suggested by the authors: we used Haldane’s mapping function, two […]

[…] with markers that formed the backbone clustering. If there were at least five coupling linkages with […] markers at a LOD greater than […], alleles were assigned to a homologue. If the number of marker alleles was not equal to the number of assigned homologues, the marker was not included in the phased map.

As SNP markers were discovered from an RNA-seq derived transcriptome assembly, each marker is associated with a transcript contig sequence. This information was used to investigate the quality of the map. Markers of the 1x0 type that originated from the same transcript contig should have a distance approaching 0 cM on the integrated map, assuming the contig was assembled correctly. For each homologue combination and for each linkage group, an overall deviation was quantified by calculating the root mean square error (RMSE) of these differences on the integrated map for all mapped 1x0 markers originating from the same transcript contig.
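A minimal sketch of this quality measure is given below, assuming that each same-contig marker pair is available as a pair of integrated-map positions in cM. The function name and the input layout are illustrative assumptions, not the authors' implementation.

```python
import math

def rmse_from_zero(position_pairs):
    """Root mean square of map-distance differences (cM) for same-contig marker pairs.

    position_pairs: list of (position_a, position_b) tuples, one per marker pair
                    originating from the same transcript contig (hypothetical layout).
    """
    if not position_pairs:
        raise ValueError("need at least one marker pair")
    # The expected distance is 0 cM, so each difference is itself the error term
    squared = [(a - b) ** 2 for a, b in position_pairs]
    return math.sqrt(sum(squared) / len(squared))

# Example with three made-up marker pairs
pairs = [(10.0, 10.0), (25.0, 28.0), (40.0, 36.0)]
print(round(rmse_from_zero(pairs), 3))
```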

To enable alignment of any future linkage maps in chrysanthemum by gene sequences, markers were identified that originated from contigs representing characterised genes. For this, mapped markers were aligned to all proteins from the UniProt database from Chrysanthemum x morifolium (taxonomy […]) using BLAST (Altschul et al., […]). Hits were filtered for alignment lengths greater than […] and more than […] identity. A subset of markers originating from these filtered transcript contigs, spread over all linkage groups, was selected to form a reference linkage map.

Synteny with lettuce

To investigate the synteny of the integrated linkage map with lettuce (Lactuca sativa), mapped transcript contigs were aligned to the mapped unigenes of lettuce as available from the Lettuce Chip Project website (Truco et al., 2013) using BLAST (Altschul et al., […]). Unique hits with an e-value smaller than […] were used to assess synteny. Chrysanthemum CLGs were renumbered based on the number of alignment hits with the lettuce linkage groups.

IBD probabilities

In order to estimate presence of parental haplotypes in the offspring, we calculated IBD probabilities as described by Bourke ([…]). A schematic overview of the method is shown in Figure 2. Based on the phased map and linkage information, IBD probabilities per marker locus were calculated for each member of the population in two steps. The information was stored in a three-dimensional array for each chromosomal linkage group, with marker, offspring individuals and homologue on the x, y and z dimensions. In the first step, only fully informative dosage scores were used to fill the probability array. This means that if a progeny has inherited all alleles of a marker on specific homologues, this progeny will be assigned an IBD probability of 1 for these homologues at that marker locus. If none of the alleles are inherited, the IBD probabilities for these homologues at the locus would be 0. In general, any scores in the progeny that were larger than zero and smaller than the sum of the parental dosage scores were considered as non-informative, e.g. for a 1x1 marker, progeny with a dosage of 0 or a dosage of 2 were considered informative, and with a dosage of 1 non-informative. Probabilities of loci at homologues of progeny that had non-informative marker scores were given a starting probability of […]. In the second step, inter-marker distance was used to estimate the IBD probabilities of homologue loci with non-informative marker scores. For each marker locus at each homologue in each progeny the closest informative marker was located, and IBD probabilities were calculated based on this closest informative marker, where:

$$P_i = \begin{cases} 1 - r_{ij}, & \text{if } P_j = 1 \\ r_{ij}, & \text{if } P_j = 0 \end{cases}$$

Here, P represents the IBD probability, i indicates a marker without full information and j indicates a fully informative marker, and r represents the recombination frequency between an informative and a non-informative marker, as calculated using Haldane's mapping function from the estimated distance on the integrated map. After assignment of probabilities, the sum of probabilities per parent was normalised to three, as there are three homologous chromosomes in a gamete. For each homologue in each individual, a cubic spline was fitted over IBD probabilities versus position to calculate IBD probability interpolations over 1 cM intervals. Genotype information content (GIC) for interval k at homologue h for n individuals was calculated with the following formula:

$$GIC_{hk} = 1 - \frac{2}{n} \sum_{i=1}^{n} \left| P_i - \lfloor P_i \rceil \right|$$

where

$$\lfloor P_i \rceil = \begin{cases} 0, & 0 \le P_i \le 0.5 \\ 1, & 0.5 < P_i \le 1 \end{cases}$$

This results in a score for GIC ranging from 0 to 1, where 0 represents a locus with little information, and 1 a locus with complete information.
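The GIC calculation can be illustrated with a short sketch. It assumes the IBD probabilities for one homologue at one map interval are supplied as a plain list with one value per individual; the function name is hypothetical.

```python
def gic(probabilities):
    """Genotype information content for one homologue at one interval.

    probabilities: IBD probabilities (each between 0 and 1), one per individual.
    """
    n = len(probabilities)
    if n == 0:
        raise ValueError("need at least one individual")
    # Distance of each probability to its nearest certain value:
    # 0 when P <= 0.5, otherwise 1
    deviations = [p if p <= 0.5 else 1.0 - p for p in probabilities]
    return 1.0 - (2.0 / n) * sum(deviations)
```

With all probabilities equal to 0 or 1 the function returns 1, and with all probabilities equal to 0.5 it returns 0, which matches the two ends of the scale.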

QTL mapping

QTL analysis was performed on block-corrected mean phenotypic values using an IBD probability model, as described before for tetraploids (Bourke, […]). An additive model modified from Kempthorne ([…]), as suggested by Hackett et al. ([…], […]), was modified to the hexaploid level:


$$Y = \mu + \alpha_2 X_2 + \alpha_3 X_3 + \alpha_4 X_4 + \alpha_5 X_5 + \alpha_6 X_6 + \alpha_8 X_8 + \alpha_9 X_9 + \alpha_{10} X_{10} + \alpha_{11} X_{11} + \alpha_{12} X_{12}$$

where α_i and X_i are the main effects and indicator variables for allele i, respectively. The parameters representing homologue 1 and homologue 7 were taken as the reference classes and were therefore omitted from the model, as in all cases three alleles are inherited per parent. To calculate the significance threshold for detecting significant QTL, a thousand permutations were run with randomly permuted phenotypes (Churchill and Doerge, […]), taking the […]th percentile of the ordered minimum p-values from each genome-scan analysis as an approximate significance threshold. To identify homologues affecting the trait, a simple linear model was run for each of the twelve alleles separately.
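The permutation procedure can be sketched in a simplified form. Note the substitutions made for illustration: instead of the minimum p-value from the full additive model, the sketch records the maximum squared correlation between the phenotype and single-allele IBD probabilities across positions, and the `quantile` default of 0.95 is an assumption because the percentile used in the study is not recoverable from the text. All names are hypothetical.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)

def permutation_threshold(phenotypes, predictors, n_perm=1000, quantile=0.95, seed=1):
    """Approximate genome-wide threshold from permuted phenotypes.

    predictors: list of per-position lists of IBD probabilities,
                each holding one value per individual (hypothetical layout).
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        # Break the genotype-phenotype link while keeping both distributions intact
        rng.shuffle(shuffled)
        # Keep the most extreme statistic of this genome scan
        maxima.append(max(r_squared(x, shuffled) for x in predictors))
    maxima.sort()
    index = min(len(maxima) - 1, int(quantile * len(maxima)))
    return maxima[index]
```

Taking the most extreme statistic per permuted genome scan, rather than per position, is what makes the resulting threshold genome-wide.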

Figure 2. Visualisation of the estimation of IBD probabilities. A hypothetical integrated linkage map and the separate linkage maps of the six homologues of one parent are shown in dark grey and white, respectively. In the upper panel of the line graph (probability […]), the calculation of probabilities for homologue […] is shown for marker loci v (pink, triplex), w (purple, duplex) and y (green, simplex) in a situation in which all alleles of marker x (blue, simplex) and z (red, duplex) are inherited. Since all alleles of loci x and z are inherited, these loci get an IBD probability of 1 for inheritance of homologue […]. For marker loci v, w and y, none of the marker alleles are present on homologue […]. It is therefore not known whether […] is inherited at these loci. The orange lines depict the relationship between genetic distance and recombination frequency (r as a function of map distance). Because the distance between all marker combinations is known from the integrated map, we estimate the IBD probabilities of loci v, w and y as 1 − r in case of inheritance of all alleles of x and z, where r is the recombination frequency between the locus and the closest informative marker (marker x in the case of w and v, and z in the case of y). The lower panel of the line graph, shaded in grey (probability […]), depicts the situation where none of the alleles of loci x and z are inherited. Here, probabilities for v, w and y are estimated as r.
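The second estimation step illustrated in Figure 2 can be sketched as below. This is a simplified illustration under stated assumptions: it covers only the nearest-informative-marker rule for a single homologue in a single individual, omits the per-parent normalisation and the spline interpolation, and uses hypothetical function names and input layout.

```python
import math

def haldane_r(distance_cm):
    """Recombination frequency for a map distance in cM (Haldane's mapping function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))

def ibd_from_nearest(position, informative):
    """IBD probability at `position` based on the closest fully informative marker.

    informative: list of (position_cM, probability) pairs, where probability is
                 0 or 1 for the homologue of interest (hypothetical layout).
    """
    # Locate the closest fully informative marker j
    pos_j, p_j = min(informative, key=lambda m: abs(m[0] - position))
    r = haldane_r(abs(pos_j - position))
    # P_i = 1 - r if the homologue was inherited at marker j, otherwise r
    return 1.0 - r if p_j == 1 else r
```

For example, a locus 2 cM from an informative marker with probability 1 receives a probability just below 1, and the estimate decays towards 0.5 as the distance grows.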


Results

Linkage map

After removal of markers that were non-segregating, had distorted segregation or had more than […] missing values, […] markers remained in the dataset. Of those, […] had unique dosage scores across the progeny (Figure 3). Because markers with identical dosage scores in each individual (non-unique markers) will map to the exact same position, they were reduced to a single unique marker for calculation of linkage and ordering. The others were added to the linkage map after map construction with only unique markers.
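The reduction to unique markers and their later re-insertion can be sketched as follows. The function names and the dictionary layout are assumptions made for illustration, and the sketch treats two markers as duplicates only when their dosage vectors are identical in every individual.

```python
from collections import defaultdict

def collapse_duplicates(dosages):
    """Group markers with identical dosage vectors.

    dosages: dict mapping marker name -> sequence of dosage scores per individual.
    Returns a dict mapping each representative marker -> list of its duplicates.
    """
    groups = defaultdict(list)
    for name, scores in dosages.items():
        groups[tuple(scores)].append(name)
    # The first marker seen in each group acts as the unique representative
    return {names[0]: names[1:] for names in groups.values()}

def add_back(positions, duplicates):
    """Give every duplicate marker the map position of its representative."""
    full = dict(positions)
    for representative, dups in duplicates.items():
        for name in dups:
            full[name] = positions[representative]
    return full
```

Only the representatives need to enter the linkage and ordering calculations, which is where the computational saving comes from.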

Figure 3. Distribution of 19 different marker types segregating in the bi-parental population. Total number of markers: […], of which […] were unique. The non-unique markers had duplicate dosage scores across the population. The labels on the axis represent marker segregation types such as simplex x nulliplex (1x0) etc. ("dosage parent 1" x "dosage parent 2").

Simplex x nulliplex and nulliplex x simplex (1x0 and 0x1) markers were used to construct backbone clusters that represent homologues. This resulted in […] clusters for P1 and […] clusters for P2, each containing five or more markers. Chromosome counting showed that both parents had 2n = […] chromosomes, the expected euploid chromosome number. Our dataset was therefore lacking markers identifying one out of the […] homologous chromosomes of […]. Identification of chromosomal linkage groups (CLGs) with simplex x simplex (1x1) markers resulted in a network of nine CLGs representing all homologue clusters of both parents. All other marker types were subsequently assigned to a CLG based on linkage with 1x0 and 0x1 markers. In total, […] unique markers could be assigned. Markers were ordered per CLG based on recombination frequency with LOD as weights, resulting in map lengths ranging from . to .0 cM. After ordering, the groups of non-unique markers were added to the linkage map based on the position of their unique representing marker, resulting in a linkage map containing 012 markers (. of initial; Table 1). Of the ordered markers, the alleles of 2 (. of initial) could be phased to an expected number of homologues based on parental dosages with at least five significant linkages to 1x0 markers (Table 1), resulting in a fully phased linkage map.

Table 1. Summary statistics of integrated linkage map

CLG a    Length (cM)    N b    Phased markers    Contigs c    Rounds d
1 2 2 22 11 2 2 . 1 110 111 3 . 20 2 12 4 .2 01 21 1 5 0. 2 10 6 1.1 1 1 2 7 1. 121 2 8 0 10 2 9 .1 10 0 1 sum 2.1 012 2 102 mean . 12 1

a Chromosomal linkage group.
b Number of mapped markers.
c Number of transcript contigs associated with mapped markers.
d Number of rounds of problematic marker removal and reordering after the first ordering.

Synteny with lettuce and reference map

We aligned mapped transcript contigs of chrysanthemum with mapped lettuce unigenes (Figure 4). We aligned the 102 mapped chrysanthemum transcript contigs to 121 mapped lettuce unigenes and obtained […] unique hits with an e-value smaller than 1100. This resulted in the identification of syntenic linkage groups between lettuce and chrysanthemum. All combinations of linkage groups of chrysanthemum and lettuce with the maximum number of hits were unique, except for CLG […] and CLG […]. These two CLGs both had most hits with lettuce […]. The chrysanthemum CLG with least hits to lettuce […] was renumbered based on the non-assigned linkage group from lettuce. This combination still had 12 hits, indicating partial similarity. Syntenic analysis per CLG resulted in identification of large regions with linear correspondence between locations of genes, so the genomes appear to be partly co-linear at a local scale. This was not clear at a larger scale, as syntenic regions were scattered across the linkage map of lettuce, which can be interpreted as each chromosome carrying major inversions and translocations. With use of this data we based the numbering of chrysanthemum CLGs on the number of significant alignments of mapped transcripts. To mark these nine chrysanthemum linkage groups we present 2 defining markers. These are evenly spread over all nine CLGs and originate from contigs representing genes coding for protein entries of the UniProt database (Figure 5). This should be a useful tool for future studies in chrysanthemum.

Figure 4. Synteny between the lettuce ultra-high density map (Truco et al. 2013) and chrysanthemum. Each dot represents a significant alignment between lettuce unigenes and chrysanthemum transcript contigs.

Linkage map quality

We used two analyses to evaluate the quality of the linkage map. First, to investigate the concordance between estimated pairwise r (rpairwise) and r based on map distance (rmap), these two estimators of r were plotted against each other. With high LOD scores these two estimators were in concordance with each other over a wide range of r (from 0 to 0.). Second, to evaluate the position of nearby 1x0 markers in coupling and repulsion, we aligned the position on the integrated map of 1x0 markers that originated from the same transcript contig from the RNA-seq assembly. The positions of markers that originated from the same contig and had the same phase ([…] markers in total) aligned nearly perfectly (Figure 6), indicating low error rates. The position of markers phased on different homologues (2 markers in total) was more spread. The residual mean squared error (RMSE) was calculated for each linkage group (Figure […]) and each combination of homologues from the same linkage group. RMSE was generally below […] cM, with some outliers on CLG 2 and CLG […]. These outliers were caused by one to […] and two markers, respectively.

Figure 5. Integrated linkage map of phased markers with 1x0 markers (black), other marker types (grey) and CLG defining markers (red)

IBD probabilities

The presence of each of the twelve segregating haplotypes per locus was estimated in all progeny individuals at 1 cM map intervals, which was expressed in IBD probabilities. In the middle of the CLGs the IBD probabilities could be estimated with high confidence. If there were no markers in large parts of one homologue, the IBD probabilities could still be close to 0 or 1 because information from the five other homologues can complement the missing information. Even if no markers were mapped on the entire homologue (e.g. homologue 12 from P2 on linkage group […]), IBD probabilities were complemented with information from the other five homologues. Genotype information content was lower towards telomeres, because in those regions markers were often missing in a large range in at least two homologues and informative markers were present on only one side.


Figure 6. Scatterplot of marker positions of 1x0 markers on the integrated map that originated from the same transcript contig. Each dot represents a combination of markers that originated from the same transcript contig. The red dots indicate markers phased on the same homologue, grey dots on different homologues. The black line represents y = x.

QTL mapping

The population was phenotyped for four different traits: flower colour, flowering time, disk floret degreening and number of ray florets. All four traits had a moderately high heritability, ranging from 0. to 0.2. The phenotypes were fitted against the IBD probabilities at 1 cM intervals with a main effects model.

Two regions were highly significantly associated with flower colour at CLG […] and CLG […], and one region at CLG […] was slightly associated (Figure 7). The highly significant loci were both simplex (Figure 8a). Analysis of variance of the interaction between the associated alleles showed a highly significant (p 11) interaction, indicating that both alleles need to be present to get a pink flower colour. Together, the two 1x0 markers that were most closely linked to each of the QTL explained . of the variation, indicating that the trait is mainly inherited by two major alleles segregating from two loci from each of the two parents. There is a minor QTL on CLG […], but some genotypic variation is still to be explained by undetected QTL.


Figure 7. QTL analysis of flower colour (purple), flowering time (red), disk floret degreening (green) and number of ray florets (blue). Significance thresholds based on 1000 permutations are indicated by a dashed line.

For flowering time we found three clear QTLs on CLG 2 and […], and one minor QTL at CLG […] (Figure 7). For the simplex QTL on CLG […], presence or absence of the allele at homologue 11 had a major effect on the trait. In both loci on CLG 2 and CLG […], presence of different alleles could have a positive effect, a negative effect or no significant effect on the phenotype (Figure 8b). Therefore at least three alleles underlie the QTLs. For disk floret degreening, three QTLs located on CLG […], […] and […] were detected (Figure 7). For all three QTLs multiple alleles played a role (Figure 8c). The QTL on CLG […] explained most phenotypic variation, and presence of the allele on homologue […] had the strongest effect on the mean value of disk floret degreening. For number of ray florets, two minor QTLs were found on CLG […] and CLG […]. The QTL on CLG […] was affected by one allele from the maternal parent. The QTL on CLG […] was affected by alleles that originated from only the maternal parent, with opposite effects from different homologues (Figure 8d).


Figure 8. Analysis per homologue for four QTL. Traits studied were a. flower colour at CLG […], b. flowering time at CLG 2, c. disk floret degreening at CLG […], d. number of ray florets at CLG […]. The p-value for testing the significance of the explained variation of IBD probabilities of a single allele versus phenotype ($Y = \mu + \alpha_i X_i$) is shown as a heatmap. The estimated effect of full absence or presence of an allele on the phenotypic value is shown in the black points. The plot limits of the effect are shown in the black box, meaning that the black points can range within the negative and positive value between the boundaries between homologues, indicated by grey lines. The dashed grey lines represent an effect of zero.


Discussion

In this paper we report the first integrated linkage map in a hexaploid species with polysomic inheritance. We were able to assign multi-dose markers to their parental homologues. With this phasing information we could reconstruct inheritance of haplotype alleles in the bi-parental population and perform QTL analyses. We provide major steps to overcome a number of limitations to linkage map construction based on SNP markers in hexaploids, including full map integration and phasing.

An integrated and phased linkage map

The ultra-dense integrated map contained markers of all 19 possible types in a hexaploid. With our approach we first defined backbone marker clusters that represented homologues based on linkages between simplex x nulliplex (1x0) markers. With 1x1 markers that contained information about homologous chromosomes we created networks of 1x0 linkage groups that represented CLGs (chromosomal linkage groups). Subsequently the other marker types were assigned to these backbone clusters. Because we first defined backbone marker clusters based on 1x0 markers, definition of homologues relied on presence of 1x0 markers. Lack of 1x0 markers on a homologue is possible if there is a high degree of inbreeding, if the population is from a selfing, or if there is selection for a phenotype for which alleles have an additive effect. However, our experience in polyploid mapping to date has shown that low-dosage markers tend to be the most abundant marker type (Bourke et al. 201; Bourke et al. 201; Van Geest et al. 201c). Nevertheless, we were not able to define one homologue on linkage group […], even though all others each had 1 markers or more. Too few markers therefore seems unlikely. A reason could be that some combinations with alleles on this homologue might have been lethal, and 1x0 markers on this homologue might therefore have had highly distorted segregation. These markers would have been filtered out prior to linkage mapping.

The MDSMap algorithm (Preedy and Hackett 201) has proved particularly useful for the weighted ordering of our large number of diverse marker types. As the LOD for linkage varies for the same values of r for different marker type combinations and phases, a weighted ordering algorithm was required. The most frequently used algorithm for weighted ordering is based on a weighted linear regression (WLR) algorithm as deployed in JoinMap (Van Ooijen 200). However, current processor speed of desktop computers limits the use of this algorithm to approximately a hundred markers per linkage group. Because of this restriction, Bourke et al. (201) used the WLR algorithm to construct homologue maps separately, which were later integrated. In a subsequent mapping study in tetraploid rose the MDSMap algorithm was used to construct an integrated map containing over 2 SNPs without the need for binning or the separation of homologue maps before integration (Bourke et al. 201). The MDSMap algorithm also forms the core of the map-ordering module within TetraploidSNPMap software (Hackett et al. 201), although its release as a separate package opens up the possibility of high density mapping at any conceivable ploidy level, given pairwise recombination frequency information. With the MDSMap algorithm we were able to order all markers from a CLG in one run (ranging from 121 to 20 markers per CLG), resulting directly in an integrated map. This number of markers would previously have been completely intractable using a WLR. With the new algorithm such maps can be produced on an average desktop computer within hours. As the ordering step is time and resource efficient, running multiple rounds of mapping with removal of problematic markers is much more feasible.

Linkage map quality was assessed based on two analyses: on the concordance between rpairwise and rmap, and on the relative position between 1x0 markers originating from the same contig from our transcriptome assembly. According to the comparison of rpairwise and rmap, the two estimators were in concordance if rpairwise could be estimated with high confidence (high LOD). Therefore there is little discrepancy between the distance and ordering of different combinations of markers on the linkage map and their initial estimation of r. The second analysis resulted in information on the quality of local integration of homologous chromosomes. This is based on the assumption that recombinations are essentially absent within a transcript contig. Therefore we would expect that markers originating from the same contig have a distance very close to 0 cM. From the position of markers originating from the same contigs a difference from zero can be calculated, and with that a measure for error; we used the RMSE. The RMSE of 1x0 markers mapped on the same homologue was generally very low, indicating both a high-quality assembly of transcripts containing mapped markers and high-quality local ordering at the level of the homologue. The RMSE of 1x0 markers phased on different homologues was higher. Of all marker-type combinations, the estimation of genetic distance between 1x0 markers in coupling is most accurate. Estimation of distance between 1x0 markers in repulsion relies on higher dose markers, because high confidence estimation of recombination frequency in repulsion of 1x0 markers was not possible with our population size of 0. The positions of 1x0 markers that are on different homologues relative to each other are therefore estimated with lower certainty than if they were in coupling phase. However, errors in estimating distance between these 1x0 markers in repulsion were in general lower than […] cM. If serious ordering issues occurred, a much higher value would be anticipated. Nevertheless, there were four homologue combinations with RMSE values higher than 10 cM. Only one or two markers per CLG caused these high values. As these markers were not associated with any notable stress on the map, it is likely that these markers were actually from different loci in the genome, and the contigs they originated from may be the result of a chimeric contig assembly from two very similar transcripts originating from the same chromosome.

Earlier linkage maps of chrysanthemum are based on […] (Zhang et al. 2010a) and […] markers (Zhang et al. 2011a). A disadvantage of these types of molecular markers is that they are difficult to transfer, and different linkage maps therefore cannot be integrated. SNP markers are sequence based, and executing single SNP assays like […] or TaqMan are commonly applied laboratory procedures. They can therefore be flawlessly transferred between laboratories. To set a standard for chrysanthemum, we present the sequences of a set of 2 well-distributed markers originating from conserved coding sequences that can be used as a core set to align future linkage maps to each of the chromosomal linkage groups presented here.

Estimating IBD probabilities

We used a relatively simple approach to estimate IBD probabilities for absence or presence of parental haplotypes in our segregating population (Bourke 201). The method only uses information of dosage scores if they are fully informative. This means that in case of a 1x1 marker, for example, a dosage of 0 and a dosage of 2 in the progeny are fully informative (while assuming absence of double reduction), because they represent inheritance of respectively none of the associated homologues or both. A dosage of 1 is not fully informative, as it is not known from which parental homologue the allele originated. Therefore higher dose markers carry relatively few informative dosage scores. A consequence of our method is that it is only accurate if markers with a large fraction of informative dosage scores are equally distributed over the homologues. In our data, parts of homologues were sometimes poorly endowed with informative markers. This did not turn out to be problematic if at that position all other five homologues for that parent carried enough information. More sophisticated methods have shown that higher dose markers add more information to the estimation of IBD probabilities (Hackett et al. 201; Zheng et al. 201). Such methods could result in more accurate estimates, but an adequate marker distribution over all homologues is key to all methods.

The accuracy of genetic analysis based on IBD probabilities relies on the quality of the integrated map. If the estimation of distance between markers with alleles on different homologues is poor, estimation of IBD probabilities of alleles on the presumed same locus will be wrong and will therefore provide a poor representation. However, the RMSE of the marker positions on the integrated map was generally well below 5 cM. This would not have a large effect on the estimation of IBD probabilities, because according to Haldane's mapping function a distance of 5 cM corresponds to a recombination frequency of […], resulting in a relatively low error of […] on the estimation of IBD probabilities.
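As a worked check of the conversion referred to above, Haldane's mapping function with the map distance d expressed in Morgans (so 5 cM is d = 0.05) gives:

$$r = \tfrac{1}{2}\left(1 - e^{-2d}\right) = \tfrac{1}{2}\left(1 - e^{-0.1}\right) \approx 0.048$$

The rounding shown here is illustrative and is not taken from the original text.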

QTL mapping

With the integrated map and IBD probabilities we were able to perform a multi-allelic QTL analysis. In a polyploid this type of analysis has large advantages over the use of methods that are developed for diploids, because QTL that are regulated by multiple different alleles can be detected and their genetic architecture investigated (Hackett et al. […]). In a polyploid, more than two alleles can underlie a QTL. This means that the QTL genotype does not only have a dosage but can also be multi-allelic, i.e. not only different conformations of the alleles […] and […], but also combinations of e.g. […], […] and […] are possible within a locus. To investigate the genetic architecture, and with that the occurrence of multi-allelic QTL, we performed a QTL analysis that makes use of IBD probabilities for four traits with different underlying genetic architecture.

The major loci associated with flower colour were bi-allelic. Together the QTL explained a large part of the phenotypic variation and were affected by one allele for each of the two loci. The two loci clearly showed an interaction, suggesting that presence of both alleles is needed for pink colouration. In chrysanthemum, pink colouration is caused by anthocyanin accumulation (Stickland, […]). The interaction between alleles could be caused by the requirement of two enzyme variants needed for the production or regulation of production of anthocyanin, or by two gene copies that are required for the same limiting step, needing the additive effect of both to become visible.

Several QTLs associated with flowering time, disk floret degreening and number of ray florets were multi-allelic. These QTLs had underlying alleles with a positive effect, a negative effect and no significant effect on the phenotype, indicating presence of at least three alleles. The exact number of unique alleles that affect the phenotype is difficult to determine. Two haplotypes that have the same effect on the phenotype could have the same underlying polymorphism affecting the phenotype, which would make them the same alleles. On the other hand, they could contain different causative polymorphisms that have a similar effect on the phenotype. Based on our data it is not possible to uniquely identify such alleles, because our analysis is based on genetic linkage and the causative alleles cannot be identified.

Compared to flower colour, the genetic architecture for flowering time was more complex. The QTL at CLG […] was bi-allelic, meaning that presence of one allele affected the trait, whereas the other eleven alleles did not significantly affect the phenotype. However, in two other major QTL on CLG […] and CLG […], multiple alleles were involved. Other studies on the inheritance of flowering time in chrysanthemum also suggested involvement of multiple loci (Zhang et al. […]; Zhang et al. […]a). Flowering time in short day plants is mainly the result of an interaction between growth rate and signal transduction of environmental cues like day-length and temperature. As these cues are strictly controlled in a greenhouse, the role of the environment would be expected to be relatively small. This is supported by the relatively high heritability, which was also found earlier (De Jong, […]). However, genetic regulation of signal transduction and growth rate is likely complex, and it is therefore not surprising that multiple loci are involved.

Disk floret degreening is an important determinant of postharvest performance of chrysanthemum after long storage (Van Geest et al. […]). Three multi-allelic QTL were identified. These explained a relatively small fraction of the phenotypic variation. Disk floret degreening is a physiologically complex trait; in the investigated population it is related to carbohydrate content of the disk floret at harvest (Van Geest et al. […]). Many sub-traits could affect carbohydrate content, including genotypic variation related to photosynthetic rate and source-sink relationships. Furthermore, it was shown that carbohydrate content is not the only factor affecting degreening (Van Geest et al. […]). It is therefore not surprising that we did not find major QTL for disk floret degreening. Dissecting the trait further, phenotyping for sub-traits such as carbohydrate content, or back-crossing progeny harbouring specific trait characteristics might help to further identify specific loci underlying this complex trait.

The number of ray florets had the highest heritability of the investigated traits, but least variation could be explained by detected QTL. Asteraceae plants carry composite flower heads that are comprised of multiple florets. Those florets can be categorised into disk florets and ray florets. The number of ray florets is affected by the number of florets on a capitulum and the organ identity of those florets. Regulation of floret identity is generally inherited through one or two major loci in Asteraceae (Gillies et al. […]). It is therefore quite unexpected that we did not find any major QTL associated with the trait. As both parents were of the single flower type, it is possible that both lacked allelic variation in the major genes and we only found variation in more complexly regulated minor allelic effects.

In the QTL analyses, possible interactions between alleles were not taken into account. An alternative model as described by Hackett et al. ([…]), which uses all possible genotype classes as parameters, would enable detection of interactions. However, the method we used to estimate IBD probabilities is not able to estimate probabilities for these genotype classes directly. More importantly, in a tetraploid there are 36 possible genotype classes ($\binom{4}{2} \times \binom{4}{2}$), leading to a model with […] parameters that is already prone to overfitting. In a hexaploid this would be 400 genotype classes ($\binom{6}{3} \times \binom{6}{3}$), leading to […] parameters; overfitting would definitely become an issue.

Our results show that hexaploidy in chrysanthemum complicates QTL analysis, because multiple alleles with a differential effect can underlie an associated locus. With the integrated map and IBD probabilities we were able to identify inheritance of parental haplotypes in the progeny, enabling us to identify effects of specific alleles that affected the phenotype. We indeed found clear examples in which different alleles from the same locus and parent affected the trait negatively or positively. With these findings we show that polyploids with polysomic inheritance can harbour much more diversity on a single locus compared to a diploid, and this is very important to take into account during QTL detection and breeding.

Conclusions

The methods described in this paper enable construction of integrated linkage maps in hexaploids with polysomic inheritance. Our presented methods can be used for future projects that aim to construct integrated linkage maps and perform multi-allelic QTL analyses in hexaploids. Success of such projects depends on several features of the investigated organism and the obtained dataset. First, it depends on the predominance of random bivalent pairing at meiosis. Second, sufficient and evenly distributed markers are required that can define each homologous chromosome. Last, higher-dose co-dominant markers, both uni-parental and bi-parental (e.g. […] and […]), with alleles on each homologous linkage group are needed that provide information to integrate homologous linkage groups into chromosomal linkage groups. With the resulting integrated linkage maps it is possible to perform QTL analysis that takes all possible alleles into account at the same locus. This has major impact on the possibilities for localisation of genomic loci and their genetic architecture associated with traits in chrysanthemum, but also for other agriculturally important hexaploid species such as sweet potato, kiwi and persimmon.

Acknowledgements

The authors would like to thank Katherine Preedy and Christine Hackett of BioSS, the James Hutton Institute in Dundee, for making a developmental version of the MDSMap software available. The authors would also like to thank René Smulders for suggesting revisions to the manuscript.

Supplementary data is available online at: ttpslinspringercomarticles5upplementarMaterial

Chapter 8

Quantifying the power and precision of QTL analysis in autopolyploids under bivalent and multivalent genetic models

Peter M. Bourke1, Christine A. Hackett2, Roeland E. Voorrips1, Richard G. F. Visser1, Chris Maliepaard1

1 Plant Breeding, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.

2 Biomathematics and Statistics Scotland, Invergowrie, Dundee DD2 5DA, UK.

Submitted

177 ______Power and precision of QTL analysis

Abstract New genotyping technologies, offering the possibility of high genetic resolution at low cost, have helped fuel a surge in interest in the genetic analysis of polyploid species. Nevertheless, autopolyploid species present extra challenges not encountered in diploids and allopolyploids, such as polysomic inheritance or double reduction. Here we investigate the power and precision of quantitative trait locus (QTL) analysis in outcrossing autotetraploids, comparing the results of a model that assumes random bivalent chromosomal pairing during meiosis to one that also allows for multivalents and double reduction. Through a series of simulation studies we found that marginal gains in QTL detection power are achieved using the double reduction model but at the cost of an impaired ability to determine the most likely QTL segregation type and mode of action. We also explored the effect of variable genotypic information across parental homologues and found that both QTL detection power and precision require high and uniform information contents. This suggests linkage analysis results for autopolyploids should be accompanied by marker coverage information across all parental homologues along with the per-homologue genotypic information coefficients (GIC). Visualising the GIC landscape of the homologues of interest around QTL peaks will help elucidate the limitations of QTL power and precision in further studies. Application of these methods to an autotetraploid potato (Solanum tuberosum L.) mapping population confirmed our ability to locate and dissect QTL in highly heterozygous outcrossing autotetraploid populations.

Key words Quantitative Trait Locus (QTL) analysis, autopolyploid, double reduction, QTL power, Bayesian Information Criterion (BIC), genotypic information coefficient (GIC).

178 ______Chapter 8

Introduction Autopolyploid species, characterised by having more than two homologous copies of each chromosome, present a number of challenges to genetic research not present in diploids or allopolyploids (which are essentially already diploidised, genetically speaking). They have therefore been somewhat left behind when it comes to tools and methods for their genetic analysis. Among these challenges are the complexity of modelling polysomic inheritance and the occurrence of double reduction, the phenomenon whereby a particular segment of a parental chromatid and its recombinant “sister” copy migrate to the same gamete, which can only occur if multivalent pairing structures are formed and maintained through meiosis I (Haldane, 1930; Mather, 1935). However, we are now at a stage where the availability and low cost of genotyping tools (based on single nucleotide polymorphisms, or SNP markers) are making the analysis of autopolyploids not just feasible, but of practical importance to crop breeders, increasing the need for both the methods and the tools to conduct these analyses, as well as knowledge on how best to apply these methods.

Breeders and researchers are often interested in knowing the genetic architecture of important traits, for example: 1. whether they are oligo- or polygenic; 2. where the quantitative trait loci (QTL) influencing the trait lie on the genome; 3. from which specific parental homologous chromosome (which we term “homologue”) the favourable alleles originate; 4. whether these alleles exhibit additive or dominant gene action; 5. whether they interact with alleles at other loci, or with the environment. Up to now, approaches such as QTL mapping in bi-parental populations or genome-wide association studies have been proposed, although not always addressing all of the above points. More recently, genomic selection has been advocated as a powerful method to increase genetic gain, without necessarily needing an understanding of the genetic architecture (Meuwissen et al., 2001; Slater et al., 2016). For quantitative traits with hundreds or thousands of causative genes, this is likely to be more appropriate in breeding programs. However, for traits with a few major causative loci, QTL mapping remains a viable option that offers the promise of both understanding the genetics underlying the trait (including the possibility of finding the underlying genes) and facilitating selection for it through marker-assisted selection.

QTL mapping in autopolyploids has evolved in the last 20 years to keep pace with changes in genotyping technologies. Approaches have been developed for both co-dominant and dominant marker systems, and range from simplified models that only consider bivalent pairing to more complex models that also include double reduction. However, there has been almost no investigation into the applicability or advantages of different models. Despite this, it is often asserted that models that ignore double reduction are a priori inferior to those that include it (Luo et al.). More complete models of polysomic inheritance that include double reduction are often developed under the assumption of completely informative marker systems (e.g. Xie and Xu; Li et al.; Xu et al.), thereby avoiding the statistical complexities imposed by partially informative markers (such as dosage-scored SNP markers). In fact, it is quite telling that the only publicly available tools for QTL analysis in autopolyploids have adopted the simplifying assumption of random bivalent pairing (e.g. TetraploidMap (Hackett and Luo; Hackett et al.) and TetraploidSNPMap (Hackett et al.)), whereas more complete model descriptions remain unimplemented.

Recently, a method to reconstruct inheritance probabilities (or identity-by-descent (IBD) probabilities) under both bivalent and multivalent pairing models has been developed into a software package, TetraOrigin (Zheng et al.). We used TetraOrigin and the simulation software PedigreeSim (Voorrips and Maliepaard) to investigate QTL mapping in autopolyploids, estimating QTL detection power and precision, the effect of double reduction and multivalent pairing (while comparing models that both ignore and include it), the impact of population size, trait heritability and marker distribution, and the differences in detection and diagnostic power (i.e. correctly predicting the QTL position as well as the composition of the QTL alleles) between simple or more complex QTL segregation types and different modes of QTL action (additive or dominant). We also examined the well-studied traits of plant maturity (earliness) and flesh colour in a bi-parental tetraploid potato population to further illustrate our findings, comparing the QTL locations to the physical positions of candidate genes underlying these loci.

One important aspect of QTL mapping that remains conspicuously absent from most published studies in both diploid and polyploid species is the topic of information content. Originally it was noted that “marker information content” could adversely affect the estimated position of a QTL if markers of variable informativeness were located near a QTL (Knott and Haley). Increasing information content was found to lead to an increased test statistic, which could bias the location of a QTL peak (Knott and Haley; Knott et al.). Other authors have proposed alternative measures of information content than that of Knott and Haley, such as one based on Shannon’s information content (Reyes-Valdés and Williams). In autopolyploid species the issue of information content is arguably even more important than in diploids, as information content generally varies between homologues. We prefer to use the term “genotypic information coefficient” (Van Ooijen) as it avoids confusion when dealing with IBD probabilities, which are multi-point estimates of homologue transmission probabilities and not the marker genotypes themselves. In this article we extend the definition of the genotypic information coefficient (GIC) to autopolyploids and explore its usefulness in predicting QTL detection power and precision. Our work will help to guide better-informed QTL analyses in autopolyploids as well as deepening our understanding of the genetic architecture controlling important traits in these species.

Figure 1. Distribution of markers on potato chromosomes 1 and 12 from the integrated genetic maps of Hackett et al. (2013). Chromosome 1 shows poor marker coverage across most homologues in the region [...] – [...] cM and was therefore chosen to investigate the genotypic information coefficient (GIC). Chromosome 12 has better overall marker coverage. Parental homologue (h) numbers are shown as h1 – h4 for parent 1 and h5 – h8 for parent 2.

Materials and Methods Simulated tetraploid populations were the basis for much of the work presented here. We based our simulated populations on previously published genetic maps of tetraploid potato developed from dosage-scored SNP markers (Hackett et al., 2013). In polyploids, marker dosages are generally understood to correspond to the allele counts of the “alternative” allele (as opposed to the “reference” allele) of a bi-allelic SNP marker. In heterozygous autotetraploids the possible dosages are 0, 1, 2, 3 and 4, with a marker being defined by its maternal and paternal dosages, e.g. 1x0 means a dosage of 1 in parent 1 (the mother) and a dosage of 0 in parent 2 (the father). There are nine “fundamental” marker segregation types to consider in an autotetraploid cross, namely 1x0, 0x1, 2x0, 0x2, 1x1, 1x3, 1x2, 2x1 and 2x2, to which all other marker types can be converted without loss or distortion of linkage information (Bourke et al.). For convenience, we often refer to simplex x nulliplex or “SxN” markers to indicate both 1x0 or 0x1 markers; similarly, duplex x nulliplex (DxN) markers imply either 2x0 or 0x2. Simplex x simplex (SxS) refers to markers whose fundamental form is 1x1.
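The shorthand introduced above can be illustrated with a short Python sketch that maps a pair of parental dosages to the class names used in the text. The function name `segregation_class` and the catch-all label "other" are illustrative assumptions; the sketch makes no attempt to convert other marker types to their fundamental form.

```python
def segregation_class(maternal: int, paternal: int) -> str:
    """Return the shorthand class for a tetraploid marker (maternal x paternal dosage)."""
    for dosage in (maternal, paternal):
        if dosage not in range(5):
            raise ValueError("tetraploid dosages must be integers from 0 to 4")
    # The class names do not depend on which parent carries the allele,
    # so the pair is ordered before the lookup.
    pair = (max(maternal, paternal), min(maternal, paternal))
    labels = {
        (1, 0): "SxN",  # simplex x nulliplex: 1x0 or 0x1
        (2, 0): "DxN",  # duplex x nulliplex: 2x0 or 0x2
        (1, 1): "SxS",  # simplex x simplex: 1x1
    }
    # Hypothetical catch-all for every other dosage combination.
    return labels.get(pair, "other")


print(segregation_class(0, 1))
print(segregation_class(2, 0))
```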

Table 1. Details of parameter levels used in the simulation study.

Chm.   q      N      h2     QTL seg.        QTL action           QTL pos.             QTL model
1      [...]  [...]  [...]  SxN, DxN, SxS   additive, dominant   random               noDR, DR
12     [...]  [...]  [...]  SxN, DxN, SxS   additive, dominant   [...] cM, [...] cM   noDR, DR

The phased genetic linkage maps of potato chromosomes 1 and 12 from Hackett et al. (2013) were used in the simulation of genotyped tetraploid populations with PedigreeSim in this study. Chm.: chromosome; q: rate of quadrivalent formation (specified in the PedigreeSim .chrom file); N: size of the F1 population; h2: (broad-sense) trait heritability; QTL seg.: QTL segregation type, given as the maternal x paternal dosage of the (alternative) marker allele. The code SxN refers to 1x0 and 0x1 markers, DxN refers to 2x0 and 0x2 markers and SxS refers to 1x1 markers; QTL pos.: genetic position of the QTL. This was either random (chm. 1) or confined to a telomeric versus a centromeric position on chm. 12; QTL model: model used in QTL analysis, either with double reduction (DR) or not (noDR).

We limited our attention to potato chromosomes 1 and 12 as these displayed contrasting levels of marker coverage (Figure 1). The range of parameters used in both simulation sets is outlined in Table 1. We used potato chromosome 1, with its uneven marker distribution, for the investigations into the genotypic information coefficient (GIC), while chromosome 12 was used for the QTL power analysis. Unless otherwise stated, all statistical analyses and visualisations were performed using the statistical computing environment R (R Core Team).

GIC study – potato chromosome 1 For each set of population parameters (all possible combinations of population size (Pop) and rate of quadrivalent formation (q)) we simulated 10 separate populations using PedigreeSim and the phased linkage map of chromosome 1 from Hackett et al. (2013) for the phased parental marker positions and dosages (visualised in Figure 1, left-hand side). Each simulated individual carried a single chromosome. For each population, we generated 100 phenotype sets for all possible combinations of the factors heritability (h2), QTL segregation type (QTL seg) and QTL action (Table 1). The phenotype of the ith individual ($P_i$) with QTL dosage $d_i$ was randomly sampled from a Normal distribution according to

$$P_i \sim \mathcal{N}(\mu + Q \, d_i, \; \sigma_e^2)$$

where $\mu$ is the trait mean at zero QTL dosage and $Q$ is the effect per unit of QTL dosage. The environmental variance $\sigma_e^2$ was determined by first calculating the genotypic variance $\sigma_g^2$ across the whole population given the individual QTL dosages (in the case of a dominant QTL these were taken as a dosage of 0 or 1 only), and then setting

$$\sigma_e^2 = \left(\frac{1 - h^2}{h^2}\right) \sigma_g^2$$

Offspring genotypes were derived from the .hsa and .hsb output files of PedigreeSim (which provide the exact location and origin of recombination points along offspring homologues). Both the position and the configuration of the QTL were randomised for each phenotype set (to [...] cM accuracy).
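The sampling scheme described above can be sketched in a few lines of Python. This is a minimal illustration rather than the simulation code used in the study: the function names, the default values of `mu` and `effect`, and the use of the population (rather than sample) variance for the genotypic variance are all assumptions.

```python
import random
import statistics


def environmental_variance(dosages, h2, effect=1.0):
    """Environmental variance implied by heritability h2 and the QTL dosages."""
    if not 0 < h2 <= 1:
        raise ValueError("h2 must lie in (0, 1]")
    # Genotypic variance across the population (population variance is an assumption).
    var_g = statistics.pvariance([effect * d for d in dosages])
    return (1 - h2) / h2 * var_g


def simulate_phenotypes(dosages, h2, mu=0.0, effect=1.0, dominant=False, seed=None):
    """Draw one phenotype per individual from N(mu + effect * dosage, var_e)."""
    rng = random.Random(seed)
    # For a dominant QTL only presence or absence of the allele matters.
    used = [min(d, 1) if dominant else d for d in dosages]
    sd_e = environmental_variance(used, h2, effect) ** 0.5
    return [rng.gauss(mu + effect * d, sd_e) for d in used]


phenotypes = simulate_phenotypes([0, 1, 1, 2, 0, 1], h2=0.2, mu=10.0, seed=42)
print(len(phenotypes))
```

With `h2 = 1` the environmental variance is zero and the phenotypes equal the genotypic values, which gives a quick sanity check of the variance formula.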

TetraOrigin (Zheng et al.) was run on Mathematica (Wolfram Research Inc.) with input files derived from the integrated linkage maps and dosage output of PedigreeSim, using both bivalentDecoding options (False / True) to generate IBD probabilities under both a model that allowed for double reduction (DR) and one that did not (noDR). The other parameter settings used were the parental dosage error probability (epsF), the offspring dosage error probability (eps), and parental bivalentPhasing = True (i.e. assuming purely bivalent pairing predominates to determine parental marker phase, for computational efficiency (Zheng et al.)). The IBD probabilities at the marker positions were used to fit splines (using the smooth.spline function in R (R Core Team)) from which re-normalised IBD probabilities were interpolated at a 1 cM grid of positions (using the predict function in R) for subsequent QTL analysis. QTL analysis was performed using a weighted regression of the homologue effects, weighted by the genotype probabilities (Hackett et al.). The model described by Hackett et al. (2013, 2014), derived from the earlier work of Kempthorne, can be written as

$$Y = \mu' + \alpha_2 X_2 + \alpha_3 X_3 + \alpha_4 X_4 + \alpha_6 X_6 + \alpha_7 X_7 + \alpha_8 X_8 + \varepsilon$$

having taken the constraints $X_1 + X_2 + X_3 + X_4 = 2$ and $X_5 + X_6 + X_7 + X_8 = 2$ into account. Here, $Y$ corresponds to the trait values, $X_i$ to the indicator variables for the presence/absence of a particular parental homologue (i = 1 to 4 for parent 1, i = 5 to 8 for parent 2) and $\varepsilon$ to the residual term. Hackett et al. (2014) describe this as the “additive” model, and used the probabilities of the 36 possible genotypes as weights in a regression using the above model form. We firstly applied this approach (which we term the “no double reduction” or “noDR” model, where all $X_i$ = 0 or 1), subsequently extending it to use the set of 100 genotype probabilities as weights (termed the “DR” model), which includes the additional genotypes resulting from double reduction, i.e. the $X_i$ are no longer constrained to equal 0 and 1, but rather $X_i$ = 0, 1 or 2. The “logarithm of odds ratio” (LOD) score for the regression was calculated using the formula

$$LOD = \frac{N}{2} \log_{10}\left(\frac{RSS_0}{RSS_1}\right)$$

where $N$ is the population size, $RSS_0$ is the residual sum of squares under the null hypothesis of no QTL ($RSS_0 = \sum_i (y_i - \bar{y})^2$ for trait values $y_i$ and overall trait mean $\bar{y}$), and $RSS_1$ is the residual sum of squares from the regression model (Broman et al.). A chromosome-wide scan was performed at 1 cM intervals and the LOD score recorded at each position.
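As a worked illustration of the LOD formula, the Python sketch below computes the score from a vector of trait values and the fitted values of any regression model at one position. The function name `lod_score` is an assumption, and the fitted values are taken as given rather than produced by the weighted regression described in the text.

```python
import math


def lod_score(y, fitted):
    """LOD = (N / 2) * log10(RSS0 / RSS1) for trait values y and model fits."""
    if len(y) != len(fitted):
        raise ValueError("y and fitted must have the same length")
    n = len(y)
    mean_y = sum(y) / n
    # RSS0: residual sum of squares under the null hypothesis of no QTL.
    rss0 = sum((yi - mean_y) ** 2 for yi in y)
    # RSS1: residual sum of squares of the fitted QTL model.
    rss1 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    if rss1 <= 0:
        raise ValueError("a perfect fit gives an undefined (infinite) LOD")
    return (n / 2) * math.log10(rss0 / rss1)


print(lod_score([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5]))
```

In the example, RSS0 = 5 and RSS1 = 1, so the score equals 2 · log10(5).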

Significance thresholds were determined through permutation tests (Churchill and Doerge, 1994), with each of the 1000 simulated phenotype sets per parameter set (10 populations x 100 phenotypes) permuted once before recording the maximum LOD score from the chromosome-wide scan (i.e. recording 1000 maxima). This generated approximate experiment-wise LOD thresholds by taking the 0.95 quantile of the sorted LOD values. A QTL was declared detected if the significance at the QTL position exceeded the significance threshold. Because the true positions of most QTL were not at the grid of 1 cM positions tested, splines were fitted to the LOD profile to enable interpolation of the approximate LOD score at the position of the QTL itself (which was used to derive QTL detection rates).
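The permutation procedure can be sketched as follows. This is an illustrative outline only: `max_lod` stands for a user-supplied function (hypothetical here) that runs the chromosome-wide scan on a permuted phenotype vector and returns the maximum LOD, and the quantile convention (the smallest value with at least a fraction 1 − α of the maxima at or below it) is an assumption.

```python
import math
import random


def upper_quantile(values, alpha=0.05):
    """Smallest value with at least a fraction (1 - alpha) of values at or below it."""
    ordered = sorted(values)
    # round() guards against floating-point noise before taking the ceiling.
    rank = math.ceil(round((1 - alpha) * len(ordered), 9))
    return ordered[max(rank, 1) - 1]


def permutation_threshold(phenotype_sets, max_lod, alpha=0.05, seed=None):
    """Permute each phenotype set once and return the (1 - alpha) quantile of the maxima."""
    rng = random.Random(seed)
    maxima = []
    for phenotypes in phenotype_sets:
        shuffled = list(phenotypes)
        rng.shuffle(shuffled)  # breaks the link between phenotype and genotype
        maxima.append(max_lod(shuffled))
    return upper_quantile(maxima, alpha)


# Toy usage: the built-in max stands in for a real chromosome-wide scan.
toy_sets = [[i, 0] for i in range(1, 21)]
print(permutation_threshold(toy_sets, max_lod=max, seed=1))
```

A QTL would then be declared detected when the interpolated LOD at its true position exceeds the returned threshold.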

The GIC values for homologue j at a particular locus were determined as follows:

$$GIC_j = 1 - \frac{4}{N} \sum_{n=1}^{N} \pi_n (1 - \pi_n)$$

using the noDR IBD probabilities, where $\pi_n$ is the probability of presence of homologue j in individual n at this locus (Appendix 1). GIC values were calculated at all 1 cM splined positions used in the QTL scan. We considered the extension of the GIC to include the case of double reduction, but found a homologue-specific GIC was no longer easily defined when an offspring can inherit more than one copy of part of a particular homologue.
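A direct transcription of the per-homologue GIC into Python helps to show its range. The function name `gic` is an assumption; the input is the list of presence probabilities of one homologue across the N individuals at one locus.

```python
def gic(presence_probs):
    """Genotypic information coefficient of one homologue at one locus."""
    n = len(presence_probs)
    if n == 0:
        raise ValueError("at least one individual is required")
    if any(p < 0 or p > 1 for p in presence_probs):
        raise ValueError("probabilities must lie between 0 and 1")
    # Each term p * (1 - p) is largest (0.25) when inheritance is fully uncertain.
    return 1 - (4 / n) * sum(p * (1 - p) for p in presence_probs)


print(gic([0.0, 1.0, 1.0, 0.0]))  # inheritance known with certainty
print(gic([0.5, 0.5, 0.5, 0.5]))  # no information on inheritance
```

The factor 4 rescales the maximum of π(1 − π), which is 0.25 at π = 0.5, so that the coefficient runs from 0 (no information) to 1 (complete information).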

To better understand the relative importance of GIC on the power of QTL detection, we recoded QTL detection / non-detection as 1 / 0 and ran both a simple ANOVA as well as a GLM (using a binomial model with logit link) using the following model:

$$D = q + Pop + h^2 + QTLseg + QTLact + GIC$$

where D is the QTL detection indicator variable, and the explanatory variables are q (rate of multivalent formation), Pop (population size), h2 (heritability), QTLseg (QTL segregation type), QTLact (mode of QTL action, either additive or dominant) and GIC (the product of per-homologue GIC values underlying the QTL alleles).

To understand the influence of GIC on the detection of more complex QTL segregation types, we categorised the per-homologue GIC as either high (H) or low (L) using a threshold of both 0.9 and 0.95 for high GIC (so for example in the former, Low GIC < 0.9 and High GIC ≥ 0.9). A DxN or SxS QTL could then be categorised as either LL, LH, HL or HH, depending on the underlying GIC at each of the QTL alleles with positive effect. For each parameter set we compared the power of detection of LL, LH / HL (since both have one low and one high GIC allele they were grouped together) and HH QTLs.
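The categorisation can be written as a small helper. The function name `gic_class` and the combined label "LH/HL" are illustrative choices; the thresholds follow the two definitions of high GIC given in the text.

```python
def gic_class(gic_values, threshold=0.9):
    """Classify a two-allele QTL by the GIC of the homologues carrying its alleles."""
    if len(gic_values) != 2:
        raise ValueError("expected the GIC of exactly two homologues")
    letters = "".join("H" if g >= threshold else "L" for g in gic_values)
    # LH and HL both have one low and one high allele, so they are grouped.
    return "LH/HL" if letters in ("LH", "HL") else letters


print(gic_class([0.95, 0.80]))
print(gic_class([0.92, 0.93], threshold=0.95))
```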

The simulations using chromosome 12 were similar to those of chromosome 1 with some differences (Table 1). Six different rates of quadrivalent formation were tested (q = 0, 0.1, ..., 0.5) and for each set of population-wise parameters, [...] separate populations were simulated. For each simulated population, [...] sets of phenotypes were generated as described above, except that the position of the QTL was confined to two positions, namely 14 cM (telomeric) and 4 cM (centromeric). The choice of these positions was not arbitrary: they were chosen to minimise the effect that differences in GIC might have on the results, whilst noting that QTL positioned at the telomere itself (0 cM) would be likely to suffer from lower detection rates due to the lower information contents typically observed at the telomeric extremes. The centromeric position (4 cM) was selected for study as it is known that the rate of double reduction typically falls to zero at the centromeres (Bourke et al.).

The QTL analysis and setting of significance thresholds were performed as described above (although permutation tests were now based on 200 permutations, one of each phenotype set, with α = 0.05 as before). We also wished to investigate the rate at which the QTL segregation type and mode of action were correctly reconstructed. Hackett et al. (2014) describe a QTL model-selection method by fitting the phenotype means at the QTL peak and comparing the Bayesian Information Criterion (BIC) (Schwarz, 1978) for SxN, DxN and SxS models. This is given by the formula

$$BIC = -2 \log(L) + p \log(n)$$

where $L$ is the likelihood of the QTL model being tested, $n$ is the number of observations (either 36 or 100 for the noDR and DR models, respectively) and $p$ is the number of parameters in the QTL model.

The (log) likelihood is a sum over the different genotype classes in the population, defined by their dosages in the case of an additive QTL, or presence / absence of the QTL in the case of a dominant QTL:

$$\log(L) = -\frac{n}{2}\left(\log(2\pi) + 1 + \log\left(\frac{1}{n}\sum_j \sigma_j^2 (n_j - 1)\right)\right)$$

where $\sigma_j^2$ is the variance associated with QTL genotype class j, and $n_j$ is the number of observations within genotype class j (so that $\sum_j n_j = n$). As well as the BIC, the Akaike Information Criterion (AIC) (Akaike, 1974) was also recorded for comparison, where

$$AIC = -2 \log(L) + 2p$$

Whereas Hackett et al. (2014) restricted their model search to only three QTL segregation types, we expanded the search to all possible bi-allelic QTL, comprising in total 240 different QTL models (listed in Supplementary Table 1). We compared the performance of the search of the full bi-allelic model space to that of a restricted model space (the first 58 models of Supplementary Table 1, containing only SxN, DxN and SxS QTL).
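The model-comparison quantities can be sketched in Python as below. This is a hedged illustration, not the authors' implementation: the function names are invented, the class variance is taken to be the sample variance (so that each class contributes its within-class sum of squares), and the BIC penalty is written in the standard Schwarz form p·log(n), which is an assumption about the intended formula.

```python
import math
import statistics


def log_likelihood(classes):
    """Log-likelihood from phenotype values grouped by QTL genotype class."""
    n = sum(len(c) for c in classes)
    # Sample variance times (n_j - 1) is the within-class sum of squares;
    # classes with a single observation contribute nothing.
    pooled = sum(statistics.variance(c) * (len(c) - 1) for c in classes if len(c) > 1)
    if pooled <= 0:
        raise ValueError("within-class variation is needed to evaluate the likelihood")
    return -(n / 2) * (math.log(2 * math.pi) + 1 + math.log(pooled / n))


def bic(log_l, n, p):
    """Bayesian Information Criterion with penalty p * log(n)."""
    return -2 * log_l + p * math.log(n)


def aic(log_l, p):
    """Akaike Information Criterion with penalty 2 * p."""
    return -2 * log_l + 2 * p


groups = [[1.0, 3.0], [2.0, 6.0]]
ll = log_likelihood(groups)
print(bic(ll, n=4, p=2), aic(ll, p=2))
```

Candidate QTL models would each be scored this way and ranked, with the smallest criterion value taken as the most likely model.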

The Altus x Colomba (AxC) tetraploid potato mapping population was used to explore both QTL models and test the methods described earlier for simulated data. The genetic positions of [...] markers from the SolSTW 20K array (Vos et al.) were taken from a previously published high-density linkage map developed using this population (Bourke et al.), distinct from the maps of Hackett et al. (2013) used in the simulation study, which were based on a different population. A subset of these markers was selected as input data for TetraOrigin (Zheng et al.). Markers were selected so that each consecutive [...] cM window had (if possible) one marker of every segregation type (in total there are nine; see the beginning of the Methods section), selecting markers randomly among those with fewest missing values. TetraOrigin IBD probabilities computed under the assumption of no double reduction (noDR) or allowing for the possibility of double reduction (DR) were saved for later QTL analysis (after confirming that homologue numbering between the noDR and DR datasets was consistent).

The two traits investigated were plant maturity and tuber flesh colour. Both were scored on an ordinal scale, with maturity scored from 1 (very late) to [...] (very early) in increments of 1, and flesh colour scored from 4 to 8 through varying shades (4 = white, 5 = cream, 6 = light yellow, 7 = yellow, 8 = dark yellow). Maturity was scored visually in the field during the growing seasons 2012, 2013 and 2014, with flesh colour scored post-harvest for each of these years (three replicates). QTL analysis was performed as described in the previous sections, using splined IBD probabilities as weights in both a noDR and DR model for comparison purposes. Individual analyses per year were performed, as well as a general analysis using best linear unbiased estimates (BLUEs) generated using the lme function from the nlme R package (Pinheiro et al.) with Year as random effect and genotype as fixed effect. QTLs were re-mapped by saturating the LOD-2 support intervals around the QTL peaks (no marker binning performed) and re-estimating IBD probabilities in TetraOrigin. QTL analysis was subsequently performed at the marker positions themselves (rather than at splined positions) to better estimate the peak positions. The location of the CYCLING DOF FACTOR 1 (StCDF1) locus on chromosome 5 (Kloosterman et al.) with gene annotation PGSC0003DMG400018408, and the β-CAROTENE HYDROXYLASE 2 (StChy2) locus on chromosome 3 (Wolters et al., 2010) with gene annotation PGSC0003DMT40002[...] from the potato genome sequence version 4.03 (Spud DB Genome Browser, S. tuberosum group Phureja DM1-3 (Potato Genome Sequencing Consortium, 2011; Hirsch et al., 2014)) were used to compare the physical and genetic positions of the QTL peaks. The most likely QTL model at the peak positions was explored by calculating the BIC for all 240 bi-allelic QTL models (see previous section for details) and selecting the minimum BIC as most likely. Models with a BIC within 10 of the minimum were also deemed plausible and recorded.

As expected, there was a clear relationship between the per-homologue GIC and the marker coverage of that particular homologue (an example is given in Figure 2.a). Differences between coupling and repulsion marker information are detectable, for example where a single 1x0 (SxN) marker tagging homologue 4 in parent 1 at 28.5 cM gave a large boost to the otherwise low GIC values in that region on homologue 4, but also slightly increased the GIC values on homologues 1, 2 and 3 (there is no information about the meiosis of parent 2 from such a marker). The results of both the ANOVA and GLM analyses showed that the GIC explains a large proportion of the variance for QTL detection power, although not as much as the population size or trait heritability (Supplementary File 1).

Apart from the influence of GIC on detection power, we were also interested in understanding the influence of GIC on the accuracy of QTL analysis. For this we examined more closely the position of QTL peaks in relation to their true position, for SxN QTL only (since these originate from a single homologue and are simpler to track). We noted a dramatic influence of GIC on the QTL peak position in regions of variable GIC, even in situations with 100% detection power (Figure 2.b). Local maxima in GIC such as that observed in parent 1 homologue 4 at 28.5 cM serve as local “attractors” for QTL peaks, an effect seen across all homologues. Generally speaking, there is a tapering of GIC profiles at the telomeres, a consequence of poorer marker information (coming from one side only).

Where GIC is high, the true position and detected QTL peak closely corresponded (Figure 2.b, 40 – 100 cM region). A visualisation of the homologue-specific variation in QTL detection power is given in Figure 2.c. For more complex QTL types such as DxN or SxS QTL, we were curious to know whether the presence of a single QTL allele on a homologue with high GIC would be enough to detect that QTL, or whether high GIC was needed on both homologues. As described in the Methods section, we categorised QTLs as either LL, LH, HL or HH depending on the per-homologue GIC underlying the QTL alleles with positive effect. As can be seen in Table 2, the detection power of LL-type QTL tended to be lower than that of LH- or HL-type QTL, which themselves tended to be detected less often than HH-type QTL. In fact, the intermediate class (having only one positive QTL allele residing on a high-GIC homologue) was detected at approximately the midpoint of the LL-type and HH-type detection rates (Supplementary Figure 1).

As noted in the previous section, population size and trait heritability were found to have the most impact on QTL detection power, followed by GIC. On chromosome 12 we deliberately chose two QTL locations to minimise the impact of GIC and allow a comparison of centromeric versus telomeric effects (with average cross-homologue GICs of 0.[...] and 0.[...] for 14 and 4 cM respectively). The four most important factors in determining QTL detection power (excluding GIC) were population size, trait heritability, QTL segregation type and QTL mode of action (Figure 3). When we ran an ANOVA using all the available explanatory variables we found that neither the rate of multivalent formation (q) nor the form of the model used (DR or noDR) had any real impact on the QTL detection power overall (Supplementary File 2). However, there were some instances where the DR model could improve detection power. For example, when the population size or trait heritability is low and the rate of multivalent pairing is high, the DR model does offer a slight advantage (Figure 4.a), helping to maintain the same level of power as that achieved when there is strictly bivalent pairing (q = 0). The average distance from the QTL peak to the true QTL position is also adversely affected by double reduction (Figure 4.b). However, in this instance, no matter what model is used, the analysis will become slightly less accurate at higher values of q, although there is some mitigation of the loss of accuracy when the DR model is used. Here we used distance as an absolute measure; the direction of this distance appeared to be biased towards the side of the QTL with greater genetic length (Supplementary Figure 2).

[Table 2: QTL detection power by N, h2, Seg. and Act., reported for LL, LH / HL and HH QTL under two column groups, “High GIC threshold = 0.9” and “High GIC threshold = 0.95”; cell values not recoverable]

QTL detection power was determined for two different definitions of “high GIC”: either exceeding 0.9 or exceeding 0.95. Results for different levels of multivalent pairing and model used (DR and noDR) were combined. Numbers in brackets refer to the numbers of separate QTL on which the estimates of power are based. N: population size; h2: heritability; Seg.: segregation type, where DxN denotes a duplex QTL (so either 2x0 or 0x2) and SxS implies a 1x1 marker; Act.: mode of action, either A (additive) or D (dominant). LL, LH and HH refer to QTL with alleles on homologues of both Low GIC, of one Low and one High GIC, and of both High GIC, respectively. Results for LH and HL were considered equivalent and were combined.

Figure 2.a. Influence of marker distribution on GIC values, calculated using noDR IBD probabilities on potato chromosome 1 (Hackett et al., 2013), with parent 1 values in the lower panel and parent 2 values in the upper panel. Average GIC values over the simulated populations are shown as lines above the marker distribution and carry the same colour. Homologue labels on the axis follow the parental numbering of Figure 1.

[Figure caption fragment: bi-allelic QTL models, either a “restricted” search over 58 models or a “full” search over 240 models (all possible bi-allelic QTL models)]

The mean rank of the correct QTL model among those tested, over a range of different rates of multivalent formation (q). The left panel depicts results for the restricted model search (only SxN, DxN and SxS QTL considered, in total 58 models) whereas the right panel depicts the results when the full bi-allelic QTL model space was searched (240 models). Solid lines with circles indicate the noDR results while dashed lines with triangles indicate the DR results (including double reduction). A mean rank of 1 implies that on average the correct QTL model was always ranked as most likely (ignoring ties). The average rates at which the BIC correctly identified the QTL configuration (i.e. the fraction of cases where the BIC of the correct model was ranked as most likely). The average difference (in units of BIC) between the selected model and the true QTL model.

If we examine the cause of this poor performance we find that the BIC has difficulties correctly predicting additive DxN or SxS QTL, whereas dominant QTL of these types cause no such issue (Supplementary Figure 4). The Akaike Information Criterion (AIC) was found to perform marginally better than the BIC if the analysis was conducted using the DR model (data not shown), although it would still be unwise to use it in conjunction with the DR model.

To help illustrate our findings we looked at two well-studied traits for which phenotypic data was available from the AxC tetraploid potato population, which is the result of a wide cross between the late, white/cream-fleshed starch cultivar Altus and the early, yellow-fleshed ware cultivar Colomba. On average the rate of quadrivalent pairing was 0.24 (and was similar between parents: parent 1, 0.2[...] ± 0.0[...], and parent 2, 0.25 ± 0.05) (Supplementary Figure 5), consistent with a previous estimate of 0.2 – 0.[...] from this population using only marker information (Bourke et al., 2015). The phenotypic traits themselves (plant maturity and flesh colour) are already genetically well characterised, offering the opportunity to compare QTL peak positions with the physical location of the underlying candidate genes as well as an exploration of the most likely QTL models. For both traits a single major QTL was found with both the noDR and DR models (Figure [...]). As can be seen from the bottom panel of Figure [...], the GIC for some homologues was quite variable (e.g. chromosomes 4 or 2) but was overall relatively high across both parental maps.

There was a single QTL peak for plant maturity on chromosome 5 around [...]8 – 20 cM (Table 3). Using the DR model, slightly more of the variance was explained than with the noDR model (44% versus 4[...]%) and the width of the LOD-2 intervals was narrower ([...] cM versus [...] cM). When we saturated the QTL region with marker information and re-ran the IBD calculations and QTL analysis we were able to increase the proportion of explained variance at the peak QTL position to 4[...]%, with the peak occurring at approximately 20 cM using both models (Table 4). The most likely QTL model was an additive oooo x qqqQ with the early allele (‘Q’) coming from Colomba using both the noDR and DR models (Table 4). Here ‘o’ denotes an allele about which we have no information, which could be ‘q’, ‘Q’ or any other non-segregating allele. Note that for a simplex QTL, an additive and a dominant model produce the same expected segregation and therefore cannot be distinguished within a population. The mean maturity of offspring with the early ‘Q’ allele from Colomba was [...] and without was [...]. The next most likely QTL model was the additive model QQQ[...]Q, although this was far less likely (δ BIC = 18.1 or 16.8 for the noDR or DR models, respectively). It is possible that a multi-allelic model (with different earliness alleles of different effect sizes) may have fitted the data better (Hackett et al., 2014), although we did not see the need to model that here.

[Table 3: QTL results for maturity and flesh colour per year and for the BLUEs, under the noDR and DR models, with columns Chm., Year, Model, Peak (cM), N, LOD, LOD-2 (cM), |LOD-2| and Var.; cell values not recoverable]

Chm.: chromosome number; Year: year of phenotypic measurement, including best linear unbiased estimates (BLUEs) over the years; Model: QTL model used, either random bivalents (noDR) or also allowing for double reduction (DR); Peak (cM): position of QTL peak in centiMorgans; N: number of individuals with matching phenotypic and genotypic data; LOD: LOD score at the peak; LOD-2 (cM): range of QTL support interval cM positions (loci within 2 LOD of maximum LOD); |LOD-2|: width of the support interval in centiMorgans; Var.: proportion of phenotypic variance explained by QTL peak.

For flesh colour, the noDR and DR models resulted in the same rate of explained variance at the chromosome 3 peak ([...]%), with the LOD-2 interval for the noDR model narrower than that for the DR model ([...] cM versus [...] cM), in contrast to the results for plant maturity. When we searched for the most likely QTL model we found in both cases (noDR and DR) that a dominant model fit the data best, with segregation type oooo x QQqq (Table 4), while the additive version was not far behind (δ BIC = 3.4 in both cases). Individuals who inherited the Q allele from either h[...] or h[...] had an average flesh colour of [...], whereas those without them had an average flesh colour of [...]. Interestingly, the next most likely model was [...] away using the noDR model, while only [...] away using the DR model.

e were interested in comparing the position of the QT peas with the physical location of the candidate genes StCDF1 (for maturity) and StChy2 (for tuber flesh colour). s described in the ethods section, we saturated the support interval


[Figure] Results using the noDR model (random bivalent) are shown above those of the DR model (allowing double reduction) for both traits. LOD significance thresholds are shown as dashed red lines. The lowest panel shows the GIC per homologue for the eight parental homologues (using the noDR IBD probabilities), using the same colour scheme as Figure […].a.

around the QTL peaks with markers and reran both the IBD calculations and QTL analysis, in a “remapping” of both traits. For both chromosomes three and five, the expected sigmoid correspondence between the genetic and physical position of markers


was observed, with an approximately linear correspondence in the regions of interest (Figure […]a and […]). The peak positions were comparable using both the noDR and DR models (Figure […]b and […], and Table […]), which we compared to the GIC profiles for the homologues in question (homologue […] for StCDF1, and homologues […] and […] for StChy2). In both cases, the gene position appeared to fall within the LOD-2 intervals of the peaks, although there was no difference in these intervals for plant maturity. For flesh colour, the interval was narrower using the noDR model, and appeared to give a better indication of the QTL position than the DR model (Figure […]). In both cases, there appeared to be sufficient marker coverage on the important homologues within the LOD-2 support intervals, as reflected by the relatively high GIC values. For plant maturity, the StCDF1 region had far more mapped markers than elsewhere, suggesting that this locus was specifically targeted in the development of the SolSTW array (Uitdewilligen et al.; Vos et al.). As can be seen from Figure […]a, we were unable to separate these markers genetically due to the limited population size used for linkage map construction ([…]), highlighting the inadequacy of this population size for fine-mapping work.

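Table 4. Exploration of the major QTL peaks using all available marker information in the LOD-2 support regions.

Columns: Trait, Chm., Model, Peak (cM), LOD, Var., QTL seg., BIC. The numeric cell values and the paternal part of each segregation type could not be recovered; the recoverable entries are:

Maturity, noDR model: A/D, oooo x […]
Maturity, DR model: A/D, oooo x […]
Flesh colour, noDR model: D, oooo x […]; A, oooo x […]
Flesh colour, DR model: D, oooo x […]; A, oooo x […]

Chm. chromosome number; Peak (cM) position of QTL peak in centiMorgans; LOD LOD score at the peak; Var. variance explained by QTL peak; QTL seg. QTL segregation type and mode of action, either A (additive) or D (dominant). A/D is used to indicate that either an additive or dominant model is possible, as these cannot be distinguished in the case of simplex QTL; alleles with a positive effect are denoted ‘Q’, alleles with a negative effect are denoted ‘q’ and those with unknown effect ‘o’; BIC Bayesian Information Criterion.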

Although reported as early as Knott and Haley, the influence of a variable GIC in the vicinity of QTL has essentially been ignored in many subsequent studies, both at the diploid and polyploid level. In this study we hope to re-emphasise its importance by demonstrating the effect of low GIC on QTL detection power, as well as the effect of variable GIC on QTL precision.


Figure […]. a. Genetic versus physical position of a subset of markers within the remapped QTL region of chromosome 5. LOD-2 support intervals for the noDR and DR models overlapped, shown here as horizontal green / purple lines. The StCDF1 locus is highlighted by a vertical arrow at […] Mbp. Inset: Genetic versus physical position of markers used in the initial TetraOrigin analysis on chromosome 5, with the remapped region highlighted in red. b. LOD profiles of the remapping of the chromosome 5 QTL for plant maturity. noDR model results are shown as a green solid line, with DR model results as a purple solid line. The GICs of homologue […], on which the QTL early allele was predicted to lie, are highlighted underneath for each marker position used in the QTL scan and are also shown as a dotted red line. Axis labels “h5” – “h8” signify parent 2 homologues 5 through 8.

Figure […]. a. Genetic versus physical position of a subset of markers within the remapped QTL region of chromosome 3. LOD-2 support intervals for the noDR and DR models are shown here as horizontal green / purple lines. The StChy2 locus is highlighted by a vertical arrow at […] Mbp. Inset: Genetic versus physical position of markers used in the initial TetraOrigin analysis on chromosome 3, with the LOD-2 support remapped region highlighted in red. b. LOD profiles of the remapping of the chromosome 3 QTL for flesh colour. noDR model results are shown as a green solid line, with DR model results as a purple solid line. The product of GICs, π(GIC), for the homologues on which the Q alleles are predicted to lie is highlighted underneath for each marker position used in the QTL scan and is also shown as a dotted red line.


According to our analysis, the GIC is the third-most important consideration in a QTL study after population size and trait heritability, suggesting that dense marker coverage across all homologous chromosomes is important for successful QTL mapping. GIC values were found to drop at the telomeres, a consequence of the one-sided information available at these regions in the multi-point IBD estimation. This has the unwanted effect of biasing the QTL detection positions inwards, making it unlikely for a telomeric QTL to be found at the correct position. This could be cause for some concern, given that telomeric regions tend to be more gene-rich than more centromeric positions. However, as telomeric regions are also known to undergo more recombination (Gaut et al.; Li et al.), the extent of this effect is likely to be diminished by the genetic extension of telomeric regions.

Particularly in the case of autopolyploids, knowledge of homologue-specific GIC values is crucial in predicting whether a QTL is likely to lie beneath a QTL peak, since variable GIC profiles can lead to a significant bias in the estimated QTL position. We were unable to model homologue-specific GIC in the context of double reduction, as it was not obvious how GIC should behave when an offspring can inherit more than one copy of part of a particular homologue. This could also be seen as an advantage of using the noDR model, where a homologue-specific GIC is a clearly-defined concept (Appendix […]).

The GIC cannot be further increased by increasing mapping population sizes, which is often thought to be the only way to increase power above the limitations imposed by marker density, distribution and informativeness. If GIC values are found to be low on certain homologues, it could be worthwhile to develop more markers within that region on the affected homologues. In scenarios where this is impossible (e.g. due to long stretches of homozygosity across homologues), the investigator remains blind to any potential QTL within that region, although it could be argued that such regions are unlikely to harbour segregating QTL either. For complex QTL types with more than one positive allele contributing to the trait, our results show that it is preferable to tag all QTL alleles through nearby informative markers rather than just one or none.

We initiated this study to determine whether it is worthwhile to include double reduction in a QTL model. Through a large simulation study we have demonstrated that the improvements to QTL analysis are at best marginal, and sometimes counter-productive. We saw at most a 5% increase in detection power for “low-power” situations (population size […] and trait heritability of […]) when a significant proportion of multivalents are formed (q […]). In practice, however, one is unlikely to encounter rates of multivalent formation this high (Bourke et al., 2015). The main disadvantage to using the double reduction (DR) model is that it can befuddle later exploration of QTL peaks, introducing a large degree of uncertainty into the most likely QTL segregation type and mode of action. However, there is no huge computational burden to running both analyses and comparing results. The time-limiting step in the current pipeline was the calculation of parental marker phase using TetraOrigin (which is arguably a redundant step given current linkage mapping methodologies, which also determine parental marker phase (Hackett et al.; Bourke et al.)). The subsequent estimation of both noDR and DR IBD probabilities can be very quickly generated by TetraOrigin, following which both QTL models can be fitted. Permutation test results for both models were found to be extremely comparable, and do not need to be run twice per trait analysed. Although hardly surprising, our results suggest that QTL studies in autopolyploid species should focus more on experimental set-up and marker distribution than on whether double reduction should be included in the model or not.

Previous studies have attempted to incorporate double reduction in their models without providing sufficient motivation for such models, apart from the completeness of their model (Xie and Xu; Li et al.; Xu et al.). As demonstrated here and elsewhere (Bourke et al., 2015; Zheng et al.), estimation of the rates of double reduction per parental chromosome as well as the frequency of multivalent pairing are now relatively straightforward given modern high-density marker datasets. With this study, we hope to have provided a sufficiently balanced analysis of both the noDR and DR models to allow an informed decision about which model is most appropriate, given knowledge of the specific meiotic behaviour of the population in question.

We compared a search of the full bi-allelic QTL model space (under a purely additive or dominant model only) with one restricted to just […] and […] bi-allelic QTL, and found that a penalty was incurred by including more models in the search space.

Considering all possible multi-allelic QTL ([…]) and a more general formulation of inter-allelic interactions would increase the computation time of this step without necessarily increasing predictive accuracy. Nevertheless, for higher ploidy levels, the likelihood of having more than two distinct functional alleles at a single locus increases, which also increases the possibility of novel interactions between these alleles (which cannot simply be termed “dominance effects” anymore). Extension of our current approach to include tri-allelic QTL or novel allelic interaction effects would be relatively straightforward. It is also interesting to speculate whether machine learning algorithms might be applied to solve this problem, rather than providing a fixed number of input models as potential candidates, using some form of cross-validation to check the results. We found that a naive application of the BIC was not suitable for the more complex DR model. This was because by using the DR model we were generally introducing extra genotype classes (unnecessarily so, as it turned out). Developing a model comparison framework that weights offspring genotype classes by their probability (since double reduction is a position-dependent phenomenon, occurring with greater frequency towards the telomeres) appears to be necessary, although investigations in this direction fell outside the scope of the current study.

We have shown with this work that slight gains in QTL detection power in autopolyploids can be achieved using a model that incorporates double reduction, although this comes at the cost of a diminished ability to correctly predict the QTL segregation type and mode of action using current methods such as the Bayesian Information Criterion. This study also reinstates the importance of high genotypic information content, particularly when framed within the context of autopolyploids. We now live in the age of “high density” marker sets and linkage maps, even in polyploid species. However, high marker densities on an integrated map do not necessarily translate to high marker densities per homologue. Therefore such terminology may offer a misleading impression and fail to provide an accurate description of marker density and distribution unless haplotype-specific maps have also been examined and provided (e.g. Figure […]). We have shown here that GIC has an impact on QTL detection power and accuracy. In extending the definition of GIC to autopolyploids, we hope to encourage future studies in polyploid species to include this important genetic description alongside any reported QTL results.

Funding for this research was provided through the TKI polyploids projects “A genetic analysis pipeline for polyploid crops” (project number BO.001) and “Novel genetic and genomic tools for polyploid crops” (project number BO.[…]). […] received an E[…] short term fellowship to work in the group of […] at BioSS, Dundee, Scotland (number […] – […]), during which this research was initiated. The work of […] and the TetraploidMap software development were funded by the Rural & Environment Science & Analytical Services Division of the Scottish Government. The authors wish to acknowledge the potato breeding companies Averis Seeds B.V. and […] B.V. for providing the phenotypic data on the […] x […] population, Michiel Klaassen for helpful comments and discussion, and Glenn Marion for suggestions on the final manuscript.

Supplementary data will accompany the online version of the published manuscript.


Van Ooijen described a procedure to decompose the variance associated with a QTL ($V_Q$) into that which is explained by the markers ($V_M$) and a residual variance for which uncertainty remains ($V_R$). The GIC is then defined as (Van Ooijen):

$$GIC = \frac{V_M}{V_Q} = 1 - \frac{V_R}{V_Q}$$

We consider the case of the GIC for one homologue of a tetraploid, and limit our attention to the assumption of random bivalent pairing (noDR model). The derivation of the GIC per homologue when double reduction is admissible is less intuitive and has been omitted here.

At a given locus, we assume we have already calculated the IBD probability of inheritance of homologue $j$ ($1 \le j \le 8$) using TetraOrigin (Zheng et al., 2016) or some alternative method (e.g. Hackett et al.; Bourke). We can consider the IBD probability of inheritance of a particular homologue to be the inheritance probability of the so-called alternative allele ($\pi_a$), with the corresponding probability of inheritance of the reference allele $\pi_r = 1 - \pi_a$. Using the definition of variance as $Var(X) = E(X^2) - (E(X))^2$, and using a mean of 1 for the reference allele and $-1$ for the alternative allele, the residual variance ($V_R$) is given by

$$V_R = \frac{1}{N}\sum_{n=1}^{N}\left(\pi_r(1)^2 + \pi_a(-1)^2 - \left(\pi_r(1) + \pi_a(-1)\right)^2\right)$$

$$\Rightarrow V_R = \frac{1}{N}\sum_{n=1}^{N}\left((\pi_r + \pi_a) - (\pi_r - \pi_a)^2\right)$$

and since $\pi_r + \pi_a = 1$,

$$V_R = \frac{1}{N}\sum_{n=1}^{N}\left(1 - (2\pi_r - 1)^2\right)$$

$$\Rightarrow V_R = \frac{1}{N}\sum_{n=1}^{N} 2\pi_r(2 - 2\pi_r)$$

$$\Rightarrow V_R = \frac{4}{N}\sum_{n=1}^{N} \pi_r(1 - \pi_r)$$

where $N$ is the number of individuals in the population. The GIC for homologue $j$ is then

$$GIC_j = 1 - \frac{4}{N}\sum_{n=1}^{N} \pi_r(1 - \pi_r)$$

This is a quadratic function of the IBD probability $\pi$, taking a minimum when the IBD probability $\pi = 0.5$.


Chapter 9

Multi-environment QTL analysis of plant and flower morphological traits in tetraploid rose

Peter M. Bourke1*, Virginia W. Gitonga1,2*, Roeland E. Voorrips1, Richard G. F. Visser1, Frans A. Krens1, Chris Maliepaard1

1 Plant Breeding, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.

2 Current address: Selecta Kenya GmbH & Co. KG, P. O. Box 64132 – 00620, Nairobi, Kenya

* These authors contributed equally to this work.

Submitted


Abstract

Rose, one of the world’s most-loved and commercially-important ornamental plants, is predominantly tetraploid, possessing four rather than two copies of each chromosome. This condition complicates genetic analysis, and so the majority of previous genetic studies in rose have been performed at the diploid level instead. However, there may be advantages to performing such analyses at the tetraploid level, not least because this is the ploidy level of most breeding germplasm. Here, we apply recently-developed methods for quantitative trait loci (QTL) detection in a segregating tetraploid rose population to unravel the genetic control of a number of key morphological traits. These traits were measured both in the Netherlands and Kenya. Since ornamental plant breeding and selection is increasingly being performed at locations other than the production sites, environment-neutral QTL are required to maximise the effectiveness of breeding programs. We detected a number of robust, multi-environment QTL for such traits as stem and petiole prickles, petal number and stem width and length, that were localised on the recently-developed high-density SNP linkage map for rose. Our work explores the complex genetic architecture of these important morphological traits at the tetraploid level, while helping to advance the methods for marker-trait exploration in polyploid species.

Key words: Quantitative Trait Locus (QTL) analysis, Rosa x hybrida, autotetraploid, morphological traits, multi-environment trials


Introduction

Rose (Rosa x hybrida) is widely considered to be one of the most important ornamental plant species currently cultivated. The genus Rosa contains both diploid and tetraploid species, with a base chromosome number of seven. A consistent chromosomal numbering scheme has been proposed, based on an integrated consensus linkage map which incorporated markers mapped in a number of different rose mapping populations (Spiller et al.), and which we follow here also. This consistency in chromosomal numbering has facilitated the comparison of results between studies, helping to confirm the position of important sources of genetic variation for a large number of traits of interest.

Among these traits, understanding the genetic basis of plant and flower morphology in rose has been a major research aim in the rose community for many years. Morphological traits such as plant vigour, presence of prickles on stem or petioles, the number of flower petals etc. have predominantly been investigated at the diploid level (Debener; Crespel et al.; Shupert et al.; Yan et al.; Hibrand-Saint Oyant et al.; Roman et al.; […] et al.), with far fewer studies conducted at the tetraploid level (Rajapakse et al.; Koning-Boucoiran et al.). This is no doubt due to the complications of genotype calling, linkage mapping and quantitative trait loci (QTL) analysis in autotetraploids, for which methods and software options remain much more limited than for diploids. However, there is an increasing interest in the genetic analysis of autopolyploid species (Voorrips et al.; Hackett et al.; Bourke et al.; Bourke et al.; Hackett et al.; Schmitz Carley et al.), spurred on by ever-decreasing genotyping costs.

The majority of rose cultivars are tetraploid (Smulders et al.) and most breeding work is performed at the tetraploid level ([…] et al.). Indeed, the development of large segregating diploid mapping populations is arguably of little or no commercial importance (Debener and Linde), which may impose constraints on the size of such populations. Genetic insights gained at the diploid level are often directly applied to related polyploids, but studies are increasingly identifying genetic and epigenetic control mechanisms at the polyploid level that cannot be directly predicted from progenitor diploid species. Newly-formed polyploids may experience homoeologue loss and genome restructuring, altered patterns of gene expression as well as transcriptional and epigenetic changes (Soltis et al.). For example in autotetraploid potato, […] out of a set of […] genes were found to be differentially expressed in an experimental autopolyploid series (Stupar et al.). More recently, preferential allele expression for a large set of genes was detected among a panel of six autotetraploid potato cultivars (Pham et al.). As modern cultivated cut rose is probably best classified as a segmental allopolyploid with mainly tetrasomic inheritance (Bourke et al.), it is likely to have derived from the hybridisation of distinct but closely-related progenitor species. As such, insights into both allo- and autopolyploidisation are potentially relevant. In the well-studied allotetraploid cotton (Gossypium hirsutum), tetraploids were found to have higher yields and produced higher-quality fibre than their diploid progenitors grown in the same environment (Jiang et al.). Allohexaploid wheat (Triticum aestivum) was found to be significantly more salt-tolerant than either its diploid (Aegilops tauschii) or tetraploid (T. turgidum) progenitors. This appears to have been a consequence of combining favourable traits from both parents, but also of polyploid-specific phenomena such as the salt-induced expression of a transporter which was transcriptionally reprogrammed following polyploidisation (Yang et al.). Studies like these emphasise that polyploids may possess emergent properties and traits not found at the diploid level. It therefore appears preferable to investigate important breeding traits at the tetraploid level, both to validate previous studies in diploid rose and to understand the genetic control of these traits at the ploidy level at which they are usually selected for.

In this study we examined a number of morphological traits such as the number of petals, the presence of prickles on stems and petioles, the stem width and length, the chlorophyll content of leaves and the presence of side shoots. These traits were assessed in different locations, enabling an investigation of possible genotype × environment interactions (Gitonga et al.). This is of particular relevance in modern-day ornamental breeding, where selection and production are often performed in different locations (for details see Gitonga et al.). Stability of expression over both selection and target environments is needed if marker-assisted selection is to be effectively applied. Here, we re-analyse the data of Gitonga et al. to perform QTL analysis using the recently-published high-density tetraploid rose linkage map containing over […] single-nucleotide polymorphism (SNP) markers (Bourke et al.). Our current study helps to increase our understanding of the genetic control of these traits in tetraploid rose, as well as testing and exploring the effectiveness of recently-developed methods (Chapter […]) for the genetic analysis of autopolyploids.


Materials and Methods

Plant material and genotyping

The tetraploid “K5” rose population, the result of a cross between contrasting lines “P540” and “P867”, was used in this study. P540 (the maternal parent) possesses dark red flowers, has prickles on both stem and petiole and is susceptible to powdery mildew, whereas P867 (paternal) has pale salmon-coloured flowers, few to no prickles and is more resistant to powdery mildew (Koning-Boucoiran et al.; Gitonga et al., 2014).

The population resulting from this cross originally comprised […] individuals (Yan et al., 2006) but subsequently was found to consist of only […] unique individuals (Bourke et al., 2017) after genotyping with the 68K WagRhSNP Axiom SNP array (Koning-Boucoiran et al., 2015). The population segregates for quite a number of important traits (including flower colour, for which a QTL analysis has already been performed (Gitonga et al., 2016)), making it suitable for the current study into morphological traits (an example of the range of phenotypes observed for the number of flower petals is shown in Figure 1). Discrete dosage calls (ranging from nulliplex condition (dosage 0) to quadruplex condition (dosage 4)) were assigned using the fitTetra package in R (Voorrips et al.) as previously described (Bourke et al., 2017).

Figure 1. Example of the phenotypic diversity for petal number (and flower colour) in the tetraploid rose K5 population


Phenotyping

Plant trials were performed at three locations: Wageningen (WAG) in The Netherlands, and at Winchester farm, Nairobi (NAI) and Agriflora farm, Njoro (NJO), two production sites in Kenya. In the Netherlands, observations were made during both the summer of 2007 (WAG_S) and the winter of 2007 / 2008 (WAG_W), whereas in Kenya the observations were made during the period January – July […]. Full details of plant propagation and growth conditions are described in Gitonga et al. (2014). A description of the traits measured is provided in Supplementary Table […]. Three juvenile traits (date of bending, plant height and plant vigour) were only recorded in the Wageningen summer trial (WAG_S). All traits were measured with at least two replicates, with 4 individual plants per genotype constituting a replicate. There were two completely randomised blocks for both Wageningen trials, and three completely randomised blocks for both Kenyan trials. Best linear unbiased estimates (BLUEs) for traits across environments were calculated using the nlme package (Pinheiro et al., 2017) in R (R Core Team, 2016). Pearson correlations between single-environment traits and multi-environment BLUEs were calculated in R, as well as the frequency distributions of the traits in each of the environments (using the density function in R).

Linkage map construction and QTL analysis

The linkage map used in the current study has already been published, and was created using scripts developed using previously described methods (Bourke et al., 2016; Preedy and Hackett, 2016; Bourke et al., 2017). Full details of map construction are described in Bourke et al. (2017). The final integrated linkage maps had […] SNP markers (not all at unique positions) and covered 55 of the 56 expected parental homologues (the base chromosome number in Rosa is 7; each tetraploid parent is therefore expected to have 28 “homologue” maps, resulting in 56 maps across both parents).

A subset of these markers was chosen for the estimation of IBD inheritance probabilities in the population using the TetraOrigin software (Zheng et al., 2016). In an outcrossing autotetraploid there are nine distinct marker segregation types, namely […], where the numbers represent the dosage of the marker in the mother and father, respectively. All other marker types (e.g. […]) can be converted to one of these types without loss or distortion of their linkage information (Bourke et al., 2016). For each centiMorgan (cM) interval, a single marker from each marker segregation type was selected (if possible) which had the lowest number of missing observations across the population. TetraOrigin (Zheng et al., 2016) was run on Mathematica version 10 (Wolfram Research Inc., 2014), allowing both bivalentDecoding options (False / True) in the ancestral inference stage. This generated identity-by-descent (IBD) probabilities for the population under a double reduction model (which we subsequently refer to by “DR”) that allowed for both bivalents and multivalents in the parental meiosis (i.e. bivalentDecoding = False), as well as a purely bivalent model (“noDR”) for which double reduction is ignored (i.e. bivalentDecoding = True). In the latter case, unexpected scores are treated as genotyping errors by the software. The following settings were used: parental dosage error probability epsF = […]; offspring dosage error probability eps = […]; parental bivalentPhasing = True, which assumes bivalent pairing predominates across the population in the determination of parental marker phase. The IBD probabilities at the marker positions were interpolated at […] cM intervals using the default settings of the smooth.spline function in R (R Core Team, 2016) and saved for subsequent analysis.

A QTL scan was performed using a weighted regression on the trait phenotypes, with IBD probabilities from the DR and noDR models as weights. The form of the model used has been described in detail elsewhere (Kempthorne; Hackett et al.; Hackett et al.), namely

$$Y = \mu' + \alpha_2 X_2 + \alpha_3 X_3 + \alpha_4 X_4 + \alpha_6 X_6 + \alpha_7 X_7 + \alpha_8 X_8 + \varepsilon$$

where each $X_i$ is an indicator variable for one of the eight parental homologues, having taken the inheritance constraints $\sum_{i=1}^{4} X_i = 2$ and $\sum_{i=5}^{8} X_i = 2$ into account, and weighting by the IBD probabilities as calculated by TetraOrigin.

To simplify the analysis due to unequal numbers of observations both within and across blocks, we used the mean values per environment in the analysis. The model fitted environmental and QTL effects but not their interaction, by first fitting environment effects and saving the residuals for the QTL scan. For the traits bending time, height and vigour, which were measured in the Wageningen summer season alone (WAG_S), a single-environment QTL analysis was performed, using the phenotype values rather than residuals as the dependent variable. Genome-wide significance thresholds per trait were determined using permutation tests by recording the maximum LOD score from each of 1000 genome-wide scans using permuted genotypes, with the 0.95 quantile of the sorted LOD scores taken as the threshold. Single-environment analyses were also performed to assess the stability of QTL across environments, with significance thresholds per environment and per trait determined using permutation tests as described. To facilitate visualisation, threshold-corrected LOD profiles were determined, assigning a value of 0 to any LOD score below the threshold, and a value of LOD – threshold for any LOD score above.
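The permutation threshold and the threshold-corrected profile described above can be sketched in a few lines of Python. This is an illustration only, not the scripts used in this study: the function names, the `scan` argument (a hypothetical user-supplied function returning a genome-wide list of LOD scores for a given phenotype vector) and the nearest-rank quantile rule are all assumptions. Shuffling phenotypes against fixed genotypes is used here as an equivalent way of breaking the genotype-phenotype association.

```python
import random


def permutation_max_lods(phenotypes, scan, n_perm=1000, seed=1):
    """Collect the maximum LOD from each of n_perm scans on permuted data.

    `scan` is a hypothetical callable: it takes a list of phenotypes (in
    offspring order) and returns a genome-wide list of LOD scores.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the phenotype-genotype association
        max_lods.append(max(scan(shuffled)))
    return max_lods


def quantile_threshold(max_lods, alpha=0.05):
    """Return the (1 - alpha) quantile of the sorted maxima (nearest rank)."""
    ordered = sorted(max_lods)
    rank = round((1 - alpha) * len(ordered))
    rank = min(max(rank, 1), len(ordered))  # keep the rank inside the list
    return ordered[rank - 1]


def threshold_corrected(lod_profile, threshold):
    """Set LOD scores at or below the threshold to 0, else LOD - threshold."""
    return [lod - threshold if lod > threshold else 0.0 for lod in lod_profile]
```

With 1000 permutations and α = 0.05, `quantile_threshold` returns the 950th smallest of the recorded maxima, and `threshold_corrected` then yields the LOD excess above that value.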


Regions for which the LOD profile of the multi-environment analysis exceeded the significance threshold were re-mapped by saturating the LOD support intervals of the peaks with extra markers before re-running TetraOrigin to generate more precise IBD probabilities in the vicinity of QTL. These extra markers were selected as previously described, but with a binning window of […] cM in the interval, and added to the already selected marker set from the initial scan. We followed the same approach as before for QTL detection (weighted regression with the IBD probabilities as weights), albeit limited to the linkage groups where QTL were originally detected.

QTL peaks from the (marker-saturated) multi-environment analysis were also explored to determine the most likely QTL segregation type and mode of action using the Bayesian Information Criterion (BIC) (Schwarz) as described by Hackett et al. All possible bi-allelic QTL models (assuming additivity or dominance) were tested at each QTL peak, which meant comparing […] models for each detected QTL. For a polyploid, various dominance models are possible; here we only tested for the simplest case (i.e. Aaaa = AAaa = AAAa = AAAA versus aaaa). Any model within […] of the most likely model (i.e. minimising the BIC) was recorded as a potential candidate model. In cases where there were more than five possible models the QTL segregation type and mode of action were classified as “unclear”.
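The candidate-model rule above can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the code used in this study: the function name and the dictionary input are hypothetical, and the BIC cut-off is left as a required argument rather than fixed to any value. Only the "more than five models" rule is taken from the text.

```python
def candidate_models(bic_by_model, delta_cutoff, max_models=5):
    """Select QTL models whose BIC lies within delta_cutoff of the minimum.

    bic_by_model : dict mapping a model label (hypothetical, e.g.
                   "oooo x QQqq, A") to its BIC value.
    delta_cutoff : maximum BIC difference from the best model; the value is
                   an assumption to be supplied by the user.
    max_models   : if more candidates than this remain, the QTL is "unclear".
    """
    best = min(bic_by_model.values())
    kept = sorted(
        (bic, label)
        for label, bic in bic_by_model.items()
        if bic - best <= delta_cutoff
    )
    labels = [label for _, label in kept]
    status = "unclear" if len(labels) > max_models else "resolved"
    return status, labels
```

The returned labels are ordered from lowest to highest BIC, so the first entry is the most likely model under this criterion.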

A single-marker analysis of variance (ANOVA) was also conducted for each trait on the marker dosage classes for all mapped markers, with the –log10(p-value) of the model fit used as a proxy for the LOD score. Significance thresholds were determined using permutation tests with N = 1000 and α = 0.05 as described above. ANOVA and IBD-based results were visualised together to enable a comparison of the two approaches.

The genotypic information coefficient (GIC) per homologue was calculated using the following formula:

$$GIC_j = 1 - \frac{4}{N}\sum_{n=1}^{N} \pi(1 - \pi)$$

where $\pi$ is the inheritance probability of homologue $j$ in individual $n$ at a particular locus, estimated using the noDR model (Chapter 8).
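As a worked illustration of the GIC formula, the following minimal Python sketch evaluates it for one homologue at one locus. The function name and the list input are assumptions made for the example; the arithmetic follows the formula as given.

```python
def gic_per_homologue(ibd_probs):
    """GIC of one homologue at one locus under the noDR model.

    ibd_probs : list with one entry per individual, each the IBD probability
                (between 0 and 1) that the individual inherited the homologue.
    """
    n = len(ibd_probs)
    if n == 0:
        raise ValueError("at least one individual is required")
    # Residual variance term: (4 / N) * sum of pi * (1 - pi)
    residual = (4.0 / n) * sum(p * (1.0 - p) for p in ibd_probs)
    return 1.0 - residual
```

Fully informative probabilities (all 0 or 1) give a GIC of 1, completely uninformative probabilities (all 0.5) give a GIC of 0, and a population with probabilities of 0.9 and 0.1 gives 1 − 4 × 0.09 = 0.64.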


Figure 2. LOD profiles of QTL scans for morphological traits across different growing environments in the K5 tetraploid rose population. The results from single-environment analyses are shown as coloured dashed lines, with the multi-environment analysis results as a solid black line (“Combine”). Environments tested were Njoro (“NJO”), Nairobi (“NAI”), Wageningen winter (“WAG_W”) and Wageningen summer (“WAG_S”). Significance thresholds for the multi-environment analysis as determined by permutation tests are shown as red dashed lines. Note that single-environment LOD profiles (dashed lines) exceeding the multi-environment LOD thresholds are not necessarily significant.


Results

Trait correlations across environments

Trait values were in general relatively consistent across environments (Supplementary Figure […]). However, for the traits chlorophyll content and side shoots we observed a poor correlation across environments, with correlation coefficients as low as […] in the case of the number of side shoots recorded in Nairobi when compared to the multi-environment BLUEs. We also visualised the frequency distributions of the phenotypes across each environment, showing that some traits (like the number of petals) had a very similar distribution of values across environments, whereas others (like chlorophyll content) did not (Supplementary Figure […]).

Detection of QTL using IBD probabilities

We discovered a number of QTL in this study, although not all traits produced significant peaks (Figure […]). LOD significance thresholds for the multi-environment analyses were in the region of […] – […] as determined by permutation tests. Table 1 summarises the main findings from the initial QTL scan as well as those from the subsequent exploration of the QTL peaks. For most traits, the position, LOD score and explained variances were relatively consistent between the initial scan and the subsequent re-mapping. We were unable to detect QTL for the traits chlorophyll content or number of side shoots, either in the single or multi-environment analyses (Figure […]). Furthermore, some QTL that were detected in single-environment analyses were not confirmed in the multi-environment analysis (e.g. QTL for stem length on linkage groups […] and […] in Wageningen winter and summer, respectively (Figure […])).

The prediction of QTL segregation type using the Bayesian Information Criterion (BIC) was not particularly clear, since many QTL could be assigned multiple segregation types and modes of action (Table 1). Only the peak for stem prickles on linkage group ICM […] had unequivocal support for a single model (qqQQ x Qqqq with additive action), with as many as 14 potential models identified in the case of the ICM […] peak for stem width (Table 1). We compared the results of the multi-environment QTL analysis between a model incorporating double reduction (DR model) and one that ignored it (noDR model). In general the results were very comparable, although the model that included double reduction had slightly more detection power, in that the LOD peaks rose slightly higher above the threshold (Supplementary Figure […]). Previous simulation work has shown that the deciphering of QTL configuration and mode of action is less accurate when the DR model is used (details in Chapter 8). Therefore, given that the DR model did not alter the results significantly, we opted to only report results from the noDR model for simplicity (whilst still providing a visual comparison in Supplementary Figure […]).


Table 1. Summary of multi-environment QTL detected in this study using noDR IBD probabilities

Columns: Trait, N, Thr., ICM, cM, LOD, Expl. Var., Model, Act., BIC. The values for N, Thr., ICM, cM, LOD and Expl. Var. could not be recovered; the recoverable candidate models (Model, Act., BIC) per trait are:

Stem length: oooo x QQqq, A, 307.8; qqQq x QQqq, D, 308.1; QQqQ x QQQq, D, 309.0; QQqQ x QQqq, A, 309.1
Stem width: Unclear: 14 models
Prickles on stem: qqQQ x Qqqq, A, 248.9
Prickles on stem: Unclear: 6 models
Prickles on petiole: qQqq x Qqqq, A, 85.1; qQqq x Qqqq, D, 86.5; oooo x Qqqq, A, 88.9
Number of petals: qqQQ x QqQq, A, 343.9; oooo x qqQq, A, 345.7; qqQQ x QQQq, A, 348.2
Number of petals: QQqQ x QQqQ, A, 346.6; QQqq x QQqQ, A, 346.8; QQqQ x QQqQ, D, 350.9

“Unclear” refers to situations where […] (i.e. there was no clearly best model(s) suggested). “Q” signifies a QTL allele with positive effect, “q” with a negative effect, and “o” a neutral effect (or more precisely, no […]).

219 ______Multi-environment QTL analysis

Figure 3. Threshold-corrected LOD profiles for QTL scans of morphological traits in the K5 tetraploid rose population. For each trait and each environment (including the multi-environment analysis), significance thresholds were determined using permutation tests, and subtracted from the LOD profiles shown in Figure to show only significant results. The y-axes therefore correspond to the LOD excess above the significance thresholds. For traits Chlorophyll content and Side shoots, no significant regions of the genome were identified and were therefore not plotted. The single-environment analyses were conducted for trait data from Njoro, Nairobi, Wageningen winter and Wageningen summer.

Single-marker ANOVA

The single-marker analysis on the seven multi-environment morphological traits in this study produced slightly different results to that using the IBD probabilities (Table 2, Supplementary Figure 5). In some cases, the power of QTL detection was slightly higher using the ANOVA approach, for example a single marker “K5520_777” on linkage group was found to be significantly associated with chlorophyll content whereas no QTL were detected for this trait using the IBD model. A single QTL for stem width was identified on with the ANOVA model, as opposed to using the IBD model, which co-localised with the QTL detected for stem length on ICM 2. For the other traits (stem length, prickles on stem or petiole and number of petals) the same set of QTL were detected, albeit with somewhat different peak positions (Tables 1 and 2).

Table 2. Summary of multi-environment QTL detected in this study using the single-marker ANOVA additive effects model

Trait  N  Thr.  ICM  cM  -log10(p)  Expl. Var  Peak marker  Marker phase
Stem length  20 2 50 05 _2 x
Stem width  20 7 7 7 50 05 K77_275 x N
Prickles on stem  20 2 7 022 K72_57 x
  722 7 020 K7_ x
  05 K0_2 x
Prickles on petiole  20 02 _70 x
Chlorophyll content  20 5 5 0 K5520_777 x
Num petals  20 2 5 0 5_0 x
  5 0 K5_277 x

N = number of offspring for which genotype and phenotype data were available for each trait; Thr. = experiment-wise significance threshold, determined by permutation test with N = 1000 and α = 0.05; ICM = chromosomal linkage group, using the integrated consensus map numbering of Spiller et al. (2011) and Bourke et al. (2017); cM = centiMorgan position of peak; -log10(p) = significance at peak, using the p-value from the model fit; Expl. Var. = fraction of phenotypic variance explained by the model at the peak position (adjusted R² from the regression); Peak marker = marker with the highest trait association above the threshold on that linkage group; Marker phase = parental marker phase, consistent with homologue numbering from Table . In cases where the marker was not phased due to insufficient linkage to 1x0 markers, the segregation type is given instead.

Comparison with previously-reported QTL

A comparison of our results with those of previous studies showed that some QTL had already been identified in other populations on the same linkage groups as we identified, while some differed (Table 3). For example, an earlier study in the diploid population found a QTL for both stem length and width on ICM 2 (Yan et al., 2007), although they also identified QTL on and 5 for these traits which we did not. For the trait stem prickles we identified QTL on , and , whereas previously stem prickle QTL have been detected on 2, and (Crespel et al., 2002; Linde et al., 2006; Koning-Boucoiran et al., 2012). The QTL on ICM 2, as reported by Koning-Boucoiran et al. (2012), was found using the same K5 mapping population and same phenotype data as here, although with a different type of analysis and different genotypes. There, the trait was found to be associated with linkage groups A22 and A2, corresponding to our ICM 2, as well as A , corresponding to ICM in parent . In our study, there was a rise in LOD significance values on ICM 2 at the same position for this trait, although falling just below the significance threshold (Figure 2).


Table 3. Overview of previous QTL detected for the morphological traits in this study

Trait  Pop.  N  Ploidy  LG  Reference
Stem length  2x  2 5  Yan et al., 2007
Stem width  2x  2 5  Yan et al., 2007
Stem prickles  K5  4x  2  Koning-Boucoiran et al., 2012
  2x  Crespel et al., 2002
  77 270  2x  Linde et al., 2006
Petiole prickles  0 2 52  4x  Rajapakse et al., 2001
  0 2 52  4x  Zhang et al., 2006
Chlorophyll content  2x  2 7  Yan et al., 2007
Num petals  0  2x  Debener, 1999
  2x  Crespel et al., 2002
  77 270  2x  Linde et al., 2006
  2x  Hibrand-Saint Oyant et al., 2008
  K5  4x  Koning-Boucoiran et al., 2012

N = mapping population size used; Ploidy = ploidy of mapping population used, either diploid (2x) or tetraploid (4x); LG = linkage group numbering according to the integrated consensus map (ICM) numbering of Spiller et al. (2011). In the case of the Rajapakse et al. (2001) and Zhang et al. (2006) studies, the ICM numbering was imputed through linkage with microsatellite markers (see main text for details).

Genotypic Information Coefficient (GIC)

The genotypic information coefficient (GIC) gives a measure of the level of information provided by marker data on a scale from 0 to 1, with 0 corresponding to no information and 1 to full information, and is homologue-specific in our formulation. When based on IBD probabilities, the GIC can be thought of as a measure of how certain we are about the inheritance of a parental homologue at a particular position, averaged across the population. Most GIC values exceeded 0. , reflecting the dense marker coverage across all parental homologues. An example of the GIC profile for linkage group ICM 3 is shown in Figure 4. The telomeric region of homologue h7 of parent 2 had additional marker coverage when compared to the other parental homologues due to a 0 cM stretch of simplex x nulliplex markers which mapped there. The GIC content on all other seven homologues steadily decreases towards the telomere in this region, while for homologue h7 it remains close to 1. Dips in the GIC can also be seen to correspond to areas of lower marker density along the chromosome (Figure 4). The GIC profiles for all seven rose linkage groups are provided in Supplementary Figure 6.

Figure 4. Example of the distribution of marker alleles across the eight parental homologues on linkage group ICM 3, with the per-homologue genotypic information coefficient (GIC) plotted above. The lower section shows the four homologues of Parent 1, the upper section those of Parent 2. GIC values were found to be in the range 0 – (approximately) for both parents, with higher values indicating greater amounts of genetic information for that homologue.

Discussion

Multi-environment QTL

QTL studies often report QTL without necessarily considering whether these will be useful in practice. One of the more important considerations is whether the reported QTL are robust across environments or are only expressed in a specific environment. This is particularly relevant for modern rose breeding, for which the selection and production environments often differ (Gitonga et al., 2014). In this study we analysed phenotypic data from three locations, two in Kenya and one in the Netherlands, with an extra set of phenotypes evaluated during both a summer and winter season in the Netherlands. We were most interested in identifying QTL that are effective across both the selection and target environment. We found that a number of QTL were environment-specific, such as the QTL for stem length present on linkage groups and , but most were identified in both the Kenyan and Dutch trial data (Figure ). The implication is that for the traits studied here, selection in a non-target environment should


heritable traits (according to the “breeder’s equation”); those

Consistency with previous studies


Both Rajapakse et al. and Zhang et al. reported finding a QTL for petiole prickles in a tetraploid mapping population on their linkage group “B7” (Table 3). Zhang provided the primer sequences of microsatellite markers and which were reportedly linked to this trait (Zhang et al.). A BLASTn of these primer sequences on the Fragaria vesca genome produced hits on chromosome Fv , which shares synteny with rose ICM (Bourke et al.). Intriguingly, the marker was previously used in genotyping the K5 mapping population (Koning-Boucoiran et al.) and was mapped on , which would be consistent with our result of a single major QTL for petiole prickles on ICM (Figure ). However, this marker showed greatest linkage to simplex markers on ICM (Bourke et al., Table ; recombination frequency r ≤ 0.09 and LOD ≥ 20.4), consistent with the BLASTn results. Indeed, a more detailed look at the and maps of Koning-Boucoiran et al. with the maps of Bourke et al. shows that linkage groups and should probably have been assigned to ICM and to ICM (in other words, the homologues of linkage group do not appear to form a single chromosomal linkage group). It is therefore unclear where the QTL for petiole prickles of Rajapakse et al. and Zhang et al. should be placed, on ICM or ICM . These sorts of examples demonstrate the pressing need for a reliable physical sequence in Rosa so that reported QTL positions can be traced and quickly checked against the reference physical map to help compare and integrate results from different studies.

Despite these hurdles we were able to confirm many of the previously-reported QTL in the present study as well as providing robust evidence for some previously-unreported QTL for these important morphological traits (e.g. the QTL for stem prickles on ICM and the QTL for petal number on ICM ). In general the results from diploid studies tally well with our results from a tetraploid analysis (e.g. Linde et al. or Debener and Mattiesch). Conversely, many of the QTL reported from the diploid population by Yan et al. were not detected in this study (e.g. the QTL for stem length and width on linkage groups and , or the four separate QTL they detected for chlorophyll content (Table 3)). This may suggest that either large differences in the genetic control of some morphological traits could exist between ploidy levels or (more likely) that QTL for these traits segregated in their population whilst not in ours. Another possible cause is differences in the methodologies of QTL detection, and particularly significance threshold setting, which may have led to either inflated Type I or Type II errors between their study and this one, respectively.

As long as rose breeding and production continues to be performed at the tetraploid level, QTL studies at the tetraploid level continue to have clear advantages over diploid studies. For example, marker assays that have been identified at the diploid level may not necessarily perform well at the tetraploid level (e.g. due to unknown flanking SNPs, off-target hybridisation etc.). Tetraploid offspring with desirable combinations of traits can be directly used in further breeding work (or may represent a finished variety), whereas there is less use for offspring of a diploid cross. Furthermore, although there are more tools available for diploid genetic analysis, those for tetraploid studies are becoming increasingly sophisticated (e.g. TetraploidSNPMap (Hackett et al., 2017) or the methods described here). We therefore believe that genetic analyses at the polyploid level will become increasingly commonplace in the future.

Phasing QTL

One of the principal advantages of QTL analyses using IBD probabilities in polyploids is that QTL detection is performed across all homologues simultaneously. Single-marker approaches rely on coupling linkage between marker alleles and QTL alleles and may be less powerful due to “genotypic noise” from other homologues if the marker is not in complete coupling phase with the QTL allele. This becomes even more important at higher ploidy levels, where there are even more QTL configurations possible, decreasing the likelihood of full coupling linkage between a QTL and a nearby marker. In this study we performed both an IBD-based as well as a single-marker QTL analysis. The results from both approaches were quite consistent (as a comparison of Tables 1 and 2 shows). However, we did find a single significant marker (“K5520_777” on linkage group ) which was associated with chlorophyll content through the ANOVA analysis, with no corresponding peak using the IBD-based approach. It is possible that noise in the IBD probabilities near the QTL adversely affected the detection power (the GIC at cM (marker “K5520_777” is located at 45.9 cM) for homologues h2, h4 and h5 were ≈ 0.858, 0.843 and 0.895 respectively, below the chromosome-wide averages of .33, . and .3, although the GIC of the other homologues were all above .). On the other hand, we detected a QTL for stem width on ICM using the IBD model which was not picked up in the single-marker analysis. This peak co-localised with the QTL detected for stem length on ICM , increasing the confidence that an important growth-determining factor is located in that region.

Apart from detection power, IBD-based analyses also offer the possibility to determine the most likely QTL segregation type and mode of action, which can only be guessed at using single-marker approaches (i.e. by looking at the segregation type of the most significant marker, Table 2). The Bayesian Information Criterion (Schwarz, 1978) has previously been shown to correctly identify the QTL model in polyploid QTL studies with prediction accuracies of up to for simple QTL (details in Chapter ). We were therefore somewhat disappointed with the results obtained in this study for QTL segregation type and mode of action (Table 1). In only one case (the QTL on ICM for stem prickles) did we have a clear indication of the most likely QTL phase, namely a simplex triplex additive QTL segregating as . Phased QTL information could help improve the accuracy of marker-assisted selection, where breeders would no longer select individuals based on single marker alleles, but on desirable combinations of parental haplotypes (through a combination of specific SNP alleles). Haplotype-assisted selection in polyploids has yet to be applied (to our knowledge) but would require much clearer QTL phasing than we were able to achieve in this study to be able to reliably construct a potential tagging haplotype-marker. The possible causes for the uncertainty are many – conflicts in genotype information, inaccurate linkage maps, more complex QTL types (multi-allelic rather than bi-allelic), poorly-scored or confused phenotype data etc. Despite this, our approach of saturating QTL regions with extra markers tended in general to increase the significance and proportion of explained variances at QTL positions (Table 1), with a single exception (that may have been due to a sub-optimal phasing solution returned by TetraOrigin, which is a stochastic rather than deterministic procedure (Zheng et al., 2016)). High genotypic information coefficients around QTL positions on the relevant homologues have been shown to be necessary for the accurate and clear detection of QTL as well as the precise estimation of their location and phase (Chapter ) and need to be taken into account if QTL studies in polyploids are to deliver applicable results.

Concluding remarks

In this study we applied recently developed methods to help unravel the genetic architecture underlying a number of important morphological traits in tetraploid rose. Most of the QTL detected were found to be robust across environments, suggesting that selection for these traits can successfully be performed in locations other than the production site. However, traits displaying significant genotype × environment interaction effects should probably not be selected for in another than the target location. Apart from identifying and confirming an important set of QTL at the tetraploid level, our work helps pave the way towards haplotype-assisted selection methods in the future.

Acknowledgements

Funding for this research was provided through the TKI polyploids project “Novel genetic and genomic tools for polyploid crops” (project number BO2.0009004). The support of the companies participating in the polyploids projects is gratefully acknowledged. Terra Nigra B.V. is acknowledged for generating the K5 mapping population and making it available.

Supplementary data will accompany the online version of the published article.


Supplementary Figure 1. Comparison between single-environment trait values and Best Linear Unbiased Estimates (BLUEs) across environments for each of the seven morphological traits recorded in this study. The four environments are shown on the left-hand margin, namely Nairobi, Njoro, Wageningen summer and Wageningen winter. The squared Pearson correlation coefficients are printed for each comparison. BLUEs values (x-axis) are equal across the four environmental comparisons.

Supplementary Figure 2. Distribution of rose morphological trait values over four different environments (Nairobi, Njoro, Wageningen summer and Wageningen winter). Mean parental values are shown as either solid vertical arrows (maternal mean, ♀) or dashed vertical arrows (paternal mean, ♂).


Supplementary Figure 3. Comparison between multi-environment QTL analysis for the seven multi-environment morphological traits studied using a model that ignores double reduction (noDR) versus one that includes it (DR). Significance thresholds (as determined by permutation tests) are shown as dashed lines, with little difference found between thresholds for the noDR and DR models.


Supplementary Figure 4. Single-environment QTL analysis results for the traits bending time, plant height and plant vigour, assessed in Wageningen only. Significance thresholds (as determined by permutation tests) are shown as dashed lines. The analysis shown used the noDR model, which assumes random bivalent pairing during meiosis only.


Supplementary Figure 5. Comparison between IBD-based QTL analysis results (black line) with single-marker ANOVA results (green dots) for the seven multi-environment morphological traits studied. Significance thresholds (as determined by permutation tests) are shown as dashed lines, with the red line corresponding to the IBD-based thresholds and the light green line the ANOVA-based thresholds. Significance values for the IBD-based analysis are given on the left-hand axes (LOD scores), with the ANOVA significance values on the right-hand axes (-log10(p) values).


Supplementary Figure 6. Genotypic Information Coefficient (GIC) plots for rose linkage groups 1 – 7, visualising the information content per homologue of the IBD probabilities used in the initial QTL scan. Marker allele distributions for Parent 1 on homologues h1 – h4 are shown in the lower section, with marker allele distributions for Parent 2 (homologues h5 – h8) shown in the upper section. GIC values were found to be in the range – (approximately) for both parents, with higher GIC values indicating greater amounts of genetic information for that homologue. The colouring scheme of the GIC per homologue and the marker distribution on that homologue are identical.

Chapter 10

polyqtlR – an R package to analyse quantitative trait loci in autopolyploid populations

Peter M. Bourke1, Roeland E. Voorrips1, Christine Hackett2, Geert van Geest1,3, Richard G. F. Visser1, Chris Maliepaard1

1 Plant Breeding, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.

2 Biomathematics and Statistics Scotland, Invergowrie, Dundee DD2 5DA, UK.

3 Deliflor Chrysanten B.V., Korte Kruisweg 163, 2676 BS Maasdijk, The Netherlands.

233 ______polyqtlR: QTL analysis software

The investigation of quantitative trait loci (QTL) is an essential component in our understanding of how organisms develop and respond to their environment through the identification of genes and their alleles that contribute to trait variation. This knowledge can help guide and accelerate breeding programs aimed at developing superior genotypes through genomics-assisted breeding. Here we present polyqtlR, a new software tool to facilitate QTL analysis in autopolyploids that performs QTL interval mapping in F1 populations of outcrossing autopolyploids and segmental allopolyploids using identity-by-descent (IBD) probabilities. The allelic composition of discovered QTL can be explored, enabling favourable alleles to be identified and tracked in the population. Visualisation tools within the package facilitate this process, and options to include genetic co-factors (QTL positions or markers), covariates, randomised blocks or different environments are included.

Quantitative Trait Locus (QTL) analysis, polyploidy, identity-by-descent (IBD) probability, interval QTL mapping


Polyploids, which carry more than the usual two copies of each chromosome, are an important group of organisms that occur widely among plants, particularly domesticated ones (Salman-Minkov et al., 2016). Many theories to explain their prevalence among crop species have been proposed, identifying features which may have appealed to early farmers in their domestication of wild species. Such features include their larger organs such as tubers, fruits or flowers (the so-called “gigas” effect) (Sattler et al., 2016), increased heterosis (Comai, 2005), their genomic plasticity (te Beest et al., 2011), phenotypic novelty (Udall and Wendel, 2006), their ability to be clonally propagated (Herben et al., 2017), increased seedling and juvenile vigour (Levin, 1983), the masking of deleterious alleles (Renny-Byfield and Wendel, 2014) or the possibility of seedlessness which accompanies aneuploidy (Bradshaw, 2016). It is currently believed that all flowering plants have experienced at least one whole genome duplication (WGD) during the course of their evolution, with many lineages undergoing multiple rounds of WGD followed by re-diploidisation (Vanneste et al., 2014). Polyploidy may also be induced deliberately (usually through the use of some chemical cell division inhibitor such as colchicine (Blakeslee and Avery, 1937)), often to combine properties of parents that could not otherwise be crossed (Van Tuyl and Lim, 2003), or to benefit from some of the other advantages listed above.

However, not all features of polyploids are necessarily advantageous. Perhaps one of the greatest difficulties in polyploid cultivation and breeding is the constant re-shuffling of alleles in each generation, a consequence of polysomic inheritance. We are currently witnessing a genomics revolution which could hold the key to understanding and tracking polyploid inheritance in detail. From a breeding perspective, we would like to be able to identify genomic regions that contribute favourable alleles to a particular trait of interest (quantitative trait loci (QTL)), and predict which offspring in a population carry favourable combinations of parental alleles. The software tools to perform such analysis at the polyploid level are still relatively rare – here, we introduce a novel R package, polyqtlR, for performing these analyses in outbreeding populations of autopolyploid and segmental allopolyploid species.

The analysis of QTL within polyqtlR can either be single marker-based, or identity-by-descent (IBD) probability-based. For single marker analyses, the dosage of each bi-allelic marker is used as the explanatory variable in a genome-wide scan. In polyploid species, marker dosage refers to the count of marker alleles (by convention the count of the “alternative” allele at a bi-allelic locus, as opposed to the “reference” allele). In an autotetraploid for example, the possible dosages range from nulliplex (0 copies of the alternative allele), simplex (1 copy), duplex (2 copies), triplex (3 copies) to quadruplex (4 copies). The assignment of marker dosage in polyploids is a non-trivial problem in itself, but there are a number of possibilities for achieving this using dedicated software (Gidskehaug et al.; Voorrips et al.; Serang et al.; Schmitz Carley et al.).

On the other hand, identity-by-descent (IBD) probabilities are the inheritance probabilities of parental alleles in a mapping population. An algorithm employing Hidden Markov Models was recently proposed for tetraploids and implemented in the software program TetraOrigin (Zheng et al., 2016). polyqtlR can utilise the output of TetraOrigin for the analysis of quantitative traits. Alternatively, a more simplistic algorithm to approximate IBD probabilities was developed within our lab (Bourke) and has also been implemented in polyqtlR. This algorithm has the advantage of being applicable to all ploidy levels as well as being computationally very efficient, and was recently used in the analysis of several traits in hexaploid chrysanthemum (van Geest et al., a). polyqtlR offers the possibility to explore the possible allelic composition at QTL positions and to track the transmission of such alleles within the mapping population. Significance thresholds can be determined using permutation tests, with parallelised calculations to improve performance. The inclusion of genetic co-factors, phenotypic co-variates or co-factors to specify experimental design (e.g. blocking factors) can increase the power of QTL detection, as well as allowing for possible interactions between QTL to be detected. As with most new tools, polyqtlR will continue to be developed to deal with advances in genotyping technologies as well as methodologies for QTL analysis in polyploid populations. That said, the current implementation of polyqtlR should already have a significant positive impact on the breeding of polyploid crops.

polyqtlR requires dosage-scored marker information with an accompanying integrated linkage map from an F1 population. Linkage maps can be generated for tetraploid populations using software such as TetraploidSNPMap (Hackett et al., 2017) or polymapR (Chapter ). For triploid or hexaploid populations polymapR is recommended, as TetraploidSNPMap does not currently handle these ploidy levels. In the case of tetraploids, linkage map phase and IBD probabilities can be estimated externally using the TetraOrigin software (Zheng et al., 2016). Currently, the only public implementation of TetraOrigin is written in the proprietary software Mathematica (Wolfram Research Inc.), although an R version (R Core Team) may also become available in future. Otherwise, IBD probabilities can be estimated within the package using a computationally efficient internal algorithm that uses the phased map output of polymapR.

Details of the methodology behind the calculation of IBD probabilities in TetraOrigin can be found in Zheng et al. (2016). The internal algorithm for estimating IBD probabilities in polyqtlR currently relies on a method originally described in Bourke et al. and re-implemented by van Geest et al. One prerequisite to calculating IBD probabilities within polyqtlR is a fully-phased linkage map such as those produced by polymapR (Chapter ). In a hexaploid species, for example, such a map would be of the form shown in Table .

[Table: example phased linkage map of a hexaploid (columns: marker, LG, position, h1 – h12)]

As well as the marker position on an integrated map (columns “LG” for linkage group and “position” in centiMorgans), the configuration of marker alleles across the twelve parental homologues is also shown (h1 – h6 for parent 1, h7 – h12 for parent 2). Generally a “1” corresponds to the presence of the segregating allele, although if the original marker coding is used, “1” corresponds to the alternative and “0” to the reference allele.

IBD probabilities of 0.5 are initially assigned to the population for all parental homologues at all marker positions (completely uninformative priors). Fully-informative marker scores are then used to assign probabilities of 1 in the case of inheritance, or 0 in the case of no inheritance. For example, in a tetraploid population at a 1x0 marker (single segregating allele originating from parent 1), offspring with marker dosage 1 would initially be assigned a probability of 1 for the homologue carrying the allele and 0.5 for the seven other homologues, while offspring with marker dosage 0 would initially be assigned a probability of 0 for that homologue and 0.5 for the others (normalisation of probabilities occurs later). Partially-informative dosages in the offspring are not used (e.g. an offspring dosage of 1 from a 1x1 marker – the origin of this allele could be from either parent). Once all such probabilities have been assigned, probabilities at all other marker positions are approximated by first locating the nearest linked marker with an informative probability (0 or 1). The probability is then estimated as r in the case of a starting probability of 0, or 1 – r in the case of a starting probability of 1, where r is the recombination frequency between the two markers, as derived from the map positions (for example, if Haldane’s mapping function (Haldane, 1919) was used to generate the map, the inverse of this function is used to recalculate r). The probabilities are then normalised to ensure $\sum_i P_i = \mathrm{ploidy}/2$ for each parent. Finally, probabilities are imputed at a grid of positions using the smooth.spline function in R (by default, at 1 cM spacings).
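The propagation and normalisation steps described above can be illustrated with a minimal Python sketch. It is not the polyqtlR implementation (which is written in R): the function names are hypothetical, the map is assumed to have been built with Haldane's mapping function, and proportional rescaling is only one assumed way of meeting the sum-to-ploidy/2 constraint. The final spline smoothing step is omitted.

```python
import math


def haldane_inverse(distance_cm):
    """Recombination frequency r for a map distance in cM (inverse Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def propagate(start_prob, distance_cm):
    """IBD probability at a position, given an informative probability
    (0 or 1) at the nearest linked marker, distance_cm away."""
    r = haldane_inverse(distance_cm)
    return r if start_prob == 0 else 1.0 - r


def normalise(probs, ploidy):
    """Rescale one parent's homologue probabilities so they sum to ploidy / 2.
    Proportional rescaling is an assumption made for this sketch."""
    total = sum(probs)
    return [p * (ploidy / 2) / total for p in probs]


# Tetraploid parent: one homologue known to be inherited, three uninformative.
print(normalise([1.0, 0.5, 0.5, 0.5], ploidy=4))
# Probability 10 cM away from a marker where the homologue was inherited.
print(propagate(1, 10.0))
```

At zero distance the informative probability is returned unchanged, and at very large distances it tends to 0.5, the uninformative prior.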

Single marker analysis

For the single marker analysis, a genome-wide scan is performed by fitting the following additive model at each marker position:

$$Y = \bar{y} + \alpha D + \varepsilon$$

where $Y$ is the vector of phenotypes, $D$ is the vector of marker dosage scores, $\bar{y}$ is the overall mean and $\varepsilon$ the residuals. This model assumes additivity of QTL effects depending on the dosage, which may or may not hold. From the ANOVA table, the sum of squared residuals ($RSS_1$) is recorded and used to calculate the logarithm of odds (LOD) score as follows (Broman et al., 2003):

$$LOD = \frac{N}{2} \log_{10}\left(\frac{RSS_0}{RSS_1}\right)$$

where $N$ is the population size, and $RSS_0$ is the residual sum of squares under the Null Hypothesis of no QTL, i.e. $RSS_0 = \sum_i (\bar{y} - y_i)^2$. The amount of explained variance at any tested position is also calculated using

$$\mathit{expl.var.} = 1 - \frac{MS_R}{MS_T}$$

where $MS_R$ is the mean squared residuals from the ANOVA table, and $MS_T$ is the total mean squares, $MS_T = \frac{\sum SS}{N \cdot n_{\mathrm{blocks}} - 1}$ (where $SS$ are the total sum of squares terms from the ANOVA table, $N$ the population size and $n_{\mathrm{blocks}}$ the number of blocks, taken to be one in the case of no experimental blocking). This is the same formula used to generate the adjusted $R^2$ from the linear model fit (as returned by the summary.lm function in R).
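The LOD and explained-variance calculations above can be sketched in a few lines of Python. This is an illustrative reimplementation rather than package code: the function names are hypothetical, and the residual degrees of freedom used for the mean squared residuals (N × blocks − parameters − 1) are an assumption chosen to match the usual adjusted R² of a linear model.

```python
import math


def fit_additive(dosages, phenotypes):
    """Least-squares fit of Y = mean + alpha * D; returns (RSS0, RSS1)."""
    n = len(phenotypes)
    d_mean = sum(dosages) / n
    y_mean = sum(phenotypes) / n
    sxx = sum((d - d_mean) ** 2 for d in dosages)
    sxy = sum((d - d_mean) * (y - y_mean) for d, y in zip(dosages, phenotypes))
    alpha = sxy / sxx
    intercept = y_mean - alpha * d_mean
    rss0 = sum((y - y_mean) ** 2 for y in phenotypes)  # null model: mean only
    rss1 = sum((y - intercept - alpha * d) ** 2
               for d, y in zip(dosages, phenotypes))   # additive dosage model
    return rss0, rss1


def lod_score(n, rss0, rss1):
    """LOD = (N / 2) * log10(RSS0 / RSS1)."""
    return (n / 2.0) * math.log10(rss0 / rss1)


def explained_variance(n, rss0, rss1, n_params=1, n_blocks=1):
    """1 - MS_R / MS_T, i.e. the adjusted R-squared (assumed residual df)."""
    ms_r = rss1 / (n * n_blocks - n_params - 1)
    ms_t = rss0 / (n * n_blocks - 1)
    return 1.0 - ms_r / ms_t


rss0, rss1 = fit_additive([0, 1, 2, 3, 4], [1.0, 2.0, 4.0, 3.0, 5.0])
print(lod_score(5, rss0, rss1), explained_variance(5, rss0, rss1))
```

Because the LOD depends only on the ratio of the two residual sums of squares, a marker that explains nothing gives a score near zero, while a tenfold reduction in residual variation gives a score of N/2.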

IBD-based analysis

The IBD-based analysis has two different forms, depending on the origin of the IBD probabilities. If the IBD probabilities were estimated in TetraOrigin, then the probabilities of all possible tetraploid genotype combinations are known, either 36 possibilities in the case of purely bivalent pairing, or 100 possibilities in the case where quadrivalent pairing is also allowed. These genotype probabilities are then used as weights in a regression model originally proposed by Kempthorne (Kempthorne), but more recently adopted in the TetraploidSNPMap pipeline (Hackett et al.; Hackett et al.; Hackett et al.):

$$Y = \mu' + \alpha_2 X_2 + \alpha_3 X_3 + \alpha_4 X_4 + \alpha_6 X_6 + \alpha_7 X_7 + \alpha_8 X_8 + \varepsilon$$

where the constraints $X_1 + X_2 + X_3 + X_4 = 2$ and $X_5 + X_6 + X_7 + X_8 = 2$ have already been applied. Here, the $X_i$ are indicator variables for each parental homologue (1 – 4 for parent 1, 5 – 8 for parent 2 in a tetraploid) and $\varepsilon$ is the residual term.

For the IBD probabilities estimated internally, the probabilities of combinations of parental alleles are unknown, but the inheritance probability of each individual parental homologue is known. The form of the model appears identical, i.e.

$$Y = \mu' + \alpha_2 X_2 + \alpha_3 X_3 + \alpha_4 X_4 + \alpha_6 X_6 + \alpha_7 X_7 + \alpha_8 X_8 + \varepsilon$$

except now the regression is unweighted, and the $X_i$ are vectors of inheritance probabilities of a specific parental homologue at a locus.

For a hexaploid, the model is extended to include ten of the twelve parental homologues, i.e.

$$Y = \mu' + \alpha_2 X_2 + \alpha_3 X_3 + \alpha_4 X_4 + \alpha_5 X_5 + \alpha_6 X_6 + \alpha_8 X_8 + \alpha_9 X_9 + \alpha_{10} X_{10} + \alpha_{11} X_{11} + \alpha_{12} X_{12} + \varepsilon$$

etc.

Approximate significance thresholds are determined using Permutation Tests (Churchill and Doerge). By default, permutations of the genotypes are generated, following which the maximum genome-wide LOD scores are recorded from each of the analyses, and the th quantile of the ordered LOD scores is taken as an approximate significance threshold. However, these numbers are merely guidelines, and the user has full control over the number of permutations and the selection of the approximate Type I error rate α.
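A permutation test of this kind can be sketched as follows. The sketch is in Python and is not the package's code: `permutation_threshold` and the `scan` callback are hypothetical names, and the defaults of 1000 permutations and α = 0.05 are assumptions borrowed from the settings reported for the rose analysis in the previous chapter, since the package defaults are not legible here.

```python
import random


def permutation_threshold(genotypes, phenotypes, scan,
                          n_perm=1000, alpha=0.05, seed=1):
    """Approximate genome-wide significance threshold.

    genotypes  : list with one genotype record per individual
    phenotypes : list of trait values in the same individual order
    scan       : hypothetical callback, scan(genotypes, phenotypes) ->
                 list of test statistics (e.g. LOD), one per position
    """
    rng = random.Random(seed)
    order = list(range(len(genotypes)))
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(order)                        # permute genotypes, not phenotypes
        permuted = [genotypes[i] for i in order]
        maxima.append(max(scan(permuted, phenotypes)))  # genome-wide maximum
    maxima.sort()
    index = min(len(maxima) - 1, int((1.0 - alpha) * len(maxima)))
    return maxima[index]                          # (1 - alpha) quantile of maxima
```

Recording only the genome-wide maximum of each permuted scan is what makes the resulting threshold experiment-wise rather than per-position.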

In cases where a large-effect QTL is present and segregating in a population, it can be advantageous to reduce the level of background noise at other loci by accounting first for the major QTL and running an analysis on the corrected phenotypes. Such an approach has previously been termed multiple QTL mapping (Jansen) or composite interval mapping (Zeng). In polyqtlR, we offer a simple approach to correct for genetic co-factors, by either supplying the name of a marker closely linked to the major QTL peak, or the peak position from the genome-wide scan (which is usually performed at a grid of splined positions). There is no limit to the number of co-factors that can be added, but a parsimonious analysis with only significant, unlinked QTL as genetic co-factors is recommended (as well as analyses including these co-factors singly). The model described above for IBD probabilities is initially fitted at the supplied QTL position(s) and the residuals are saved to replace the vector of phenotype values Y in the QTL scan.

Phenotypic covariates that may represent a source of confounding variation can also be supplied, and these are also initially fitted (Y ~ Covariate) and the residuals saved as a replacement for the phenotypes under investigation. In cases where incomplete covariates are supplied (i.e. containing missing values), the missing values are replaced with the mean covariate value, whilst warning the user of the extent of this problem. If experimental design factors are included in the analysis, they are also fitted (Y ~ Blocks), after which the residuals are used to perform the genome-wide QTL scan. Note that if blocks are included, the apparent dimension of the model increases. The genotype matrix is vertically concatenated according to the number of blocks, which also motivated our decision to permute the genotypes rather than the phenotypes Y in the Permutation Test (the association is broken in both cases, but permuting genotypes and then concatenating them ensures the correct level of nesting of permutations is achieved). Blocks, covariates and co-factors can all be included (or any combination of these three), in which case blocks are first fit, followed by covariates and then co-factors. Because the analysis relies on simple linear models (using the lm function in R (R Core Team)), missing phenotypic values can be problematic. If blocks are used, there is the possibility to impute missing phenotypes using the fitted model for block effects and the non-missing phenotype scores for that individual in the other blocks. By default, at least observations are required for imputation (e.g. minimum out of phenotypes non-missing for that individual in a block situation).
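The covariate correction described above (fit Y ~ Covariate, keep the residuals, fill missing covariate values with the mean) can be sketched for a single numeric covariate as follows. The function name is hypothetical and the sketch is in Python rather than the package's R; the package also warns the user about missing values, which is left out here.

```python
def residuals_after_covariate(phenotypes, covariate):
    """Residuals of a simple linear regression of phenotype on one covariate.

    Missing covariate values (None) are replaced by the covariate mean,
    mirroring the mean-imputation step described in the text.
    """
    known = [c for c in covariate if c is not None]
    c_mean = sum(known) / len(known)
    filled = [c_mean if c is None else c for c in covariate]

    n = len(phenotypes)
    y_mean = sum(phenotypes) / n
    f_mean = sum(filled) / n  # equals c_mean after mean imputation
    sxx = sum((c - f_mean) ** 2 for c in filled)
    sxy = sum((c - f_mean) * (y - y_mean) for c, y in zip(filled, phenotypes))
    slope = sxy / sxx

    # The residuals replace the phenotypes in the subsequent QTL scan.
    return [y - (y_mean + slope * (c - f_mean))
            for c, y in zip(filled, phenotypes)]


print(residuals_after_covariate([1.0, 5.0, 3.0], [1.0, None, 3.0]))
```

The same residual-substitution idea applies to blocks and genetic co-factors, fitted in the order given in the text (blocks, then covariates, then co-factors).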

One of the main advantages of an IBD-based analysis over single-marker methods is the ability to explore QTL peak positions to determine the most likely QTL configuration and mode of action (additive or dominant). These can be compared using a model selection procedure such as the Bayesian Information Criterion (BIC) (Schwarz), as previously proposed (Hackett et al.). In addition, we can run a per-homologue analysis on linkage groups known to possess segregating QTL. To achieve this, we re-run the analysis while restricting the genotypic matrix to a single column of haplotype-specific IBD probabilities. This may assist in the interpretation of the most likely origin of favourable (or unfavourable) QTL alleles (although only reliably so in cases of a simplex QTL; for more complex, multi-allelic QTL the output can be misleading). In cases where a blocking factor (or factors) is included in the analysis, best linear unbiased estimates (BLUEs) should first be calculated using the function provided (which uses the lme function from the package nlme (Pinheiro et al.)).

The Genotypic Information Coefficient (GIC) is a convenient measure of the precision of our knowledge on the composition of parental alleles at a particular position in each offspring, averaged across the mapping population. The GIC of homologue $j$ ($1 \leq j \leq \sum q$) at each position is calculated from the IBD probabilities using the formula

$$GIC_j = 1 - \frac{4}{N} \sum_{n=1}^{N} P_{n,j}\,(1 - P_{n,j})$$

where $\sum q$ is the sum of parental ploidy levels, $N$ is the population size and $P_{n,j}$ is the probability of inheriting homologue $j$ in individual $n$ at that position (this is a generalisation of the measure used in MapQTL (Van Ooijen)).
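As a worked illustration of the GIC formula, the following minimal Python sketch computes the GIC of one homologue at one position from a list of per-individual inheritance probabilities. The function name `gic` and the example probabilities are assumptions made for this illustration and are not taken from the polyqtlR code.

```python
def gic(probabilities):
    """Genotypic Information Coefficient of one homologue at one position.

    `probabilities` holds P_{n,j} for n = 1..N, the probability that each
    individual inherited homologue j at this position.
    Implements GIC_j = 1 - (4 / N) * sum_n P_{n,j} * (1 - P_{n,j}).
    """
    n = len(probabilities)
    if n == 0:
        raise ValueError("at least one individual is required")
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return 1.0 - (4.0 / n) * sum(p * (1.0 - p) for p in probabilities)

# Fully informative position: every probability is 0 or 1, so GIC = 1.
full_information = gic([0.0, 1.0, 1.0, 0.0])

# Completely uninformative position: every probability is 0.5, so GIC = 0.
no_information = gic([0.5, 0.5, 0.5, 0.5])

# Intermediate case with two slightly uncertain individuals.
partial_information = gic([0.9, 0.1, 1.0, 0.0])
```

The two extreme cases show why the factor 4 appears: $P(1-P)$ is at most 0.25, so the coefficient is scaled to lie between 0 and 1.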

The visualisation of results is an important aspect of any QTL analysis and is not neglected in polyqtlR. QTL profile plots across all chromosomes can be simply generated, and can be overlaid with previous analyses if direct comparisons are required (such as the impact of including a cofactor, or multiple cofactors, in the analysis). The GIC profiles of each parental homologue can also be plotted in a similar fashion, facilitating a comparison between the position of QTL peaks and the GIC landscape in that region. The results of homologue-specific analyses can also be plotted to help interpret the possible QTL configuration, and the results of the model selection procedure (BIC) are also plotted to give an indication of the likelihood of competing models. The haplotype conformation of each individual for any specified genomic region can be plotted, either for selected individuals or for those individuals with extreme phenotypes. It is also possible to automatically screen for and identify recombinant individuals within a specified region, or individuals carrying double reduction products.


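Table of simulated trait parameters (body not recoverable): the surviving column information is the heritability ($h^2$) and the gene action, either “additive”, “dominant” or “epistatic additive”.

† “Broad-sense” heritability of the simulated trait, as defined by $h^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_e^2)$.

Phenotype simulation (surrounding text not recoverable; the symbols $i$, $P_i$ and $d_i$ appear in the lost description): $P_i \sim \mathcal{N}(Q_i \ast d, \sigma_e^2) + \mathcal{N}(Y, s)$, with the environmental variance set from the genetic variance $\sigma_g^2$ and the “broad-sense” heritability as $\sigma_e^2 = \left(\frac{1 - h^2}{h^2}\right)\sigma_g^2$.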
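The relationship between the two variance formulas above can be checked numerically. The following minimal Python sketch derives the environmental variance needed to obtain a target broad-sense heritability and then recovers the heritability from the two variances; the function names and the numerical values are assumptions chosen purely for illustration.

```python
def environmental_variance(h2, genetic_variance):
    """Environmental variance giving heritability h2 for a given genetic variance.

    Rearranges h2 = var_g / (var_g + var_e) into var_e = ((1 - h2) / h2) * var_g.
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must lie in the interval (0, 1]")
    return (1.0 - h2) / h2 * genetic_variance

def broad_sense_heritability(genetic_variance, env_variance):
    """Broad-sense heritability h2 = var_g / (var_g + var_e)."""
    return genetic_variance / (genetic_variance + env_variance)

# Hypothetical values: genetic variance of 3 and a target heritability of 0.2.
var_g = 3.0
var_e = environmental_variance(0.2, var_g)
recovered_h2 = broad_sense_heritability(var_g, var_e)
```

A low heritability therefore requires an environmental variance several times larger than the genetic variance (here four times larger).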

inherited, dosage effects were also included. For the traits where multiple QTL contributed, single-locus phenotypes were added together to produce a “multi-locus” phenotype.

Using the sample tetraploid dataset described, we estimated IBD probabilities using TetraOrigin (Zheng et al.) as well as using our own IBD estimation algorithm for comparison. The TetraOrigin IBDs were used to perform a QTL analysis for the three simulated traits described in the table of trait parameters, with the results visualised in the accompanying figure. For one trait, three significant peaks were detected, as listed in the results table, corresponding to the positions of the three simulated QTL. For another trait, a single very sharp peak was observed, although the support interval did not include the true QTL position. Cofactor analysis using this peak position did not result in any extra QTL being detected (second panel of the figure). For the remaining trait, three significant peaks were detected, with the remaining smaller QTL not detected. Including either of two of these peaks as a cofactor increased the significance at the other peak but had little effect on the profile on the third linkage group. Including all three peaks simultaneously as cofactors in the analysis reduced the significance threshold slightly but did not result in the detection of any new QTL.

Significance thresholds are shown as horizontal dotted lines, as determined by permutation tests. For one trait, a cofactor analysis using the major peak yielded no extra QTL although the significance threshold reduced slightly, as shown by the lower blue dotted line. For another trait, the addition of the three peaks as cofactors also reduced the significance threshold (lower purple dotted horizontal line), although no new QTL were detected (purple profile line). The distribution of markers on the integrated linkage map is shown at the x-axis as vertical dashes. True QTL positions are shown by green triangles.

The GIC values per homologue were calculated and found to often be close to full information. However, some major dips in the GIC profile can be seen on certain homologues, corresponding to the region on the genetic linkage maps which harbours no segregating markers (Supplementary Figure).

Table of QTL detected for the three simulated traits (body not recoverable): the surviving entries are the mode-of-action codes (A, D) and one configuration entry, “Unclear : 10 models”.
a Linkage group of QTL position
b Position of QTL peak in centiMorgans
c Support interval in centiMorgans
d Significance threshold as estimated from permutation testing with N permutations at significance level α
e Adjusted R² from the model fit at the QTL peak (the proportion of variance explained by the QTL peak)
f Predicted QTL configuration using model selection procedure based on BIC. “Unclear : 10 models” refers to the situation when no clear minimum BIC occurred, here with 10 competing models close to the minimum
g Mode of action, either additive (A) or dominant (D)
h Bayesian Information Criterion values for models listed in column ‘Configuration’

The prediction of QTL segregation type and mode of action by selecting the model which minimised the BIC was accurate for two of the traits, but prediction of the configuration of the QTL underlying the third proved more difficult (see the results table). At each QTL peak, both additive and dominant bi-allelic models are tested (the full set of models is described elsewhere in this thesis) and any models within a fixed number of BIC units of the minimum are returned as possible models (see the BIC figure). For one trait, a bi-allelic model for the QTL was not entirely appropriate, and the predicted type (Qqqq x qQqq) was not exactly that which was simulated (Qooo x oqoo, where o is an allele which does not contribute to trait variation, while Q and q do). In some cases more than one model was suggested (panel a of the figure), while at times one clearly optimal model was found. The output of a single-homologue analysis was also visualised, which may help in the interpretation of results, an example of which is shown in panel c of the figure. The predicted model can be verified by examining the haplotype structure of offspring with extreme phenotypes, which in theory should carry some or all of the “positive alleles” for the QTL.
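The model selection step described above (keep the model with the lowest BIC, together with any competitors within a set distance of that minimum) can be sketched as follows. This is a minimal Python illustration, not the polyqtlR implementation: the function names, the model labels, the BIC values and the cut-off of 6 units are all hypothetical, and the BIC expression is the standard form for a Gaussian linear model up to an additive constant.

```python
import math

def bic(rss, n_obs, n_params):
    """BIC of a Gaussian linear model, up to a constant shared by all models.

    rss      : residual sum of squares of the fitted model
    n_obs    : number of observations
    n_params : number of fitted parameters
    """
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)

def candidate_models(bic_by_model, max_delta):
    """Return (model, delta BIC) pairs within max_delta of the minimum BIC,
    ordered from most to least supported."""
    best = min(bic_by_model.values())
    ranked = sorted(bic_by_model.items(), key=lambda item: item[1])
    return [(name, value - best) for name, value in ranked
            if value - best <= max_delta]

# Hypothetical BIC values for three competing QTL models at one peak.
bic_values = {"model_A": 100.0, "model_B": 101.9, "model_C": 110.0}

# Hypothetical cut-off of 6 BIC units: model_A and model_B are both returned.
shortlist = candidate_models(bic_values, max_delta=6.0)
```

A shortlist with a single entry corresponds to a clear front-runner, whereas a long shortlist corresponds to the “Unclear” situation in which no single model stands out.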

The Bayesian Information Criterion (BIC) results for one QTL peak. Each number corresponds to a different QTL configuration (the full list is given in a Supplementary Table). An additive model minimises the BIC for this trait, and the dominant version of this model was second most likely. BIC results for a second QTL peak, with an additive model suggested. Here there are no other models close to the minimum BIC. Per-homologue analyses can also complement the analysis; here the results for one trait are shown. Colour scales are used to indicate the significance of the single-homologue model fit, with the direction of the homologue effect shown by black lines (above the axis: positive; below the axis: negative). Here we see that positive allele effects originate from two of the homologues.


In the simulated dataset provided with the package we did not include quadrivalent pairing structures or preferential chromosomal pairing during the population simulation in PedigreeSim. However, a certain proportion of multivalent pairing structures is expected to form during autopolyploid meiosis, which can lead to the phenomenon of double reduction. Analysis of the potato population genotyped using the SolSTW single nucleotide polymorphism array (Vos et al.), as described in Bourke et al., revealed the presence of double reduction products in a subset of individuals in the population; a sample for one potato linkage group is shown in a Supplementary Figure.

Three QTL were detected for this trait, each on a different linkage group. The QTL positions are depicted (dotted red boxes), with alleles of positive effect shown (red stars) and alleles of negative effect (open squares). The two individuals shown had the highest trait values in the population. In general, these individuals carried the predicted favourable alleles at these positions. Such visualisations can help confirm QTL analysis results as well as facilitate haplotype-assisted selection.

Double reduction in general exerts only a minor influence on linkage analyses (Bourke et al.) or QTL analyses (shown elsewhere in this thesis) and can safely be ignored in most cases. In polyqtlR, QTL analysis using a model that includes double reduction is straightforward, provided that IBD probabilities from a model that allowed for double reduction are available (for example using TetraOrigin (Zheng et al.)). One additional feature of polyqtlR that can be useful for gene mapping and selection is its ability to identify recombinant individuals within a specified region of interest, an example of which is shown in a Supplementary Figure, where we identified eight individuals with a recombination between two homologues within a specified interval on one linkage group.

Finally, we also performed a single-marker analysis using the marker dosages as an additive predictor of phenotypes, highlighting some differences between IBD-based and single marker-based analyses. The IBD-based analysis was found to be slightly more powerful, with one QTL peak being detected using IBD probabilities but missed entirely using only marker dosages (Supplementary Figure). Significance thresholds varied between approaches but were not consistently higher or lower in one set or the other. A single extremely significant marker was found to be associated with one of the traits, as can be seen in the Supplementary Figure. This marker (“DN_III35.6”) mapped to 19.3 cM (just 0.7 cM from the QTL) and was phased exactly as the QTL itself. In other words, it was a near-perfect marker for this particular QTL.
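A single-marker test of the kind described above, with marker dosage as an additive predictor of phenotype, can be sketched as follows. This is a minimal Python illustration and not the polyqtlR implementation; the function name and toy data are assumptions, and the LOD score is computed with the standard expression for a linear regression, $(n/2)\log_{10}(RSS_0/RSS_1)$, where $RSS_0$ and $RSS_1$ are the residual sums of squares without and with the marker.

```python
import math

def single_marker_lod(dosages, phenotypes):
    """LOD score for the additive regression phenotype ~ dosage."""
    n = len(phenotypes)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    rss_null = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or rss_null == 0:
        return 0.0  # non-segregating marker or constant phenotype: no evidence
    rss_marker = max(rss_null - sxy ** 2 / sxx, 0.0)
    if rss_marker == 0:
        return math.inf  # perfect fit
    return (n / 2.0) * math.log10(rss_null / rss_marker)

# Hypothetical toy data: phenotype increases almost linearly with dosage.
lod_linked = single_marker_lod([0, 1, 2, 3], [1.0, 2.1, 2.9, 4.0])

# Hypothetical marker whose dosage is unrelated to the phenotype.
lod_unlinked = single_marker_lod([0, 1, 0, 1], [1.0, 1.0, 2.0, 2.0])

# A marker that does not segregate carries no information.
lod_monomorphic = single_marker_lod([2, 2, 2, 2], [1.0, 2.0, 3.0, 4.0])
```

In a genome-wide analysis this score would be computed for every marker and compared with a permutation-based threshold, as described in the text.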

In this paper we have described the possibilities for performing QTL analysis in bi-parental populations of autopolyploid species using the polyqtlR package in R. We did this using simulated tetraploid marker dosage data and three associated quantitative traits with contrasting genetic architecture, recorded over three years. We confined our attention to a tetraploid dataset and have not demonstrated trait analysis in triploid or hexaploid populations, the other most commonly encountered ploidy levels for experimental plant populations and breeding programs. However, all functions within polyqtlR have been developed to be able to analyse these (and potentially any other ploidy levels) and therefore the results would be substantially similar (see for example van Geest et al. for a description of a QTL analysis in the autohexaploid ornamental species Chrysanthemum morifolium). In this discussion we consider a number of points: the relative advantages or disadvantages of IBD-based analyses over single-marker approaches, the importance of a comparison with alternative software, and the scope for future developments.

polyqtlR offers the possibility to perform QTL analyses either using identity-by-descent probabilities or using the marker dosage scores directly. Although we showed that the IBD-based analysis had somewhat more power, detecting a QTL that was missed by the single-marker analysis, we have not performed a comprehensive power analysis to verify this. IBD-based analyses allow us to predict the most probable QTL segregation type and mode of gene action, as well as tracking the transmission of favourable or unfavourable alleles in the population. They therefore offer a more comprehensive approach to QTL detection than single-marker studies. However, there may also be situations in which single-marker analyses are preferable to IBD-based approaches. For example, in cases where it is not possible to reliably assign discrete dosage scores, continuous marker genotypes might be the only available genotyping data. These have already proven to be useful in QTL association mapping studies in various polyploid species (Grandke et al., 2016; Tumino et al., 2016). The methods to determine IBD probabilities using continuous genotypes rather than discrete dosage scores are not yet developed (nor indeed are there methods to construct linkage maps from such genotypes, to the best of our knowledge). In such circumstances a single-marker analysis may be the only available option. In the unlikely event that an individual marker co-segregates nearly perfectly with a QTL allele (as was the case here), single-marker approaches will also be at least as powerful as IBD-based analyses, if not more so. In cases where the estimation of IBD probabilities is difficult due to serious errors in the genotypes, linkage map and/or marker phasing, (accurate) single markers may prove more suitable to detect QTL effects than an IBD-based analysis. On balance, however, both approaches have their merits and it is generally a good idea to perform both analyses and combine the results for a more comprehensive picture.

It has previously been shown that a high and uniform GIC is needed for accurate QTL analyses (elsewhere in this thesis). Here we describe the calculation of the GIC and applied it to a simulated tetraploid dataset from the polyqtlR package. However, the usefulness of the GIC measure depends to a large extent on the quality of the algorithm to generate IBD probabilities upon which it is based. In cases where the IBD probabilities are close to 0 or 1 but are also incorrect, the GIC may be misleading: the GIC can only indicate regions where there is low information content, not incorrect information. In what we present here we enjoyed unrealistically fortunate circumstances, with error-free simulated data containing no missing values. This was reflected in consistently high GICs for most homologues and linkage groups (Figure 1, bottom panel). In reality we are unlikely to encounter such high-quality datasets. Although we have studied the GIC in detail elsewhere in this thesis, we have not yet attempted to quantify what constitutes a “sufficiently high” GIC. However, we found that regions of variable GIC will tend to bias the detection of QTL to neighbouring loci with high GIC, and that GIC is a strong predictor of QTL detection power. Therefore it is prudent to examine the GIC profiles around QTL peaks, particularly on the homologues which are indicated to carry the QTL alleles.


The only other software that is currently available to perform similar analyses is TetraploidSNPMap (TPM; Hackett et al.), available as free downloadable software with an advanced user-friendly GUI that can produce integrated and phased linkage maps as well as perform interval mapping in tetraploid populations. We transformed our simulated marker dosages, phased linkage map and phenotypes into compatible input and re-ran the analysis (for the phenotypes we used the BLUEs, as there is no facility for including blocks in TPM). An overview of the results is given in a Supplementary Table. On the whole the results are very comparable. TPM detected the same peaks in almost precisely the same positions, although the peak LOD scores and estimated explained variances were not exactly the same (TPM tended to have slightly higher LOD scores and greater explained variances at the peaks, although both software employ similar methods). The major difference in the results was in the predicted QTL segregation type and mode of action. Apart from additive and dominant models, TPM also tests “codominant factors”, where the QTL genotype classes are allowed to have independent means. Despite this, its model search space is currently limited to a subset of segregation types, whereas we currently consider all possible segregation types (the full set of bi-allelic models is described elsewhere in this thesis), albeit without the codominant model currently included. If we compare the predictions of polyqtlR with the true configurations, both performed well, but there are some small differences. In general, TPM returned more possible models (which we take as being within a fixed number of BIC units of the minimum BIC, corresponding to between positive and strong evidence of a meaningful difference between models (Neath and Cavanaugh)). For one QTL, TPM returned the best-fitting model within its search space, which is as far as it is able to go. polyqtlR also tests all possible 1x2 QTL types (what we term “simplex x duplex” QTL), and therefore found that the model Qqqq x QqqQ was the most likely.

For Trait_2 (the “monogenic” trait), both software correctly identified the segregation type of the QTL, with additive effect. However, TPM also returned a second competing model, namely QQqq x qqqq with codominant effect, with a ΔBIC of 1.9 (Supplementary Table 2). For polyqtlR there was no competing model close to the minimum BIC, i.e. it was a very clear front-runner. Interestingly, both software had some difficulties with the epistatic trait: neither detected the minor QTL, and polyqtlR in particular had problems in correctly assigning the segregation type for the epistatic QTL. TPM did better, correctly predicting the configuration of all three QTL, but not always the mode of action (which was difficult to define, since a minimum allele count was needed at all three loci to produce a subsequent additive effect). Both software use the Bayesian Information Criterion (BIC) for model comparisons, and therefore discrepancies are most likely due to slight differences in the IBD probabilities (which we were unable to compare as they are not part of the TPM output).

The statistical models behind the QTL analysis in polyqtlR are simple yet powerful, with most of the complications of polyploid inheritance already dealt with in the earlier stages of phased linkage map construction and in the calculation of IBD probabilities. There is however still scope to expand upon the developments already implemented. For example, we have not yet implemented any formal procedure for detecting epistasis. As we saw with the epistatic trait, polyqtlR appears to have difficulties in correctly identifying the QTL configuration in such circumstances. In our treatment of genetic cofactors we do not currently compare between models, but a model comparison framework such as the Akaike or Bayesian Information Criterion could be quite instructive here. There is also currently no provision for investigating genotype x environment (GxE) interactions, despite these interactions being relatively common (van Eeuwijk et al.). One reason we have not (yet) implemented GxE in polyqtlR is the difficulty of including an interaction term for a polyploid population. Should this interaction effect be investigated homologue-by-homologue? Perhaps two at a time, or simultaneously? In other words, capturing an interaction effect in a polyploid QTL model would likely require some simplifying assumptions which may or may not be appropriate. In time these will no doubt be investigated and implemented.

Another direction polyqtlR could take in future is to integrate information across multiple populations in a single analysis, using either pedigree or genomic relationship matrices, as is done for example in FlexQTL (Bink et al.). The current dependence on biparental F1 populations means that each analysis must be done separately, with perhaps some ad hoc integration of QTL results afterwards. In future, merging such analyses into a single study will improve not only detection power but also the robustness of results, but will require the development of novel approaches to estimate IBD probabilities across populations and pedigrees.

Finally, we also expect that polyploid genotyping may soon differ from the single-SNP dosages that are currently the basis for much of the analysis in polyqtlR. Time will tell how successful these new genotyping technologies will be at accurately capturing the full breadth of allelic diversity in polyploids and exploiting it for research and breeding purposes. polyqtlR is flexible enough to allow adaptation to deal with such multi-allelic haplotype-based markers, offering greater choice to users in future experiments and analyses.


This work was supported through the TKI projects “A genetic analysis pipeline for polyploid crops” (project number BO000001) and “Novel genetic and genomic tools for polyploid crops” (project number BO00000). The authors would like to thank Eric van de Weg, Herman van Eck, René Smulders, Michiel Klaassen (WUR Plant Breeding) and Camillo Berenos (Dümmen Orange BV) for their constructive feedback and discussions.

Visualisation of the marker coverage of the tetraploid linkage maps provided with the polyqtlR package. Parent 1 homologue maps are numbered h1 – h4, with parent 2 homologue maps numbered h5 – h8. The integrated maps are shown in the centre for each linkage group (1 – 5, corresponding to the five linkage groups). Maps were generated and plotted using the polymapR package. Certain regions (for example on some homologues of linkage group 1) contain no segregating marker information, which is reflected in low GIC values for the affected homologues (c.f. Figure 1).


Predicted haplotype conformation showing possible double reduction segments in red, from selected individuals of the potato F1 population. Lighter shades of blue correspond to intermediate probabilities. Certain individuals proved difficult to reconstruct with TetraOrigin and harbour double reduction segments of improbably short length.

Visualisation of the haplotypes of eight F1 individuals from the simulated dataset, identified as having a recombination between two specified homologues within a specified interval on one linkage group. The user has the ability to select any set of homologues involved in recombination and any desired stretch of chromosome, as well as any threshold probability with which a homologue (or segment thereof) is considered to be inherited (a default threshold is provided).


Single-marker analysis results (dark green points) using an additive linear model on marker dosages to detect QTL. Green horizontal dotted lines correspond to LOD significance thresholds as determined by permutation tests (N = 1000). Also shown are the results of interval QTL mapping using IBD probabilities (reproduced from Figure 1), with significance thresholds as horizontal gold dotted lines. One QTL peak was not detected using the single-marker approach but was found using the IBD-based approach.


Chapter 11

General Discussion


Foreword

Working under the auspices of the department of plant breeding, one experiences a sense of responsibility that research findings should ultimately have a practical application in the plant breeding sector. This is particularly true of a project embedded within a public-private partnership, with a consortium of a dozen breeding companies providing matching funding for this work. That said, there was never the feeling that those enticing yet often time-consuming theoretical side-alleys should not be explored. In this discussion therefore, I address both aspects: how the findings of this thesis might be used to help in the breeding of polyploid crops, but also how this thesis has deepened our understanding of polyploids. Ultimately, both can help in the pursuit of better polyploid crops. As the saying goes, sometimes “there’s nothing more practical than a good theory” (Lewin, 1951). Or taken from a plant-breeding perspective, “the full potential and effectiveness of a plant-breeding program can be realised only when the reproductive, genetic and breeding behaviour of a species is known and understood” (Dewey, 1966).

Can experimental populations yield unbiased information on meiosis?

In this thesis I have tried to investigate real marker datasets beyond the traditional boundaries of genetic mapping, with a particular interest in polyploid meiotic phenomena. This began in Chapter 3 with a description of the “double-reduction landscape” in a tetraploid potato population. In Chapter 5 we reconstructed the pairing behaviour in a tetraploid rose population and from that inferred that homologue pairing and recombination were not random across all chromosomes. We took the further step of trying to relate homology at the level of SNP markers to the pairing behaviour, testing the theory that chromosomal homology is an important factor in pairing recognition and synapsis formation during meiosis (Naranjo and Corredor, 2008).

Later in this discussion I also present some results on meiotic cross-over frequencies in polyploid meiosis using a tetraploid potato mapping population. In all of these cases, I am assuming that an experimental mapping population represents an unbiased record of parental meiosis. This is of course an unrealistic assumption, and like many experiments using living populations, suffers from what could be termed the survivors’ paradox. We only use plants that have survived in a mapping study, which often means that an initial selection has already occurred before the experiment has even started (e.g. during gamete formation, post-fertilisation death of embryos, non-, pre- or post-emergence diseases). One of the well-known consequences of such unknown selection is marker skewness (also called segregation distortion (Mangelsdorf and Jones, 1926)), where

unexpected or unbalanced proportions of marker alleles are observed. These markers are often removed prior to mapping to avoid the possible complication of false linkages, although in so doing, parts of entire linkage groups may unwittingly be removed (Van Ooijen and Jansen).

One of the questions that arose during our work on rose meiosis (Chapter 5) was whether segregation distortion and preferential homologue pairing could leave identical signatures in a mapping population and hence be confounded effects. We argued that the over-abundance of certain pairing combinations was unlikely to have been caused by segregation distortion, unless for example a combination of alleles proved to be debilitating (but non-lethal) to the spore in four of the six homologue pairings possible (which is quite unlikely, but not impossible). In potato we estimated the average rate of double reduction at the centromeres and telomeres (Chapter 3), which could have been influenced by systematic bias if for example gametes resulting from multivalents were more frequently infertile than those from bivalents (Lloyd and Bomblies). With only genotyping data from a mapping population we cannot definitively resolve these ambiguities.

There are some complementary experimental approaches that could help provide more conclusive answers to the precise nature of the biological phenomena involved. Many of the earliest papers on polyploid meiosis were cytological, i.e. physically observing the pairing behaviour of chromosomes during meiosis and for example counting the number of bivalents and multivalents that persisted into metaphase I (Westergaard; Cadman; Lamm; Swaminathan and Howard). FISH (fluorescence in situ hybridisation) or GISH (genomic in situ hybridisation) techniques could be used to identify homologous and homoeologous pairing during meiosis, although till now these techniques have mainly been used to establish base chromosomal number or identify subgenomes in polyploids (D’Hont, 2005; Eberhard et al.; Chester et al.; Liu et al.). Another approach to investigate meiosis is tetrad analysis (Preuss et al.). In Arabidopsis thaliana, mutant lines carrying mutations in either of two genes produce conjoined pollen tetrad cells, which can help reveal the exact pairing and recombination behaviour at meiosis (Wijnker et al.). However, being a recessive mutation (Preuss et al.), it is uncertain how useful such mutants would be in genetic studies of polyploids, assuming of course that homologues or genes of similar function exist. Tetrad analyses also suffer from the survivors’ paradox, given that experimental results are based on surviving pollen cells (the original study found that only half of all tetrads possessed four viable pollen grains (Preuss et al.)). Therefore it seems that almost all experimental approaches, no matter how elegant, rely on surviving germplasm with which to make inferences on

meiosis. The original cytological studies were arguably the most unbiased regarding meiosis itself, but the viability of the observed structures was inevitably unknown. Perhaps an exhaustive analysis that tracks every single spore and egg cell could be imagined, but in practice it seems hardly feasible. For a breeder the issue of population-induced bias is perhaps moot, as breeders only work with surviving germplasm. For a fundamental researcher however, such biases need to be carefully considered and accounted for. Our work sits somewhere between theoretical and applied genetics, and it is therefore hoped that highlighting such issues here and in previous discussions is sufficient for now. The results we have obtained in this thesis regarding polyploid meiosis using experimental populations are certainly relevant for breeders, although corroborating data, ideally cytogenetic, would be advantageous in future studies to fully address the more fundamental meiotic questions we have attempted to answer.

Cross-over rates in bivalent and multivalent meiosis

One of the key factors that ensures the orderly division of homologues during meiosis is the formation of chiasmata, which helps generate mechanical tension at the centromeres and kinetochores during spindle-fibre formation. This tension is sensed by the cell, signalling that spindle fibres are correctly attached and that homologues should segregate correctly to opposite poles (Nicklas; Lampson and Cheeseman). Bivalent pairing in a polyploid poses no additional challenges to the mechanisms that have evolved to ensure balanced chromosomal segregation in diploids (since all pairing in diploids occurs in bivalents). A multivalent on the other hand is a “novel” meiotic structure that potentially disrupts these mechanisms. If homologous pairing were left unchecked it could lead to meiotic chaos, with many unresolved entanglements between homologues leading to improper segregation.

It has long been hypothesised that a reduction in cross-over frequency occurs in autopolyploid meiosis which would have a stabilising effect on meiosis (Kostoff; Myers; McCollum; Hazarika and Rees). This hypothesis follows a certain line of logic: fewer cross-overs would result in fewer multivalents and hence a more orderly meiosis, since at least three separate cross-overs are required to generate an effective multivalent structure that can persist through to metaphase I (Chapter 1). Note that an “effective” multivalent in this context generally means a quadrivalent; a trivalent plus univalent combination (in a tetraploid) is not tolerated by the cell as it would lead to unbalanced segregation (Bomblies et al.). This has been borne out by a meta-analysis of autopolyploid meiosis, where very few univalent or trivalent structures were recorded (Ramsey and Schemske, 2002). In theory hexavalent structures are also possible at higher ploidy levels, but these have only rarely been reported in the literature (Ramsey and Schemske, 2002). Assuming that fewer cross-overs do indeed occur in autopolyploids than would be expected from recombination rates in progenitor diploid species (Hazarika and Rees), the question still remains how this might be achieved by the cell. An interesting possible mechanism that was recently proposed is that an increase in the effective distance of cross-over interference could play a significant role (Bomblies et al.). As breeding very often relies on recombinations to sort and shuffle favourable alleles into a single genotype, any limitation on the number of cross-overs could have implications for the efficacy of traditional breeding approaches.

Figure 1. Cross-over rates in bivalent and multivalent pairing structures. a. Average rate of observed cross-overs per meiosis in the AC tetraploid potato population, either resulting from bivalent pairing (red squares) or quadrivalent pairing (blue triangles). Error bars show confidence intervals around the means for each chromosome. Two-sided t-tests on the difference of means were all highly significant. b. Example of a possible discrepancy between the observed number of cross-overs from IBD probabilities (above) and the actual cross-overs that occurred during meiosis (below). Inherited chromosomal segments are highlighted in green and correspond to the IBD predictions. However, the recombinant products from one of the bivalents are not transmitted to the gamete and therefore the cross-over is not observed. The data used to generate figure (a) only consider observed recombinations. Cross-overs were identified from the output of TetraOrigin (Zheng et al.) using a presence probability threshold (i.e. any inheritance probabilities below the threshold were discounted) and a minimum recombinant segment length in cM.
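The filtering described in the caption (a presence probability threshold followed by a minimum recombinant segment length) can be sketched as follows. This is a simplified Python illustration for a single homologue in a single individual, not the procedure actually used for the figure: the function names are assumptions, segment length is taken as the distance between the first and last marker of a run, and the threshold of 0.7 and minimum length of 5 cM are arbitrary illustrative values (the values used in the thesis are not recoverable here).

```python
def segments(positions, probabilities, threshold):
    """Collapse a homologue's probability track into [start, end, present] runs."""
    runs = []
    for pos, prob in zip(positions, probabilities):
        present = prob >= threshold
        if runs and runs[-1][2] == present:
            runs[-1][1] = pos          # extend the current run
        else:
            runs.append([pos, pos, present])
    return runs

def drop_short_segments(runs, min_length):
    """Repeatedly remove the shortest interior run below min_length.

    Runs alternate between present and absent, so the two neighbours of a
    removed run share the same state and are merged into one run.
    """
    runs = [list(run) for run in runs]
    while True:
        short = [i for i in range(1, len(runs) - 1)
                 if runs[i][1] - runs[i][0] < min_length]
        if not short:
            return runs
        i = min(short, key=lambda k: runs[k][1] - runs[k][0])
        runs[i - 1][1] = runs[i + 1][1]
        del runs[i:i + 2]

def count_switches(positions, probabilities, threshold, min_length):
    """Number of present/absent switch points after filtering."""
    runs = drop_short_segments(segments(positions, probabilities, threshold),
                               min_length)
    return len(runs) - 1

# Hypothetical track: markers every 10 cM with one noisy marker at 20 cM.
positions = [0, 10, 20, 30, 40, 50, 60]
probabilities = [0.9, 0.95, 0.2, 0.9, 0.1, 0.05, 0.1]

filtered = count_switches(positions, probabilities, threshold=0.7, min_length=5)
unfiltered = count_switches(positions, probabilities, threshold=0.7, min_length=0)
```

Without the minimum-length filter the noisy marker at 20 cM would add two spurious switch points, which illustrates how the filter guards against over-estimating the number of recombinations in noisy results.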


However, there remains considerable uncertainty as to whether cross-over rates (and hence quadrivalents) are actively suppressed in autopolyploids. In this thesis we have provided clear evidence for the existence of quadrivalents in the meiosis of autopolyploids like potato and rose (Chapter 3 and Chapter 5 respectively). Therefore the assertion that quadrivalents are aberrant structures that cause a failure in the proper division of homologues (Darlington; Kostoff; Carvalho et al.; Lloyd and Bomblies) does not hold across all species (and particularly not among established autopolyploids, many of which are domesticated crop species). Here we are once more plagued by the survivors’ paradox: it could be that the proportion of quadrivalent pairings that we estimated to have contributed to the potato AC population might only represent a fraction of the multivalent meioses that occurred, the rest having perished. However, even were this to be true, it still represents a large proportion of successful and balanced meioses that involved quadrivalents, a fact that some of the literature mentioned above would have you doubt is even possible.

In fact, we can actually test the hypothesis that crossover rates influence the pairing behaviour using our population datasets. To demonstrate this point I re-analysed the identity-by-descent (IBD) probabilities of the AxC potato population, reconstructing the crossover events that were predicted for each of the [...] individuals (Figure 1). I chose a presence threshold probability of [...] (which meant that if the inheritance probability at a marker for a homologue was less than [...], I did not consider that marker to have been inherited) and ignored any recombinant sections less than [...] cM in length (to prevent over-estimation of the number of recombinations in the case of noisy results). After this initial filtering I counted the number of crossovers from bivalent or quadrivalent meioses (I treated double reduction products separately and added them afterwards). As can be seen in Figure 1.a, the rate of observed crossovers from bivalent pairing was significantly lower than the number observed due to multivalent pairing across all 12 potato chromosomes ([...]). The rates for bivalents comprise recombinations observed from two bivalents; the actual rates of crossover formation per bivalent would be expected to be approximately the same if we consider that potentially 50% of all crossovers are unobserved (Figure 1.b). The findings here are consistent with the previously-reported rates of crossovers per bivalent of approximately 1.1 in the autotetraploids Physaria vitulifera, Lotus corniculatus and Arabidopsis arenosa (Mulligan; Davies et al.; Yant et al.). There is also the possibility of unobserved recombinations when multivalents occur, although an estimation of what proportion this represents is less obvious than for bivalents. It will be somewhat less than 50%, depending on the total number of crossovers and how they are arranged between chromatids. Crossovers from a quadrivalent structure can be hidden in the gamete if the

260 ______Chapter 11

(which seems a less parsimonious explanation). This phenomenon (“tmosaicism”) is it is considered of little practical relevance, but also because we didn’t have the tools to

Figure 2. Examples of offspring harbouring chromosomal mosaics of three parental homologues from potato chromosome 2 (AxC population).


Opening the polysomic “black box”

There are many challenges in breeding polysomic polyploid crops. One of the major difficulties encountered is that every offspring of a cross contains an unknown combination of parental homologue mosaics. In inbreeding diploid populations, parental lines are often fully homozygous, in which case the transmission of parental homologues is essentially known. In outbreeding diploids this is not the case, but at any locus each individual carries one of $\binom{2}{1} \times \binom{2}{1} = 4$ possible allelic combinations. For tetraploids this becomes $\binom{4}{2} \times \binom{4}{2} = 36$ combinations (assuming no double reduction – it is [...] otherwise), while for hexaploids this rises to $\binom{6}{3} \times \binom{6}{3} = 400$ combinations (or [...] if we include double reduction). When we start to consider multiple loci together (for example in the case of a trait influenced by a number of different genes) the number of possible allelic combinations explodes. Polysomic inheritance, coupled with high heterozygosity, leads to far more complex inheritance patterns than anything encountered at the diploid level.
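
The counts of allelic combinations quoted above follow from simple combinatorics, and a short sketch can make the scaling explicit. The function names below are illustrative and do not come from the thesis. The sketch assumes that every parental homologue is distinguishable, that a gamete is any set of half of a parent's homologues, and (for the multi-locus case) that loci are inherited independently, which overstates the count for linked loci. The tetraploid double-reduction count is my own assumption that such gametes carry two copies of a single homologue.

```python
from math import comb


def gametes_without_dr(ploidy: int) -> int:
    """Distinct gametes from one parent: choose ploidy/2 of its homologues."""
    return comb(ploidy, ploidy // 2)


def offspring_combinations(ploidy: int, n_loci: int = 1) -> int:
    """Homologue combinations per offspring for a biparental cross.

    Loci are treated as independent, an illustrative simplification;
    linked loci are co-inherited and give fewer combinations.
    """
    per_locus = gametes_without_dr(ploidy) ** 2
    return per_locus ** n_loci


def tetraploid_gametes_with_dr() -> int:
    """Assumption: double reduction adds one gamete type per homologue
    (two copies of the same homologue) to the six ordinary pairs."""
    return comb(4, 2) + 4


if __name__ == "__main__":
    for ploidy in (2, 4, 6):
        print(f"ploidy {ploidy}: {offspring_combinations(ploidy)} combinations")
    print("tetraploid, 3 independent loci:", offspring_combinations(4, n_loci=3))
    print("tetraploid with double reduction:", tetraploid_gametes_with_dr() ** 2)
```

Even three independent loci already give tens of thousands of possible combinations in a tetraploid cross, which is the "explosion" the paragraph refers to.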

Developing homozygous lines in polyploids would simplify matters considerably, both from a breeding and a genetics perspective. However, Haldane demonstrated the infeasibility of reaching homozygosity through repeated selfing (Haldane), a technique that is regularly used in diploid breeding and research (Figure [...] of Chapter [...]). Apart from the impracticality of inbreeding, many polyploids are also outbreeding species that poorly tolerate inbreeding (tetraploid leek (Allium ampeloprasum L.) is a good example (De Clercq and Van Bockstaele, 2002)). This is generally thought to be caused by large numbers of deleterious recessive alleles (also called the genetic load, a point to which I later return). Because of these and related difficulties, the use of polyploids in breeding programs is sometimes viewed as an unnecessary complication. Efforts to convert potato from a highly-heterozygous outbreeding polyploid to an inbreeding, homozygous diploid are currently underway (Lindhout et al.; Jansky et al.). In sugar beet (Beta vulgaris L.) breeding, the use of tetraploid parental lines has been largely abandoned (Dewey; Draycott) because of difficulties in autopolyploid breeding. Previously, triploids and anisoploids (a mixture of diploids, triploids and tetraploids from an open pollination) were developed for their superior yields, but these have largely been replaced by diploids (McGrath and Jung).
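
To illustrate why repeated selfing is so much slower at producing homozygous lines in an autotetraploid than in a diploid, the following sketch tracks genotype frequencies at a single biallelic locus over generations of selfing. It is not taken from the thesis or from Haldane's derivation: it assumes random bivalent pairing, no double reduction and no selection, and it calls a genotype "heterozygous" whenever it carries both alleles. All function names are hypothetical.

```python
from math import comb


def gamete_probs(dosage: int, ploidy: int) -> list[float]:
    """P(gamete carries g copies of the allele), for g = 0..ploidy/2.

    Hypergeometric sampling of ploidy/2 homologues without replacement,
    i.e. polysomic inheritance without double reduction.
    """
    half = ploidy // 2
    total = comb(ploidy, half)
    return [comb(dosage, g) * comb(ploidy - dosage, half - g) / total
            for g in range(half + 1)]


def self_once(freqs: list[float], ploidy: int) -> list[float]:
    """Genotype (dosage) frequencies after one generation of selfing."""
    new = [0.0] * (ploidy + 1)
    for dosage, freq in enumerate(freqs):
        gametes = gamete_probs(dosage, ploidy)
        for i, p_i in enumerate(gametes):
            for j, p_j in enumerate(gametes):
                new[i + j] += freq * p_i * p_j
    return new


def heterozygous_fraction(freqs: list[float]) -> float:
    """Fraction of individuals carrying both alleles."""
    return 1.0 - freqs[0] - freqs[-1]


if __name__ == "__main__":
    diploid = [0.0, 1.0, 0.0]               # start from a heterozygote (Aa)
    tetraploid = [0.0, 0.0, 1.0, 0.0, 0.0]  # start from a duplex (AAaa)
    for generation in range(1, 11):
        diploid = self_once(diploid, 2)
        tetraploid = self_once(tetraploid, 4)
        print(generation,
              round(heterozygous_fraction(diploid), 4),
              round(heterozygous_fraction(tetraploid), 4))
```

Under these assumptions the diploid heterozygous fraction halves every generation, whereas the tetraploid fraction declines far more slowly, which is consistent with the impracticality described above.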

Nevertheless, it is unlikely that polysomic polyploids will completely disappear from breeding programs. In ornamental breeding, it is routine to induce polyploidisation while developing new varieties, which has led to a preponderance of polyploids in many ornamental breeding programs (Van Tuyl and Lim; Marasek-Ciolakowska et al.,


usa iruus aaus iis diao saiva rioiu pras which are helping to turn the key to open the polysomic “black box”.

ea of whether the polysomic “black box”

investigate by simulation. Given that the costs of genotyping are continually falling, the economics of such an exercise may start to make sense very soon.

Apart from understanding inheritance, this thesis has been primarily motivated by the development of tools to find specific markers to tag important traits. A breeder might rightly wonder why [...] markers should be used if one or five might be used instead. Indeed, specific markers that tag important loci, resistance genes or introgressed segments would represent a more economical first use of marker technology. However, the argument that polysomic inheritance is an insurmountable hurdle to breeding programs needs to be put into its proper context given modern techniques. A breeder may not necessarily fully understand the mode of inheritance of the breeding material, which such a marker set could help reveal. As well as that, knowledge of genome-wide inheritance could help facilitate so-called “genomic selection”, which has been advocated for polyploid breeding (Slater et al.) but has yet to be shown to produce substantial gains in practice. Genomic selection theory as currently implemented assumes that all markers are potentially in linkage disequilibrium with genes that affect quantitative traits of interest (Meuwissen et al.; Meuwissen et al.), and therefore very large marker datasets are often used. More recently, haplotype-based approaches have been proposed which may offer greater accuracy and power than single-SNP approaches (Hess et al.). Therefore the selection of a set of highly informative and well-spaced markers for a range of applications is likely to become a regular feature of polyploid breeding programs in the future.


Figure [...]. Hypothetical [...] of minimal marker [...] to track meiosis in an autotetraploid. a. At chosen positions ([...], [...] and [...] cM), a single [...] marker is deployed. [...] markers are considered to provide the most inheritance information in an autotetraploid population (Zheng et al.). b. Effectiveness of using a minimal marker set to track meiosis in an autotetraploid. The [...] and [...] of genotype predictions associated with different marker numbers are compared, starting with [...] marker at each of the three positions and progressing through to [...] per locus. Both the [...] and the genotypic information coefficient are affected by marker density. Populations were simulated in PedigreeSim (Voorrips and Maliepaard) and IBD probabilities were estimated in TetraOrigin (Zheng et al.).


[...] double reduction [...]

One of the phenomena looked at in some detail during this thesis was double reduction (DR), which it is claimed is the principal source of systematic marker segregation distortion in polyploid genetic studies (Luo et al.). The non-randomness of this distortion arises from the fact that the rate of DR increases towards the telomeres (Mather; Fisher). In that sense the bias is positional, but unlike true segregation distortion that may be associated with alleles or allelic combinations of a particular locus, DR is combinatorically a random process which occurs with equal frequency on all homologues. As mentioned in Chapter [...], some authors like to emphasise the importance of including DR in their models for completeness, often castigating those who do not, but so far all have failed to substantiate these claims with actual data (Luo et al.; [...] et al.; [...] and Ma; Luo et al.; [...]i et al.; [...] et al.; [...] et al.).

In this thesis I began by looking at the bias that DR introduces in two-point linkage analyses using a random bivalent pairing model and simplex × nulliplex marker data (Chapter [...]). I subsequently expanded this to include all possible autotetraploid marker segregation types (Chapter [...]). I also compared the power and precision of a QTL model that included DR versus one that did not (Chapter [...]). In the case of linkage mapping, DR introduces a small bias in the estimation of recombination frequencies, although it is negligible if the rate of quadrivalent pairing is low. I argued that this bias could safely be ignored, particularly when we consider that linkage maps are primarily built from highly informative linkages between nearby markers that are unlikely to be severely affected by a small number of unexpected genotypes such as DR introduces. Crucially, DR generates a novel offspring genotype class for simplex × nulliplex markers, namely duplex offspring, which are not expected in a random bivalent setting. In a mapping approach that ignores DR, these effectively become missing values in the initial steps, which is quite a different proposition than segregation distortion or genotyping errors. Randomly missing values will reduce the effective population size on which recombination frequency is estimated, thereby increasing the variance and decreasing the LOD score. But this noise is essentially unbiased: apart from positional considerations, it is certainly not skewness. Simplex × nulliplex markers form the centrepiece of the theoretical framework we have developed for polyploid linkage mapping (Chapters [...]), and are also the marker type that appear to occur in the greatest abundance in [...] populations (Chapters [...]). Therefore, no systematic deviations are introduced by DR to the principal structural elements of our linkage maps.
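
The appearance of duplex offspring from a simplex × nulliplex cross can be made concrete with a small calculation. The sketch below is an illustration rather than the model used in the thesis: it assumes a tetraploid, defines the double reduction rate `alpha` as the probability that a gamete carries two sister-chromatid copies of one (uniformly chosen) homologue at the locus, and otherwise samples two distinct homologues at random. The function names and the example population size are hypothetical.

```python
def simplex_gamete_probs(alpha: float) -> dict[int, float]:
    """Gamete dosage probabilities for a simplex (Aaaa) tetraploid parent.

    With probability alpha the gamete holds two copies of one homologue
    (AA with chance 1/4, aa with chance 3/4); otherwise it holds two
    distinct homologues (Aa or aa, each with chance 1/2).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return {
        0: alpha * 3 / 4 + (1 - alpha) / 2,
        1: (1 - alpha) / 2,
        2: alpha / 4,
    }


def sxn_offspring_probs(alpha: float) -> dict[int, float]:
    """Offspring dosage probabilities for simplex x nulliplex.

    The nulliplex parent always contributes dosage 0, so the offspring
    distribution equals the simplex parent's gamete distribution.
    """
    return simplex_gamete_probs(alpha)


def effective_population_size(n: int, alpha: float) -> float:
    """Expected number of usable offspring if duplex scores are set to missing."""
    return n * (1.0 - sxn_offspring_probs(alpha)[2])


if __name__ == "__main__":
    for alpha in (0.0, 0.05, 1 / 6):
        probs = sxn_offspring_probs(alpha)
        print(f"alpha={alpha:.3f}", {k: round(v, 4) for k, v in probs.items()},
              "usable of 200:", round(effective_population_size(200, alpha), 1))
```

Under this parameterisation the duplex class has frequency alpha/4, so even a fairly high local rate of double reduction removes only a few percent of the offspring when these scores are treated as missing.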


Regarding the prevalence of DR, it has been reported in a number of separate studies that DR is a rare phenomenon that occurs at low frequencies (Bradshaw et al.; Stift et al.; Hackett et al.). Theoreticians like to argue over the maximum rate of DR, with suggestions including $1/6$, $1/7$, $1/8$ and even $1/4$ (Luo et al.; Bradshaw; Zheng et al.; Mather; Stift et al.; Voorrips and Maliepaard; Sybenga). Bradshaw notes that the rate of DR depends on whether a ring or chain quadrivalent is involved, which complicates matters further (Bradshaw). In our work with tetraploid potato we found that the rate of DR reached a maximum of about [...] at the telomeres (Chapter [...]), which is one of the few studies that actually quantifies the rate of DR in a population as opposed to merely mentioning it. A theoretical maximum rate of [...] represents the expected rate at an infinite genetic distance from the centromere. Often, such rates are used to motivate more advanced descriptions of polyploid meiosis, although it is questionable whether such rates would ever occur in practice. Furthermore, if bivalent pairing is indeed somehow promoted by the polyploid cell, it is even more unlikely that high rates of DR will occur, a point that is all but ignored by theoreticians. The consistent reports of low rates of DR can therefore be seen as further justification for the use of random bivalents to model polysomic inheritance.

In exploring the effects of ignoring double reduction in QTL mapping we found that the gains in power or precision were dwarfed by other considerations such as the trait heritability, the population size or the marker distribution near QTL regions (Chapter [...]). Its inclusion was also found to adversely affect our ability to correctly predict the QTL segregation type using either the Akaike Information Criterion (Akaike, 1974) or the Bayesian Information Criterion (Schwarz). Once again we found that DR only played a minor role and could safely be ignored. However, a flexible model of inheritance that allows for the possibility of multivalents and DR can at times achieve much more accurate reconstructions of IBD probabilities than random bivalent models. This is apparent when one examines some of the output of TetraOrigin (Zheng et al.) when applied to real datasets. For example, we found that individual [...] from the potato AxC population carried an extensive DR segment on chromosome [...]. When we ran the analysis using a bivalent model only, the predicted haplotypes revealed a patchwork of segments across both sets of parental homologues (Figure [...]a). It is unlikely that such a high number of recombinations actually occurred. The individual highlighted earlier as carrying a mosaic of three parental homologues (Figure 2) was also poorly reconstructed when only bivalent pairing structures were allowed (Figure [...]b). Given that the accuracy and power of our QTL analysis largely rests on the accuracy of the IBD probabilities, it is worrying to see this sort of behaviour. Here at last we have clear evidence that allowing for multivalents (including but not restricted to the phenomenon


Figure [...]. Example of [...] a multivalent model can outperform a bivalent model [...] chromosome [...] chromosome [...] a. [...] b. [...]

[...] mode of inheritance – disomic, polysomic (also termed “tetrasomic” for tetraploids and “hexasomic” for hexaploids) and mixosomic. Mixosomy can be interpreted as either a

chromosome pairing, and is the mode of inheritance associated with segmental allopolyploidy (Stebbins; Soltis et al.). In many important polyploid crop species the mode of inheritance is fully agreed upon (examples include wheat, oilseed rape (disomic) and potato (tetrasomic)), while in other species it is not (examples include sugarcane, leek and the ornamental species Alstroemeria). It may be that there is no fixed mode of inheritance for a species that, once diagnosed, will predict all future meiotic behaviour of members of the same species. We have already encountered an example of this in rose: the mode of inheritance varied between the parents of the mapping population, with both disomic and polysomic inheritance detected for different chromosomes in both parents (Chapter [...]).

One may wonder whether the mode of inheritance is merely of academic interest – do we need to worry about it as long as crosses can be made and viable seeds produced? If we would like to employ markers in our breeding program the answer is clearly yes: understanding the mode of inheritance is vital. Genotyping polyploids is in general more complex and less reliable than genotyping diploids (Mason). This could be caused by off-target amplification due to duplication of marker sequences at other loci (Limborg et al.), but is also due in large part to the difficulty in correctly estimating the number of copies of a marker allele, also known as marker dosage assignment (Chapter [...]). One of the standard practices when assigning dosage is to compare marker scores between parents and offspring. This acts both as a quality check, by comparing the expected and observed allele frequencies, and as a means of determining the correct range of dosage values (Voorrips et al.; Hackett et al.; Schmitz Carley et al.). In order to compute the expected allele frequencies in a polyploid population given the parental dosage scores, one needs to know the mode of inheritance in advance. This is potentially one of the greatest hurdles in the development of genomic resources for novel polyploid research projects. When additional issues such as aneuploidy occur (including the possibility of compensated aneuploidy ([...] Chester et al.)), it may be very hard to interpret marker data, with no solid ground upon which to build a hypothesis. However, all such issues can potentially be resolved if they are understood, but require pioneering genotyping (and possibly cytological) studies to pave the way for future breeding and research applications.
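
A brief sketch shows why the expected offspring dosage frequencies cannot be computed without knowing the mode of inheritance. It compares a tetraploid parent under tetrasomic inheritance (random bivalent pairing, no double reduction) with the same parent under strict disomic inheritance, where the result also depends on how the allele copies are distributed over the two homologous pairs. The functions and the example cross are hypothetical illustrations, not the routines used in the thesis.

```python
from math import comb


def tetrasomic_gametes(dosage: int) -> list[float]:
    """P(gamete dosage = 0, 1, 2) when any two of the four homologues
    are transmitted with equal probability (no double reduction)."""
    return [comb(dosage, g) * comb(4 - dosage, 2 - g) / 6 for g in range(3)]


def disomic_gametes(pair_dosages: tuple[int, int]) -> list[float]:
    """P(gamete dosage = 0, 1, 2) under strict preferential pairing.

    pair_dosages gives the allele copies (0, 1 or 2) carried by each of
    the two homologous pairs; each pair passes on one chromosome.
    """
    p1, p2 = pair_dosages[0] / 2, pair_dosages[1] / 2
    return [(1 - p1) * (1 - p2), p1 * (1 - p2) + (1 - p1) * p2, p1 * p2]


def offspring_dosage_probs(gametes_a: list[float],
                           gametes_b: list[float]) -> list[float]:
    """Expected offspring dosage frequencies (dosage 0..4) for one cross."""
    out = [0.0] * 5
    for i, p_i in enumerate(gametes_a):
        for j, p_j in enumerate(gametes_b):
            out[i + j] += p_i * p_j
    return out


if __name__ == "__main__":
    nulliplex = tetrasomic_gametes(0)
    # Duplex x nulliplex under three different assumptions about the duplex parent
    print("tetrasomic     :", offspring_dosage_probs(tetrasomic_gametes(2), nulliplex))
    print("disomic (2, 0) :", offspring_dosage_probs(disomic_gametes((2, 0)), nulliplex))
    print("disomic (1, 1) :", offspring_dosage_probs(disomic_gametes((1, 1)), nulliplex))
```

The same parental dosage scores give a 1:4:1, a 0:1:0 or a 1:2:1 segregation depending on the assumed pairing behaviour, so a quality check based on expected frequencies is only meaningful once the mode of inheritance is known.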

Understanding the mode of inheritance is also required for the construction of phased, integrated linkage maps. We demonstrated this through simulation studies in Chapters [...] and [...], developing a flexible approach to linkage analysis in Chapter [...] which can accommodate fluctuations in the mode of inheritance. Novel mapping algorithms that do not require estimates of recombination frequency (“model-free” approaches) might be more robust. However, so far none have demonstrated the ability to generate phased

linkage maps, of fundamental importance in polyploid genetics (Chapter [...]). Therefore, in order to use current mapping tools such as TetraploidMap (Hackett et al.) or polymapR (Chapter [...]), the mode of inheritance of the population must be established. TetraploidMap is only suitable for tetrasomic species, whereas polymapR has been developed to accommodate all modes of inheritance from disomy through to tetrasomy (but at the tetraploid level only). Regarding mixosomic linkage mapping in non-tetraploids, it should be noted that extending the linkage mapping framework within polymapR to include hexasomic inheritance was itself a major undertaking, necessitating [...] linkage functions that required over [...] million characters of code (R Core Team). In contrast, tetrasomy required only [...] linkage functions and just over [...] characters. Including the possibility of partial preferential pairing for tetraploids required almost [...] million characters, representing a 5-fold increase over the tetrasomic functions. For hexaploidy, this fold-increase can be expected to be much greater than five, given the increased complexity. Therefore, as currently implemented, allowing for mixosomy in hexaploids would be a gargantuan undertaking. One solution would be to implement a fully-general likelihood framework rather than implementing each likelihood function separately, similar to the approach used in TetraploidMap (C. Hackett, personal communication). Currently, however, implementing dynamic algebra in R is cumbersome, and that is why pre-programmed solutions generated in Mathematica were used instead.

We demonstrated how important the mode of inheritance was for unbiased linkage analyses in Chapter [...]. How the mode of inheritance affects QTL detection models was not investigated in this thesis, although IBD-based approaches could be expected to be quite robust against deviations from a polysomic model. However, this was not verified here. There remains more work to do in properly accounting for the varied modes of inheritance in our polyploid genetic mapping pipeline.

o o a co

One of the main preoccupations of diploid biostatisticians is to improve their models to glean more predictive power from the data they are presented with. This hunger for bigger and better has yielded many fine developments over the years. For linkage mapping, it is typical to consider two-point analyses as an initial step towards more accurate three-point or multi-point analyses (Ridout et al.; Leach et al.; [...]ong et al.). For QTL mapping, more powerful approaches might include mixed model methodologies, Bayesian mapping or machine-learning algorithms ([...]ang et al.; [...] et al.) or employing more complex or pedigreed populations (Bink et al.; Cavanagh et al.; Huang et al.). In this thesis we did not venture beyond two-point linkage analysis, simple linear models or biparental mapping

populations. There appear to be plenty of opportunities to explore more complex (and potentially more powerful) approaches to detecting and deciphering QTL effects in polyploid populations.

However, we are now living in a time when the developments in genotyping technologies are dictating the development of statistical methodologies in genetic mapping, and not the other way around. Perhaps this has always been the case, but I feel the pace of change has increased in recent decades. No sooner have we established an effective pipeline to perform genetic mapping in autopolyploids using SNP data than it becomes potentially redundant due to a shift towards genotyping-by-sequencing (GBS) data (Elshire et al.), optical mapping data (Goodwin et al.) or Hi-C data (Lieberman-Aiden et al.). We are all guilty of technology-chasing, as nobody wants to be seen to be stuck working with yesteryear’s tools. On the other hand, by putting all our hope in a technological fix rather than a methodological fix to improve power or precision issues, we risk becoming locked into a cycle of using suboptimal methods to keep pace.

Particularly in polysomic polyploids, where the complexity is already substantial for “basic” methods and models, it seems implausible to have sufficient time to replicate the whole suite of approaches that have been developed for diploid species in the time afforded by inter-technological cool-off periods. Each new genotyping technology carries with it its own set of challenges and characteristics, from different ways of scoring marker dosage (Voorrips et al.; van Dijk et al.; Blischak et al.) to differences in the amount and type of missing data or genotyping errors (Elshire et al.; Bajgain et al., 2016). The later steps of linkage mapping, QTL analysis or genome-wide association mapping cannot be performed without the initial pre-processing of genotype data, the tools for which can often appear many years after the first introduction of the technology (diploid tools tend to be developed sooner, often by the genotyping provider but also by the much larger research community engaged in diploid genotyping experiments). Unless a stable genotyping technology for polyploids comes along, the polyploid genetics community is unlikely to ascend to the upper echelons of diploid biostatistics before building a new ladder from scratch again. Currently, improvements in power and precision of genetic mapping experiments are being driven by improvements in genotyping approaches (although the production of robust and accurate phenotypic data remains a critical component as well).


Next steps for polysomic polyploid breeding

To the early agriculturalists, the polyploid nature of their crops probably wasn’t a major hindrance given the relatively simple techniques that were employed in the selection and maintenance of germplasm. Modern plant breeding as a science and art did not evolve until relatively recently, and therefore our ancestors were hardly troubled by issues such as polysomic inheritance, partial preferential chromosome pairing or double reduction; indeed they were most likely unaware of these issues in the first place. Nowadays, society is demanding the production of more with less, given the projected increase in the world’s population coupled with dwindling fossil fuel resources, increased climatic instability and the degradation of agricultural land (Bradshaw; [...]e et al.). These appear to be insurmountable challenges, yet we have at our disposal a much wider array of intellectual resources than ever before. The question is whether these can be translated into effective solutions quickly enough, particularly given that many of our most important agricultural crops are polyploid.

Perhaps the single most important trait within all breeding programs across all crops is yield, a trait that is notoriously complex even within diploids. It has been suggested that despite [...] years of breeding for increased yield, phenotypic selection methods have done nothing to raise average potato yields in that time (Jansky; Slater et al.), in contrast to other major diploid crops like maize or rice. On the other hand, some authors remain convinced about the merit of polyploids and their potential to deliver higher yields through heterosis (Renny-Byfield and Wendel). I don’t believe that polyploids lack the necessary qualities to rise to the challenges we are facing. However, I do believe we need to carefully consider how polyploids are currently bred and what the implications are for future breeding material given our present selection decisions.

One of the main difficulties in developing elite breeding material in polysomic polyploids is their genetic load. Here, genetic load refers to lethal or debilitating recessive mutations that are masked in the heterozygous condition but which carry a high fitness cost when present in homozygous condition. Although it is claimed that neo-autopolyploids have a lower genetic load than their progenitor diploids (Otto and Whitton), in time it builds up to greater levels in established autopolyploids, leading to inbreeding depression (Ronfort; Otto). This is not to be confused with self-incompatibility which, if present, is usually under the control of one or more specific self-incompatibility loci (Ridout et al.; Jansky et al.). Polyploids tolerate higher genetic loads than diploids, as for example elegantly demonstrated in a study of diploid and tetraploid pollen vigour (Husband). Possibilities to select parents with a lower genetic load include evaluating derived haploids, which may express the deleterious alleles, or self-pollinating parental lines and observing the

proportion of non-vigorous offspring (Jansky). However, it is unlikely that stringent selection against genetic load is routinely performed in polyploid breeding programs. Without the benefits of purging selection, polysomic polyploids can be expected to slowly accumulate deleterious mutations, which may reduce the quality of breeding lines for many future generations.

One solution would be to simply avoid the effects of genetic load by increasing the diversity of breeding material, thereby ensuring heterozygosity at as many loci as possible. The development of different heterotic pools is one possibility to achieve this [...]. For example, the tetraploid potato population AxC (Chapters [...]), a cross between a starch and a table variety, represents a population that normally would not be generated in a breeding program since these breeding pools are usually kept distinct (Vos et al.). In AxC we observed transgressive segregation for a number of important traits, demonstrating that wide crosses may generate unexpected and favourable results. Alternatively, genome-wide marker data could be used to predict heterotic effects, by crossing parents known to carry different alleles across multiple loci.

An alternative to maximising heterozygosity is to try to reduce the genetic load. We have seen that inbreeding is not a practical proposition at the polyploid level (Haldane); it therefore seems we might have to “clean” polyploids at the haploid or diploid level by exposing deleterious mutations to selection. However, if we were to go that route, returning to the polyploid level might not be needed (Lindhout et al.; Renny-Byfield and Wendel; Jansky et al.). Reducing genetic load at the polyploid level might be aided by genomic tools, but it would take a significant and perhaps unrealistic effort to pinpoint all such loci, over multiple traits, and exclude them from future breeding germplasm using marker data. Nevertheless, the identification of some major lethality loci using the tools developed in this thesis could be a very useful research aim of breeding programs, for example by identifying regions of skewed segregation in selfed progenies and associating these with reconstructed parental haplotypes in these regions. By tracing and purging such alleles from breeding material, breeders could enjoy slightly more freedom to mildly inbreed their parental lines, as well as improving the overall fitness of their breeding stock. This may have direct implications for improving complex traits like yield, but could more generally be seen as improving the elite germplasm within a crop, with positive knock-on effects for a whole range of traits.
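
As an illustration of how regions of skewed segregation in a selfed progeny might be flagged, the sketch below applies a chi-square goodness-of-fit test to marker dosage counts. It is a minimal example under stated assumptions, not the procedure used in the thesis: the marker is simplex in a selfed tetraploid parent, so that dosages 0, 1 and 2 are expected in a 1:2:1 ratio under random bivalent pairing without double reduction, and the observed counts are invented for the example. The critical value 5.991 is the standard 5% threshold for two degrees of freedom.

```python
CHI2_CRITICAL_DF2_5PCT = 5.991  # upper 5% point of chi-square with 2 d.f.


def chi_square_statistic(observed: list[int], expected_ratio: list[float]) -> float:
    """Pearson chi-square statistic for observed counts against expected proportions."""
    total = sum(observed)
    stat = 0.0
    for count, proportion in zip(observed, expected_ratio):
        expected = total * proportion
        stat += (count - expected) ** 2 / expected
    return stat


def is_skewed(observed: list[int], expected_ratio: list[float],
              critical: float = CHI2_CRITICAL_DF2_5PCT) -> bool:
    """Flag a marker whose segregation deviates from expectation at the 5% level."""
    return chi_square_statistic(observed, expected_ratio) > critical


if __name__ == "__main__":
    simplex_selfed = [0.25, 0.5, 0.25]   # expected dosage 0 : 1 : 2
    markers = {                          # hypothetical counts per marker
        "marker_A": [60, 100, 40],
        "marker_B": [70, 100, 30],
    }
    for name, counts in markers.items():
        stat = chi_square_statistic(counts, simplex_selfed)
        print(name, round(stat, 2), "skewed" if is_skewed(counts, simplex_selfed) else "ok")
```

In practice many markers are tested at once, so the threshold would need a multiple-testing adjustment, and neighbouring markers showing the same skew would carry more weight than any single marker.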

In considering the future of polysomic plant breeding it is important to distinguish between two categories, namely vegetatively-propagated and seed-propagated crops. Examples of the former include potato and chrysanthemum, while examples of the latter include alfalfa, red clover and leek. The potato cultivar Russet Burbank (introduced in 1902 as “Netted Gem” (Bethke et al.)) is an example of a single genotype that

possessed an above-average combination of alleles for multiple traits that has led to it becoming North America’s most widely-cultivated potato variety, despite all the competition from cultivars that have been bred in the intervening century (Hardigan et al., 201[...]). In the UK, Maris Piper (introduced in 19[...]) continues to be the most widely grown variety, while in the Netherlands Bintje (introduced in 1910) still holds this honour. Vegetative or clonal propagation immortalises favourable allelic combinations and heterotic effects, which can lead to a single variety having a very long production life. However, clonal propagation also comes with its own inherent disadvantages, such as a slow rate of seed stock multiplication, increased risk of spreading soil-borne diseases or endemic viral pathogens, more costly distribution and storage of propagation material and difficulties in maintaining germplasm in genebanks. Despite these drawbacks, clonal propagation continues to dominate in crops like potato, chrysanthemum and rose.

The breeding of seed-propagated polysomic crops is arguably much more challenging given the varietal registration requirements of distinctness, uniformity and stability (DUS). Obtaining uniform progeny through seed can only be reliably predicted if parental lines are homozygous at all loci affecting descriptor traits (these usually concern the morphology of the adult plant). For example, uniformity continues to be one of the main breeding goals for autotetraploid leek (Allium ampeloprasum L.) (De Clercq and Van Bockstaele, 2002). In recent years there has been a move from open-pollinated varieties to F1 hybrids in leek breeding. Using clonally-propagated male-sterile plants as maternal lines restricts the amount of selfing that occurs, thought to account for up to 20[...]% of open pollinations (De Clercq and Van Bockstaele, 2002). Given the extremely high genetic load in leek, selfed progeny are likely to be the biggest contributor to lack of uniformity, with substantial loss of yield after only one or two rounds of self-pollination (De Clercq and Van Bockstaele, 2002). Therefore, the use of specialised F1 hybrid breeding is another example of how breeders have devised strategies to overcome the high genetic load in polysomic polyploids.

Another example of a seed-producing autopolyploid is the autotetraploid red clover (Trifolium pratense L.). Tetraploid clovers are often favoured over diploids due to their higher yields, greater persistence and higher levels of resistance to biotic and abiotic stresses (Taylor, 200[...]; Vleugels et al., 201[...]). However, the main drawback to tetraploid cultivars is their reduced seed yield (Vleugels et al., 201[...]). The precise cause(s) of this reduction remain unknown despite many years of investigation. The mechanism could be related to problems during meiosis or abnormal growth (Büyükkartal, 2002; 200[...]), but it has also been proposed that flower morphology (particularly the length of the corolla) may result in lower pollination levels in tetraploid clovers (Bender, 1999; [...]uruya, 2001). This trait represents one of the most important breeding aims in

tetraploid clover breeding, which has until now proved extremely difficult to understand or improve. With the deployment of molecular markers across tetraploid populations it could be possible to determine the genetic control (if any) of seed number in tetraploid clover, which could lead to more profitable and reliable seed yields.

One of the more interesting possibilities for polysomic polyploid breeding that has been suggested is the use of apomictic reproduction. Apomixis can be defined as the production of seeds from maternal tissues, bypassing normal sexual reproduction (Comai; Bicknell and Catanach). There are numerous reports of the association of gametophytic apomixis with polyploidy (de Wet and Harlan; Whitton et al.), with a number of possible theories proposed, both mechanistic and evolutionary, but no consensus on the precise reason why this should be ([...]ari[...] et al.; Comai; [...] and [...] Scheid). Whatever the connection, the potential attractiveness of programmable apomixis for breeding has widely been acknowledged (Spillane et al.; Abdi et al., 2016; Bicknell et al.; Bradshaw), and not just at the polyploid level. Just like clonal propagation, apomixis would allow the fixation of heterosis, and has been suggested as a superior method to produce F1 hybrids by immortalising the F1 rather than recreating it each season through repeated crossing of inbred parental lines. This would combine the advantages of true seed production while avoiding the numerous disadvantages of vegetative clones as already discussed. Apomixis has been shown to be under genetic control in multiple species (reviewed in Bicknell and Catanach), with a number of recent genetic mapping studies in tropical polyploid forage grasses such as the tetraploid Brachiaria decumbens or the hexaploid Urochloa humidicola revealing major QTL underlying the trait (Vigna et al.; Worthington et al.). A nice example of where apomixis is actively used in polyploid breeding is in Kentucky bluegrass (Poa pratensis L.). In this species apomixis is the norm, whereas true outcrossings are termed “aberrants”.
Techniques to increase the production of aberrants during the breeding cycle are used (Bradshaw), following which superior lines are selected. Most progeny are found to be apomictic once more, resulting in immortal maternal lines which can subsequently be marketed as stable hybrids. Although it might seem like an ideal scenario, the breeding of apomicts presents its own set of challenges, particularly if the number of aberrants generated is too low to allow crossing. Aneuploidy may also accompany apomixis, as it is not selected against during meiosis, which may lead to problems in true crosses if the aneuploid is anything other than the apomictic parent. Finding sources of apomixis genes in related and crossable species is a further challenge if transgenic approaches to introducing the trait are not feasible or acceptable. In this search, the use of molecular markers and genomic resources through candidate gene approaches will be vital. Although apomixis may offer many advantages, there has

275 ______General discussion

et to be an eame o it being successu bred into a ooid cro and commercia eoited. ime wi te whether this orm o breeding wi be adoted b the ooid communit or not.

Returning finally to the humble potato, the apparent lack of progress in yield over the past years can perhaps be best explained in terms of polysomic inheritance, with each new variety carrying a reshuffled set of alleles as well as perhaps some specific disease resistance loci introgressed from wild species. I have hopefully demonstrated that polysomic inheritance need no longer be considered a black box. We are entering a time where autopolyploid breeders will have the choice to no longer just accept the shuffled hand they are dealt, but to choose their own hand based on, among other factors, marker data. Only time will tell whether this will become an intrinsic part of polyploid breeding. Economics and the ease at which markers can be incorporated into the infrastructure of existing breeding programs will ultimately dictate the future direction of polyploid breeding.

Concluding remarks

In this thesis I have tried to strike a balance between the development of tools to help unravel the complexities of polyploid genetics and what these tools can provide to breeders and researchers. I do not consider these tools as ends in themselves, but rather see them as a means to achieve certain goals, be it the identification of markers linked to important traits, the creation of linkage maps representing the molecular karyotype of a population, or the understanding of polyploid meiosis, including how pairing and recombination proceed. We have seen that such tools are extremely powerful at opening up a window into polyploid genetics, which I believe will in time allow for many more genomics-informed breeding decisions to be made. We are just at the beginning of a new era in polyploid breeding, with many exciting opportunities ahead.

References

Abdi, S., Dwivedi, A., and Bhat, V. (2016). "Harnessing apomixis for heterosis breeding in crop improvement," in Molecular Breeding for Sustainable Crop Improvement. (Springer), 79-99.
Acquaah, G. (2012). Principles of Plant Genetics and Breeding. Wiley-Blackwell.
Aguiar, D., and Istrail, S. (2013). Haplotype assembly in polyploid genomes and identical by descent shared tracts. Bioinformatics 29, i352-i360.
Akaike, H. (1974). A new look at the statistical model identification. IEEE transactions on automatic control 19, 716-723.
Akhunov, E., Nicolet, C., and Dvorak, J. (2009). Single nucleotide polymorphism genotyping in polyploid wheat with the Illumina GoldenGate assay. Theoretical and Applied Genetics 119, 507-517.
Al-Janabi, S.M., Honeycutt, R.J., and Sobral, B.W.S. (1994). Chromosome assortment in Saccharum. Theoretical and Applied Genetics 89, 959-963.
Albrechtsen, A., Nielsen, F.C., and Nielsen, R. (2010). Ascertainment Biases in SNP Chips Affect Measures of Population Divergence. Molecular Biology and Evolution 27, 2534-2547.
Allendorf, F.W., Bassham, S., Cresko, W.A., Limborg, M.T., Seeb, L.W., and Seeb, J.E. (2015). Effects of crossovers between homeologs on inheritance and population genomics in polyploid-derived salmonid fishes. Journal of Heredity 106, 217-227.
Allendorf, F.W., and Danzmann, R.G. (1997). Secondary tetrasomic segregation of MDH-B and preferential pairing of homeologues in rainbow trout. Genetics 145, 1083-1092.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402.
Andrade, J., and Estévez-Pérez, M. (2014). Statistical comparison of the slopes of two regression lines: A tutorial. Analytica chimica acta 838, 1-12.
Bajgain, P., Rouse, M.N., and Anderson, J.A. (2016). Comparing genotyping-by-sequencing and single nucleotide polymorphism chip genotyping for quantitative trait loci mapping in wheat. Crop science 56, 232-248.
Baker, P. (2014). polySegratio: Simulate and test marker dosage for dominant markers in autopolyploids. R package version 0.2-4.
Baker, P., Jackson, P., and Aitken, K. (2010). Bayesian estimation of marker dosage in sugarcane and other autopolyploids. Theoretical and applied genetics 120, 1653-1672.
Balsalobre, T.W.A., Da Silva Pereira, G., Margarido, G.R.A., Gazaffi, R., Barreto, F.Z., Anoni, C.O., et al. (2017). GBS-based single dosage markers for linkage and QTL mapping allow gene mining for yield-related traits in sugarcane. BMC genomics 18, 72.
Barker, M.S., Arrigo, N., Baniaga, A.E., Li, Z., and Levin, D.A. (2016). On the relative abundance of autopolyploids and allopolyploids. New Phytologist 210, 391-398.
Barringer, B.C. (2007). Polyploidy and self-fertilization in flowering plants. American Journal of Botany 94, 1527-1533.
Bartholomé, J., Mandrou, E., Mabiala, A., Jenkins, J., Nabihoudine, I., Klopp, C., et al. (2015). High-resolution genetic maps of Eucalyptus improve Eucalyptus grandis genome assembly. New Phytologist 206, 1283-1296.
Bassil, N.V., Davis, T.M., Zhang, H., Ficklin, S., Mittmann, M., Webster, T., et al. (2015). Development and preliminary evaluation of a 90 K Axiom® SNP array for the allo-octoploid cultivated strawberry Fragaria × ananassa. BMC Genomics 16, 155.
Beçak, M.L., Beçak, W., and Rabello, M.N. (1966). Cytological evidence of constant tetraploidy in the bisexual South American frog Odontophrynus americanus. Chromosoma 19, 188-193.
Behrouzi, P., and Wit, E.C. (2017a). De novo construction of q-ploid linkage maps using discrete graphical models. arXiv preprint arXiv:1710.01063.
Behrouzi, P., and Wit, E.C. (2017b). netgwas: An R Package for Network-Based Genome-Wide Association Studies. arXiv preprint arXiv:1710.01236.
Benabdelmouna, A., and Ledu, C. (2015). Autotetraploid Pacific oysters (Crassostrea gigas) obtained using normal diploid eggs: induction and impact on cytogenetic stability. Genome 58, 333-348.
Bendahmane, M., Just, J., Vergne, P., Raymond, O., Dubois, A., Szcesi, J., et al. (Year). "The Rose Genome Sequencing Initiative, Prospects and Perspectives. Abstract W659", in: Plant and Animal Genome XXIV Conference: https://pag.confex.com/pag/xxiv/webprogram/Paper19873.html.

Bender, A. (1999). An impact of morphological and physiological transformations of red clover flowers accompanying polyploidization on the pollinators' working speed and value as a guarantee for cross pollination. Agraarteadus, 4, 9-23.
Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 289-300.
Bennetzen, J.L. (2007). Patterns in grass genome evolution. Current Opinion in Plant Biology 10, 176-181.
Bennetzen, J.L., and Wang, H. (2014). The contributions of transposable elements to the structure, function, and evolution of plant genomes. Annual review of plant biology 65, 505-530.
Berger, E., Yorukoglu, D., Peng, J., and Berger, B. (2014). Haptree: A novel Bayesian framework for single individual polyplotyping using NGS data. PLoS computational biology 10, e1003502.
Bernardo, R. (2008). Molecular Markers and Selection for Complex Traits in Plants: Learning from the Last 20 Years. Crop Science 48, 1649-1664.
Bernardo, R. (2016). Bandwagons I, too, have known. Theoretical and Applied Genetics 129, 2323-2332.
Bertioli, D.J., Cannon, S.B., Froenicke, L., Huang, G., Farmer, A.D., Cannon, E.K., et al. (2016). The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut. Nature Genetics 48, 438-446.
Bertioli, D.J., Ozias-Akins, P., Chu, Y., Dantas, K.M., Santos, S.P., Gouvea, E., et al. (2014). The use of SNP markers for linkage mapping in diploid and tetraploid peanuts. G3: Genes|Genomes|Genetics 4, 89-96.
Bethke, P.C., Nassar, A.M., Kubow, S., Leclerc, Y.N., Li, X.-Q., Haroon, M., et al. (2014). History and origin of Russet Burbank (Netted Gem) a sport of Burbank. American journal of potato research 91, 594-609.
Bicknell, R., and Catanach, A. (2015). "Apomixis: the asexual formation of seed," in Somatic Genome Manipulation. (Springer), 147-167.
Bicknell, R., Catanach, A., Hand, M., and Koltunow, A. (2016). Seeds of doubt: Mendel’s choice of Hieracium to study inheritance, a case of right plant, wrong trait. Theoretical and Applied Genetics 129, 2253-2266.
Bingham, E.T., and Gillies, C.B. (1971). Chromosome pairing, fertility, and crossing behavior of haploids of tetraploid alfalfa, Medicago sativa L. Canadian Journal of Genetics and Cytology 13, 195-202.
Bink, M., Boer, M., Ter Braak, C., Jansen, J., Voorrips, R., and Van De Weg, W. (2008). Bayesian analysis of complex traits in pedigreed plant populations. Euphytica 161, 85-96.
Blakeslee, A.F., and Avery, A.G. (1937). Methods of inducing doubling of chromosomes in plants: By treatment with colchicine. Journal of Heredity 28, 393-411.
Blischak, P.D., Kubatko, L.S., and Wolfe, A.D. (2016). Accounting for genotype uncertainty in the estimation of allele frequencies in autopolyploids. Molecular ecology resources 16, 742-754.
Bomblies, K., Higgins, J.D., and Yant, L. (2015). Meiosis evolves: adaptation to external and internal environments. New Phytologist 208, 306-323.
Bomblies, K., Jones, G., Franklin, C., Zickler, D., and Kleckner, N. (2016). The challenge of evolving stable polyploidy: could an increase in “crossover interference distance” play a central role? Chromosoma 125, 287-300.
Bourke, P.M. (2014). QTL analysis in polyploids. MSc thesis, Wageningen University, Wageningen.
Bourke, P.M., Arens, P., Voorrips, R.E., Esselink, G.D., Koning-Boucoiran, C.F.S., Van ‘T Westende, W.P.C., et al. (2017). Partial preferential chromosome pairing is genotype dependent in tetraploid rose. The Plant Journal 90, 330-343.
Bourke, P.M., Voorrips, R.E., Kranenburg, T., Jansen, J., Visser, R.G., and Maliepaard, C. (2016). Integrating haplotype-specific linkage maps in tetraploid species using SNP markers. Theoretical and Applied Genetics 129, 2211-2226.
Bourke, P.M., Voorrips, R.E., Visser, R.G.F., and Maliepaard, C. (2015). The Double Reduction Landscape in Tetraploid Potato as Revealed by a High-Density Linkage Map. Genetics 201, 853-863.
Bradbury, P.J., Zhang, Z., Kroon, D.E., Casstevens, T.M., Ramdoss, Y., and Buckler, E.S. (2007). TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23, 2633-2635.
Bradshaw, J. (2007). The canon of potato science: 4. Tetrasomic inheritance. Potato Research 50, 219-222.
Bradshaw, J.E. (2016). Plant Breeding: Past, Present and Future. New York: Springer.
Bradshaw, J.E., Hackett, C.A., Pande, B., Waugh, R., and Bryan, G.J. (2008). QTL mapping of yield, agronomic and quality traits in tetraploid potato (Solanum tuberosum subsp. tuberosum). Theoretical and Applied Genetics 116, 193-211.

Bradshaw, .., ande, B., Bryan, .., Hackett, C.A., Mclean, K., Stewart, H.., et al. (200). nterval appin of uantitative trait loci for resistance to late bliht Phytophthora infestans (Mont.) de Bary, heiht and aturity in a tetraploid population of potato (Solanum tuberosum subsp. tuberosum). Genetics 16, . Brent, R. (1). Aloriths for iniiin without derivatives. renticeHall, nlewood Cliffs (.). Broman, K.W., Wu, H., Sen, Ś., and Churchill, G.A. (2003). R/qtl: QTL mapping in eperiental crosses. Bioinformatics 1, 0. Brouwer, ., and sborn, T. (1). A olecular arker linkae ap of tetraploid alfalfa (Medicago sativa .). Theoretical and Applied Genetics , 111200. Bruneau, A., Starr, .R., and oly, S. (200). hyloenetic relationships in the enus Rosa: new evidence fro chloroplast A seuences and an appraisal of current knowlede. Systematic Botany 2, 66 . Brustowic, ., Merette, C., ie, ., Townsend, ., illia, T., and tt, . (1). Molecular and statistical approaches to the detection and correction of errors in enotype databases. American Journal of human Genetics , 1111. Buitendik, .H., Boon, .., and Raanna, M.S. (1). uclear A Content in Twelve Species of Alstroemeria . and Soe of their Hybrids. Annals of Botany , . Bushan, B., Robbins, M., arson, S., and Staub, . (2016). enotypin by Seuencin in Autotetraploid Cocksfoot (Dactylis glomerata) without a Reference enoe, in Breeding in a World of Scarcity. (Spriner), 11. Butruille, ., and Boiteu, . (2000). Selection–utation balance in polysoic tetraploids: ipact of double reduction and aetophytic selection on the freuency and subchroosoal localiation of deleterious utations. Proceedings of the National Academy of Sciences , 660661. Bykkartal, H..B. (2002). n Vitro ollen erination and ollen Tube Characteristics in Tetraploid Red Clover (Trifolium pratense .). Turkish Journal of Botany 2, 61. Bykkartal, H..B. (200). Causes of low seed set in the natural tetraploid Trifolium pratense . (Fabaceae). African Journal of Biotechnology , 120. 
Cadan, C. (1). ature of tetraploidy in cultivated uropean potatoes. Nature 12, 1010. Cao, ., sborn, T.C., and oere, R.W. (200). Correct estiation of preferential chroosoe pairin in autotetraploids. Genome Research 1, 62. Carvalho, A., elado, M., Baro, A., Frescatada, M., Ribeiro, ., ikaard, C.S., et al. (2010). Chroosoe and A ethylation dynaics durin eiosis in the autotetraploid Arabidopsis arenosa. Sexual plant reproduction 2, 2. Carvalho, .R., KoninBoucoiran, C.F., Fanourakis, ., Vasconcelos, M.W., Carvalho, S.M., Heuvelink, ., et al. (201). T analysis for stoatal functionin in tetraploid Rosa× hybrida rown at hih relative air huidity and its iplications on postharvest lonevity. Molecular Breeding , 111. Cavanah, C., Morell, M., Mackay, ., and owell, W. (200). Fro utations to MAC: resources for ene discovery, validation and delivery in crop plants. Current opinion in plant biology 11, 21221. Cavanah, C.R., Chao, S., Wan, S., Huan, B.., Stephen, S., Kiani, S., et al. (201). enoewide coparative diversity uncovers ultiple tarets of selection for iproveent in heaploid wheat landraces and cultivars. Proceedings of the National Academy of Sciences 110, 0062. CervantesFlores, .C., encho, .C., Kriener, A., ecota, K.V., Faulk, M.A., Mwana, R.., et al. (200). evelopent of a enetic linkae ap and identification of hooloous linkae roups in sweetpotato usin ultipledose AF arkers. Molecular Breeding 21, 112. Chaffin, A.S., Huan, .F., Sith, S., Bekele, W.A., Babiker, ., nanesh, B.., et al. (2016). A consensus ap in cultivated heaploid oat reveals conserved rass synteny with substantial subenoe rearraneent. The plant genome . Chakravarti, A. (11). A raphical representation of enetic and physical aps: the Marey ap. Genomics 11, 21222. Chalhoub, B., enoeud, F., iu, S., arkin, .A., Tan, H., Wan, ., et al. (201). arly allopolyploid evolution in the posteolithic Brassica napus oilseed enoe. Science , 0. Chan, K.., o, H.F., ai, .C., ao, .., in, K.H., and Hwan, S.. (200). 
dentification of uantitative trait loci associated with yieldrelated traits in sweet potato (Ipomoea batatas). Botanical studies 0, . Cheea, ., and icks, . (200). Coputational approaches and software tools for enetic linkae ap estiation in plants. Briefings in bioinformatics 10, 60.

Plant systematics and evolution , Tragopogon miscellus Proceedings of the National Academy of Sciences , Genetics , New Phytologist , Brassica napus Theoretical and Applied Genetics , Philosophical Transactions of the Royal Society of London B: Biological Sciences , Nature Reviews Genetics , Capsella bursa pastoris Molecular ecology , Spartina pectinata‐ Molecular breeding : Rosa Theoretical and Applied Genetics , InterJournal, Complex Systems , D’hont, A. ( Cytogenetic and genome research , D’hont, A., Denoeud, F., Aury, J. Musa acuminata Nature , Molecular plant , Saccharum spontaneum Genome , Brassica napus Functional & Integrative Genomics , Recent advances in cytology. BMC genomics : Lotus corniculatus Chromosoma , Nature genetics , Allium crop science: recent advances, Chrysanthemum morifolium Euphytica , Plant, cell & environment , Gartenbauwissenschaft , Critical reviews in plant sciences ,

Denoeud, F., rreterouet, ., Dereeer, A., Dro, ., uyot, ., etre, ., et . (. he oee enoe rode nht nto the onerent eouton o ene oynthe. Science , . Deey, D.. (. nreedn dereon n dod, tetrod, nd heod reted hetr. Crop Science , . Deey, D.. (. oe Aton nd ton o ndued oyody to nt reedn, n Polyploidy: Biological Relevance, ed. .. e. (oton, A rner , . Ddon, J.., n, ., herd, ., Fu, .., n, ., De en, F..., et . (. Doery o noe rnt n enotyn rry roe enotye retenton nd redue ertnent . BMC genomics :. Doye, J.J., nd n, A.. (. Dtn the orn o oyody eent. New Phytologist , . Doye, J.J., nd hern roye, . (. Doue troue tonoy nd denton o oyody. New Phytologist, , . Dryott, A.. (. Sugar‐ beet. John ey on. Dreehu, ., nd Johnon, .A. (. eroduton nt rente ro. Current Biology , . erhrd, F.., hn, ., ehene, A., re, .A., endorer, ., nd uthernd, .. (. hroooe ooton o n F Triticum aestivum T. turgidum . duru ro nyed y DAr rer nd F. Crop and Pasture Science , . de, .A., oden, .., nd ond, J. (. Aton o outon euenn ( or ordern nd utn enotynyeuenn rer n heod het. G3: Genes, Genomes, Genetics , . hre, .J., ut, J.., un, ., ond, J.A., oto, ., uer, .., et . (. A rout, e enotynyeuenn ( roh or hh derty ee. PloS one , e. nden, J.. (. e orth roe ne truture o the rey onenu . BMC Genomics :. nden, J.., nd oon, . (. ere n e or ern enet y ner rorn. Bioinformatics , . en, ., uder, ., nd on, . (. dentton o ut roe (Rosa hybrida nd rootto rete un rout euene ted rotete te rer. Theoretical and Applied Genetics , . n, .J., h, .., odn, .., ed, .A., nd ed, A.A. (. outon o the ret n enoe. Genome Biology and Evolution , . Feher, .J., oo, J.J., , A.., ney, .., ton, J.., eeu, .., et . (. nterton o to dod otto ne th the otto enoe euene. PLoS One , e. Fert, J.. (. n ne to orret nd od de noo enoe ee ethod, hene, nd outton too. Frontiers in Genetics . Fher, .A. (. he theory o ne n oyo nhertne. Philosophical Transactions of the Royal Society B: Biological Sciences , . Fetro, ., euen, ., nd tener, J. (. 
F rer ny uort tetron nhertne n Lotus corniculatus . Theoretical and Applied Genetics , . Fntr, .A., hornerry, J.., nd uer , .. (. truture o ne deuru n nt. Annual review of plant biology , . FoureDnen, ., Joy, ., runeu, A., o, .F., nd hn, .. (. hyoeny nd oeorhy o d roe th e ttenton to oyod. Annals of Botany , . Frne, A., y, .A., Aheh, .A., oh, .A., nd uenho, . (. e y r the eoutonry htory o Brassicaceae. Trends in plant science , . Furuy, . (. Comparisons of seed weight and seedling characteristics of diploid and autotetraploid red clover. Don o oee nerte, Ford ord o duton. rdo, .A., tto, ., otoerd, ., n, .., rn, ..., nodon, .J., et . (. Aoton n o eed uty trt n Brassica napus . un A nd nddte rohe. Molecular Breeding :. rdo, .., h, J., oneyutt, ., ed, ., nd her, . (. Doery o tetrody n . Nature , . oy, .F., nd tteron, J.. (. nreedn dereon n n utotetrod her three ohort ed tudy. New Phytologist , .

oy, .F., tteron, J.., nd r, J.. (. utron rte nd nreedn dereon n the hereou utotetrod, Campanula americana. Heredity, , . r, ., rent, D.J., , .J., en, ., he, ., yrne, D.., et . (. An utotetrod ne o roe (Rosa hybrida dted un the trerry (Fragaria vesca enoe euene. PLoS One , e. ut, .., rht, .., on, ., Dor, J., nd Anderon, .. (. eonton n underreted tor n the eouton o nt enoe. Nature Reviews Genetics , . edernn, . (. netton on nhertne o unttte hrter n n y ene rer . ethod. Theoretical and Applied Genetics , . dehu, ., ent, ., ye, .J., nd en, . (. enotye n nd n o utte rnt un n Atnt on eet rry. Bioinformatics , . e, A.., u, ., oen, .., nd Aott, .J. (. n ry n the Ateree enet nd eouton o rdte eru dod oer hed, n Developmental genetics and plant evolution, (ed. . ron, . ten J. n. yor nd Frn, ondon, . ton, .., onnouorn, .F., ernden, ., Dotr, ., er, .., erd, ., et . (. enet rton, hertty nd enotye y enronent nterton o orhoo trt n tetrod roe outon. BMC Genetics :. ton, .., toer, ., onnouorn, .F.., Aee, ., er, ..F., erd, ., et . (. nhertne nd ny o the deternnt o oer oor n tetrod ut roe. Molecular Breeding :. oer, .., edet, ., nd Deo, . (. ooeoo ht re they nd ho do e ner the Trends in plant science , . odhdt, . (. oe et o eouton. Science , . oodn, ., heron, J.D., nd oe, .. (. on o e ten yer o netenerton euenn tehnooe. Nature Reviews Genetics , . rnde, F., nnthn, ., n er, ., De n, J.., nd eter, D. (. A t nd deternt ne n o oyod. BMC Bioinformatics :. rnde, F., nh, ., euen, ..., De n, J.., nd eter, D. (. Adnte o ontnuou enotye ue oer enotye e or A n hher oyod orte tudy n heod hryntheu. BMC Genomics :. rndont, ., Jene, ., nd oyd, A. (. eo nd t deton n oyod nt. Cytogenetic and genome research , . rth, A.J., eer, .., rro, .., nd Doeey, J. (. An introduction to genetic analysis (10th edition). n. ret, ., nd Arrud, . (. urne eno detn the oe enoe o n ortnt tro ro. Current Opinion in Plant Biology , . ret, ., Dhont, A., oue, D., Fednn, ., nud, ., nd nn, J.. (. F n n utted urne (Saccharum . 
enoe rnton n hy oyod nd Aneuod ntere yrd. Genetics , . ure, .., , .., nd or, ... (. orte n o Andropogoneae Saccharum L. (sugarcane) and its relation to sorghum and maize. Proceedings of the National Academy of Sciences , . ett, .A., rdh, J., nd o, J. (. nter n o unttte trt o n utotetrod ee. Genetics , . ett, .A., rdh, J., eyer, ., o, J., ourne, D., nd uh, . (. ne ny n tetrod ee uton tudy. Genetics Research , . ett, .A., nd uo, . (. etrod ontruton o ne n utotetrod ee. Journal of Heredity , . ett, .A., nde, ., nd ryn, . (. ontrutn ne n utotetrod ee un uted nnen. Theoretical and Applied Genetics , . ett, .A., o, ., oo, A., reedy, .F., nd ne, . (. etrod otre or ne Any nd n n Autotetrod outon n Doe Dt. Journal of Heredity , . ett, .A., rdh, J.., nd ryn, .J. (. n n utotetrod un doe norton. Theoretical and Applied Genetics , . ett, .A., nd rodoot, .. (. et o enotyn error, n ue nd ereton dtorton n oeur rer dt on the ontruton o ne . Heredity , .

ett, .A., en, ., nd ryn, .J. (. ne Any nd n n Doe Dt n etrod otto n outon. PLoS One , e. ett, .A., ne, ., rdh, J.., nd uo, . (. etrod or ndo ne ontruton nd n n utotetrod ee. Journal of Heredity , . dne, J. (. he onton o ne ue nd the uton o dtne eteen the o o ned tor. Journal of Genetics , . dne, J.. (. heoret enet o utooyod. Journal of Genetics , . ton, J.., ney, .., htty, .., toe, ., , A.., n Deyne, A., et . (. ne nueotde oyorh doery n ete north ern otto er. BMC Genomics , . ton, .., nd err, .J. (. outton o the nere ddte retonh tr or utooyod nd uteody outon. Theoretical and Applied Genetics , . rdn, .A., eer, F..., eton, ., ron, ., ton, J.., nourt, ., et . (. enoe derty o tuerern onu unoer oe eoutonry htory nd tret o doetton n the utted otto. Proceedings of the National Academy of Sciences . rn, J.., nd De et, J..J. (. n . ne nd ryer the orn o oyody. The botanical review , . rron, .J., Aey, ., nd enderon, .. (. eo n oern nt nd other reen orn. Journal of experimental botany , . yne, ., nd Douhe, D. (. tton o the oeent o doue reduton n the utted tetrod otto. Theoretical and applied genetics , . r, ., nd ee, . (. enoty ontro o hroooe ehour n rye. . hroooe rn nd ertty n utotetrod. Heredity , . e, ., u, ., outt, .., nd dout, .. (. oyn to uort toont ne ny n utotetrod. Bioinformatics , . enderon, .A., nd eeney, . (. yn yntone oe ntton to the orton nd rored rer o DA douetrnd re. Proceedings of the National Academy of Sciences of the United States of America , . Herben, T., Suda, J., and Klimešová, J. (2017). Polyploid species rely on vegetative reproduction more than dod reenton o the od hyothe. Annals of Botany , . erot, ., nd t, . (. ute nd yetr orn o oyod do roe hyrd (Rosa . et. Caninae (D. er. non unredued ete. Annals of Botany , . e, ., Druet, ., e, A., nd rr, D. (. Fedenth hotye n roe eno redton ury n n ded dry tte outon. Genetics Selection Evolution :. rndnt ynt, ., ree, ., e, ., hn, ., nd Fouher, F. (. 
enet ne o roe ontruted th ne rotete rer nd otn ontron oern trt. Tree Genetics & Genomes , . rh, .D., ton, J.., hd, .., ee, J., ron, ., nourt, ., et . (. ud D A reoure or nn euene, enotye, nd henotye to eerte otto reedn. The Plant Genome . ot, F. (. hene or eete rerted eeton n nt. Genetica , . un, .., ery, .., ery, A.., hn, ., nh, .., ur, ., et . (. A outon n ro urrent ttu nd uture roet. Theoretical and Applied Genetics , . un, ., Dn, J., Den, D., n, ., un, ., u, D., et . (. Drt enoe o the rut Actinidia chinensis. Nature communications :. u, D.. (. uere, n Fodder crops and amenity grasses, (ed. . oer, .. oet F. erone. rner, e or, . uee, A.., e, J., ee, J., Ahr, ., uyyru, ., Fn, D.D., et . (. Deeoent o Arry or otton nd hDenty n o ntr nd ntere outon o Gossypium . G3: Genes|Genomes|Genetics , . unter, . (. eot reonton the eene o heredty. Cold Spring Harbor perspectives in biology , . und, .. (. ontrnt on oyod eouton tet o the norty ytotye euon rne. Proceedings of the Royal Society of London. Series B: Biological Sciences , . und, .. (. et o nreedn on oen tue roth n dod nd tetrod Chamerion angustifolium Do oyod utton od n oen American journal of botany , .

nternational heat enome Seuencing onsortium (201). chromosomebased drat seuence o the heaploid bread heat (Triticum aestivum) genome. Science 1. sing, . (1). ytogenetic studies in Cyrtanthus . Segregation in an allotetraploid. Hereditas , 27. slam, .S., Thyssen, .., Jenins, J.., and ang, .. (201). etection, validation, and application o genotypingbyseuencing based single nucleotide polymorphisms in pland cotton. The Plant Genome . Jannoo, ., rivet, ., avid, J., hont, ., and lasmann, J. (200). ierential chromosome pairing ainities at meiosis in polyploid sugarcane revealed by molecular marers. Heredity , 07. Jannoo, N., Grivet, L., Dookun, A., D’hont, A., and Glaszmann, J.C. (1999). Linkage disequilibrium among modern sugarcane cultivars. Theoretical and Applied Genetics , 10100. Jansen, .. (1). nterval mapping o multiple uantitative trait loci. Genetics 1, 20211. Jansy, S. (200). hapter 2 reeding, enetics, and ultivar evelopment, in Advances in Potato Chemistry and Technology. (San iego cademic Press), 272. Jansy, S.H., harosi, .., ouches, .S., usmini, ., ichael, ., ethe, P.., et al. (201). einventing potato as a diploid inbred line–based crop. Crop Science , 112122. Jarvis, .., Ho, .S., ightoot, .J., Schmcel, S.., i, ., orm, T.J.., et al. (2017). The genome o Chenopodium quinoa. Nature 2:07. Jencesi, ., ber, ., rimaud, ., Huet, S., ucas, .., onod, H., et al. (200). PrBn, a maor gene controlling homeologous pairing in oilseed rape (Brassica napus) haploids. Genetics 1, . Je, J.., ee, S.., and Sherp, .. (201). The net green movement plant biology or the environment and sustainability. Science , 12112. Jiang, .., right, .J., li, K.., and Paterson, .H. (1). Polyploid ormation created uniue avenues or response to selection in Gossypium (cotton). Proceedings of the National Academy of Sciences , 12. Jiao, ., icett, .J., yyampalayam, S., handerbali, .S., andherr, ., alph, P.., et al. (2011). ncestral polyploidy in seed plants and angiosperms. Nature 7, 7100. 
John, ., and Henderson, S. (12). synapsis and polyploidy in Schistocerca paranensis. Chromosoma 1, 11117. Joly, S., ryant, ., and ochart, P.J. (201). leible methods or estimating genetic distances rom single nucleotide polymorphisms. Methods in Ecology and Evolution , . Joly, S., Starr, J.., eis, .H., and runeau, . (200). Polyploid and hybrid evolution in roses east o the ocy ountains. American Journal of Botany , 122. Jones, .., Jerry, .., Khatar, .S., aadsma, H.., an er Steen, H., Prochasa, J., et al. (2017). comparative integrated genebased linage and locus ordering by linage diseuilibrium map or the Paciic hite shrimp, Litopenaeus vannamei. Scientific Reports 7. Jones, ., and incent, J. (1). eiosis in autopolyploid Crepis capillaris. . utotetraploids. Genome 7, 70. Kaitani, ., Toshimoto, K., oguchi, H., Toyoda, ., gura, ., uno, ., et al. (201). icient de novo assembly o highly heteroygous genomes rom holegenome shotgun short reads. Genome Research 2, 11. Kantarsi, T., arson, S., hang, ., ehaan, ., orevit, J., nderson, J., et al. (2017). evelopment o the irst consensus genetic map o intermediate heatgrass (Thinopyrum intermedium) using genotypingbyseuencing. Theoretical and Applied Genetics 10, 1710. Kaur, S., ranci, .., and orster, J.. (2012). dentiication, characteriation and interpretation o single nucleotide seuence variation in allopolyploid crop species. Plant biotechnology journal 10, 12 1. ‐ Kempthorne, . (17). An introduction to genetic statistics. John iley Sons, e or. Kerr, .J., i, ., Tier, ., utosi, .., and crae, T.. (2012). se o the numerator relationship matri in genetic analysis o autopolyploid species. Theoretical and applied genetics 12, 1271 122. Khaaa, H..T., Sybenga, J., and llis, J.. (17). hromosome pairing and chiasma ormation in autopolyploids o dierent Lathyrus species. Genome 0, 7. Kihara, H., and no, T. (12). hromosomenahlen und systematische ruppierung der umerten. Zeitschrift für Zellforschung und mikroskopische Anatomie , 71.


Summary

Almost all higher organisms are diploid, i.e. they carry two copies of each chromosome, one inherited from each parent. Although diploidy appears to be the norm in higher organisms, it is certainly not the rule. A small but significant proportion of higher organisms possess more than two copies of each chromosome. These are collectively termed “polyploids” and make up the central subject matter for this thesis.

We are now living in a time of unprecedented development in scientific tools, particularly for the life sciences. For many years diploid species have been the subject of genetic investigations, often with the aim of understanding the inheritance of specific traits. For polyploid species, many of which are important agricultural crops, this has often remained infeasible due to the increased complexity of polyploid genomics. This thesis represents part of the ongoing development of methods and software tools required to make polyploid genomics a reality, enabling genomics-assisted breeding decisions to be made for polyploid crops.

We begin in Chapter 2 with a review of the tools currently available to analyse polyploid populations. These tools encompass activities such as genotype assignment and marker dosage calling, haplotype assembly, linkage analysis and genetic map construction, genome-wide association analysis, genomic selection, the creation of reference sequence assemblies and the simulation of polyploid populations. Although introductory to the thesis, this review represents the current state of the art in polyploid software tools (and was compiled after the tools described in this thesis were developed).

Linkage maps represent a molecular karyotype of a species and its recombination landscape and have numerous applications such as in quantitative trait loci (QTL) analysis, synteny analysis, reference sequence assembly, fine mapping studies etc. They can also provide insights into meiosis and for polyploids, can be used to uncover some of the more curious features of polyploid meiosis. In Chapter 3 we generated a high-density linkage map from single nucleotide polymorphism (SNP) genotype data of a tetraploid potato population. This map allowed us to study the phenomenon of double reduction, where both copies of (part of) a chromatid end up being transmitted to an offspring. This was achieved using simplex x nulliplex markers only, which give unambiguous information about double reduction. We broadened our mapping approach to include all other SNP marker types in Chapter 4, while also investigating the effects of double reduction and non-random chromosomal pairing on the estimation of recombination frequency in autotetraploids. The topic of non-random, or preferential, chromosome pairing returns in Chapter 5, where we again used high-density linkage mapping to understand the patterns of meiosis in a tetraploid rose population. We discovered that pairing affinities varied between parents, chromosomes and even chromosomal position in our population, with evidence for strongly preferential pairing localised to certain parts of a number of parental chromosomes. As well as diagnosing this behaviour, we corrected for it in our linkage analysis, leading to the creation of a tailored high-density genetic map for this important ornamental species. This map allowed us to uncover interesting genomic rearrangements between rose and its closest sequenced relative, the woodland strawberry Fragaria vesca, and has since been used in efforts to create a reference sequence assembly in rose. In Chapter 6 we describe a new software tool, polymapR, which is freely available as an R package distributed through CRAN. polymapR is currently able to generate integrated and phased linkage maps using dosage-scored marker data in triploid, tetraploid and hexaploid populations that exhibit random chromosomal pairing (polysomic inheritance), as well as segmental allotetraploids showing preferential chromosome pairing, and represents a timely addition to the suite of tools available to polyploid geneticists.
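The remark above that simplex x nulliplex markers give unambiguous information about double reduction can be illustrated with a small enumeration. The following is a minimal sketch under a deliberately simplified model, not the procedure used in the thesis: it assumes random pairing among the four homologues and treats a gamete at one locus as either two different homologues or, with an assumed probability `alpha`, two copies of the same homologue segment. The function name and the parameter `alpha` are illustrative and are not taken from polymapR.

```python
from fractions import Fraction
from itertools import combinations


def offspring_dosage_distribution(parent, alpha):
    """Dosage distribution among offspring of `parent` x nulliplex.

    parent: four 0/1 values, one per homologue (1 = marker allele present).
    alpha:  assumed probability that a gamete is a double-reduction
            product at this locus (the same homologue segment twice).
    The nulliplex parent contributes no marker alleles, so the offspring
    dosage equals the dosage of the gamete from `parent`.
    """
    alpha = Fraction(alpha)
    dist = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}

    # Ordinary gametes: two different homologues, all 6 pairs equally likely.
    pairs = list(combinations(range(4), 2))
    for i, j in pairs:
        dist[parent[i] + parent[j]] += (1 - alpha) / len(pairs)

    # Double-reduction gametes: one homologue segment transmitted twice.
    for allele in parent:
        dist[2 * allele] += alpha / 4

    return dist


simplex = (1, 0, 0, 0)  # marker allele on one of the four homologues
print(offspring_dosage_distribution(simplex, 0))
print(offspring_dosage_distribution(simplex, Fraction(1, 10)))
```

In this sketch a duplex offspring (dosage 2) can only arise when `alpha` is greater than zero, which is why such offspring are informative about double reduction at simplex x nulliplex markers.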

In Chapter 7 we used polymapR to analyse a marker dataset from a large population of hexaploid Chrysanthemum morifolium, generating the first high-density integrated linkage map in this important ornamental species. We used the phased map and marker dosage scores to generate identity-by-descent probabilities, which formed the basis of a QTL analysis for a number of important ornamental traits. We return to the topic of double reduction in Chapter 8, quantifying the effect of double reduction on the power and precision of QTL studies in autotetraploids. As well as gaining insight into the mechanics of IBD-based QTL analysis at the polyploid level, we explored the effect of variable marker densities on genetic mapping using the genotypic information coefficient, and found that this measure has important implications for the results of polyploid QTL studies. We applied these methods in Chapter 9 to perform a multi-environment QTL analysis for some important morphological traits in rose. This allowed us not only to describe the genetic architecture of these traits, but to test whether the environment has a strong influence on the phenotypic expression, of great relevance to breeders and researchers performing QTL studies away from the target environment. In Chapter 10 these developments form the basis of polyqtlR, an R package to perform interval mapping in autopolyploid populations. Finally, in Chapter 11 I take a step back from the technical innovations presented in this thesis, considering the impact that these tools can have on our understanding of, and our ability to breed, polyploid crops.

It is hoped that the methods and tools described in this thesis will simplify genetic mapping in polyploids and facilitate genomics-assisted breeding decisions in the future for these fascinating species.

Acknowledgements

Being one of the most-read sections of a thesis, I am aware of the responsibility to properly thank all the people who in some way or another contributed to the work described within these covers. Given that the list is long and space is short, I will do my best, but please accept my apologies if you came here looking for your name and failed to find it! I had planned a special dedication to you further down in white ink, but the printers were unable to render it legibly...

But back to the serious topic of thanking people, first of all to my daily supervisors Chris Maliepaard and Roeland Voorrips, I wish to thank you both for your time, enthusiasm and expertise, and for inviting me to do this PhD in the first place. Our fortnightly supervision meetings were both friendly and focused, and we always seemed to end up laughing at some stage. Both of you contributed much to the content of the thesis, but also to the quality of the written output. Roeland, I hadn’t fully appreciated what the term “scrupulous” meant until I received feedback from you. Your attention to detail and command of the English language are impressive, indeed I am often left with the feeling that your English is more English than mine. Thank you both for enduring the editorial demands of this somewhat frenetic writer. Chris, thanks also for giving me the room to follow my own research goals during this work. Together with Roeland you helped define the work to be done, but you allowed me the space to develop many of the details. I think this is one of the strengths of a good supervisor and I’m very aware of how lucky I’ve been with the supervision team I’ve had here.

To my promoter Richard Visser, you provided not only the opportunity to do the PhD within the department of plant breeding, but also very valuable feedback on my project, particularly the written output. After 2.5 years you told me I could finish up whenever I liked (you were probably getting worried that you would have no free weekends left if I kept writing) – this really spurred me on to “opschiet een beetje” and encouraged me to finish writing the thesis in 3.5 years instead of the usual 4 (or 5!). You have always been approachable and friendly, and your dedication to the department is second to none. Thanks to my external supervisor Duur Aanen for making time for our annual meetings which I enjoyed. Thankfully you weren’t required to mediate at any stage! Within the department there are also a large number of individuals that have helped in many ways. I would like to thank René Smulders for the many interesting conversations and collaborations we’ve had, particularly on rose genetics but also more generally on quantitative genetics in polyploids. To Herman van Eck, like René you are also one of the people that helped me think more deeply and consider alternative ways to look at things. You have also given me a fuller appreciation of the power of a good question, and I must admit I am slightly relieved that you are not eligible to sit on the opponents’ bench for my defence! Thanks to Paul Arens for the help and feedback during my project, as well as being such a friendly colleague with whom I could have a chat about “gewoon dinges” as well. You have helped me feel more at home in the Netherlands. Thanks also to Eric van de Weg: you possess both a deep knowledge of genetic mapping and a meticulousness that has helped separate the signal from the noise in a way no computer could. Your ideas and questions helped broaden the boundaries of the work in this thesis. I also appreciated your enthusiasm for my presenting style, particularly when it led to my presenting a chapter of this thesis to the polyploids session of the Plant and Animal Genome conference. Thanks to Gerard van der Linden and Christian Bachem, my thesis supervisors. Christian, you managed to convince me to stay and do a PhD in plant breeding despite my advancing years. I probably wouldn’t still be here otherwise! There are also many other people that deserve a mention who have been involved in my project, be it inspirational or otherwise, such as Arwa Shahin, Carole Koning-Boucoiran, Chaozhi Zheng, Danny Esselink, Ehsan Motazedi, Frans Krens, Giorgio Tumino, Herma Koehorst-van Putten (who along with Michela Appiano helped fly the flag at my wedding), Johan Willemsen, Katherine Preedy, Konrad Zych, Linda Kodde, Marco Bink, Peter Vos, Richard Finkers, Robert van Loo, Ronald Hutten and Thijs van Dijk. Thanks also to everyone who kept the journal club going over the years; hopefully it will continue into the future.

A special thanks to Christine Hackett for accepting my request to join BioSS for a month in 2016. Chris, you have been one of the leading lights in polyploid genetic mapping for many years and it was a great experience to work together with you. Our collaboration led directly to one of the chapters of this thesis, but also helped greatly in developing two others. I would like to thank Hans Jansen for helping to define the goals in polyploid linkage mapping early on in the project and for your infectious enthusiasm for mapping and genetics. I also want to thank Johan van Ooijen for being so generous with your time and advice in helping us understand some of the mapping steps in JoinMap and trying to translate these ideas into polymapR. Thank you to Geert van Geest for all the work in improving polymapR: you took a set of functions and helped turn them into a great package. I learned a lot about programming from you and our collaboration really helped push forward the progress in both of our projects.

I also want to acknowledge the contribution of the students I co-supervised: Twan Kranenburg, Patrick Wissink, Yanlin Liao and Jitpanu Yamjabok. All of you helped my project by exploring new ideas and I learnt a lot from the experience of supervision. Thanks to my paranymphs Jordi Petit-Pedro and Michiel Klaassen for your meditative pose and perfect enunciation of any and all propositions during my defence, as well as being all-round great guys. Thanks to my present and former office mates Anne Giesbers, Charlie Chen, Deniz Gol, Ehsan Motazedi, Mas Muniroh, Michela Appiano, Myluska Caro-Rios, Narges Sedaghat Telgerd, Yi Wu and Ying Liu, and everyone from my short stay elsewhere, for putting up with a colleague that never goes away to the lab or the greenhouse – the office was my greenhouse, the computer was my lab! To the many other friends and acquaintances I’ve made along the way, thanks for making my time at plant breeding so enjoyable. I won’t list you all by name as that would involve literally listing everyone at Plant Breeding and quite a few people at the other side of Radix as well, so you know who you are! I should like to particularly mention Anne Giesbers, Jeroen Berg and Tim van der Weijde, with whom I travelled to Nanjing, China for the international graduate conference. Apart from being a great bunch of adventurous spirits, you also put up with my “wat vervelend Nederlands” and I think the week and a half I spent in China really paid dividends for my ability to communicate using your “ingewikkeld taal”. To the secretaries past and present in the department of plant breeding, Nicole Trefflich, Letty Dijker, Daniëlle van der Wee and Janneke van Deursen, thank you all for being such a friendly and efficient team, keeping the wheels in motion within the department. You have always been very helpful with any problems or questions that I had, and always did it with a smile. A special thanks to Nicole for booking my thesis defence first thing on a Friday morning on your day off! I also want to thank all the polyploid project partners who were there throughout my project during presentations and workshops, as well as the occasional off-site visit, which I really enjoyed. Your feedback was essential and I think the involvement of so many partners with so many diverse polyploid crops has really helped broaden the horizons of polyploid genetics here in Wageningen. To my fellow members of the PhD council – Amalia, Francesca, Jarst, Jesse, Jordi, Kiki, Kim, Malaika, Manos, Martina, Sandeep, Sara, Shanice, Tieme and Tom – I think we achieved a lot together on the council and had some fun along the way! Thanks also to Douwe Zuidema, Ingrid Vleghels and Ria Fonteyn from EPS for all the help. I also want to thank the reading committee (my “opponents”) for agreeing to read this thesis – without foreknowledge about its length. They say brevity is the soul of wit – apologies for not writing a wittier thesis then!

Finally, I would like to acknowledge my family and in particular my parents, who have always encouraged and supported me despite the many twists and turns I have taken along the way. Thank you for trusting that it will all work out one day, as indeed I hope someday it shall. A final word of thanks to my lovely wife Elena, who has been a constant source of support and encouragement throughout. By the time these words are in print we will already be knee-deep in nappies, and I look forward to continue sharing all the happiness in the world together with you.

About the author

Peter Michael Bourke was born on 28th July 1982 in Cork, Ireland. In 2001 he represented his country twice – at the World Schools’ Debating Championship in Johannesburg, South Africa and the International Mathematical Olympiad in Washington DC, USA. He continued speaking and doing maths during his bachelor’s studies, graduating with joint first class honours in mathematics and physics from University College Cork. Having completed his studies, he bought a bicycle and headed overland and sea for China, with an English-language teaching qualification in his bag. Unfortunately, Russian immigration regulations and its winter intervened, forcing him to about-turn and return to western Europe, eventually pedalling 2300 m above sea level to look for work in a ski resort in the French Alps. Having survived the winter (and learnt how to ski), he enrolled the following spring in a course on practical sustainability in Kinsale F.E.C., Cork. This led to a job with the Irish Seed Savers Association, an organisation dedicated to conserving open-pollinated landrace varieties of vegetables, grains and fruit trees in Ireland. After four years working as the garden co-ordinator he again took to his bicycle, visiting and working with organic breeding companies and seedbanks across Europe on a 10,000 km cycling tour. A love of cycling and a desire to learn more about plant breeding brought him to the Netherlands in 2012, where he enrolled in the MSc plant sciences program at Wageningen University (specialisation plant breeding and genetic resources). During his master’s he was fortunate to do a minor thesis with Chris Maliepaard and Roeland Voorrips in the department of plant breeding, who offered him the possibility to continue this research as part of a PhD. In 2016 he was awarded an EMBO short-term fellowship to spend one month working in the group of Dr. Christine Hackett at BioSS, Dundee, Scotland.
The results of his PhD project, started in September 2014, are described in this thesis.

List of publications

Bourke, P.M., van Geest, G., Voorrips, R.E., Jansen, J., Kranenburg, T., Shahin, A., et al. (2018). polymapR – linkage analysis and genetic map construction from F1 populations of outcrossing polyploids. Bioinformatics (in press). doi: 10.1093/bioinformatics/bty371

Hibrand Saint-Oyant, L., Ruttink, T., Bourke, P.M., et al. (2018). A high-quality genome sequence of Rosa chinensis to elucidate ornamental traits. Nature Plants (accepted).

Bourke, P.M., Voorrips, R.E., Visser, R.G.F. and Maliepaard, C. (2018). Tools for genetic studies in experimental populations of polyploids. Frontiers in Plant Science 9:513. doi: 10.3389/fpls.2018.00513

van Geest, G., Bourke, P.M., et al. (2017). An ultra-dense integrated linkage map for hexaploid chrysanthemum enables multi-allelic QTL analysis. Theoretical & Applied Genetics.

Bourke, P.M., Arens, P., Voorrips, R.E., Esselink, G.D., Koning-Boucoiran, C.F.S., et al. (2017). Partial preferential chromosome pairing is genotype dependent in tetraploid rose. The Plant Journal.

Vukosavljev, M., Arens, P., Bourke, P.M., et al. (2016). High-density SNP-based genetic maps for the parents of an outcrossed and a selfed tetraploid garden rose cross, inferred from admixed progeny using the 68k rose SNP array. Horticulture Research.

Bourke, P.M., Voorrips, R.E., Kranenburg, T., Jansen, J., Visser, R.G.F. and Maliepaard, C. (2016). Integrating haplotype-specific linkage maps in a tetraploid species using SNP markers. Theoretical & Applied Genetics.

Bourke, P.M., Voorrips, R.E., Visser, R.G.F. and Maliepaard, C. (2015). The double-reduction landscape in tetraploid potato as revealed by a high-density linkage map. Genetics.

Education Statement of the Graduate School Experimental Plant Sciences

Issued to: Peter M. Bourke
Date: 15 June 2018
Group: Laboratory of Plant Breeding
University: Wageningen University & Research, Wageningen, The Netherlands

1) Start-up phase
► First presentation of your project
  Title: QTL analysis in polyploids - models and methods 23 Jan 2015
► Writing or rewriting a project proposal
  Title: Models and Methods for QTL analysis in polyploid crops 09 Dec 2014
► Writing a review or book chapter
  "Tools for genetic studies in experimental populations of polyploids", Front. Plant Sci (2018), 9:513. doi: 10.3389/fpls.2018.00513 28 Dec 2017
► MSc courses
  Programming in Python (INF-22306) Oct-Dec 2014
► Laboratory use of isotopes
Subtotal Start-up Phase 13.5 credits*

2) Scientific Exposure
► EPS PhD student days
  EPS PhD student day 'Get2Gether', Soest, NL 29-30 Jan 2015
  EPS PhD student day 'Get2Gether', Soest, NL 28-29 Jan 2016
  EPS PhD student day 'Get2Gether', Soest, NL 09-10 Feb 2017
  EPS PhD student day 'Get2Gether', Soest, NL 15-16 Feb 2018
► EPS theme symposia
  EPS Theme 4 symposium 'Genome Biology', Amsterdam, NL 15 Dec 2015
  EPS Theme 4 symposium 'Genome Biology', Wageningen, NL 16 Dec 2016
► National meetings (e.g. Lunteren days) and other National Platforms
  Annual meeting 'Experimental Plant Sciences', Lunteren, NL 14-15 Apr 2014
  Annual meeting 'Experimental Plant Sciences', Lunteren, NL 13-14 Apr 2015
  Annual meeting 'Experimental Plant Sciences', Lunteren, NL 10-11 Apr 2017
► Seminars (series), workshops and symposia
  EPS symposium: Omics advances for academia and industry 11 Dec 2014
  EPS flying seminar: Whole-genome duplications, Yves van de Peer 02 Feb 2015
  EPS flying seminar: Comparative & functional genomics of polyploidy, Jeff J. Doyle 12 May 2015
  Biometris Symposium: From big data to biological solutions 18 Jun 2015
  EPS flying seminar: Perianth evolution in Ranunculaceae, Sophie Nadot 20 May 2016
  EPS seminar: Alternative splicing of a heat stress transcription factor, Sotirios Fragkostefanakis 02 Nov 2016
  Seminar: Automated tetraploid genotype calling and its application to pedigree reconstruction in potato, Jeffrey Endelman 16 Nov 2016
  Seminar: From QTLs to routine DNA-informed breeding: prospects, advances and needs, Cameron Peace 16 Nov 2016
  Seminar: Genome-wide association analysis and prediction in tetraploid potato, Jeffrey Endelman 18 Nov 2016
  EPS symposium 'WURomics-Technology-driven innovations for plant breeding', Wageningen, NL 15 Dec 2016
  Seminar: Impact of ploidy level & genome evolution on the control of the frequency & distribution of recombination events in Brassicas, Alexander Pele 04 Jul 2017
  Seminar: Breeding differently - The digital revolution: high-throughput phenotyping and genotyping, Tony Slater 17 Jul 2017
  WEES seminar: From sex chromosomes to sex determination in Lepidoptera, Frantisek Marec 25 Oct 2017
► Seminar plus
► International symposia and congresses
  EUCARPIA section Biometrics in Plant Breeding, Wageningen NL 09-11 Sep 2015
  2nd International graduate conference Nanjing, China 27-30 Oct 2015
  Plant and Animal Genome XXIV, San Diego CA. 09-13 Jan 2016
► Presentations
  Talks:
  Polyploids consortium meeting, Wageningen, NL 22 Jan 2015
  B-WISE bioinformatics seminar, Wageningen, NL 07 Apr 2015
  Polyploids consortium meeting, Wageningen, NL 25 Jun 2015
  EUCARPIA Biometrics section in Plant Breeding 2015, Wageningen, NL 11 Sep 2015
  2nd International graduate conference Nanjing, China (invited speaker) 27 Oct 2015
  Plant and Animal Genome XXIV, San Diego CA, USA 10 Jan 2016
  Polyploids consortium meeting, Wageningen, NL 04 Feb 2016
  BioSS General Meeting, James Hutton Institute, Dundee, Scotland 26 Apr 2016
  Plant meets Animal, Wageningen, NL 22 Jun 2016
  EPS Theme 4 symposium, Wageningen, NL 16 Dec 2016
  AWL Lunteren 2017 11 Apr 2017
  Ernst van der Ende visits Plant Breeding, Wageningen, NL (flash presentation) 08 May 2017
  Dümmen Orange B.V., De Lier, Netherlands (invited speaker) 01 Jun 2017
  Polyploids consortium meeting, Wageningen, NL 27 Jun 2017
  Polyploids consortium meeting, Wageningen, NL 29 Nov 2017
  Poster:
  AWL Lunteren 2015 13-14 Apr 2015
► IAB interview
► Excursions
  Visit to Enza Zaden (EPS trip) 12 Jun 2015
Subtotal Scientific Exposure 24.4 credits*

3) In-Depth Studies
► EPS courses or other PhD courses
  Introduction to theory and implementation of genomic selection 13-17 Oct 2014
  Bayesian statistics 20-21 Oct 2014
► Journal club
  Member of literature discussion group Quantitative Genetics 2015-2018
► Individual research training
  BioSS, James Hutton Institute, Invergowrie, Dundee, Scotland. 04 Apr-04 May 2016
Subtotal In-Depth Studies 6.5 credits*

4) Personal development
► Skill training courses
  Competence Assessment (CA) 04 Nov 2014
  EPS Introduction course 20 Jan 2015
  Scientific Publishing 15 Oct 2015
  Brain Training 20 Sep 2016
  Reviewing a scientific paper (RSP) 22 Sep 2016
  Teaching Outside of Academia Sep-Dec 2017
► Organisation of PhD students day, course or conference
  EPS Get2Gether 2017 (Sponsorship, Abstract book, Company stalls) 2016-2017
  EPS Get2Gether 2018 (Speakers, Poster) 2017-2018
► Membership of Board, Committee or PhD council
  EPS Council member Mar 2016-Mar 2018
Subtotal Personal Development 9.6 credits*

TOTAL NUMBER OF CREDIT POINTS* 54.0

Herewith the Graduate School declares that the PhD candidate has complied with the educational requirements set by the Educational Committee of EPS, which comprise a minimum total of 30 ECTS credits.

* A credit represents a normative study load of 28 hours of study.

