
Genetic Analysis of B. oleracea breeding traits: a Genome Wide Association Study in a diverse B. oleracea collection

Wageningen University & Research

Plant Breeding Department

Growth & Development Group

Author: Harm Brouwer

Registration number: 951028134030

Thesis code: PBR-80436 MSc Thesis Breeding

Supervisors: dr. ir. A. B. Bonnema

dr. M. J. Caldas Paulo

Date: May 2018

Copyright ©

No part of this publication may be reproduced or published in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the head of the Laboratory of Plant Breeding of Wageningen University, The Netherlands.



Table of Contents

List of Tables & Figures, pg. 3
Abstract, pg. 4
Introduction, pg. 5
Brassicaceae Domestication, pg. 5
Morphology, pg. 7
Candidate Genes, pg. 10
Research Questions, pg. 13
Research Methodology, pg. 14
The B. oleracea Collection, pg. 14
Genotypic Data, pg. 15
Field Trial, pg. 15
Phenotyping, pg. 16
Population Structure, pg. 18
GWAS, pg. 19
Results, pg. 21
LD, pg. 21
Phenotyping, pg. 24
Population Structure, pg. 25
GWAS: TASSEL Analysis, pg. 28
Conclusions & Discussion, pg. 35
Acknowledgements, pg. 41
Bibliography, pg. 42
Appendices, pg. 45


List of Tables & Figures

Here follow lists of the tables and figures included in this thesis (excluding the appendices).

Tables:

❖ Table 1: pg. 10 Candidate genes
❖ Table 2: pg. 15 Field trial
❖ Table 3: pg. 16 Harvest dates
❖ Table 4: pg. 16 Traits
❖ Table 5: pg. 21 LD, Centromeres
❖ Table 6: pg. 25 PCO
❖ Table 7: pg. 33 Genes of Interest
❖ Table 8: pg. 34 New candidate genes
❖ Table 9: pg. 40 GOI comparison

Figures:

❖ Figure 1: pg. 5 Triangle of U
❖ Figure 2: pg. 6 B. oleracea Morphotypes
❖ Figure 3: pg. 9 Leaf Formation
❖ Figure 4: pg. 12 LD decay plots
❖ Figure 5: pg. 14 Our B. oleracea collection
❖ Figure 6: pg. 14 subset collection
❖ Figure 7: pg. 19 STRUCTURE grouping
❖ Figure 8: pg. 19 Cabbage subset STRUCTURE
❖ Figure 9: pg. 21 LD triangle plots
❖ Figure 10: pg. 22 LD 1
❖ Figure 11: pg. 23 LD between
❖ Figure 12: pg. 23 LD Hybrids vs Accessions
❖ Figure 13: pg. 24 LD without rare-allele accessions
❖ Figure 14: pg. 25 Boxplot traits
❖ Figure 15: pg. 26 PCO full dataset
❖ Figure 16: pg. 27 Cabbage subset PCO
❖ Figure 17: pg. 28 Histograms
❖ Figure 18: pg. 29 QQ plots
❖ Figure 19: pg. 30 PP plots
❖ Figure 20/21/22: pg. 31 Manhattan plots
❖ Figure 23: pg. 32 Effect plot


Abstract

Brassica oleracea is the genetically and morphologically most diverse species in the economically important Brassica genus. Many of its diverse range of morphotypes depend on altered leaf morphology to form the harvested product. The aim of this thesis was to elucidate the relations between the leaf phenotypes we observe and the genetic variation in B. oleracea. This was done by conducting a genome-wide association study (GWAS) on a SNP dataset of 18,580 markers and phenotypic data from 842 diverse accessions. The phenotypic data consisted of measurements from a 2017 WUR field trial and core length measurements from a 2016 trial. The collection includes many B. oleracea morphotypes and a mix of modern hybrids and wild and cultivated gene-bank accessions. To correct for the differing degrees of relatedness between the accessions, a population structure correction was used. An existing structure correction from the program STRUCTURE was compared to a PCO correction to determine which is more effective. The PCO proved better at removing false positives, but the correction can be further improved. The GWAS was conducted with the TASSEL software, and a number of marker-trait associations were selected for further analysis using several criteria. These criteria required markers to show significant associations, to be unaffected by population structure and to have a clear phenotypic effect. Genomic regions of 100 Kb surrounding these markers were analysed in the BolBase/BRAD genome browser to find genes of interest. A reverse approach was also taken to see whether there were significant marker-trait associations in our data near known genes of interest (GOI) from literature and previous studies. In total, 26 GOI were found in the regions surrounding 16 markers; 7 of these 26 GOI were identified with the reverse approach.
These GOI include several genes related to previously identified GOI, such as various MYB and NAC transcription factors and UBPs, but many are new genes that interact with better-known leaf development genes. The other genes are: ULT2/RH10/HDA5/NPR6/BOP1/KAN2/COL4/JUB1/TOE2/APRR3/MAGPIE/TCP4/RAV2/ROT3/CD48B/SPR1/ICR1/ROP/PSY1R/RALF1/CUL1/FAF2.
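The forward window search and the reverse approach described in this abstract can be illustrated with a short Python sketch. It assumes markers are given as (chromosome, position) pairs and genes as annotated intervals. The `Feature` class, the gene names and coordinates, the ±50 Kb half-window (one reading of "regions of 100 Kb surrounding these markers") and the significance threshold `alpha` are all hypothetical; they are not the settings of the BolBase/BRAD browser or of TASSEL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical gene annotation: name, chromosome and bp interval."""
    name: str
    chrom: str
    start: int
    end: int


def genes_near_marker(chrom, pos, genes, half_window=50_000):
    """Forward approach: genes overlapping [pos - half_window, pos + half_window]."""
    lo, hi = pos - half_window, pos + half_window
    return [g.name for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


def markers_near_gene(gene, markers, pvalues, half_window=50_000, alpha=1e-4):
    """Reverse approach: markers within half_window of a gene of interest whose
    association p-value is below alpha (alpha is an illustrative threshold)."""
    lo, hi = gene.start - half_window, gene.end + half_window
    return [m for m, p in zip(markers, pvalues)
            if m[0] == gene.chrom and lo <= m[1] <= hi and p < alpha]


if __name__ == "__main__":
    # Hypothetical genes and markers, for illustration only
    genes = [Feature("GENE_A", "C01", 120_000, 123_000),
             Feature("GENE_B", "C01", 400_000, 405_000),
             Feature("GENE_C", "C02", 118_000, 119_000)]
    markers = [("C01", 110_000), ("C01", 300_000), ("C01", 455_000)]
    pvalues = [1e-6, 1e-7, 0.01]
    print(genes_near_marker("C01", 150_000, genes))      # ['GENE_A']
    print(markers_near_gene(genes[0], markers, pvalues))  # [('C01', 110000)]
    print(markers_near_gene(genes[1], markers, pvalues))  # []
```

The same interval-overlap test serves both directions; only the anchor (marker or gene) and the p-value filter change.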


Introduction

Brassicaceae Domestication

The myriad shapes, colours and products of the Brassica genus are unparalleled amongst agricultural crops. The genus includes oilseed, forage, condiment and vegetable crops, for which all the different plant organs are used: seeds, buds, leaves, roots and stems (Cartea, Lema, Francisco, & Velasco, 2011). This morphological variety has made the Brassica genus an agricultural kingpin (Arias, Beilstein, Tang, McKain, & Pires, 2014). The genus' incredible diversity has led to multiple domestication events, in which different Brassica species and hybrids were developed into morphotypes for industrial products, food or even medicine.

The Brassica genus consists of 37 species, of which four are of major agricultural importance: Brassica oleracea, , and Brassica napus (Cartea et al., 2011). These four crops are part of the so-called triangle of U, a classification which visualises the genomic relationships between six Brassica species (Figure 1) (U, 1935). Here B. rapa and B. oleracea form a triangle with B. nigra (Liu et al., 2014). These three diploid species can interbreed and form species with allotetraploid genomes: B. juncea, B. napus and B. carinata (Arias et al., 2014; Feng Cheng, Wu, & Wang, 2014; U, 1935).

Figure 1: The triangle of U with the main diploid Brassica species and the allotetraploid hybrids they can form by interbreeding (U, 1935). The Latin name, an example of the main agricultural crops and the haploid chromosome number (n) are given for each species. B. rapa (n=10) can cross with B. nigra (n=8) to form B. juncea; B. nigra and B. oleracea (n=9) can cross to form B. carinata (n=17); and B. rapa and B. oleracea can form B. napus (n=19). B. rapa and B. oleracea are the main vegetable crops and B. napus is a major oilseed crop, while the remaining three species are mustards used as condiments (Cartea et al., 2011; Feng Cheng et al., 2014).
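The chromosome numbers in the triangle of U follow a simple rule: an allotetraploid carries the complete genomes of both diploid parents, so its haploid number is the sum of the parents' haploid numbers. A minimal Python check of this arithmetic, using the n values from the caption (B. juncea's n=18 is not stated there but follows from the same rule):

```python
# Haploid chromosome numbers (n) of the three diploid species in the triangle of U
DIPLOID_N = {"B. rapa": 10, "B. nigra": 8, "B. oleracea": 9}

# Each allotetraploid combines the genomes of two diploid parents
ALLOTETRAPLOIDS = {
    "B. juncea": ("B. rapa", "B. nigra"),
    "B. carinata": ("B. nigra", "B. oleracea"),
    "B. napus": ("B. rapa", "B. oleracea"),
}


def allotetraploid_n(parents):
    """Haploid number of an allotetraploid = sum of its parents' haploid numbers."""
    return sum(DIPLOID_N[p] for p in parents)


for species, parents in ALLOTETRAPLOIDS.items():
    print(f"{species}: n = {allotetraploid_n(parents)}")
# B. juncea: n = 18
# B. carinata: n = 17
# B. napus: n = 19
```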

The three diploid species that form the triangle of U arose from a Whole Genome Triplication (WGT) event in a common ancestor around 9-15 million years ago. This triplication plays an important role in Brassica evolution and in the large variation within the species we observe today (Schranz, Lysak, & Mitchell-Olds, 2007). The theory is that the polyploidisation happened in two steps. First, two ancestral genomes merged and developed into a new diploid. This most likely occurred through allotetraploidisation, but it may have been an autotetraploidisation event as well. The diploid that developed from this doubling then merged with a third ancestral genome, forming a hexaploid, which again stabilised into a diploid genome. This resulted in B. rapa having a genome of around 485 Mb and B. oleracea one of about 630 Mb. The WGT promoted reshuffling of genes; this reshuffling, combined with subsequent chromosome reduction and biased gene retention, caused subfunctionalisation and neofunctionalisation of specific genes. By retaining some genes and losing others, the genomes could diversify and specialise in different directions (Feng Cheng et al., 2014; Town et al., 2006). The WGT allowed parallel evolution and facilitated the development of the many different, yet also similar, modern-day morphotypes (Feng Cheng et al., 2016).

The domestication of B. rapa is believed to have occurred in eastern , with initial domestication focussed on the morphotypes, whose diversity centre is located there, at around 2500-2000 BC. These domesticated plants then spread eastward into Asia, where more leafy morphotypes were developed around 1000 BC (Cartea et al., 2011). The exact domestication history of B. oleracea is still debated in the literature. The scenario on which most researchers agree is that B. oleracea has always grown wild on the Atlantic coasts of Europe, as the diversity centre of the species lies in this region (Cartea et al., 2011; Maggioni, von Bothmer, Poulsen, & Branca, 2010). On the Atlantic coast it was probably cultivated by the Celts in its primitive form as kale-like morphotypes, as these wild Atlantic plants are genetically most closely related to the cultivated forms (Bonnema, Del Carpio, & Zhao, 2011; Maggioni et al., 2010). When these kales were eventually brought to the Eastern Mediterranean region they became fully domesticated and underwent an explosive diversification, giving rise to an enormous range of cultivated forms (Bonnema et al., 2011). The presence of domesticated Brassicas in the Mediterranean is evidenced by Brassica crops in Greek and Roman remains and texts from the 6th century BC. The lack of older evidence from, for example, Egyptian or Fertile Crescent cultures is used as further support for the Atlantic domestication theory (Maggioni et al., 2010).

The greatest variation of all Brassica crops is found in B. oleracea, which contains many vegetable and ornamental morphotypes (Figure 2). This large variation within the species gives B. oleracea a complex . B. oleracea is the most economically important of the Brassica species. The crop is in high demand and global production is rising (Liu et al., 2014). At the same time there is a clear shift in acreage, with increasing production in developing countries and decreasing production in Europe (Cartea et al., 2011).

Kingdom: Plantae
Clade: Angiosperms
Clade: Eudicots
Clade: Rosids
Order: Brassicales
Family: Brassicaceae
Genus: Brassica
Species: B. oleracea

Figure 2: The commonly cultivated morphotypes of B. oleracea, adapted from Feng Cheng et al. (2014). From top left to middle right are shown: heading cabbage, Brussels sprouts, , , , purple cauliflower, collards, and a diverse set of (ornamental) morphotypes on the bottom row. The classification of B. oleracea is given on the right, including the various clades to which it belongs.


B. oleracea is known for its physiological adaptations (Arias et al., 2014), stemming from its original habitat of high salt and limestone environments (Bonnema et al., 2011). The species is adapted to these harsh environments and this is part of the reason why it has been developed into such a successful crop. B. oleracea has unparalleled levels of genome diversity that give it a wide range of natural pest, drought, heavy metal and salinity tolerances and resistances that make it a suitable crop for (Cartea et al., 2011).

Many different crop types now exist, and cabbages (red, white, savoy, conical), broccoli and Romanesco, cauliflower, kales, Brussels sprouts, collard greens, kohlrabi and ornamental subspecies are all widely cultivated (Bonnema et al., 2011; Cartea et al., 2011). The oldest morphotypes are the kales, of which many ancient versions exist, and they are closest in appearance to the wild progenitors. They feature the classic fleshy, thick leaves in which the biennial B. oleracea stores its energy in the first year, before flowering in the next year (Bonnema et al., 2011). The other morphotypes developed from the use of other parts of the plant: cabbage from the enlarged apical bud, kohlrabi from the swollen stem, cauliflower and broccoli from the arrested inflorescences and floral meristems, and Brussels sprouts from the many axillary buds (Cartea et al., 2011).

The popularity of Brassica crops is on the rise due to the health benefits associated with their high nutritional value. The crops are low in and proteins, while being high in fibre, and content. Brassicas naturally contain various anti-cancer, anti-oxidant and anti-inflammatory compounds, the best known of which are the glucosinolates (Cartea et al., 2011). Of all crop plants, the Brassicas are the most closely related to the model plant Arabidopsis thaliana (Town et al., 2006). To study them even more effectively, fast-cycling lines of different Brassicas have recently been developed for use as same-species model plants in experiments (Cartea et al., 2011).

Leaf Morphology

The diversity of crops in B. oleracea is largely linked to the diversity in leaf shape and development: from the large leafy kales, to the closely bunched leaves of cabbages, to the folding leaves that keep a cauliflower white. Understanding how leaves develop is crucial to understanding how plants work, because these organs provide plants with the energy needed for growth and many other plant organs are derived from modified leaves (Barkoulas, Galinha, Grigg, & Tsiantis, 2007; Gonzalez, Vanhaeren, & Inze, 2012; Hepworth & Lenhard, 2014; Tsukaya, 2013).

The development of leaves has had far more effect on the plants of today than the discovery of fire, the wheel or agriculture has had on the lives of humans. Leaves are the most important plant organs: they form the basis of lateral organs in plants. Since plants are the start of most food chains, leaves also form the basis of nearly all life today. Scales, bracts, needles and even the iconic ‘pitchers’ of Nepenthes plants are all modified leaves. Understanding leaf development is therefore of crucial importance. Leaves are literally the platform on which photosynthesis, respiration and photoreception depend, and these basic processes guide many other plant cycles and define plants as we know them today. The formation of a leaf from a bud on a branch seems simple, but it is actually a rather intricate process, as is often the case with such fundamental things (Tsukaya, 2013).

Many leaf mutants of Arabidopsis were already described over 50 years ago, but, with a few exceptions, not from a developmental perspective; only in later decades did developmental research on leaves truly take off. Genetic studies have helped shed more light on the processes involved, yet many aspects of leaf formation remain a mystery (Tsukaya, 2013). Leaves evolved from branches by acquiring more determinate growth and a flat structure (Bar & Ori, 2014). The basic leaf structure is comparable to a biological solar panel, with a large flat surface to capture light and mechanisms, such as elongation of the petioles, that minimise shading of other leaves (Braybrook & Kuhlemeier, 2010; Tsukaya, 2013). There are two main forms of leaves: simple leaves and compound leaves, the latter actually being a combination of lateral branches and simple leaves (Bar & Ori, 2014; Dkhar & Pareek, 2014).

Most multicellular organisms start as a single-cell zygote, and to reach their final form their cells need to be in constant communication with one another. In plants, vascular lateral organs are formed on the flanks of the Shoot Apical Meristem (SAM). At the very tip of the meristem, pluripotent stem cells divide slowly, pushing older daughter cells to the periphery of the meristem. It is at these edges that the basis for new organs is laid; the future function of a cell depends on its location in the meristem dome. The leaf primordium starts as a simple bulge at the edge of the SAM and then grows outwards from the periphery of the SAM under the directional activity of phytohormones. The leaf then acquires three axes of asymmetry: adaxial-abaxial, medial-lateral and proximal-distal (Braybrook & Kuhlemeier, 2010; Ha, Jun, & Fletcher, 2010; Kalve, De Vos, & Beemster, 2014; Moon & Hake, 2011; Tsukaya, 2013). The growth of a leaf primordium inhibits the formation of a new primordium in the nearby SAM cells, so new leaf primordia arise at locations relatively far from the previous one. Once the basic shape is achieved, leaves grow further primarily via cell expansion (Barkoulas et al., 2007; Braybrook & Kuhlemeier, 2010; Dkhar & Pareek, 2014; Kalve et al., 2014; Tsukaya, 2013). Phyllotaxis determines the precise arrangement of the leaf primordia; although various systems of phyllotaxis exist in nature, they are all similar in terms of light-capturing efficiency (Braybrook & Kuhlemeier, 2010; Ha et al., 2010).

Leaf initiation co-occurs with midvein initiation: the cells that will form the midvein grow from auxin maxima at the initiation site and gradually connect to the existing vasculature. The basic leaf is formed during two phases: primary and secondary morphogenesis. During primary morphogenesis the leaf form, lamina, midrib, petiole, leaf base, leaflets, lobes and serrations are all established (Bar & Ori, 2014). In this early phase, the initiation of a primordium is followed by the establishment of dorsiventrality and the development of a marginal meristem. Abaxial-adaxial polarity is established as the primordium bulges out. After differentiation of apical and basal regions, cell proliferation in the primordium accelerates, and new cells are provided from the junction region towards the tip, to construct the leaf blade or lamina, and towards the base, to form the petiole. The primordium grows into a real leaf through cytoplasmic growth, cell division, endoreduplication, cell expansion, the transition from division to expansion and, finally, cell differentiation (Figure 3). The original pluripotent cells differentiate to form the vascular cells, stomata, epidermis, spongy and palisade parenchyma and trichomes, amongst others. Dorsiventrality is needed for flat outgrowth of the lamina, and lamina growth is sustained by active cell proliferation in the plate meristem along the adaxial-abaxial junction (Tsukaya, 2013). Secondary morphogenesis consists of the formation of the mature leaf shape, and most of the expansion occurs in this phase. Differential growth may cause the leaf shape to change during this secondary phase (Bar & Ori, 2014). The exact stages of leaf initiation differ between plants with different types of leaves (e.g. monocotyledonous vs. dicotyledonous plants, simple or compound leaves) (Dkhar & Pareek, 2014; Tsukaya, 2013).


Figure 3: The various steps involved in leaf formation (in the model plant A. thaliana), taken from Kalve et al. (2014). Leaf primordium cells lose stem cell identity, migrate to the side of the SAM and acquire polarity, after which the leaf grows through various steps. Lastly, the cells differentiate into the different tissues of the mature leaf. The development of the cell is highlighted by the red path, and the positive and negative regulation by phytohormones is shown in blue.

Forming a leaf from a primordium requires a complex interplay of phytohormones and genes (Figure 3). Cytokinin (CK) promotes SAM growth and maintenance, and Knotted1-like homeobox (KNOX1) genes are required to maintain undifferentiated cells in the SAM, as they increase CK biosynthesis. As mentioned, leaf initiation occurs at sites of auxin accumulation, aided by the ARP genes (ASYMMETRIC LEAVES1/ROUGH SHEATH2/PHANTASTICA), which act as antagonists of KNOX1 (Kalve et al., 2014; Moon & Hake, 2011; Tsukaya, 2013). There is a positive feedback loop between auxin and the PIN-FORMED1 (PIN1) auxin transporter protein (Braybrook & Kuhlemeier, 2010). The YABBY gene family is believed to have been crucial in the evolutionary origin of leaves, as YABBY genes are not found in branches, from which leaves evolved, but are involved in nearly all leaf formation stages in seed plants (Bar & Ori, 2014; Braybrook & Kuhlemeier, 2010). Abaxial identity requires the KANADI (KAN) gene family and Auxin Response Factors (ARFs), whereas ARP and the HD-ZIPIII family are needed for adaxial identity (Braybrook & Kuhlemeier, 2010; Dkhar & Pareek, 2014; Moon & Hake, 2011). Auxin-independent leaf formation mechanisms are known to exist in other plants, and sugars are therefore also thought to be heavily involved in leaf formation (Dkhar & Pareek, 2014). Gibberellic Acid (GA) promotes the degradation of growth repressors and thus promotes leaf growth, acting as an antagonist of KNOX1, similar to ARP (Blein, Hasson, & Laufs, 2010).

Various genes have been shown to be involved in plant development and head formation in B. rapa. Amongst these are TCP (TEOSINTE BRANCHED1, CYCLOIDEA and PCF transcription factors), NAC (NAM/ATAF1-2/CUC2 transcription factors), MYB (myeloblastosis domain transcription factors), zinc finger, GRF (Growth-Regulating Factor), ARF (Auxin Response Factor) and PIN genes (Wang et al., 2012). Another study showed that TCP4 also affects head shape in B. rapa: high expression gives a rounded head and low expression a cylindrical one (Mao et al., 2014). TCP is thought to regulate the round shape of cabbage heads via differential arrest of cell division in different leaf regions. NACs are a large transcription factor family that has been widely studied in Arabidopsis. NACs have various roles in floral and vegetative development and are involved in SAM formation (Olsen, Ernst, Leggio, & Skriver, 2005). CUC genes are part of the NAC transcription factor family and promote SAM formation both through interaction with the SHOOT MERISTEMLESS (STM) gene and through an independent mechanism. This independent pathway is negatively affected by the AS1 and AS2 transcription factors (Hibara, Takada, & Tasaka, 2003).

The KNOX genes STM and various KNATs are, like the NAC transcription factors, involved in meristem formation and maintenance (Scofield, Dewitte, & Murray, 2007). The WUSCHEL (WUS) gene is also important for maintaining stem cell identity in meristems; mutations in WUS cause premature termination of shoot and floral meristems. WUS and STM are antagonists of CLAVATA (CLV), a gene that promotes organ initiation (Schoof et al., 2000). Together, these genes balance the cell proliferation and organ initiation signals in plants (Mayer et al., 1998).

Candidate Genes

The current knowledge of leaf development has mainly been obtained from research on A. thaliana, but it can be applied to study leaf development in Brassicas as well. Though there have been some studies on leaf traits in B. rapa, there have been very few on B. oleracea itself (Xiao et al., 2014). Genes involved in leaf polarity are of particular interest for B. oleracea breeding, as altered leaf polarity is needed to form a cabbage head, an important trait for breeders. Previous studies on subsets of the data analysed here yielded possible genes involved in leaf shaping, based on literature from related crops, some of them with proven associations with certain traits in B. oleracea itself. In Fabian Topper's and Floris Slob's theses at the WUR PBR department, literature research was used to identify candidate genes, which were then tested for associations in their GWAS. Twan Groot's and Josephine van Eggelen's theses expanded this list of candidate genes with the results of their own GWA studies. All these findings were combined to form the list of candidate genes below (Table 1) (Eggelen, 2017; Groot, 2016; Slob, 2016; Topper, 2016). The list from Topper and Slob included many entries from the list of potential genes involved in leaf morphology in B. rapa from Xiao et al. (2014). Further candidate genes identified by Feng Cheng et al. (2016) were added to the list as well.

Table 1: Candidate genes for leaf traits adapted from Xiao et al. (2014), with results from previous analyses on this dataset. These genes are mainly B. rapa homologs of genes found to be involved in leaf formation in Arabidopsis. They were defined based on colocalization with phenotype QTLs, expression QTLs and gene coregulatory networks.

Abbreviation | Gene name
AGO10 | Argonaute protein
AIL5 | AINTEGUMENTA-LIKE5
ANT | AP2-like ethylene-responsive transcription factor ANT
AP1 | Floral homeotic protein APETALA 1
APRR3 | Two-component response regulator-like APRR3
APUM5/19 | PUMILIO homolog
ARF3/4/6 | Auxin response factor
AS1/2 | Transcription factor AS
AVP1 | Pyrophosphate-energized vacuolar membrane proton pump 1
AVT1 | Vacuolar amino acid transporter 1
BOP1/2 | Ribosome biogenesis protein BOP1
BP | Homeodomain protein BP
CDC48A | Cell division control protein 48 homolog A
CHC1 | CLATHRIN HEAVY CHAIN1
CLV3 | CLAVATA3
CLO | 110 kDa U5 small nuclear ribonucleoprotein component CLO
COL5 | Zinc finger CONSTANS-LIKE5
CUC1/2/3 | CUP-SHAPED COTYLEDON
CYCU2-1 | CYCLIN-U2-1
EXPB4/6 | EXPANSIN-B4/B6
GRF4/5 | Growth-regulating factor
GTE4 | GLOBAL transcription factor group E4
HST1 | HASTY 1
HUB2 | E3 ubiquitin-protein ligase BRE1-like 2
IAA9/19 | Auxin-responsive IAA (Indoleacetic acid induced protein 9, IAA9; interacts with ARFs)
CLF | INCURVATA2
IRX9 | Irregular xylem 9
KAN1/2/3 | Transcription regulator KANADI
KNAT2/3/4/5/6/7 | Homeobox protein knotted-1-like
LEP | Ethylene-responsive transcription factor LEP
LG1 | Low green 1
LNG1 | LONGIFOLIA 1
LOB21 | LOB domain-containing protein 21
MKK5 | MITOGEN ACTIVATED protein kinase 5
MYB81/104 | Transcription factor MYB
NAC054/058/098 | NAC transcription factor family; protein CUP-SHAPED COTYLEDON
OFP16 | Transcription repressor OFP16
PAP1 | Probable plastid-lipid-associated protein 1
PAR2 | Transcription factor PAR2
PHB | Prohibitin
PHV | PHV
PI | Floral homeotic protein PISTILLATA
PIP5K3 | Phosphatidylinositol 4-phosphate 5-kinase 3
PME35 | Probable pectinesterase/pectinesterase inhibitor 35
PRL1 | Protein pleiotropic regulatory locus 1
RDRP6 | RNA-dependent RNA polymerase 6
REV | Homeobox-leucine zipper protein REVOLUTA
ROT3 | 3-epi-6-deoxocathasterone 23-monooxygenase
SG1 | SLOW GREEN 1
SPL3/10 | Squamosa promoter-binding-like protein
SPR1/2 | SPIRAL
STM | Homeobox protein SHOOT MERISTEMLESS
TCP1/2/3/4/12 | Transcription factor TCP
TFL1 | TERMINAL FLOWER 1
DREEB3 / TINY2 | Dehydration-responsive element-binding protein 3
TMK1/4 | Transmembrane kinases
TOE2 | AP2-like ethylene-responsive transcription factor TOE2
TOUSLED2 | Serine/threonine-protein kinase TOUSLED
UBP15 | Ubiquitin carboxyl-terminal hydrolase 15
UCU1 | ULTRACURVATA 1
WOX1/3 | WUSCHEL-related homeobox
WSD1 | Wax synthase diacylglycerol acyltransferase
YAB1/2/3/5 | Axial regulator YABBY
ZFP3 | Zinc finger protein

To discover whether these candidate genes cause the phenotypic variation we observe in B. oleracea, markers located near the genes of interest are needed. An important factor here is the extent of linkage disequilibrium (LD) in the B. oleracea genome. With a high degree of linkage, few genomic markers are needed to find associations between regions of the genome and particular phenotypes. Fine mapping, however, requires low LD, so that links between markers and specific genes can be pinpointed accurately. Previous studies in B. oleracea reported LD decay distances ranging from an average of 36.8 Kb in a diverse B. oleracea collection to considerably longer distances within the main morphotypes: 54.1 Kb for cabbage, 98.6 Kb for kohlrabi, 63.6 Kb for cauliflower and >100 Kb for broccoli (Cheng et al., 2014) (Figure 4).
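A back-of-the-envelope way to see why the LD decay distance matters is to ask how many evenly spaced markers would be needed to place at least one marker in every LD-decay interval. The Python sketch below assumes the ~630 Mb B. oleracea genome size quoted earlier and uses the LD decay distance directly as the marker spacing. Real SNPs are not evenly spaced and LD varies along the genome, so these are rough idealised minimums, not design targets. Because broccoli's LD is only given as >100 Kb, its count is an upper limit.

```python
GENOME_BP = 630_000_000  # approximate B. oleracea genome size (see above)

# LD decay distances (bp) reported by Cheng et al. (2014)
LD_DECAY_BP = {
    "diverse collection": 36_800,
    "cabbage": 54_100,
    "cauliflower": 63_600,
    "kohlrabi": 98_600,
    "broccoli (>100 Kb)": 100_000,  # lower bound on LD, so the count is an upper bound
}


def markers_needed(genome_bp, ld_bp):
    """Evenly spaced markers needed for one marker per LD-decay interval (ceiling)."""
    return -(-genome_bp // ld_bp)


for group, ld_bp in LD_DECAY_BP.items():
    print(f"{group}: {markers_needed(GENOME_BP, ld_bp):,} markers")
# diverse collection: 17,120 markers
# cabbage: 11,646 markers
# cauliflower: 9,906 markers
# kohlrabi: 6,390 markers
# broccoli (>100 Kb): 6,300 markers
```

Under these idealised assumptions, the diverse collection, with the shortest LD, needs the densest marker set: low LD allows finer mapping but demands more markers.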


Figure 4: LD decay plots adapted from Cheng et al. (2014), Supplementary materials. LD decay for B. rapa (A) and B. oleracea (B); B. oleracea had an LD decay distance of 36.8 Kb. Genetic components calculated with the tool STRUCTURE for the B. rapa (C) and B. oleracea (D) accessions.
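LD decay distances such as the 36.8 Kb above are read off a plot of pairwise r² against the physical distance between markers. The Python (NumPy) sketch below shows one simple way to do this: average r² within distance bins and report the first bin whose mean falls below a threshold. The bin width and the default threshold (half the mean r² of the shortest-distance bin) are assumptions for illustration; Cheng et al. (2014) may have used a different criterion, and fitting a decay curve instead of binning is a common alternative.

```python
import numpy as np


def ld_decay_distance(distances_bp, r2, bin_bp=10_000, threshold=None):
    """Midpoint (bp) of the first distance bin whose mean r² drops below threshold.

    distances_bp, r2: pairwise marker distances and their r² values.
    threshold: defaults to half the mean r² of the first (shortest) bin.
    Returns None if the binned r² never drops below the threshold.
    """
    distances_bp = np.asarray(distances_bp)
    r2 = np.asarray(r2, dtype=float)
    bins = (distances_bp // bin_bp).astype(int)
    means = np.array([r2[bins == b].mean() if np.any(bins == b) else np.nan
                      for b in range(bins.max() + 1)])
    if threshold is None:
        threshold = means[0] / 2
    below = np.flatnonzero(means < threshold)  # empty (NaN) bins never count
    return None if below.size == 0 else int(below[0] * bin_bp + bin_bp // 2)


# Synthetic example: r² decaying exponentially with distance (not real data)
d = np.arange(0, 200_000, 1_000)
print(ld_decay_distance(d, 0.8 * np.exp(-d / 30_000)))  # 35000
```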


Research Questions

The latest field trial with a diverse collection of B. oleracea accessions, with various degrees of interrelatedness, had just been completed at the start of this thesis. This supplied a large set of phenotypic data to analyse; in addition, the results of older GWAS were available for comparison, as was a genomic dataset. This thesis aimed to connect all these datasets, to find patterns across different harvests and thus support earlier discoveries as well as make new ones. The first goal was to find a suitable method of correcting for population structure in the B. oleracea collection that could be used for the current and subsequent GWAS. Using the new data on leaf morphology in the B. oleracea accessions, the current method of correcting for population structure was tested by comparing it with another structure correction. The second goal was to gain more insight into possible genes regulating leaf and cabbage head morphology through a GWAS. Genotypic data (Sequence Based Genotyping data from the 2015 TKI project) and phenotypic data from the 2017 harvest at the WUR were available for this GWAS. The phenotypic data consisted of a sizeable collection of leaf pictures, as well as data on cabbage head weights. A GWAS does not require prior knowledge about regions or genes; a large population with natural genotypic and phenotypic variation is sufficient input, which makes it a suitable method for our dataset. When individuals differ in their degree of relatedness, however, a GWAS requires a correction factor to prevent false positives. In our analysis certain traits are related to specific morphotypes, and without a correction, SNPs particular to those morphotypes may show an association with a trait even though they have no causal relationship with it. Thus, the population structure correction from the first goal is required for a successful analysis.
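The false-positive problem described above can be reproduced in a small simulation. The Python (NumPy) sketch below creates two hypothetical morphotype groups that differ in allele frequencies at 300 background SNPs and in trait mean, plus one test SNP whose frequency differs between the groups but which has no effect on the trait. A naive regression of the trait on the test SNP gives a large t-statistic; adding principal coordinates (PCO axes) computed from the background SNPs as covariates removes the spurious signal. All group sizes, frequencies and effect sizes are invented for illustration, and the ordinary least-squares model is a simplification, not TASSEL's GLM or MLM implementation.

```python
import numpy as np


def pco(genotypes, k):
    """Principal coordinates analysis (classical scaling) of Euclidean distances
    between accessions' 0/1/2 genotype vectors; returns the first k axes."""
    g = np.asarray(genotypes, dtype=float)
    sq = (g ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * g @ g.T  # squared distance matrix
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n             # centring matrix
    b = -0.5 * j @ d2 @ j                           # Gower-centred matrix
    vals, vecs = np.linalg.eigh(b)                  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def snp_t_stat(trait, snp, covariates=None):
    """t-statistic of the SNP coefficient in OLS: trait ~ 1 + snp (+ covariates)."""
    cols = [np.ones_like(trait), snp]
    if covariates is not None:
        cols.extend(covariates.T)
    x = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(x, trait, rcond=None)
    resid = trait - x @ beta
    sigma2 = resid @ resid / (len(trait) - x.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(x.T @ x)[1, 1])
    return beta[1] / se


def simulate(seed=1, n_per_group=200, n_background=300, k=3):
    rng = np.random.default_rng(seed)
    group = np.repeat([0, 1], n_per_group)                 # two hypothetical morphotypes
    freqs = rng.uniform(0.1, 0.9, size=(2, n_background))  # group-specific allele frequencies
    background = rng.binomial(2, freqs[group]).astype(float)
    # Non-causal test SNP: more common in group 1, but no effect on the trait
    test_snp = rng.binomial(2, np.where(group == 1, 0.8, 0.2)).astype(float)
    # The trait differs between the groups only (e.g. a morphotype-specific leaf trait)
    trait = 2.0 * group + rng.normal(0.0, 1.0, size=group.size)
    naive = snp_t_stat(trait, test_snp)
    corrected = snp_t_stat(trait, test_snp, pco(background, k))
    return naive, corrected


naive_t, corrected_t = simulate()
print(f"naive t = {naive_t:.1f}, PCO-corrected t = {corrected_t:.1f}")
```

Because the PCO axes capture the group split, the test SNP no longer "explains" the group difference in the trait once they are included as covariates.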

To achieve these goals several research questions were formulated:

- How can we best correct for population structure in B. oleracea?
- What are regions of interest on the B. oleracea genome for leaf-related and heading-related traits?
- Are leaf formation genes, previously highlighted as candidate genes, associated with particular observed phenotypes in leaves or cabbage heads?
- Can we identify new candidate genes involved in leaf formation or development?
- Can we establish genetic links between leaf morphology and heading traits in cabbage?


Research Methodology

The B. oleracea Collection

The Plant Breeding Department of Wageningen UR has been working with breeding companies on a project to ‘elucidate the genome sequence and evolutionary relationships between 1000 B. oleracea genotypes representing all morphotypes’. To achieve this, about 1000 different B. oleracea genotypes, consisting of modern hybrids, gene bank and wild material (from here onwards referred to as ‘accessions’), were genotyped to reveal their genetic diversity and were also phenotyped during several field trials (Figures 5 & 6).

[Figure 5 chart: number of accessions per morphotype (Broccoli, Brussels sprouts, Cauliflower, Chinese kale, Collard greens, Heading cabbage, Kale, Kohlrabi, Ornamentals, Off types, Tronchuda, Wild B. oleracea)]

Figure 5: The different morphotypes of B. oleracea used in the genotyping assay. The largest groups consist of , cauliflowers and the heading cabbages. Nearly all these accessions were also planted and phenotyped in the 2017 field experiment; for an exact overview of that set see Table 2.

Linkage Disequilibrium

A Linkage Disequilibrium (LD) test was also performed on the SNP data to check how suitable the genotypic data is for the GWAS analysis. After trying various options, the file format of the SBG data turned out to be unusable in every tested programme except TASSEL, so TASSEL was used for the LD study (Bradbury et al., 2007). LD was tested per chromosome for each of the population structure groups with a chi-square test. The full dataset of 85,532 SNPs was used, split by chromosome. A full LD matrix was calculated, with heterozygous calls set to missing. The output was inspected in LD matrix plots, and the R² values were exported and plotted as LD decay graphs in Excel.

Field Trial

After two years of trials with about half of the full dataset of 913 accessions, a large field trial was set up in 2017 in which 842 different B. oleracea accessions were planted out and phenotyped (Table 2). The previous field trials in 2015 and 2016 used smaller subsets of the total collection, of 465 and 471 accessions respectively. In each trial all accessions were planted out in two blocks in the field, with five replicates of each accession in each block. Of each set of five, the three most similar plants were phenotyped. Accessions were randomised by morphotype in each block. In 2017 the accessions were sown in the greenhouse in April and transplanted to the field after four weeks.

Table 2.

[Figure 6 chart: heading cabbage sub-morphotypes (Red, Savoy, White, Pointed, Unknown/wild)]

Figure 6: Overview of the different heading cabbage morphotypes used in the TKI genotyping experiment.

Genotypic Data

Sequence Based Genotyping (SBG)

For the modern hybrids, between 50 and 100 seedlings were harvested for genotyping. The gene bank accessions were highly heterogeneous, so one representative plant of each gene bank accession was harvested for genotyping. The genotypic information for this study was collected with the SBG method of the company KeyGene (KeyGene, 2017).

In the SBG protocol, the DNA was cut with the PstI and MseI restriction enzymes and the fragments were amplified with primers, the PstI primer carrying two additional selective nucleotides. PstI is sensitive to methylation. This methylation sensitivity, combined with a low starting amount of DNA, means that certain sections of the genome, mainly heterochromatic regions and centromeres, were likely not sequenced. The sequenced fragments were clustered based on homology, with a maximum of three SNPs permitted per cluster; when a fourth SNP occurred between fragments, a new cluster was started. These clusters were aligned to the reference genome TO1000 (Parkin et al., 2014). The final output was an extremely large number of single nucleotide polymorphisms (SNPs), but many had low coverage, so the data were filtered for the main analyses. Initial filtering reduced the set from more than 200,000 SNPs to 85,532 SNPs. This set was further limited to SNPs called in at least 80% of all accessions (85,168 SNPs) and then to SNPs with a minor allele frequency above 2.5% (18,580 SNPs). Due to limited computational power, the population structure corrections were based on a further subset of these 18,580 SNPs: 1,376 SNPs that are at least 250 kb apart and spread evenly across the genome. The full set of 18,580 SNPs was used for the GWAS. These SNPs cover nearly the entire genome but are not spread evenly, so some SNPs are clustered closely together while others have few or no neighbouring SNPs (see digital Appendix 6, Slob, 2016).
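The filtering cascade described above (call rate of at least 80%, minor allele frequency above 2.5%, then thinning to SNPs at least 250 kb apart) can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes genotypes coded as alternative-allele dosages 0/1/2 with `None` for a missing call, and the greedy thinning rule in `thin_snps` is a hypothetical stand-in for however the 1,376 evenly spread SNPs were chosen.

```python
# Sketch of the SNP filtering cascade: call rate >= 80%, MAF > 2.5%, then
# thinning to SNPs at least 250 kb apart for the population structure analyses.
# Assumption: genotypes are alternative-allele dosages (0, 1, 2) or None if missing.
from typing import Dict, List, Optional, Tuple

Snp = Tuple[str, int, List[Optional[int]]]  # (chromosome, position in bp, calls)


def call_rate(calls: List[Optional[int]]) -> float:
    """Fraction of accessions with a non-missing genotype call."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls: List[Optional[int]]) -> float:
    """Minor allele frequency over non-missing calls (two alleles per accession)."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(snps: List[Snp], min_call_rate: float = 0.80,
                min_maf: float = 0.025) -> List[Snp]:
    """Keep SNPs called in at least 80% of accessions and with MAF above 2.5%."""
    return [s for s in snps
            if call_rate(s[2]) >= min_call_rate
            and minor_allele_frequency(s[2]) > min_maf]


def thin_snps(snps: List[Snp], min_gap_bp: int = 250_000) -> List[Snp]:
    """Hypothetical greedy thinning: walk along each chromosome and keep a SNP only
    if it lies at least min_gap_bp from the previously kept SNP on that chromosome."""
    kept: List[Snp] = []
    last_kept: Dict[str, int] = {}
    for chrom, pos, calls in sorted(snps, key=lambda s: (s[0], s[1])):
        if chrom not in last_kept or pos - last_kept[chrom] >= min_gap_bp:
            kept.append((chrom, pos, calls))
            last_kept[chrom] = pos
    return kept


toy = [
    ("C1", 100, [0, 0, 2, 2, 0]),        # kept: fully called, MAF 0.4
    ("C1", 200_000, [0, 0, 0, 0, 0]),    # dropped: monomorphic
    ("C1", 400_000, [0, 1, 2, 2, 0]),    # kept: MAF 0.5
    ("C2", 50, [None, None, 0, 2, 2]),   # dropped: call rate 0.6
]
print([pos for _, pos, _ in filter_snps(toy)])  # [100, 400000]
```

Applying `thin_snps` to the output of `filter_snps` mirrors the order of the steps in the text: quality filters first, spacing last.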

Linkage Disequilibrium

Linkage disequilibrium (LD) tests were also performed on the SNP data to check how suitable the genotypic data are for the GWAS. After trying various options, the file format of the SBG data turned out to be unusable in every tested program except TASSEL, so TASSEL was used for the LD study (Bradbury et al., 2007). LD was tested per chromosome for each of the population structure groups with a chi-square test. The full dataset of 85,532 SNPs was used, split by chromosome. A full LD matrix was calculated, with heterozygous calls set to missing. The output was analysed in LD matrix plots, and the R² values were exported and plotted as LD decay graphs in Excel.

Field Trial

After two years of trials with about half of the full set of 913 accessions, a large field trial was set up in 2017 in which 842 different B. oleracea accessions were planted out and phenotyped (Table 2). The previous field trials in 2015 and 2016 used smaller subsets of the total collection, of 465 and 471 accessions respectively. In each trial all accessions were planted out in two blocks in the field, with five replicates of each accession in each block. Of each set of five plants, the three most similar were phenotyped. Accessions were randomised by morphotype in each block. In 2017 the accessions were sown in the greenhouse in April and transplanted to the field after four weeks.

Table 2: The numbers of accessions belonging to each B. oleracea morphotype that were planted and analysed in the 2017 WUR field trials. In total 842 were planted and phenotyped.

Morphotype          Number of accessions
Brussels sprouts    45
Broccoli            86
Cauliflower         219
Chinese Kale        7
Collard Green       22
Heading             310
Kale                35
Kohlrabi            46
Ornamentals         14
Off-types           4
Tronchuda           22
Wild Species        32

Phenotyping

Traits

Various traits were scored for the different accessions. Leaf blistering was scored visually in the field for all plants; the other traits were scored once the plants had reached maturity (Table 3). For every accession, the largest leaf at the mature stage was harvested and photographed in the field in a light-proof box. For heading cabbages, the head weight was also recorded, and the third leaf covering the head was photographed alongside the largest mature leaf. The heads themselves were halved after weighing and also photographed. The following leaf measurements were derived from the pictures: the length and width of the leaf lamina, the lamina area, the mean leaf width, and the petiole length and area (Table 4). The lamina and petiole lengths and areas were also summed for each leaf to obtain the total leaf length and total leaf area.


Table 3: Harvest dates per morphotype (from van Eggelen, 2017). Leaf blistering was scored visually, in the field. Other traits were measured by harvesting the plants at maturity and taking pictures for later analysis.

Activity       Morphotypes         Block   Date            Days after sowing
Sowing         All                 A&B     11-04-2017      0
Transplanting  All                 A&B     11&12-05-2017   30
Blistering     All                 A       31-05-2017      50
Blistering     All                 B       01-06-2017      51
Photos         Wild species        A&B     04-07-2017      84
Photos         Kohlrabi            B       04-07-2017      84
Photos         Kohlrabi            A       05-07-2017      85
Photos         Ornamentals         A&B     05-07-2017      85
Photos         Brussels sprouts    A&B     05-07-2017      85
Photos         Broccoli            A&B     06-07-2017      86
Photos         Collard green       A&B     10-07-2017      90
Photos         Tronchuda           A&B     10-07-2017      90
Photos         Heading             A       11-07-2017      91
Photos         Heading             B       13-07-2017      93
Photos         Heading             B       14-07-2017      94
Photos         Heading             B       17-07-2017      97
Photos         Kale/Chinese kale   A&B     18-07-2017      98
Photos         Heading             A&B     21-07-2017      101
Photos         Cauliflower         A       24-07-2017      104
Photos         Cauliflower         A&B     25-07-2017      105
Photos         Heading             A&B     26-07-2017      106
Photos         Heading             A&B     27-07-2017      107
Photos         Cauliflower         A       28-07-2017      108

Table 4: Measured traits used for the GWAS in this analysis, from the 2017 leaf pictures and head weights and the 2016 core length measurements.

Dataset: Mature leaves, full dataset (2017 data)
Trait                                              Unit
Leaf lamina area                                   mm²
Leaf lamina length                                 mm
Leaf lamina width                                  mm
Mean leaf width                                    mm
Leaf petiole area                                  mm²
Leaf petiole length                                mm
Total leaf length                                  mm
Total leaf area                                    mm²

Dataset: Cabbage subset (2017 data)
Trait                                              Unit
All mature leaf traits for the cabbage accessions  mm/mm²
Cabbage leaf lamina area                           mm²
Cabbage leaf lamina length                         mm
Cabbage leaf lamina width                          mm
Mean cabbage leaf width                            mm
Cabbage leaf petiole area                          mm²
Cabbage leaf petiole length                        mm
Total cabbage leaf length                          mm
Total cabbage leaf area                            mm²
Head weight                                        grams

Dataset: 3D data, cabbage subset (2016)
Trait                                              Unit
Core length                                        mm

Data Analysis


Due to the scale of the experiment, a Halcon script was written by Toon Tielen from the computer vision department of Wageningen UR to analyse the leaf images. The extreme morphological diversity of the dataset meant the script required several iterations before it worked satisfactorily; recognising the border between petiole and lamina was especially difficult. The input for the Halcon analysis was the pictures of the three leaves per accession per block, so a total of six images per accession were analysed. The script read the barcode sticker placed next to each leaf to label the output with the correct accession number. The output file included measurements of the leaf traits (see ‘Traits’). The heading accessions included additional data: besides the largest leaf, the third leaf covering the head was also photographed, placed alongside the largest leaf of the plant in one photo. These pictures with two leaves required a slight variation on the original script to ensure proper analysis. Lastly, the cabbage heads were halved and also photographed; unfortunately, there is currently no script to analyse these pictures.

The three repeats per accession per block were checked for similar outputs, and several leaves per harvest day were also measured with ImageJ. The ImageJ measurements were compared to the Halcon output and differed by less than 5% in 93% of cases. The final script performs well, but in five or six rare cases it still left out the veins of the leaf or included the leaf's shadow on the background as part of the leaf, and in one or two cases it took the QR code label for a leaf. For further analysis we added the TKI numbers in place of the original field codes, as well as the morphotype of each accession. Some leaves were too large for the box and had to be photographed with their petiole broken off and placed alongside; the measurements of these parts were summed to obtain the true leaf dimensions. All morphotypes were corrected to their observed phenotype, and incorrectly sown genotypes were removed from the data. All in all, apart from some outliers due to the extreme diversity of the dataset, the Halcon analysis gives accurate results.
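The Halcon-versus-ImageJ comparison above reduces to a simple agreement rate. The sketch below is a minimal illustration that assumes ImageJ is treated as the reference measurement and that "less than 5% difference" means relative difference to that reference; the function names and example values are hypothetical.

```python
# Sketch of the Halcon-versus-ImageJ check: the share of leaves for which the two
# measurements differ by less than 5%. Assumption: ImageJ is taken as the reference.
from typing import Iterable, Tuple


def relative_difference(halcon: float, imagej: float) -> float:
    """Absolute difference relative to the ImageJ reference value."""
    return abs(halcon - imagej) / imagej


def agreement_rate(pairs: Iterable[Tuple[float, float]], tolerance: float = 0.05) -> float:
    """Fraction of (halcon, imagej) pairs whose relative difference is below tolerance."""
    pairs = list(pairs)
    within = sum(relative_difference(h, i) < tolerance for h, i in pairs)
    return within / len(pairs)


# Hypothetical lamina areas (mm²) for four leaves: three agree within 5%.
example = [(102.0, 100.0), (96.0, 100.0), (120.0, 100.0), (100.0, 100.0)]
print(agreement_rate(example))  # 0.75
```

An agreement rate of 0.93 on the real comparison would correspond to the 93% reported above.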

To obtain even more accurate measurements and to validate the current 2D analysis of the leaves and cabbage heads, 3D images of a subset of the total collection were taken in 2016. This year, Kiran Jayaraj¹, a student from Farm Technology, worked on developing scripts to analyse several traits from the 2016 data. These 3D traits have not yet been analysed; however, Kiran also developed a script for measuring the core length of the cabbages from 2D images with texture information, giving an accurate measurement. The core length measurements were finished in time to be included in this analysis.

To analyse possible block effects, a univariate ANOVA (linear model) was conducted in IBM SPSS 24 with each trait as the dependent variable and block and morphotype as fixed factors. Phenotypic means per accession were obtained by averaging the measurements over the blocks. A Pearson correlation test was performed on the phenotypic means to find correlations between traits. To test for significant trait differences between morphotypes, we conducted a pairwise Fisher's protected LSD analysis. Boxplots of the traits per morphotype were used to further study the variation between morphotypes.

Population Structure

In the TKI project, 936 unique accessions were genotyped. These can be divided into morphotypes and, for the heading cabbages, further into sub-groups (Figure 5 & Figure 6). These morphotype groups have often been geographically and genetically isolated from each other through human-guided selection and breeding. A successful GWAS on such a population therefore requires first correcting for these different degrees of relatedness, to reduce false positives. Previous projects on this B. oleracea population already corrected for population structure, but we wished to assess the effectiveness of the existing population structure corrections.

¹ Contact Dr. ir. Gerrit Polder, Computer Vision & Plant Phenotyping, Wageningen University & Research for more information.

The first step was to take the existing population structure corrections, obtained with the STRUCTURE and Structure-Harvester programs (Figures 7 & 8), and check their quality by comparing them to Principal Coordinate analyses (PCOs, also known as PCoA) in the DARwin program (Carpio et al., 2011; Earl & vonHoldt, 2012; Evanno, Regnaut, & Goudet, 2005; Hamon, Seguin, Perrier, & Glaszmann, 2003; Pritchard, Stephens, & Donnelly, 2000). Both the STRUCTURE and PCO corrections were based on the 1,376 SNP dataset (SNPs about every 250 kb, with fewer missing data than the full SNP set). We calculated PCOs for all accessions and for the heading subset, with 10, 20 and 30 axes. To calculate the PCO in DARwin, the genotype file had to be converted from VCF into an allelic format, in which there is a separate entry for each of a marker's two alleles per accession. Missing alleles were coded as 0, reference alleles as 1 and alternative alleles as 2. DARwin does not recognise missing alleles. An identifier file was made with the accession numbers and other identifiers such as K groups, morphotypes and sub-morphotypes. A Euclidean dissimilarity matrix, satisfying the metric (triangle) inequality d(i,j) ≤ d(i,k) + d(j,k), was calculated from the allelic composition of the accessions. Finally, a PCO factorial analysis was performed on these dissimilarities. The PCO factorial analysis uses the distances from the dissimilarity matrix to form a coordinate matrix, in which each axis approximates the distances between accessions in the original matrix as closely as possible through the coordinates of the accessions on that axis. The axis coordinates can then be used as population structure correction factors.
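The recoding, dissimilarity and PCO steps above can be sketched with NumPy. This is a minimal illustration, not DARwin's implementation: the dissimilarity used here (root mean squared difference in alternative-allele counts over loci called in both accessions) is an assumption, and because missing loci are skipped pairwise, such a matrix is not guaranteed to satisfy the triangle inequality. The PCO is classical principal coordinate analysis (Gower double centring), and the inertia percentages are computed the same way as in Table 6, as eigenvalue over the sum of positive eigenvalues.

```python
# Sketch of the DARwin-style workflow: VCF genotypes -> allelic coding
# (0 = missing, 1 = reference, 2 = alternative) -> dissimilarity -> PCoA.
# Assumption: the dissimilarity below is illustrative; DARwin's formula may differ.
from typing import List, Optional, Tuple

import numpy as np

Allelic = Tuple[int, int]


def to_allelic(gt: str) -> Allelic:
    """Convert a VCF GT string ('0/1', '1|1', './.') into two allele codes."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return (0, 0)
    return tuple(1 if a == "0" else 2 for a in alleles)


def alt_count(codes: Allelic) -> Optional[int]:
    """Number of alternative alleles, or None if either allele is missing."""
    if 0 in codes:
        return None
    return sum(c == 2 for c in codes)


def dissimilarity_matrix(accessions: List[List[Allelic]]) -> np.ndarray:
    """Pairwise Euclidean-type dissimilarity, skipping loci missing in either accession."""
    n = len(accessions)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            diffs = [alt_count(a) - alt_count(b)
                     for a, b in zip(accessions[i], accessions[j])
                     if alt_count(a) is not None and alt_count(b) is not None]
            d[i, j] = d[j, i] = np.sqrt(np.mean(np.square(diffs))) if diffs else np.nan
    return d


def pcoa(d: np.ndarray, n_axes: int):
    """Classical principal coordinate analysis of a complete distance matrix."""
    n = d.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (d ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    total = eigvals[eigvals > 1e-12].sum()      # only positive eigenvalues count
    coords = eigvecs[:, :n_axes] * np.sqrt(np.clip(eigvals[:n_axes], 0, None))
    inertia_pct = 100 * eigvals[:n_axes] / total
    return coords, eigvals[:n_axes], inertia_pct
```

The columns of `coords` play the role of the PCO axes that are later passed to the GWAS as covariates.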

The output of the PCO was compared to that of STRUCTURE to see whether the two population structure corrections matched. Furthermore, a separate PCO structure correction was made for the heading cabbages to look into the known sub-population structure within this morphotype group (Figure 8). This correction could then be used in the analysis of both the mature leaf traits and the extra data collected on the cabbage heads (weights and pictures of cabbage leaves), and the outcome could be compared with the analysis of the whole leaf dataset. In the further analysis we compared the GWAS output using the PCO axes with that using the STRUCTURE groups, to see which method was most effective in reducing the number of false-positive results.

Figure 7: The STRUCTURE grouping, as calculated in a previous thesis (van Eggelen, 2017). Group 1 is an admixed group, group 2 consists mainly of heading varieties, group 3 of cauliflowers, group 4 of Brussels sprouts and C9 species and group 5 of various broccoli accessions.


Figure 8: Heading cabbage sub-population structure as calculated with STRUCTURE by Xuan Xu (unpublished). Groups 1 and 2 are white cabbages; group 3 also consists of white cabbages, mainly gene bank accessions; group 4 features red cabbages, group 5 savoy cabbages, group 6 white cabbage gene bank accessions and group 7 mainly hybrid white cabbages. Pointed cabbages are spread across the various groups.

GWAS

Using the population structure corrections, the leaf and heading cabbage datasets were analysed in a GWAS. The association analysis was performed with TASSEL (Bradbury et al., 2007), which uses a fixed-effects General Linear Model (GLM) to test for association between genetic markers and phenotypes. The input files for TASSEL were the SBG genotype dataset of 18,580 SNPs and the phenotypic data from the Halcon analysis (the complete leaf dataset or the cabbage subset), containing the traits shown in Table 4. The analysis was run three times on both the cabbage and the full leaf datasets: once without a population structure correction, and once each with the PCO or the STRUCTURE output as covariates. The GLM settings included a maximum P-value of 1 and 999 permutations. The results were analysed in tables of P-values and LOD scores (−log10 P) and in Manhattan plots of the LOD scores per marker, per trait.
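The core of a GLM marker test with structure covariates can be sketched as a comparison of two nested least-squares models: intercept plus covariates (for example the PCO axis coordinates), with and without the marker. This is a minimal sketch in the spirit of the TASSEL GLM, not TASSEL's code. Assumptions: the marker is coded additively as 0/1/2, and the permutation P-value is computed per marker, which is not necessarily how TASSEL uses its 999 permutations.

```python
# Sketch of a GLM marker test: F statistic for adding a marker to an intercept +
# covariates model, a permutation P-value and LOD = -log10(P).
# Assumptions: additive 0/1/2 marker coding; per-marker permutation P-values.
import numpy as np


def residual_ss(y: np.ndarray, x: np.ndarray) -> float:
    """Residual sum of squares of the least-squares fit y ~ x."""
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    return float(resid @ resid)


def marker_f(y: np.ndarray, covariates: np.ndarray, marker: np.ndarray) -> float:
    """F statistic for adding the marker to an intercept + covariates model."""
    n = len(y)
    x0 = np.column_stack([np.ones(n), covariates])
    x1 = np.column_stack([x0, marker])
    df_marker = np.linalg.matrix_rank(x1) - np.linalg.matrix_rank(x0)
    df_resid = n - np.linalg.matrix_rank(x1)
    rss0, rss1 = residual_ss(y, x0), residual_ss(y, x1)
    return ((rss0 - rss1) / df_marker) / (rss1 / df_resid)


def permutation_lod(y, covariates, marker, n_perm=999, seed=0):
    """Observed F, permutation P-value (at least 1 / (n_perm + 1)) and LOD."""
    rng = np.random.default_rng(seed)
    f_obs = marker_f(y, covariates, marker)
    exceed = sum(marker_f(y, covariates, rng.permutation(marker)) >= f_obs
                 for _ in range(n_perm))
    p = (exceed + 1) / (n_perm + 1)
    return f_obs, p, -np.log10(p)
```

With 999 permutations the smallest attainable permutation P-value is 0.001, a LOD of 3, so very high LOD scores must come from the parametric F-test P-values rather than from permutation counts.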

Histograms of the traits were made to study the symmetry of their distributions, as the GLM used for the GWAS assumes approximately normally distributed residuals. To study the trait distributions further, a one-way ANOVA on the trait values per accession was conducted in IBM SPSS 24 and the residuals were calculated. Quantile-quantile (QQ) plots of these residuals showed that transformation of the data was not needed for GWAS.
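The residual QQ check described above can be sketched with the standard library alone: the one-way ANOVA residual of each observation is its value minus the mean of its accession, and each sorted residual is paired with a standard normal quantile. This is a minimal sketch; the Hazen plotting positions (i − 0.5)/n are an assumption, and SPSS may use a different formula.

```python
# Sketch of the normality check: one-way ANOVA residuals (value minus accession mean)
# paired with theoretical normal quantiles for a QQ plot.
# Assumption: Hazen plotting positions (i - 0.5) / n.
from collections import defaultdict
from statistics import NormalDist, mean
from typing import Dict, List, Tuple


def anova_residuals(records: List[Tuple[str, float]]) -> List[float]:
    """records: (accession, trait value). Residual = value - mean of its accession."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for accession, value in records:
        groups[accession].append(value)
    means = {acc: mean(vals) for acc, vals in groups.items()}
    return [value - means[accession] for accession, value in records]


def qq_points(residuals: List[float]) -> List[Tuple[float, float]]:
    """(theoretical normal quantile, observed residual) pairs, both ascending."""
    n = len(residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))
```

Points falling close to a straight line indicate approximately normal residuals; systematic curvature or a split into two clusters points to the kind of deviation seen for the petiole traits.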

PP-plots of the GWAS P-values were made following Yu et al. (2006) to check whether the PCO and STRUCTURE population structure corrections are effective. For each trait, the observed P-values from the GWAS were plotted against their expected distribution under the null hypothesis of no association, which is uniform. An excess of low P-values compared with this expectation indicates false positives, which may be caused by insufficient correction for population structure.
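The PP-plot coordinates described above are the sorted observed P-values paired with the quantiles of a uniform distribution. A minimal sketch, assuming expected quantiles i/(n + 1) (other plotting positions are also common) and using a hypothetical `largest_gap` summary of the distance from the diagonal:

```python
# Sketch of PP-plot coordinates: sorted observed GWAS P-values against quantiles
# expected under a uniform null. Assumption: expected quantiles i / (n + 1).
from typing import List, Tuple


def pp_points(p_values: List[float]) -> List[Tuple[float, float]]:
    """(expected, observed) P-value pairs, both in ascending order."""
    n = len(p_values)
    expected = [i / (n + 1) for i in range(1, n + 1)]
    return list(zip(expected, sorted(p_values)))


def largest_gap(p_values: List[float]) -> float:
    """Hypothetical summary: largest absolute distance from the diagonal."""
    return max(abs(obs - exp) for exp, obs in pp_points(p_values))


print(pp_points([0.5, 0.25, 0.75]))  # [(0.25, 0.25), (0.5, 0.5), (0.75, 0.75)]
```

Observed values lying below the diagonal (smaller than expected) are the signature of the excess of low P-values that an insufficient structure correction produces.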

To correct for multiple testing in the GWAS we used the Bonferroni method based on the number of independent markers (Li & Ji, 2005). To partially compensate for the differing effect of remaining population structure in the two datasets, the thresholds were set as follows:

Whole dataset: α = 0.01 genome-wide significance threshold, corresponding to a test-wise threshold of LOD 5.0.

Cabbage subset: α = 0.05 genome-wide significance threshold, corresponding to a test-wise threshold of LOD 3.8. The population structure correction performed better for the cabbage subset, resulting in lower LOD scores, so a threshold of α = 0.01 would be too strict.
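The Li & Ji (2005) procedure estimates the effective number of independent tests (Meff) from the eigenvalues λ of the marker correlation matrix as the sum of f(|λ|), with f(x) = I(x ≥ 1) + (x − ⌊x⌋); the test-wise LOD threshold is then −log10(α / Meff). The sketch below assumes `corr` is a marker-by-marker correlation matrix of genotype codes, which is not specified in the text. Working backwards, LOD 5.0 at α = 0.01 corresponds to Meff ≈ 1,000, and LOD 3.8 at α = 0.05 to Meff ≈ 315 (3.8 being rounded).

```python
# Sketch of the Li & Ji (2005) effective number of independent tests (Meff) and the
# Bonferroni LOD threshold -log10(alpha / Meff).
# Assumption: `corr` is a marker-by-marker correlation matrix of genotype codes.
import numpy as np


def li_ji_meff(corr: np.ndarray) -> float:
    """Meff = sum over eigenvalues of f(|l|), with f(x) = I(x >= 1) + (x - floor(x))."""
    eig = np.abs(np.linalg.eigvalsh(corr))
    return float(np.sum((eig >= 1).astype(float) + (eig - np.floor(eig))))


def lod_threshold(alpha: float, meff: float) -> float:
    """Test-wise LOD threshold for a genome-wide alpha."""
    return float(-np.log10(alpha / meff))


# Three equally correlated markers (r = 0.6) count as about two independent tests.
r = 0.6
corr = np.full((3, 3), r) + (1 - r) * np.eye(3)
print(round(li_ji_meff(corr), 6))  # 2.0
```

Correlated markers reduce Meff below the raw marker count, which is why this threshold is less strict than a plain Bonferroni correction over all 18,580 SNPs.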

To further select the most interesting markers for each trait we applied the following criteria:

❖ Markers should have a LOD score above the calculated threshold.
❖ Neighbouring markers should also have high LOD scores.
❖ The LOD score of a marker must stay the same or increase when the GWAS is performed with a population structure correction compared to the run without, so that the association is unlikely to be caused by population structure.
❖ Markers must be significant in both the PCO and the STRUCTURE run; a marker that is significant for a trait in both analyses very likely has an actual effect on that trait.
❖ Markers should have a clear effect on the phenotype: the alternative allele should correspond to a different phenotype from the reference allele, with a possible intermediate phenotype in heterozygous accessions in case of co-dominance. This effect had to be clearly visible in at least one morphotype group, with additional, perhaps less clear, effects visible in one or more of the other morphotype groups.

In addition, we used a reverse approach to check for associations with known Genes of Interest (GOI), by comparing the locations of our previously defined candidate genes with our list of significant markers.
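The selection criteria that can be automated (threshold, consistency between the PCO and STRUCTURE runs, no drop in LOD after correction, supporting neighbours) can be sketched as a filter. This is a hypothetical illustration: the field names are invented, "neighbouring markers with high LOD" is interpreted as at least one other marker within 50 kb exceeding the threshold in the PCO run, and the phenotypic-effect criterion, judged from effect plots, is not coded.

```python
# Sketch of the automatable marker-selection criteria. Assumptions: hypothetical field
# names; "neighbouring markers with high LOD" = at least one other marker within
# window_bp above the threshold in the PCO run; the effect-plot criterion is not coded.
from dataclasses import dataclass
from typing import List


@dataclass
class MarkerResult:
    marker: str
    chrom: str
    pos: int
    lod_none: float       # LOD without population structure correction
    lod_pco: float        # LOD with the PCO axes as covariates
    lod_structure: float  # LOD with the STRUCTURE groups as covariates


def passes_marker_criteria(r: MarkerResult, threshold: float) -> bool:
    """Significant in both corrected runs, and LOD not lower than without correction."""
    return (r.lod_pco > threshold and r.lod_structure > threshold
            and r.lod_pco >= r.lod_none and r.lod_structure >= r.lod_none)


def select_markers(results: List[MarkerResult], threshold: float,
                   window_bp: int = 50_000, min_neighbours: int = 1) -> List[str]:
    """Markers passing the single-marker criteria and supported by significant neighbours."""
    significant = [r for r in results if r.lod_pco > threshold]
    selected = []
    for r in results:
        if not passes_marker_criteria(r, threshold):
            continue
        neighbours = sum(1 for s in significant
                         if s is not r and s.chrom == r.chrom
                         and abs(s.pos - r.pos) <= window_bp)
        if neighbours >= min_neighbours:
            selected.append(r.marker)
    return selected
```

Raising `min_neighbours` makes the neighbour criterion stricter; setting it to 0 disables it.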

For each selected marker, the region 50 kb to either side was taken, which corresponds to the LD distance identified by various studies in Brassica. These regions of interest were investigated with a genome browser to find genes that could explain the association with a certain phenotype (Yu et al., 2013). These genes were then entered in the Brassica genome browser to find their functions and annotations from TrEMBL and Swiss-Prot (Cheng, 2011).
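The ±50 kb candidate-region lookup described above is an interval-overlap query. A minimal sketch, assuming gene coordinates are available as (gene_id, chromosome, start, end) tuples from a genome annotation; the gene IDs below are made up:

```python
# Sketch of the candidate-region lookup: genes overlapping a window of 50 kb either
# side of a selected marker. Assumption: gene coordinates come from an annotation as
# (gene_id, chromosome, start, end); the IDs below are hypothetical.
from typing import List, Tuple

Gene = Tuple[str, str, int, int]


def genes_near(chrom: str, pos: int, genes: List[Gene], flank_bp: int = 50_000) -> List[str]:
    """IDs of genes overlapping [pos - flank_bp, pos + flank_bp] on the same chromosome."""
    start, end = pos - flank_bp, pos + flank_bp
    return [gene_id for gene_id, g_chrom, g_start, g_end in genes
            if g_chrom == chrom and g_start <= end and g_end >= start]


genes = [
    ("geneA", "C1", 10_000, 12_000),
    ("geneB", "C1", 140_000, 145_000),
    ("geneC", "C2", 100_000, 101_000),
    ("geneD", "C1", 148_000, 160_000),
]
print(genes_near("C1", 100_000, genes))  # ['geneB', 'geneD']
```

The same function serves the reverse approach: pass the candidate genes of interest as `genes` and query each significant marker to see whether any GOI falls inside its window.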


Results

LD

Linkage disequilibrium in our SNP dataset was analysed to check the quality of the genomic data for use in the GWAS. In the centromeric regions of the B. oleracea chromosomes (Table 5), one would expect LD to extend over longer distances, as these regions are often heterochromatic with little recombination. However, not all chromosomes show a block of high linkage, and where such a block is present it does not coincide with the known centromeric regions. In addition, this data collection contains regions that are far apart and yet show linkage (Figure 9).

Table 5: Locations of the B. oleracea genome centromeres. Adapted from: Mason et al. (2016).

Chromosome  Start (Mbp)  End (Mbp)  Size of centromere-containing region (Mbp)  Chromosome length (Mbp)
C1          17.9         24.2       6.2                                         38.3
C2          31.8         32.2       0.3                                         46.1
C3          40.6         41.9       1.3                                         60.6
C4          17.1         19.4       2.3                                         48.9
C5          17.6         28.3       10.7                                        42.6
C6          8.0          8.4        0.5                                         37.2
C7          5.4          7.2        1.8                                         44.5
C8          5.8          6.4        0.6                                         38.3
C9          23.1         23.4       0.3                                         48.4

[Figure 9: LD triangle plots, one panel per chromosome (1 to 9); both axes show markers]

Figure 9: LD triangle plots from the TASSEL analysis. Each plot represents one chromosome, with the markers on both axes. The colour gradient shows the degree of linkage: the upper triangles show R², the lower triangles the P-value. P-values were determined by a two-sided Fisher's Exact test.


By calculating the LD between pairs of markers, we made decay plots of the degree of LD (R²) against the distance between markers; where the curve flattens off, linkage equilibrium is reached and markers are no longer linked. In this collection, the LD decay plots per chromosome revealed unexpectedly high LD values over long distances. Moreover, many of these unexpected values share the same R² value, forming horizontal lines of LD in the plot (Figure 10). We also calculated the LD for windows of 500 to 800 SNPs spanning the end of one chromosome and the start of the next, which showed unexpected linkage between chromosomes (Figure 11).

We determined the LD distance by taking quantiles of the data and checking the R² value at each of them. At the third quartile (75th percentile), R² is on average between 0.01 and 0.02, with a single outlier of R² = 0.044. We chose R² = 0.1 as the minimum value for LD. On this basis we found an LD distance of approximately 150 kb, beyond which markers are considered to be in linkage equilibrium. This is longer than expected from previous studies of LD in B. oleracea.
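The text does not spell out exactly how the quantiles were turned into a distance, so the following is only one plausible reading, sketched as a hypothetical procedure: bin marker pairs by distance, take the 75th percentile of R² in each bin, and report the first bin whose percentile drops below R² = 0.1. The bin width and the linear-interpolation percentile are assumptions.

```python
# Sketch of an LD-decay distance estimate (one plausible reading, not the thesis's
# exact procedure): bin marker pairs by distance, take the q-th percentile of R²
# per bin, and return the first bin whose percentile falls below the threshold.
import math
from typing import Dict, List, Optional, Tuple


def percentile(values: List[float], q: float) -> float:
    """Linear-interpolation percentile (0 <= q <= 100) of a non-empty list."""
    s = sorted(values)
    k = (len(s) - 1) * q / 100
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def ld_decay_distance(pairs: List[Tuple[int, float]], bin_bp: int = 10_000,
                      q: float = 75, r2_threshold: float = 0.1) -> Optional[int]:
    """pairs: (distance in bp, R²). Lower edge of the first bin below the threshold."""
    bins: Dict[int, List[float]] = {}
    for dist, r2 in pairs:
        bins.setdefault(dist // bin_bp, []).append(r2)
    for b in sorted(bins):
        if percentile(bins[b], q) < r2_threshold:
            return b * bin_bp
    return None  # LD never decays below the threshold within the data
```

A smaller `bin_bp` gives a finer estimate but needs more marker pairs per bin for a stable percentile.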

[Figure 10: LD decay plot, R² against distance (bp)]

Figure 10: LD over all of chromosome 1, calculated with all accessions and the full dataset of 85K SNPs. Although LD decays and R² drops below 0.1 within 200 kb, this distance is rather long, and there are many points of LD over long distances, even between different arms of the chromosome.


[Figure 11: LD decay plot, R² against distance (bp)]

Figure 11: LD decay plot of the end of chromosome 1 and the start of chromosome 2. The plot shows many high R² values and points of LD between markers on the same chromosome over large distances, but also LD between markers on different chromosomes, which was not expected. Most values are in fact below 0.1, so this was used as the cut-off value for the other analyses, with values below it not considered true LD.

When we separated the gene bank accessions from the modern hybrids and examined their LD decay plots individually, the lines of long-distance LD with a constant R² seen in the full chromosome plots proved nearly exclusive to the gene bank material (Figure 12). We hypothesised that this may be due to certain accessions carrying alleles that do not occur in the rest of the dataset. If such rare alleles at different loci occur together in the same few accessions, they are correlated with each other, which TASSEL reports as LD. By studying the allelic variation at the markers causing these lines of LD, we found that several accessions repeatedly carried rare alleles at these markers, across all nine chromosomes. These accessions belonged mainly to the wild type and ornamental morphotypes. After removing them and recalculating LD, fewer clear lines were present (Figure 13).
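The rare-allele explanation above can be illustrated numerically. With heterozygous calls treated as missing, each remaining accession contributes one homozygous haplotype, and R² = D² / (pA(1 − pA) pB(1 − pB)). Two loci on different chromosomes whose rare alleles are carried by the same three accessions then have R² = 1, even though they cannot be physically linked. This is a minimal sketch under that coding assumption, not TASSEL's implementation; the `max_freq` cut-off for "rare" in `rare_allele_counts` is hypothetical.

```python
# Sketch showing why rare alleles shared by a few accessions create long-distance "LD".
# Assumptions: calls coded 0/1/2 (alternative-allele dosage) or None; heterozygous
# calls are treated as missing, so R² = D² / (pA(1-pA) pB(1-pB)) over homozygotes.
from typing import List, Optional

Calls = List[Optional[int]]


def r_squared(g1: Calls, g2: Calls) -> Optional[float]:
    """R² between two SNPs over accessions homozygous at both loci."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a in (0, 2) and b in (0, 2)]
    if not pairs:
        return None
    n = len(pairs)
    p_a = sum(a == 2 for a, _ in pairs) / n
    p_b = sum(b == 2 for _, b in pairs) / n
    p_ab = sum(a == 2 and b == 2 for a, b in pairs) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else None


def rare_allele_counts(snps: List[Calls], max_freq: float = 0.05) -> List[int]:
    """Per accession, the number of SNPs at which it carries an allele with
    frequency <= max_freq (hypothetical cut-off)."""
    counts = [0] * len(snps[0])
    for calls in snps:
        observed = [c for c in calls if c is not None]
        if not observed:
            continue
        alt_freq = sum(observed) / (2 * len(observed))
        for i, c in enumerate(calls):
            if c is None:
                continue
            if (alt_freq <= max_freq and c >= 1) or (1 - alt_freq <= max_freq and c <= 1):
                counts[i] += 1
    return counts


snp_x = [2, 2, 2] + [0] * 17  # e.g. a marker on chromosome 1
snp_y = [2, 2, 2] + [0] * 17  # e.g. a marker on chromosome 5
print(round(r_squared(snp_x, snp_y), 6))  # 1.0
```

Accessions with high `rare_allele_counts` are the candidates whose removal should thin out the horizontal lines of LD.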

[Figure 12: LD decay plots, R² against distance (bp)]

Figure 12: LD of chromosome 1 for the full dataset of 85 thousand SNPs, with the modern hybrid material (left) separated from the gene bank accessions (right). Notably, the lines of LD over large distances occur only in the gene bank material.


[Figure 13: LD decay plot, R² against distance (bp)]

Figure 13: LD over chromosome 1 with the accessions rich in rare alleles removed. The lines of LD over long distances with the same R² are much reduced but can still be discerned. These lines of LD are likely caused by rare alleles that co-occur in a small number of accessions.

Phenotyping

The phenotypic data for the GWAS consist of the leaf images, which were analysed with the Halcon scripts. 33 entries were missing due to unsuccessful barcode recognition, and in 94 instances the algorithm did not find the leaf in the picture; these missing values mostly concern heading cabbages. 131 implausibly small values, which should not occur in the data, were removed. The data from 04-07-2017 were excluded from this removal, as the wild species, which do have such small leaves, were photographed on that day. Most small values occurred in the heading cabbages, as the algorithm often had trouble identifying the correct outline of the second, smaller leaf in these photos.
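The clean-up rules above (drop entries without a measurement, drop implausibly small areas except on the day the wild species were photographed) can be sketched as a filter. This is a hypothetical illustration: the record fields are invented, and the thesis does not state the minimum area used, so `min_area_mm2` must be supplied by the reader.

```python
# Sketch of the phenotype clean-up. Assumptions: record fields are hypothetical, and
# the minimum plausible area is a parameter because the thesis does not state it.
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


def clean_leaf_records(records: List[Record], min_area_mm2: float,
                       exempt_dates: Sequence[str] = ("04-07-2017",)
                       ) -> Tuple[List[Record], int, int]:
    """Return (kept records, number missing, number removed as too small)."""
    kept: List[Record] = []
    n_missing = n_small = 0
    for rec in records:
        area = rec["area_mm2"]
        if area is None:                      # barcode or leaf not recognised
            n_missing += 1
        elif area < min_area_mm2 and rec["date"] not in exempt_dates:
            n_small += 1                      # implausibly small, not a wild-species day
        else:
            kept.append(rec)
    return kept, n_missing, n_small
```

Keeping the exempt dates as a parameter makes it easy to extend the exception if other small-leaved morphotypes were photographed on the same day.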

As expected, the Pearson correlation test showed correlations between many traits, since most leaf traits are linked to each other (Appendices). To analyse possible block effects, a univariate ANOVA was conducted with each trait as the dependent variable and block and morphotype as fixed factors. No block effects were present, so trait values could be averaged per accession for the GWAS. To test for significant trait differences between morphotypes, Fisher's protected LSD was conducted, visualised further through boxplots of the traits per morphotype (Figure 14). Few significant differences were found between morphotypes, and only between morphotypes represented by very few accessions, such as Chinese kales, off-type plants and ornamentals. Chinese kales and off-types differed significantly for petiole length, leaf lamina length, leaf lamina area and leaf lamina width. Ornamentals also differed significantly from off-types for leaf lamina area and leaf lamina width.
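The per-accession averaging and the trait correlation described above can be sketched as follows. This is a minimal illustration with hypothetical record layout: it assumes each record is (accession, block, value), averages the phenotyped plants within a block first and then the block means, and computes a plain Pearson correlation coefficient (without the significance test SPSS reports).

```python
# Sketch of per-accession means and the Pearson correlation between two traits.
# Assumption: records are (accession, block, value); plants are averaged within a
# block first, then across blocks.
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Tuple


def accession_means(records: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Mean over blocks of the within-block mean, per accession."""
    per_block: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for accession, block, value in records:
        per_block[(accession, block)].append(value)
    block_means: Dict[str, List[float]] = defaultdict(list)
    for (accession, _block), values in per_block.items():
        block_means[accession].append(sum(values) / len(values))
    return {acc: sum(m) / len(m) for acc, m in block_means.items()}


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

When every block holds the same number of plants, the two-stage average equals the plain mean of all measurements of an accession; the two differ only when plants are missing.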


Figure 14: Boxplot showing the variation per morphotype for the trait leaf width in the whole dataset. A few morphotypes stand out as outliers, but most morphotypes overlap clearly for this trait.

Population Structure

A total of 30 axes were calculated for both the full dataset PCO and the cabbage subset PCO. In the full dataset PCO, the percentage of variation explained per axis declines quickly, whereas the cabbage PCO shows a more gradual decline and explains relatively more of the allelic variation in total (Table 6).

Table 6: Eigenvalues and inertia (% of allelic variation explained) of the 30 PCO axes. The full dataset PCO explains a total of 52.75% of the allelic variation; the cabbage subset PCO performs slightly better, with 58.4%.

        Full dataset              Cabbages
Axis    Eigenvalue   Inertia %    Eigenvalue   Inertia %
1       0.01077      13.3         0.00602      12.12
2       0.00912      11.26        0.00359      7.23
3       0.00343      4.23         0.0023       4.64
4       0.00277      3.42         0.00203      4.09
5       0.00215      2.65         0.0016       3.22
6       0.00155      1.91         0.00142      2.87
7       0.00138      1.7          0.00113      2.28
8       0.00103      1.27         0.00089      1.79
9       0.00091      1.13         0.00081      1.64
10      0.00084      1.04         0.00065      1.31
11      0.00075      0.93         0.00058      1.16
12      0.0007       0.86         0.00057      1.14
13      0.00067      0.82         0.00055      1.11
14      0.00063      0.77         0.00052      1.05
15      0.00055      0.68         0.0005       1
16      0.00051      0.63         0.00048      0.98
17      0.0005       0.61         0.00047      0.94
18      0.00045      0.55         0.00044      0.89
19      0.00042      0.52         0.00042      0.85
20      0.0004       0.49         0.00042      0.84
21      0.00039      0.48         0.00041      0.82
22      0.00036      0.44         0.00039      0.79
23      0.00036      0.44         0.00038      0.77
24      0.00034      0.42         0.00037      0.75
25      0.00033      0.4          0.00037      0.74
26      0.00031      0.38         0.00035      0.71
27      0.0003       0.37         0.00035      0.7
28      0.00029      0.36         0.00033      0.67
29      0.00029      0.35         0.00032      0.65
30      0.00028      0.34         0.00032      0.65

Plotting the first two axes of the complete dataset PCO, coloured by STRUCTURE K-group, shows that the two methods produce very similar groupings (Figure 15). A similar plot with accessions coloured by origin (gene bank accession or modern hybrid) shows less variation within the hybrids than within the gene bank material. In plots of the PCO axes for the cabbage accessions, cabbages from north-western Europe group together, as do varieties from the Mediterranean and the Middle East. Red cabbages also form a separate group, while savoy cabbages form a separate group only on some axes (Figure 16). For many white cabbages the intended use is unknown, but the industry and storage varieties appear to differ from the rest and are in some cases closer to many of the red cabbages. The accessions known to be for fresh use also form a group distinct from the other white cabbages. The rare ‘ye gan la’ and Jersey cabbages are clearly genetically distinct.

Figure 15: PCO of the full dataset. Shown are the first two axes (X = 1, Y = 2), which together explain about 25% of the variation (13.3% and 11.26% respectively), roughly half of the variation explained by all 30 axes. Accessions are labelled with their morphotype and coloured by the group to which they were assigned in the original STRUCTURE correction. These STRUCTURE groups can also be discerned in the PCO, with clear cauliflower (K3, red) and broccoli (K5, pink) groups separated from the cabbage (K2, yellow-brown) and sprout (K4, blue) groups by the more admixed STRUCTURE group (K1, green).


Figure 16: Two examples of the cabbage PCO. The top graph shows axes 2 and 4 (7.2% and 4.1%), coloured and labelled by sub-morphotype. A clear group of red cabbages (red) can be seen, while the pointed (yellow) and savoy (blue) cabbages are mixed with the white cabbage collection (green). The bottom graph shows axes 2 and 6 (7.2% and 2.9%), labelled and coloured by application (if known) or sub-morphotype. Here the fresh-market cabbages (light blue) group together, the savoy accessions (dark blue) separate out along with several rare morphotypes (brown colours), and the red cabbages (red) group closely with the white cabbages developed for storage and industrial applications (dark green).


GWAS: TASSEL Analysis

The GWAS itself was conducted in TASSEL with the phenotype data, genotype data and population structure as input. Associations were calculated without any population structure correction and with the PCO or STRUCTURE correction, for both the complete (leaf) dataset and the cabbage subset with the additional data on the cabbage leaves, core lengths and head weights.

To check whether the phenotypic data were normally distributed, we first calculated the residuals of the traits using a one-way ANOVA (univariate linear model) on the trait data per accession. These residuals were plotted in QQ-plots. All traits were approximately normally distributed apart from petiole length in cabbage leaves, which likely deviated due to errors in the measurements of this trait (Figure 18). This reflected the histograms per trait, where the petiole data showed a combination of very low and very high values due to the differences in petiole size between morphotypes (Figure 17). The data were not transformed for GWAS.

Figure 17: Histograms showing the distribution of the traits leaf length and petiole (stem) area for the complete data on the largest leaves. The histograms show the two main patterns seen in the analysis: either a clear bell curve, mostly seen in the leaf lamina traits, or two extreme sets of values, mainly seen in the petiole length and petiole area traits, where some morphotypes exhibit far larger values than the mean and errors in the Halcon measurements had more effect.


Figure 18: QQ-plots of the residuals used to test normality, shown for four traits. The top two plots are from the whole dataset and the lower two from the cabbage subset. Nearly all traits showed a normal distribution (as shown for the whole leaf dataset traits leaf lamina area, top left, and leaf lamina length, top right, and for cabbage leaf width, bottom left). The only exception was the petiole area trait of the cabbage leaves (bottom right); the data were not transformed for GWAS.

The PP-plots of the GWAS P-values show how well the PCO correction accounts for relatedness between accessions compared with the STRUCTURE correction and with no correction (Yu et al., 2006). In these plots, the observed P-values from the GWAS were plotted against their expected distribution (Figure 19); the closer the observed values lie to the diagonal, the better the correction. Using all 30 PCO axes gave a clearly better correction than using only the first 10 or 20 axes.


[Figure 19, panels A to D: observed P-values against expected P-values (P-expected), with lines for No Correction, PCO and STRUCTURE]

Figure 19: PP plots from the whole dataset (A: leaf lamina length, B: leaf lamina area) and the cabbage subset (C: largest leaf lamina area, D: cabbage leaf lamina area). The PCO correction performed better than STRUCTURE, though its actual effectiveness varies per trait. The PCO correction performs better in the cabbage subset than in the full dataset, but there is less population structure to begin with, as the No Correction line is also much closer to the diagonal.

The 30-axis PCO corrects for population structure as well as or better than STRUCTURE for both the leaf dataset and the cabbage subset, so the analysis of the GWAS results focuses on the output generated using the PCO correction. For some traits, mainly petiole length and leaf lamina length, a considerable gap remains between the observed P-values from the 30-axis PCO and the expected values, so there is likely still population structure that is not corrected for. The cabbage PP-plots lie closer to the diagonal, so the correction works better there. This is because the cabbage subset is more homogeneous and less structured than the full dataset of all the different morphotypes and accessions. The lines of the P-values without any correction are always furthest from the diagonal. The traits with the highest numbers of low P-value markers in the GWAS were those where the correction for population structure was least effective, as can be seen from the PP-plot of Leaf lamina length and the TASSEL-generated Manhattan plot for this trait (Figure 20).

Figures 20, 21 and 22 show a selection of the Manhattan plots made from the GWAS results; the remaining plots can be found in the Appendices. Some LOD scores are very high, with many markers above the significance threshold (Figure 20). Traits for which the population structure correction performed better (Figure 19) showed lower LOD scores, though many markers are still significant (Figure 21). The LOD scores for the cabbage traits are much lower than those of the traits from the complete dataset (Figure 22).


Figure 20: Manhattan plot for the trait Leaf Lamina Length, analysed with PCO, STRUCTURE and without structure corrections. These plots are based on the complete leaf dataset. Thresholds are at LOD=5.0, with alpha at 0.01.

Figure 21: Manhattan plot for the trait Leaf Lamina Area, analysed with PCO, STRUCTURE and no structure corrections. These plots are based on the complete leaf dataset. Thresholds are at LOD=5.0, with alpha at 0.01.

Figure 22: Manhattan plot for the trait Leaf Lamina Area for the largest leaves, analysed with PCO. This plot is based on the cabbage dataset. Thresholds are at LOD=3.8, with alpha at 0.05.

Our initial selection criteria were: a high LOD score for the marker in the PCO analysis, high LOD scores for neighbouring markers, a significant LOD score in the STRUCTURE-corrected GWAS, and a similar or higher score in the PCO-corrected GWAS than in the uncorrected analysis. These criteria gave 48 very promising markers out of the original 4053 significant markers. Using the reverse approach, checking known GOI locations against our list of significant markers, another 31 markers were selected for analysis. However, the regions around these 79 markers already contained more than 800 genes, too many to research manually, so a final selection was made based on the effect of the markers (Figure 23): a marker had to have a clear effect on the phenotype, visible in at least one morphotype group.
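The four initial selection criteria can be expressed as a simple filter over the GWAS output. The sketch below is hypothetical: it assumes one LOD threshold for both the PCO and the STRUCTURE analyses, treats markers within 50 kb on the same chromosome as neighbours, and interprets "similar or higher" as at least 90% of the uncorrected LOD. None of these parameter values are taken from the thesis, and the markers are invented.

```python
from dataclasses import dataclass


@dataclass
class MarkerResult:
    chrom: str
    pos: int
    lod_pco: float        # LOD in the PCO-corrected GWAS
    lod_structure: float  # LOD in the STRUCTURE-corrected GWAS
    lod_none: float       # LOD without structure correction


def select_markers(results, threshold, window=50_000, min_neighbours=1, tolerance=0.9):
    """Apply the four selection criteria (illustrative parameter values)."""
    selected = []
    for m in results:
        # 1. high LOD in the PCO analysis
        if m.lod_pco < threshold:
            continue
        # 2. supported by other high-LOD markers nearby on the same chromosome
        neighbours = [
            o for o in results
            if o is not m and o.chrom == m.chrom
            and abs(o.pos - m.pos) <= window and o.lod_pco >= threshold
        ]
        if len(neighbours) < min_neighbours:
            continue
        # 3. also significant after the STRUCTURE correction
        if m.lod_structure < threshold:
            continue
        # 4. PCO LOD similar to or higher than the uncorrected LOD
        if m.lod_pco < tolerance * m.lod_none:
            continue
        selected.append(m)
    return selected


# Hypothetical GWAS output for four markers
results = [
    MarkerResult("C03", 1_000_000, 9.8, 9.0, 6.4),
    MarkerResult("C03", 1_020_000, 6.0, 5.5, 5.0),
    MarkerResult("C05", 2_000_000, 15.5, 16.1, 16.3),  # no supporting neighbour
    MarkerResult("C09", 3_000_000, 4.5, 4.7, 6.3),     # below the threshold
]
for m in select_markers(results, threshold=5.0):
    print(m.chrom, m.pos)
```

The neighbour search compares every pair of markers, which is fine for a sketch; for thousands of significant markers, sorting by chromosome and position first would be faster.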

Figure 23: Example of an effect plot for the trait Total Leaf Length and marker 16946057, plotted per morphotype group: broccoli, cauliflower, heading and rest. The broccoli group, where the effect is clearest, is highlighted: broccolis with AA alleles have longer leaves than those with CC alleles, and heterozygous broccolis (M, carrying both an A and a C allele) show an intermediate phenotype. Similar patterns can be seen in the Heading and Rest (blue) groups.

The effect plot selection limited the list of genes to 200. All 200 genes were entered in the Brassica genome browser to find their annotations. Based on the gene functions, we selected all genes whose function could potentially explain their association with the measured morphotype traits and highlighted them as new genes of interest (Table 7). 26 GOI were found in the regions surrounding 16 markers; 7 of these 26 GOI were identified through the reverse approach.
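The visual check on the effect plots could be partly automated by computing the mean phenotype per genotype class within each morphotype group. The sketch below is a hypothetical illustration: it treats a "clear effect" as group means that change monotonically from AA through the heterozygote M to CC, which is only one possible formalisation of the criterion used here, and the leaf lengths are invented.

```python
from collections import defaultdict
from statistics import mean


def genotype_means(groups, genotypes, phenotypes):
    """Mean phenotype per (morphotype group, genotype) cell; None genotypes are skipped."""
    cells = defaultdict(list)
    for group, genotype, value in zip(groups, genotypes, phenotypes):
        if genotype is None:
            continue
        cells[(group, genotype)].append(value)
    return {cell: mean(values) for cell, values in cells.items()}


def shows_ordered_effect(means, group, order=("AA", "M", "CC")):
    """True if the group's means change monotonically from AA through M to CC."""
    values = [means.get((group, genotype)) for genotype in order]
    if any(v is None for v in values):
        return False  # a genotype class is absent in this group
    increasing = all(a < b for a, b in zip(values, values[1:]))
    decreasing = all(a > b for a, b in zip(values, values[1:]))
    return increasing or decreasing


# Hypothetical total leaf lengths (cm)
groups = ["Broccoli"] * 6 + ["Cauliflower"] * 4
genotypes = ["AA", "AA", "M", "M", "CC", "CC", "AA", "CC", "AA", None]
lengths = [52, 54, 47, 45, 40, 41, 48, 49, 47, 50]
means = genotype_means(groups, genotypes, lengths)
print(shows_ordered_effect(means, "Broccoli"), shows_ordered_effect(means, "Cauliflower"))
```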

These new genes include several from the same families as known GOI, or with similar functions, such as MYB transcription factors and UBPs (ubiquitin carboxyl-terminal hydrolases), but many are new genes that interact with better-known leaf development genes (Table 7). FANTASTIC FOUR 2 (FAF2) regulates the size of the shoot meristem via the WUS feedback loop, whereas ULT2 negatively regulates the size of the WUS-expressing centre in meristems. TCP4 plays a pivotal role in the control of morphogenesis of shoot organs, and RH10 is involved in leaf polarity establishment by repressing the abaxial ARF, KAN and YAB genes as well as KNOX/KNAT genes.


Table 7: Genes of interest 2018. 26 GOI were found in the regions surrounding 16 markers; 7 of these were known genes of interest that were previously identified as being related to leaf morphology, the others are new discoveries and related genes.

Marker (in the source each ID carries the prefix of its VCF file, e.g. SC05.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C05_720711): C05_720711, C05_720711, C05_720711, C06_21783228, C06_21783228, C05_6982361, C05_6982361, C05_6982361, C05_6982361, C09_10441845, C09_10441845, C01_1271311, C02_12799964, C04_11150606, C06_3219896, C03_16946057, C09_5283910, C09_27947404, C09_27947404, C01_35748674, C01_35748674, C01_19361976, C04_9822547, C00_18068605, C00_18068605, C00_72020898

Trait: MeanWidthLeaf, MeanWidthLeaf, MeanWidthLeaf, AreaStem, AreaStem, LengthStemLargest, LengthStemLargest, LengthStemLargest, LengthStemLargest, LengthStemLargest, LengthStemLargest, WidthLeaf, AreaPetioleLargest, AreaLeaf, CoreLength, TotalLength, LengthLeaf, LengthStemCabbage, LengthStemCabbage, CoreLength, CoreLength, LengthLeaf, LengthLeaf, LengthStem, LengthStem, TotalLength

Region of interest: C05: 670711..770711; C05: 670711..770711; C05: 670711..770711; C06: 21733228..21833228; C06: 21733228..21833228; C05: 6932361..7032361; C05: 6932361..7032361; C05: 6932361..7032361; C05: 6932361..7032361; C09: 10391845..10491845; C09: 10391845..10491845; C01: 1221311..1321311; C02: 12749964..12849964; C04: 11100606..11200606; C06: 3169896..3269896; C03: 16896057..16996057; C09: 5233910..5333910; C09: 27897404..27997404; C09: 27897404..27997404; C01: 35698674..35798674; C01: 35698674..35798674; C01: 19311976..19411976; C04: 9772547..9872547; Scaffold000256: 211693..311693; Scaffold000256: 211693..311693; Scaffold000457: 97838..148387

Significant in other traits: - - - TotalArea TotalArea - - - - - TotalArea AreaLeaf MeanWidthLargestLeaf TotalLength LengthLeaf WidthLeaf - LengthLeaf AreaLeaf TotalLength MeanWidthLeaf LengthStem - - - - TotalLength MeanWidthLeaf AreaLeaf WidthLeaf TotalLength AreaLeaf LengthLeaf LengthLeaf LengthLeaf

Known GOI: - - - - - UBP15 UBP15 UBP15 UBP15 SPR1 SPR2 SPR1 SPR2 ROT3 CLAVATA3 MYB81 MYB104 - - - APRR3 TOE2 APUM19 APRR3 TOE2 APUM19 - - KAN2 LNG1 - - -

Bol gene code: Bol040804, Bol040788, Bol040785, Bol039158, Bol039157, Bol038299, Bol038288, Bol038287, Bol038285, Bol032553, Bol032552, Bol029002, Bol028637, Bol027782, Bol026171, Bol025737, Bol019260, Bol017253, Bol017251, Bol015767, Bol015765, Bol014057, Bol011079, Bol007769, Bol007768, Bol001263

UniProt code: Q8GXU9, Q94AH6, Q9SRY3, Q9ZNU2, Q9C932, Q9C7S5, Q9FPT1, Q8LE98, Q9FPS9, Q9SJW3, Q9ZPR1, Q9M066, P82280, Q9SM27, Q8GWP0, Q8LPR5, Q9ZWA6, Q9LVG4, Q9LVG2, Q9SK55, Q940T9, Q700D9, P93002, Q8RX28, Q8GY84, Q8S8I2

New GOI: FAF2, CUL1, RALF1, NAC18, NAC19, PSY1R, UBP12, ICR1/ROP, UBP15, SPR1, CD48B, ROT3, RAV2, MYB104, MYB39, TCP4, MAGPIE, APRR3, TOE2, JUB1/NAC42, COL4, KAN2, NPR6/BOP1, HDA5, RH10, ULT2

PCO P-value: 0.000108297, 0.000108297, 1.36651E-05, 1.36651E-05, 1.36651E-05, 2.95654E-16, 2.95654E-16, 3.93316E-06, 3.93316E-06, 3.93316E-06, 3.93316E-06, 1.12602E-06, 9.71102E-06, 6.10605E-06, 1.71732E-10, 1.01942E-19, 3.40699E-05, 3.40699E-05, 6.06035E-14, 5.51074E-10, 1.30999E-06, 8.2346E-09, 8.2346E-09, 0.0000308, 0.0000221, 0.0000221

PCO LOD: 15.53, 15.53, 18.99, 13.22, 4.86, 4.86, 4.86, 5.41, 5.41, 5.41, 5.41, 3.97, 3.97, 5.95, 5.01, 5.21, 4.51, 9.77, 4.47, 4.47, 4.66, 4.66, 9.26, 8.08, 8.08, 5.88

NoStructure LOD: 16.26, 16.26, 25.48, 26.49, 78.48, 35.45, 44.06, 2.90, 2.90, 2.90, 7.91, 7.91, 7.91, 7.91, 3.51, 3.51, 3.82, 6.41, 8.51, 6.32, 6.32, 6.56, 6.56, 8.07, 8.07, 5.72

STRUCTURE LOD: 16.12, 16.12, 17.40, 10.48, 14.00, 12.21, 12.21, 11.85, 5.74, 5.74, 5.74, 5.28, 5.28, 5.28, 5.28, 1.80, 1.80, 0.89, 4.74, 9.02, 9.24, 0.94, 0.94, 4.89, 4.89, 2.80

Table 8: The list of candidate genes and their functions (source: UniProt gene entries). Known genes of interest were found in previous studies to be related to leaf morphology, either from literature studies or in previous GWAS on subsets of the collection we used.

Gene (UniProt code): Function description

ULT2 (Q8S8I2): Putative transcription factor that acts as a key negative regulator of cell accumulation in shoot and floral meristems. Negatively regulates the size of the WUSCHEL (WUS)-expressing organizing centre in inflorescence meristems.

RH10 (Q8GY84): Involved in leaf polarity establishment by functioning cooperatively with AS2 to repress the abaxial genes ARF3, ARF4, KAN1, KAN2, YAB1 and YAB5, and the KNOX homeobox genes KNAT1, KNAT2, KNAT6 and STM, to promote adaxial development in leaf primordia at shoot apical meristems.

HDA5 (Q8RX28): Responsible for histone deacetylation, which gives a tag for epigenetic repression and plays an important role in transcriptional regulation, cell cycle progression and developmental events.

NPR6/BOP1 (Q9M1I7): May act as a substrate-specific adapter of an E3 ubiquitin-protein ligase complex (CUL3-RBX1-BTB). Key positive regulator of the SA-dependent signalling pathway that negatively regulates the JA-dependent signalling pathway.

KAN2 (Q9C616): Establishing abaxial identity. Known GOI (Feng Cheng et al., 2016).

COL4 (Q940T9): CONSTANS-LIKE 4, regulation of flower development and response to light stimulus. Transcription factor that acts in the long-day flowering pathway and may mediate between the circadian clock and the control of flowering.

JUB1/NAC42 (Q9SK55): Transcription factor that is a negative regulator of leaf senescence.

TOE2 (Q9LVG2): Negatively regulates the transition to flowering and confers a flowering time delay; involved in the ethylene signalling pathway. Known GOI (Eggelen, 2017).

APRR3 (Q9LVG4): Controls the photoperiodic flowering response. Component of the circadian clock. Controls the degradation of APRR1/TOC1. Known GOI (Eggelen, 2017).

MAGPIE (Q9ZWA6): Zinc finger transcription factor that regulates tissue boundaries and asymmetric cell division.

TCP4 (Q8LPR5): Transcription factor playing a pivotal role in the control of morphogenesis of shoot organs by negatively regulating the expression of boundary-specific genes such as CUC genes. Involved in cell differentiation, cotyledon morphogenesis, embryo development, leaf development, leaf morphogenesis, senescence, etc. (Mao et al., 2014).

MYB39 (Q8GWP0): DNA-binding transcription factor, involved in cell differentiation.

MYB104 (Q9SM27): Transcription factor, involved in cell differentiation.

RAV2 (P82280): Protein related to APETALA2. Probably acts as a transcriptional activator. Transcriptional repressor of flowering time in long-day plants.

ROT3 (Q9M066): Involved in brassinosteroid biosynthesis, which is part of plant hormone biosynthesis. Known GOI (Xiao et al., 2014).

CD48B (Q9ZPR1): Functions in cell division and growth processes (cytoskeleton and growth polarity).

SPR1 (Q9SJW3): Required for directional control of cell elongation. Stabilizes the growing ends of cortical microtubules and influences their dynamic properties. Known GOI (Topper, 2016).

UBP15 (Q9FPS9): Recognizes and hydrolyzes the peptide bond at the C-terminal Gly of ubiquitin. Involved in the processing of poly-ubiquitin precursors as well as that of ubiquitinated proteins. Known GOI (Slob, 2016).

ICR1/ROP (Q8LE98): Required for primary and adventitious root maintenance. Regulates the directionality of polar auxin transport and is required for the formation of a stable auxin maximum and tip-localized auxin gradient during embryogenesis, organogenesis and meristem activity. Involved in exocytosis and in the recycling of PIN proteins back to the plasma membrane.

UBP12 (Q9FPT1): Recognizes and hydrolyses the peptide bond at the C-terminal Gly of ubiquitin. Involved in the JA-mediated signalling pathway and (de)ubiquitination, amongst others. Related to known GOI.

PSY1R (Q9C7S5): Regulates, in response to tyrosine-sulphated glycopeptide binding, a signalling cascade involved in cellular proliferation and plant growth.

NAC19 (Q9C932): Encodes a transcription factor involved in the elaboration of shoot apical meristems (SAM).

NAC18 (Q9ZNU2): Encodes a transcription factor involved in the elaboration of shoot apical meristems (SAM).

RALF1 (Q9SRY3): Cell signalling peptide that regulates plant stress, growth and development. Inhibits plant growth (e.g. root and leaf length).

CUL1 (Q94AH6): Together with SKP1, RBX1 and an F-box protein, it forms an SCF complex. The functional specificity of this complex depends on the type of F-box protein. SCF(UFO) is implicated in floral organ development. SCF(TIR1) is involved in the auxin signalling pathway. Other versions of the complex are involved in jasmonate response and senescence, amongst others.

FAF2 (Q8GXU9): Regulates the size of the shoot meristem by modulating the CLV3-WUS feedback loop. Can repress WUS but is under negative control by CLV3.


Conclusions & Discussion

To conclude this thesis, we return to the original research questions. How can we best correct for population structure in B. oleracea? Can we establish genetic links between leaf morphology and heading traits in cabbage? What are significant regions of interest on the B. oleracea genome for breeding leaf-related traits? Are leaf formation genes, previously highlighted as candidate genes, associated with particular phenotypes in leaves or cabbage heads? Can we identify new candidate genes involved in leaf formation or development? It is fair to say that all these questions have been addressed in our analyses and many have been answered, though at least as many new questions have been raised in return.

To address the goals and research questions of this thesis, we wished to elucidate the degree of LD in the B. oleracea genome and possible domestication blocks in the genome. The analysis in TASSEL yielded an LD decay distance of 150 kb, with many markers showing unexpectedly high LD over large distances, including between different chromosome arms and between different chromosomes. Previous studies reported considerably shorter LD decay distances than we found. Theo Borm of the Wageningen UR Plant Breeding Department also looked at LD in the WGS dataset he is analysing to replace the current SBG data, which showed an LD decay distance of about 25 kb across the B. oleracea genome (Theo Borm, personal communication). This distance is shorter than many of the distances between the clusters of SNPs in the SBG data. If this is indeed the real LD decay distance for B. oleracea, it would be difficult to identify tight linkage in the SBG dataset, as the distance between markers that could be linked is larger than the distance over which LD decays. However, future work could still look at larger selective sweeps and conserved morphotype-specific regions.
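To make the notion of an LD decay distance concrete, the sketch below computes pairwise r² as the squared correlation of allele dosages (a common approximation for unphased genotype data; TASSEL's own computation may differ) and reports the first distance bin in which the mean r² drops below a chosen cutoff. The bin size and cutoff are free parameters chosen for illustration, and the example values are synthetic.

```python
import numpy as np


def genotype_r2(x, y):
    """r^2 between two markers as the squared Pearson correlation of allele
    dosages (0/1/2); accessions missing either marker (np.nan) are dropped."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = ~(np.isnan(x) | np.isnan(y))
    x, y = x[ok], y[ok]
    if x.size < 3 or x.std() == 0 or y.std() == 0:
        return np.nan  # too few shared calls or a monomorphic marker
    r = np.corrcoef(x, y)[0, 1]
    return float(r * r)


def decay_distance(distances, r2_values, bin_size, cutoff):
    """Upper edge of the first distance bin whose mean r^2 drops below `cutoff`."""
    d = np.asarray(distances, dtype=float)
    r2 = np.asarray(r2_values, dtype=float)
    keep = ~np.isnan(r2)
    d, r2 = d[keep], r2[keep]
    bins = (d // bin_size).astype(int)
    for b in np.unique(bins):  # np.unique returns the bins in ascending order
        if r2[bins == b].mean() < cutoff:
            return int((b + 1) * bin_size)
    return None  # r^2 never fell below the cutoff in the sampled range


# Synthetic marker pairs: distance (kb) and r^2
print(genotype_r2([0, 1, 2, 0, 2], [0, 1, 2, 0, 2]))
print(decay_distance([10, 20, 60, 70, 150], [0.9, 0.8, 0.5, 0.4, 0.1], bin_size=50, cutoff=0.5))
```

Because population structure inflates r² between unlinked markers, the same calculation on a structured collection can report a longer decay distance than the true linkage-driven one.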

One interesting observation was a difference between the analysis of the full genomic dataset and that of a dataset filtered to retain only the more common SNPs (80% missing-data cut-off, MAF of at least 2.5%): the full genomic dataset had many instances of LD with the same r² over long distances, forming horizontal lines in the LD decay plots. These lines were present in the gene bank accessions but not in the modern hybrids. We therefore hypothesised that a small number of accessions share multiple rare alleles that are not present in the rest of the dataset. In that case, the LD between those markers would not be actual LD but would be caused by population structure. On further study, most of these LD points were at loci where only a few lines had SNPs, and these were often the same accessions, even on different chromosomes. They were wild and ornamental morphotypes, which are more specialised and therefore likely to carry different alleles from the other accessions. However, these accessions did not explain all the points of LD over long distances. It would nevertheless be interesting to investigate whether these accessions with rare SNPs also show very different phenotypes from the rest of the accessions.
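The filtering, and the search for accessions that share rare alleles, can be sketched with NumPy as below. The default thresholds (at most 80% missing calls, MAF of at least 2.5%) are an assumption read from the names of the filtered VCF files (MAX80MISSING, MAF2.5); the function names and the genotype matrix are hypothetical, with genotypes coded as allele dosages 0/1/2 and np.nan for missing calls.

```python
import numpy as np


def marker_stats(geno):
    """Per-marker missing fraction, alternative-allele frequency and MAF.

    geno: accessions x markers array of allele dosages (0/1/2), np.nan = missing.
    """
    geno = np.asarray(geno, dtype=float)
    missing = np.isnan(geno).mean(axis=0)
    called = (~np.isnan(geno)).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        alt_freq = np.nansum(geno, axis=0) / (2 * called)
    maf = np.minimum(alt_freq, 1 - alt_freq)
    return missing, alt_freq, maf


def keep_markers(geno, max_missing=0.8, min_maf=0.025):
    """Boolean mask of markers passing the missing-data and MAF filter."""
    missing, _, maf = marker_stats(geno)
    return (missing <= max_missing) & (maf >= min_maf)


def rare_allele_counts(geno, max_maf=0.025):
    """Per accession, the number of rare markers (MAF < max_maf) at which it carries the minor allele."""
    geno = np.asarray(geno, dtype=float)
    _, alt_freq, maf = marker_stats(geno)
    rare = maf < max_maf
    # The minor allele is the alternative allele when its frequency is below 0.5.
    # Comparisons with np.nan are False, so missing calls never count as carriers.
    carries_minor = np.where(alt_freq < 0.5, geno > 0, geno < 2)
    return carries_minor[:, rare].sum(axis=1)


# Hypothetical genotypes: four accessions x three markers
geno = np.array([
    [0, 0, np.nan],
    [0, 0, np.nan],
    [0, 1, np.nan],
    [2, np.nan, 2],
])
print(keep_markers(geno, max_missing=0.5, min_maf=0.1))
print(rare_allele_counts(geno, max_maf=0.2))
```

Accessions with high counts from `rare_allele_counts`, recurring across chromosomes, would be candidates for the group of wild and ornamental lines behind the horizontal lines of LD.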

The SBG data was hard to convert to formats suitable for programs other than TASSEL, and the analysis in TASSEL limited the depth of the analysis. We believe many of the instances of LD between markers that are far apart on the genome are not true LD, but artefacts caused by population structure. TASSEL is unable to correct for population structure in its LD analysis, so in future research we suggest redoing the LD analysis using software that does allow for population structure correction. The open-source environment R in particular has many existing tools that can be adapted relatively easily to our data. GAPIT, for example, is an R package that implements association models similar to those in TASSEL; being newer and more refined, it may offer better options for population structure correction and output display, without the need to export data to Excel to make graphs (which is problematic because of the large file sizes). If the data is set in the right format, it will also be easier to run it in multiple programs and compare the output. Should the SBG data be used again for an LD analysis, it is vital that the program or package handles missing data points well. Another option would be to switch to a higher-quality dataset, such as the Whole Genome Shotgun sequencing data currently being developed. This should allow for a more trustworthy estimation of LD in our collection. The distance taken as the region of interest around a marker associated with a trait should then be adjusted based on the LD value found in such a future analysis, and on whether that value is constant over the genome and between accessions. In this thesis the standard was 50 kb on either side, but this may have to change depending on the morphotypes in which the effect is seen, the region of the genome in which the marker lies and the general LD decay distance in our dataset.

Singh and Singh give a useful list of points to take into account in LD analyses: “(1) LD decay is much more rapid in outcrossing than in selfing species. (2) The extent of LD is much higher in cultivars and breeding lines than in wild accessions and land races of a crop species. … (4) Different marker systems are likely to provide different estimates of LD. (5) The extent of LD may vary markedly among the different regions of a genome. (6) Collections of germplasm accessions with narrow genetic base show longer LD blocks than those having broad genetic base. (7) The size of LD blocks and the abundance of LD determine the power and precision of AM. (8) Finally, the pattern of LD is greatly influenced by a variety of factors, including population structure and genetic drift.” (Singh & Singh, 2015). Point 4 supports our finding that the measured LD differs between studies, whereas point 5 would be interesting to study in a more in-depth analysis of LD in B. oleracea. Point 5 may also affect the findings of our GWAS: for all markers associated with a trait we take the same region around them to look for causal genes, while in theory the size of this region of interest should depend on the actual LD distance in that region of the genome. Point 6 would not be an issue in our collection, as it contains material from a diverse range of companies and gene banks. Finally, the degree of LD also influences how finely we can map genes to the markers used in GWAS (Flint-Garcia, Thornsberry, & Buckler, 2003; Myles et al., 2009).

The Halcon scripts developed for phenotyping traits work well and allow us to analyse large numbers of images. However, the analysis by the scripts could be aided by standardising the data collection: always using only one leaf per photograph, always placing the edge of the petiole on a pre-determined line, and adding more lights to avoid shadows forming around the leaves. There is also not yet a script to analyse the pictures of the cabbage heads, and scripts to analyse the 3D images from 2016 will soon be available; both will add data for future studies. The two-dimensional analysis of the leaves is bound to underestimate all traits, as Brassica leaves are never completely flat and their curvature cannot be taken into account. To address this, the 3D analysis could be used to develop a ratio between the 2D estimates and the actual, 3D-measured dimensions of leaves and cabbage heads. Even with a 3D analysis, however, some leaves, especially those of the curly kale accessions, are simply impossible to quantify accurately due to their extremely curved and folded shape. Nevertheless, the Halcon script for the 2D pictures was sound and measured the pictures well, and as many Brassica leaves are relatively similar in their curvature, the 2D data still reflects the relative sizes and dimensions of the majority of accessions.
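A 2D-to-3D ratio of the kind proposed above could be estimated per morphotype group as a least-squares slope through the origin, fitted on leaves measured both ways. This is a hypothetical sketch assuming NumPy and paired 2D/3D measurements; the group names and areas are invented, and a slope through the origin is only one reasonable choice of calibration model.

```python
import numpy as np


def calibration_ratio(size_2d, size_3d):
    """Least-squares slope through the origin, so that size_3d ~= k * size_2d."""
    x = np.asarray(size_2d, dtype=float)
    y = np.asarray(size_3d, dtype=float)
    return float(np.dot(x, y) / np.dot(x, x))


def ratios_by_group(groups, size_2d, size_3d):
    """One calibration ratio per morphotype group, since curvature differs between groups."""
    groups = np.asarray(groups)
    x = np.asarray(size_2d, dtype=float)
    y = np.asarray(size_3d, dtype=float)
    return {str(g): calibration_ratio(x[groups == g], y[groups == g]) for g in np.unique(groups)}


# Hypothetical paired leaf areas (cm^2) from 2D images and 3D scans
groups = ["Cabbage", "Cabbage", "Broccoli", "Broccoli"]
area_2d = [400.0, 500.0, 300.0, 350.0]
area_3d = [440.0, 550.0, 390.0, 455.0]
ratios = ratios_by_group(groups, area_2d, area_3d)
estimate_3d = ratios["Broccoli"] * 320.0  # 3D-equivalent estimate for a new 2D measurement
print(ratios, estimate_3d)
```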

Though the initial script did not yet function optimally, it improved over several iterations. Analysing the two-leaf and one-leaf images separately and optimising the barcode identification had the greatest effect, resulting in far fewer false, duplicate or missing entries. There were also some smaller issues, such as the pictures from one day being smaller and having a different background setting, which hampered recognition by the script, but these were relatively easily resolved. The script seems less capable of distinguishing the leaf from the background if the leaf is red-coloured, which could be improved in the future.

The data collection can also be optimised by using a smaller collection that still reflects the genotypic and phenotypic variation of the current full dataset, in which more traits, and even trait development over time, could then be observed. An analysis over the lifetime of the plants will give more insight into which genes are involved in leaf development, and will provide a more accurate estimation of the traits than measuring plants only at the time they are deemed mature, a judgement that is subject to human error. Furthermore, the curly kale accessions can be removed from the field trial, as their measurements reflect their actual and relative sizes too poorly to generate any usable results in the GWAS. The wild accessions, with their extremely small leaves, are, like the kales, another group with very different phenotypes from all the other accessions. They could be reviewed to see which contribute most to both the phenotypic and genotypic variation, and accessions that contribute little to either can be excluded from future studies. Lastly, it would be good to develop protocols for checking the quality of the Halcon data: which entries are deemed true or false, which tests are carried out, and the various steps and programs needed.

Trying a different method to correct for population structure in GWAS was a success, with the PCO correction performing better than STRUCTURE. PCOs are also more efficient in terms of computational time and allow for more post-hoc analysis of the results by plotting the different axes against one another. Conducting a GWAS on the cabbages, with a population structure correction calculated for these accessions only, also improved the correction we were able to make.
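For readers unfamiliar with PCO (principal coordinate analysis), the sketch below shows the classical (Torgerson) computation on a genetic distance matrix: double-centre the squared distances and take the leading eigenvectors, scaled by the square roots of their eigenvalues. It assumes NumPy; the choice of genetic distance is left to the user, and the three-accession input is a toy example rather than data from the collection. The resulting axes (30 in this thesis) are what enter the GWAS model as covariates.

```python
import numpy as np


def pcoa(dist, n_axes):
    """Classical principal coordinate analysis of a symmetric distance matrix.

    Returns the first `n_axes` coordinates per accession and the fraction of
    the (positive) variation each axis explains.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (d ** 2) @ centring  # double-centred squared distances
    eigval, eigvec = np.linalg.eigh(b)          # eigh: b is symmetric
    order = np.argsort(eigval)[::-1]            # largest eigenvalues first
    eigval, eigvec = eigval[order], eigvec[:, order]
    coords = eigvec[:, :n_axes] * np.sqrt(np.clip(eigval[:n_axes], 0, None))
    explained = eigval[:n_axes] / eigval[eigval > 0].sum()
    return coords, explained


# Toy example: three accessions lying on a line at positions 0, 1 and 3
x = np.array([0.0, 1.0, 3.0])
dist = np.abs(x[:, None] - x[None, :])
coords, explained = pcoa(dist, n_axes=1)
print(coords[:, 0], explained)
```

The percentage of genetic variation explained by a set of PCO axes, as discussed below, corresponds to the sum of `explained` over those axes.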

Performing GWAS on subpopulations allows for a better correction because the overall structure effects are reduced; cabbages, for example, are more closely related to each other than to Chinese kales. There is less structure to correct for: in our case, a PCO that explained only 6% more of the genetic variation than the PCO for the full population still left much less population effect in the GWAS. However, performing GWAS in subpopulations does not only reduce structure effects but also reduces the total genetic variation. A successful GWAS still needs segregating traits, and restricting the analysis too much to specific morphotypes reduces the genetic and phenotypic variation. If this is reduced too far, the quality of the GWAS can be compromised, as there are fewer segregating markers to map and less phenotypic variation to relate them to. Therefore, we recommend that future studies do not simply focus on subpopulations, as this carries too much risk of losing important variation. We suggest forming subpopulations only if there is a large group of accessions with genetic and phenotypic variation and a clear subpopulation structure (with large subgroups), such as the heading cabbages. Instead, it would be good to limit the total dataset slightly: removing accessions that are genotypically very different from the rest but do not show clear phenotypic variation could improve the PCO correction. Another option is to increase the number of SNPs used to calculate the PCOs.

Using marker-based kinship in conjunction with PCOs to correct for population structure is also an option that should be investigated, as the PCOs currently explain a relatively low percentage of the genotypic variation. In B. rapa, kinship by itself was found to be less effective because there is more variation within the morphotype groups than between them (Carpio et al., 2011). In addition, in B. rapa many morphotypes show more genetic similarity to phenotypically different morphotypes from the same geographical region than to phenotypically similar morphotypes from other regions, making it hard to distinguish groups using this pairwise method (Zhao et al., 2005). The effectiveness of the marker-based kinship correction in B. oleracea has not been published, though, and in other crops kinship and PCO correction methods have been used successfully.
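As an illustration of what a marker-based kinship matrix is, the sketch below implements one widely used estimator, the first method of VanRaden (2008), which centres allele dosages on their expected values and scales by 2 times the sum of p(1 - p) over markers. It is not necessarily the estimator TASSEL or GAPIT would apply; it assumes NumPy and complete genotype data coded 0/1/2, and the genotypes are invented.

```python
import numpy as np


def vanraden_kinship(geno):
    """Marker-based kinship matrix (VanRaden 2008, first method).

    geno: accessions x markers array of allele dosages 0/1/2 without missing values.
    """
    m = np.asarray(geno, dtype=float)
    p = m.mean(axis=0) / 2            # alternative-allele frequency per marker
    z = m - 2 * p                     # centre each marker on its expected dosage
    return z @ z.T / (2 * np.sum(p * (1 - p)))


# Hypothetical genotypes: four accessions x five markers
geno = np.array([
    [0, 0, 2, 1, 0],
    [0, 1, 2, 1, 0],
    [2, 2, 0, 1, 2],
    [2, 1, 0, 0, 2],
])
k = vanraden_kinship(geno)
print(np.round(k, 2))
```

In a mixed model this matrix would enter as the covariance of the random accession effect, alongside the PCO axes as fixed covariates.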

The GWAS generated too much data to analyse fully in this thesis. The selection that could be studied revealed several interesting new genes that may affect leaf morphology; further studies can validate these in molecular analyses. Many significant marker-trait associations remain to be analysed in future projects for possible genes of interest affecting leaf morphology. To facilitate this, a data file has been made with all the significant associations we found, which future students can use to develop further selection criteria and standardise the analysis. By keeping a file with all the associations found in the studies on this dataset, an overview can be built in which one can filter on various criteria, such as certain LOD scores, significance for other traits, proximity to known genes of interest, or significance in previous association analyses. Combining this with quantified attributes, such as the distance to the nearest other markers, the LOD scores of those neighbouring markers and a quantified effect of the different alleles, would further ensure the quality of the output. One change that would greatly reduce the labour-intensiveness of the analysis would be a way to automatically retrieve, for each marker one wishes to analyse, the region of interest, all the genes therein and their annotation from the Brassica genome browser (F Cheng et al., 2011). The markers that were excluded from our analysis by the population structure correction can also be studied further, as they may be involved in conferring certain traits on certain morphotypes.
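Part of the proposed automation can be sketched as follows: parse a TASSEL marker name, take 50 kb on either side (the standard used in this thesis) and list the genes overlapping that window from a local annotation. The sketch does not query the Brassica genome browser; the `Gene` records and their coordinates are hypothetical and would in practice be loaded from an annotation file. It reproduces chromosome-based regions such as C05: 670711..770711 in Table 7, but not the scaffold-based regions listed for the C00 markers.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int
    end: int


def parse_marker(marker_id):
    """'SC05.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C05_720711' -> ('C05', 720711)."""
    name = marker_id.split(":")[-1]  # drop the VCF file prefix if present
    chrom, pos = name.rsplit("_", 1)
    return chrom, int(pos)


def region_of_interest(marker_id, flank=50_000):
    """Region of `flank` bp on either side of the marker (clipped at position 0)."""
    chrom, pos = parse_marker(marker_id)
    return chrom, max(pos - flank, 0), pos + flank


def genes_in_region(region, annotation):
    """Genes from a local annotation list that overlap the region."""
    chrom, start, end = region
    return [g for g in annotation if g.chrom == chrom and g.start <= end and g.end >= start]


# Hypothetical annotation records
annotation = [
    Gene("geneA", "C05", 650_000, 672_000),  # overlaps the left edge
    Gene("geneB", "C05", 700_000, 705_000),  # inside the region
    Gene("geneC", "C05", 900_000, 910_000),  # outside the region
    Gene("geneD", "C06", 700_000, 705_000),  # other chromosome
]
region = region_of_interest("SC05.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C05_720711")
print(region, [g.gene_id for g in genes_in_region(region, annotation)])
```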

Determining a suitable threshold to correct for multiple testing was one issue we ran into. Previous theses used the False Discovery Rate (FDR) to correct for multiple testing, which we also tried for our data (Benjamini & Hochberg, 1995). The FDR can be calculated with the classical one-stage method using the P-values provided by TASSEL. The first FDR-adjusted P-value that is significant, called the q-value, was then inserted in the following formula to establish the threshold for significant LOD scores in a Manhattan plot:

$$\text{Threshold} = -\log_{10}(q\text{-value})$$

We attempted this because the FDR method is scalable, which would benefit us, as the population structure correction does not work equally well for all traits, causing variation in LOD scores. However, the FDR method would have required setting the acceptable FDR to very low values (smaller than 0.0001 for some traits), so we decided to use the Bonferroni method instead; moreover, since the same test is performed on all data, one should in theory also apply the same multiple testing correction to all data. In addition to this stringent threshold, we applied further criteria to determine which markers we actually analysed. Future analyses may wish to apply a less strict threshold to avoid too many false negatives. Such false negatives disappear in the background signal, whereas false positives, provided they are limited in number (unlike in some of our traits), are easier to weed out.
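The two multiple-testing thresholds discussed above can be compared with the sketch below (NumPy assumed). The Bonferroni LOD threshold is -log10(alpha / m) for m tests. For the FDR, the sketch applies the Benjamini-Hochberg step-up procedure and returns -log10 of the largest P-value declared significant; this is one common way to turn the FDR into a LOD threshold and may differ slightly from the q-value formula given above. The P-values are synthetic.

```python
import numpy as np


def bonferroni_lod(alpha, n_tests):
    """LOD threshold for a Bonferroni correction: -log10(alpha / n_tests)."""
    return float(-np.log10(alpha / n_tests))


def bh_lod_threshold(p_values, fdr):
    """LOD threshold from the Benjamini-Hochberg step-up procedure.

    Returns -log10 of the largest P-value declared significant at the given
    FDR, or None if no P-value passes.
    """
    p = np.sort(np.asarray(p_values, dtype=float))
    m = p.size
    critical = fdr * np.arange(1, m + 1) / m  # BH critical value for each rank
    passing = np.nonzero(p <= critical)[0]
    if passing.size == 0:
        return None
    return float(-np.log10(p[passing[-1]]))  # step-up: the largest passing rank


p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bonferroni_lod(0.05, len(p_values)), bh_lod_threshold(p_values, fdr=0.05))
```

The Bonferroni threshold rises with the number of markers tested but not with the strength of the signal, whereas the BH threshold adapts to the P-value distribution of each trait, which is why it drops sharply for traits whose structure correction left many low P-values.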

All the genes we found in this GWAS need molecular confirmation in future studies, to ensure that the markers, and the effects shown by their different alleles, are indeed linked to the proposed genes. Even so, we would like to briefly discuss some of the proposed genes of interest, their functions and how they might be linked to the traits the nearby markers are associated with. Many of our genes of interest have functions that are known to affect leaf morphology, either directly or by regulating other genes or hormones that affect leaf development. Two exceptions are the UBPs and MYBs: both families have shown associations with traits in previous theses on this dataset, but how they actually affect leaf traits is not always clear. In this thesis, the trait Leaf lamina area shows an association with a marker near MYB104, and Core length with a marker near MYB39. According to their GO annotations these genes have cell differentiation effects, but clear literature evidence is lacking. MYB domains are, however, also present in other important leaf development genes, such as the KANADIs (according to their GO annotation) and AS1 and related genes (Byrne et al., 2000). The exact role of these MYB transcription factors warrants further study. The UBPs present a similar case: both UBP12 and UBP15 show associations with mature leaf petiole length but have no clear function related to leaf morphology. Yet Floris Slob noted that UBP15 at least is known to have a role in cell proliferation (Y. Liu et al., 2008). Literature evidence on a relation between the function of UBP12 and leaf morphology is currently lacking, though one study shows a link between UBP12, circadian mechanisms and flowering regulation (Cui et al., 2013).

As mentioned, other genes do have clear links between their functions and the observed phenotypic effects. ULT2 and TCP4 are GOI for the Total length trait. ULT2 negatively regulates cell accumulation in the SAM and floral meristem; by negatively affecting WUS expression it stimulates organ formation, such as leaf development from the meristem (Carles, Choffnes-Inada, Reville, Lertpiriyapong, & Fletcher, 2005). TCP4 has an even more direct link to leaf morphology, being involved in shoot organ morphogenesis (Palatnik et al., 2003). Similarly, the GOI for Mean width leaf are FAF2, which regulates the size and maintenance of the SAM by interacting with WUS (Wahl, Brand, Guo, & Schmid, 2010), and RALF1, a signal peptide that regulates plant growth and development (Guerreiro, Pearce, Silva-Filho, & Moura, 2013). Leaf petiole area showed associations with markers close to NAC018 and NAC019, again transcription factors involved in elaboration of the SAM (Hickman et al., 2013). Based on the information currently available, many of these genes are mainly involved in meristem activity and the initiation of leaf formation. These fundamental processes are rather conserved, and mutations in these genes often lead to extreme phenotypes. It would be interesting to see whether these genes are the real cause of the phenotypic variation we observe, and what genetic variation exists between the accessions for these genes.

Another angle that deserves more time in future is the overlap between analyses. In our dataset, the overlap between associations found in the analysis of the whole dataset and in the analysis of the cabbage subset could be studied. The results for the mature leaves could be compared to highlight differences between the two correction methods. Additionally, QTLs that affect both the plant leaves and the cabbage head morphology could be identified through overlap in significant markers between the largest leaves and the cabbage leaves in our data. In our analysis, markers were selected based on their GWAS output for one trait, but 50% of our final selected markers are also significant for other traits. Many markers score high in several length traits, for example in both Total length and Leaf lamina length. The same pattern was seen for the area traits, and the leaf area traits are also often related to the length and width traits. The traits in the cabbage subset analysis showed fewer interrelations: of the 6 markers, only 1 was significant for another trait, likely because far fewer markers were significant in this analysis. None of our selected markers was selected in both the cabbage subset and the full dataset analyses, although the two analyses do show overlap. The extent of this overlap would be interesting to study further, in order to find markers, regions or even genes that affect both heading and leaf traits, one of our initial goals. There may be even more relations between traits than shown here, as many markers lie close together and neighbouring markers may be significant for other traits as well.
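The two overlap measures described above, the share of selected markers that are also significant for another trait and the overlap between analyses when neighbouring rather than identical markers are counted, could be computed along the following lines. This is a minimal sketch under stated assumptions: the marker IDs, chromosome names, positions and the window size are all hypothetical, and the input is assumed to be per-trait sets of markers that already passed the significance threshold.

```python
from typing import Dict, List, Set, Tuple

Position = Tuple[str, int]  # (chromosome, base-pair position); hypothetical format


def shared_fraction(selected: Dict[str, str], significant: Dict[str, Set[str]]) -> float:
    """Fraction of selected markers that are also significant for another trait.

    selected:    marker ID -> the trait it was selected for
    significant: trait -> set of marker IDs above the threshold for that trait
    """
    if not selected:
        return 0.0
    shared = sum(
        1
        for marker, own_trait in selected.items()
        if any(marker in hits for trait, hits in significant.items() if trait != own_trait)
    )
    return shared / len(selected)


def near_overlap(a: List[Position], b: List[Position], window: int) -> List[Position]:
    """Markers in `a` with at least one marker in `b` on the same chromosome within `window` bp."""
    return [
        (chrom, pos)
        for chrom, pos in a
        if any(c == chrom and abs(p - pos) <= window for c, p in b)
    ]


# Hypothetical data, for illustration only.
significant = {
    "TotalLength": {"m1", "m2"},
    "LengthLeaf": {"m1", "m3"},
    "AreaLeaf": {"m4"},
}
selected = {"m1": "TotalLength", "m4": "AreaLeaf"}
print(shared_fraction(selected, significant))  # 0.5

full_hits = [("C1", 1_000_000), ("C3", 5_000_000)]
cabbage_hits = [("C1", 1_050_000), ("C4", 5_000_000)]
print(near_overlap(full_hits, cabbage_hits, window=100_000))  # [('C1', 1000000)]
```

A sensible window would follow from the LD decay distances rather than a fixed number; the 100 kb used here is only a placeholder.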

Lastly, we checked whether GOI match across years and studies, and tried to find matches between different traits over the years, i.e. whether there are QTLs that affect leaf morphology but are also shown to affect head traits. Quite a number of our selected GOI, or genes from the same family, were found in previous theses, partly due to our use of a reverse analysis. However, these genes were often associated with different traits, because the traits we measured changed slightly and this is the first time we included nearly the complete collection, whereas most previous studies worked on data subsets focused on either leaf or heading morphology (Table 9).

Table 9: GOI and trait comparison with previous theses. Traits in brackets are other traits for which the marker was found to be associated in our analysis (Eggelen, 2017; Groot, 2016; Slob, 2016; Topper, 2016).

GOI 2018 | Trait 2018 | Previous GOI | Previous trait | Previous thesis & comments
--- | --- | --- | --- | ---
APUM19 | Length petiole cabbage leaf | APUM19 | Head weight | Thesis by Van Eggelen, same marker.
CDC48B | Length petiole mature leaf, cabbage analysis. | CDC48A | Blistering | Groot, found in different markers.
CLAVATA3 | Area petiole mature leaf | CLAVATA3 | Head traits | Topper
COL4 | Core length (and Mean width mature leaf) | COL5 | Head weight | Topper
IAA19 | Petiole area (and Total area, Mean width leaf) | | Head weight | Van Eggelen, nearby but different markers
KAN1, KAN2 | KAN1: Length leaf (and Total length, Area petiole cabbage, Length petiole cabbage). KAN2: Length leaf (and Width leaf, Total length, Mean width leaf and Area leaf). | KAN1, KAN3 | Leaf width, Mature leaf width and Head area | Topper
KNAT4 | Length petiole mature leaf (and Mean width leaf cabbage) | KNAT7 | Petiole width | Topper
LNG1 | Length leaf (and Total length and Area leaf) | LNG1 | Leaf area, Lamina width, Leaf complexity, Leaf margin shape | Slob, found in different markers.
MYB104 | Leaf area (and Width leaf, Total length, Length leaf) | MYB81, MYB104 | Blistering | Groot
NAC019, NAC018, NAC042 | NAC019: Area petiole (and Total area). NAC018: Area petiole (and Total area). NAC042: Core length. | NAC054, NAC098 | Blistering | Groot
SPR1 | Petiole length mature leaf | SPR1 | Mature leaf area | Topper
TCP4 | Total length (and Length leaf lamina) | TCP4 | Blistering | Van Eggelen, in neighbouring marker. Other TCP proteins found by Topper for head traits and leaf traits.
WOX3 | Area petiole mature leaf | WOX1 | Mature leaf margin shape | Topper


Acknowledgements

I would like to thank Guusje and Joao for their continuing support and feedback throughout this thesis; without them it would not have approached its current level of analysis and depth. Furthermore, I would like to thank Johan for granting me use of his souped-up computer to run the calculations; Theo Borm, for his help with the genotypic data and analysis; Toon, for his ingenious Halcon script for the image analysis; Kiran, for the 3D analysis of the images; the people at the development group and Unifarm, as well as fellow students Josephine, Lorenzo and Mazadul, for doing most of the harvesting and data collection with which I could work; and lastly all the fellow students for the supportive ambiance in the ‘breeders hall’.


Bibliography

Arias, T., Beilstein, M. A., Tang, M., McKain, M. R., & Pires, J. C. (2014). Diversification Times Among Brassica () Crops Suggest Hybrid Formation After 20 Million Years Of Divergence. American Journal of , 101(1), 6.
Bar, M., & Ori, N. (2014). Leaf development and morphogenesis. Development, 141(22), 4219-4230. doi:10.1242/dev.106195
Barkoulas, M., Galinha, C., Grigg, S. P., & Tsiantis, M. (2007). From genes to shape: regulatory interactions in leaf development. Curr Opin Plant Biol, 10(6), 660-666. doi:10.1016/j.pbi.2007.07.012
Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society, 57(1), 11.
Blein, T., Hasson, A., & Laufs, P. (2010). Leaf development: what it needs to be complex. Current Opinion in Plant Biology, 13(1), 75-82. doi:10.1016/j.pbi.2009.09.017
Bonnema, G., Del Carpio, D. P., & Zhao, J. (2011). Diversity analysis and molecular taxonomy of Brassica vegetable crops. Genetics, Genomics and Breeding of Vegetable Brassicas, 81-124.
Bradbury, P. J., Zhang, Z., Kroon, D. E., Casstevens, T. M., Ramdoss, Y., & Buckler, E. S. (2007). TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics, 23(19), 2633-2635. doi:10.1093/bioinformatics/btm308
Braybrook, S. A., & Kuhlemeier, C. (2010). How a Plant Builds Leaves. The Plant Cell, 22(4), 1006-1018. doi:10.1105/tpc.110.073924
Byrne, M. E., Barley, R., Curtis, M., Arroyo, J. M., Dunham, M., Hudson, A., & Martienssen, R. A. (2000). Asymmetric leaves1 mediates leaf patterning and stem cell function in Arabidopsis. Nature, 408, 4.
Carles, C. C., Choffnes-Inada, D., Reville, K., Lertpiriyapong, K., & Fletcher, J. C. (2005). ULTRAPETALA1 encodes a SAND domain putative transcriptional regulator that controls shoot and floral meristem activity in Arabidopsis. Development, 132, 14.
Carpio, D. P. D., Basnet, R. K., Vos, R. C. H. D., Maliepaard, C., Paulo, M. J., & Bonnema, G. (2011). Comparative Methods for Association Studies: A Case Study on Metabolite Variation in a Brassica rapa Core Collection. PLoS ONE, 6(5), 10.
Cartea, M. E., Lema, M., Francisco, M., & Velasco, P. (2011). Basic information on vegetable Brassica crops. Genetics, Genomics and Breeding of Vegetable Brassicas, 1-33.
Cheng, F., Liu, S., Wu, J., Fang, L., Sun, S., Liu, B., . . . Wang, X. (2011). BRAD, the genetics and genomics database for Brassica plants. BMC Plant Biotechnology, 11(136).
Cheng, F., Sun, R., Hou, X., Zheng, H., Zhang, F., Zhang, Y., . . . Wang, X. (2016). Subgenome parallel selection is associated with morphotype diversification and convergent crop domestication in Brassica rapa and Brassica oleracea. Nature Genetics, 48, 6.
Cheng, F., Wu, J., & Wang, X. (2014). Genome triplication drove the diversification of Brassica plants. 1, 14024. doi:10.1038/hortres.2014.24 https://www.nature.com/articles/hortres201424#supplementary-information
Cui, X., Lu, F., Li, Y., Xue, Y., Kang, Y., Zhang, S., . . . Cao, X. (2013). Ubiquitin-Specific Proteases UBP12 and UBP13 Act in Circadian Clock and Photoperiodic Flowering Regulation in Arabidopsis. Plant Physiology, 162(2), 9.
Dkhar, J., & Pareek, A. (2014). What determines a leaf's shape? EvoDevo, 5(1), 47. doi:10.1186/2041-9139-5-47
Earl, D. A., & vonHoldt, B. M. (2012). STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources, 4(2), 359-361. doi:10.1007/s12686-011-9548-7
Eggelen, J. v. (2017). Explaining phenotypical variation by assessing genetic variation in Brassica oleracea. Wageningen UR Thesis, 64.
Evanno, G., Regnaut, S., & Goudet, J. (2005). Detecting the number of clusters of individuals using the software structure: a simulation study. Molecular Ecology, 14(8), 2611-2620. doi:10.1111/j.1365-294X.2005.02553.x
Flint-Garcia, S. A., Thornsberry, J. M., & Buckler, E. S. (2003). Structure of Linkage Disequilibrium in Plants. Annual Review of Plant Biotechnology, 54, 17.
Gonzalez, N., Vanhaeren, H., & Inze, D. (2012). Leaf size control: complex coordination of cell division and expansion. Trends Plant Sci, 17(6), 332-340. doi:10.1016/j.tplants.2012.02.003
Groot, T. (2016). Genetic analysis of heading cabbage traits. Wageningen UR Thesis, 86.
Guerreiro, J. R., Pearce, G., Silva-Filho, M. C., & Moura, D. S. (2013). Chapter 9 – RALF Peptides. In A. J. Kastin (Ed.), Handbook of Biologically Active Peptides (Second Edition) (pp. 46-49): Academic Press/Elsevier.
Ha, C. M., Jun, J. H., & Fletcher, J. C. (2010). Shoot apical meristem form and function. Curr Top Dev Biol, 91, 103-140. doi:10.1016/s0070-2153(10)91004-1
Hamon, P., Seguin, M., Perrier, X., & Glaszmann, J. C. E. (2003). Genetic diversity of cultivated tropical plants. Enfield, Science Publishers, 34.
Hepworth, J., & Lenhard, M. (2014). Regulation of plant lateral-organ growth by modulating cell number and size. Current Opinion in Plant Biology, 17(Supplement C), 36-42. doi:10.1016/j.pbi.2013.11.005
Hibara, K.-i., Takada, S., & Tasaka, M. (2003). CUC1 gene activates the expression of SAM-related genes to induce adventitious shoot formation. The Plant Journal, 36, 9. doi:10.1046/j.1365-313X.2003.01911.x
Hickman, R., Hill, C., Penfold, C. A., Breeze, E., Bowden, L., Moore, J. D., . . . Buchanan‐Wollaston, V. (2013). A local regulatory network around three NAC transcription factors in stress responses and senescence in Arabidopsis leaves. Plant Journal, 75, 13.
Kalve, S., De Vos, D., & Beemster, G. T. S. (2014). Leaf development: a cellular perspective. Frontiers in Plant Science, 5, 362. doi:10.3389/fpls.2014.00362
KeyGene. (2017). Sequence-Based Genotyping.
Li, J., & Ji, L. (2005). Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Nature, 95, 6.
Liu, S., et al. (2014). The Brassica oleracea genome reveals the asymmetrical evolution of polyploid genomes. Nature Communications, 5(3930), 11. doi:10.1038/ncomms4930
Liu, Y., Wang, F., Zhang, H., He, H., Ma, L., & Deng, X. W. (2008). Functional characterization of the Arabidopsis ubiquitin‐specific protease gene family reveals specific role and redundancy of individual members in development. The Plant Journal, 55(5), 12.
Maggioni, L., von Bothmer, R., Poulsen, G., & Branca, F. (2010). Origin and Domestication of Cole Crops (Brassica oleracea L.): Linguistic and Literary Considerations. Economic Botany, 64(2), 109-123.
Mao, Y., Wu, F., Yu, X., Bai, J., Zhong, W., & He, Y. (2014). microRNA319a-Targeted Brassica rapa ssp. pekinensis TCP Genes Modulate Head Shape in by Differential Cell Division Arrest in Leaf Regions. Plant Physiology, 164, 10.
Mason, A. S., Rousseau-Gueutin, M., Morice, J., Bayer, P. E., Besharat, N., Cousin, A., . . . Nelson, M. N. (2016). Centromere Locations in Brassica A and C Genomes Revealed Through Half-Tetrad Analysis. Genetics, 202(2), 513-523. doi:10.1534/genetics.115.183210
Mayer, K. F. X., Schoof, H., Haecker, A., Lenhard, M., Jürgens, G., & Laux, T. (1998). Role of WUSCHEL in Regulating Stem Cell Fate in the Arabidopsis Shoot Meristem. Cell, 95(6), 10.
Moon, J., & Hake, S. (2011). How a leaf gets its shape. Current Opinion in Plant Biology, 14(1), 24-30. doi:10.1016/j.pbi.2010.08.012
Myles, S., Brown, J. P. P. J., Ersoz, E. S., Zhang, Z., Costich, D. E., & Buckler, E. S. (2009). Association Mapping: Critical Considerations Shift from Genotyping to Experimental Design. Plant Cell, 21(8), 8.
Olsen, A. N., Ernst, H. A., Leggio, L. L., & Skriver, K. (2005). NAC transcription factors: structurally distinct, functionally diverse. Trends in Plant Science, 10(2), 8.
Palatnik, J. F., Allen, E., Wu, X., Schommer, C., Schwab, R., Carrington, J. C., & Weigel, D. (2003). Control of leaf morphogenesis by microRNAs. Nature, 425, 6.
Parkin, I. A., Koh, C., Tang, H., Robinson, S. J., Kagale, S., Clarke, W. E., . . . Sharpe, A. G. (2014). Transcriptome and methylome profiling reveals relics of genome dominance in the mesopolyploid Brassica oleracea. Genome Biology, 15, 18.
Pritchard, J. K., Stephens, M., & Donnelly, P. (2000). Inference of Population Structure Using Multilocus Genotype Data. Genetics, 155(2), 945-959.
Schoof, H., Lenhard, M., Haecker, A., Mayer, K. F. X., Jürgens, G., & Laux, T. (2000). The Stem Cell Population of Arabidopsis Shoot Meristems Is Maintained by a Regulatory Loop between the CLAVATA and WUSCHEL Genes. Cell, 100(6), 9.
Schranz, M., Lysak, M., & Mitchell-Olds, T. (2007). The ABC's of comparative genomics in the Brassicaceae: building blocks of crucifer genomes. Trends in Plant Science, 11(11), 7.
Scofield, S., Dewitte, W., & Murray, J. A. H. (2007). The KNOX gene SHOOT MERISTEMLESS is required for the development of reproductive meristematic tissues in Arabidopsis. The Plant Journal, 50, 14. doi:10.1111/j.1365-313X.2007.03095.x
Singh, B. D., & Singh, A. K. (2015). Association Mapping Marker-Assisted Plant Breeding: Principles and Practices (pp. 217-256). New Delhi: Springer India.
Slob, F. (2016). Genetic analysis of leaf morphology in Brassica oleracea. Wageningen UR Thesis, 120.
Topper, F. (2016). Association Mapping of Leaf Morphology Traits in Brassica oleracea. Wageningen UR Thesis, 88.
Town, C. D., Cheung, F., Maiti, R., Crabtree, J., Haas, B. J., Wortman, J. R., . . . Bancroft, I. (2006). Comparative genomics of Brassica oleracea and Arabidopsis thaliana reveal gene loss, fragmentation, and dispersal after . Plant Cell, 18(6), 1348-1359. doi:10.1105/tpc.106.041665
Tsukaya, H. (2013). Leaf Development. The Arabidopsis Book / American Society of Plant Biologists, 11, e0163. doi:10.1199/tab.0163
U, N. (1935). Genome analysis in Brassica with special reference to the experimental formation of B. napus and peculiar mode of fertilization. Japanese Journal of Botany, 7, 63.
Wahl, V., Brand, L. H., Guo, Y.-L., & Schmid, M. (2010). The FANTASTIC FOUR proteins influence shoot meristem size in Arabidopsis thaliana. BMC Plant Biology, 10, 12.
Wang, F., Li, L., Li, H., Liu, L., Zhang, Y., Gao, J., & Wang, X. (2012). Transcriptome analysis of rosette and folding leaves in Chinese cabbage using high-throughput RNA sequencing. Genomics, 99(5), 8. doi:10.1016/j.ygeno.2012.02.005
Xiao, D., Wang, H., Basnet, R. K., Zhao, J., Lin, K., Hou, X., & Bonnema, G. (2014). Genetic Dissection of Leaf Development in Brassica rapa Using a Genetical Genomics Approach. Plant Physiology, 164(3), 1309-1325. doi:10.1104/pp.113.227348
Yu, J., Pressoir, G., Briggs, W. H., Bi, I. V., Yamasaki, M., Doebley, J. F., . . . Buckler, E. S. (2006). A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics, 38(2), 6.
Yu, J., Zhao, M., Wang, X., Tong, C., Huang, S., Tehrim, S., . . . Liu, S. (2013). Bolbase: a comprehensive genomics database for Brassica oleracea. BMC Genomics, 14, 664. doi:10.1186/1471-2164-14-664
Zhao, J., Wang, X., Deng, B., Lou, P., Wu, J., Sun, R., . . . Bonnema, G. (2005). Genetic relationships within Brassica rapa as inferred from AFLP fingerprints. Theoretical and Applied Genetics, 110(7), 14.


Appendices

The following items are included as appendices to this thesis. Large data files, such as the phenotype and genotype files, the LD calculation files, the population structure corrections and the file with all significant marker-trait associations, are not included but can be supplied on request.

Figures:
Figure 1: Linkage disequilibrium for different scales and applications (Jakob C. Mueller) ...... 47
Figure 2: PP-plots complete dataset ...... 49
Figure 3: PP-plots of cabbage traits and mature leaf traits with cabbage subset corrections ...... 50
Figure 4: Full dataset leaf traits box plots ...... 57
Figure 5: Cabbage traits boxplots ...... 58
Figure 6: QQ-plots of residuals full dataset traits ...... 59
Figure 7: QQ-plots of residuals of cabbage traits ...... 60
Figure 8: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction ...... 61
Figure 9: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction ...... 62
Figure 10: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction ...... 63
Figure 11: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction ...... 64
Figure 12: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction ...... 65
Figure 13: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction ...... 66
Figure 14: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction ...... 67
Figure 15: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction ...... 68
Figure 16: Manhattan plots for the cabbage subset data from 2016 (3D images). Top PCO correction, middle STRUCTURE and bottom without correction ...... 69
Figure 17: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction ...... 70
Figure 18: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction ...... 71
Figure 19: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction ...... 72
Figure 20: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction ...... 73
Figure 21: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction ...... 74
Figure 22: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction ...... 75
Figure 23: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction ...... 76
Figure 24: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction ...... 77
Figure 25: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction ...... 78
Figure 26: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction ...... 79
Figure 27: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction ...... 80
Figure 28: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction ...... 81
Figure 29: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction ...... 82
Figure 30: Effects graphs of final selection of markers of interest ...... 84

Tables:
Table 1: Candidate genes from theses by Groot, Topper and Slob ...... 46
Table 2: Line causing accessions, the top 30 accessions with the highest number of the rare, LD-line causing SNPs are given for each of the nine chromosomes ...... 47
Table 3: One-way ANOVA output on traits with blocks and morphotypes as fixed factors ...... 51
Table 4: Pearson correlation matrix ...... 55
Table 5: Other known GOI for which nearby markers were found to be significant ...... 85
Table 6: Table of significant markers near previously known genes of interest, reverse approach results ...... 85
Table 7: List of genes from selected markers of interest ...... 86

Table 10: Candidate genes from theses by Groot, Topper and Slob.

Gene Abbreviation Name Comments TCP1 CLE12 KNAT3 LONGIFOLIA1 REVOLUTA CUC1 LG1 Low green 1 Head length association NAC058 NAC transcription factor family Involved with CUC2, which is involved in leaf margin development. Head length association. CYCLIN-U2-1 Good match for blistering

EXPANSIN-B6 B4 Good match for blistering NAC054 Involved with CUC1 good match for blistering WSD1 Wax synthase diacylglycerolacyltransferase maybe blistering

AVT1 Vacuolar amino acid transporter 1 maybe blistering MYB81 MYB104 maybe blistering CDC48A Cell division control protein 48 homolog A maybe blistering ARF6 maybe blistering TMK1 TMK4 Transmembrane kinases. Indoleacetic acid induced Good match for head weight protein 9 IAA9; interacts with ARFs TFL1 Terminal flower 1 Head weight

APUM5 PUMILIO homolog 5 Good match for head weight MKK5 MITOGEN ACTIVATED protein kinase 5 Good match for head weight IRX9 Irregular xylem 9


GTE4 Best GLOBAL transcription factor group E4 CHC1 Best CLATHRIN HEAVY CHAIN1 LBD21 LOB domain containing protein 21 SPL10 Squamosa promotor-binding-like protein 10 PIP5K3 Phosphatidylinositol 4-phosphate 5-kinase 3 LNG1 TCP2 GRF5 UCU1 HUB2 AVP1 UBP15 PRL1 GRF4

Figure 24: Linkage disequilibrium for different scales and applications (Jakob C. Mueller)

Table 11: Line causing accessions, the top 30 accessions with the highest number of the rare, LD-line causing SNPs are given for each of the nine chromosomes.

Chr. 1 | Chr. 2 | Chr. 3 | Chr. 4 | Chr. 5 | Chr. 6 | Chr. 7 | Chr. 8 | Chr. 9
--- | --- | --- | --- | --- | --- | --- | --- | ---
TKI1236 | TKI883 | TKI1226 | TKI887 | TKI1219 | TKI883 | TKI1226 | TKI887 | TKI883
TKI883 | TKI1236 | TKI887 | TKI1220 | TKI888 | TKI894 | TKI888 | TKI1220 | TKI903
TKI888 | TKI067 | TKI894 | TKI1223 | TKI1226 | TKI1223 | TKI1223 | TKI886 | TKI894
TKI894 | TKI903 | TKI909 | TKI1219 | TKI1223 | TKI1236 | TKI1219 | TKI1219 | TKI1216
TKI886 | TKI886 | TKI1236 | TKI888 | TKI883 | TKI887 | TKI1220 | TKI1236 | TKI1220
TKI1220 | TKI468 | TKI1240 | TKI883 | TKI903 | TKI1240 | TKI894 | TKI1240 | TKI1236
TKI903 | TKI726 | TKI885 | TKI1216 | TKI1220 | TKI903 | TKI886 | TKI1223 | TKI433
TKI1219 | TKI888 | TKI883 | TKI1236 | TKI887 | TKI1216 | TKI887 | TKI1226 | TKI444
TKI885 | TKI900 | TKI1216 | TKI885 | TKI909 | TKI1220 | TKI1216 | TKI888 | TKI723
TKI1223 | TKI1184 | TKI886 | TKI894 | TKI1236 | TKI886 | TKI1061 | TKI883 | TKI791
TKI1216 | TKI1235 | TKI1220 | TKI886 | TKI1216 | TKI1219 | TKI909 | TKI885 | TKI815
TKI887 | TKI896 | TKI1219 | TKI903 | TKI894 | TKI885 | TKI1062 | TKI894 | TKI869
TKI1149 | TKI1220 | TKI454 | TKI1226 | TKI886 | TKI1234 | TKI1236 | TKI903 | TKI879
TKI454 | TKI1234 | TKI1223 | TKI870 | TKI1033 | TKI888 | TKI877 | TKI1216 | TKI885
TKI1241 | TKI125 | TKI903 | TKI1240 | TKI1235 | TKI598 | TKI665 | TKI1235 | TKI909
TKI909 | TKI435 | TKI888 | TKI1082 | TKI787 | TKI909 | TKI1068 | TKI870 | TKI016
TKI889 | TKI575 | TKI1239 | TKI1084 | TKI885 | TKI1073 | TKI181 | TKI880 | TKI050
TKI891 | TKI887 | TKI896 | TKI697 | TKI900 | TKI782 | TKI439 | TKI1042 | TKI059
TKI897 | TKI901 | TKI1234 | TKI889 | TKI1076 | TKI864 | TKI794 | TKI909 | TKI064
TKI1182 | TKI1031 | TKI1182 | TKI1031 | TKI1240 | TKI870 | TKI804 | TKI1108 | TKI081
TKI1230 | TKI1065 | TKI611 | TKI1145 | TKI651 | TKI882 | TKI883 | TKI449 | TKI1002
TKI1148 | TKI1194 | TKI697 | TKI1241 | TKI343 | WBol401 | TKI916 | TKI451 | TKI1042
TKI1145 | TKI1239 | TKI870 | TKI1086 | TKI428 | TKI358 | TKI1042 | TKI726 | TKI1151
TKI1240 | TKI201 | TKI1086 | TKI768 | TKI441 | TKI428 | TKI1059 | TKI1111 | TKI1179
TKI1150 | TKI687 | TKI1194 | TKI696 | TKI721 | TKI461 | TKI1240 | TKI1179 | TKI1226
TKI1147 | TKI804 | TKI499 | TKI1100 | TKI870 | TKI841 | TKI184 | TKI1224 | TKI1240
TKI1226 | TKI857 | TKI723 | TKI1235 | TKI058 | TKI867 | TKI315 | TKI1239 | TKI349
TKI1232 | TKI882 | TKI726 | TKI582 | TKI1042 | TKI900 | TKI666 | TKI738 | TKI360
TKI611 | TKI026 | TKI901 | TKI891 | TKI1051 | TKI963 | TKI721 | TKI882 | TKI426
TKI1146 | TKI1062 | TKI447 | TKI1232 | TKI1054 | TKI005 | TKI885 | TKI901 | TKI438


Figure 25: PP-plots complete dataset


Figure 26: PP-plots of cabbage traits and mature leaf traits with cabbage subset corrections.


Table 12: One-way ANOVA output on traits with blocks and morphotypes as fixed factors

Tests of Between-Subjects Effects

Dependent Variable: AreaLeaf

Source | Type III Sum of Squares | df | Mean Square | F | Sig.
--- | --- | --- | --- | --- | ---
Corrected Model | 8680659979000.000a | 1586 | 5473303896.000 | 6.420 | .000
Intercept | 8594854902000.000 | 1 | 8594854902000.000 | 10080.790 | .000
Block | 29288051.150 | 1 | 29288051.150 | .034 | .853
Morphotype | .000 | 0 | . | . | .
TKI | 4270451051000.000 | 818 | 5220600307.000 | 6.123 | .000
Block * Morphotype | .000 | 0 | . | . | .
Block * TKI | 694787831900.000 | 744 | 933854612.700 | 1.095 | .054
Morphotype * TKI | .000 | 0 | . | . | .
Block * Morphotype * TKI | .000 | 0 | . | . | .
Error | 2719785598000.000 | 3190 | 852597366.300 | |
Total | 54586730350000.000 | 4777 | | |
Corrected Total | 11400445580000.000 | 4776 | | |

a. R Squared = .761 (Adjusted R Squared = .643)

Tests of Between-Subjects Effects

Dependent Variable: WidthLeaf

Source | Type III Sum of Squares | df | Mean Square | F | Sig.
--- | --- | --- | --- | --- | ---
Corrected Model | 47028992.900a | 1586 | 29652.581 | 7.863 | .000
Intercept | 110756822.800 | 1 | 110756822.800 | 29367.969 | .000
Block | 268.977 | 1 | 268.977 | .071 | .789
Morphotype | .000 | 0 | . | . | .
TKI | 20618121.700 | 818 | 25205.528 | 6.683 | .000
Block * Morphotype | .000 | 0 | . | . | .
Block * TKI | 3883247.449 | 744 | 5219.419 | 1.384 | .000
Morphotype * TKI | .000 | 0 | . | . | .
Block * Morphotype * TKI | .000 | 0 | . | . | .
Error | 12030599.200 | 3190 | 3771.348 | |
Total | 564530554.100 | 4777 | | |
Corrected Total | 59059592.110 | 4776 | | |

a. R Squared = .796 (Adjusted R Squared = .695)

Tests of Between-Subjects Effects

Dependent Variable: LengthLeaf

Source | Type III Sum of Squares | df | Mean Square | F | Sig.
--- | --- | --- | --- | --- | ---
Corrected Model | 62407684.170a | 1586 | 39349.107 | 8.273 | .000
Intercept | 238780949.900 | 1 | 238780949.900 | 50202.996 | .000
Block | 624.488 | 1 | 624.488 | .131 | .717
Morphotype | .000 | 0 | . | . | .
TKI | 25424092.630 | 818 | 31080.798 | 6.535 | .000
Block * Morphotype | .000 | 0 | . | . | .
Block * TKI | 4393182.314 | 744 | 5904.815 | 1.241 | .000
Morphotype * TKI | .000 | 0 | . | . | .
Block * Morphotype * TKI | .000 | 0 | . | . | .
Error | 15172624.870 | 3190 | 4756.309 | |
Total | 1071375970.000 | 4777 | | |
Corrected Total | 77580309.040 | 4776 | | |

a. R Squared = .804 (Adjusted R Squared = .707)

Tests of Between-Subjects Effects

Dependent Variable: AreaPetiole

Source | Type III Sum of Squares | df | Mean Square | F | Sig.
--- | --- | --- | --- | --- | ---
Corrected Model | 30287177930000.000a | 1586 | 19096581300.000 | 1.487 | .000
Intercept | 2042644806000.000 | 1 | 2042644806000.000 | 159.107 | .000
Block | 58033004.270 | 1 | 58033004.270 | .005 | .946
Morphotype | .000 | 0 | . | . | .
TKI | 13881376810000.000 | 818 | 16969898300.000 | 1.322 | .000
Block * Morphotype | .000 | 0 | . | . | .
Block * TKI | 9412692757000.000 | 744 | 12651468760.000 | .985 | .596
Morphotype * TKI | .000 | 0 | . | . | .
Block * Morphotype * TKI | .000 | 0 | . | . | .
Error | 40812618730000.000 | 3179 | 12838194000.000 | |
Total | 79033381500000.000 | 4766 | | |
Corrected Total | 71099796660000.000 | 4765 | | |

a. R Squared = .426 (Adjusted R Squared = .140)

Tests of Between-Subjects Effects

Dependent Variable: MeanWidthLeaf

Source | Type III Sum of Squares | df | Mean Square | F | Sig.
--- | --- | --- | --- | --- | ---
Corrected Model | 2053721.883a | 1586 | 1294.907 | 1.712 | .000
Intercept | 1299687.042 | 1 | 1299687.042 | 1717.977 | .000
Block | 352.282 | 1 | 352.282 | .466 | .495
Morphotype | .000 | 0 | . | . | .
TKI | 1029659.405 | 818 | 1258.752 | 1.664 | .000
Block * Morphotype | .000 | 0 | . | . | .
Block * TKI | 606964.137 | 744 | 815.812 | 1.078 | .091
Morphotype * TKI | .000 | 0 | . | . | .
Block * Morphotype * TKI | .000 | 0 | . | . | .
Error | 2413304.473 | 3190 | 756.522 | |
Total | 11261878.110 | 4777 | | |
Corrected Total | 4467026.356 | 4776 | | |

a. R Squared = .460 (Adjusted R Squared = .191)

Tests of Between-Subjects Effects

Dependent Variable: LengthPetiole

Source | Type III Sum of Squares | df | Mean Square | F | Sig.
--- | --- | --- | --- | --- | ---
Corrected Model | 11312923.950a | 1586 | 7132.991 | 3.650 | .000
Intercept | 5720022.859 | 1 | 5720022.859 | 2927.031 | .000
Block | 6598.969 | 1 | 6598.969 | 3.377 | .066
Morphotype | .000 | 0 | . | . | .
TKI | 4492745.077 | 818 | 5492.353 | 2.811 | .000
Block * Morphotype | .000 | 0 | . | . | .
Block * TKI | 2089716.335 | 744 | 2808.759 | 1.437 | .000
Morphotype * TKI | .000 | 0 | . | . | .
Block * Morphotype * TKI | .000 | 0 | . | . | .
Error | 6212421.770 | 3179 | 1954.206 | |
Total | 28511754.850 | 4766 | | |
Corrected Total | 17525345.720 | 4765 | | |

a. R Squared = .646 (Adjusted R Squared = .469)

Tests of Between-Subjects Effects

Dependent Variable: TotalArea

Source | Type III Sum of Squares | df | Mean Square | F | Sig.
--- | --- | --- | --- | --- | ---
Corrected Model | 46019011640000.000a | 1588 | 28979226470.000 | 2.161 | .000
Intercept | 14144970780000.000 | 1 | 14144970780000.000 | 1054.863 | .000
Morphotypes | 17722266320.000 | 1 | 17722266320.000 | 1.322 | .250
Block | 192294844.700 | 1 | 192294844.700 | .014 | .905
TKI | 21850381820000.000 | 819 | 26679342880.000 | 1.990 | .000
Morphotypes * Block | 1032296367.000 | 1 | 1032296367.000 | .077 | .781
Morphotypes * TKI | .000 | 0 | . | . | .
Block * TKI | 10866390460000.000 | 744 | 14605363520.000 | 1.089 | .066
Morphotypes * Block * TKI | .000 | 0 | . | . | .
Error | 42748839910000.000 | 3188 | 13409297340.000 | |
Total | 176846910400000.000 | 4777 | | |
Corrected Total | 88767851540000.000 | 4776 | | |

a. R Squared = .518 (Adjusted R Squared = .279)

Tests of Between-Subjects Effects

Dependent Variable: TotalLength

Source | Type III Sum of Squares | df | Mean Square | F | Sig.

Corrected Model 74999439.480a 1588 47228.866 9.702 .000

Intercept 244055413.400 1 244055413.400 50136.378 .000

Morphotypes 35214.344 1 35214.344 7.234 .007

Block 31215.158 1 31215.158 6.413 .011

TKI 33184441.410 819 40518.243 8.324 .000

Morphotypes * Block 20313.220 1 20313.220 4.173 .041

Morphotypes * TKI .000 0 . . .

Block * TKI 5037986.774 744 6771.488 1.391 .000

Morphotypes * Block * TKI .000 0 . . .

Error 15518645.270 3188 4867.831

Total 1304014927.000 4777

Corrected Total 90518084.750 4776

a. R Squared = .829 (Adjusted R Squared = .743)

Table 13: Pearson correlation matrix

Correlations

AreaLeaf LengthLeaf WidthLeaf AreaPetiole MeanWidthLeaf LengthPetiole

AreaLeaf Pearson Correlation 1 .719** .913** .113** .403** -.130**

Sig. (2-tailed) .000 .000 .000 .000 .000

N 4777 4777 4777 4766 4777 4766

LengthLeaf Pearson Correlation .719** 1 .550** .088** .239** -.060**

Sig. (2-tailed) .000 .000 .000 .000 .000

N 4777 4777 4777 4766 4777 4766

WidthLeaf Pearson Correlation .913** .550** 1 .097** .375** -.179**

Sig. (2-tailed) .000 .000 .000 .000 .000

N 4777 4777 4777 4766 4777 4766

AreaPetiole Pearson Correlation .113** .088** .097** 1 .515** .449**

Sig. (2-tailed) .000 .000 .000 .000 .000

N 4766 4766 4766 4766 4766 4766

MeanWidthLeaf Pearson Correlation .403** .239** .375** .515** 1 .102**

Sig. (2-tailed) .000 .000 .000 .000 .000

N 4777 4777 4777 4766 4777 4766

LengthPetiole Pearson Correlation -.130** -.060** -.179** .449** .102** 1

Sig. (2-tailed) .000 .000 .000 .000 .000

N 4766 4766 4766 4766 4766 4766

**. Correlation is significant at the 0.01 level (2-tailed).
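With N of about 4,770, even the weakest correlation in Table 13 (LengthLeaf vs LengthPetiole, r = -.060) passes the 0.01 threshold. The usual test statistic is t = r·sqrt((n - 2)/(1 - r²)) with n - 2 degrees of freedom. The sketch below assumes that, at this sample size, t(n - 2) can be replaced by the standard normal distribution, so it needs only the Python standard library. The function names are illustrative, and this is not necessarily how SPSS computes its exact p-values.

```python
import math
from statistics import NormalDist


def pearson_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def significant_at(r, n, alpha=0.01):
    """Two-tailed test of r against zero.

    Assumption: n is in the thousands, so t(n - 2) is approximated by N(0, 1).
    """
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 2.576 for alpha = 0.01
    return abs(pearson_t(r, n)) > critical


# Weakest correlation in Table 13: LengthLeaf vs LengthPetiole (N = 4766).
print(round(pearson_t(-0.060, 4766), 2), significant_at(-0.060, 4766))  # -4.15 True
```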


Figure 27: Full dataset leaf traits box plots


Figure 28: Cabbage traits boxplots


Figure 29: QQ-plots of residuals full dataset traits


Figure 30: QQ-plots of residuals of cabbage traits


Figure 31: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 32: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 33: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 34: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 35: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 36: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 37: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 38: Manhattan plots for the full dataset analysis of the largest mature leaves. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 39: Manhattan plots for the cabbage subset data from 2016 (3D images). Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 40: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 41: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 42: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 43: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 44: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 45: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 46: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 47: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 48: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 49: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 50: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 51: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 52: Manhattan plots for the cabbage subset data from the 2017 field trial. Top PCO correction, middle STRUCTURE and bottom without correction.


Figure 53: Effects graphs of final selection of markers of interest.


Table 14: Other known GOI for which nearby markers were found to be significant.

Trait | Marker | Significant in other traits | GOI | Region of interest | PCO P-value | PCO LOD | NoStructure P-value | NoStructure LOD | STRUCTURE P-value | STRUCTURE LOD
AreaStem | SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_33589597 | TotalArea, MeanWidthLeaf | IAA19 | C01:33539597..33639597 | 2.6896E-72 | 71.570309 | 1.09467E-65 | 64.96071723 | 2.42573E-70 | 69.61515682
LengthLeaf | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_12664963 | TotalLength | TCP12 | C02:12614963..12714963 | 2.4512E-11 | 10.610618 | 1.22217E-44 | 43.91286872 | 0.033316912 | 1.477335252
MeanWidthLeafCabbage | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_2385281 | - | KNAT4 | C02:2335281..2435281 | 9.3295E-07 | 6.03014 | 8.5473E-07 | 6.068171053 | 0.0055458 | 2.256035797
LengthStemLargest | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_2914415 | - | KNAT4 | C02:2864415..2964415 | 3.7625E-06 | 5.4245267 | 4.005E-07 | 6.39739748 | 0.0001716 | 3.765482716
LengthLeaf | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_36556334 | TotalLength, LengthStem | SPR1, SPR2 | C02:36506334..36606334 | 4.3282E-12 | 11.363697 | 8.22755E-42 | 41.08472922 | 0.139865231 | 0.854290232
LengthLeaf | SC03.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C03_2333663 | TotalLength, AreaLeaf | LEP | C03:2283663..2383663 | 3.2451E-09 | 8.4887668 | 7.06869E-15 | 14.15066083 | 0.00010654 | 3.972487507
LengthLeaf | SC03.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C03_4170079 | TotalLength | KAN1 | C03:4120079..4220079 | 7.1589E-10 | 9.1451534 | 2.22226E-37 | 36.65320456 | 0.000281514 | 3.550500222
AreaPetioleLargest | SC07.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C07_5318026 | - | WOX3 | C07:5268026..5368026 | 9.2622E-05 | 4.0332835 | 0.0001615 | 3.791827473 | 0.991326132 | 0.00378
LengthLeaf | SC08.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C08_1983277 | TotalLength, AreaLeaf, WidthLeaf | GTE4, CLO | C08:1933277..2033277 | 5.1701E-10 | 9.2864975 | 2.65337E-08 | 7.576202955 | 0.046492846 | 1.332613864
LengthLeaf | SC09.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C09_30276899 | TotalLength | PI | C09:30226899..30326899 | 1.9815E-10 | 9.7030159 | 3.37352E-49 | 48.47191637 | 3.59826E-05 | 4.443907317
LengthStemCabbage | SC09.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C09_32808722 | AreaPetioleCabbage | KAN1 | C09:32758722..32858722 | 8.7735E-13 | 12.056828 | 6.7759E-15 | 14.16903301 | 1.156E-10 | 9.937042166

Table 15: Table of significant markers near previously known genes of interest, reverse approach results.

Trait | Close to gene | Marker
AreaStem | IAA19 | SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_33589597
LengthLeaf | KAN2 | SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_19361976
LengthStemCabbage | KAN1 | SC09.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C09_32808722
LengthLeaf | KAN2 | SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_19361949
LengthLeaf | KAN1 | SC09.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C09_32808800
LengthLeaf | SPR1, SPR2 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_36556334
LengthLeaf | TCP12 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_12664963
LengthLeaf | PI | SC09.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C09_30276899
LengthLeaf | MYB81, MYB104 | SC04.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C04_11142145
LengthLeaf | GTE4, CLO | SC08.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C08_1983277
LengthLeaf | LNG1 | SC04.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C04_9822547
LengthLeaf | KAN1 | SC03.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C03_4170079
LengthLeaf | LEP | SC03.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C03_2333663
LengthLeaf | LEP | SC03.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C03_2333664
LengthLeaf | INCURVATA2 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_35792375
LengthLeaf | SPR1, SPR2 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_36556367
LengthLeaf | SPR1, SPR2 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_36556301
LengthLeaf | MYB81, MYB104 | SC04.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C04_11143832
AreaLeaf | GTE4, CLO | SC08.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C08_1983277
AreaLeaf | KAN2 | SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_19361976
AreaLeaf | KAN1 | SC09.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C09_32808800
MeanWidthLeafCabbage | KNAT4 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_2385281
AreaLeaf | KAN2 | SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_19361949
WidthLeaf | ROT3 | SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_1271311
WidthLeaf | MYB81, MYB104 | SC04.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C04_11142145
AreaLeaf | MYB81, MYB104 | SC04.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C04_11142145
MeanWidthLeaf | KAN2 | SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_19361949
AreaStem | SPR1, SPR2 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_36582363
LengthStemLargest | KNAT4 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_2914415
MeanWidthLeaf | IAA19 | SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_33589597
LengthStemLargest | UBP15 | SC05.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C05_6982361
LengthStem | SPR1, SPR2 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_36556334
MeanWidthLeaf | KAN2 | SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_19361976
AreaLeaf | MYB81, MYB104 | SC04.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C04_11150606
WidthLeaf | KAN2 | SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_19361976
LengthStem | UBP15 | SC05.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C05_6982361
MeanWidthLeaf | INCURVATA2 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_35752628
AreaLeaf | LEP | SC03.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C03_2333663
AreaLeaf | LEP | SC03.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C03_2333664
AreaPetioleLargest | CLAVATA3 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_12799987
AreaPetioleLargest | CLAVATA3 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_12799930
AreaPetioleLargest | CLAVATA3 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_12799964
WidthLeaf | MYB81, MYB104 | SC04.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C04_11150606
LengthStemCabbage | CLAVATA3 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_12800006
LengthStemCabbage | CLAVATA3 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_12800010
WidthLeaf | KAN2 | SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_19361949
LengthStemCabbage | APRR3, TOE2, APUM19 | SC09.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C09_27947404
AreaPetioleLargest | UBP15 | SC05.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C05_6933557
WidthLeaf | MYB81, MYB104 | SC04.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C04_11143832
WidthLeaf | GTE4, CLO | SC08.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C08_1983277
AreaPetioleCabbage | KAN1 | SC09.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C09_32808722
AreaPetioleLargest | UBP15 | SC05.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C05_6934039
AreaPetioleLargest | UBP15 | SC05.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C05_6934111
AreaPetioleLargest | UBP15 | SC05.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C05_6934119
AreaPetioleLargest | UBP15 | SC05.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C05_6934126
AreaPetioleLargest | WOX3 | SC07.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C07_5318106
AreaPetioleLargest | WOX3 | SC07.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C07_5318026
LengthStemLargest | SPR1, SPR2 | SC09.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C09_10441845
MeanWidthLargestLeaf | CLAVATA3 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_12799930
MeanWidthLargestLeaf | CLAVATA3 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_12799964
MeanWidthLargestLeaf | CLAVATA3 | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_12799987
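Two columns of Table 14 follow directly from the marker identifiers and P-values. Each LOD equals -log10(P). Each region of interest spans 50 kb on either side of the marker position, so C01_33589597 gives C01:33539597..33639597. The Python sketch below assumes the identifier layout seen in Tables 14 to 16 (`<VCF file prefix>:<chromosome>_<position>`). The function names and the clamp at position 1 are our own illustrative assumptions and are not taken from the original pipeline.

```python
import math
import re

# Assumed identifier layout: "<VCF file prefix>:<chromosome>_<position>".
MARKER_RE = re.compile(r":(C\d{2})_(\d+)$")


def parse_marker(marker_id):
    """Split a marker ID into (chromosome, base-pair position)."""
    match = MARKER_RE.search(marker_id)
    if match is None:
        raise ValueError(f"unrecognised marker ID: {marker_id}")
    return match.group(1), int(match.group(2))


def region_of_interest(marker_id, flank=50_000):
    """Return 'Cxx:start..end', spanning `flank` bp either side of the marker."""
    chrom, pos = parse_marker(marker_id)
    start = max(pos - flank, 1)  # assumption: never start before position 1
    return f"{chrom}:{start}..{pos + flank}"


def lod(p_value):
    """LOD as reported in Table 14, i.e. -log10(P)."""
    return -math.log10(p_value)


marker = "SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_33589597"
print(region_of_interest(marker))  # C01:33539597..33639597
print(round(lod(2.6896e-72), 3))   # 71.57
```

The same two functions reproduce the other rows of Table 14, for example C07_5318026 with P = 9.2622E-05 gives C07:5268026..5368026 and a LOD of about 4.033.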

Table 16: List of genes from selected markers of interest.

Trait | Marker | Genes
AreaLeaf | SC04.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C04_11150606 | Bol027777, Bol027778, Bol027779, Bol027780, Bol027781, Bol027782
AreaPetioleLargest | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_12799964 | Bol028634, Bol028635, Bol028636, Bol028637, Bol028638, Bol028639, Bol028640
AreaPetioleLargest | SC07.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C07_5318026 | Bol028126
AreaStem | SC03.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C03_34302942 | Bol041315, Bol041316
AreaStem | SC06.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C06_21783228 | Bol039150, Bol039151, Bol039152, Bol039153, Bol039154, Bol039155, Bol039156, Bol039157, Bol039158
CoreLength | SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_35748674 | Bol007348, Bol007349, Bol007350, Bol007351, Bol007352, Bol015765, Bol015766, Bol015767, Bol015768, Bol015769, Bol015770
CoreLength | SC06.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C06_3219896 | Bol026170, Bol026171, Bol026172, Bol026173, Bol026174, Bol026175, Bol026176, Bol026177, Bol026178, Bol026179, Bol026180, Bol026181, Bol026182, Bol026183, Bol026184, Bol026185, Bol026186, Bol026187, Bol026188, Bol026189, Bol026190
LengthLeaf | SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_19361976 | Bol014057, Bol014058, Bol014059
LengthLeaf | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_35792375 | Bol015194, Bol015195, Bol015196, Bol015197, Bol015198
LengthLeaf | SC04.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C04_9822547 | Bol011069, Bol011070, Bol011071, Bol011072, Bol011073, Bol011074, Bol011075, Bol011076, Bol011077, Bol011078, Bol011079
LengthLeaf | SC08.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C08_10754335 | Bol027032, Bol027033, Bol027034, Bol027035, Bol027036, Bol027037, Bol027038
LengthLeaf | SC09.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C09_5283910 | Bol019259, Bol019260, Bol019261, Bol019262, Bol019263, Bol019264, Bol019265, Bol019266, Bol019267, Bol019268
LengthStem | SC03.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C03_38415870 | Bol035592, Bol035593
LengthStem | SC07.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C07_12918563 | Bol017700, Bol017701, Bol017702, Bol017703, Bol017704, Bol017705, Bol017706, Bol017707
LengthStem | SC09.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C09_20374430 | Bol030302, Bol030303, Bol030304, Bol030305, Bol030306, Bol030307, Bol030308
LengthStemCabbage | SC09.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C09_27947404 | Bol017245, Bol017246, Bol017247, Bol017248, Bol017249, Bol017250, Bol017251, Bol017252, Bol017253, Bol017254, Bol017255, Bol017256
LengthStemLargest | SC05.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C05_6982361 | Bol038284, Bol038285, Bol038286, Bol038287, Bol038288, Bol038289, Bol038290, Bol038291, Bol038292, Bol038293, Bol038294, Bol038295, Bol038296, Bol038297, Bol038298, Bol038299, Bol038300, Bol038301, Bol038302, Bol038303
LengthStemLargest | SC09.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C09_10441845 | Bol032548, Bol032549, Bol032550, Bol032551, Bol032552, Bol032553, Bol032554
MeanWidthLeaf | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_35752628 | Bol015193
MeanWidthLeaf | SC05.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C05_720711 | Bol040784, Bol040785, Bol040786, Bol040787, Bol040788, Bol040789, Bol040790, Bol040791, Bol040792, Bol040793, Bol040794, Bol040795, Bol040796, Bol040797, Bol040798, Bol040799, Bol040800, Bol040801, Bol040802, Bol040803, Bol040804, Bol040805, Bol040806
TotalArea | SC06.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C06_32227507 | Bol039889, Bol039890, Bol039891, Bol039892, Bol039893
TotalLength | SC03.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C03_16946057 | Bol025731, Bol025732, Bol025733, Bol025734, Bol025735, Bol025736, Bol025737, Bol025738, Bol025739, Bol025740, Bol025741, Bol025742, Bol025743, Bol025744, Bol025745, Bol025746
WidthLeaf | SC01.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C01_1271311 | Bol028984, Bol028985, Bol028986, Bol028987, Bol028988, Bol028989, Bol028990, Bol028991, Bol028992, Bol028993, Bol028994, Bol028995, Bol028996, Bol028997, Bol028998, Bol028999, Bol029000, Bol029001, Bol029002
WidthLeaf | SC02.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C02_26340327 | Bol035959
WidthLeaf | SC05.NODUPS.MAX80MISSING.MAF2.5.RECODE.VCF:C05_8243361 | Bol031972, Bol031973, Bol031975, Bol031976, Bol031977, Bol031978
