AN ABSTRACT OF THE THESIS OF

Ryan J. Hill for the degree of Master of Science in Horticulture presented on March 2, 2020.

Title: Fine Mapping and Gene Expression Analysis of Self-Incompatibility in Hazelnut

Abstract approved: Shawn A. Mehlenbacher

The European hazelnut (Corylus avellana L.) is a diploid (2n = 2x = 22) tree crop important to the economy of Oregon’s Willamette Valley, where 99% of hazelnut production in the United States is located. Corylus avellana exhibits sporophytic self-incompatibility (SSI), controlled by a single S-locus with at least 33 unique alleles. The alleles exhibit codominance in the stigma and dominance or codominance in the pollen. SSI is present in many families and is understood best at the molecular level in Brassica. However, Brassica gene sequences have not proven useful for investigations in Corylus. With new genomic tools available for the study of hazelnut, including a new Pacific Biosciences (PacBio) reference genome for ‘Jefferson’, more progress can be made toward the goal of identifying the genetic determinants of SSI.

BAC end sequences associated with the minimal tiling path of the S-locus were used to identify 36 PacBio contigs associated with the hazelnut S-locus (located on linkage group 5 [LG5]). Simple sequence repeat (SSR) markers were developed from a subset of dinucleotide repeats found in those contigs and were characterized and mapped. Markers mapping near the S-locus on LG5 were screened against a population of 192 seedlings selected for recombination between the flanking markers of the S-locus, G05-510 and AU02-1350, for which the S-alleles had been determined. This yielded a 500 kb region containing the S-locus, which was used to develop more SSR markers and a set of high-resolution melting (HRM) markers. When the new markers were used in fine mapping, they fully exploited all recombination in the fine mapping population and reduced the size of the S-locus region to 193.5 kb, containing 18 genes. The sequence of the mapped region most likely contains the S1 allele. A locus adjacent to it was also identified and is likely to contain the S3 allele. The reference genome represents these genomic regions side by side, but evidence suggests they represent two alleles at a single locus.

To study expression of S-locus genes following self- and cross-pollination, a set of five reference genes was evaluated for qPCR relative expression analysis. ‘Jefferson’ stigmas were self-pollinated (incompatible) and cross-pollinated with ‘Felix’ and OSU 1253.064 (compatible). The tissue was collected at 0, 1, 2, 3, 5, and 7 hours following pollination. RNA was extracted and amplified using primers designed for reference genes. All five showed promise as references for relative expression analysis in pollinated stigmas, and three were used to normalize expression of the genes of interest. Primers were designed for the 18 S-locus genes and were used to amplify cDNA from the pollinated style time course samples. Five genes showed no expression, four showed constitutive expression, five decreased in expression, and four increased in expression. Five genes showed an interaction between time and pollen type, and one gene showed a difference in expression between types of pollen applied but without the interaction effect.

Long-range PCR was used to generate amplicons 8-12 kb in length for targeted resequencing of the S-locus region. For primer pairs designed using the ‘Jefferson’ reference genome, 29 out of 43 gave bands on agarose gels when amplified from ‘Jefferson’ DNA. The 43 primer pairs were tested in the parents of ‘Jefferson’ (OSU 252.146 and OSU 414.062) and two unrelated genotypes, OSU 1477.047 and OSU 1026.073. Six fragments amplified from OSU 252.146 DNA, three from OSU 414.062 DNA, eight from OSU 1477.047 DNA, and two from OSU 1026.073 DNA. The results, obtained from the first attempt at primer development, demonstrate that long-range PCR could be a viable option to produce amplicons for targeted resequencing in ‘Jefferson’ hazelnut.

©Copyright by Ryan J. Hill March 2, 2020 All Rights Reserved

Fine Mapping and Gene Expression Analysis of Self-Incompatibility in Hazelnut

by Ryan J. Hill

A THESIS

submitted to

Oregon State University

in partial fulfillment of the requirements for the degree of

Master of Science

Presented March 2, 2020 Commencement June 2020

Master of Science thesis of Ryan J. Hill presented on March 2, 2020

APPROVED:

Major Professor, representing Horticulture

Head of the Department of Horticulture

Dean of the Graduate School

I understand that my thesis will become part of the permanent collection of Oregon State University libraries. My signature below authorizes release of my thesis to any reader upon request.

Ryan J. Hill, Author

ACKNOWLEDGEMENTS

I would like to thank my advisor Dr. Shawn Mehlenbacher for the opportunity to work with and learn from him for these two and a half years. I am thankful to have had your patience, experience, and confidence supporting me from the start. I would also like to thank Jacob Snelling for his thoughtfulness, insight, and care for so many details that I often overlook; Dave Smith for his constant encouragement and intentionality; and Claudia Baldassi for starting the project and allowing me to benefit from her hard work.

I want to thank Dr. Kelly Vining, Golnaz Komaei Koma, Becky McClusky, Qing-Hua Ma, Iovanna Pandelova, and Sam Talbot for their help during the research and writing of this thesis.

I want to thank my parents for encouraging me; my church family for supporting and praying for me; and Sunshyne for being my wife, my home, and the mother to our son.

CONTRIBUTION OF AUTHORS

Dr. Shawn Mehlenbacher provided funding, research objectives, experimental design, and the facilities used. Dr. Kelly Vining provided the experimental design for chapter 3 along with the equipment for data collection and statistical consulting for data analysis associated with the chapter. Jacob Snelling developed protocols for nucleic acid extraction for chapters 3 and 4, helped troubleshoot at all stages of research, ordered equipment and reagents, and provided input on experimental design for chapter 4. Claudia Baldassi performed the first stages of marker development for chapter 2 by identifying marker sequences, ordering primers, screening markers on agarose gels, and starting the work of characterizing the useful markers. The OSU hazelnut breeding program provided the plant material used and help with tissue collection for chapter 3. David Smith aided in performing cross pollinations and propagation. Qing-Hua Ma helped with tissue collection and nucleic acid extraction for chapter 3.

TABLE OF CONTENTS

Chapter 1: Introduction
    Botany, history, and use
    Genetic markers
    Linkage mapping in hazelnut
    Genomic resources in hazelnut
    Self-incompatibility
    Real-time quantitative PCR
    Long-range PCR and amplicon sequencing
    Research objectives
    References

Chapter 2: Fine mapping of the S-locus in European hazelnut
    Introduction
    Methods
        SSR identification
        SSR characterization and mapping
        Fine mapping
        High-resolution melting (HRM) marker development and mapping
    Results
        Predicted genes in the target region
        Alternate locus identified
    Discussion
    Conclusion
    References

Chapter 3: Real-time quantitative PCR expression analysis of S-locus genes in European hazelnut
    Introduction
    Methods
        Tissue collection
        RNA extraction and cDNA synthesis
        Reference gene selection
        Primer design and PCR conditions
        Primer efficiency and qPCR conditions
        Expression analysis
        Sequencing of primer binding sites
    Results
        Tissue collection, RNA extraction, and cDNA synthesis
        Initial amplification
        Primer efficiency
        Expression analysis
        Sequencing of primer binding sites
    Discussion
    Conclusion
    References

Chapter 4: Long-range amplification of sequences in the S-locus region of European hazelnut
    Introduction
    Methods
        Plant materials
        Primer design
        DNA extraction
        Amplification of fragments
        Gel electrophoresis
    Results
    Discussion
    Conclusions
    References

LIST OF FIGURES

Figure
    1.1 Linkage group 5
    1.2 S-allele dominance hierarchy
    2.1 Linkage group 5 of the female parent of the mapping population
    2.2 Fine mapping of the S-locus
    2.3 Comparison of S1 and S3 loci
    2.4 Alignment of S1 and S3 loci
    3.1 CaJ_12867
    3.2 Pollen x time interactions
    4.1 S-locus region and associated contigs

LIST OF TABLES

Table
    2.1 PacBio contig identification
    2.2 Hazelnut accessions for marker characterization
    2.3 SSR marker characterization
    2.4 Markers developed from the ‘Jefferson’ version 2 reference genome
    2.5 Previously developed markers used in fine mapping
    2.6 SSR and HRM markers developed from the ‘Jefferson’ version 3 genome
    2.7 Gene annotations from the S1 and S3 alleles
    2.8 Allele sizes of S-locus markers at S1 and S3 loci
    3.1 Primers for qPCR
    3.2 Amplification efficiencies
    3.3 Reference gene evaluation
    3.4 Relative expression
    4.1 Long-range PCR primers and results
    4.2 High molecular weight DNA

LIST OF APPENDICES

Appendix
    Title page
    A. Minimum tiling path BACs used to identify S-locus contigs
    B. SNPs targeted for development as HRM markers
    C. Allele frequencies in 50 hazelnut accessions for 39 SSR markers
    D. Primers for S-locus qPCR
    E. Graphs for relative expression in 13 S-locus genes
    F. S1 Locus Summary
    G. S3 Locus Summary


Chapter 1: Introduction

Botany, History, and Use:

Corylus avellana L., the European hazelnut, is a woody perennial of the family Betulaceae. The family Betulaceae, one of eight families in the order Fagales, comprises 6 genera containing approximately 140 species, most of which are wind pollinated and monoecious. The genus Corylus contains 13 diploid species: C. avellana, C. americana, C. chinensis, C. colurna, C. cornuta, C. fargesii, C. heterophylla, C. jacquemontii, C. sieboldiana, C. ferox, C. californica, C. kweichowensis and C. yunnanensis (Molnar 2011). Within the genus, growth habits range from multi-stemmed bushes 1 m in height to trees up to 40 m tall, and all species are self-incompatible, diploid, and deciduous woody perennials. European hazelnut (C. avellana) is the most economically important species in Corylus and is widely grown for its edible nuts. It is highly heterozygous and diploid with a chromosome number of 2n=2x=22 and a genome of approximately 378 Mbp. It grows naturally as a multi-stemmed shrub ranging from 3 to 10 meters tall with leaves from 5 to 10 cm in length. It spreads by suckers, which are produced from the base of the plant, and plants can have growth forms ranging from erect to drooping. The fruit of C. avellana is a nut that grows in clusters of one to 12 nuts, with each nut enclosed by two overlapping bracts which can vary in length, thickness, serration, and the tightness with which they hold the nuts (Thompson et al. 1996, Molnar 2011).

The European hazelnut is wind-pollinated and monoecious with separate male and female flowers. The male flower is borne on a pendulous catkin and the female flower cluster is a compound bud containing 4-16 flowers, each of which has a pair of red stigmatic styles. Until pollination, the flower appears as a small tuft of many red styles emerging from a small bud. Corylus avellana flowers in the winter, typically mid-December through late February in Oregon, and is often dichogamous, with most cultivars being protandrous (male flowers mature before female flowers). Cultivars are self-incompatible and require pollen from a compatible genotype for pollination to result in successful fertilization, so compatible pollinizers must be planted alongside main cultivars in commercial orchards. Pollen is shed by catkins over a period of several weeks, and if pollination is prevented female flowers can remain receptive to pollen for up to three months. Once a compatible pollen grain lands on a receptive stigma it germinates within 4 hours and penetrates the surface of the style by 12 hours. Following pollination, styles stop growing and become increasingly necrotic. Pollen tubes reach the base of the style by 4 to 7 days after pollination, where the sperm nucleus remains in a resting state until mid-June (Thompson 1979). During this waiting period the basal ovarian meristem develops into a mature ovary, at which point fertilization occurs and the nut begins to develop. The nuts reach maturity between late September and October. Full-sized nuts that are completely empty at maturity (called blanks) can also develop due to a pollination event that did not result in fertilization or resulted in embryo abortion (Thompson 1979).

Wild populations of the European hazelnut are found across Europe and Asia Minor: east as far as the Caucasus Mountains, north as far as Moscow and Norway, and west as far as Ireland and the Iberian Peninsula. Agricultural production of C. avellana is mainly limited to Mediterranean climates: locations near large bodies of water with mild, wet winters and dry summers (Mehlenbacher 2014). Hazelnut, also called filbert in some regions, is one of the world’s most important nut crops. Corylus avellana has been cultivated at least since Roman times, but nutshell fragments have been found at Mesolithic archeological sites, suggesting that European hazelnut has been used as a food source for 6,000 to 10,000 years (Boccacci and Botta 2009). In most hazelnut producing regions, traditional cultivars that have been selected over centuries from local wild populations are still the main components of orchards, propagated traditionally by suckers.

The majority of worldwide production is in Turkey, averaging 64% of global production, followed by Italy at 13%, Georgia at 4%, and the USA at 4% (FAOSTAT 2019). In Turkey the majority of nuts are harvested from branches by hand, from cultivars with long, clasping husks that hold the nuts in the tree. In the Pacific Northwest of the United States, trees are selected with short, open husks that allow mature nuts to fall to the orchard floor. These mature nuts can then be swept up by machine for harvest. Orchard floors are highly manicured and trees are pruned into an upright, single-trunk form to facilitate efficient machine harvest. The climate of the Willamette Valley is particularly suitable for hazelnut, with mild, wet winters and warm, dry summers. The European hazelnut was brought to the United States in the late 1800s by Felix Gillet, a French immigrant who set up a nursery in Nevada City, California. From 1880 to 1905 he distributed hazelnut cultivars across California and the Pacific Northwest, including ‘Barcelona’ and ‘DuChilly’. The availability of these cultivars allowed the hazelnut industry to start in Oregon in 1900 with the first commercial orchard planted by George A. Dorris in Springfield, Oregon (Mehlenbacher and Miller 1989, Hummer 2001). In the USA, 99% of hazelnuts are grown in Oregon’s Willamette Valley, and demand for hazelnuts remains high, maintaining potential for growth of the industry.

4

For the first few decades the hazelnut industry thrived in Oregon. Cultivars ‘DuChilly’ and ‘Barcelona’ and pollinizers ‘Daviana’ and ‘Hall’s Giant’ were extensively planted. These cultivars still represent a large portion of the producing trees in older orchards in the Willamette Valley. However, in the 1960s, due to the unsuitability of ‘Barcelona’ for the kernel market, a breeding program was started at Oregon State University under the direction of Dr. Maxine M. Thompson. Breeding for local adaptability, yield, and kernel quality progressed until her retirement in 1986, when management of the program was transferred to Dr. Shawn Mehlenbacher.

Eastern filbert blight (EFB) made its appearance in the Willamette Valley of Oregon in 1986, threatening old susceptible orchards and the Oregon industry and creating the need for new disease resistant cultivars (Thompson et al. 1996). Disease resistant cultivars have been produced by the program to preserve the Oregon industry and have recently spurred rapid growth in plantings of OSU hazelnut cultivars in Oregon and worldwide. Production in Oregon has increased from around 6,000 ha harvested yearly in the 1960s to almost 15,000 ha harvested in 2017 (FAOSTAT 2019). It is estimated that there are 32,000 ha (80,000 acres) in Oregon, though half of that area is not yet in production due to juvenility (Weigand 2019). This increase in acreage is in part due to a favorable international market and high-quality EFB resistant cultivars supplied by OSU. The OSU breeding program has produced cultivars suited to the in-shell market as well as cultivars suited for the kernel market. Historically the Oregon hazelnut industry has targeted the in-shell market, growing cultivars that produce large round nuts with attractive shells, but this market represents only 5-10% of the total market. The other 90-95% of the hazelnuts produced worldwide are cracked and sold as kernels, requiring cultivars that produce small round nuts with thin shells.

The United States Department of Agriculture (USDA) National Clonal Germplasm Repository (NCGR) collection of C. avellana is also situated in Corvallis, Oregon. Over 485 C. avellana and wild relative selections are maintained at the repository, collected from the species’ centers of origin by the USDA and OSU researchers (Molnar 2011, Hummer 2018). The diversity represented at OSU and the USDA repository is an invaluable resource for the improvement of hazelnut cultivars, including durable genetic EFB resistance. Many species of Corylus are cross-compatible and interspecific hybrids have been created for use in breeding, primarily for EFB resistance. Most notably C. americana, the American hazelnut, is broadly cross-compatible with C. avellana and has been used extensively by breeders at OSU and Rutgers in New Jersey for the purpose of building EFB resistant varieties for the eastern USA. Single gene resistance was originally found in Washington by OSU researchers in an EFB affected orchard where the obsolete pollinizer ‘Gasaway’ was found free of disease. ‘Gasaway’ resistance is the source of single-gene resistance in all recent OSU releases through 2019; there are 171 sources of resistance in use by the breeding program at different levels of development. Eighteen sources of resistance map to linkage group 6 with ‘Gasaway’, 7 map to linkage group 7, and 3 map to linkage group 2 (Chen et al. 2007, Sathuvalli et al. 2011 and 2012, Sathuvalli and Mehlenbacher 2011, Bhattarai 2015, Sekerli 2019, Komaei Koma unpublished).

Genetic Markers:

Genetic markers have been used by the OSU hazelnut breeding program in the mapping of disease resistance and other traits, DNA fingerprinting, cultivar identification, and marker assisted selection (MAS) in the breeding populations. Genetic markers provide information about the genetic characteristics of an organism and can be tracked through generations. The hazelnut breeding program has used polymerase chain reaction (PCR)-based markers including amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), simple sequence repeat (SSR), and high-resolution melting (HRM) markers.

AFLP marker technology was developed by Vos et al. (1995) as a DNA fingerprinting technology that was used in many applications including genetic linkage mapping. The technology works using a restriction enzyme digest of genomic DNA followed by adaptor ligation and PCR amplification. Selective primers could be used to reduce the complexity of the final product and analysis of the markers occurs on a polyacrylamide gel. Chen et al. (2005) identified AFLP markers linked to EFB resistance in the numbered selection OSU 408.040.

RAPD markers, first described by Williams et al. (1990), are based on the random amplification of genomic DNA by arbitrary single primers, and products are separated using gel electrophoresis. Markers can be scored by presence or absence of bands on a gel after staining with ethidium bromide. Nearly all RAPD markers are dominant, and presence or absence is often the result of polymorphism in primer binding sites. Marker development and scoring is cheap but can be inconsistent due to the sensitivity of the assay to conditions of the PCR. RAPD markers have been extensively used in hazelnut and served as the majority of markers for the original linkage map (Mehlenbacher et al. 2006). Pomper et al. (1998) identified RAPD markers linked to hazelnut self-incompatibility alleles S1 and S2, and Mehlenbacher et al. (2004) identified three RAPD markers in tight linkage with eastern filbert blight resistance in hazelnut. Dominant RAPD markers can segregate either 1:1 or 3:1, and the latter segregation pattern can be used to anchor linkage maps, though with reduced accuracy for positioning.
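One common way to decide which of the two ratios a dominant marker follows is a chi-square goodness-of-fit test on the band presence and absence counts. The sketch below is illustrative only: the seedling counts are hypothetical, the function names are invented for this example, and nothing here is taken from the scoring procedure actually used for the hazelnut maps.

```python
# Critical chi-square value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1_ALPHA05 = 3.841


def chi_square_stat(present, absent, ratio):
    """Goodness-of-fit statistic for band present:absent counts against `ratio`.

    `ratio` is a (present, absent) tuple such as (1, 1) or (3, 1).
    """
    total = present + absent
    expected_present = total * ratio[0] / sum(ratio)
    expected_absent = total - expected_present
    return ((present - expected_present) ** 2 / expected_present
            + (absent - expected_absent) ** 2 / expected_absent)


def fits_ratio(present, absent, ratio):
    """True when the observed counts do not differ significantly from `ratio`."""
    return chi_square_stat(present, absent, ratio) < CHI2_CRITICAL_DF1_ALPHA05


# Hypothetical marker scored in 144 seedlings: band present in 70, absent in 74.
for ratio in [(1, 1), (3, 1)]:
    stat = chi_square_stat(70, 74, ratio)
    print(ratio, round(stat, 2), fits_ratio(70, 74, ratio))
```

With these made-up counts the marker is consistent with 1:1 segregation and clearly rejects 3:1.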

SSRs, or microsatellites, are codominant markers that exploit length polymorphism in tandemly repeated motifs of 2-6 (or more) base pairs (bp). SSRs are abundant throughout genomes in coding and non-coding regions and repeat lengths are often highly polymorphic. SSR markers are scored by PCR amplification of the repeat-containing region followed by capillary electrophoresis fragment sizing.

SSR markers are more reliable and consistent than RAPD markers and have been used more extensively in recent years in hazelnut. Initially fifty-one SSR markers were developed from libraries enriched for SSR sequences (Bassil et al. 2005a, Bassil et al. 2005b, Boccacci et al. 2005) and twenty-eight were included in the original linkage map as anchor loci (Mehlenbacher et al. 2006). Gürcan et al. (2010a) developed 86 new microsatellites from hazelnut GC and CA-enriched libraries and Gürcan and Mehlenbacher (2010a) developed 72 SSRs from ISSR fragments. Sathuvalli and Mehlenbacher (2013) developed 23 SSRs from BAC end sequences. Gürcan and Mehlenbacher (2010b) and Boccacci et al. (2015) developed 12 SSRs and 20 SSRs, respectively, from public databases (Alnus, Betula, and Corylus). Colburn et al. (2017) used ‘Jefferson’ hazelnut transcriptome sequences (Rowley et al. 2012) to develop 111 microsatellites, and Bhattarai and Mehlenbacher (2017 and 2018) developed 343 new microsatellites from genomic Illumina sequences from ‘Jefferson’ hazelnut (Rowley 2016). Lastly, Sekerli (2019) developed 42 SSRs for linkage groups 2 and 7 from PacBio contigs from ‘Jefferson’ hazelnut. In total, 760 SSRs are available, developed from a variety of hazelnut sequences.
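As a quick check on the total, the per-study counts listed in this paragraph can be tallied. The dictionary keys below are shorthand labels written for this example; the counts are the ones stated in the text.

```python
# SSR counts by source, as listed in the paragraph above.
ssr_counts = {
    "SSR-enriched libraries (2005)": 51,
    "GC and CA-enriched libraries (Gurcan et al. 2010a)": 86,
    "ISSR fragments (Gurcan and Mehlenbacher 2010a)": 72,
    "BAC end sequences (Sathuvalli and Mehlenbacher 2013)": 23,
    "public databases (Gurcan and Mehlenbacher 2010b)": 12,
    "public databases (Boccacci et al. 2015)": 20,
    "transcriptome (Colburn et al. 2017)": 111,
    "genomic Illumina sequences (Bhattarai and Mehlenbacher 2017, 2018)": 343,
    "PacBio contigs (Sekerli 2019)": 42,
}

total_ssrs = sum(ssr_counts.values())
print(total_ssrs)
```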

8

SSR markers have also been used to assess diversity in hazelnut accessions (Boccacci et al. 2006, Gokirmak et al. 2009, Gürcan et al. 2010b) and can be characterized using a diverse set of individuals. Characterization estimates values for polymorphism information content (PIC), observed heterozygosity (Ho), expected heterozygosity (He), and frequency of null alleles (r).

PIC measures how useful a marker will be in linkage studies or diversity assessment and is calculated using this formula:

$$\mathrm{PIC} = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 p_i^2 p_j^2$$

where $p_i$ is the frequency of the $i$th allele, $p_j$ is the frequency of the $j$th allele, and $n$ is the number of alleles. The more polymorphic the SSR locus, the closer to 1 the PIC value will be (Botstein et al. 1980). Null alleles can confound SSR data due to lack of amplification of one allele at a heterozygous locus, making the individual incorrectly seem homozygous by the absence of a second fragment. Null allele frequency (r) can be calculated using the maximum likelihood model (Kalinowski et al. 2007) or the formula of Brookfield (1996), where r = (He – Ho) / (1 + He).

Observed heterozygosity is the frequency of heterozygous individuals at an SSR locus, calculated by dividing the number of heterozygous genotypes by total genotypes in the population used to characterize the marker. Expected heterozygosity is the probability that a random individual would be found to be heterozygous at a locus, assuming Hardy-Weinberg equilibrium, and is calculated through use of the equation:

$$H_e = 1 - \sum_{i=1}^{n} p_i^2$$

where $p_i$ is the frequency of the $i$th allele and $n$ is the number of alleles (Nei 1973).
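The marker statistics defined above (PIC, Ho, He, and the Brookfield estimate of r) can be sketched in a few lines of Python. This is a minimal illustration rather than the software used for the characterization in this thesis: the function names are invented for the example, the genotypes are hypothetical fragment sizes in base pairs, and He is computed without a small-sample correction.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotypes given as allele pairs."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Ho: heterozygous genotypes divided by total genotypes."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), assuming Hardy-Weinberg equilibrium."""
    return 1.0 - sum(p ** 2 for p in freqs.values())


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    pair_term = sum(2 * a ** 2 * b ** 2 for a, b in combinations(p, 2))
    return 1.0 - sum(x ** 2 for x in p) - pair_term


def null_allele_frequency(he, ho):
    """Brookfield (1996) estimate: r = (He - Ho) / (1 + He)."""
    return (he - ho) / (1.0 + he)


# Hypothetical SSR genotypes (fragment sizes in bp) for four accessions.
genotypes = [(150, 150), (150, 154), (154, 158), (150, 158)]
freqs = allele_frequencies(genotypes)
he = expected_heterozygosity(freqs)
ho = observed_heterozygosity(genotypes)
print(pic(freqs), ho, he, null_allele_frequency(he, ho))
```

A negative r, as in this toy data set, simply indicates that more heterozygotes were observed than expected, so there is no evidence of a null allele.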

9

Although microsatellites have been a great resource for hazelnut, single nucleotide polymorphisms (SNPs) are by far the most numerous polymorphisms that have been exploited for marker development and are an attractive option for development in hazelnut. SNPs are single nucleotide changes at a particular locus and can be functionally linked to an agronomic trait. SNPs are identified most often using next-generation sequencing information and can be used at higher throughput than PCR-based markers for genome-wide association studies (GWAS), genomic and marker-assisted selection, and linkage mapping. Despite the many advantages of SNP markers, developing a useful set of SNPs can be costly and time-consuming whether reduced representation sequencing or whole-genome sequencing is employed.

SNPs can also be targeted individually using PCR-based high-resolution melting (HRM) or Kompetitive Allele Specific PCR (KASP) markers. In the short term these applications may be more useful for breeding and mapping purposes when other PCR-based markers are unavailable. SNPs can be identified for use as HRM or KASP markers through a low-throughput visual scan of genome alignments in regions or genes of interest. In the case of HRM development, primers must be designed so that the SNP is contained in an amplicon, a melt curve is generated using PCR and DNA-binding dyes, and differences in genotype are identified through comparison of melt curves (Liew et al. 2004). Amplicon size of 23-300 bp with 18-24 bp primers is optimal for HRM analysis (Chagné et al. 2008, Studer et al. 2009), and during melt curve generation it is recommended that each step increase the temperature by only 0.2°C to detect even the smallest change in melt curve shape (Fisher et al. 2010).

KASP marker technology, developed by KBiosciences (Hoddesdon, Hertfordshire, UK), is another PCR-based SNP genotyping method that uses two allele-specific fluorescent forward primers with a common reverse primer. The fluorescent reporting system is used to identify which of the competing forward primers annealed and produces the genotype assignment (Kumpatla et al. 2012).

Linkage Mapping in Hazelnut:

As markers are developed for hazelnut, they are characterized and added to linkage maps to show their relative position to other markers and relation to interesting traits such as self-incompatibility, non-dormancy, or disease resistance. The first complete linkage map published on European hazelnut was created by Mehlenbacher et al. (2006). The map was constructed mostly from RAPD markers, with SSR markers included to anchor the map so it could be expanded to other populations. As a basis for the map an eastern filbert blight resistant parent (OSU 414.062) was crossed with a susceptible parent (OSU 252.146) and linkage of markers was measured in their 144 offspring. Mapping markers to linkage groups (LG) using the double pseudo-testcross resulted in four groups for each chromosome due to markers being in repulsion in both male and female maps. Dummy markers were created to merge the two maps derived from each parent. This resulted in a 661 centimorgan (cM) map for the susceptible parent (female) and 812 cM for the resistant parent (male). The maternal map used 249 RAPD and 20 SSR markers and the paternal map used 271 RAPD and 28 SSR markers. SSR markers used in this original map made it possible for information to be easily used in other populations, due to the higher reproducibility of SSRs compared to RAPD markers. On this map the locus for self-incompatibility was placed on linkage group LG5 and the locus for ‘Gasaway’-derived eastern filbert blight resistance was placed on LG6. Bhattarai and Mehlenbacher (2017 and 2018) added 232 markers from genomic sequences and Colburn et al. (2017) added 54 markers from the transcriptome to the linkage map produced in 2006, greatly increasing the utility of the linkage map for fingerprinting, trait analysis, and marker assisted selection (MAS). This map showed the incompatibility locus on linkage group 5F to be bounded by SSR markers BR427 (Colburn et al. 2017) and GB309 (Bhattarai and Mehlenbacher 2017) (Figure 1.1).

Genetic mapping has also been used for quantitative trait loci (QTL) analysis both with SSR markers and single nucleotide polymorphism (SNP) markers. Beltramo et al. (2016) produced a mapping population using the cross ‘Tonda Gentile delle Langhe’ (TGDL) x ‘Merveille de Bollwiller’ (n=163) and created a linkage map using 152 SSR markers. Marker-trait associations were used to identify QTL associated with vigor, sucker habit, and time of bud burst. Further investigation into the budburst QTL using 2,406 SNP markers, derived from genotyping-by-sequencing (GBS), and 39 SSR markers identified a stably expressed region on LG2 of the TGDL map (Torello Marinoni et al. 2018).

Double digestion restriction site associated DNA sequencing (ddRADseq) was recently used to produce a map for the mapping population H3R07P25 x OSU 1155.009 (n=119) in order to map EFB resistance from H3R07P25, which was grown from open pollinated seed collected in Holmskij, Russia. Resistance was mapped to LG2 (Honig et al. 2019).

Genomic Resources in Hazelnut:

Recent improvements to the genomic resources in hazelnut have increased the ease with which SNPs and other markers can be developed. Sathuvalli and Mehlenbacher (2011) reported on the development of a bacterial artificial chromosome (BAC) library for the ‘Jefferson’ hazelnut consisting of 39,936 clones with a mean insert size of 117 kb for use in map-based cloning of the self-incompatibility and resistance loci in hazelnut. In 2015, Clemson University used high information content fingerprinting (HICF) to construct a physical map of the ‘Jefferson’ genome consisting of 26,752 BAC clones arranged in 1,673 contigs. They also identified a minimal tiling path (MTP) consisting of 5,232 clones and sequenced the ends of each clone in the MTP using universal primers located in the BAC vector. Reactions were carried out with BigDye Terminator v3.1 (Life Technologies) and collected on an ABI3730xl DNA Analyzer (Life Technologies). MTPs for the S- and R-loci were identified as well, consisting of 127 and 94 BACs respectively.
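The depth of the BAC library can be estimated from the figures given in this chapter: 39,936 clones, a 117 kb mean insert, and a genome of approximately 378 Mbp. The short calculation below is a back-of-the-envelope sketch, with a function name chosen for this example, and the resulting coverage is derived here rather than quoted from the cited reports.

```python
def library_coverage(n_clones, mean_insert_kb, genome_mbp):
    """Genome equivalents represented by a clone library.

    coverage = (number of clones * mean insert size) / genome size
    """
    total_insert_kb = n_clones * mean_insert_kb
    genome_kb = genome_mbp * 1000
    return total_insert_kb / genome_kb


# Values stated in the text for the 'Jefferson' BAC library and genome.
coverage = library_coverage(n_clones=39_936, mean_insert_kb=117, genome_mbp=378)
print(round(coverage, 2))
```

Under these assumptions the library represents roughly 12 genome equivalents.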

Sathuvalli et al. (2017) reported on the use of the BAC library for marker development and high-resolution mapping of the EFB resistance locus, developing markers from BAC end sequences associated with the R-locus MTP.

Rowley et al. (2012) used RNAseq to sequence transcripts from leaves, catkins, bark, and seedlings, resulting in 6.8 Gb of reads that were assembled into the first hazelnut transcriptome. Other transcriptomes have been produced for C. heterophylla and C. mandshurica as well (Ma et al. 2013, Chen et al. 2014) and a transcriptome for a C. heterophylla x C. avellana Ping’ou hybrid was produced for SSR development (Zhao et al. 2019). Other sequence analysis has included low coverage genome re-sequencing of Corylus heterophylla to identify sequence variation correlated with blank-nut mutants (Cheng et al. 2019) and chloroplast genome sequencing for phylogenetic analysis of Corylus species (Yang et al. 2018).

Rowley et al. (2016) produced the first hazelnut reference genome using Illumina short-read sequencing of the ‘Jefferson’ hazelnut at 93X coverage. Reads were assembled into 36,641 contigs and 34,910 putative genes were identified in the genome. In addition, lower coverage sequencing of 7 diverse cultivars (‘Barcelona’, ‘Ratoli’, ‘Tonda Gentile delle Langhe’, ‘Tonda di Giffoni’, ‘Daviana’, ‘Hall’s Giant’, ‘Tombul Extra Ghiaglhi’) was added to aid in marker development for mapping and fingerprinting studies. The reference genome has since been improved using long reads from Pacific Biosciences at 49.39X coverage. Illumina and PacBio reads were assembled into chromosome-level scaffolds using proximity ligation by Dovetail Genomics, resulting in 11 scaffolds corresponding to the expected chromosome number (2n=2x=22). A total of 34,982 protein-coding genes were identified in the 11 scaffolds and 24,911 were annotated (Vining et al. in press). Two other individuals, OSU 1026.073 (‘Ratoli’ EFB resistance) and OSU 1477.047 (Georgian EFB resistance), were also sequenced using Pacific Biosciences long-read technology to provide supplementary information on distantly related genotypes with different self-incompatibility alleles and unique forms of EFB resistance. To better understand the heterozygosity of these diploid genomes, the parents of the three sequenced individuals (‘Jefferson’, OSU 1026.073, and OSU 1477.047) were also sequenced. Sequencing was finished in December 2019, and the new parental genomes should be useful for studying the inheritance of characteristics between parents and offspring (Mehlenbacher, personal communication).

A recent preprint also describes a new chromosome-level genome assembly of the Turkish cultivar ‘Tombul’. The assembly spans 370 Mb, covers 97.8% of the estimated genome size, and contains 28,409 predicted genes. A combination of Illumina paired-end sequences and low-coverage, long single-molecule reads from Oxford Nanopore was used to produce a fragmented assembly, and Dovetail Genomics then carried out proximity ligation sequencing to produce chromosome-level scaffolds (Lucas et al. 2019).

Self-Incompatibility:

Self-incompatibility (SI) is defined as the inability of a fertile seed plant to produce zygotes following self-pollination (de Nettancourt 1977). SI in hazelnut has been a mechanism of interest because it maintains outcrossing behavior and limits which crosses can be made in a breeding program. Hazelnut is not unique in having an SI system; in flowering plants 19 orders, 71 families and 250 genera, representing between 40 and 60 percent of angiosperms, exhibit some form of SI to encourage outcrossing and maintain genetic diversity (Allen and Hiscock 2008, Igic et al. 2008). There are two main types of incompatibility found in angiosperms: heteromorphic SI and homomorphic SI. Heteromorphic SI, in which incompatibility corresponds with differences in style and stamen length, was the first SI system to be studied and was described by Charles Darwin (1877). Darwin commented on the reduced fertility of primrose when self-pollinated and the greatly improved fertility upon cross-pollination, suggesting that more than flower morphology alone encourages outcrossing. Buckwheat also exhibits heteromorphic SI and has been shown genetically to possess a diallelic system, in which the homozygous recessive state produces the long-styled flower form and the heterozygous state produces the short-styled flower form (Yasui et al. 2012). Heteromorphic SI is present in at least 28 families and it is likely that the systems evolved in parallel rather than from a common ancestor (Barrett and Shore 2008).

Homomorphic SI is more common than heteromorphic SI and is separated into two main categories: gametophytic SI (GSI) and sporophytic SI (SSI). Both systems typically involve a single polymorphic locus with a minimum of two genes, one controlling male identity and one controlling female identity. These systems inhibit self-fertilization through either self-recognition or non-self-recognition mechanisms, as reviewed by Fujii et al. (2016).

GSI is found in the Solanaceae, Plantaginaceae, Papaveraceae, Rutaceae, Poaceae, and Rosaceae and is a barrier to self-fertilization that is determined by an interaction between the sporophytic female flower and the gametophytic pollen. GSI is more common than SSI and generally has a conserved ‘non-self-recognition’ mechanism. Most GSI families exhibit an S-RNase controlled mechanism in which S-RNase proteins located in the style determine pollen tube rejection (Murfett et al. 1994). Sijacic et al. (2004) showed that an S-locus F-box (SLF) protein controlled pollen compatibility and that the introduction of its S2 allele into S1S1, S1S2 and S2S3 genotypes conferred self-compatibility. Under the current model of the interaction, S-RNases are recognized and detoxified in compatible pollinations by a protein complex expressed in the pollen tube and made up of many SLF proteins (Kubo et al. 2010). S-RNases located in the style would otherwise degrade ribosomal RNA and inhibit translation, so their detoxification by SLF proteins allows pollen tube RNAs to be translated. SLF proteins in the complex recognize and detoxify S-RNases associated with non-self S-alleles, and each individual lacks the SLF gene that recognizes the S-RNase associated with its own S-allele. In an incompatible pollination the pollen tube lacks the SLF gene product able to detoxify the stylar S-RNase, leading to slowed or stopped pollen tube growth in the style due to the inability to produce proteins (Kubo et al. 2015). The implications of this are shown by the induction of self-compatibility in autotetraploid apple trees due to the production of heterozygous diploid pollen grains (Adachi et al. 2009). Since each S-allele has the ability to detoxify all S-RNases except the self S-RNase, a heterozygous diploid pollen tube expressing SLF genes in a codominant fashion would be able to detoxify all types of S-RNase.

GSI also includes a self-recognition system in the Papaveraceae. In this family a ligand-receptor pair triggers signaling related to programmed cell death in pollen if one of the stigmatic SI alleles is also expressed by the pollen gametophyte. Lin et al. (2015) showed that naturally self-compatible Arabidopsis thaliana plants expressing both the male (PrpS) and female (PrsS) identity genes from Papaver rhoeas produced strong SI phenotypes when self-pollinated. This is significant because, although SI has been synthetically transferred between closely related species, the Papaver and Arabidopsis lineages diverged over 140 million years ago.

The other major class of homomorphic SI, sporophytic SI (SSI), is present in the Asteraceae, Betulaceae, Brassicaceae, Caryophyllaceae, Convolvulaceae, and Polemoniaceae and is characterized by an incompatibility reaction in which S-alleles in the female sporophyte recognize S-alleles of the male sporophyte (whose products are located in the pollen coat), rather than the S-allele expressed by the pollen tube itself as is the case in GSI. The sporophytic expression of male S-alleles allows dominance interactions in both the stigma and the pollen to determine compatibility through self-recognition.

SSI is well-characterized at the molecular level in the Brassicaceae, where it has been shown to be governed by a receptor-ligand interaction. The receptor is SRK (S-locus Receptor Kinase), a plasma-membrane anchored receptor that determines pollen acceptance on the stigma (Nasrallah et al. 1994). SRK is exclusively expressed in the stigma and is composed of a conserved kinase domain, a transmembrane domain, and a variable extracellular domain. The ability of SRK to initiate an SI response is improved by the presence of an S-Locus Glycoprotein (SLG), a secreted protein with an S-domain similar to that of SRK, but SRK alone determines S-haplotype specificity (Takasaki et al. 2000). S-locus cysteine-rich protein (SCR), also known as S-locus protein 11 (SP11), was later identified independently by two groups as the determinant of pollen phenotype (Schopfer et al. 1999, Takayama et al. 2000). SCR/SP11 is expressed sporophytically by the anther and transferred to the pollen coat, so dominance relationships rather than the genotype of the individual pollen grain determine acceptance or rejection by the stigma (Shiba et al. 2001).

Though the molecular determinants of SSI in Brassicaceae have been identified, there seems to be divergence between the SSI mechanisms of Brassicaceae and other SSI families, contrasting with the similarities seen among S-RNase GSI systems. SSI species from Convolvulaceae, Betulaceae, and Asteraceae contain sequences similar to Brassicaceae SSI proteins, but these are unlinked to the S-locus and unlikely to be important in SSI interactions in those families.

Candidate genes for the genetic determinants of SSI have also been identified in Tolpis coronopifolia of the Asteraceae and Ipomoea trifida of the Convolvulaceae and all candidate genes belong to different gene families than the Brassica genes (Rahman et al. 2007, Kowyama et al. 2008, Koseva et al. 2017). It is not surprising to see divergent systems governing SSI between families as SSI typically occurs in families at the ends of taxonomic branches in the more derived eudicot lineages, indicating that it may be of more recent origin than GSI (Allen and Hiscock, 2008).

European hazelnut, a member of Betulaceae, exhibits sporophytic self-incompatibility (SSI) and requires pollen produced by an individual expressing a different S-allele in order to set nuts, making the genetics of self-incompatibility a topic of interest for hazelnut breeding programs. S-alleles restrict which crosses can be made between individuals, and each cultivar the program releases needs an adequate pollinizer in each orchard to ensure good seed set on the main cultivar.

Self-incompatibility in hazelnut is determined at a single S-locus on LG5 with different alleles (haplotypes) controlling compatibility through a self-recognition mechanism. Currently 33 alleles have been identified and their dominance hierarchy determined (Figure 1.2), with one allele exhibiting self-compatibility (Mehlenbacher and Smith, 2006). Alleles are codominant in the stigma and either codominant or dominant in the pollen, depending on the dominance interaction between the two alleles. This means that reciprocal crosses can differ in compatibility.

Compatibility or incompatibility can be determined 16-24 hours after pollination by use of tester pollen, fluorescence microscopy, and unpollinated female flowers. Low pollen germination or bunched, short, or bulbous pollen tubes are typical of an incompatible pollen-stigma interaction, whereas good pollen germination with a mass of long parallel pollen tubes growing down the style is indicative of a compatible combination (Mehlenbacher, 1997).

Fluorescence microscopy was also used to assess incompatibility in wild relatives of the European hazelnut and suggested that self-incompatibility exists in many of those species, although several self-compatible individuals were also identified (Erdoğan and Mehlenbacher 2001). Upon pollination both compatible and incompatible hazelnut pollen hydrates on the stigma within 2 hours, with compatible pollen tubes emerging after 4 hours and penetrating the stigmatic surface by 12 hours. Incompatible pollen tubes emerge 8 hours later than compatible tubes and never penetrate the stigma, resulting in bulbous, short, and bunched pollen tubes (Hampson et al. 1993).

Partial self-compatibility has been observed in the cultivars ‘Montebello’ and ‘Tombul’, but self-pollination of these varieties resulted in twice as many blanks as cross-pollination (Mehlenbacher and Smith, 1991). Full self-compatibility has been identified in seedlings of the cutleaf hazelnut and is caused by S28 when in combination with an S-allele lower in the dominance hierarchy (Mehlenbacher and Smith, 2006). Self-compatibility in seedlings of the cutleaf hazelnut is derived from the failure of female flowers to reject self-pollen, while S28 pollen behaves the same as S30 pollen and is rejected on S30-containing stigmas (Mehlenbacher, personal communication).

Hampson et al. (1996) investigated whether there were similar sequences in the hazelnut genome to the Brassica genes SLG and SRK through use of probe hybridization assays at high and low stringencies (3-4% mismatch tolerated vs. 20-30% mismatch tolerated). SRK showed no regular hybridization and SLG showed regular hybridization at low stringency, though not with its conserved functional domains. The conclusion of the study was that the S-genes in hazelnut are likely of independent origin or diverged greatly from the Brassica genetic mechanism.

Mehlenbacher et al. (2006) initially mapped the self-incompatibility locus (S-locus) to LG5, flanked by RAPD markers G05-510d and 345-1050d, which are separated by 15 cM. The V3 ‘Jefferson’ genome predicts the region between these two RAPD markers to be 2.6 Mb in size and to contain 293 predicted genes. Markers G05-510d and AU02-1350d (which maps near 345-1050d) were used to identify recombinant seedlings in a fine mapping population produced from OSU 252.146 x OSU 414.062 and its reciprocal cross (n=1488), of which 192 were recombinant. Markers were screened against the recombinants, and 204-950d and KG819/KG847 were found to be within 1 cM of the S-locus (Sathuvalli and Mehlenbacher, 2011).
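
As an illustration of how the recombinant count relates to map distance, the sketch below converts the proportion of recombinant seedlings reported above (192 of 1,488) into centimorgans using the Haldane and Kosambi mapping functions. It assumes that each seedling represents a single informative meiosis between the two flanking markers and that all recombinants were detected. Neither the choice of mapping function nor the function names come from the cited studies.

```python
import math


def recombination_fraction(recombinants: int, total: int) -> float:
    """Proportion of progeny that are recombinant between two markers."""
    return recombinants / total


def haldane_cm(r: float) -> float:
    """Haldane map distance (cM); assumes no crossover interference."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance (cM); allows for some crossover interference."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Counts taken from the text: 192 recombinants among 1,488 seedlings.
r = recombination_fraction(192, 1488)
print(f"recombination fraction: {r:.3f}")
print(f"Haldane distance: {haldane_cm(r):.1f} cM")
print(f"Kosambi distance: {kosambi_cm(r):.1f} cM")
```

Under these assumptions the estimate falls in the range of roughly 13 to 15 cM, which is in line with the 15 cM interval reported for the original RAPD flanking markers.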

Ives et al. (2014) showed the S-locus to be closely linked to the red-leaf/red-style locus and mapped markers BR259 and B774 within 5 cM of the S-locus. The most recent map (Mehlenbacher and Bhattarai, 2018) shows the self-incompatibility locus to be flanked on one side by SSR markers GB309 and GB646 at distances of 4.5 and 5.2 cM, respectively. The other side of the locus is flanked by BR427, 204-950d, KG847, and BR259 at distances of 0.1, 0.8, 2.3, and 3.7 cM, respectively.

Real-time quantitative PCR:

Real-time quantitative PCR (qPCR) is a fast, accurate method for measuring the accumulation of PCR products using a fluorescent dye that binds to double-stranded DNA, SYBR Green being the most common. As product accumulates in a qPCR reaction, the detected fluorescence crosses a set threshold during the exponential phase of amplification, and the cycle at which this occurs is the quantification cycle (Cq). Cq values are inversely related to the logarithm of starting template abundance and have been used extensively to identify RNA expression differences between tissues that have undergone various treatments. Analysis of Cq values allows quantification of transcripts over a wide range of abundances. Variables that can influence the outcome of a qPCR experiment include data normalization, primer annealing efficiency, DNA contamination, biological variation, pipetting errors, secondary structure of RNA, reverse transcription, tissue collection, and RNA extraction (Taylor et al. 2019). The Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines lay out methodological requirements for qPCR experiments in an attempt to ensure reliability and repeatability in qPCR studies (Bustin et al. 2009).
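
The relationship between Cq and starting template can be made concrete with a dilution series. The following minimal sketch fits a line to Cq against log10 copy number, derives amplification efficiency from the slope, and interpolates the copy number of an unknown sample. The dilution values are invented for illustration and the function names are hypothetical.

```python
import math


def fit_standard_curve(copies, cq):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(cq) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency as a fraction; 1.0 means template doubles every cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_cq(cq, slope, intercept):
    """Interpolate starting copy number of an unknown from its Cq."""
    return 10 ** ((cq - intercept) / slope)


# Hypothetical ten-fold dilution series (invented values).
standard_copies = [1e6, 1e5, 1e4, 1e3, 1e2]
standard_cq = [15.0, 18.32, 21.64, 24.96, 28.28]

slope, intercept = fit_standard_curve(standard_copies, standard_cq)
efficiency = amplification_efficiency(slope)
print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}")
print(f"unknown at Cq 21.64: {copies_from_cq(21.64, slope, intercept):.0f} copies")
```

A negative slope reflects the inverse relationship: each ten-fold reduction in template delays the Cq by a little over three cycles when efficiency is close to 100%.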

Relative or absolute quantification of transcript abundance can be used to find differences in expression using qPCR. Absolute quantification can be achieved through use of digital PCR (dPCR) or through use of a standard curve to identify transcript counts in samples. The standard curve methodology involves a plot constructed from standards with known concentrations and known transcript copy numbers based on molecular weight. Transcript numbers for genes of interest are then determined from the standards plot (Whelan et al. 2003). Absolute quantification through dPCR involves diluting and partitioning a PCR solution until single molecules are isolated. Target DNA abundance is measured by counting how many partitions show product amplification, and accurate measurement assumes that template molecules are distributed randomly and independently among partitions (Vogelstein and Kinzler 1999). Relative uncertainty in template abundance under 6% was demonstrated by Bhat et al. (2009) when using dPCR on samples with 212 - 3,365 template molecules each, analyzing 5 dPCR panels with 765 partitions each. Despite the power of dPCR, it is limited by low throughput and high associated costs. A more recent improvement on the dPCR method came with the development of high-throughput droplet digital PCR (ddPCR), reported by Hindson et al. (2011), which allows dPCR to be performed in nanoliter-sized droplets. Droplets are generated in a high-throughput fashion by mixing samples with oil containing stabilizing surfactants. PCR reactions occur in each droplet and results are recorded by specialized machinery that separates droplets and measures fluorescence. Taylor et al. (2017) compared ddPCR with qPCR and showed ddPCR to be more reliable than qPCR in cases of variable contamination with inhibitors and low transcript abundance. In cases of consistent inhibitor contamination and higher transcript abundance, qPCR and ddPCR were equivalent. The researchers claim that qPCR is reliable where differences in expression are 2-fold or greater and more than 100 transcript copies per reaction are present, but that ddPCR is a possible alternative when these conditions are not met.
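
Because a partition can receive more than one template molecule, the count of positive partitions is usually converted to a copy number with a Poisson correction, which follows from the assumption that template is distributed randomly and independently among partitions. The sketch below shows that calculation. The panel size of 765 partitions matches the figure quoted from Bhat et al. (2009), while the number of positive partitions is a made-up value and the function names are hypothetical.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Mean template copies per partition, estimated from the fraction
    of partitions that show amplification (Poisson correction)."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total; a fully positive panel is saturated")
    return -math.log(1.0 - positive / total)


def total_copies(positive: int, total: int) -> float:
    """Estimated number of template molecules loaded onto the panel."""
    return total * copies_per_partition(positive, total)


# Hypothetical panel: 300 of 765 partitions show amplification.
estimate = total_copies(300, 765)
print(f"positive partitions: 300, estimated template copies: {estimate:.0f}")
```

The estimate exceeds the raw count of positive partitions because some partitions are expected to contain two or more molecules.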

Despite the robust results that rigorous absolute quantification methods can produce, relative quantification of transcript abundance is still very commonly used because of the relative ease and low cost of setting up assays, especially when the number of transcripts produced is not relevant to the experimental question. Relative abundance is commonly expressed in terms of the ΔΔCq equation (Livak and Schmittgen 2001), which uses two references, the Cq value of a reference gene and the Cq value of a calibrator sample, to account for both the amount of total cDNA in each sample and inter-run variation and so give a more reliable estimate of the relative abundance of a target gene. One assumption of this method is that the amount of template doubles in each PCR cycle, but this is not always accurate because inhibitors present in samples or imperfect primers may lead to a higher or lower PCR efficiency. To better account for violations of this assumption, Pfaffl et al. (2001) developed a revised ΔΔCq method that takes efficiency into account, allowing for different efficiencies between reference gene and gene of interest reactions. However, even this modification cannot account for situations in which efficiency differs among samples.
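
The two calculations described above can be sketched as follows. The first function is the ΔΔCq calculation, which assumes perfect doubling. The second is an efficiency-corrected ratio of the kind described by Pfaffl, in which each gene has its own amplification factor (2.0 corresponds to perfect doubling). All Cq values and amplification factors are hypothetical, as are the function names.

```python
def livak_ratio(cq_target_sample, cq_ref_sample, cq_target_cal, cq_ref_cal):
    """Relative expression by the delta-delta-Cq method (assumes doubling)."""
    delta_sample = cq_target_sample - cq_ref_sample
    delta_calibrator = cq_target_cal - cq_ref_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


def efficiency_corrected_ratio(cq_target_sample, cq_ref_sample,
                               cq_target_cal, cq_ref_cal,
                               e_target=2.0, e_ref=2.0):
    """Relative expression with a separate amplification factor per gene."""
    target_term = e_target ** (cq_target_cal - cq_target_sample)
    reference_term = e_ref ** (cq_ref_cal - cq_ref_sample)
    return target_term / reference_term


# Hypothetical Cq values: the target crosses threshold 3 cycles earlier in
# the sample than in the calibrator, while the reference gene is unchanged.
cqs = dict(cq_target_sample=22.0, cq_ref_sample=18.0,
           cq_target_cal=25.0, cq_ref_cal=18.0)

print(livak_ratio(**cqs))
print(efficiency_corrected_ratio(**cqs))
print(efficiency_corrected_ratio(**cqs, e_target=1.9))
```

With both amplification factors set to 2.0 the two functions agree. Lowering the target amplification factor to 1.9 reduces the estimated fold change, which shows how sensitive the result is to the efficiency assumption.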

The selection of reference genes is an important consideration when using the ΔΔCq method to compare relative expression. For a reference gene to be useful it must maintain stable expression across experimental conditions and be expressed at levels similar to those of the genes of interest. Several genes have traditionally been used as reference genes in qPCR, including β-actin (ACTB), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), and 18S rRNA. 18S rRNA may be a problematic reference gene because rRNA is typically expressed at much higher levels than mRNA, because it is transcribed by RNA polymerase I while mRNA is transcribed by RNA polymerase II, and because rRNA does not have a polyA tail, which excludes it from reverse transcription if oligo(dT) primers are used for cDNA synthesis (Kozera and Rapacz 2013). GAPDH and ACTB may be valid reference genes but must be validated before use, as they have been shown to be unstable in several studies. Zhu et al. (2012) tested 21 genes in papaya and ranked GAPDH as the worst in terms of stability. Xiao et al. (2016) tested 10 genes and also found GAPDH to be inconsistent in its expression but ACTB to be highly stable. In a review by Ruan and Lai (2007), ACTB expression was shown to change consistently in response to widely varied biomedical stimuli. Difficulties in choosing an appropriate reference gene can be resolved through rigorous validation of candidate reference genes.

One difficulty with developing appropriate reference genes is that before a reference is established there is no baseline from which to judge stable expression accurately. One commonly used method to overcome this issue, developed by Vandesompele et al. (2002), compares each candidate reference gene with the other candidates being evaluated to produce the value M, which is defined as the average of the standard deviations of the logarithmically transformed expression ratios between that gene and each of the other candidates. The reference gene with the lowest M value has the lowest variation among the set tested, and stepwise removal of the genes with the highest M values from the dataset should lower the overall group M values. The authors also warn that a single reference gene is not adequate for qPCR experiments and that a set of three reference genes may be necessary. They released geNorm software to allow researchers to carry out this analysis easily for themselves. Andersen et al. (2004) developed NormFinder as another tool for evaluating candidate reference genes. It uses a mathematical model to estimate intra- and inter-group expression variation and separately analyzes subgroups as it calculates stability values for candidate genes. The authors argue that their model-based approach is more robust than the pairwise approach used by geNorm, particularly in cases of co-regulation of candidate reference genes. The last commonly used algorithm is BestKeeper, developed by Pfaffl et al. (2004), which uses quantification cycle (Cq) values and their variation, while taking PCR efficiency into account, to create an index of the tested genes and evaluate them in a pairwise fashion. Correlation coefficients are produced by the program to evaluate the set of reference genes, with higher coefficients suggesting better suitability as internal references. Stepwise removal of genes with lower correlation coefficients allows a set of reference genes to be developed that can control for variation better as a group than a single reference gene could. BestKeeper also allows up to 10 target genes to be analyzed within the software, increasing ease of use.
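
As a rough illustration of the pairwise stability measure used by geNorm, the sketch below computes M for each candidate as the mean standard deviation of its log2 expression ratios against every other candidate. It is a simplified reading of the published method, not the geNorm software itself. Gene names and relative quantities are hypothetical, and the inputs are assumed to be linear-scale relative quantities rather than raw Cq values.

```python
import math
from statistics import stdev


def stability_m(expression):
    """Return {gene: M} for a dict mapping each candidate gene to a list of
    relative quantities measured in the same samples, in the same order."""
    genes = list(expression)
    m_values = {}
    for gene in genes:
        pairwise_sd = []
        for other in genes:
            if other == gene:
                continue
            log_ratios = [math.log2(a / b)
                          for a, b in zip(expression[gene], expression[other])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


# Hypothetical candidates measured in four samples. refA and refB vary
# together; refC fluctuates independently of both.
candidates = {
    "refA": [1.0, 1.1, 0.9, 1.0],
    "refB": [2.0, 2.2, 1.8, 2.0],
    "refC": [1.0, 2.0, 0.5, 1.0],
}

m = stability_m(candidates)
least_stable = max(m, key=m.get)
for gene, value in sorted(m.items(), key=lambda item: item[1]):
    print(f"{gene}: M = {value:.3f}")
print(f"least stable candidate: {least_stable}")
```

In a stepwise analysis the least stable candidate would be dropped and M recalculated for the remaining genes. The example also shows the co-regulation concern raised by Andersen et al. (2004): two candidates that vary together score well against each other whether or not either is truly stable.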

Even though all validation programs are helpful in identifying valid reference genes, the stability rankings produced by each program for the same set of genes can differ, adding uncertainty about how to proceed. Kozera and Rapacz (2013) suggest that a common rule of “Best 3” can be applied: use at least three reference genes, three different validation programs, and three samples with three repeats for each genotype.

In hazelnut, prior use of qPCR has focused on allergen detection in pollen and food products, and little work has been done to develop reference genes. Platteau et al. (2011) developed a qPCR assay for allergen detection in soy and hazelnut food products but did not report on reference genes; Ražná et al. (2014) reported use of an 18S rRNA internal reference for pollen allergen detection but did not validate its stable expression experimentally; and Žiarovská et al. (2015) reported on the development of 3-hydroxy-3-methylglutaryl coenzyme A reductase (HMGCR) as a reference gene for pollen allergen detection by qPCR.

The identification of pathogen presence and quantification of transcripts can be useful in early diagnosis of disease, and a qPCR assay exists for detection of Anisogramma anomala, the causal agent of EFB in hazelnut, for this purpose. The assay’s ability to detect pathogen presence was 79% accurate as early as 6 weeks after inoculation, allowing for very early screening of plant material for susceptibility to EFB (Molnar et al. 2013). Though qPCR has rarely been used in hazelnut, it shows promise for future studies involving self-incompatibility and disease resistance.

Long-Range PCR and Amplicon Sequencing:

The use of qPCR, parental genome sequences, and genetic mapping can help dissect the origin and presence of genes and gene products, but there is a need to confirm findings, especially when working with sequences from the genome of a highly heterozygous individual. PCR has been a useful tool for amplifying small regions of interest for physical evidence of polymorphism, confirmation of genomic sequences, or examination of diversity at a locus in multiple genotypes. However, the size of amplified fragments can limit the use of PCR for these purposes. This is especially true if the regions of interest are tens to hundreds of kilobases (kb) long, like the hazelnut S- or R-loci.

Long-range PCR is a tool that can make up for the shortcomings of traditional PCR through use of alternate polymerases and specific cycling conditions to amplify fragments from 5 to 15 kb long. In some cases, fragments up to 40 kb can be amplified depending on the polymerase used. Jia et al. (2015) evaluated the ability of 6 long-range DNA polymerases (Invitrogen SequalPrep, Invitrogen AccuPrime, TaKaRa PrimeSTAR GXL, TaKaRa LA Taq Hot Start, KAPA Long Range HotStart and QIAGEN Long Range PCR Polymerase) to amplify fragments of 12.9 kb, 9.7 kb, and 5.8 kb and showed that TaKaRa PrimeSTAR GXL was the only polymerase able to amplify all fragments at the recommended cycling conditions. PrimeSTAR GXL was then used to amplify fragments containing two genes of interest from 9 individuals to identify polymorphism, and the fragments were sequenced using Illumina MiSeq. The PrimeSTAR GXL product manual shows that traditional PCR conditions do not work appropriately for long-range PCR and recommends that, for products 10 to 30 kb long, the annealing/elongation step be lengthened to 1 min/kb, up to 10 minutes.

Long-range PCR can also be used to study sequences from rare or degraded samples. Denier et al. (2017) used long-range PCR to amplify mitochondrial DNA from environmental samples that could not otherwise be sequenced. They also used MiSeq to sequence their amplicons, and though the amplicons were clean their assemblies were fragmented. They suggested the use of PacBio long-read sequencing in place of Illumina short reads for more contiguous assembly. Bai et al. (2019) used long-range PCR to amplify viral genomes that were present at too low an abundance to sequence de novo using second- or third-generation sequencing techniques. They used both Illumina short reads (HiSeq) and PacBio long reads (RSII) to sequence their amplicons and reported that PacBio reads easily spanned all repetitive regions and facilitated easier assembly of the viral genome but had more variable coverage than the Illumina data. According to the authors, the easier assembly allowed by PacBio reads must be weighed against the more uniform coverage and lower cost of Illumina technologies.

Sometimes long-range PCR amplicons cannot span the entirety of a region of interest, and this would certainly be the case for the hazelnut S-locus. Development of a minimum amplicon tiling path is necessary to chain long amplicons together to cover a region. Francis et al. (2017) developed ThermoAlign to design primers for this purpose, demonstrating its use by resequencing a 24 kb target region in the highly repetitive maize genome using nine amplicons of 4-5 kb each. The program takes into account off-target binding sites in the target genome and designs highly specific primers in a minimal tiling path to cross a region of interest.
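
The coordinate arithmetic behind an amplicon tiling path can be sketched simply. The function below spaces fixed-length amplicons evenly across a region so that neighbours share at least a minimum overlap. It handles only coordinates and ignores primer specificity and thermodynamics, which are the problems ThermoAlign actually addresses. The 193.5 kb region length is the size of the fine-mapped S-locus region given in the abstract; the 10 kb amplicon length and 1 kb overlap are assumptions chosen for illustration, and the function name is hypothetical.

```python
import math


def tile_region(region_len: int, amplicon_len: int, min_overlap: int):
    """Return (start, end) coordinates (0-based, end exclusive) of evenly
    spaced amplicons that cover the region with at least min_overlap bases
    shared between neighbouring amplicons."""
    if amplicon_len <= min_overlap:
        raise ValueError("amplicon_len must exceed min_overlap")
    if region_len <= amplicon_len:
        return [(0, region_len)]
    max_step = amplicon_len - min_overlap
    last_start = region_len - amplicon_len
    # Fewest amplicons whose spacing does not exceed max_step.
    count = math.ceil(last_start / max_step) + 1
    starts = [round(i * last_start / (count - 1)) for i in range(count)]
    return [(start, start + amplicon_len) for start in starts]


# Region length from the abstract; amplicon length and overlap are assumed.
tiles = tile_region(region_len=193_500, amplicon_len=10_000, min_overlap=1_000)
print(f"{len(tiles)} amplicons")
print("first:", tiles[0], "last:", tiles[-1])
```

Spreading the amplicons evenly, rather than stepping by the maximum allowed distance, leaves a little extra overlap at every junction, which gives some room to shift primers away from repetitive sequence.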

Research Objectives:

Our current understanding of SSI in hazelnut gives the OSU breeding program sufficient information to work around incompatibility issues in the species, but the identification of the male and female determinants of SI could facilitate development of allele-specific markers for easier identification of S-alleles in an individual. Additionally, a further understanding of the molecular mechanism controlling SSI in hazelnut would represent a significant contribution to our understanding of the SSI systems found in higher plants.

With new genomic tools available for the study of hazelnut, including a reference genome for ‘Jefferson’ with chromosome-level scaffolds, more progress can be made toward identifying the genetic determinants of SSI. The first step of this process will be fine mapping of the S-locus to aid in the identification of candidate genes. Following this, expression of S-locus genes after self- and cross-pollination will be evaluated. Lastly, long-range PCR and amplicon sequencing will be used to study the S-locus region in several genotypes independent of the ‘Jefferson’ reference.

We expect SSI in hazelnut to have some similarities with the Brassica system, though significantly diverged. It is likely that both a plasma membrane anchored receptor protein and a pollen coat protein produced by the sporophyte participate in the recognition system. We expect highly tissue-specific expression of the male and female determinants of SSI. A single locus controls SSI in hazelnut, and we expect the coding sequences for the male and female determinants to be tightly linked. This work will set the stage for investigation of self-compatibility derived from S28, found in seedlings of the cutleaf hazel.

References:

1. Adachi, Y., S. Komori, Y. Hoshikawa, N. Tanaka, K. Abe, H. Bessho, M. Watanabe, and A.

Suzuki. 2009. Characteristics of Fruiting and Pollen Tube Growth of Apple Autotetraploid

Cultivars Showing Self-compatibility. J. Japan. Soc. Hort. Sci. 78 (4): 402–409.

2. Allen, A.M., and S.J. Hiscock. 2008. Evolution and phylogeny of self-incompatibility

systems in Angiosperms. P. 73-101. In V.E. Franklin-Tong. (ed.). Self-Incompatibility in

Flowering Plants. Springer Berlin/Heidelberg, Germany.

3. Andersen, C.L., J.L. Jensen, T.F. Ørntoft 2004. Normalization of Real-Time Quantitative

Reverse Transcription-PCR Data: A Model-Based Variance Estimation Approach to

Identify Genes Suited for Normalization, Applied to Bladder and Colon Cancer Data Sets,

Cancer Res 2004;64 5245-5250.

4. Bai, C.M., B. Morga, U. Rosani, Shi, J. Li, C. Xin, L.S. and Wang, C.M. 2019. Long-range

PCR and high-throughput sequencing of Ostreid herpesvirus 1 indicate high genetic

diversity and complex evolution process. Virology 526: 81–90.

5. Barrett, S. C. H. and J. S. Shore 2008. New Insights on Heterostyly: Comparative Biology,

Ecology and Genetics. In V. E. Franklin-Tong (Ed.), Self-Incompatibility Flowering Plants

(pp. 3–32). Springer Berlin/Heidelberg, Germany.

6. Bassil, N.V., R. Botta, and S.A. Mehlenbacher. 2005a. Additional microsatellite markers

of the European hazelnut doi: 10.17660/ActaHortic.2005.686.13.

7. Bassil, N.V., R. Botta, and S.A. Mehlenbacher. 2005b. Microsatellite markers in hazelnut:

isolation, characterization and cross-species amplification. J. Am. Soc. Hort. Sci. 130:543-

549. doi: 10.21273/JASHS.130.4.543.

8. Beltramo, C., N. Valentini, E. Portis, D.T. Marinoni, P. Boccacci, M.A.S. Prando, and R.

Botta. 2016. Genetic mapping and QTL analysis in European hazelnut (Corylus avellana

L.). Mol. Breeding 36: 27.

9. Bhat, S., J. Herrmann, P. Armishaw, P. Corbisier, and K.R. Emslie. 2009. Single molecule

detection in nanofluidic digital array enables accurate measurement of DNA copy

number. Anal Bioanal Chem 394(2):457. https://doi.org/10.1007/s00216-009-2729-5

10. Bhattarai, G. 2015. Microsatellite marker development, characterization and mapping in

European hazelnut (Corylus avellana L.) and investigation of novel sources of eastern

filbert blight resistance in Corylus. M.S. thesis (Corvallis, OR, USA: Oregon State

University).

11. Bhattarai, G., and S.A. Mehlenbacher. 2017. In silico development and characterization

of tri-nucleotide simple sequence repeat markers in hazelnut (Corylus avellana L.). PLoS

One 12(5): e0178061. https://doi.org/10.1371/journal.pone.0178061

12. Bhattarai, G., and S.A. Mehlenbacher. 2018. Discovery, Characterization, and Linkage

Mapping of Simple Sequence Repeat Markers in Hazelnut. J. Amer. Soc. Hort. Sci. 143(5): 347-362.

13. Boccacci, P., A. Akkak, N.V. Bassil, S.A. Mehlenbacher, and R. Botta. 2005.

Characterization and evaluation of microsatellite loci in European hazelnut (Corylus

avellana L.) and their transferability to other Corylus species. Molecular Ecology Notes

5:934-937. doi: 10.1111/j.1471-8286.2005.01121.x.

14. Boccacci, P., A. Akkak, and R. Botta. 2006. DNA typing and genetic relations among

European hazelnut (Corylus avellana L.) cultivars using microsatellite markers. Genome

49:598-611.

15. Boccacci, P., C. Beltramo, M.A. Sandoval Prando, A. Lembo, C. Sartor, S.A.

Mehlenbacher, R. Botta, and D. Torello Marinoni. 2015. In silico mining, characterization

and cross-species transferability of EST-SSR markers for European hazelnut (Corylus

avellana L.). Molecular Breeding 35:21. doi: 10.1007/s11032-015-0195-7.

16. Boccacci, P. and R. Botta. 2009. Investigating the origin of hazelnut (Corylus avellana L.)

cultivars using chloroplast microsatellites. Genet. Resour. Crop Evol. 56: 851-859.

17. Botstein, D., R.L. White, M. Skolnick, and R.W. Davis. 1980. Construction of a genetic

linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet

32:314-331.

18. Brookfield, J.F.Y. 1996. A simple new method for estimating null allele frequency from

heterozygote deficiency. Mol. Ecol. 5:453-455.

19. Bustin, S.A., V. Benes, J.A. Garson, J. Hellemans, J. Huggett, M. Kubista, R. Mueller, T.

Nolan, M.W. Pfaffl, G.L. Shipley, J. Vandesompele, and C.T. Wittwer. 2009. The MIQE

guidelines: Minimum Information for publication of Quantitative real-time PCR

Experiments. Clin Chem. 55 (4):611–22.

20. Bustin, S.A., and T. Nolan. 2004. Pitfalls of quantitative real-time reverse-transcription

polymerase chain reaction. J Biomol Tech, 15(3), 155–166.

21. Chagné, D., K. Gasic, R.N. Crowhurst, Y. Han, H.C. Bassett, D.R. Bowatte, T.J. Lawrence,

E.H.A. Rikkerink, S.E. Gardiner, and S.S. Korban. 2008. Development of a set of SNP

markers present in expressed genes of the apple. Genomics 92(5):353-358.

22. Chapman, J.R. and J. Waldenström. 2015. With Reference to Reference Genes: A Systematic

Review of Endogenous Controls in Gene Expression Studies. PLoS ONE 10(11).

23. Chen, H., S.A. Mehlenbacher, and D.C. Smith. 2005. AFLP markers linked to eastern

filbert blight resistance from OSU 408.040 hazelnut. J. Amer. Soc. Hort. Sci. 130:412-417.

24. Chen, X., J. Zhang, Q. Liu, W. Guo, T. Zhao, Q. Ma, and G. Wang. 2014. Transcriptome

sequencing and identification of cold tolerance genes in hardy Corylus species (C.

heterophylla Fisch) floral buds. PloS one, 9(9), e108604.

25. Cheng, Y., S. Jiang, X. Zhang, H. He, and J. Liu. 2019. Whole-Genome Re-Sequencing of

Corylus heterophylla Blank-Nut Mutants Reveals Sequence Variations in Genes

Associated With Embryo Abortion. Frontiers in plant science. 10: 1465.

26. Colburn, B.C., S.A. Mehlenbacher, and V.R. Sathuvalli. 2017. Development and mapping

of microsatellite markers from transcriptome sequences of European hazelnut (Corylus

avellana L.) and use for germplasm characterization. Mol. Breed. 37(2): 16.

27. Darwin, C. 1877. The different forms of flowers on plants of the same species. Murray,

London, UK.

28. de Nettancourt, D. 1977. The Basic Features of Self-Incompatibility. In: Incompatibility in

Angiosperms. Monographs on Theoretical and Applied Genetics, vol 3. Springer,

Berlin/Heidelberg, Germany.

29. Deiner, K., M.A. Renshaw, Y. Li, B.P. Olds, D.M. Lodge, and M.E. Pfrender. 2017. Long-

range PCR allows sequencing of mitochondrial genomes from environmental DNA.

Methods Ecol. Evol. 8: 1888– 1898.

30. Erdoğan, V. and S.A. Mehlenbacher. 2001. Incompatibility in wild Corylus

species. Acta Hortic. 556: 163-170.

31. FAO. FAOSTAT. (Latest update: January 18, 2019). Accessed 8/12/2019. URL:

http://www.fao.org/faostat/en/#data/QC

32. Fisher, C., R. Meng, F. Bizouarn, and R. Scott. 2010. High Resolution Melt Parameter

Considerations for Optimal Data Resolution. Bio-Rad Bulletin 6009.

33. Francis, F., M.D. Dumas, and R.J. Wisser. 2017. ThermoAlign: a genome-aware primer

design tool for tiled amplicon resequencing. Sci. Rep. 7: 44437.

34. Fujii, S., K. Kubo, and S. Takayama. 2016. Non-self- and self-recognition models in plant self-

incompatibility. Nat. Plants 2: 16130.

35. Gokirmak, T., S. Mehlenbacher, and N. Bassil. 2009. Characterization of European

hazelnut (Corylus avellana) cultivars using SSR markers. Genet. Resour. Crop Evol.

56:147-172.

36. Gürcan, K. and S.A. Mehlenbacher. 2010a. Development of microsatellite marker loci for

European hazelnut (Corylus avellana L.) from ISSR fragments. Mol Breeding. 26: 551.

37. Gürcan, K. and S.A. Mehlenbacher. 2010b. Transferability of microsatellite markers in

the Betulaceae. J. Am. Soc. Hort. Sci. 135:159-173. doi: 10.21273/JASHS.135.2.159.

38. Gürcan, K., S.A. Mehlenbacher, R. Botta, and P. Boccacci. 2010a. Development,

characterization, segregation, and mapping of microsatellite markers for European

hazelnut (Corylus avellana L.) from enriched genomic libraries and usefulness in genetic

diversity studies. Tree Genetics & Genomes 6:513-531. doi: 10.1007/s11295-010-0269-y.

39. Gürcan, K., S.A. Mehlenbacher, R. Botta, and P. Boccacci. 2010b. Development,

characterization, segregation, and mapping of microsatellite markers for European

hazelnut (Corylus avellana L.) from enriched genomic libraries and usefulness in genetic

diversity studies. Tree Genet. Genomes 6:513-531.

40. Hampson, C.R., G.D. Coleman, and A.N. Azarenko. 1996. Does the genome of Corylus

avellana L. contain sequences homologous to the self-incompatibility gene of Brassica?

Theoret. Appl. Genetics 93: 759–764.

41. Hampson, C.R., A.N. Azarenko, and A. Soeldner. 1993. Pollen-Stigma Interactions following

Compatible and Incompatible Pollinations in Hazelnut. J. Amer. Soc. Hort. Sci.

118(6):814-819.

42. Heid, C.A., J. Stevens, K.J. Livak, and P.M. Williams. 1996. Real time quantitative

PCR. Genome Res. 6:986–994.

43. Hindson, B. J., K.D. Ness, D.A. Masquelier, P. Belgrader, N.J. Heredia, A.J. Makarewicz, et

al. 2011. High-throughput droplet digital PCR system for absolute quantitation of DNA

copy number. Anal. Chem. 83: 8604−8610.

44. Hiscock, S. J., and D.A. Tabah. 2003. The different mechanisms of sporophytic self-

incompatibility. Philos Trans R Soc Lond B Biol Sci. 358(1434): 1037–1045.

45. Honig, J.A., M.F. Muehlbauer, J.M. Capik, C. Kubik, J.N. Vaiciunas, S. A. Mehlenbacher,

and T.J. Molnar. 2019. Identification and mapping of eastern filbert blight resistance

quantitative trait loci in European hazelnut using double digestion restriction site

associated DNA sequencing. J. Amer. Soc. Hort. Sci. 144(5):295-304.

46. Hummer, K.E. (June 26, 2018) Corylus Genetic Resources. Retrieved from:

https://www.ars.usda.gov/pacific-west-area/corvallis-or/national-clonal-germplasm-

repository/docs/ncgr-corvallis-corylus-germplasm/

47. Hummer, K.E. 2001. Historical notes on hazelnuts in Oregon. Acta Hortic. 556:

25-28.

48. Igic, B., R. Lande, and J.R. Kohn. 2008. Loss of self-incompatibility and its evolutionary

consequences. Int. J. Plant Sci. 169: 93–104.

49. Ives, C., V.R. Sathuvalli, B.C. Colburn, and S.A. Mehlenbacher. 2014. Mapping the

Incompatibility and Style Color Loci in Two Hazelnut Progenies. HortScience. 49: 250-

253.

50. Whelan, J.A., N.B. Russell, and M.A. Whelan. 2003. A method for the absolute

quantification of cDNA using real-time PCR. J Immunol Methods, 278(1–2):261-269.

51. Jia, H., Y. Guo, W. Zhao, and K. Wang. 2015. Long-range PCR in next-generation

sequencing: comparison of six enzymes and evaluation on the MiSeq sequencer. Sci

Rep. 4(5737).

52. Kalinowski, S. T., M.L. Taper and T.C. Marshall. 2007. Revising how the computer

program CERVUS accommodates genotyping error increases success in paternity

assignment. Mol. Ecol. 16(5): 1099-1106.

53. Komaei Koma, G. unpublished. Construction of a high density linkage map for European

hazelnut (Corylus avellana L.) using Genotyping By Sequencing, microsatellite marker

development for Linkage Group 6 and identifying new sources of resistance to Eastern

Filbert Blight. Oregon State University.

54. Koseva, B., D. J. Crawford, K.E. Brown, M.E. Mort, and J.K. Kelly. 2017. The genetic

breakdown of sporophytic self-incompatibility in Tolpis coronopifolia (Asteraceae). New

Phytol, 216: 1256-1267.

55. Kowyama, Y., T. Tsuchiya, and K. Kakeda. 2008. Molecular genetics of sporophytic self-

incompatibility in Ipomoea, a member of the Convolvulaceae. In V. E. Franklin-Tong

(Ed.), Self-Incompatibility in Flowering Plants (pp. 259-271). Springer Berlin/Heidelberg,

Germany.

56. Kozera, B., and M. Rapacz. 2013. Reference genes in real-time PCR. J. Appl. Genet. 54(4):

391–406.

57. Kubo, K., T. Entani, A. Takara, N. Wang, A.M. Fields, Z. Hua, M. Toyoda, S. Kawashima, T.

Ando, A. Isogai, T. Kao, and S. Takayama. 2010. Collaborative Non-Self Recognition

System in S-RNase–Based Self-Incompatibility. Science. 330 (6005):796-799.

58. Kubo, K., T. Paape, M. Hatakeyama, T. Entani, A. Takara, K. Kajihara, M. Tsukahara, R.

Shimizu-Inatsugi, K.K. Shimizu, and S. Takayama. 2015. Gene duplication and genetic

exchange drive the evolution of S-RNase-based self-incompatibility in Petunia. Nat.

Plants 1: 14005.

59. Kumpatla S.P., R. Buyyarapu, I.Y. Abdurakhmonov, and J.A. Mammadov. 2012.

Genomics-assisted plant breeding in the 21st century: technological advances and

progress. In: Abdurakhmonov I (ed) Plant breeding. InTech publishers.

60. Liew, M., R. Pryor, R. Palais, C. Meadows, M. Erali, E. Lyon, and C. Wittwer. 2004. Genotyping

of Single-Nucleotide Polymorphisms by High-Resolution Melting of Small Amplicons. Clin.

Chem. 50(7): 1156-1164.

61. Lin, Z., D.J. Eaves, E. Sanchez-Moran, F.C.H. Franklin, and V.E. Franklin-Tong.

2015. The Papaver rhoeas S determinants confer self-incompatibility to Arabidopsis

thaliana in planta. Science. 350(6261):684-687.

62. Livak K.J. and T.D. Schmittgen. 2001. Analysis of relative gene expression data using real-

time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods. 25:402–408. doi:

10.1006/meth.2001.1262.

63. Lucas, S.J., K. Kahraman, B. Avşar, R.J.A. Buggs, and I. Bilge. 2019. A chromosome-scale

genome assembly of European Hazel (Corylus avellana L.) reveals targets for crop

improvement. bioRxiv 817577.

64. Ma H., Z. Lu, B. Liu, Q. Qiu, and J. Liu. 2013. Transcriptome analyses of a Chinese

hazelnut species Corylus mandshurica. BMC Plant Biol. 13(152).

65. Mehlenbacher, S. A. 2014. Geographic distribution of incompatibility alleles in cultivars

and selections of European hazelnut. J. Amer. Soc. Hort. Sci. 139:191-212.

66. Mehlenbacher, S.A. 1997. Testing compatibility of hazelnut crosses using fluorescence

microscopy. Acta Hort. 445: 167-171.

67. Mehlenbacher, S.A. 2009. Genetic resources for hazelnut: state of the art and future

perspectives. Acta Hort 845: 33-38

68. Mehlenbacher, S.A. and A.N. Miller. 1989. 'Barcelona' hazelnut. Fruit Var. J. 43:90-95.

69. Mehlenbacher, S.A. and D.C. Smith. 1991. Partial self-compatibility in ‘Tombul’ and

‘Montebello’ hazelnut. Euphytica 56: 231-236.

70. Mehlenbacher, S.A. and D.C. Smith. 2006. Self-compatible seedlings of the cutleaf

hazelnut. HortScience 41: 482–483.

71. Mehlenbacher, S.A. and G. Bhattarai. 2018. An updated linkage map for hazelnut with

new simple sequence repeat markers. Acta Hortic. 1226: 31-38

72. Mehlenbacher, S.A., R.N. Brown, E.R. Nouhra, T. Gökirmak, N.V. Bassil, and T.L. Kubisiak.

2006. A genetic linkage map for hazelnut (Corylus avellana L.) based on RAPD and SSR

markers. Genome 49: 122-133.

73. Mehlenbacher, S.A., R.N. Brown, J.W. Davis, H. Chen, N.V. Bassil, and D.C. Smith. 2004.

RAPD markers linked to eastern filbert blight resistance in Corylus avellana. Theor. Appl.

Genet. 108: 651-656.

74. Molnar, T.J., E. Walsh, J.M. Capik, V. Sathuvalli, S.A. Mehlenbacher, A.Y. Rossman, and

N. Zhang. 2013. A Real-Time PCR Assay for Early Detection of Eastern Filbert Blight. Plant

Dis. 97(6): 813-818.

75. Molnar, T.J. 2011. Corylus. p. 15–48. In: Kole C. (ed.). Wild crop relatives: Genomic and

breeding resources, forest trees. Springer Berlin/Heidelberg, Germany.

76. Nasrallah, J. B., S.J. Rundle, and M.E. Nasrallah. 1994. Genetic evidence for the

requirement of the Brassica S-locus receptor kinase gene in the self-incompatibility

response. The Plant Journal, 5: 373-384.

77. Nei, M. 1973. Analysis of gene diversity in subdivided populations. PNAS 70: 3321-3323.

78. Pfaffl M.W., A. Tichopad, C. Prgomet, and T.P. Neuvians. 2004. Determination of stable

housekeeping genes, differentially regulated target genes, and sample integrity:

BestKeeper–Excel-based tool using pair-wise correlations. Biotechnol. Lett. 26(6): 509–515.

79. Pfaffl, M.W. 2001. A new mathematical model for relative quantification in real-time RT-

PCR. Nucleic Acids Res. 29(9):e45. doi: 10.1093/nar/29.9.e45.

80. Platteau, C., M. De Loose, B. De Meulenaer, and I. Taverniers. 2011. Detection of Allergenic

Ingredients Using Real-Time PCR: A Case Study on Hazelnut (Corylus avellena) and Soy

(Glycine max). J. Agr. Food Chem. 59(20): 10803-10814.

81. Pomper, K., A. Azarenko, N. Bassil, J.W. Davis, and S.A. Mehlenbacher. 1998.

Identification of random amplified polymorphic DNA (RAPD) markers for self-

incompatibility alleles in Corylus avellana L. Theor. Appl. Genet. 97: 479.

82. Rahman M.H., M. Uchiyama, M. Kuno, N. Hirashima, K. Suwabe, T. Tsuchiya, Y. Kagaya, I.

Kobayashi, K. Kakeda, and Y. Kowyama. 2007. Expression of stigma- and anther-specific

genes located in the S locus region of Ipomoea trifida. Sex Plant Reprod 20: 73–85.

83. Ražná K., M. Bežo, N. Nikolaieva, K. Garkava, J. Brindza, and J. Žiarovská. 2014.

Variability of Corylus avellana, L. CorA and profilin pollen allergens expression. J Envir Sci

Health B. 49: 639-645.

84. Rowley, E. 2016. Genetic resource development for European hazelnut (Corylus avellana

L.). PhD. dissertation (Corvallis, OR, USA: Oregon State University).

85. Rowley, E.R., S.E. Fox, D.W. Bryant, C.M. Sullivan, H.D. Priest, S.A. Givan, S.A.

Mehlenbacher, and T.C. Mockler. 2012. Assembly and characterization of the European

hazelnut ‘Jefferson’ transcriptome. Crop Science 52:2679-2686.

86. Ruan, W.J. and M.D. Lai. 2007. Actin, a reliable marker of internal control? Clin Chim Acta

385(1–2): 1–5.

87. Sathuvalli, V.R. and S.A. Mehlenbacher. 2011. A bacterial artificial chromosome library

for ‘Jefferson’ hazelnut and identification of clones associated with eastern filbert blight

resistance and pollen–stigma incompatibility. Genome 54: 862-867.

88. Sathuvalli, V.R., H. Chen, S.A. Mehlenbacher, and D.C. Smith. 2011. DNA markers linked

to eastern filbert blight resistance in 'Ratoli' hazelnut (Corylus avellana L.). Tree Genet

Genomes 7: 337-345.

89. Sathuvalli, V.R., S.A. Mehlenbacher, and D.C. Smith. 2012. Identification and mapping of

DNA markers linked to eastern filbert blight resistance from OSU 408.040 hazelnut.

HortScience 47: 570-573.

90. Sathuvalli, V.R. and S.A. Mehlenbacher. 2013. De novo sequencing of hazelnut bacterial

artificial chromosomes (BACs) using multiplex Illumina sequencing and targeted marker

development for eastern filbert blight resistance. Tree Genetics & Genomes 9:1109-

1118. doi: 10.1007/s11295-013-0626-8.

91. Schopfer, C.R., M.E. Nasrallah, and J.B. Nasrallah. 1999. The Male Determinant of Self-

Incompatibility in Brassica. Science 286(5445): 1697-1700.

92. Sekerli, M. 2019. Microsatellite Marker Development, Characterization, and Mapping

and Investigation of New Sources of Resistance to Eastern Filbert Blight (EFB) in

European Hazelnut (Corylus avellana). Oregon State University.

93. Shiba H., S. Takayama, M. Iwano, H. Shimosato, M. Funato, T. Nakagawa, F.S. Che, G.

Suzuki, M. Watanabe, K. Hinata, and A. Isogai. 2001. A pollen coat protein, SP11/SCR,

determines the pollen S-specificity in the self-incompatibility of Brassica species. Plant

Physiol. 125: 2095–103

94. Sijacic, P., X. Wang, A. Skirpan, Y. Wang, P.E. Dowd, A.G. McCubbin, S. Huang, and T.

Kao. 2004. Identification of the pollen determinant of S-RNase-mediated self-

incompatibility. Nature. 429: 302–305.

95. Studer, B., J.B. Jensen, A. Fiil, and T. Asp. 2009. “Blind” mapping of genic DNA

sequence polymorphisms in Lolium perenne L. by high resolution melting curve analysis.

Mol. Breeding 24(2):191-199.

96. Takasaki, T., K. Hatakeyama, G. Suzuki, M. Watanabe, A. Isogai, and K. Hinata. 2000. The

S receptor kinase determines self-incompatibility in Brassica stigma. Nature 403: 913–

916.

97. Takayama, S., H. Shiba, M. Iwano, H. Shimosato, F. Che, N. Kai, M. Watanabe, G. Suzuki,

K. Hinata, and A. Isogai. 2000. The pollen determinant of self-incompatibility in Brassica

campestris. PNAS, 97(4): 1920-1925.

98. Taylor, S.C., G. Laperriere, and H. Germain. 2017. Droplet Digital PCR versus qPCR for

gene expression analysis with low abundant targets: from variable nonsense to

publication quality data. Sci Rep 7: 2409. https://doi.org/10.1038/s41598-017-02217-x

99. Taylor, S.C., K. Nadeau, M. Abbasi, C. Lachance, M. Nguyen, and J. Fenrich. 2019. The

ultimate qPCR experiment: producing publication quality, reproducible data the first

time. Trends Biotechnol. 37: 761-774.

100. Thompson, M.M., H.B. Lagerstedt, and S.A. Mehlenbacher. 1996. Hazelnuts. p. 125–184.

In: J. Janick and J.N. Moore (eds.). Fruit breeding, vol. 3, Nuts. Wiley, New York.

101. Thompson, M.M. 1979a. Growth and development of the pistillate flower and nut in

'Barcelona' filbert. J. Amer. Soc. Hort. Sci. 104: 427-432.

102. Torello Marinoni D, N. Valentini, E. Portis, A. Acquadro, C. Beltramo, S.A. Mehlenbacher,

T.C. Mockler, E.R. Rowley, and R. Botta. 2018. High density SNP mapping and QTL

analysis for time of leaf budburst in Corylus avellana L. PLoS One. 13(4): e0195408.

103. Vandesompele, J., K. De Preter, F. Pattyn, B. Poppe, N. Van Roy, A. De Paepe, and F.

Speleman. 2002. Accurate normalization of real-time quantitative RT-PCR data by

geometric averaging of multiple internal reference genes. Genome Biol 3(7).

104. Vogelstein, B. and K.W. Kinzler. 1999. Digital PCR. PNAS 96(16): 9236.

doi: 10.1073/pnas.96.16.9236

105. Vos, P., R. Hogers, M. Bleeker, M. Reijans, T. Van de Lee, M. Hornes, and M. Zabeau.

1995. AFLP: a new technique for DNA fingerprinting. Nucleic Acids Res. 23(21): 4407-

4414.

106. Weigand, B. (April 11, 2019) Oregon Hazelnut Industry Growth Continues. Capital Press.

Retrieved from: https://www.capitalpress.com/specialsections/orchard/oregon-

hazelnut-industry-growth-continues/article_6a9d1efc-41d8-11e9-9126-

6b9f7046ee5c.html

107. Williams, J.G.K., A.R. Kubelik, K.J. Livak, J.A. Rafalski, and S.V. Tingey. 1990. DNA

polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic

Acids Res. 18: 6531-6535.

108. Xiao, Z., X. Sun, X. Liu, C. Li, L. He, S. Chen, and J. Su. 2016. Selection of Reliable

Reference Genes for Gene Expression Studies on Rhododendron molle G. Don. Frontiers

in plant science. 7: 1547.

109. Yang, Z., T. Zhao, Q. Ma, L. Liang, and G. Wang. 2018. Comparative Genomics and

Phylogenetic Analysis Revealed the Chloroplast Genome Variation and Interspecific

Relationships of Corylus (Betulaceae) Species. Frontiers in plant science, 9(927).

110. Yasui Y., M. Mori, J. Aii, T. Abe, D. Matsumoto, S. Sato, Y. Hayashi, O. Ohnishi, and T.

Ota. 2012. S-LOCUS EARLY FLOWERING 3 Is Exclusively Present in the Genomes of Short-

Styled Buckwheat Plants that Exhibit Heteromorphic Self-Incompatibility. PLOS ONE

7(2).

111. Zhao, T., J. Li, D. Miao, L. Liang, G. Wang, and Q. Ma. 2019. Analysis of SSR information

in the pistil transcriptome and the development and characterization of novel tri-

nucleotide SSR markers in Corylus. Eur. J. Hortic. Sci. 84(5), 310-319.

112. Zhu X., X. Li, W. Chen, J. Chen, W. Lu, L. Chen, and D. Fu. 2012. Evaluation of new

reference genes in papaya for accurate transcript normalization under different

experimental conditions. PLoS One 7(8):e44405

113. Žiarovská J., N. Nikolaieva, K. Garkava, and J. Brindza. 2015. Validation of HMG CoA

Reductase as Internal Control for Hazelnut Pollen Allergens Expression Analysis. Austin J

Genet Genomic Res. 2(1): 1008.

Figure 1.1: Map of linkage group 5 (Mehlenbacher and Bhattarai 2018). The S3 allele is seen at 114.7 cM on LG5S (female), marked with an orange arrow.

Figure 1.2: Dominance hierarchy of S-alleles in hazelnut pollen (Mehlenbacher, 2014 and unpublished). A given allele is co-dominant to others at the same level, recessive to alleles at a higher level, and dominant to alleles at a lower level.

Chapter 2: Fine Mapping of the S-Locus in European Hazelnut

Abstract:

Marker development at the hazelnut S-locus allows for fine mapping and the identification of candidate genes for self-incompatibility in European hazelnut. In this study, bacterial artificial chromosome (BAC) end sequences associated with the BAC minimal tiling path of the S-locus were used to identify 36 PacBio contigs from the ‘Jefferson’ genome associated with the hazelnut S-locus. Simple sequence repeat (SSR) markers were developed from dinucleotide repeats found in those contigs and were characterized and mapped. Markers mapping near the S-locus on linkage group 5 were screened against a population of 192 seedlings showing recombination between the flanking markers of the S-locus, G05-510 and AU02-1350. This yielded a 500 kb region containing the S-locus and 50 predicted genes, which was used to develop more SSR markers and a set of high-resolution melting (HRM) markers. When the new markers were used in fine mapping, they fully exploited all recombination in the fine mapping population and reduced the size of the S-locus to a 193.5 kb region containing 18 genes. Among the predicted genes at the locus are five probable leucine-rich repeat receptor-like protein kinases and three probable receptor-like serine/threonine protein kinases. The sequence of the mapped region most likely contains the S1 allele. A genomic region 2 Mbp away is predicted to represent the S3 allele based on SSR marker mapping and allele sizes. The two regions appear to represent homologous S-alleles that were assembled side by side in the reference genome, although they are actually two alleles at the same locus.

Introduction:

Self-incompatibility (SI) is a widespread and highly diverse set of systems that plants use to promote outcrossing and prevent self-fertilization. The most common categories of self-incompatibility are gametophytic self-incompatibility (GSI) and sporophytic self-incompatibility (SSI). In GSI, pollen identity is determined by the genotype of the male gametophyte (pollen), whereas in SSI pollen identity is determined by the genotype of the sporophyte (plant) that produces the pollen. Incompatibility is commonly controlled by a single highly polymorphic locus with a minimum of two genes, one controlling male identity and the other controlling female identity. SSI is best understood in Brassica species, where the female determinant of SI is a serine/threonine receptor kinase named S-receptor kinase (SRK) (Nasrallah et al. 1994) and the pollen determinant is a small cysteine-rich protein named SCR/SP-11 that is expressed in the tapetum of the anther and deposited on the pollen coat (Stephenson et al. 1997, Schopfer et al. 1999, Takayama et al. 2000). Recognition of self-pollen in Brassica is enhanced by S-locus glycoprotein (SLG) (Takasaki et al. 2000). Brassica (Brassicaceae) is commonly presented as a model for SSI, but species of Ipomoea (Convolvulaceae) and Tolpis (Asteraceae) that show SSI have also been studied, and a set of candidate genes for the incompatibility response has been identified in each. The SSI candidate genes in Ipomoea and Tolpis are not similar to those in Brassica (Koseva et al. 2017, Rahman et al. 2007). The apparent dissimilarity between the SSI systems in each family and the distant evolutionary relationships among families with SSI suggest independent origins of SSI.

Corylus avellana L., the European hazelnut, is a species that exhibits SSI. Hazelnut is a monoecious, wind-pollinated, and dichogamous woody perennial of the family Betulaceae. It is a highly heterozygous diploid with a chromosome number of 2n=2x=22. The male inflorescence is a pendulous catkin and the female inflorescence is a compound bud containing 4-16 flowers, each of which has a pair of red stigmatic styles. Prior to pollination, the female inflorescence appears as a small tuft of many red styles emerging from a small bud. Anthesis is in the winter, from mid-December to late February in Oregon. Cultivars show dichogamy and most are protandrous, with male flowers maturing before female flowers.

SSI in hazelnut is controlled by a single highly polymorphic locus, the S-locus, at which 33 alleles have been identified to date. Alleles are identified using fluorescence microscopy as described by Mehlenbacher (1997). Branches are emasculated and bagged in December prior to emergence of the stigmas. Pollen is collected from each of 33 tester trees, each expressing one of the different S-alleles, and stored in the freezer (-18 °C). Female flowers are pollinated in the lab and placed on moist filter paper in a petri dish; 15-24 h later the styles are removed, placed in a drop of aniline blue stain, and squashed. A reaction is called compatible or incompatible based on the appearance of the stained pollen tubes. Low pollen germination or bunched, short, or bulbous pollen tubes are typical of an incompatible pollen-stigma interaction, while good pollen germination with a mass of long parallel pollen tubes growing down the style indicates a compatible combination. The alleles are co-dominant in the stigma and can be either co-dominant or dominant in the pollen, depending on the pair of alleles present and their position in the dominance hierarchy (Mehlenbacher 1997) (Figure 1.2). Self-compatible individuals have been identified among seedlings of the cutleaf hazelnut (Mehlenbacher and Smith 2006).
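As an illustration only, the expression rule described above (both alleles expressed in the stigma; in pollen, only the allele or alleles at the highest dominance level of the pollen parent) can be sketched in Python. The dominance levels in EXAMPLE_LEVELS are hypothetical placeholders and do not reproduce the hierarchy in Figure 1.2, and the function names are introduced here for illustration.

```python
from typing import Dict, Set, Tuple

# HYPOTHETICAL dominance levels (higher number = more dominant in pollen).
# Placeholders for illustration; NOT the published hierarchy of Figure 1.2.
EXAMPLE_LEVELS: Dict[str, int] = {"S3": 3, "S8": 3, "S1": 2, "S2": 1}


def pollen_expressed(genotype: Tuple[str, str], levels: Dict[str, int]) -> Set[str]:
    """Alleles expressed in pollen: the dominant allele, or both if co-dominant."""
    a, b = genotype
    if levels[a] > levels[b]:
        return {a}
    if levels[b] > levels[a]:
        return {b}
    return {a, b}


def stigma_expressed(genotype: Tuple[str, str]) -> Set[str]:
    """Alleles are co-dominant in the stigma, so both are expressed."""
    return set(genotype)


def is_compatible(female: Tuple[str, str], male: Tuple[str, str],
                  levels: Dict[str, int]) -> bool:
    """A cross is incompatible if any allele expressed in the pollen
    is also expressed in the stigma."""
    return not (stigma_expressed(female) & pollen_expressed(male, levels))


# Reciprocal crosses can differ because dominance acts only in the pollen.
cross_1 = is_compatible(("S1", "S2"), ("S1", "S3"), EXAMPLE_LEVELS)
cross_2 = is_compatible(("S1", "S3"), ("S1", "S2"), EXAMPLE_LEVELS)
print(cross_1, cross_2)
```

Under these assumed levels, the two reciprocal crosses give different results, which is the behavior that distinguishes sporophytic control with pollen dominance from a simple allele-matching rule.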

Hampson et al. (1996) investigated whether there are sequences in the hazelnut genome that are similar to those of the Brassica genes SLG and SRK using probe hybridization assays at high and low stringencies (3-4% mismatch tolerated vs. 20-30% mismatch tolerated, respectively). SRK showed no regular hybridization and SLG showed regular hybridization only at low stringency, though not with its conserved functional domains. Hampson et al. (1996) concluded that the S-genes in hazelnut are likely of independent origin or greatly diverged from those in Brassica.

Mehlenbacher et al. (2006) placed the S-locus on linkage group 5 of the map constructed using 144 seedlings from a cross of female parent OSU 252.146 (S3 S8) and male parent OSU 414.062 (S1 S1). The map was constructed using random amplified polymorphic DNA (RAPD) markers. The addition of simple sequence repeat (SSR) markers allowed alignment of the maps of the female and male parents. The S-locus was placed between RAPD markers G05-510d and AU02-1350d on the map for the female parent (OSU 252.146), which segregates (S3 vs. S8) in a 1:1 ratio (Sathuvalli and Mehlenbacher 2011). The distance between the two markers was 14.6 centiMorgans (cM). One of the seedlings in the mapping population is the cultivar ‘Jefferson’ (S1 S3), which was sequenced as a reference genome for European hazelnut (Vining et al. in press). In the version 3 (V3) genome of 'Jefferson', the S-locus region is 2.6 megabase pairs (Mbp) long and contains 293 predicted genes (coordinates 3.03-5.7 Mbp on Scaffold 4).
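The 1:1 segregation of S3 vs. S8 mentioned above is the kind of expectation that is commonly checked with a chi-square goodness-of-fit test. The following Python sketch shows the arithmetic under stated assumptions: the seedling counts are hypothetical (chosen only to sum to the 144 seedlings of the mapping population), and the test is offered as a general illustration rather than as the procedure used in the cited studies.

```python
# Critical value of the chi-square distribution for 1 degree of freedom
# at alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841


def chi_square_1_to_1(count_a: int, count_b: int) -> float:
    """Chi-square statistic for a two-class 1:1 expectation."""
    expected = (count_a + count_b) / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


def fits_1_to_1(count_a: int, count_b: int) -> bool:
    """True when the observed counts do not differ significantly from 1:1."""
    return chi_square_1_to_1(count_a, count_b) < CRITICAL_VALUE_DF1_ALPHA_05


# HYPOTHETICAL counts of S3 and S8 seedlings (not data from the source).
s3_count, s8_count = 70, 74
print(chi_square_1_to_1(s3_count, s8_count), fits_1_to_1(s3_count, s8_count))
```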

To reduce the size of the S-locus, Sathuvalli and Mehlenbacher (2011) repeated the original cross and its reciprocal in 2007, producing 1080 and 408 seedlings, respectively. DNA was extracted from these seedlings and amplified by the polymerase chain reaction (PCR) using RAPD primers AU02 and G05. After separation by electrophoresis on agarose gels, the presence of bands of 1350 bp for AU02 and 510 bp for G05 was scored. G05-510 and AU02-1350 are robust and easy-to-score RAPD markers that flank the S-locus. A seedling was scored as recombinant if it had one marker but not the other. Of the 1488 seedlings, 192 showed recombination between the two loci and were retained for planting in the field.
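The scoring rule above (a seedling is recombinant when exactly one of the two flanking bands is present) can be sketched in Python as follows. The seedling identifiers and band scores are hypothetical and the function names are introduced for illustration; only the totals of 192 and 1488 come from the text.

```python
from typing import Dict, List, Tuple


def is_recombinant(has_g05_510: bool, has_au02_1350: bool) -> bool:
    """Recombinant when exactly one of the two flanking RAPD bands is present."""
    return has_g05_510 != has_au02_1350


def select_recombinants(scores: Dict[str, Tuple[bool, bool]]) -> List[str]:
    """Return the seedlings to retain, given (G05-510, AU02-1350) band scores."""
    return [seedling for seedling, (g05, au02) in scores.items()
            if is_recombinant(g05, au02)]


# HYPOTHETICAL band scores: (G05-510 present, AU02-1350 present).
example_scores = {
    "seedling_A": (True, True),
    "seedling_B": (True, False),
    "seedling_C": (False, True),
    "seedling_D": (False, False),
}
recombinants = select_recombinants(example_scores)

# Fraction of seedlings retained in the study: 192 of 1488.
fraction_retained = 192 / 1488
print(recombinants, f"{fraction_retained:.1%}")
```

With the counts reported above, about 12.9% of the seedlings were retained as recombinants.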

When the 192 seedlings flowered, their S-alleles were identified. Additional markers in the region were scored in the recombinant seedlings. The map, produced by the double pseudo-testcross approach, shows markers G05-510d and AU02-1350d as "dummies". It places two markers, RAPD 204-950 and SSR KG819-200, at 0.14 and 0.24 cM from the S-locus, respectively. This further narrowed the region that contains the S-locus, but additional markers are needed to finely map the region.

Sathuvalli and Mehlenbacher (2011) described a bacterial artificial chromosome (BAC) library of the ‘Jefferson’ hazelnut genome for use in map-based cloning of the self-incompatibility and eastern filbert blight (EFB) resistance loci in hazelnut. The BAC library consists of 39,936 clones with a mean insert size of 117 kilobases (kb). S-locus associated SSR marker KG819 and RAPD marker 204-950 were screened against the library to identify 7 and 9 S-locus associated BACs, respectively.
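The library dimensions given above can be turned into an approximate depth of coverage. The Python sketch below multiplies clone number by mean insert size (about 4.67 Gbp of cloned DNA in total) and divides by a genome size; the genome size of 378 Mbp is an assumption that does not appear in this section, and the probability formula is the standard Clarke-Carbon approximation, shown as an illustration rather than as a calculation from the cited paper.

```python
N_CLONES = 39_936          # from the text
MEAN_INSERT_BP = 117_000   # from the text (117 kb)

# ASSUMPTION: haploid genome size in bp; not stated in this section.
ASSUMED_GENOME_SIZE_BP = 378_000_000


def total_insert_bp(n_clones: int, mean_insert_bp: int) -> int:
    """Total cloned DNA in the library."""
    return n_clones * mean_insert_bp


def genome_equivalents(n_clones: int, mean_insert_bp: int,
                       genome_size_bp: int) -> float:
    """Library depth expressed as multiples of the haploid genome."""
    return total_insert_bp(n_clones, mean_insert_bp) / genome_size_bp


def prob_locus_represented(n_clones: int, mean_insert_bp: int,
                           genome_size_bp: int) -> float:
    """Clarke-Carbon approximation: P = 1 - (1 - insert/genome) ** N."""
    return 1.0 - (1.0 - mean_insert_bp / genome_size_bp) ** n_clones


depth = genome_equivalents(N_CLONES, MEAN_INSERT_BP, ASSUMED_GENOME_SIZE_BP)
p_found = prob_locus_represented(N_CLONES, MEAN_INSERT_BP, ASSUMED_GENOME_SIZE_BP)
print(f"{depth:.1f}x coverage, P(locus present) = {p_found:.6f}")
```

Because the result scales inversely with the assumed genome size, a different estimate of genome size changes the depth proportionally.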

In 2015, Chris Saski of Clemson University used high information content fingerprinting (HICF) to construct a physical map of the ‘Jefferson’ genome using methods described by Ding et al. (2001). The map consisted of 26,752 BAC clones arranged in 1,673 contigs. A minimal tiling path (MTP) consisting of 5232 clones was also identified, and the ends of each clone in the MTP were sequenced. S-locus associated markers were used to identify 61 associated BACs in 20 BAC contigs and two singletons, resulting in a final MTP of the S-locus containing 129 BACs with an estimated length of 13.4 Mbp.

Rowley et al. (2016) used Illumina short-read sequencing to generate the first genome sequence (93x) for ‘Jefferson’ hazelnut. In addition, the genomes of seven other cultivars ('Barcelona', 'Ratoli', 'Tonda Gentile delle Langhe', 'Tonda di Giffoni', 'Daviana', 'Hall’s Giant' and ‘Tombul (Extra Ghiaglhi)’) were sequenced at lower coverage (~20x) (Rowley et al. 2016). These seven genome sequences were used to efficiently develop new SSR markers (Colburn et al. 2017; Bhattarai and Mehlenbacher 2017, 2018). More recently, Snelling et al. (2018) sequenced the ‘Jefferson’ genome using long reads from the Pacific Biosciences (PacBio) RSII sequencer, providing ~49x coverage. The resulting contigs were error-corrected using the Illumina reads produced by Rowley et al. (2016) and now constitute the version 2 (V2) reference genome for 'Jefferson'. Snelling et al. (2018) also sequenced the parents of the mapping population, OSU 252.146 and OSU 414.062, using Illumina HiSeq to determine which sequences in the 'Jefferson' genome are from the female parent and which are from the male parent, and to facilitate further marker development. The error-corrected PacBio reads were assembled into chromosome-level scaffolds using Hi-C proximity ligation procedures (Dovetail Genomics, Scotts Valley, CA). The resulting 11 scaffolds correspond to the haploid chromosome number of hazelnut (2n=2x=22) and represent the V3 genome sequence of ‘Jefferson’, which was used for gene annotation (Vining et al., in press) and further marker development.


The goal of this study was to develop additional markers in the S-locus region using the PacBio genome sequence of ‘Jefferson’ hazelnut and sequences of RAPD markers, SSR markers and BAC ends to finely map the S-locus region. A further goal was to identify candidate genes for the male and female determinants of self-incompatibility.

Methods:

SSR identification. BAC end sequences from minimal tiling path BACs in the S-locus region were used to identify the corresponding ‘Jefferson’ (V2) PacBio contigs. These contigs were then searched for the presence of SSRs. GMATo software (Wang et al. 2013) was used to identify SSRs with di-nucleotide motifs and a minimum of 8 repeats. SAMtools (Li et al. 2009) was used to trim fragments, retaining 250 bp on either side of the repeat. ClustalW was used to identify duplicates among the identified sequences and others developed by Amira M. Bidani and Tian-tian Zhao. Repeats that contained only As and Ts were not pursued, as previous experience showed them to be difficult to score (Bhattarai and Mehlenbacher 2017). Illumina short reads from seven C. avellana cultivars [‘Barcelona’, ‘Ratoli’, ‘Tonda Gentile delle Langhe’, ‘Tonda di Giffoni’, ‘Daviana’, ‘Hall’s Giant’ and ‘Tombul (Extra Ghiaghli)’] (Rowley et al. 2016) were aligned to the ‘Jefferson’ reference using BBMap (https://sourceforge.net/projects/bbmap/) and the aligned sequences were visualized using Tablet software (Milne et al. 2012). The aligned reads were inspected, and sequences that were clearly polymorphic for number of repeats but had conserved flanking regions that lacked polymorphism were identified. BLASTn was used to identify sequences that were identical to those already deposited in NCBI (https://www.ncbi.nlm.nih.gov/). A subset of SSRs that showed at least 3 alleles each was chosen from each contig. BatchPrimer3 (You et al. 2008) was used to design primers with parameters set at an optimum annealing temperature of 60 °C, minimum GC content of 50%, and an amplicon size of 90-350 bp to facilitate post-PCR multiplexing of primer products for genotyping. Oligo primers were ordered from Integrated DNA Technologies (IDT), Inc. (Coralville, IA).
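
The repeat search described above can be illustrated with a short sketch. It is not the GMATo or SAMtools workflow used in this study; it is a minimal Python approximation that assumes a plain nucleotide string as input, and the function name find_dinucleotide_ssrs and its parameters are hypothetical. It reports di-nucleotide repeats with at least 8 units, skips motifs made only of As and Ts, and keeps up to 250 bp of flanking sequence on each side.

```python
import re


def find_dinucleotide_ssrs(seq, min_repeats=8, flank=250, skip_at_only=True):
    """Return di-nucleotide SSRs with at least `min_repeats` repeat units.

    Illustrative only: the name, parameters and output fields are assumptions,
    not part of GMATo or SAMtools.
    """
    seq = seq.upper()
    # ([ACGT]{2}) captures one 2-base unit; \1{n,} requires n further copies.
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if unit[0] == unit[1]:
            continue  # AA, CC, GG, TT are mononucleotide runs, not SSRs of interest
        if skip_at_only and set(unit) <= {"A", "T"}:
            continue  # AT/TA repeats were not pursued in the first round
        start, end = match.start(), match.end()
        hits.append({
            "motif": unit,
            "repeats": (end - start) // 2,
            "start": start,
            "end": end,
            "left_flank": seq[max(0, start - flank):start],
            "right_flank": seq[end:end + flank],
        })
    return hits
```

Setting skip_at_only=False and lowering min_repeats would roughly correspond to the relaxed search of the second round of marker development, although the exact thresholds used there are not stated here.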

During May 2017, 2-4 leaves were collected from each of 24 cultivars (Table 2.2a) growing in the field at the USDA-ARS National Clonal Germplasm Repository (NCGR) or Oregon State University’s (OSU) Smith Horticultural Research Farm in Corvallis. DNA was extracted according to Lunde et al. (2000) without RNase treatment. DNA was quantified by ultraviolet spectrophotometry using a BioTek Synergy 2 microplate reader with a Take3 plate, and data were analyzed with Gen5 software (BioTek Instruments, Winooski, VT). DNA was diluted with water to a concentration of 20 ng/μl. DNA of the 24 cultivars was amplified using the SSR marker primer pairs. PCR amplification was performed in GeneAmp PCR System 9700 thermal cyclers (Applied Biosystems, Foster City, CA) in 96-well plates as follows: denaturation at 94 °C for 5 min; 40 cycles of 94 °C for 40 s, 60 °C for 40 s, and 72 °C for 40 s; extension at 72 °C for 7 min; and a final indefinite hold at 4 °C. The PCR products were separated by electrophoresis for 3.2 h at 90 V on 3% w/v agarose gels with TBE buffer as described by Bhattarai and Mehlenbacher (2017). Gels were stained with ethidium bromide for 20 min, destained for 15 min and photographed under UV light using a BioDoc-It® Imaging System (UVP, Upland, CA). The photographs were inspected, and notes were recorded for each primer pair. Primers that generated one or two PCR products per cultivar that were similar in size and clearly polymorphic among the 24 cultivars were pursued further. In a second round of marker development, Version 3 of the ‘Jefferson’ reference genome (PacBio + Dovetail) was used, and AT repeats and shorter repeat lengths were included in the search for SSRs. Polymorphism was predicted by aligning the Illumina reads of the parents of ‘Jefferson’, OSU 252.146 ♀ and OSU 414.062 ♂, with the V3 genome sequence of the S-locus region of ‘Jefferson’ using BWA-MEM (Li 2013), and the alignments were visualized using Tablet software. BLAST of primer sequences for the new SSR markers was used to search for off-target binding sites in the genome that could produce unwanted amplicons. The second set of SSR markers was not screened for polymorphism on 3% agarose gels.
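
The idea behind the off-target search can be sketched in a few lines of Python. This is not BLAST and makes no attempt to reproduce its scoring; it is a naive scan, under the assumption that a candidate binding site is any window on either strand that differs from the primer by at most a chosen number of mismatches. The function names and the max_mismatches parameter are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_primer_sites(primer, genome, max_mismatches=1):
    """List (position, strand, mismatches) for near-exact primer matches.

    Illustrative stand-in for a BLAST search; it scans every window, so it is
    only practical for short regions, not a whole genome.
    """
    genome = genome.upper()
    sites = []
    queries = (("+", primer.upper()), ("-", reverse_complement(primer)))
    for strand, query in queries:
        k = len(query)
        for i in range(len(genome) - k + 1):
            window = genome[i:i + k]
            mismatches = sum(1 for a, b in zip(query, window) if a != b)
            if mismatches <= max_mismatches:
                sites.append((i, strand, mismatches))
    return sorted(sites)
```

A primer pair whose forward and reverse primers both return more than one site within amplifiable distance of each other would be a candidate for producing an extra amplicon.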

SSR characterization and mapping. For a subset of the SSR markers scored as "clearly polymorphic" in the first round of development, primers fluorescently labeled with FAM or HEX were ordered from Integrated DNA Technologies (IDT), Inc. (Coralville, IA). The subset of SSRs included a few from each of the contigs identified by alignment with sequences of RAPD markers, SSR markers and BAC ends. The fluorescent forward and non-fluorescent reverse primers were used to amplify DNA from 48 diverse accessions (the 24 cultivars originally screened on agarose gels and 24 additional accessions) and the parents of ‘Jefferson’ (Table 2.2). PCR products of different sizes and fluorescent tags were pooled post-PCR, allowing for multiplexing of 3-7 products in the same well. For pooling, 2 μl of the product from each primer pair was mixed and diluted with water to a final volume of 200 μl. A 1.5 μl aliquot of the pooled samples was sent to the Core Labs of OSU’s Center for Genome Research and Biocomputing (CGRB) for capillary electrophoresis on an ABI 3730 (Life Technologies, Carlsbad, CA) system with ROX-500 as the size standard. Peak sizes were estimated using GeneMapper software (Life Technologies, Carlsbad, CA) and verified manually. Samples with PCR amplification failures or ambiguous results were repeated. Markers that were monomorphic or judged difficult to score were not pursued further. SSR markers were characterized using PowerMarker software (Liu and Muse 2005) to retrieve values for observed heterozygosity (Ho), expected heterozygosity (He), and polymorphism information content (PIC). Cervus software (Kalinowski et al. 2007) was used to estimate the frequency of null alleles (Fnull).
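
For readers unfamiliar with these statistics, the sketch below shows how Ho, He and PIC can be computed from diploid genotypes at one marker. It is not PowerMarker; it assumes the textbook definitions He = 1 - Σp_i² and PIC = 1 - Σp_i² - ΣΣ 2p_i²p_j² (summed over allele pairs i < j), without any sample-size correction that PowerMarker may apply. The function name marker_stats and the input layout are hypothetical, and null-allele frequency is not estimated.

```python
from collections import Counter
from itertools import combinations


def marker_stats(genotypes):
    """Compute allele count, Ho, He and PIC for one marker.

    `genotypes` is a list of (allele_a, allele_b) tuples, one per individual,
    for example fragment sizes in bp. Illustrative only; assumes no missing data.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    freqs = [count / total for count in counts.values()]
    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity (gene diversity) from allele frequencies.
    he = 1.0 - sum(p * p for p in freqs)
    # PIC subtracts the term for uninformative matings between like heterozygotes.
    pic = he - sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return {"alleles": len(counts), "Ho": ho, "He": he, "PIC": pic}
```

Because PIC subtracts a non-negative term from He, it is never larger than He for the same marker, which agrees with the marker means reported in this chapter.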

Fluorescent forward primers and non-fluorescent reverse primers were then used to amplify and size SSR bands in the 138 individuals comprising the mapping population (OSU 252.146 x OSU 414.062) using the same methods as above. Each SSR allele size was scored as present or absent in each seedling. Allele scores for the new SSR markers were combined with those for the previously mapped SSR and RAPD markers on linkage group 5 (Mehlenbacher and Bhattarai 2018), and imported into JoinMap 5 software (Van Ooijen 2018). Linkage map construction was as described by Bhattarai and Mehlenbacher (2017) with a LOD score of 10 for grouping and an arbitrary nearest neighbor fit cutoff of 8.7. Nearest neighbor stress values were inspected to identify markers that fit poorly with neighboring markers.

Fine mapping. Of the 1,488 seedlings used for fine mapping of the S-locus region (Sathuvalli and Mehlenbacher 2011), 192 seedlings were scored as recombinant between the markers G05-510 and AU02-1350 that flank the S-locus. The newly developed markers near the S-locus were scored in the recombinant seedlings, and the new data were combined with previous marker scores (Sathuvalli and Mehlenbacher 2011). The combined data were imported into JoinMap and a map was generated to show marker order in the region. Following this, the marker sequences were aligned with the ‘Jefferson’ V3 genome and their coordinates were noted. The positions of predicted genes in the region were also noted, with a focus on those between the new markers that flanked the S-locus.
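
The relationship between counted recombinants and map distance can be illustrated as follows. This sketch is not the JoinMap calculation; it assumes that each of the 1,488 seedlings represents one informative meiosis and uses the Kosambi mapping function, d = 25 ln((1 + 2r)/(1 - 2r)) cM, as one common choice. The mapping function actually applied in JoinMap is not stated in this section, and the function names are hypothetical.

```python
import math


def recombination_fraction(recombinants, total):
    """Fraction of seedlings that are recombinant between two loci."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total


def kosambi_cm(r):
    """Convert a recombination fraction (0 <= r < 0.5) to Kosambi centimorgans."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Example with hypothetical inputs: 14 recombinants among 1,488 seedlings.
r = recombination_fraction(14, 1488)
distance_cm = kosambi_cm(r)
```

For recombination fractions this small the mapping function hardly matters, because the distance in cM is very close to 100 times the recombination fraction.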

High-Resolution Melt (HRM) marker development and mapping. Following the mapping of the region with SSR markers, high resolution melting (HRM) markers were developed in or near predicted S-locus genes. Single nucleotide polymorphisms (SNPs) that were heterozygous in the female parent (OSU 252.146) and homozygous in the male parent (OSU 414.062), and that had conserved regions bordering them that allowed primer design, were selected for HRM marker development. Primers for 34 SNPs were designed using BatchPrimer3 (You et al. 2008) to produce PCR amplicon sizes ranging from 70 to 250 bp and were ordered from IDT. BLAST was used to search for off-target binding sites in the genome that could produce extra, unwanted amplicons.
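
The SNP selection rule above, and the 1:1 segregation it implies in the progeny, can be expressed in a short sketch. It assumes genotypes are given as pairs of base calls and that marker classes are simple counts of seedlings; both function names are hypothetical and the example counts used in testing are invented for illustration, not data from this study.

```python
def is_testcross_snp(female_gt, male_gt):
    """True if the female parent is heterozygous and the male parent homozygous.

    Genotypes are 2-tuples of base calls, e.g. ("A", "G").
    """
    return female_gt[0] != female_gt[1] and male_gt[0] == male_gt[1]


def chi_square_1to1(n_class_a, n_class_b):
    """Chi-square statistic (1 degree of freedom) for a 1:1 segregation ratio."""
    total = n_class_a + n_class_b
    if total == 0:
        raise ValueError("at least one scored seedling is required")
    expected = total / 2.0
    return ((n_class_a - expected) ** 2 + (n_class_b - expected) ** 2) / expected
```

A statistic below 3.841, the critical value for one degree of freedom at p = 0.05, would be consistent with the expected 1:1 ratio.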

HRM primers were used to amplify the parents of the mapping population, and the presence of a single band was confirmed on 2% agarose gels run at 90 V for 1.5 h. HRM reactions were performed using a CFX96 real-time PCR system; CFX Manager software was used to set up the plate arrangement, and Precision Melt Analysis software was used to analyze the melt curves (Bio-Rad Laboratories, Hercules, CA). Reaction mixtures had a 10 µl volume that included 5 µl 1X Precision Melt Supermix (Bio-Rad), 0.5 µl of each primer at 200 nM, and 4 µl of genomic DNA at 5 ng/µl. Cycling parameters were 95 °C for 2 min followed by 40 cycles of denaturation at 95 °C for 10 s and annealing/extension at 60 °C for 30 s. The heteroduplex formation step of 95 °C for 30 s and 60 °C for 1 min was followed by melting over a temperature range of 65 to 95 °C with 0.2 °C increments and a rate of 10 s/step. Initially all markers were scored on 48 seedlings from the mapping population to determine whether melt curve patterns were easy to score. Alternative markers were developed from the same sequence if the clustering of melt curves was ambiguous. When useful HRM markers were identified they were scored in the full mapping population (n=138). HRM marker scores were combined with those of other markers on linkage group 5 and JoinMap was used to construct a new linkage map. Lastly, HRM markers were screened against the 192 seedlings in the fine mapping population showing recombination between G05-510 and AU02-1350. When multiple HRM markers were available from a single gene, only the one with the clearest separation in melt curves was selected for scoring and mapping.

Results:

BAC end sequences from the minimal tiling path of the S-locus region identified 36 PacBio contigs from the V2 ‘Jefferson’ genome (Table 2.1, Appendix A). A search of these contigs identified 2,722 di-nucleotide repeats. When sequences of seven cultivars were aligned with the V2 ‘Jefferson’ reference, 708 repeats showed clear polymorphism in number of repeats and had conserved flanking regions. Primer pairs were designed for a subset of 161 di-nucleotide repeat regions. PCR amplification of the DNA of 24 cultivars (Table 2.2a), followed by agarose gel electrophoresis, showed that 52 were clearly polymorphic. For these 52, fluorescent forward primers were used in PCR amplification and fingerprinting of 50 individuals (Table 2.2), including the parents of the mapping population. Of these markers, 10 were discarded because 5 produced multiple bands on gels and 5 were monomorphic. Seven more markers were not characterized because they showed peak patterns that were difficult to score. The remaining 35 polymorphic markers were fully characterized and had an average of 10.5 alleles, 0.71 Ho, 0.75 He, 0.73 PIC, and 0.03 F(null) (Table 2.3, Appendix C).

Of the 35 characterized markers, 7 (CB011.02, CB016.01, CB016.10, CB016.11, CB018.01, CB087.01, and CB332.02) were not polymorphic in the mapping population. Three markers (CB007.06, CB011.04, and CB123.04) were added that could not be fully characterized but showed potential in mapping, resulting in a final set of 31 markers that could be added to the linkage maps. These markers were scored in 138 seedlings of the original mapping population (OSU 252.146 x OSU 414.062) and added to the linkage map of Mehlenbacher and Bhattarai (2018) (Figure 2.1). The resulting maps showed 8 new SSR markers on linkage group 5 where the S-locus resides, and the ordering of markers was approximately the same as predicted by the reference genome. Seven of the 8 markers were developed from contigs 18 and 16 of the ‘Jefferson’ V2 reference genome. Other markers were placed on different linkage groups (Table 2.4). These 8 markers, and 10 SSR markers previously developed and mapped to the S-locus region (Table 2.5), were scored in the 192 recombinant individuals in the fine-mapping population. The new map showed two markers, CB018.12 and KG847, flanking S3 on linkage group 5, with 14 recorded recombination events between CB018.12 and the S3 allele and 3 recorded recombination events between KG847 and the S3 allele (Figure 2.2a). The coordinates of these markers in the ‘Jefferson’ V3 reference genome showed the distance between CB018.12 and KG847 to be 500 kb, and the region contained 49 predicted genes. This 500 kb region was searched for additional SSR markers. Ten new SSR markers were identified using the methods described earlier, supplemented by alignment of Illumina reads of the parents with the V3 reference to identify SSR markers useful for mapping. BLAST of primer sequences identified predicted off-target binding sites for several primers in a region 2 Mbp downstream from the target region in the reference genome.

All 10 SSR markers were predicted to be polymorphic in silico. RH_SLOC01 was discarded after no PCR product was visualized on an electrophoresis gel, RH_SLOC08 was discarded after screening against the mapping population because many peaks could not be scored, and RH_SLOC04 was monomorphic. RH_SLOC09 was polymorphic and useful in mapping but was not characterized due to difficulty in accurate fragment sizing. Characterization of the 6 remaining SSR markers using 50 diverse cultivars gave means of 12.2 alleles per marker, 0.74 Ho, 0.83 He, 0.81 PIC, and 0.077 F(null) (Table 2.3, Appendix C).

An additional 34 HRM markers were identified using the same alignment by visually scanning for single nucleotide polymorphisms (SNPs) that would segregate 1:1 from the female parent of the mapping population. BLAST of primer sequences again identified predicted off-target binding sites for several primers in the region 2 Mbp downstream from the target region in the reference genome, the same region identified when screening SSR markers for off-target binding. Screening the parents of the mapping population and separating products on an electrophoresis gel led to 6 markers being discarded due to the presence of more than one band and 1 marker being discarded for absence of any bands. A further 13 markers were discarded as uninformative in melt curve analysis, resulting in 14 markers that were screened against the full mapping population. For each gene containing multiple useful HRM markers, the one with the clearest separation of melt curves was selected, resulting in a final set of 7 markers (Table 2.6, Appendix B).


The newly developed markers were placed on linkage group 5F using the original population (n=138) from OSU 252.146 x OSU 414.062 (Figure 2.1) and then using the 192 recombinant seedlings from the 1,488 seedlings. Although RH_SLOC02 and RH_SLOC05 were characterized, they were not used in mapping due to redundancy with the previously developed SSR marker CB018.12 and monomorphism, respectively. With the inclusion of RH_SLOC09, the final number of mapped SSR markers was therefore 5 (Table 2.6). The new set of markers fully exploited recombination in the fine-mapping population. The S-locus is now flanked by SSR marker RH_SLOC07 on the left and KG847 on the right (Figure 2.2b), and the size of the S-locus region, according to the V3 ‘Jefferson’ reference, has been reduced to a 193.5 kb region that contains 18 predicted genes (Table 2.7a).

Predicted genes in the target region.

Interesting gene annotations in the 193.5 kb region include a set of five probable leucine-rich repeat receptor-like protein kinases (LRR-RLK) with homology to At1g35710 (Corav.Jeff_12866, Corav.Jeff_12869, Corav.Jeff_12870, Corav.Jeff_12872, Corav.Jeff_12874) and three probable transmembrane receptor-like serine/threonine protein kinases (RLK) with homology to At5g15080 (Corav.Jeff_12871, Corav.Jeff_12873, Corav.Jeff_12875).

Alternate locus identified.

Development of SSR and HRM markers targeted the S-locus region from 4.53-4.71 Mbp on scaffold 4 of the V3 reference. The search for off-target primer binding sites consistently identified a second locus in the reference genome with nearly exact matches. The second locus was from 6.76-6.92 Mbp on scaffold 4, approximately 2 Mbp away from the mapped region. The markers at the second locus were in the same order and nearly the same distances apart (Figure 2.3c, 2.3d, Appendix F, Appendix G). Further, the region used for fine mapping originated from PacBio contig 18 in the V2 genome sequence, while the alternate locus originated from PacBio contig 16. Both contigs are over 3 Mbp long and are near each other in the V3 genome sequence. We believe that they represent the two alleles of ‘Jefferson’ (S1 and S3, respectively) at the S-locus, rather than two separate genetic loci.

Gene annotations at the alternate locus (S3) are highly similar to those in the mapped region (S1) and include 3 LRR-RLK genes with homology to At1g35710 and 3 RLK genes with homology to At5g15080 (PIX7) (Table 2.7b, Figure 2.3a, 2.3f, Appendix F, Appendix G). The fragment sizes at the SSR loci match predictions for markers for S1 from the male parent, and fragment sizes for the S3 locus match predictions for markers for S3 from the female parent (Table 2.8). Finally, alignments of Illumina sequences to the reference genome showed a greater abundance of reads from the male parent (S1, S1) mapping to the S1 locus and a greater abundance of reads from the female parent (S3, S8) mapping to the S3 locus.

Discussion:

Incompatibility in hazelnut is sporophytic and under the control of a single locus with multiple alleles. This study finely mapped the S-locus region, narrowing it to a genomic region of 193.5 kb that contains 18 predicted and annotated genes (Table 2.7). One recombinant individual (OSU 1425.021), initially scored as S1 S3, showed an additional recombination event between the S3 allele and markers RH_SLOC09, RH_SLOC10, and 877-HRM2. Inclusion of this seedling would have reduced the S-locus region further to 161.7 kb containing 14 genes. However, a check of the phenotyping records indicated that its alleles at the S-locus had not been identified with certainty. Unfortunately, the seedling no longer exists, so its S-alleles were considered unknown.

None of the 18 genes in the S-locus region show homology to S-receptor kinase (SRK), S-locus glycoprotein (SLG), or S-locus protein 11/S-locus cysteine-rich (SP11/SCR) proteins known to be involved in SSI in Brassica. It is likely the SI mechanisms in Corylus and Brassica are examples of convergent evolution, which is also likely the case with two other families with sporophytic incompatibility, Convolvulaceae and Asteraceae (Koseva et al. 2017, Rahman et al. 2007).

There are 5 LRR-RLK genes in the S-locus region that show homology to At1g35710, a gene in Arabidopsis thaliana known to be a homolog of MIK2 (MDIS1-interacting receptor kinase). MIK2 is an important protein involved in pollen tube perception of female attractants (Higashiyama and Yang 2017), immune response (Coleman et al. 2019), salt stress response (Julkowska et al. 2016) and root growth direction (Van der Does et al. 2017). Five MIK2 homologs exist in the S-locus region and three others are positioned 67 kb upstream from the S-locus region. Three RLK genes in the S-locus region show homology to At5g15080, which encodes a cytoplasmic receptor-like kinase in Arabidopsis thaliana also known as Putative Interactor of XopAC 7 (PIX7). XopAC is a Xanthomonas campestris-specific type III effector that was shown to associate with PIX7 in a yeast two-hybrid screen, though not with the uridylylation domain hypothesized to be functionally relevant (Guy et al. 2013). The role played by these genes in Corylus is not yet known, but they are attractive candidates for further studies of SSI.

Two separate regions were identified in the ‘Jefferson’ V3 reference that seem likely to correspond to sequences from the S1 and S3 alleles, derived from PacBio contigs 18 and 16, respectively, and placed side-by-side due to a misassembly. Gene annotations in the two regions show many similarities, but the S3 region has two fewer LRR-RLK genes than the S1 region. Marker ordering for the two regions is the same, but the distance between markers RH_SLOC09 and 870-HRM1 is reduced in the S3 locus (15.9 kb) as compared to the S1 locus (54.1 kb). An alignment of the regions using Mauve software (Darling et al. 2004) shows that both have conserved regions bordering a polymorphic central region, likely containing the genes responsible for self-recognition and preventing synapsis and crossing-over in the region (Figure 2.4).

Conclusion:

The genomic resources now available in hazelnut allow much quicker advances in the genetic studies of the species. In this study, stepwise development of SSR and HRM markers and fine genetic mapping narrowed the S-locus region to a 193.5 kb region containing 18 predicted genes. Further work must be pursued to study the expression of these S-locus genes, allelic diversity through targeted resequencing of cultivars, and the genetic basis of self-compatibility in seedlings of the cutleaf hazelnut. These future studies should further narrow the list of candidates for the determination of SI in hazelnut.


References:

1. Bhattarai, G., and S.A. Mehlenbacher. 2017. In-silico development and characterization of tri-nucleotide simple sequence repeat markers in hazelnut (Corylus avellana L.). PLoS One 12(5): e0178061. https://doi.org/10.1371/journal.pone.0178061.

2. Bhattarai, G., and S.A. Mehlenbacher. 2018. Discovery, characterization, and linkage mapping of simple sequence repeat markers in hazelnut. J. Amer. Soc. Hort. Sci. 143(5): 347-362.

3. Colburn, B.C., S.A. Mehlenbacher, and V.R. Sathuvalli. 2017. Development and mapping of microsatellite markers from transcriptome sequences of European hazelnut (Corylus avellana L.) and use for germplasm characterization. Mol. Breed. 37(2): 16.

4. Coleman, A.D., L. Raasch, J. Maroschek, S. Ranf, and R. Hückelhoven. 2019. The Arabidopsis leucine-rich repeat receptor kinase MIK2 is a crucial component of pattern-triggered immunity responses to Fusarium fungi. bioRxiv 720037.

5. Darling A.C., B. Mau, F.R. Blattner, and N.T. Perna. 2004. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14(7):1394–1403. doi:10.1101/gr.2289704

6. Ding Y., M.D. Johnson, W.Q. Chen, D. Wong, Y.J. Chen, S.C. Benson, J.Y. Lam, Y.M. Kim, and H. Shizuya. 2001. Five-color-based high-information-content fingerprinting of bacterial artificial chromosome clones using type IIS restriction endonucleases. Genomics 74(2): 142-154.


7. Fang GC, Blackmon BP, Staton ME, Nelson CD, Kubisiak TL, Olukolu BA, Henry D, Zhebentyayeva T, Saski CA, Cheng CH et al. 2013. A physical map of the Chinese chestnut (Castanea mollissima) genome and its integration with the genetic map. Tree Genet Genomes 9(2): 525-537.

8. Li S, Chou HH. 2004. LUCY2: an interactive DNA sequence quality trimming and vector removal tool. Bioinformatics 20(16): 2865-2866.

9. Sambrook J, Fritsch EF, Maniatis T. 1989. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Press, Cold Spring Harbor, NY.

10. Saski CA, Feltus FA, Staton ME, Blackmon BP, Ficklin SP, Kuhn DN, Schnell RJ, Shapiro H, Motamayor JC. 2011. A genetically anchored physical framework for Theobroma cacao cv. Matina 1-6. BMC Genomics 12(1): 413.

11. Sathuvalli V, Mehlenbacher SA. 2009. A Hazelnut BAC Library for Map-Based Cloning of a Disease Resistance Gene. VII International Congress on Hazelnut 845: 191-194.

12. Soderlund C, Humphray S, Dunham A, French L. 2000. Contigs built with fingerprints, markers, and FPC V4.7. Genome Res 10(11): 1772-1787.

13. Gürcan, K. and S.A. Mehlenbacher. 2010. Development of microsatellite marker loci for European hazelnut (Corylus avellana L.) from ISSR fragments. Mol. Breeding 26: 551-559. DOI 10.1007/s11032-010-9464-7


14. Guy, E., M. Lautier, M. Chabannes, B. Roux, E. Lauber, M. Arlat, and L.D. Noël. 2013. xopAC-triggered immunity against Xanthomonas depends on Arabidopsis receptor-like cytoplasmic kinase genes PBL2 and RIPK. PLOS ONE 8(8).

15. Hampson, C.R., G.D. Coleman, and A.N. Azarenko. 1996. Does the genome of Corylus avellana L. contain sequences homologous to the self-incompatibility gene of Brassica? Theoret. Appl. Genetics 93: 759–764.

16. Higashiyama, T., and W. Yang. 2017. Gametophytic Pollen Tube Guidance: Attractant Peptides, Gametic Controls, and Receptors. Plant Physiology 173(1): 112-121.

17. Julkowska, M.M., K. Klei, L. Fokkens, M.A. Haring, M.E. Schranz, and C. Testerink. 2016. Natural variation in rosette size under salt stress conditions corresponds to developmental differences between Arabidopsis accessions and allelic variation in the LRR-KISS gene. J. Exp. Botany 67(8): 2127–2138.

18. Kalinowski, S.T., M.L. Taper, and T.C. Marshall. 2007. Revising how the computer program CERVUS accommodates genotyping error increases success in paternity assignment. Molecular Ecology 16: 1099-1106.

19. Koseva, B., D.J. Crawford, K.E. Brown, M.E. Mort, and J.K. Kelly. 2017. The genetic breakdown of sporophytic self-incompatibility in Tolpis coronopifolia (Asteraceae). New Phytol 216:1256-1267.

20. Li H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and 1000 Genome Project Data Processing Subgroup. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25(16):2078–2079.


21. Li, H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v1 [q-bio.GN].

22. Liu, K. and S.V. Muse. 2005. PowerMarker: Integrated analysis environment for genetic marker data. Bioinformatics 21(9):2128-2129.

23. Lunde C.F., S.A. Mehlenbacher, and D.C. Smith. 2000. Survey of hazelnut cultivars for response to eastern filbert blight inoculation. HortScience 35:729–731.

24. Mehlenbacher, S.A. 1997. Testing compatibility of hazelnut crosses using fluorescence microscopy. Acta Hort. 445:167-171.

25. Mehlenbacher, S.A. and D.C. Smith. 2006. Self-compatible seedlings of the cutleaf hazelnut. HortScience 41:482–483.

26. Mehlenbacher, S.A. and G. Bhattarai. 2018. An updated linkage map for hazelnut with new simple sequence repeat markers. Acta Hort. 1226:31-38.

27. Mehlenbacher, S.A., R.N. Brown, E.R. Nouhra, T. Gökirmak, N.V. Bassil, and T.L. Kubisiak. 2006. A genetic linkage map for hazelnut (Corylus avellana L.) based on RAPD and SSR markers. Genome 49:122-133.

28. Milne I., G. Stephen, M. Bayer, P.J.A. Cock, L. Pritchard, L. Cardle, P.D. Shaw, and D. Marshall. 2012. Using Tablet for visual exploration of second-generation sequencing data. Briefings in Bioinformatics 14(2):193–202.

29. Nasrallah, J.B., S.J. Rundle, and M.E. Nasrallah. 1994. Genetic evidence for the requirement of the Brassica S-locus receptor kinase gene in the self-incompatibility response. The Plant Journal 5:373-384.


30. Rahman M.H., M. Uchiyama, M. Kuno, N. Hirashima, K. Suwabe, T. Tsuchiya, Y. Kagaya, I. Kobayashi, K. Kakeda, and Y. Kowyama. 2007. Expression of stigma- and anther-specific genes located in the S locus region of Ipomoea trifida. Sex. Plant Reprod. 20:73–85.

31. Sanabria, N., D. Goring, T. Nürnberger, and I. Dubery. 2008. Self/nonself perception and recognition mechanisms in plants: a comparison of self-incompatibility and innate immunity. New Phytologist 178:503-514.

32. Sathuvalli, V.R. and S.A. Mehlenbacher. 2011. A bacterial artificial chromosome library for ‘Jefferson’ hazelnut and identification of clones associated with eastern filbert blight resistance and pollen–stigma incompatibility. Genome 54:862-867.

33. Schopfer, C.R., M.E. Nasrallah, and J.B. Nasrallah. 1999. The male determinant of self-incompatibility in Brassica. Science 286(5445):1697-1700.

34. Snelling, J.W., V.R. Sathuvalli, B.C. Colburn, G. Bhattarai, E.R. Rowley, T.C. Mockler, C.A. Saski, D. Copetti, and S.A. Mehlenbacher. 2018. Genomic resource development in hazelnut breeding. Acta Hort. 1226:39-46.

35. Stephenson A.J., J. Doughty, S. Dixon, C.J. Elleman, S.J. Hiscock, and H.G. Dickinson. 1997. The male determinant of self-incompatibility in Brassica oleracea is located in the pollen-coating. The Plant Journal 12(6):1351–1359.

36. Takasaki, T., K. Hatakeyama, G. Suzuki, M. Watanabe, A. Isogai, and K. Hinata. 2000. The S receptor kinase determines self-incompatibility in Brassica stigma. Nature 403:913–916.


37. Takayama, S., H. Shiba, M. Iwano, H. Shimosato, F. Che, N. Kai, M. Watanabe, G. Suzuki, K. Hinata, and A. Isogai. 2000. The pollen determinant of self-incompatibility in Brassica campestris. PNAS 97(4):1920-1925.

38. Van der Does, D., F. Boutrot, T. Engelsdorf, J. Rhodes, J.F. McKenna, S. Vernhettes, I. Koevoets, N. Tintor, M. Veerabagu, E. Miedes, C. Segonzac, M. Roux, A.S. Breda, C.S. Hardtke, A. Molina, M. Rep, C. Testerink, G. Mouille, H. Höfte, T. Hamann, and C. Zipfel. 2017. The Arabidopsis leucine-rich repeat receptor kinase MIK2/LRR-KISS connects cell wall integrity sensing, root growth and response to abiotic and biotic stresses. PLoS Genetics 13(6).

39. Van Ooijen, J.W. 2018. JoinMap® 5, Software for the calculation of genetic linkage maps in experimental populations of diploid species. Kyazma B.V., Wageningen, The Netherlands.

40. Vining K.J., J. Snelling, G. Komaei Koma, and S. Mehlenbacher. in press. A chromosome-level assembly of the Corylus avellana ‘Jefferson’ genome.

41. Wang X., P. Lu, and Z. Luo. 2013. GMATo: A novel tool for the identification and analysis of microsatellites in large genomes. Bioinformation 9(10):541–544.

42. You F.M., N. Huo, Y. Qiang Gu, M. Luo, Y. Ma, D. Hane, G.R. Lazo, J. Dvorak, and O.D. Anderson. 2008. BatchPrimer3: A high throughput web application for PCR and sequencing primer design. BMC Bioinformatics 9:253.


Table 2.1: Markers were used to identify high information content fingerprinting (HICF) contigs in the minimal tiling path. BAC end sequences from those HICF contigs (Appendix A) were used to identify PacBio contigs in the right column.

Marker       HICF Contigs   PacBio Contig   PacBio contig length (bp)
007-720      Contig 477     NA*
204-950      Contig 1422    Contig 11       3994632
260-525      Contig 300     Contig 97       1364462
260-525      Contig 300     Contig 39       2476421
345-1050     Contig 300     Contig 39       2476421
345-1050     Contig 300     Contig 97       1364462
815-510      Contig 1207    Contig 63       1726131
AU02-1350    Contig 272     Contig 18       3667985
AU02-1350    Contig 1082    Contig 30       2828186
AU02-1350    Contig 402     NA*
AV12-1700S   Contig 1768    Contig 88       1441097
AV12-1700S   Contig 1768    Contig 120      1101236
AV12-1700S   Contig 672     Contig 123      1056229
AV12-1700S   Contig 672     Contig 479      136547
AV12-1700S   Contig 672     Contig 483      133666
B720         Contig 43      Contig 18       3667985
B741         Contig 1197    Contig 87       1447621
B741         Contig 436     NA*
BR259        Contig 1807    Contig 16       3585080
BR259        Contig 1807    Contig 18       3667985
BR415        Contig 1865    Contig 7        4943503
BR415        Contig 1865    Contig 16       3585080
G05-510S     Contig 753     NA*
G05-510S     Contig 753     NA*
I06-750      Contig 1741    Contig 6        5623595
I06-750      Contig 1259    Contig 41       2316184
I06-750      Contig 1207    Contig 332      264091
I06-750      Contig 1741    Contig 337      262800
KG847        Contig 1807    Contig 16       3585080
KG847        Contig 1807    Contig 18       3667985

*Due to typos in the dataset, CTG1753, CTG1436, CTG1477, and CTG1402 were included in the MTP and erroneously used to identify S-locus associated contigs instead of CTG753, CTG436, CTG477, and CTG402. Markers G05 and 007 were not useful in contig identification due to this error.


Table 2.2: 48 hazelnut accessions used to screen for SSR polymorphism on agarose gels and to characterize di-nucleotide simple sequence repeat markers.

Accession                   Origin                  Source

a) 24 accessions used for polymorphism screening on agarose gels
Ala Kieri COR187            Finland                 Lappa, Finland (seeds)
Albania 55                  Albania                 Cajup, Albania (seeds)
Barcelona                   Spain                   Oregon nursery (scions)
Casina                      Spain-Asturias          Q.B. Zielinski from Spain (scions)
Cosford                     England-Reading         NYAES, Geneva, NY (scions)
DuChilly                    England                 OSU Entomology Farm, OR, USA (scions)
Fusco Rubra                 Germany                 Morton Arboretum, Lisle, IL, USA (scions)
Gasaway                     USA-Washington          Orchard in Washington, USA (scions)
Hall's Giant                Germany/France          Oregon State Univ. Entomology Farm, OR, USA
Imperiale de Trebizonde     Turkey                  Q.B. Zielinski from France (scions)
Mortarella                  Italy-Campania          Q.B. Zielinski from Italy (scions)
Negret                      Spain-Tarragona         Q.B. Zielinski from Spain (scions)
OSU 495.049                 Russia-Southern         VIR, southern Russia (seeds)
Palaz                       Turkey-Ordu             Q.B. Zielinski from Greece (scions)
Pellicule Rouge             France                  Q.B. Zielinski from France (scions)
Pendula                     France                  Arnold Arboretum, , MA, USA (scions)
Ratoli                      Spain-Tarragona         IRTA Mas Bove, Reus, Spain (scions)
Rote Zellernuss             Netherlands             Q.B. Zielinski from Netherlands (scions)
Römische Nuss               unknown                 Hermansverk, Norway (scions)
Tombul Ghiaghli             Turkey                  Q.B. Zielinski from Greece (scions)
Tonda Bianca                Italy-Campania          Q.B. Zielinski from Italy (scions)
Tonda di Giffoni            Italy-Campania          Q.B. Zielinski from Italy (scions)
Tonda Gentile delle Langhe  Italy-Piemonte          Univ. di Torino, Italy (scions)
Tonda Romana                Italy-Lazio             ISF Rome, Italy (scions)

b) 24 additional accessions used to characterize polymorphic markers
Alli                        Estonia                 Polli, Estonia (scions)
Artellet                    Spain-Tarragona         IRTA Mas Bove, Reus, Spain (scions)
Aurea                       France                  Morton Arboretum, Lisle, IL, USA (scions)
B-3                         Macedonia               Skopje, Macedonia (scions)
Barcelloner Zeller          Spain                   Faversham, England, UK (scions)
Bergeri                     Belgium-Luttich         ISF Rome, Italy (scions)
Contorta                    England                 Arnold Arboretum, Boston, MA, USA (scions)
Cutleaf                     England                 Arnold Arboretum, Boston, MA (scions)
Des Anglais                 unknown                 INRA, Bordeaux, France (scions)
Early Long Zeller           Germany                 Faversham, England, UK (scions)
Gem                         USA-Washington          Orchard in Oregon, USA (scions)


Table 2.2 (cont.): 48 hazelnut accessions used to screen for SSR polymorphism on agarose gels and to characterize di-nucleotide simple sequence repeat markers.

Accession                   Origin                  Source

b) 24 additional accessions used to characterize polymorphic markers (cont.)
Gunslebert                  Germany-Gunsleben       INRA Bordeaux, France (scions)
Gustav's Zeller             Germany-Landsberg       Faversham, England, UK (scions)
Iannusa Racinante           Italy-Sicily            Univ. di Torino, Italy (scions)
OSU 26.072                  Russia-N. Caucasus      North Caucasus, Russia (seeds)
OSU 54.039                  Turkey-Giresun/Ordu     Giresun, Turkey (seeds)
OSU 408.040                 Univ. Minnesota         Univ. Minnesota, MN, USA (seeds)
OSU 556.027                 Turkey-Istanbul         Istanbul market, Turkey (seeds)
OSU 681.078                 Russia-Moscow           Moscow, Russia (seeds)
OSU 759.010                 Georgia                 Ozurgeti, Georgia (scions)
Sant Jaume                  Spain-Tarragona         IRTA Mas Bove, Reus, Spain (scions)
Simon                       Spain-Tarragona         IRTA Mas Bove, Reus, Spain (scions)
OSU 252.146                 USA-Oregon State Univ.  Parent of mapping population 93001
OSU 414.062                 USA-Oregon State Univ.  Parent of mapping population 93001


Table 2.3: Characterization information for all new SSR markers developed for fine mapping the hazelnut S-locus. Characterization was based on the genotypes of 48 diverse cultivars and the parents of the mapping population.

Locus      K     Hexp    Hobs  PIC    F(Null)

SSR markers developed in the first round
CB006.06   11    0.794   0.78  0.771  0.0084
CB006.13   6     0.488   0.46  0.465  0.0267
CB007.01   9     0.724   0.70  0.691  0.0113
CB007.03   14    0.840   0.80  0.824  0.0234
CB007.04   14    0.851   0.84  0.838  0.0022
CB007.07   5     0.682   0.74  0.637  -0.0435
CB007.10   11    0.718   0.80  0.675  -0.0589
CB011.02   11    0.625   0.62  0.576  0.0076
CB016.01   9     0.742   0.66  0.711  0.0676
CB016.04   9     0.738   0.76  0.708  -0.0121
CB016.05   10    0.862   0.90  0.847  -0.0221
CB016.09   12    0.802   0.58  0.778  0.1627
CB016.10   13    0.731   0.68  0.712  0.0248
CB016.11   13    0.701   0.70  0.675  -0.0252
CB018.04   11    0.847   0.82  0.831  0.0141
CB018.05   7     0.689   0.70  0.639  -0.0077
CB018.12   14    0.801   0.76  0.779  0.0284
CB030.03   17    0.867   0.92  0.854  -0.0336
CB039.03   13    0.771   0.79  0.753  -0.031
CB041.02   12    0.794   0.80  0.768  -0.0136
CB041.08   12    0.826   0.84  0.806  -0.0117
CB063.07   10    0.839   0.68  0.820  0.1103
CB087.01   9     0.810   0.82  0.786  -0.0068
CB097.05   13    0.579   0.36  0.559  0.2572
CB097.07   11    0.833   0.72  0.813  0.0662
CB123.01   9     0.772   0.84  0.745  -0.0512
CB123.02   12    0.817   0.68  0.797  0.0953
CB145.03   7     0.771   0.68  0.739  0.0587
CB332.01   5     0.547   0.54  0.502  0.0205
CB332.02   9     0.816   0.32  0.792  0.4392
CB479.02   11    0.869   0.80  0.855  0.0429
CB483.02   11    0.703   0.72  0.678  -0.0052
CB736.01   8     0.768   0.64  0.731  0.0923
CB007.06   11    NA      NA    NA     NA
CB018.01   9     NA      NA    NA     NA
CB123.04   11    NA      NA    NA     NA
Mean       10.5  0.7588  0.71  0.731  0.0374


Table 2.3 (cont.): Characterization information for all new SSR markers developed for fine mapping the hazelnut S-locus.

Locus      K       Hexp   Hobs   PIC    F(Null)

SSR markers developed in the second round
RH_SLOC02  14      0.881  0.92   0.869  -0.0243
RH_SLOC03  9       0.837  0.78   0.816  0.038
RH_SLOC05  7       0.691  0.30   0.644  0.3872
RH_SLOC06  11      0.777  0.76   0.756  0.0124
RH_SLOC07  14      0.880  0.80   0.869  0.0498
RH_SLOC10  18      0.885  0.88   0.874  0.001
Mean       12.167  0.825  0.740  0.805  0.077

Note: Number of alleles (K), expected heterozygosity (Hexp), observed heterozygosity (Hobs), polymorphism information content (PIC), and frequency of null alleles (F[null]) were calculated for every SSR locus with only two alleles.
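The statistics named in the note can be illustrated with a minimal Python sketch. It assumes the commonly used textbook definitions, Hexp = 1 − Σp_i² and PIC = 1 − Σp_i² − ΣΣ 2p_i²p_j² (i < j), with no small-sample correction; the software used to produce Table 2.3 may apply different estimators, and all function names here are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes given as (allele, allele) tuples."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    heterozygous = sum(1 for a, b in genotypes if a != b)
    return heterozygous / len(genotypes)


def expected_heterozygosity(freqs):
    """Hexp = 1 - sum(p_i^2); assumption: no small-sample correction."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross_term += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return 1.0 - homozygosity - cross_term


# Hypothetical allele sizes (bp) for four individuals at one SSR locus.
example = [(162, 167), (162, 162), (163, 167), (162, 163)]
freqs = allele_frequencies(example)
print(len(freqs), observed_heterozygosity(example))
print(expected_heterozygosity(freqs), polymorphism_information_content(freqs))
```

K is simply the number of distinct alleles, which is `len(freqs)` in this sketch.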


Table 2.4: Primer sequences, linkage group assignment, and Pacific Biosciences contigs in 'Jefferson' hazelnut genome (V2) for new simple sequence repeat markers.

Name      F primer                R primer                   LG    Seq ID
CB006.06  GCAGAGATGGGAAGAGCTACA   TGCCACCATTCATGTCAAAT       LG10  000006F_pilon:2830348-2830875
CB006.13  CCGATGCAGAGACACATCAA    CCGTCCAGTTTGACAGTGAG       LG10  000006F_pilon:5455241-5455768
CB007.03  TCAGGGTGAACTCAGGCTTT    ACCACCACCTCCTCTTCCTT       LG1   000007F_pilon:2054816-2055335
CB007.04  GGAGGGTAATTGGGAGCGTA    TTTGGTTGTTGTCTGCCTGA       LG1   000007F_pilon:2363346-2363873
CB007.06  CGAAGAAGATTTCCTCCACTTT  TGGATATTTAGAAACCAACTCCCTA  LG1   000007F_pilon:3199753-3200284
CB007.07  TGGGATGTTGTACCCAATGA    CATTACCAAAAGCGGCAAAC       LG1   000007F_pilon:3712941-3713464
CB007.10  GGTAGGGAGGGATAAGGAGGT   AGACAGCAGCCGAGAGAGAG       LG1   000007F_pilon:4456868-4457399
CB011.04  ACGAACACAAACACGCAGAG    AGGAACACCAGCGTTGTCAT       LG3   000011F_pilon:2087517-2088038
CB016.04  TTGCAGCACAGAACAGAGAGA   CACGTGAGGGGAAAAGGATA       LG5   000016F_pilon:2187219-2187734
CB016.05  GTTTCACACCCCTTTGCTTG    ACGCACGGAGAGAGAGAGAG       LG5   000016F_pilon:2635747-2636280
CB016.09  CGCCCATTTTACCTTGACTG    GCTTGCGATAACTTTTGCTCA      LG5   000016F_pilon:383588-384117
CB018.01  CTCCCGTGGCTACCTGTAAG    CGAAGAGATCAGAAGTCAAGAGC    LG5   000018F_pilon:1184152-1184671
CB018.04  AGCTCCTAAGCTGGGGTTTC    TTGAACCAAACGCATAAACG       LG5   000018F_pilon:1903946-1904475
CB018.05  CAGCAAGCCACCTCTCTCTC    GATGGGCATTGACCCTATTG       LG5   000018F_pilon:2250535-2251062
CB018.12  CGGTGCTGAGATTGAACAAA    CACTCCTCGTCGGTCAAAA        LG5   000018F_pilon:537586-538107
CB030.03  CAAGCGTCGAGGTCTCTCTC    TGTAGTCAGCATCGGAGTCG       LG2   000030F_pilon:2323971-2324506
CB039.03  GCCGAAGTCAGTTGTCAGGT    TTTGTGGCCTTGTGGGTTAT       LG8   000039F_pilon:1920841-1921366
CB039.08  TTTCCTAATGAGATCGGGACTT  TGCTCGGGAGTACTTCTGCT       LG8   000039F_pilon:965230-965757
CB041.02  ATAATTCATTGGCGCATTGG    GGGGGTTGGTCTACTTCGAT       LG7   000041F_pilon:1071679-1072210
CB041.08  GCTATGCACCGACTCTTTCC    TGGTGAGGTGGGTCTCTCTC       LG7   000041F_pilon:752599-753124
CB063.07  GCCCCTATCTTTCTCCCTCT    GGTTGTGATTTGGCATTGAA       LG4   000063F_pilon:885973-886492
CB097.05  AGGGTGTTTGGCCTCTTTCT    CATCCCCTCCCTTTCTCTCT       LG8   000097F_pilon:1075927-1076460
CB097.07  CTCATTAGGGCTTGGTGAGG    AGGCTGGGTTAGAGGCAATC       LG8   000097F_pilon:1338210-1338733
CB123.01  AGCCGATGCTTGAGACCTT     TTATGCCTTCCTTTCGTGCT       LG1   000123F_pilon:270601-271134
CB123.02  ATGCTGAACGCATTTTCCTC    CATCGTACTCATGCCCAAAA       LG1   000123F_pilon:28307-28836
CB123.04  GGTTAGGGATGGGGAGGTAA    AGAGGTGAAAAGGTTGGTGAAA     LG1   000123F_pilon:909253-909784


Table 2.4 (cont.): Primer sequences, linkage group assignment, and Pacific Biosciences contigs in 'Jefferson' hazelnut genome (V2) for new simple sequence repeat markers.

Name      F primer                R primer                   LG    Seq ID
CB145.03  TGAAAACTTGAGGGCCAAAT    CTGTGGTGGTCGGATATTGA       LG2   000145F_pilon:631763-632280
CB332.01  TGCAGTGACGGTATGGTTTG    GAACCCTTGACTCCCCAGTA       LG4   000332F_pilon:127306-127825
CB479.02  ACAGGGAGCGAGATGAGAGA    AAAGCATAGCTGGCAAGAGC       LG1   000479F_pilon:92239-92764
CB483.02  GGATGAGGTGGGTGTTGTCT    TTGTGGCTAGCATGTGTGTG       LG1   000483F_pilon:25027-25556
CB736.01  ACGGCAACACGAAGAAGAAG    ACGACCTACCTCCCTCCCTA       LG5   000736F_pilon:18671-19196

Table 2.5: Simple sequence repeat markers previously mapped to the S-locus region in hazelnut.

Marker  Reference
GB309   Bhattarai and Mehlenbacher (2017)
GB401   Bhattarai and Mehlenbacher (2018)
GB544   Bhattarai and Mehlenbacher (2018)
GB646   Bhattarai and Mehlenbacher (2018)
GB673   Bhattarai and Mehlenbacher (2018)
GB678   Bhattarai and Mehlenbacher (2018)
GB892   Bhattarai and Mehlenbacher (2017)
GB921   Bhattarai and Mehlenbacher (2017)
GB932   Bhattarai and Mehlenbacher (2017)
BR294   Colburn et al. (2017)
KG847   Gürcan et al. (2010)


Table 2.6: Primer sequences for simple sequence repeat and high resolution melting markers developed from the V3 genome of 'Jefferson' hazelnut and mapped to the S-locus region on linkage group 5.

Marker name  Type  Forward primer            Reverse primer         Location in the V3 genome sequence (z)
RH_SLOC02    SSR   AAAACCAATGGCAGAGTTGG      CCAAGGGGGACCTCACTATT   scaffold_4_length=51261591:4209780-4210303
RH_SLOC03    SSR   TGACAGCAAACCGCTGTAAC      GCATGCCCAGTATTCCCATA   scaffold_4_length=51261591:4274077-4274600
RH_SLOC05    SSR   TCCAGGCAGTCAAAAGAGAAA     CTCAAAAGCTTTCGCCATTC   scaffold_4_length=51261591:4384533-4385048
RH_SLOC06    SSR   AAGCACTTCCAAGCTTCGTC      ACATCATGGGGGTAACATGG   scaffold_4_length=51261591:4436320-4436849
RH_SLOC07    SSR   AGCAACGTCGACTAGGAAGAG     GAGGAAGAAGGGGTCAACAA   scaffold_4_length=51261591:4525541-4526060
RH_SLOC09    SSR   AGATTTTTGGAGCCACCTGA      TGTGGTGTGCTCCATCCTAA   scaffold_4_length=51261591:4687297-4687820
RH_SLOC10    SSR   GGAGGAAAGCAAGGATAGGC      TGTTTGGTCAAGTGGAAAAGG  scaffold_4_length=51261591:4692170-4692695
846-HRM1     HRM   CTGTGTTGGTGAAGGGGTTC      TGAAGCCATTCTTGTTGTGC   scaffold_4_length=51261591:4416818-4416938
854-HRM2     HRM   GCCTCATTGCGTCTTTTATG      ATATCCAATTTCGCCGCACT   scaffold_4_length=51261591:4451853-4452032
862-HRM1     HRM   GCAAGGCATTTCCAGGATAG      TGCTCCAAGTATGTGCATGT   scaffold_4_length=51261591:4533617-4533867
864-HRM1     HRM   TGAACCACTCTTCCCAGGAG      GATCCCGATCGTGTTAAAGC   scaffold_4_length=51261591:4541861-4542161
868-HRM1     HRM   TGGCTTTGAATTTGTTAGAGATGA  GGATTGATATGCCAAACCAT   scaffold_4_length=51261591:4629038-4629141
870-HRM1     HRM   TGGACCAATCCCTTCATCTC      AAATCTTTGGGGATGGAACC   scaffold_4_length=51261591:4633162-4633282
877-HRM2     HRM   CATCTTGAGGCACTTGGTCA      TTTCAAGAGCACCTGCAATG   scaffold_4_length=51261591:4690580-4690750

(z) Scaffold 4 of the V3 genome sequence corresponds to linkage group 5 on the genetic map.


Table 2.7: Predicted genes at two alleles (S1 and S3) in the S-locus region with annotations and positions on scaffold 4 of the V3 'Jefferson' hazelnut genome.

Genes on S1 (Contig 18)                                                                   Genes on S3 (Contig 16)
Gene              Position Annotation                                                     Gene              Position Annotation
Corav.Jeff_12862  4534851  WRKY transcription factor 72                                   Corav.Jeff_13135  6772585  WRKY transcription factor 72
Corav.Jeff_12863  4537555  translation factor GUF1 homolog, mitochondrial isoform X2     Corav.Jeff_13136  6776413  translation factor GUF1 homolog, mitochondrial isoform X2
Corav.Jeff_12864  4542208  translation factor GUF1 homolog, mitochondrial                 Corav.Jeff_13137  6780941  translation factor GUF1 homolog, mitochondrial isoform X1
Corav.Jeff_12865  4549574  protein IQ-DOMAIN 31-like                                      Corav.Jeff_13138  6788575  protein IQ-DOMAIN 31-like
Corav.Jeff_12866  4600003  leucine-rich repeat receptor-like protein kinase At1g35710    Corav.Jeff_13139  6799233  Protein BYPASS-related
Corav.Jeff_12867  4622024  aldose 1-epimerase-like                                        Corav.Jeff_13140  6824259  1-aminocyclopropane-1-carboxylate oxidase homolog 1
Corav.Jeff_12868  4628726  3-dehydroquinate synthase homolog isoform X2
Corav.Jeff_12869  4631759  leucine-rich repeat receptor-like protein kinase At1g35710
Corav.Jeff_12870  4637203  leucine-rich repeat receptor-like protein kinase At1g35710    Corav.Jeff_13141  6855688  leucine-rich repeat receptor-like protein kinase At1g35710
Corav.Jeff_12871  4639993  receptor-like serine threonine protein kinase At5g15080       Corav.Jeff_13142  6866059  serine/threonine-protein kinase PIX7
Corav.Jeff_12872  4646822  leucine-rich repeat receptor-like protein kinase At1g35710    Corav.Jeff_13143  6871259  leucine-rich repeat receptor-like protein kinase At1g35710
Corav.Jeff_12873  4659957  receptor-like serine threonine protein kinase At5g15080       Corav.Jeff_13144  6873294  serine/threonine-protein kinase PIX7 isoform X1
Corav.Jeff_12874  4659933  leucine-rich repeat receptor-like protein kinase At1g35710    Corav.Jeff_13145  6877567  leucine-rich repeat receptor-like protein kinase At1g35710
Corav.Jeff_12875  4679727  receptor-like serine threonine protein kinase At5g15080       Corav.Jeff_13146  6882889  receptor-like protein kinase At5g15080
Corav.Jeff_12876  4689293  homeobox-leucine zipper protein HAT7-like
Corav.Jeff_12877  4691232  homeobox-leucine zipper protein HAT7-like                      Corav.Jeff_13147  6891333  homeobox-leucine zipper protein HAT7-like
Corav.Jeff_12878  4705027  Glutaredoxin domain-containing protein                         Corav.Jeff_13148  6905020  Glutaredoxin domain-containing protein
Corav.Jeff_12879  4708823  binding partner of ACD11 1                                     Corav.Jeff_13149  6910369  binding partner of ACD11 1


Table 2.8: Allele sizes (in base pairs) from genomic regions and capillary electrophoresis sizing of amplicons of the parents of the hazelnut mapping population. The allele sizes indicate that the S1 region is from the male parent (S1 S1) and the S3 region is from the female parent (S3 S8).

           Sequence length   Allele sizes
Marker     S1      S3        Male parent (OSU 414.062)  Female parent (OSU 252.146)
RH_SLOC03  162     ND*       162/162                    163/167
RH_SLOC06  181     ND*       180/180                    174/179
RH_SLOC07  308     321       308/308                    316/null
RH_SLOC09  312     302       312/312                    300/310
RH_SLOC10  253     240       255/255                    241/245

* ND sequence length not determined as primer sequences for these markers did not align with the S3 sequence, as contig 16 terminates between markers 6 and 7.


[Linkage map of LG5F: marker names plotted against map position, 0.0 to 129.0 cM; S3 and co-segregating markers at 102.7 cM.]

Figure 2.1. New markers on linkage group 5F (female parent of the mapping population). Markers developed in this study are indicated with a *. The S-locus is shown as the allele S3 at position 102.7 cM (marked with an orange arrow), included with other markers that show no recombination in the 138 seedling mapping population (marked with a **).


Figure 2.2. Fine map of the S-locus region in the hazelnut reference mapping population. a) Linkage group 5 (length 129 cM) showing the positions of the S-locus and flanking RAPD markers AU02-1350d and G05-510d. b) Six SSRs, one RAPD marker and the S-locus are between flanking RAPD markers AU02-1350d and G05-510d. Also shown is the number of recombinant seedlings (out of 192) between the adjacent markers. c) Five SSRs and seven HRM markers between flanking SSR markers CB018.12 and KG847 in the refined S-locus region. Also shown is the number of recombinant seedlings between the adjacent markers. No recombination in the 192 seedlings was observed between S3 and four HRM markers.

*The S-alleles of one recombinant seedling (OSU1425.021) were not clearly determined, and hence this seedling was not used for further refinement of the S-locus region.


Figure 2.3: A comparison of the two loci corresponding to the S1 and S3 alleles in the ‘Jefferson’ reference. A) The finely mapped region from scaffold 4 of the reference genome with gene annotations, likely containing the S1 allele. B) PacBio Contig 18 maps to the original locus. C) Marker locations based on the reference sequences from the original locus (part A). D) Marker locations based on the reference sequences from the alternate locus (part F). E) PacBio Contig 16 maps to the alternate locus. F) The alternate region from scaffold 4 of the reference genome with gene annotations, likely containing the S3 allele.


Figure 2.4: Mauve alignment (Darling et al. 2004) of S-locus genomic sequences containing S1 (top) with S3 (bottom). Blocks show conserved regions and gaps show dissimilarity.


Chapter 3: Real-Time Quantitative PCR Expression Analysis of S-Locus Genes in European Hazelnut

Abstract:

Real-time quantitative PCR (qPCR) is a useful tool for observing changes in the expression of a few target genes but has seen only very limited use in European hazelnut. In this study, expression of 18 S-locus genes and 5 reference genes following self- and cross-pollination was evaluated by qPCR relative expression analysis. ‘Jefferson’ stigmas were self-pollinated or cross-pollinated with ‘Felix’ or OSU 1253.064. Tissue was collected at 0, 1, 2, 3, 5, and 7 hours following pollination. RNA was extracted, reverse transcribed, and amplified using primers from reference genes. All five showed promise as reference genes for relative expression analysis in pollinated stigmas and three were chosen to normalize expression for the genes of interest. Primers were developed for 18 S-locus candidate genes and were used to amplify cDNA for qPCR analysis. Five genes showed no expression, four showed constitutive expression, five decreased in expression, and four increased in expression. Five genes showed an interaction between time and pollen and one gene showed a difference in expression between the pollens applied but without the interaction effect. These results give a first glimpse into expression dynamics following self- and cross-pollination in European hazelnut and provide the first set of reference genes for qPCR in hazelnut flowers.


Introduction:

Plants employ a variety of mechanisms to maintain gene flow within a population, encourage outcrossing, avoid inbreeding depression, and maintain heterozygosity. One common mechanism is self-incompatibility (SI), described by de Nettancourt (1977) as the “inability of a fertile seed plant to produce zygotes after self-pollination”, which is accomplished through a broad variety of methods. One of the main classes of SI found in plants is sporophytic SI (SSI). SSI usually involves a single highly polymorphic locus. The interaction is controlled by the male and female sporophytic genotypes at that locus, and self-recognition leads to the inability of pollen tubes to penetrate the stigmatic surface. In the case of SI species in the Brassicaceae, the male determinant of incompatibility is an S-locus derived protein that is transferred from sporophytic anther tissues and deposited on the pollen coat. S-locus cysteine-rich protein (SCR), also known as S-locus protein 11 (SP11), was independently identified by two groups as the determinant of pollen phenotype in the Brassicaceae (Schopfer et al. 1999, Takayama et al. 2000). The female determinant of SSI in the Brassicaceae is SRK (S-locus Receptor Kinase), a plasma-membrane anchored receptor that determines pollen acceptance on the stigma (Nasrallah et al. 1994). SRK is exclusively expressed in the stigma and is composed of a conserved kinase domain, a trans-membrane domain, and a variable extra-cellular domain (Hiscock and Tabah 2003).

European hazelnut (Corylus avellana L.) is a diploid (2n = 2x = 22) that exhibits a strong SSI system controlled by a single S-locus on linkage group 5 (LG5) with different alleles (haplotypes) maintaining incompatibility through the self-recognition mechanism. Currently, 33 alleles have been identified. Most cultivars and selections are heterozygous. Both alleles are expressed by the stigma, and one or both may be produced by the male sporophyte depending on their position in the dominance hierarchy (Figure 1.2). One S-allele (S28) found in seedlings of the ‘Cutleaf’ hazel (Mehlenbacher and Smith, 2006) exhibits self-compatibility when paired with a second allele lower in the hierarchy. Alleles are identified using fluorescence microscopy as described by Mehlenbacher (1997). Branches are emasculated and bagged in December prior to emergence of the stigmas to exclude environmental pollen. Pollen is collected from each of 33 tester trees, each expressing one of the different S-alleles, and stored in the freezer (-18 °C). Females are pollinated in the lab, placed on a double layer of moist filter paper in a petri dish, and 15-24 h later the styles are removed, placed in a drop of aniline blue stain, and squashed. When viewed under the fluorescence microscope, a reaction is called compatible or incompatible based on the appearance of the stained pollen tubes. Low pollen germination or bunched, short, or bulbous pollen tubes are typical of an incompatible pollen-stigma interaction, while good pollen germination with a mass of long parallel pollen tubes growing down the style indicates a compatible combination. Alleles are expressed as codominant in the stigma and either codominant or dominant in the pollen, depending on the dominance interaction.

Compatible pollination results in pollen hydration after two hours, pollen tube emergence after 4 hours, and penetration of the style at 12 hours. Incompatible pollination also results in pollen hydration after two hours but delayed pollen tube emergence 8 hours after pollination (Hampson et al. 1993).

The hazelnut S-locus has been finely mapped to a highly polymorphic 194 kilobase (kb) region on scaffold 4 of the ‘Jefferson’ reference genome containing 18 predicted genes (Chapter 2 of this thesis). ‘Jefferson’ hazelnut is a self-incompatible cultivar whose female flowers express S-alleles S1 and S3 and whose pollen expresses S3 (Mehlenbacher et al. 2011).

Real-time quantitative PCR (qPCR) is a highly sensitive, widely used technique for the quantification of gene expression in tissues. A qPCR assay involves the use of fluorescent dye binding to double stranded DNA to track the accumulation of PCR product in real time, giving a basis for quantification of starting material. A commonly used method to accurately estimate differences in transcript abundance involves the use of one or several internal references with verified stable expression in the tissues being tested. Kozera and Rapacz (2013) suggest the ‘Best 3’ rule when evaluating reference genes. This involves the use of at least three reference genes, three different validation programs, and taking three samples with three biological replicates for each genotype.

While qPCR has been used in hazelnut for nut and pollen allergen detection (Platteau et al. 2011, Ražná et al. 2014, Žiarovská et al. 2015) no work has been done to explore gene expression in floral tissues or in self-incompatibility interactions. Žiarovská et al. (2015) reported the use of 3-Hydroxy-3-Methylglutaryl Coenzyme A Reductase (HMG CoA reductase) as an internal reference for pollen allergen detection, though the stability of this internal reference was not verified in pollinated stigmas for studies of gene expression in this self-incompatible plant.

In order to quantify expression of S-locus genes in European hazelnut, reference genes must be identified that can provide a stable baseline with which to compare target gene expression. In this research, we collected tissue at time-course intervals following compatible and incompatible pollination of ‘Jefferson’ stigmas to identify reference genes and use them to quantify relative expression of S-locus genes.

Methods:

Tissue Collection. Five ‘Jefferson’ trees (planted in 2011), designated as trees A, B, C, D, and E, were selected. In December 2017, 3-4 branches of each tree were emasculated by clipping off all catkins and then bagged to exclude environmental pollen. In February, branches were removed from the trees in the field with loppers and female flowers were removed from the branches indoors with forceps. Ten flowers were allocated for each sample and placed on petri dishes with wet filter paper. Flower maturity varied, so the 10-flower samples were composed to minimize variability between samples due to differences in maturity. Styles were pollinated in the lab using forceps and vials of pollen from ‘Jefferson’ (S1, S3), ‘Felix’ (S15, S21), and OSU 1253.064 (S2, S8) and plucked from each set of flowers at 0, 1, 2, 3, 5, and 7 hours after pollination. Unpollinated controls were also collected from each biological replicate. All tissue was immediately frozen and stored at -80 °C until RNA was extracted. As tissue was collected, several pollinated samples were set aside and stored at 4 °C so compatibility or incompatibility could be determined 24 hours after pollination by use of fluorescence microscopy (Mehlenbacher 1997). Pollen was germinated on agarose plates and after 24 hours 100 pollen grains were counted to calculate percent germination and assess the viability of the pollen used.


RNA extraction and cDNA synthesis. Total RNA was isolated using a CTAB/phenol:chloroform:isoamyl alcohol method with a phase lock gel for separation of the aqueous and organic phases (VWR, Radnor, PA) and transferred to a Zymo RNA cleanup column (Zymo Research, Irvine, CA). Samples were separated on a 1% TAE agarose gel containing 1% commercial bleach (6% sodium hypochlorite) to preserve RNA integrity (Aranda et al. 2012). Gels were stained with ethidium bromide for 20 min, destained for 15 min, and photographed under UV light using a BioDoc-It® Imaging System (UVP, Upland, CA). Content and quality of RNA were estimated using ultraviolet spectrophotometry with a BioTek Synergy 2 and a Take3 microplate reader, and data were analyzed with Gen5 software (BioTek Instruments, Winooski, VT). To obtain a more accurate quantification and assessment of quality, a subset of samples was analyzed on an Agilent Bioanalyzer 2100 at the Oregon State University Center for Genome Research and Biocomputing (CGRB). All cDNA was synthesized using iScript™ Reverse Transcription Supermix for RT-qPCR (Bio-Rad) with 1 µg of RNA, 4 µl iScript Reverse Transcription (RT) Supermix, and nuclease-free water for a total of 20 µl per reaction. Cycling conditions were set as recommended by Bio-Rad, with a priming step at 25 °C for 5 min, an RT step at 46 °C for 20 min, and an RT inactivation step at 95 °C for 1 min.
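The reaction setup described above reduces to simple arithmetic once the RNA concentration of a sample is known. The following Python sketch shows that calculation under the stated recipe (1 µg RNA, 4 µl supermix, 20 µl total); the function name and the example concentration are hypothetical and are not taken from the thesis data.

```python
def rt_reaction_volumes(rna_conc_ng_per_ul, rna_ng=1000.0,
                        supermix_ul=4.0, total_ul=20.0):
    """Return (RNA volume, water volume) in µl for one cDNA synthesis reaction.

    Defaults follow the recipe in the text: 1 µg RNA, 4 µl RT supermix,
    and nuclease-free water to a total of 20 µl.
    """
    if rna_conc_ng_per_ul <= 0:
        raise ValueError("RNA concentration must be positive")
    rna_ul = rna_ng / rna_conc_ng_per_ul
    water_ul = total_ul - supermix_ul - rna_ul
    if water_ul < 0:
        # The sample is too dilute to fit 1 µg of RNA in the reaction volume.
        raise ValueError("RNA too dilute for the requested input amount")
    return rna_ul, water_ul


# Hypothetical sample at 500 ng/µl.
rna_ul, water_ul = rt_reaction_volumes(500.0)
print(f"RNA: {rna_ul:.2f} µl, water: {water_ul:.2f} µl")
```

A sample below 62.5 ng/µl cannot supply 1 µg of RNA in the 16 µl left after the supermix, which is why the sketch raises an error in that case.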

Reference Gene Selection. 3-hydroxy-3-methylglutaryl-coenzyme A reductase 1 (HMGCR), alpha-tubulin 1 (TUB1), and Histone H3 (HisH3) were selected for evaluation as candidate reference genes and identified in the ‘Jefferson’ reference genome using a BLAST of primer sequences for reference genes retrieved from the literature (De Keyser et al. 2013, Žiarovská et al. 2015, Qi et al. 2016). Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and serine/threonine-protein phosphatase 2A (PP2A) were also selected for evaluation as candidate reference genes and identified in the ‘Jefferson’ reference genome based on homology to Arabidopsis genes.

Primer Design and PCR Conditions. NCBI Primer-BLAST was used to develop primers from ‘Jefferson’ transcript sequences with amplicons crossing predicted intron boundaries whenever possible in order to validate intron models. Primers from 18 S-locus predicted genes and 5 candidate reference genes (Table 3.1, Appendix D) were used to amplify genomic DNA from ‘Jefferson’ and its parents to verify that a product of the expected size was amplified using methods described by Bhattarai and Mehlenbacher (2017). Then cDNA from ‘Jefferson’ stigmas pollinated with three different pollens (‘Jefferson’, ‘Felix’, and OSU 1253.064) was amplified using the same conditions to check which genes were expressed and whether a single band was produced. Primers for simple sequence repeat (SSR) marker RH_SLOC05 were used in PCR to amplify a subset of cDNA samples to determine whether gDNA contamination was present. PCR products were separated by electrophoresis on 2% agarose gels in Tris-Borate-EDTA buffer alongside QuickLoad® 100 bp DNA ladder (New England BioLabs, Ipswich, MA), stained with ethidium bromide, and imaged.

Primer efficiency and qPCR Conditions. A pooled cDNA set including all tissue types was used to generate a 1:10 dilution series. All primers were used to amplify the dilution set on a Bio-Rad CFX96 system (Bio-Rad Laboratories, Hercules, CA). Cycling consisted of an initial denaturation at 95 °C for 3 min followed by 38 cycles of 10 seconds at 95 °C and 30 seconds at 60 °C for annealing/extension, each followed by a plate read step. Following the 38th cycle, a melt curve analysis was performed with temperatures increasing from 65 °C to 95 °C in 0.5 °C increments, holding at each step for 3 seconds. Data were visualized using CFX software and Cq values were plotted using Microsoft Excel. PCR efficiency was calculated from the slope of the dilution curve using the equation: PCR efficiency = $(10^{-1/\mathrm{slope}} - 1) \times 100$. Efficiency was considered acceptable if it fell between 85% and 115%.
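As an illustration of the efficiency calculation, the following Python sketch fits a least-squares line to Cq against log10 of the relative template amount and applies the equation above. It is a minimal sketch rather than the spreadsheet procedure used in this study; the function names and the example Cq values are hypothetical.

```python
def standard_curve_slope(log10_amounts, cq_values):
    """Least-squares slope of Cq regressed on log10(relative template amount)."""
    n = len(log10_amounts)
    mean_x = sum(log10_amounts) / n
    mean_y = sum(cq_values) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_amounts, cq_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_amounts)
    return sxy / sxx


def pcr_efficiency_percent(slope):
    """PCR efficiency = (10^(-1/slope) - 1) x 100."""
    return (10.0 ** (-1.0 / slope) - 1.0) * 100.0


def is_acceptable(efficiency_percent, low=85.0, high=115.0):
    """Acceptance window stated in the text: 85% to 115%."""
    return low <= efficiency_percent <= high


# Hypothetical 1:10 dilution series (undiluted, 1:10, 1:100, 1:1000).
log10_amounts = [0.0, -1.0, -2.0, -3.0]
cq_values = [20.1, 23.5, 26.8, 30.2]  # illustrative values only
slope = standard_curve_slope(log10_amounts, cq_values)
efficiency = pcr_efficiency_percent(slope)
print(round(slope, 3), round(efficiency, 1), is_acceptable(efficiency))
```

A slope near −3.32 corresponds to a doubling of product each cycle, that is, an efficiency of 100%.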

Expression Analysis. Primers designed from reference gene sequences were used to amplify a subset of cDNA samples to predict stability before utilizing the entire experimental set. Software packages geNorm (Vandesompele et al. 2002), NormFinder (Anderson et al. 2004), and BestKeeper (Pfaffl et al. 2004) were used to obtain estimates of gene stability. Following this, primers for all expressed genes of interest (GOI) and reference genes were run on a Bio-Rad CFX96 system (Bio-Rad) for all cDNA samples using the methods described above, with one primer pair for each plate and 90 samples on each plate to avoid inter-plate variability (Taylor et al. 2019). Reference genes were analyzed for stability in the entire experimental set and the best three were selected for use as an internal reference. The average of the Cq values of the three reference genes served as the reference value. On each plate, the average of unpollinated control samples was used to normalize for plate-to-plate variation. Methods described by Pfaffl et al. (2001) were used for ΔΔCq relative quantification analysis.
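The normalization steps in this paragraph can be sketched in a few lines of Python. The sketch assumes an amplification factor of 2 (100% efficiency) for every primer pair, which is a simplification: an efficiency-corrected calculation would substitute the measured amplification factor for each gene. The function names and Cq values are hypothetical.

```python
from statistics import mean


def delta_cq(goi_cq, reference_cqs):
    """ΔCq: gene-of-interest Cq minus the mean Cq of the reference genes."""
    return goi_cq - mean(reference_cqs)


def relative_expression(sample_delta_cq, control_delta_cqs,
                        amplification_factor=2.0):
    """Relative expression from ΔΔCq.

    ΔΔCq = ΔCq(sample) - mean ΔCq(unpollinated controls).
    Assumption: the same amplification factor (default 2) applies to the
    gene of interest and to the reference genes.
    """
    delta_delta_cq = sample_delta_cq - mean(control_delta_cqs)
    return amplification_factor ** (-delta_delta_cq)


# Hypothetical Cq values for one pollinated sample and two unpollinated controls.
sample = delta_cq(24.0, [19.0, 20.0, 21.0])
controls = [delta_cq(25.2, [20.1, 20.0, 20.2]),
            delta_cq(24.8, [19.9, 20.0, 19.8])]
print(relative_expression(sample, controls))
```

Values above 1 indicate higher expression in the pollinated sample than in the unpollinated controls, and values below 1 indicate lower expression.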

Data were inspected for outlying points using the ratio of the standard deviation to the mean at each time-point (stdev/mean) between replicates. If the stdev/mean > 0.5, the replicates were inspected for outlying values. If a single value was deviant and its removal reduced the stdev/mean value to below 0.5, it was removed from the dataset. Otherwise the dataset was left as it was.
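The screening rule above can be written as a short Python sketch. Two details are assumptions made for illustration and are not stated in the text: the sample standard deviation is used, and the candidate outlier is taken to be the replicate farthest from the mean. The function name is hypothetical.

```python
from statistics import mean, stdev


def screen_single_outlier(values, threshold=0.5):
    """Apply the stdev/mean rule to replicate values at one time point.

    Returns the values unchanged unless stdev/mean exceeds the threshold and
    removing the single most deviant value brings the ratio below it.
    """
    values = list(values)
    if len(values) < 3:
        # Too few replicates to remove one and still compute a deviation.
        return values
    if stdev(values) / mean(values) <= threshold:
        return values
    centre = mean(values)
    # Assumption: the candidate outlier is the value farthest from the mean.
    worst = max(range(len(values)), key=lambda i: abs(values[i] - centre))
    trimmed = values[:worst] + values[worst + 1:]
    if stdev(trimmed) / mean(trimmed) < threshold:
        return trimmed
    return values


print(screen_single_outlier([1.0, 1.1, 0.9, 5.0]))
print(screen_single_outlier([0.1, 5.0, 0.2, 6.0]))
```

In the first call a single deviant replicate is dropped; in the second, no single removal brings the ratio below 0.5, so the data are left as they were.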

A log transformation was applied to relative expression (RE) values prior to analysis of variance. Two-way ANOVA was performed with independent variables pollen and time and dependent variable log(RE). Significance was indicated by a p-value under 0.05 and significant effects were explored through pairwise t-tests with a Bonferroni multiple-test correction. When time was shown to have a significant effect on expression, the data were assessed as to whether RE increased or decreased compared to the starting time (t = 0).
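The transformation and the multiple-test correction can be illustrated with a minimal Python sketch. The base of the logarithm is not stated in the text, so base 10 is an assumption here, and the p-values are hypothetical inputs standing in for the output of the pairwise t-tests; the ANOVA itself is not reproduced.

```python
import math


def log_transform(re_values, base=10.0):
    """Log-transform relative expression values (all must be positive)."""
    return [math.log(value, base) for value in re_values]


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def significant_after_correction(p_values, alpha=0.05):
    """Flag comparisons whose Bonferroni-adjusted p-value is below alpha."""
    return [p_adj < alpha for p_adj in bonferroni_adjust(p_values)]


# Hypothetical raw p-values for the three pairwise pollen comparisons.
raw_p = [0.01, 0.02, 0.5]
print(bonferroni_adjust(raw_p))
print(significant_after_correction(raw_p))
```

With three pollen sources there are three pairwise comparisons, so each raw p-value is tripled before it is compared with 0.05.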

Sequencing of primer binding sites. ΔΔCq analysis suggested potential for polymorphism in primer binding sites for qPCR amplicons used to quantify CaJ_12869 and CaJ_12872. To study this, primers were designed using NCBI Primer-BLAST to create amplicons that include qPCR primer binding sites. Amplicons were generated from ‘Jefferson’, ‘Felix’, and OSU 1253.064 gDNA and Sanger sequenced. Sequences were aligned with MEGA X (Kumar et al. 2018) and primer binding sites examined visually for variation.

Results:

Tissue Collection, RNA extraction and cDNA synthesis. Small catkins were found inside one of the bagged branches on both trees C and D and were excluded from the analysis except at the two-hour time point, as available flowers were limited. Trees C and D also had fewer flowers than other biological replicates and were supplemented with flowers from trees A and E at the 0-hour time point to ensure that quality stigma tissue was available at every time point. This was done under the assumption of relative equivalence between biological replicates due to each ‘Jefferson’ tree having the same genotype.

Fluorescence microscopy showed that ‘Felix’ and OSU 1253.064 pollen on ‘Jefferson’ stigmas produced a typical compatible response while ‘Jefferson’ pollen on ‘Jefferson’ styles produced a typical incompatible response, as expected. Inspection of the squashed styles indicated good germination and masses of pollen tubes growing parallel down the styles. Pollen germination of ‘Felix’ and OSU 1253.064 was high, between 92 and 96%. ‘Jefferson’ pollen germination was slightly lower, averaging 70%.

During RNA extraction, an error was made with the samples for biological replicate Tree A and the replicate was lost, leaving four complete replicates.

Spectrophotometry showed 260/280 values over 2 for all samples except the 3-hour sample from replicate D pollinated with OSU 1253.064 pollen, which had a 260/280 value of 1.88. RNA concentration averaged 667 ng/µL, with a median of 609 ng/µL. Bioanalyzer values confirmed the spectrophotometry data. RNA integrity number (RIN) values collected by the Bioanalyzer averaged 8.2 for unpollinated stigmas and 9.2 for pollinated stigmas.

Initial amplification. Fragment sizes amplified from gDNA and separated on agarose gels largely matched expected sizes, though multiple bands were sometimes present. SSR marker RH_Sloc05 showed no amplification in any cDNA sample, suggesting no gDNA contamination in cDNA samples. Fragment sizes from cDNA amplification showed a single clear band in most cases, and in cases of multiple banding alternate primer sets were used. CaJ_12862, CaJ_12863, CaJ_12864, and CaJ_12865 had no products following PCR amplification with the primers listed in Appendix D. Primers for CaJ_12869, CaJ_12872, and CaJ_12876 produced bands in a portion of tested samples and were included in later analyses.

Primer efficiency. Primers for all reference genes and four GOI had efficiencies between 85% and 106%. The nine other primer pairs used in the study returned inconsistent efficiency values, at times in the acceptable range of 85-115% and at other times much higher (120-223%, Table 3.2).
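For readers unfamiliar with how such efficiency values arise, the sketch below shows the standard calculation from the slope of a dilution-series standard curve, E = 10^(-1/slope) - 1, where E = 1.0 corresponds to 100%. It is assumed, not confirmed by the text shown here, that efficiencies in this study were derived from a standard curve. The function names and dilution data are hypothetical.

```python
def fit_slope(log10_quantity, cq):
    """Least-squares slope of Cq against log10 of template quantity."""
    n = len(cq)
    mean_x = sum(log10_quantity) / n
    mean_y = sum(cq) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantity, cq))
    sxx = sum((x - mean_x) ** 2 for x in log10_quantity)
    return sxy / sxx


def efficiency_from_slope(slope):
    """Amplification efficiency as a fraction (1.0 = perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical ten-fold dilution series: Cq rises by 3.4 cycles per
# ten-fold dilution of the pooled cDNA.
log10_quantity = [0.0, -1.0, -2.0, -3.0]
cq = [20.0, 23.4, 26.8, 30.2]
slope = fit_slope(log10_quantity, cq)
print(round(slope, 2), round(efficiency_from_slope(slope), 2))
```

A slope shallower than about -3.32 gives an apparent efficiency above 1.0, which is the pattern seen for the low-expression primer pairs in Table 3.2.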


Expression Analysis. When reference genes were amplified with a subset of samples, all showed constant expression, so all genes were amplified using all cDNA samples. The entire dataset showed HisH3, HMGCR, and PP2A to be most suitable for use as internal references (Table 3.3) and the average of the three was used to calculate ΔΔCq values.
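To make the geNorm stability measure concrete, the sketch below computes an M value for each candidate gene as the mean, over all other candidates, of the standard deviation of the log2 expression ratio across samples (Vandesompele et al. 2002). Lower M indicates greater stability. This is an illustrative re-implementation with hypothetical gene names and quantities, not the geNorm program itself.

```python
from math import log2
from statistics import mean, stdev


def genorm_m(quantities):
    """geNorm-style M value for each candidate reference gene.

    quantities: dict mapping gene name to a list of relative quantities,
    with samples in the same order for every gene.
    """
    genes = list(quantities)
    m_values = {}
    for gene_j in genes:
        pairwise_sd = []
        for gene_k in genes:
            if gene_k == gene_j:
                continue
            # Log2 ratio of the two genes in each sample; a constant
            # ratio across samples gives a standard deviation of zero.
            log_ratios = [log2(a / b) for a, b in
                          zip(quantities[gene_j], quantities[gene_k])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene_j] = mean(pairwise_sd)
    return m_values


# Hypothetical quantities for three genes in three samples.
example = {
    "geneA": [1.0, 2.0, 4.0],
    "geneB": [2.0, 4.0, 8.0],
    "geneC": [1.0, 1.0, 1.0],
}
print(genorm_m(example))
```

In this toy example geneA and geneB vary together and so receive lower M values than geneC.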

CaJ_12869 and CaJ_12872, though they gave inconsistent banding in the initial PCR, amplified in all 90 cDNA samples. CaJ_12876 only showed amplification in a few unpollinated control samples at very high Cq values (35-38), and melt curves generated from qPCR suggested that the detected signal was from primer dimers. Across the time course following pollination, four genes showed stable expression, five genes decreased in expression, and four genes increased in expression (Table 3.4, Appendix E). One gene showed a difference in expression between pollens applied (‘Felix’, ‘Jefferson’, or OSU 1253.064) with no interaction between pollen genotype and time (Figure 3.1), and five genes showed an interaction between pollen genotype and time (Figure 3.2). Of the genes showing an interaction effect, CaJ_12879 showed a downward trend in styles of ‘Jefferson’ pollinated with ‘Jefferson’ and ‘Felix’ and stable expression when pollinated with OSU 1253.064 (Fig 3e). This suggests that any differences in gene expression between the pollens applied are not primarily derived from the self-incompatibility reaction.

Sequencing of primer binding sites. Amplicons produced from ‘Jefferson’, ‘Felix’, and OSU 1253.064 gDNA showed no polymorphism in primer binding sites for CaJ_12869 or CaJ_12872. The only sequence variation was seen in the amplicon of CaJ_12869 produced from ‘Felix’, which had three single nucleotide polymorphisms.


Discussion:

One limitation of this study was the difficulty in acquiring enough high-quality female flowers at a similar developmental stage for each sample (maturity assessment was based on the lengths of styles protruding from the female flower, from red dot stage to fully exserted). It is likely that biological variation would have been reduced if more homogeneous tissues had been used. Efforts were made to reduce variation due to differences in flower quality in the current study, but for future attempts it would be prudent to decrease the total number of samples. The results shown here can aid in deciding which time points are most important for the self-incompatibility interaction. For future experiments it would be reasonable to use only a single compatible pollen. In addition, excluding the 1-hour time point and replacing the 5- and 7-hour time points with a 6-hour time point may be reasonable. Five trees with four bagged limbs per tree, as used in this experiment, may still be necessary, but with fewer samples in the experiment it may be possible to ensure greater flower quality and include more flowers per sample. No matter what changes are made, repeating this experiment will be destructive to the trees used.

Inability to acquire consistent primer efficiency values was a clear problem for many S-locus genes. There are several possible explanations. Firstly, all genes associated with problematic primer pairs were unique in that they had average Cq values above 29, while average Cq values for the four other GOI ranged from 25 to 28 and reference gene average Cq was 26. According to Taylor et al. (2019), stochastic amplification could occur when fewer than 100 molecules are present in a sample, which would result in Cq values of 29 and above. Additionally, removing pollen from pooled cDNA was required to acquire accurate efficiency values in higher abundance samples. This could be due to inhibitors present in the pollen-only samples, suggesting that there may also be inhibitors present in all samples where pollen is present, and this could also affect efficiency values. Lastly, the S-locus contains repetitive sequences, and while primers were designed to be specific to each gene there is potential for off-target binding sites. Efficiency values tended to range higher than 100%, and it is likely that this is due to off-target sites in transcripts produced from any of the genotypes involved in the study. The only solution to this problem would be to redesign primers and test again. This option was not taken in this study due to time limitations and difficulty in primer design for the repetitive regions contained in the S-locus. It is likely that primer redesign would be a less productive option than RNA-sequencing. This is because RNA-seq quantifies transcript abundance without gene-specific primers, and even if redesigned primers improved PCR efficiency there would still be the problem of low gene expression to overcome. RNA-seq overcomes both of those potential sources of error.

While each primer pair was tested in pollen-only and style-only samples, this does not show with satisfactory clarity the source of each signal in mixed samples. This also confounds the interpretation of results and can hopefully be addressed in further studies. Leydon et al. (2017) were able to overcome this difficulty through SNP-informed RNA-sequencing, separating male and female signals through tracking polymorphism in transcripts.

The heterozygosity of ‘Jefferson’ and the two cross-compatible genotypes used (‘Felix’ and OSU 1253.064) represents the final difficulty in interpreting these data. Primers were designed from the finely-mapped S-locus region in the ‘Jefferson’ reference genome, but this represents the S1 allele, only one of the two alleles expressed by ‘Jefferson’ (S1 S3) stigmas. The sequences from the S3 allele, as reported in chapter 2 of this thesis, were not used in primer development, and since S-alleles are expressed in a codominant manner in the stigma it is possible that preferential binding of primers designed from one allele confounds amplification from transcripts produced by the other. The two alleles found in the reference genome show most primers to bind perfectly to both S1 and S3 alleles, but single mismatches were identified in the S3 allele in the binding sites of the reverse primers used to amplify CaJ_12870 and CaJ_12872.

This problem also extends to the cross-compatible pollen used. We have no genomic data from ‘Felix’ (S15, S21) or OSU 1253.064 (S2, S8) and amplifying signals from these different genotypes, especially in the highly polymorphic and repetitive S-locus, has unknown consequences.

Despite the potential issues in these data, there are still some valuable takeaways. Five genes were shown not to be expressed in any tissues used in the study. Four strong candidates for internal reference genes were identified and a fifth reference gene from the literature was confirmed (HMGCR). Expression levels of genes of interest showed differences both across time and among the three pollen sources and provide targets for further research.

One set of interesting genes in the S-locus region includes the five genes with homology to At1g35710 annotated as encoding probable leucine-rich repeat receptor-like protein kinases (LRR-RLK) (CaJ_12866, CaJ_12869, CaJ_12870, CaJ_12872, and CaJ_12874). At1g35710 is a gene in Arabidopsis thaliana known to be a homolog of MIK2 (MDIS1-interacting receptor kinase). MIK2 is an important protein involved in pollen tube perception of female attractants (Higashiyama and Yang 2017), immune response (Coleman et al. 2019), salt stress response (Julkowska et al. 2016) and root growth direction (Van der Does et al. 2017). Three of these LRR-RLK genes in the Corylus S-locus (CaJ_12869, CaJ_12870, and CaJ_12872) share a similar expression profile, with a very low initial expression increasing by 1-3 hours after incompatible pollinations and by 5-7 hours after compatible pollinations. Considering the function of Arabidopsis homologs and this expression profile, it is possible that these genes are involved in pollen tube guidance. Hampson et al. (1993) showed that in Corylus avellana compatible pollen tubes emerge 4 hours after pollination and penetrate the stigmatic surface by 12 hours. This 4-12 hour window fits with the compatible signal shown by the three LRR-RLK genes after 5 hours. Incompatible pollen tubes emerge from pollen grains at 12 hours following pollination and never penetrate the stigma, resulting in bulbous, short, and bunched pollen tubes (Hampson et al. 1993). If the three LRR-RLK genes are involved in pollen tube guidance, it could be that their earlier expression (1-3 hours post-pollination) is an effect of the incompatibility reaction resulting in an inability of pollen tubes to direct themselves.

Also noteworthy are three genes encoding probable transmembrane receptor-like serine/threonine protein kinases (RLK) (CaJ_12871, CaJ_12873, and CaJ_12875) with homology to At5g15080. At5g15080 encodes a cytoplasmic receptor-like kinase in Arabidopsis thaliana also known as Putative Interactor of XopAC 7 (PIX7). XopAC is a Xanthomonas campestris-specific type III effector that was shown to associate with PIX7 in a yeast two-hybrid screen, though not with the uridylylation domain hypothesized to be functionally relevant (Guy et al. 2013). All three Corylus genes show a similar slight downward trend in their expression over the hours sampled, but no significant differences between compatible and incompatible pollination. It is interesting that these genes show homology to a kinase in Arabidopsis which has been shown to associate with foreign biotic signaling components. While these three RLKs show no homology to Brassica’s SRK, the ability to recognize diverse foreign biotic signal types fits well with the understanding of how SSI in Brassica works. This makes CaJ_12871, CaJ_12873, and CaJ_12875 notable candidates for the female determinant of SSI in hazelnut.

Other genes showing differences between cross- and self-pollination were CaJ_12867, annotated as aldose 1-epimerase-like, and CaJ_12878, annotated as glutaredoxin domain-containing protein. Aldose 1-epimerase (galactose mutarotase) is an enzyme in the galactose metabolism pathway (Thoden et al. 2003) and may have much higher expression in compatible pollen simply due to higher metabolic activity. An S-locus glutaredoxin could be involved in DNA synthesis, signal transduction, and the defense against oxidative stress, but the most-documented functions of glutaredoxins in plants are their involvement in oxidative stress responses (Rouhier et al. 2008).

Conclusion:

Our first look into post-pollination expression of S-locus genes highlights several candidate genes for involvement in self-incompatibility, including three receptor-like serine/threonine protein kinases and three leucine-rich repeat receptor-like protein kinases. Future experiments should take into consideration the shortcomings of this experiment by separating the male and female signal, accounting for genetic diversity between S-alleles, reducing the number of time points sampled to ensure availability of enough flowers at similar developmental stages, and compensating for genes with lower expression.


References:

1. Andersen, C.L., J.L. Jensen, and T.F. Ørntoft. 2004. Normalization of real-time quantitative reverse transcription-PCR data: A model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Res. 64: 5245-5250.

2. Aranda, P.S., D.M. LaJoie, and C.L. Jorcyk. 2012. Bleach gel: a simple agarose gel for analyzing RNA quality. Electrophoresis 33(2): 366–369. doi:10.1002/elps.201100335

3. Coleman, A.D., L. Raasch, J. Maroschek, S. Ranf, and R. Hückelhoven. 2019. The Arabidopsis leucine-rich repeat receptor kinase MIK2 is a crucial component of pattern-triggered immunity responses to Fusarium fungi. BioRxiv 720037.

4. De Keyser, E., L. Desmet, E. Van Bockstaele, and J. De Riek. 2013. How to perform RT-qPCR accurately in plant species? A case study on flower colour gene expression in an azalea (Rhododendron simsii hybrids) mapping population. BMC Molecular Biol. 14: 13. https://doi.org/10.1186/1471-2199-14-13

5. de Nettancourt, D. 1977. The Basic Features of Self-Incompatibility. In: Incompatibility in Angiosperms. Monographs on Theoretical and Applied Genetics, vol 3. Springer, Berlin, Heidelberg, Germany.

6. Guy, E., M. Lautier, M. Chabannes, B. Roux, E. Lauber, M. Arlat, and L.D. Noël. 2013. xopAC-triggered immunity against Xanthomonas depends on Arabidopsis receptor-like cytoplasmic kinase genes PBL2 and RIPK. PLOS ONE 8(8).

7. Hampson, C.R., A.N. Azarenko, and A. Soeldner. 1993. Pollen-stigma interactions following compatible and incompatible pollinations in hazelnut. J. Amer. Soc. Hort. Sci. 118(6): 814-819.

8. Higashiyama, T., and W. Yang. 2017. Gametophytic Pollen Tube Guidance: Attractant Peptides, Gametic Controls, and Receptors. Plant Physiology 173(1): 112-121.

9. Hiscock, S.J., and D.A. Tabah. 2003. The different mechanisms of sporophytic self-incompatibility. Philos. T. R. Soc. B. 358(1434): 1037–1045.

10. Julkowska, M.M., K. Klei, L. Fokkens, M.A. Haring, M.E. Schranz, and C. Testerink. 2016. Natural variation in rosette size under salt stress conditions corresponds to developmental differences between Arabidopsis accessions and allelic variation in the LRR-KISS gene. J. Exp. Botany 67(8): 2127–2138.

11. Kozera, B., and M. Rapacz. 2013. Reference genes in real-time PCR. J. Appl. Genet. 54(4): 391–406.

12. Kumar, S., G. Stecher, M. Li, C. Knyaz, and K. Tamura. 2018. MEGA X: Molecular Evolutionary Genetics Analysis across computing platforms. Mol. Biol. Evol. 35: 1547-1549.

13. Leydon, A.R., C. Weinreb, E. Venable, A. Reinders, J.M. Ward, and M.A. Johnson. 2017. The molecular dialog between reproductive partners defined by SNP-informed RNA-sequencing. The Plant Cell 29(5): 984–1006. doi:10.1105/tpc.16.00816

14. Mehlenbacher, S.A., D.C. Smith, and R.L. McCluskey. 2011. ‘Jefferson’ Hazelnut. HortScience 46(4): 662-664.

15. Mehlenbacher, S.A. 1997. Testing compatibility of hazelnut crosses using fluorescence microscopy. Acta Hort. 445: 167-171.

16. Mehlenbacher, S.A., and D.C. Smith. 2006. Self-compatible seedlings of the cutleaf hazelnut. HortScience 41: 482–483.

17. Nasrallah, J.B., S.J. Rundle, and M.E. Nasrallah. 1994. Genetic evidence for the requirement of the Brassica S-locus receptor kinase gene in the self-incompatibility response. The Plant Journal 5: 373-384.

18. Pfaffl, M.W., A. Tichopad, C. Prgomet, and T.P. Neuvians. 2004. Determination of stable housekeeping genes, differentially regulated target genes, and sample integrity: BestKeeper–Excel-based tool using pair-wise correlations. Biotechnol. Let. 26(6): 509–515.

19. Pfaffl, M.W. 2001. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res. 29(9): e45. doi:10.1093/nar/29.9.e45

20. Platteau, C., M. De Loose, B. Meulenaer, and I. Taverniers. 2011. Detection of allergenic ingredients using real-time PCR: A case study on hazelnut (Corylus avellana) and soy (Glycine max). J. Agr. Food Chem. 59(20): 10803-10814.

21. Qi, S., L. Yang, X. Wen, Y. Hong, X. Song, M. Zhang, and S. Dai. 2016. Reference gene selection for RT-qPCR analysis of flower development in Chrysanthemum morifolium and Chrysanthemum lavandulifolium. Frontiers Plant Sci. 7: 287.

22. Radonić, A., S. Thulke, I.M. Mackay, O. Landt, W. Siegert, and A. Nitsche. 2004. Guideline to reference gene selection for quantitative real-time PCR. Biochem. Biophys. Res. Commun. 313: 856–862.

23. Ražná, K., M. Bežo, N. Nikolaieva, K. Garkava, J. Brindza, and J. Žiarovská. 2014. Variability of Corylus avellana L. CorA and profilin pollen allergens expression. J. Envir. Sci. Health B. 49: 639-645.

24. Rouhier, N., S.D. Lemaire, and J.P. Jacquot. 2008. The role of glutathione in photosynthetic organisms: emerging functions for glutaredoxins and glutathionylation. Annu. Rev. Plant Biol. 59(1): 143-166.

25. Schopfer, C.R., M.E. Nasrallah, and J.B. Nasrallah. 1999. The male determinant of self-incompatibility in Brassica. Science 286(5445): 1697-1700.

26. Takayama, S., H. Shiba, M. Iwano, H. Shimosato, F. Che, N. Kai, M. Watanabe, G. Suzuki, K. Hinata, and A. Isogai. 2000. The pollen determinant of self-incompatibility in Brassica campestris. Proc. National Acad. Sci. (USA) 97(4): 1920-1925.

27. Taylor, S.C., K. Nadeau, M. Abbasi, C. Lachance, M. Nguyen, and J. Fenrich. 2019. The ultimate qPCR experiment: producing publication quality, reproducible data the first time. Trends Biotechnol. 37: 761-774.

28. Thoden, J.B., J. Kim, F.M. Raushel, and H.M. Holden. 2003. The catalytic mechanism of galactose mutarotase. Protein Sci. 12(5): 1051–1059. doi:10.1110/ps.0243203

29. Van der Does, D., F. Boutrot, T. Engelsdorf, J. Rhodes, J.F. McKenna, S. Vernhettes, et al. 2017. The Arabidopsis leucine-rich repeat receptor kinase MIK2/LRR-KISS connects cell wall integrity sensing, root growth and response to abiotic and biotic stresses. PLoS Genetics 13(6).

30. Vandesompele, J., K. De Preter, F. Pattyn, B. Poppe, N. Van Roy, A. De Paepe, and F. Speleman. 2002. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal reference genes. Genome Biol. 3(7). https://doi.org/10.1186/gb-2002-3-7-research0034

31. Žiarovská, J., N. Nikolaieva, K. Garkava, and J. Brindza. 2015. Validation of HMG CoA Reductase as internal control for hazelnut pollen allergens expression analysis. Austin J. Genet. Genomic Res. 2(1): 1008.


Table 3.1: Primer sequences for hazelnut S-locus and reference genes for qPCR gene expression analysis. Gene names and annotations are listed.

| Gene | Primer name | Forward primer | Reverse primer | Annotation |
|---|---|---|---|---|
| Reference genes | | | | |
| Corav.Jeff_09561 | Corav_GAPDH1 | GGTTCGGAAGGATTGGTCGT | CTCGCCGAAGAGAAGGGTTT | glyceraldehyde-3-phosphate dehydrogenase, cytosolic-like |
| Corav.Jeff_23240 | Corav_Histone_H3-1.3 | TCTTCGAGAAATCCGCAAGT | TGTCTTCGAACAACCAACCA | histone H3.3 |
| Corav.Jeff_06880 | Corav_HMG-CoA-Reductase2 | CTCCAAAGGCGTCCAGAACA | CCACATCGCCGCTTATGAGA | 3-hydroxy-3-methylglutaryl-coenzyme A reductase 1-like |
| Corav.Jeff_00644 | Corav_TUB2 | GGGTTCACCGTGTATCCCTC | GCGCCTGCAGATGTCATAGA | alpha tubulin 1 |
| Corav.Jeff_09797 | PP2A-3 | CTCGAAAGGTCGTACCCAAA | TTGCACTCCATCAGATGCTC | serine/threonine-protein phosphatase PP2A catalytic subunit |
| S-locus genes | | | | |
| Corav.Jeff_12866 | S66.1.1 | TTGCCGGCACTTACGGTTAT | TGGTAGGCGCCGGTCTAATA | probable leucine-rich repeat receptor-like protein kinase At1g35710 |
| Corav.Jeff_12867 | S67.1 | TGGAGGCGCACAGTTTACTT | ATGCGCTCTCACCTTCCAAA | aldose 1-epimerase-like |
| Corav.Jeff_12868 | S68.3.1 | AAAAGGCCAACAACGAACTG | GTGAGGTCACAGGGATTGCT | 3-dehydroquinate synthase homolog isoform X2 |
| Corav.Jeff_12869 | S69.0.1 | CATGGCATCCTCCGTTTTCG | TAACCACTCCACCATCCCGT | probable leucine-rich repeat receptor-like protein kinase At1g35710 |
| Corav.Jeff_12870 | S70.0.2 | TCGCCATTTTTCTTGGATTC | ATGTCAAAGTCCTCGGTTGC | probable leucine-rich repeat receptor-like protein kinase At1g35710 |
| Corav.Jeff_12871 | S71.0.2 | ACAAGCGCTCAGAAAGCACT | CCTTGATAACCCCGTCTGAA | probable receptor-like protein kinase At5g15080 |
| Corav.Jeff_12872 | S72.0.1 | ACCCATCAGCATTTTCCTTG | ATGTCAAAATCCTCGGTTGC | probable leucine-rich repeat receptor-like protein kinase At1g35710 |
| Corav.Jeff_12873 | S73.7.2 | TATGGTTATGCAGCCCCGAG | TGCTCCCCATTTGGTTGGTT | probable receptor-like protein kinase At5g15080 |
| Corav.Jeff_12874 | S74.2 | GTGTCGCAATGTGTGCTTCT | GACCACTCATTGGCGAGTTT | probable leucine-rich repeat receptor-like protein kinase At1g35710 |
| Corav.Jeff_12875 | S75.1 | TGTCTTCAAGGTCCAGCGTG | TGGTTGGTCTCGGCTTGAAT | probable receptor-like protein kinase At5g15080 |
| Corav.Jeff_12876 | S76.3 | TGAAGTCAGCTCGCTTCCTC | GGGCTACAAGCATGTCCCAA | homeobox-leucine zipper protein HAT7-like |
| Corav.Jeff_12877 | S77.1.2 | AAGGGCTAGGTGGAAGACCA | TTGCTCCATGAAAGCTCGGT | homeobox-leucine zipper protein HAT7-like |
| Corav.Jeff_12878 | S78.0.2 | CTACCATGTCGTCGCTCTCA | TTTTCGGGATTTTCTGTTCG | Glutaredoxin domain-containing protein |
| Corav.Jeff_12879 | S79.2.2 | TCATGGATGCCTGAAGATGA | AATGCATCTTTGCCCAGAAC | binding partner of ACD11 1 |

Note: Primer names include "S" for S-locus, followed by two digits corresponding to the last two digits of the gene name. The next digit indicates the exon-exon boundary that the amplicon spans (1 means the amplicon spans the first and second exons and 0 means the amplicon is contained in a single exon), and the final digit distinguishes between primer pairs spanning the same region. See appendix for additional primer pairs.


Table 3.2: Amplification efficiency values for qPCR primers designed from the hazelnut S-locus region.

| Gene | Efficiency |
|---|---|
| Reference genes | |
| Corav_GAPDH1 | 1.1 |
| Corav_Histone_H3-1.3 | 0.85 |
| Corav_HMG-CoA-Reductase2 | 1.01 |
| Corav_TUB1 | 0.85 |
| Corav_TUB2 | 0.91 |
| PP2A-3 | 0.99 |
| S-locus genes | |
| S66.1.1 | 0.96 |
| S67.1 | 0.96 |
| S68.3.1 | 0.93 |
| S69.0.2* | 0.94-1.32 |
| S70.0.2* | 0.95-1.41 |
| S71.0.2* | 0.70-2.23 |
| S72.0.1* | 0.85-2.1 |
| S73.7.2* | 1-1.27 |
| S74.2* | 0.86-1.48 |
| S75.1* | 0.78-0.91 |
| S77.1.2* | 1.21-1.9 |
| S78.0.2* | 0.95-1.44 |
| S79.2 | 1.06 |

Primer efficiency values show how well a primer pair amplifies its target, with an ideal value being 1.0. Lower than 1.0 efficiency suggests inhibitors or poor primer design. Higher than 1.5 efficiency suggests potential off-target binding.

*Genes expressed at low levels (>29 Cq average) had inconsistent efficiencies.

Genes with average Cq values < 29 have amplification efficiency in the desired range (0.85 - 1.1).


Table 3.3: Reference gene expression stability in pollinated hazelnut flowers as predicted by three programs. The three reference genes selected are marked with a * and were chosen based on low NormFinder and geNorm values.

| Gene name | NormFinder stability value | geNorm M value | BestKeeper coefficient of correlation |
|---|---|---|---|
| HMGCR2* | 0.015 | 0.053 | 0.93 |
| PP2A-3* | 0.029 | 0.062 | 0.87 |
| HisH3 1.3* | 0.035 | 0.065 | 0.78 |
| GAPDH1 | 0.041 | 0.072 | 0.82 |
| TUB2 | 0.050 | 0.080 | 0.92 |

NormFinder: lower values indicate greater stability.

geNorm: M < 1.5 is considered acceptable for use as a reference gene.

BestKeeper: The average of Pearson product moment correlation coefficients with the other four reference genes. Closest to 1 indicates suitability for use as a reference.


Table 3.4: Expression differences at hazelnut S-locus genes over time course intervals (0, 1, 2, 3, 5, and 7 hours) following two compatible and one incompatible pollinations.

| Gene | Annotation | Difference over time in pollinated flowers | Gene expression: unpollinated stigma | Gene expression: dry pollen |
|---|---|---|---|---|
| Corav.Jeff_12862 | probable WRKY transcription factor | not expressed | | |
| Corav.Jeff_12863 | translation factor GUF1 homolog | not expressed | | |
| Corav.Jeff_12864 | translation factor GUF1 homolog | not expressed | | |
| Corav.Jeff_12865 | protein IQ-DOMAIN 31-like | not expressed | | |
| Corav.Jeff_12866 | probable LRR-RLK At1g35710 | decreased | | |
| Corav.Jeff_12867 | aldose 1-epimerase-like | stable* | | |
| Corav.Jeff_12868 | 3-dehydroquinate synthase homolog | stable | | |
| Corav.Jeff_12869 | probable LRR-RLK At1g35710 | increased* | | |
| Corav.Jeff_12870 | probable LRR-RLK At1g35710 | increased* | | |
| Corav.Jeff_12871 | probable RLK At5g15080 | decreased | | |
| Corav.Jeff_12872 | probable LRR-RLK At1g35710 | increased* | | |
| Corav.Jeff_12873 | probable RLK At5g15080 | decreased | | |
| Corav.Jeff_12874 | probable LRR-RLK At1g35710 | stable | | |
| Corav.Jeff_12875 | probable RLK At5g15080 | decreased | | |
| Corav.Jeff_12876 | homeobox-leucine zipper HAT7-like | not expressed | | |
| Corav.Jeff_12877 | homeobox-leucine zipper HAT7-like | stable | | |
| Corav.Jeff_12878 | Glutaredoxin domain-containing | increased* | | |
| Corav.Jeff_12879 | binding partner of ACD11 1 | decreased | | |

Genes were either not expressed or showed stable, increased, or decreased expression. Differences in expression between types of pollen applied (‘Jefferson’, ‘Felix’, or OSU 1253.064) are signified with a *.


Figure 3.1: CaJ_12867 showed large differences in relative expression among the three pollens (OSU 1253.064, ‘Felix’ and ‘Jefferson’) but no significant interaction effect. Error bars signify standard error of the mean.


Figure 3.2: Relative expression of five genes showing interactions between the three pollens (‘Jefferson’, ‘Felix’ and OSU 1253.064) over time and all significant differences determined by pairwise t-tests (p<0.05). Different letters indicate a significant difference.


Chapter 4: Long-range amplification of sequences in the S-locus region of European Hazelnut

Abstract:

Long amplicon sequencing is an option for targeted resequencing of the S-locus in hazelnut. To test feasibility of the technique, long-range PCR was used to generate amplicons 8-12 kb in length for targeted resequencing of the S-locus region. ThermoAlign software was used for primer design and high molecular weight DNA was extracted for use in amplicon generation. For primer pairs designed using the ‘Jefferson’ reference genome, 29 out of 43 gave bands on a gel when amplified from ‘Jefferson’ DNA. The 43 primer pairs were tested in the DNA of the parents of ‘Jefferson’ (OSU 252.146 and OSU 414.062) and two unrelated genotypes, OSU 1477.047 and OSU 1026.073. Six fragments amplified from OSU 252.146 DNA, three from OSU 414.062 DNA, eight from OSU 1477.047 DNA, and two from OSU 1026.073 DNA. These results, obtained from the first attempt at primer development, demonstrate that long-range PCR could be a viable option to produce amplicons for targeted resequencing in ‘Jefferson’ hazelnut.


Introduction:

Self-incompatibility in European hazelnut is determined at a single S-locus on linkage group 5 (LG5) with different alleles (haplotypes) controlling compatibility (Mehlenbacher 2006). So far, 33 alleles and their dominance hierarchy have been identified (Figure 1.2), with one allele (S28) conferring self-compatibility in some combinations (Mehlenbacher and Smith 2006, Mehlenbacher 2014). A tester tree whose pollen expresses only one allele has been identified for each allele other than S28. Alleles are codominant in the stigma and can be either codominant or dominant in the pollen depending on a dominance hierarchy (Figure 1.2).

Compatibility or incompatibility can be determined about 16 hours after pollination, if female flowers are collected from emasculated branches that have been bagged to exclude environmental pollen. Fluorescence microscopy is used to observe pollen tubes. Low pollen germination or bunched, short, or bulbous pollen tubes are typical of an incompatible pollen-stigma interaction. Good pollen germination with a mass of long parallel pollen tubes growing down the style are indicators of a compatible combination (Mehlenbacher, 1997).

Fine mapping of the S-locus has refined the region on LG5 containing the genes responsible for self-incompatibility to a 197.5 kilobase (kb) span in the ‘Jefferson’ reference genome (version 3). The sequence was determined to be the S1 allele, and an adjacent sequence was determined to be the S3 allele (Chapter 2 of this thesis). The diploid mosaic assembly of the ‘Jefferson’ reference (V3) placed the two alleles at the S-locus side by side, allowing comparison. The structure of the S-locus and the sequences of the two alleles of ‘Jefferson’ should be confirmed by a method complementary to the whole-genome sequencing that produced the reference.

Traditionally, PCR has been used to amplify fragments up to 1.2 kb in length for a variety of applications. However, for targeted resequencing of genomic regions spanning over 100 kb, traditional PCR is of limited usefulness. Long-range PCR (LRPCR) allows for amplification of fragments 5-15 kb in length using alternate polymerases and cycling conditions. Polymerases for use in LRPCR include Invitrogen SequalPrep, Invitrogen AccuPrime, TaKaRa PrimeSTAR GXL, TaKaRa LA Taq Hot Start, KAPA Long Range HotStart and QIAGEN Long Range PCR Polymerase. Jioa et al. (2015) compared these six polymerases for their ability to amplify fragment sizes of 12.9 kb, 9.7 kb, and 5.8 kb. TaKaRa PrimeSTAR was the only polymerase able to amplify all fragments at the cycling conditions recommended by the manufacturer (Jioa et al., 2015).

Although LRPCR can produce much longer amplicons than traditional PCR, a single amplicon of 197.5 kb to span the hazelnut S-locus is not possible. Instead, a tiling path of amplicons could be built, amplified and sequenced in a stepwise manner to cover the large region. Francis et al. (2017) created ThermoAlign software and demonstrated its use in designing primers and developing a minimum amplicon tiling path of nine 4-5 kb amplicons across a 24 kb target in the highly repetitive maize genome. In this research, the S-locus region in ‘Jefferson’ hazelnut was targeted for resequencing using genomic DNA of ‘Jefferson’, ThermoAlign software for primer design, and TaKaRa PrimeSTAR GXL polymerase for fragment amplification. Primers were designed for both the S1 and S3 alleles and amplification was attempted using genomic DNA from ‘Jefferson’, its parents OSU 252.146 and OSU 414.062, and two unrelated selections, OSU 1026.073 and OSU 1477.047.


Methods:

Plant materials. ‘Jefferson’ (S1, S3), its parents OSU 252.146 (S3, S8) and OSU 414.062 (S1, S1), and two unrelated selections, OSU 1026.073 (S2, S10) and OSU 1477.047 (S2, S20) were used in this study.

Primer design. ThermoAlign software was used to design primers for the S1 and S3 alleles at the S-locus in the ‘Jefferson’ reference genome (V3), identified in Chapter 2 of this thesis. The target regions were covered in 50 kb segments using iterative runs of ThermoAlign, with five runs needed to develop primers for the S1 allele and four runs needed to develop primers for the S3 allele. Harsh filter settings were applied to reduce the number of oligonucleotides detected by the program to fewer than 25,000, keeping the computational load low enough for each run to finish within a few hours. The parameters Filter_A/T_3prime, Filter_di/si_repeats, and Filter_GC_clamp were used. The amplification range was set at 8 to 13 kb, oligo primer length was set at 27-28 base pairs, and target Tm was set at 69-70 °C.
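The segmentation of a target region into 50 kb pieces for successive ThermoAlign runs can be illustrated with a minimal sketch. The function name and the coordinates below are hypothetical: the coordinates are taken loosely from the display range given for the S3 locus in Appendix F, and the segment boundaries actually used in this work are not stated.

```python
def split_into_segments(start, end, segment_size=50_000):
    """Return half-open (segment_start, segment_end) windows covering [start, end)."""
    if end <= start or segment_size <= 0:
        raise ValueError("need end > start and a positive segment size")
    return [(s, min(s + segment_size, end)) for s in range(start, end, segment_size)]


# Hypothetical 160 kb target on Scaffold 4 (illustrative coordinates only).
segments = split_into_segments(6_760_000, 6_920_000)
for seg_start, seg_end in segments:
    print(f"run on {seg_start}-{seg_end} ({seg_end - seg_start} bp)")
```

Each window would be supplied to one primer design run; the final window is simply shorter when the region is not a multiple of the segment size.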

Contigs from the genome sequences of OSU 1477.047 and OSU 1026.073 that contain the S-locus were identified by alignment with 12 SSR marker sequences in the S-locus region (KG847, CB018.12, CB018.01, CB018.04, RH_SLOC02, RH_SLOC03, RH_SLOC04, RH_SLOC05, RH_SLOC06, RH_SLOC07, RH_SLOC09, and RH_SLOC010). Contigs 4 and 230 were identified from OSU 1477.047 and contigs 236, 115, and 269 were identified from OSU 1026.073. Contig 115 contained sequences from RH_SLOC09 and RH_SLOC010 but not RH_SLOC07 because the contig ends part-way through the S-locus region. All regions were aligned using Mauve (Darling et al. 2004) to visualize similarities in sequences. Primers were developed from the newly identified contigs to confirm assembly of the new S-alleles.


DNA extraction. Two DNA extraction methods were used on young hazelnut leaves. The first was that of Lunde et al. (2000). The second was a high molecular weight DNA extraction protocol adapted from Xin and Chen (2012) that used CTAB extraction and MagAttract magnetic beads for cleanup. In spring 2018, young leaves were collected from ‘Jefferson’, OSU 252.146, OSU 414.062, OSU 1026.073 and OSU 1477.047, and DNA was extracted as described by Lunde et al. (2000) and stored at -20 °C. For the Xin and Chen (2012) method, leaves were collected in spring 2017 (‘Jefferson’) and 2018 (all other samples) and stored at -80 °C until January 2020, when DNA was extracted in triplicate from all samples. DNA concentration and quality were estimated by ultraviolet spectrophotometry using a BioTek Synergy 2 microplate reader with a Take3 micro-volume plate, and data were analyzed with Gen5 software (BioTek Instruments, Winooski, VT). Samples were diluted to 20 ng/µl for use in LRPCR.
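The dilution to a working concentration follows the usual relation C1 x V1 = C2 x V2. The sketch below is illustrative only: the function name and the 50 µl final volume are assumptions, and the stock concentration is the value listed for Jefferson-3 in Table 4.2.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=20.0, final_volume_ul=50.0):
    """Return (stock_volume_ul, diluent_volume_ul) from C1 * V1 = C2 * V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    stock_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume, final_volume_ul - stock_volume


# Jefferson-3 stock at 807.4 ng/µl (Table 4.2); 50 µl final volume is an assumption.
stock_ul, diluent_ul = dilution_volumes(807.4)
print(f"{stock_ul:.2f} µl stock + {diluent_ul:.2f} µl diluent")
```

At stock concentrations this high the pipetted volume is small, so an intermediate dilution may be more practical in the laboratory.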

Amplification of fragments. Reaction components were used at concentrations recommended by TaKaRa to amplify fragments in 25 µl reactions: 5 µl 5X PrimeSTAR GXL buffer, 2 µl dNTP mixture (2.5 mM), 0.5 µl of each primer (0.25 µM), 20 ng template DNA, 0.5 µl PrimeSTAR GXL DNA polymerase, and sterile distilled water up to a final volume of 25 µl. All primer pairs were used to amplify DNA of all five hazelnut selections. The thermocycler settings were 98 °C for 10 seconds and 68 °C for 10 min, repeated for 30 cycles. When PCRs to amplify ‘Jefferson’ DNA did not produce a product, the DNA concentration was lowered to 10 ng/µl and the reaction was repeated.
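The reaction recipe above can be scaled to a master mix with simple arithmetic. The following sketch makes two assumptions that are not stated in the text: that the 20 ng of template was added as 1 µl of the 20 ng/µl dilution, and that a 10% overage is prepared. The function names are hypothetical, and the run-time estimate counts hold times only, ignoring ramping.

```python
TOTAL_VOLUME_UL = 25.0
TEMPLATE_UL = 1.0  # assumption: 20 ng added as 1 µl of a 20 ng/µl dilution

# Per-reaction volumes (µl) as listed in the Methods.
PER_REACTION_UL = {
    "5X PrimeSTAR GXL buffer": 5.0,
    "dNTP mixture (2.5 mM)": 2.0,
    "forward primer": 0.5,
    "reverse primer": 0.5,
    "PrimeSTAR GXL DNA polymerase": 0.5,
}


def master_mix(n_reactions, overage=0.10):
    """Volumes (µl) of a template-free master mix for one primer pair."""
    water = TOTAL_VOLUME_UL - TEMPLATE_UL - sum(PER_REACTION_UL.values())
    scale = n_reactions * (1.0 + overage)
    mix = {name: volume * scale for name, volume in PER_REACTION_UL.items()}
    mix["sterile distilled water"] = water * scale
    return mix


def cycling_minutes(cycles=30, denature_s=10, anneal_extend_s=600):
    """Total hold time of the two-step program, in minutes."""
    return cycles * (denature_s + anneal_extend_s) / 60


for name, volume in master_mix(n_reactions=5).items():
    print(f"{name}: {volume:.2f} µl")
print(f"hold time: {cycling_minutes():.0f} min")
```

With these assumptions each reaction receives 24 µl of master mix plus 1 µl of template, and the 30-cycle program holds for 305 minutes in total.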

Gel electrophoresis. PCR products were separated by electrophoresis on 1% agarose gels in TBE buffer at 90 V for 3 hours, with a Quick-Load 1 kb Extend DNA Ladder (New England BioLabs, Ipswich, MA) in some lanes. The gels were stained with ethidium bromide for 20 min, destained for 10 min and photographed under UV light using a BioDoc-It® Imaging System (UVP, Upland, CA). Photographs were inspected and PCR product sizes estimated by comparing to size bands in the ladder.
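Band sizes in this study were estimated visually against the ladder. For readers who prefer a numerical estimate, a common approximation is that the logarithm of fragment size varies roughly linearly with migration distance between neighbouring ladder bands. The sketch below interpolates on that basis; the function name, the band sizes chosen, and all migration distances are hypothetical and are not measurements from these gels.

```python
import math


def estimate_size_bp(distance_mm, ladder):
    """Interpolate fragment size from migration distance.

    ladder: list of (size_bp, distance_mm) pairs for the ladder bands.
    Interpolation is linear in log10(size) between the two bracketing bands.
    """
    bands = sorted(ladder, key=lambda band: band[1])  # order by distance migrated
    for (size_a, dist_a), (size_b, dist_b) in zip(bands, bands[1:]):
        if dist_a <= distance_mm <= dist_b:
            fraction = (distance_mm - dist_a) / (dist_b - dist_a)
            log_size = math.log10(size_a) + fraction * (math.log10(size_b) - math.log10(size_a))
            return 10 ** log_size
    raise ValueError("band lies outside the range of the ladder")


# Hypothetical ladder bands: (size in bp, distance migrated in mm).
example_ladder = [(15000, 16.0), (10000, 19.0), (8000, 21.0), (6000, 24.0)]
print(round(estimate_size_bp(17.5, example_ladder)))
```

Resolution of fragments above about 10 kb on a 1% gel is limited, so estimates in that range are coarse regardless of the method used.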

Results:

Mauve alignment of S-locus associated contigs showed that OSU 1477.047 contig 230 and OSU 1026.073 contig 236 were more similar to each other than to the other sequences (Figure 4.1). Since OSU 1477.047 (S2, S20) and OSU 1026.073 (S2, S10) have the common allele S2, we considered these contigs to most likely be the S2 allele. Similarly, contigs 269 and 4 were considered to represent alleles S10 and S20, respectively.
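The reasoning used to assign alleles to contigs can be expressed as a set operation on the known S-genotypes: the allele shared by both selections is the candidate for the pair of similar contigs, and the remaining allele of each selection is the candidate for its other contig. This is a minimal sketch of that logic only; the variable names are hypothetical.

```python
# S-genotypes as given in the Plant materials section.
genotypes = {
    "OSU 1477.047": {"S2", "S20"},
    "OSU 1026.073": {"S2", "S10"},
}

# Allele expected on the two mutually similar contigs (230 and 236).
shared = genotypes["OSU 1477.047"] & genotypes["OSU 1026.073"]

# Allele expected on the remaining S-locus contig of each selection.
remaining = {name: alleles - shared for name, alleles in genotypes.items()}

print(sorted(shared))
print({name: sorted(alleles) for name, alleles in remaining.items()})
```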

A total of 43 primer pairs were designed. Of these, 17 were for the ‘Jefferson’ S1 locus, 16 for the ‘Jefferson’ S3 locus, two primer pairs for each of two PacBio contigs from OSU 1477.047, and two primer pairs for each of three PacBio contigs from OSU 1026.073. ThermoAlign designed primers from 50 kb portions of the target regions, with each run taking between 15 minutes and 3.5 hours. The predicted fragment sizes ranged from 8,056 to 12,978 bp (Table 4.1).
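The primer names listed in Table 4.1 (for example TA_4_4534442_28_F) appear to encode a sequence index, a coordinate, the primer length, and the strand. This reading is an inference from the table rather than a documented convention, but under it the difference between the coordinates of the reverse and forward primers reproduces the expected amplicon sizes in the table. A minimal sketch, with hypothetical function names:

```python
from typing import NamedTuple


class Primer(NamedTuple):
    sequence_id: str  # scaffold or contig index, e.g. "4"
    position: int     # coordinate encoded in the name
    length: int       # primer length in bases
    strand: str       # "F" or "R"


def parse_primer_name(name: str) -> Primer:
    """Parse names of the assumed form TA_<sequence>_<position>_<length>_<F|R>."""
    parts = name.split("_")
    if len(parts) != 5 or parts[0] != "TA" or parts[4] not in ("F", "R"):
        raise ValueError(f"unexpected primer name: {name!r}")
    return Primer(parts[1], int(parts[2]), int(parts[3]), parts[4])


def expected_amplicon_size(forward: str, reverse: str) -> int:
    """Difference between the reverse and forward primer coordinates."""
    f, r = parse_primer_name(forward), parse_primer_name(reverse)
    if (f.strand, r.strand) != ("F", "R") or f.sequence_id != r.sequence_id:
        raise ValueError("need a forward and a reverse primer on the same sequence")
    return r.position - f.position


print(expected_amplicon_size("TA_4_4534442_28_F", "TA_4_4547067_28_R"))  # 12625
```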

Use of template DNA extracted using the method of Lunde et al. (2000) resulted in no amplicons. However, amplicons were obtained when high molecular weight template DNA extracted using the method of Xin and Chen (2012) was used. The 260/230 ratios in the high molecular weight DNA samples were much lower than expected and for each selection the sample with the highest ratio was chosen for use (Table 4.2). When DNA was separated by electrophoresis on a 1% agarose gel, it showed strong RNA presence in the samples, suggesting that the RNase was not effective.
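The selection rule described above, choosing for each selection the replicate with the highest 260/230 ratio, can be applied to the values in Table 4.2 as follows. The data are transcribed from that table; the function and variable names are hypothetical.

```python
# 260/230 ratios for replicates 1-3 of each selection (Table 4.2).
ratios_260_230 = {
    "Jefferson": [1.927, 1.523, 1.942],
    "252.146": [2.007, 1.95, 1.971],
    "414.062": [1.631, 1.79, 1.928],
    "1477.047": [1.828, 1.946, 1.743],
    "1026.073": [1.995, 2.248, 2.155],
}


def best_replicate(ratios):
    """Return the 1-based replicate number with the highest ratio."""
    return max(range(len(ratios)), key=ratios.__getitem__) + 1


chosen = {selection: best_replicate(values) for selection, values in ratios_260_230.items()}
for selection, replicate in chosen.items():
    print(f"{selection}-{replicate}")
```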


Initially, 27 of the 43 fragments were amplified from ‘Jefferson’ DNA. After reducing template concentration to 10 ng/reaction, two more fragments amplified. Fragments were amplified from DNA of the four other selections: six from OSU 252.146, three from OSU 414.062, eight from OSU 1477.047, and two from OSU 1026.073 (Table 4.1).
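These counts can be expressed as success rates out of the 43 primer pairs. The short calculation below uses only the numbers reported in this paragraph; the variable names are hypothetical.

```python
TOTAL_PRIMER_PAIRS = 43

# Number of primer pairs yielding a product, per DNA source.
amplified = {
    "Jefferson": 27 + 2,  # 27 initially, two more at the lower template concentration
    "OSU 252.146": 6,
    "OSU 414.062": 3,
    "OSU 1477.047": 8,
    "OSU 1026.073": 2,
}

rates = {name: count / TOTAL_PRIMER_PAIRS for name, count in amplified.items()}
for name, rate in rates.items():
    print(f"{name}: {amplified[name]}/{TOTAL_PRIMER_PAIRS} ({rate:.1%})")
```

For ‘Jefferson’ this gives 29 of 43 primer pairs, or about 67%.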

Two primer pairs from each of the newly identified S-locus-associated PacBio contigs were tested with DNA of ‘Jefferson’, OSU 252.146, OSU 414.062, OSU 1026.073 and OSU 1477.047. Both primer pairs from each of the S10 and S20 contigs amplified ‘Jefferson’ DNA, and one of the two S10 fragments amplified from OSU 252.146, OSU 414.062, and OSU 1477.047 DNA. One primer pair from each of the S2 contigs (230 and 236) and one from the S20 contig amplified OSU 1477.047 DNA. Nothing amplified from OSU 1026.073 DNA (Table 4.1).

Discussion:

Using LRPCR for targeted resequencing of the hazelnut S-locus requires an efficient method for the sequencing of amplicons. Illumina MiSeq has been used to sequence long amplicons (Jia et al. 2015, Deiner et al. 2017) but can result in fragmented assemblies. Bai et al. (2019) used both PacBio (RSII) and Illumina (HiSeq) sequencing to sequence LRPCR amplicons from viral DNA and reported that while PacBio reads easily spanned all repetitive regions sequenced and allowed for easy sequence assembly, the reads had more variable coverage than Illumina HiSeq reads. They recommended that the benefits of easy assembly gained from PacBio should be weighed against the uniform coverage and lower cost of Illumina HiSeq. The assembly of the highly repetitive and polymorphic S-locus will likely benefit from long-read sequencing of LRPCR amplicons. Heterozygosity at the S-locus will also complicate assembly of two alleles from short fragment sequencing, further supporting the use of long-read technology.

LRPCR and amplicon sequencing should be able to produce an accurate representation of the S-locus region independent of the reference genome in order to verify the assembly of the region. Potential reasons for failure of LRPCR to amplify some regions include secondary structure formation, inhibitors present in the template samples, and polymorphism or misassembly of the region from which the primers were designed. The developers of ThermoAlign addressed the issue of secondary structures by adding betaine to their reactions, which produced amplicons in cases where none were initially obtained. Use of betaine or DMSO may improve success of LRPCR in hazelnut if secondary structures and mis-priming are inhibiting amplification (Jensen et al. 2010). It is very likely that inhibitors are present in the template samples, and extracting DNA from fresh tissues with a higher proteinase K concentration and fresh RNase may purify the samples sufficiently for amplification to proceed.

Once potential issues stemming from inhibitors and secondary structure are addressed, primers could be redesigned for regions where amplicons are still not obtained from ‘Jefferson’ DNA.

Conclusion:

These initial results from long-range amplification show promise for utility in targeted resequencing. Further refinement of the technique will be pursued to achieve more consistent and reliable amplicon generation. While the current set of primers designed using ThermoAlign software shows promise for tiling the ‘Jefferson’ S-locus with long-range amplicons, there are issues with transferability of the primers to other genetic backgrounds. This is not surprising as the program is built to develop highly specific primers based on a reference genome. More work could be done to identify conserved regions in the S-locus where more universal primers could be designed, allowing for long-range amplification in diverse backgrounds.


References:

1. Bai, C.M., B. Morga, U. Rosani, J. Shi, C. Li, L.S. Xin, and C.M. Wang. 2019. Long-range PCR and high-throughput sequencing of Ostreid herpesvirus 1 indicate high genetic diversity and complex evolution process. Virology 526:81–90.

2. Darling, A.C., B. Mau, F.R. Blattner, and N.T. Perna. 2004. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14(7):1394–1403. doi:10.1101/gr.2289704

3. Deiner, K., M.A. Renshaw, Y. Li, B.P. Olds, D.M. Lodge, and M.E. Pfrender. 2017. Long-range PCR allows sequencing of mitochondrial genomes from environmental DNA. Methods Ecol. Evol. 8:1888–1898.

4. Jensen, M.A., M. Fukushima, and R.W. Davis. 2010. DMSO and betaine greatly improve amplification of GC-rich constructs in de novo synthesis. PLOS ONE 5(6):e11024. https://doi.org/10.1371/journal.pone.0011024

5. Jia, H., Y. Guo, W. Zhao, and K. Wang. 2015. Long-range PCR in next-generation sequencing: comparison of six enzymes and evaluation on the MiSeq sequencer. Sci. Rep. 4:5737.

6. Lunde, C.F., S.A. Mehlenbacher, and D.C. Smith. 2000. Survey of hazelnut cultivars for response to eastern filbert blight inoculation. HortScience 35:729–731.

7. Mehlenbacher, S.A. 2014. Geographic distribution of incompatibility alleles in cultivars and selections of European hazelnut. J. Amer. Soc. Hort. Sci. 139:191–212.

8. Mehlenbacher, S.A. 1997. Testing compatibility of hazelnut crosses using fluorescence microscopy. Acta Hort. 445:167–171.

9. Mehlenbacher, S.A. and D.C. Smith. 2006. Self-compatible seedlings of the cutleaf hazelnut. HortScience 41:482–483.

10. Xin, Z. and J. Chen. 2012. A high throughput DNA extraction method with high yield and quality. Plant Methods 8(1):26. doi:10.1186/1746-4811-8-26


Table 4.1: Primers developed using ThermoAlign software tile the hazelnut S-locus region with long range PCR amplicons. Successful PCR is indicated by the estimated fragment size obtained in the column corresponding to which DNA was used. "Expected" is the expected amplicon size; the five columns to its right give the DNA used (with estimated band sizes).

Reference             F primer            R primer            Expected  ‘Jefferson’     OSU 252.146  OSU 414.062  OSU 1477.047  OSU 1026.073

‘Jefferson’ reference
S1                    TA_4_4534442_28_F   TA_4_4547067_28_R   12625     14000           NA           NA           NA            NA
S1                    TA_4_4547038_28_F   TA_4_4559334_28_R   12296     14000           NA           NA           NA            NA
S1                    TA_4_4559306_26_F   TA_4_4569489_27_R   10183     13000           NA           NA           NA            NA
S1                    TA_4_4563053_28_F   TA_4_4571215_28_R   8162      12000           NA           NA           NA            NA
S1                    TA_4_4571212_27_F   TA_4_4584065_27_R   12853     NA              NA           NA           NA            NA
S1                    TA_4_4584039_27_F   TA_4_4595111_28_R   11072     12000           NA           NA           NA            NA
S1                    TA_4_4595085_27_F   TA_4_4607607_27_R   12522     NA              NA           NA           NA            NA
S1                    TA_4_4600772_28_F   TA_4_4609250_27_R   8478      NA              NA           NA           NA            NA
S1                    TA_4_4608166_27_F   TA_4_4619778_27_R   11612     12000           NA           NA           NA            NA
S1                    TA_4_4616015_27_F   TA_4_4624189_28_R   8174      8000            NA           NA           NA            NA
S1                    TA_4_4624169_27_F   TA_4_4634848_28_R   10679     NA              NA           NA           NA            NA
S1                    TA_4_4634801_27_F   TA_4_4646608_27_R   11807     NA              NA           NA           NA            NA
S1                    TA_4_4641551_27_F   TA_4_4650517_28_R   8966      10000           NA           NA           NA            NA
S1                    TA_4_4650491_28_F   TA_4_4662099_27_R   11608     12000           NA           10000        NA            NA
S1                    TA_4_4662046_28_F   TA_4_4673916_28_R   11870     12000           NA           NA           NA            6000
S1                    TA_4_4673890_28_F   TA_4_4684218_27_R   10328     11000           NA           NA           NA            NA
S1                    TA_4_4697014_27_F   TA_4_4707166_28_R   10152     12000           NA           9000         NA            NA
S3                    TA_4_6752334_28_F   TA_4_6762959_28_R   10625     12000           NA           NA           15000         NA
S3                    TA_4_6772993_28_F   TA_4_6782112_27_R   9119      10000           NA           NA           NA            NA
S3                    TA_4_6782091_28_F   TA_4_6794881_27_R   12790     14000           NA           NA           NA            NA
S3                    TA_4_6793567_28_F   TA_4_6802350_27_R   8783      10000           NA           NA           NA            NA
S3                    TA_4_6800846_27_F   TA_4_6813817_28_R   12971     14000           NA           NA           NA            NA
S3                    TA_4_6813786_28_F   TA_4_6826764_27_R   12978     NA              NA           NA           NA            NA
S3                    TA_4_6826738_27_F   TA_4_6839706_28_R   12968     NA              NA           NA           NA            NA
S3                    TA_4_6839613_27_F   TA_4_6848763_28_R   9150      10000           NA           NA           NA            NA
S3                    TA_4_6848733_27_F   TA_4_6861118_27_R   12385     14000           14000        NA           NA            NA
S3                    TA_4_6853096_28_F   TA_4_6864618_28_R   11522     13000 and 8000  8000         NA           NA            13000
S3                    TA_4_6875364_28_F   TA_4_6884329_28_R   8965      10000           10000        NA           10000         NA
S3                    TA_4_6875364_28_F   TA_4_6884329_28_R   8965      10000           10000        NA           10000         NA
S3                    TA_4_6875364_28_F   TA_4_6884329_28_R   8965      10000           10000        NA           10000         NA
S3                    TA_4_6888056_27_F   TA_4_6899139_28_R   11083     12000           NA           NA           NA            NA
S3                    TA_4_6902837_27_F   TA_4_6913907_27_R   11070     12000           NA           NA           NA            NA
S3                    TA_4_6908695_28_F   TA_4_6916805_27_R   8110      NA              NA           NA           NA            NA

OSU 1477.047
S20 (Contig 4)        TA_12_2594262_29_F  TA_12_2604487_29_R  10225     12000           NA           NA           NA            NA
S20 (Contig 4)        TA_12_2596057_28_F  TA_12_2607251_27_R  11194     12000           NA           NA           15000         NA
S2 (Contig 230)       TA_14_110548_28_F   TA_14_121971_28_R   11423     NA              NA           NA           15000         NA
S2 (Contig 230)       TA_14_121952_28_F   TA_14_133688_28_R   11736     NA              NA           NA           NA            NA

OSU 1026.073
Unknown (Contig 115)  TA_17_208_28_F      TA_17_8264_29_R     8056      NA              NA           NA           NA            NA
Unknown (Contig 115)  TA_17_8238_29_F     TA_17_17362_29_R    9124      NA              NA           NA           NA            NA
S2 (Contig 236)       TA_18_217617_26_F   TA_18_225997_27_R   8380      NA              NA           NA           NA            NA
S2 (Contig 236)       TA_18_225985_26_F   TA_18_236037_29_R   10052     NA              NA           NA           10000         NA
S10 (Contig 269)      TA_19_181403_29_F   TA_19_189654_28_R   8251      9000            10000        10000        10000         NA
S10 (Contig 269)      TA_19_189638_27_F   TA_19_201330_29_R   11692     12000           NA           NA           NA            NA


Table 4.2: Ultraviolet spectrophotometry was used to estimate concentration and quality of high molecular weight DNA extracted from frozen hazelnut leaves.

Sample        260/280  260/230  ng/µL
Jefferson-1   2.159    1.927    903.1
Jefferson-2   2.151    1.523    546.9
Jefferson-3   2.125    1.942    807.4
252.146-1     2.138    2.007    758
252.146-2     2.125    1.95     1129.6
252.146-3     2.147    1.971    981
414.062-1     2.144    1.631    365.1
414.062-2     2.106    1.79     422.2
414.062-3     2.128    1.928    789.8
1477.047-1    2.138    1.828    1166.7
1477.047-2    2.138    1.946    474.2
1477.047-3    2.197    1.743    337
1026.073-1    2.107    1.995    1780
1026.073-2    2.223    2.248    2588
1026.073-3    2.169    2.155    2705.1

Bolded samples were used in long-range PCR reactions.


Figure 4.1: Mauve alignment of S-locus regions from the ‘Jefferson’ hazelnut reference genome with S-locus associated contigs from OSU1477.047 and OSU1026.073.


Appendices


Appendix A: PacBio contigs in the V2 genome sequence of 'Jefferson' hazelnut used in simple sequence repeat marker development, associated contigs from High Information Content Fingerprinting (HICF) of bacterial artificial chromosomes (BACs) and the BACs contained in the Contigs. BAC end sequences from these HICF contigs were used to identify S-locus associated PacBio contigs.

‘Jefferson’ V2 contig  HICF contig   Minimal tiling path BACs
Contig 6               Contig 1741   CA703104G17 CA703029K22 CA703047L07 CA703051N21
Contig 7               Contig 1865   CA703008M02 CA703069E22 CA703045C16 CA703029H02 CA703058J08 CA703058M14
Contig 11              Contig 1422   CA703023D14 CA703079E09 CA703083I13 CA703086E22 CA703018B02 CA703091F05 CA703041K15 CA703036C10 CA703032P07 CA703037N24 CA703050L13 CA703045F14 CA703051P09
Contig 16              Contig 1807   CA703082K12 CA703103N14 CA703094J01 CA703035F03
Contig 16              Contig 359    CA703066D04 CA703054E19 CA703071N03 CA703007E03 CA703023H24 CA703083C02 CA703100B11 CA703101I11 CA703002G02 CA703031D09 CA703020D11 CA703039F07 CA703045N14 CA703016L09 CA703095J08 CA703085I08 CA703095O18 CA703071C14
Contig 16              Contig 1865   CA703044L03 CA703092F03 CA703065G07 CA703048M24 CA703098J17 CA703026O04
Contig 18              Contig 272    CA703058G01 CA703099C01 CA703016C14 CA703023L09 CA703001B07
Contig 18              Contig 1807   CA703082K12 CA703103N14 CA703094J01 CA703035F03
Contig 18              Contig 43     CA703101B03
Contig 30              Contig 1082   CA703049P07 CA703030A05
Contig 39              Contig 300    CA703026P16 CA703022H01 CA703083P21 CA703082M15 CA703086A17 CA703086L12 CA703030A12 CA703011C06
Contig 41              Contig 1259   CA703039A14 CA703029G18
Contig 63              Contig 1207   CA703100G18
Contig 87              Contig 1197   CA703026G12 CA703078F08 CA703045H09 CA703066H18
Contig 88              Contig 1768   CA703026P07 CA703054E11 CA703004J24 CA703038C14 CA703013H24 CA703062J09 CA703091M20 CA703081H04 CA703083O03 CA703013O20 CA703003H20 CA703079F22 CA703004I16
Contig 97              Contig 300    CA703026P16 CA703022H01 CA703061L10
Contig 120             Contig 1768   CA703028A21 CA703077G04 CA703013B01 CA703026H19
Contig 123             Contig 672    CA703083C17 CA703046E21 CA703027H06
Contig 332             Contig 1207   CA703053C06
Contig 337             Contig 1741   CA703095F01 CA703063D01
Contig 479             Contig 672    CA703047P22 CA703097G14
Contig 483             Contig 672    CA703097G14


Appendix B: SNPs targeted for development as HRM markers in the hazelnut S-locus region. Marker names give the last three numbers of the gene name, with 846-HRM1 being an HRM marker for a SNP in or near the gene model for Corav.Jeff_12846. The number at the end of each name distinguishes between multiple markers associated with the same gene model. Positions are on Scaffold 4.

Name (abv)  Position  SNP type  Segregation  Tm  Result            Forward Primer               Reverse Primer
846-HRM1    4416848   AG        lmxll        60  Mapped            CTGTGTTGGTGAAGGGGTTC         TGAAGCCATTCTTGTTGTGC
854-HRM1    4451774   AG        lmxll        60  No Segregation    GGATGGCCAGTTTGAATTTG         CACCACTCGTTAGTCACATGC
854-HRM2    4451882   CA        lmxll        60  Mapped            GCCTCATTGCGTCTTTTATG         ATATCCAATTTCGCCGCACT
854-HRM3    4446825   TC        lmxll        60  Good, redundant   TGGGGAATATTGTCGAGGTC         GGCGTTTCAATTGCGATTTC
859-HRM1    4521384   GT        lmxll        60  Poor segregation  GAACGACACGCAGACTGGTA         GGCTGCAACCAAAAAGAGAA
859-HRM2    4521384   GT        lmxll        60  Mapped            AGTCACCTATTGGGGTTGGA         GCTGAGAAATATTACCAAGTGCAG
859-HRM3    4521762   GA        lmxll        60  Good, redundant   AAAACTCAAACTTTTTGTGACCA      ACAGCAATGGCTTTCAGCTT
862-HRM1    4533767   GA        lmxll        60  Mapped            GCAAGGCATTTCCAGGATAG         TGCTCCAAGTATGTGCATGT
862-HRM2    4533767   GA        lmxll        60  No Segregation    GCAAGGCATTTCCAGGATAG         CAACCTCAATTTGCCTTTGG
864-HRM1    4542011   CT        lmxll        60  Mapped            TGAACCACTCTTCCCAGGAG         GATCCCGATCGTGTTAAAGC
864-HRM2    4542693   GA        lmxll        60  Good, redundant   TTTTGGCTGGGTTAGACAGG         AGTGGTTTCTTTGCATGGTG
868-HRM1    4629076   GA        lmxll        60  Mapped            TGGCTTTGAATTTGTTAGAGATGA     GGATTGATATGCCAAACCAT
868-HRM3    4630979   GA        nnxnp        60  Poor segregation  TGTGGGACAATTCCAGTGAA         GACTTGGGACATTTCCATGC
869-HRM1    4631318   GA        nnxnp        60  Poor segregation  TTGCCTCATATGGGACCTGT         TGCAAATTTTCCACCCTTATG
870-HRM1    4633192   CT        lmxll        60  Mapped            CTTCACTCGAGATGCAAGGTT        TTCATATGCAATATGTCCATCG
870-HRM2    4633684   CA        nnxnp        60  No Segregation    CATGCCTTATCTTACATGCATCA      GGAGGAATCAGGATCAAGGA
870-HRM3    4635862   GA        lmxll        60  Good, redundant   CAGCACCTAAACCAAATCCA         TTCTCGTTCCTTTGCATCAG
870-HRM4    4641887   TC        lmxll        60  No Segregation    TGGACCAATCCCTTCATCTC         AAATCTTTGGGGATGGAACC
872-HRM3    4649391   4 snps    nnxnp        58  No Segregation    TCATTAAAGTATCAGAGTCATGAACA   TTTTGAATCCAAAGTGTATGATTGA
872-HRM4    4651126   2 snps    nnxnp        58  No Segregation    TTCCTTCCCTAATTTAGTTTATCTTGA  AAGTGGAATACTTCCCTTAAGTCG
873-HRM4    4655860   CT        lmxll        58  No Segregation    GTGTCGCAATGTGTGCTTCT         CAAATCCACACCGTCTTCAA
873-HRM5    4658188   2 snps    lmxll        58  No Segregation    TGGCTTTGAATTTGTTAGAGATGA     GGATTGATATGCCAAACCAT
873-HRM7    4666149   GA        lmxll        58  No Segregation    TTCCTAGGTGACAGTTTAATGTCCT    GCAAGTCTTTCCTCCTGGAAT
873-HRM8    4667673   TC        lmxll        58  No Segregation    TAAGCCACATGGACCAATGA         TCCAGCTGTAGGCAATTTCA
877-HRM1    4690577   TC        lmxll        60  Poor segregation  GATTTCCAACCCCAAGTGAA         TGACCAAGTGCCTCAAGATG
877-HRM2    4690650   GA        lmxll        60  Mapped            CATCTTGAGGCACTTGGTCA         TTTCAAGAGCACCTGCAATG

Note: Segregation is shown by nnxnp for male SNPs inherited from the male parent or lmxll for female SNPs inherited from the female parent of the mapping population.


Appendix C: Allele frequencies in 50 hazelnut accessions at 39 SSR loci

[Figure panels, one bar chart per SSR locus: CB006.06, CB006.13, CB007.01, CB007.03, CB007.04, CB007.07, CB007.10, CB011.02, CB016.01, CB016.04, CB016.05, CB016.09, CB016.10, CB016.11, CB018.04, CB018.05, CB018.12, CB030.03, CB039.03, CB041.02, CB041.08, CB063.07, CB087.01, CB097.07, CB123.01, CB123.02, CB145.03, CB332.01, CB332.02, CB479.02, CB483.02, CB736.01, RH_SLOC02, RH_SLOC03, RH_SLOC05, RH_SLOC06, RH_SLOC07, RH_SLOC10.]

* Note (alleles marked 229*, 230*, and 231*): It is difficult to distinguish between 229, 230, and 231. They were marked 230 when scoring was uncertain but it is clear that all three sizes are present.

Appendix D: Primers designed for qPCR. Primer names in bold were used for qPCR.

Gene              Primer name                Forward primer         Reverse primer

Reference genes
Corav.Jeff_09561  Corav_GAPDH1               GGTTCGGAAGGATTGGTCGT   CTCGCCGAAGAGAAGGGTTT
Corav.Jeff_09561  Corav_GAPDH4               ACTGGAAGCACACTGACGTG   CGTTCACACCCACAACAAAC
Corav.Jeff_23240  Corav_Histone_H3-1.2       GCAAATCAGCCCCAACTACG   GAACAACCAACCAGGTACGC
Corav.Jeff_23240  Corav_Histone_H3-1.3       TCTTCGAGAAATCCGCAAGT   TGTCTTCGAACAACCAACCA
Corav.Jeff_06880  Corav_HMG-CoA-Reductase1   CCATGGTTTTCAACCGCTCC   ATCGAGGATGTTCTGGACGC
Corav.Jeff_06880  Corav_HMG-CoA-Reductase2   CTCCAAAGGCGTCCAGAACA   CCACATCGCCGCTTATGAGA
Corav.Jeff_06880  Corav_HMG-CoA-ReductaseX*  GTCCTCAAAACCAACGTGGC   ACATTCTGAGCGGGGTCTTG
Corav.Jeff_00644  Corav_TUB1                 TACCGGCAACTCTTTCACCC   CGTTCCAGTAGAAGGGAGCC
Corav.Jeff_00644  Corav_TUB2                 GGGTTCACCGTGTATCCCTC   GCGCCTGCAGATGTCATAGA
Corav.Jeff_09797  PP2A                       GCAAATCAGCCCCAACTACG   GAACAACCAACCAGGTACGC
Corav.Jeff_09797  PP2A-3                     CTCGAAAGGTCGTACCCAAA   TTGCACTCCATCAGATGCTC

S-locus genes
Corav.Jeff_12862  S62.1                      TTGCACAGTTGCACCAGAATG  GAAGTGGGTGGTTGTGGGTT
Corav.Jeff_12862  S62.2                      CCATGCCCACGAGCATACTA   GCGGAAGTGGTAGAAGCCAT
Corav.Jeff_12863  S63.3.1                    AGGTCATTGCGAGAGAGACG   CTCTTCCGACTAACGTCCCC
Corav.Jeff_12863  S63.3.5                    GTGGGGACGTTAGTCGGAAG   ACCAATGCGTTTCATTCGCTT
Corav.Jeff_12864  S64.0.1                    CACCGAGATAAGCGATGGGC   GCAACAGTTTGCGCTTGAAC
Corav.Jeff_12864  S64.0.2                    TCAGGGTGTTCAAGCGCAAA   ACGATCGGGATCAGCAGTAG
Corav.Jeff_12865  S65.3                      CCGTTAGATGGAGGTCGTGG   TTACAGGCAATGGTCGCCTC
Corav.Jeff_12865  S65.6.1                    GCAAGCACTTATTCGTGGGC   CGACCCTCGTAACATCCAGG
Corav.Jeff_12866  S66.1.1                    TTGCCGGCACTTACGGTTAT   TGGTAGGCGCCGGTCTAATA
Corav.Jeff_12866  S66.1.2                    GCCGGCACTTACGGTTATGT   AGATGATTTGGAGGTGGTAGGC
Corav.Jeff_12867  S67.1                      TGGAGGCGCACAGTTTACTT   ATGCGCTCTCACCTTCCAAA
Corav.Jeff_12867  S67.2                      TTTGGAAGGTGAGAGCGCAT   AACTGGCTGTTCCCAAGGAG
Corav.Jeff_12868  S68.2.1                    GTGCACTCAGAATGCTTGGA   CAGTTCGTTGTTGGCCTTTT
Corav.Jeff_12868  S68.3.1                    AAAAGGCCAACAACGAACTG   GTGAGGTCACAGGGATTGCT
Corav.Jeff_12869  S69.0.1                    CATGGCATCCTCCGTTTTCG   TAACCACTCCACCATCCCGT
Corav.Jeff_12869  S69.0.2                    GGCATCCTCCGTTTTCGTTTC  TCGTTGCTGTAACCACTCCA
Corav.Jeff_12870  S70.0.2                    TCGCCATTTTTCTTGGATTC   ATGTCAAAGTCCTCGGTTGC
Corav.Jeff_12870  S70.3.1                    AAGGGGTAACATGCTGGTTG   TCGATTGCTATTGAGGGACA
Corav.Jeff_12871  S71.0.1                    AAGCGCTCAGAAAGCACTTC   TTCCTTGATAACCCCGTCTG
Corav.Jeff_12871  S71.0.2                    ACAAGCGCTCAGAAAGCACT   CCTTGATAACCCCGTCTGAA
Corav.Jeff_12871  S71.1.1                    TACAAGCGCTCAGAAAGCAC   TTGAGGTTGAGCAGAGGCATT
Corav.Jeff_12872  S72.0.1                    ACCCATCAGCATTTTCCTTG   ATGTCAAAATCCTCGGTTGC
Corav.Jeff_12872  S72.1.1                    TTGCCCCAGAGTTGGCTTAT   GGAGGTGGTAGACGTTGGTC
Corav.Jeff_12873  S73.4                      AGAGGGTGGGTTTGGTTGTG   CTGCGACTGTAAGCCCTGTT
Corav.Jeff_12873  S73.7.2                    TATGGTTATGCAGCCCCGAG   TGCTCCCCATTTGGTTGGTT
Corav.Jeff_12874  S74.2                      GTGTCGCAATGTGTGCTTCT   GACCACTCATTGGCGAGTTT
Corav.Jeff_12874  S74.5.2                    GCATCCGACACCTTAATTGG   AACGCTTCCATAACCACCTG
Corav.Jeff_12875  S75.1                      TGTCTTCAAGGTCCAGCGTG   TGGTTGGTCTCGGCTTGAAT
Corav.Jeff_12875  S75.2                      TTCAAGCCGAGACCAACCAG   ACTGCAACTGTAAGCCCTGT
Corav.Jeff_12876  S76.1                      TGCAAGTTAAGGCCCACAAAA  CTCCAAAGACGAGGAAGCGA
Corav.Jeff_12876  S76.3                      TGAAGTCAGCTCGCTTCCTC   GGGCTACAAGCATGTCCCAA
Corav.Jeff_12877  S77.1.1                    ACAATGATGCTCTCCACGCC   GGGGAAGAGCTGCTTACTGC
Corav.Jeff_12877  S77.1.2                    AAGGGCTAGGTGGAAGACCA   TTGCTCCATGAAAGCTCGGT
Corav.Jeff_12878  S78.0.1                    CACGTCGATGAGAACGAGAA   TCAAGGCCTTCCATCAATTC
Corav.Jeff_12878  S78.0.2                    CTACCATGTCGTCGCTCTCA   TTTTCGGGATTTTCTGTTCG
Corav.Jeff_12879  S79.2                      CTCTGGGAACTGCTGTTTTGC  GTGGAGGTGCATTGTCCCATA
Corav.Jeff_12879  S79.2.2                    TCATGGATGCCTGAAGATGA   AATGCATCTTTGCCCAGAAC

* Corav_HMG-CoA-ReductaseX: Published by Žiarovská J., N. Nikolaieva, K. Garkava, and J. Brindza. 2015. Validation of HMG CoA Reductase as internal control for hazelnut pollen allergens expression analysis. Austin J Genet Genomic Res. 2(1): 1008.


Appendix E: Graphs for relative expression in 13 S-locus genes to the average of three reference genes after pollination with compatible (C) or incompatible (I) pollen. Error bars represent standard error of the mean.


Appendix F: S1 locus summary. BACs mapping to the region (in yellow) and the PacBio contig mapping to the region (in blue) are shown above a track showing molecular markers and their genomic positions in the S-locus. Below that are gene annotations (in orange) and long-range PCR amplicons positioned in reference to Scaffold 4 of the ‘Jefferson’ V3 reference, from 4.52 Mbp to 4.72 Mbp. Long-range amplicons are colored black if they successfully amplified and grey if they did not.


Appendix F: S3 locus summary. BACs mapping to the region (in yellow) and the PacBio contig mapping to the region (in blue) are shown above a track showing molecular markers and their genomic positions in the S-locus. Below that are gene annotations (in orange) and long-range PCR amplicons positioned in reference to Scaffold 4 of the ‘Jefferson’ V3 reference, from 6.76 Mbp to 6.92 Mbp. Note: CaJ_13141 may contain multiple genes; there is evidence of features of LRR-RLK genes and of the glutaredoxin-domain containing gene predicted in the S1 locus. However, no stop codons exist in the open reading frame.