DNA Fingerprinting update - 2019 campaign and beyond

There has been huge progress across the UK. It’s been a wonderful initiative by Peter Laws with East Malling Research and the National Fruit Collection. Thank you to them all.

Now we’re all beginning to see further benefits as other regional groups submit more and more samples, resulting in matches of our varieties to theirs. This demonstrates that these aren’t chance seedlings, and it sometimes gives suggested identities. Additionally, there has been an effort to see whether parentage can be extracted and what it may reveal. We’ll address examples of these in the following articles.

First, though, let’s make a brief recap of how our samples are fingerprinted and what it looks like; it is essentially the same for pears and cherries.

Fingerprinting

The DNA fingerprinting we’ve been using is relatively cheap (about £30 a sample), so it has to be simple yet able to distinguish between many varieties. A double helix has about 1 billion units, or nucleotides, along its length. The method analyses a contiguous section about one millionth of this, in a part that does not make proteins. This part, typically 80% of the whole helix, is called the intron; it used to be called ‘junk’, because we had no idea what it does, but now ‘non-functional’ is preferred. As it isn’t directly involved with inherited characteristics (that is the job of the functional part that codes for making proteins), it isn’t under evolutionary pressure, and the non-functional parts are thought to mutate less quickly. This is useful as it makes the method more likely to be stable over centuries.

DNA is usually extracted from leaves (though stipules, buds, stems and fruit can also be used) by grinding and digesting them with mild chemicals. The quantity recovered is tiny, far too little to analyse directly, and of this only about a millionth part is of interest. A procedure known as amplification using polymerase chain reaction (PCR) is applied, very similar to that being used for coronavirus testing.

DNA comprises four nucleotides, usually known by their bases as A, C, G and T, and the two chains of the helix are bound together by A linking to T and C to G. The sequence in which these four types appear determines “the world and everything” of the organism. Within the non-functional part there are many short sequences of a dozen or more nucleotides common to all apples. The fingerprinting method involves adding two synthesized chemicals (the marker-pair) that cut out a section which has characteristic sequences at both ends. This tiny fragment is typically 100-250 nucleotides long, and must then be amplified by PCR roughly one billion-fold to make it measurable. As such the method would not distinguish between one variety and another but for a fortunate fact which can be exploited. Notwithstanding the comments above, over time when the DNA has been reproduced some errors in copying occur, and those in the non-functional region have no evolutionary pressure to be corrected. The adroitness of this method is that different apple varieties have ‘evolved’ with different numbers of nucleotides present between the two end members defined by our marker-pairs. It may be that during replication 40 or 41 or 42 … repeats of simple sequences such as .TA. or .GCA. etc. occurred. The length of these fragments, or alleles, varies from one variety to another, and it is this length that we measure and report as so many base pairs (i.e. nucleotides). We measure the length of these Simple Sequence Repeats (SSRs).
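To see why a different number of repeats shows up as a different fragment length, here is a minimal sketch. The function name and the flank lengths of 15 nucleotides are hypothetical, chosen only for illustration; real flank lengths depend on the marker pair.

```python
def fragment_length(left_flank, right_flank, motif, repeats):
    """Length in base pairs: two flanking sections plus `repeats` copies of `motif`."""
    return left_flank + len(motif) * repeats + right_flank


# Hypothetical flank lengths; only the differences between the results matter.
for repeats in (40, 41, 42):
    print(repeats, fragment_length(15, 15, "TA", repeats))
```

Under these assumptions each extra .TA. repeat lengthens the fragment by two base pairs, and each extra .GCA. repeat by three, which is the kind of difference the fingerprint records.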

Now doing this with just one marker-pair would not unambiguously differentiate between the thousands of varieties of apples. Twelve different marker pairs are used as the standard in the European Cooperative Programme for Plant Genetic Resources. In the US, 19 marker pairs are used. These marker pairs each select a strand of about 100-250 nucleotides.

For most apples, there are two copies of each of the 17 chromosomes. These are the diploid varieties. About 10-20% of apples are triploids; they have three copies of the chromosomes, 17 from one parent and 34 from the other. And about 1% of apples are tetraploid, with four sets of chromosomes. When the DNA fingerprint is analysed for diploids, each chromosome of a pair may have a different length for the section between the markers (i.e. the allele), so there are two possible lengths for each marker pair. For triploids there are three possible lengths, and for tetraploids four. To add confusion, for a given marker pair the lengths of alleles may be the same for some or all the chromosomes. Severn Bank, a tetraploid, has a fingerprint 114,118,120,129 for marker-pair CH04c07. By convention, regardless of ploidy, four numbers are carried for each marker pair; with triploids there is no fourth chromosome, so the fourth length is zero, e.g. Arlingham Schoolboys has 96,106,112,0. For diploids the third and fourth numbers are zero by convention, e.g. Cox’s Orange Pippin 106,112,0,0. And if the alleles from different chromosomes are the same, then that length is only listed once, e.g. Braddick is by convention 106,0,0,0, though actually 106,106,106.
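The four-number convention can be sketched in a few lines. The function name is hypothetical, and listing the lengths in ascending order is an assumption based on the examples quoted above.

```python
def to_fingerprint_entry(allele_lengths):
    """Collapse the measured allele lengths for one marker pair into four numbers.

    Duplicate lengths are listed once and unused positions are filled with zero.
    Ascending order is an assumption based on the article's examples.
    """
    unique = sorted(set(allele_lengths))
    if len(unique) > 4:
        raise ValueError("at most four distinct allele lengths are expected")
    return tuple(unique + [0] * (4 - len(unique)))


print(to_fingerprint_entry([106, 112]))            # diploid
print(to_fingerprint_entry([96, 106, 112]))        # triploid
print(to_fingerprint_entry([106, 106, 106]))       # triploid, all alleles equal
print(to_fingerprint_entry([114, 118, 120, 129]))  # tetraploid
```

Note that the convention loses information: 106,0,0,0 on its own does not say whether the variety is diploid, triploid or tetraploid.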

Now let’s look at real examples in the form you may already have seen. Three of the varieties above are shown in the figure. For each variety there are 12x4 columns of numbers; the twelve marker pairs are shown in three repeats of the colours blue, green, yellow and red, each divided into four columns with numbers corresponding to the allele lengths, or with zero if duplicated. Titles in the top row are the name of the marker pair with an appendage ‘_PKx’ indicating the four peaks of the alleles. These 48 numbers make up the fingerprint; usually at least half are zeros. Despite those zeros, as each allele has typically at least six common values, there are about 6 raised to the power of 24 possible combinations; that’s more than a million million million. That’s the basis of the method’s capacity to distinguish between varieties. Further details are given on our website http://www.marcherapple.net/research/dna-analysis/
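The size of that number is easy to check. This is only a back-of-envelope sketch that follows the article's own simplification: 12 marker pairs, two alleles each (a diploid), and six equally available values per allele. The variable names are illustrative.

```python
marker_pairs = 12
alleles_per_marker_pair = 2   # a diploid carries two alleles per marker pair
common_values = 6             # the article's "at least six common values"

combinations = common_values ** (marker_pairs * alleles_per_marker_pair)
million_million_million = 10 ** 18

print(combinations)
print(combinations > million_million_million)
```

The estimate is rough, since it treats every allele position as independent, but it shows why a dozen marker pairs are enough to separate thousands of varieties.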

CH04c07_PK1 CH04c07_PK2 CH04c07_PK3 CH04c07_PK4 CH01h10_PK1 CH01h10_PK2 CH01h10_PK3 CH01h10_PK4 CH01h01_PK1 CH01h01_PK2 CH01h01_PK3 CH01h01_PK4 Hi02c07_PK1 Hi02c07_PK2 Hi02c07_PK3 Hi02c07_PK4 CH01f02_PK1 CH01f02_PK2 CH01f02_PK3 CH01f02_PK4 CH01f03b_PK1 CH01f03b_PK2 CH01f03b_PK3 CH01f03b_PK4 GD12_PK1 GD12_PK2 GD12_PK3 GD12_PK4 GD147_PK1 GD147_PK2 GD147_PK3 GD147_PK4 CH04e05_PK1 CH04e05_PK2 CH04e05_PK3 CH04e05_PK4 CH02d08_PK1 CH02d08_PK2 CH02d08_PK3 CH02d08_PK4 CH02c11_PK1 CH02c11_PK2 CH02c11_PK3 CH02c11_PK4 CH02c09_PK1 CH02c09_PK2 CH02c09_PK3 CH02c09_PK4

Cox's Orange Pippin 106 112 0 0 96 0 0 0 117 129 0 0 116 150 0 0 203 205 0 0 158 0 0 0 148 153 0 0 139 150 0 0 173 198 0 0 254 0 0 0 215 0 0 0 232 256 0 0

Braddick Nonpareil 106 0 0 0 96 107 0 0 111 119 129 0 106 118 150 0 168 178 182 0 136 170 178 0 147 148 0 0 135 137 139 0 173 200 0 0 228 254 0 0 215 227 231 0 244 256 0 0

Severn Bank 114 118 120 129 96 101 0 0 111 115 127 0 114 116 0 0 178 180 0 0 158 170 176 0 147 148 0 0 135 139 150 0 173 216 220 0 216 254 0 0 213 229 233 0 232 244 0 0

When comparing two fingerprints, we may conclude that they are the same variety if all values (really the non-zero values) are the same. Occasionally small experimental errors or glitches can occur, with one or perhaps two alleles having a measured value 1, 2, 3 or 4 base pairs more or less than is correct. Experts are essential for confirming whether a difference is probable or not.
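As a rough illustration of that comparison, here is a minimal sketch. The function name, the data layout (a dictionary from marker pair to four numbers) and the thresholds are assumptions taken from the paragraph above, not the procedure used by the laboratory, and the ‘remeasured’ sample is invented for the example.

```python
def same_variety(fp_a, fp_b, max_shift=4, max_shifted_alleles=2):
    """Compare two fingerprints, allowing a few small measurement shifts.

    Assumed thresholds: at most `max_shifted_alleles` alleles may differ,
    each by no more than `max_shift` base pairs.
    """
    if fp_a.keys() != fp_b.keys():
        raise ValueError("fingerprints must cover the same marker pairs")
    shifted = 0
    for marker in fp_a:
        a = sorted(n for n in fp_a[marker] if n != 0)
        b = sorted(n for n in fp_b[marker] if n != 0)
        if len(a) != len(b):
            return False
        for x, y in zip(a, b):
            if abs(x - y) > max_shift:
                return False
            if x != y:
                shifted += 1
    return shifted <= max_shifted_alleles


# First three marker pairs only, taken from the figure above.
cox = {"CH04c07": (106, 112, 0, 0), "CH01h10": (96, 0, 0, 0), "CH01h01": (117, 129, 0, 0)}
braddick = {"CH04c07": (106, 0, 0, 0), "CH01h10": (96, 107, 0, 0), "CH01h01": (111, 119, 129, 0)}
# Hypothetical re-measurement of Cox with one allele read 2 base pairs long.
cox_remeasured = {"CH04c07": (106, 112, 0, 0), "CH01h10": (96, 0, 0, 0), "CH01h01": (117, 131, 0, 0)}

print(same_variety(cox, cox_remeasured))
print(same_variety(cox, braddick))
```

A sketch like this only flags candidates; as the text says, an expert still has to judge whether a small difference is a glitch or a real one.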

Now let’s look at a MAN example in some detail: Cheddar Cross (CC), which has the DNA sample number 1973-134. Let’s compare it with the fingerprints of Allington Pippin (AP) and Red Astrachan (RA), both diploids, which I’m hoping to convince you are its parents.

Test of a parents-progeny relationship

CH04c07_PK1 CH04c07_PK2 CH04c07_PK3 CH04c07_PK4 CH01h10_PK1 CH01h10_PK2 CH01h10_PK3 CH01h10_PK4 CH01h01_PK1 CH01h01_PK2 CH01h01_PK3 CH01h01_PK4 Hi02c07_PK1 Hi02c07_PK2 Hi02c07_PK3 Hi02c07_PK4 CH01f02_PK1 CH01f02_PK2 CH01f02_PK3 CH01f02_PK4 CH01f03b_PK1 CH01f03b_PK2 CH01f03b_PK3 CH01f03b_PK4 GD12_PK1 GD12_PK2 GD12_PK3 GD12_PK4 GD147_PK1 GD147_PK2 GD147_PK3 GD147_PK4 CH04e05_PK1 CH04e05_PK2 CH04e05_PK3 CH04e05_PK4 CH02d08_PK1 CH02d08_PK2 CH02d08_PK3 CH02d08_PK4 CH02c11_PK1 CH02c11_PK2 CH02c11_PK3 CH02c11_PK4 CH02c09_PK1 CH02c09_PK2 CH02c09_PK3 CH02c09_PK4

Cheddar Cross 112 133 0 0 96 0 0 0 117 0 0 0 118 150 0 0 203 205 0 0 158 162 0 0 148 150 0 0 137 139 0 0 173 198 0 0 224 254 0 0 215 217 0 0 242 256 0 0

Allington Pippin 106 112 0 0 96 0 0 0 115 117 0 0 106 150 0 0 178 203 0 0 158 170 0 0 148 153 0 0 137 139 0 0 173 198 0 0 254 0 0 0 215 217 0 0 242 256 0 0

Red Astrachan 106 133 0 0 96 107 0 0 117 119 0 0 116 118 0 0 186 205 0 0 143 162 0 0 148 150 0 0 137 150 0 0 173 0 0 0 210 224 0 0 215 235 0 0 242 256 0 0

Take each marker pair in turn. The first marker pair, CH04c07, shows CC has two alleles of 112 and 133. Either parent could have passed on 106. But in this case they didn’t; they passed 112 and 133, respectively.

CH04c07: Cheddar Cross 112 133 0 0; Allington Pippin 106 112 0 0; Red Astrachan 106 133 0 0

Now the second marker pair, CH01h10. In this case both the possible parents have 96 in one of their chromosomes, and in the other 96 and 107, respectively (recall what happens when the second allele is reported as 0). Both parents pass on 96: AP because that’s all it has, and RA doesn’t pass 107.

CH01h10: Cheddar Cross 96 0 0 0; Allington Pippin 96 0 0 0; Red Astrachan 96 107 0 0

The third marker pair is CH01h01. A similar situation arises, with both possible parents having 117 in a chromosome, so neither 115 nor 119 is passed on. Note, had either AP or RA not had 117, they would have had to pass on 115 or 119, either of which is sufficiently close to 117 that it might have been considered within experimental uncertainty and taken to be 117. This is a typical issue to confront with ‘near fits’.

CH01h01: Cheddar Cross 117 0 0 0; Allington Pippin 115 117 0 0; Red Astrachan 117 119 0 0

And finally let’s consider the fourth marker pair, Hi02c07. It’s clear: AP has to pass on 150 and RA 118. Neither the 106 nor the 116 is required in CC.

Hi02c07: Cheddar Cross 118 150 0 0; Allington Pippin 106 150 0 0; Red Astrachan 116 118 0 0

The same process can then be applied to the other eight marker pairs. The CC fingerprint could then arise from a mix of AP and RA, one or other of their chromosomes being passed to CC.

Using the 4000 fingerprints now in the fruitID dataset, which includes all from the National Fruit Collection, Irish Seed Savers Association, Tamar Valley Group, Marcher Apple Network, Gloucestershire Orchard Trust etc., there is no other combination of two varieties that can do this.

Final questions to ask:

• Morphology: Does the progeny look a bit like both possible parents? What do you think?

• Provenance: Were both the parents in existence (well), and with a reasonably high probability of co-location somewhere, before the progeny was introduced? Cheddar Cross was bred at Long Ashton in 1916. Allington Pippin is from Lincolnshire before 1884 and Red Astrachan came from Russia but was in England by 1813.

• Ploidy: It is well known that triploids produce little viable pollen, and seeds from the apples of triploids are often weak. Both parents being triploid is improbable; even one triploid parent is less likely than their statistical presence may suggest. In this case all three cultivars are diploid so these concerns don’t apply.
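The marker-by-marker test for Cheddar Cross can be written as a short sketch. It assumes diploid progeny and exact matches (no allowance for experimental glitches), and the function names are hypothetical; this is not the fruitID worksheet tool. The fingerprints are those of Cheddar Cross, Allington Pippin and Red Astrachan given in the table earlier in this section.

```python
MARKERS = ["CH04c07", "CH01h10", "CH01h01", "Hi02c07", "CH01f02", "CH01f03b",
           "GD12", "GD147", "CH04e05", "CH02d08", "CH02c11", "CH02c09"]


def parse(row):
    """Turn a row of 48 numbers into {marker pair: set of non-zero allele lengths}."""
    numbers = [int(x) for x in row.split()]
    if len(numbers) != 4 * len(MARKERS):
        raise ValueError("expected four numbers for each of the twelve marker pairs")
    return {m: {n for n in numbers[4 * i:4 * i + 4] if n != 0}
            for i, m in enumerate(MARKERS)}


def marker_fits(child, parent_a, parent_b):
    """True if a diploid child's alleles can come one from each parent."""
    alleles = sorted(child)
    if not 1 <= len(alleles) <= 2:
        raise ValueError("this sketch handles diploid progeny only")
    first, second = alleles[0], alleles[-1]  # equal when only one length is listed
    return ((first in parent_a and second in parent_b)
            or (second in parent_a and first in parent_b))


def could_be_parents(child, parent_a, parent_b):
    return all(marker_fits(child[m], parent_a[m], parent_b[m]) for m in MARKERS)


cheddar_cross = parse("112 133 0 0 96 0 0 0 117 0 0 0 118 150 0 0 203 205 0 0 158 162 0 0 "
                      "148 150 0 0 137 139 0 0 173 198 0 0 224 254 0 0 215 217 0 0 242 256 0 0")
allington_pippin = parse("106 112 0 0 96 0 0 0 115 117 0 0 106 150 0 0 178 203 0 0 158 170 0 0 "
                         "148 153 0 0 137 139 0 0 173 198 0 0 254 0 0 0 215 217 0 0 242 256 0 0")
red_astrachan = parse("106 133 0 0 96 107 0 0 117 119 0 0 116 118 0 0 186 205 0 0 143 162 0 0 "
                      "148 150 0 0 137 150 0 0 173 0 0 0 210 224 0 0 215 235 0 0 242 256 0 0")

print(could_be_parents(cheddar_cross, allington_pippin, red_astrachan))
# Reversing the roles fails: Allington Pippin's 115 at CH01h01 is in neither other variety.
print(could_be_parents(allington_pippin, cheddar_cross, red_astrachan))
```

Running the same test over every pair of varieties in a dataset is what allows the statement that no other combination fits; morphology, provenance and ploidy still have to be checked separately.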

Now you’re well ready to look at a few examples of identification arising from the DNA 2019 campaign and at parentage, including comparison with those from a recent whole genome study and the National Apple Register.

DNA Fingerprinting – what can we discover of an apple variety’s parents?

The following article is based upon one that was recently prepared for members of Marcher Apple Network in their Annual ‘Apple and Pears’ newsletter. It has been adapted for GOT members by including some varieties of direct interest.

More information can be squeezed from the DNA than just from matching to identify. It has been realised for some time that the DNA fingerprints can give information about parentage, at least which varieties are not the parents of a given variety! Further work has shown that it can identify plausible pairs of parents, for instance ‘’ x ‘Kidd’s Orange Red’ produced ‘’. How far can it go and with what certainty?

The answer is mixed: sometimes parents can be suggested with considerable confidence and sometimes none seem likely. Oh dear! The devil is in the detail. First, then, let’s test whether it can match what we already (think we) know.

There are two datasets for testing how useful the parentage from DNA SSR is. A recent paper by Hélène Muranty et al. (Dr Matthew Ordidge was one of the authors) used whole-genome DNA to reconstruct with high confidence 295 families of mother, father and their progeny, called ‘trios’; this is the ‘SNP’ study. Of these, there are 115 with DNA SSR data available from the NFC for comparison, and many show at least one parent being Cox’s Orange Pippin or McIntosh.

A second source is the historic record of breeders. Muriel Smith in the National Apple Register (NAR) of 1971 lists both parents for 181 varieties for which we also have DNA data, and a further 90 with one (European) variety as parent. Similarly, many include a Cox’s Orange Pippin, a Jonathan, or a McIntosh as a parent.

We tested the DNA SSR method discussed above against parentages given in these two sources, then investigated what the parentages might be of accessions held by MAN, GOT, Pippin Orchards, and Welsh Perry and Cider Society. A full article is on the Marcher Apple Network website http://www.marcherapple.net/research/dna-parentage-investigation/. A worksheet tool was developed to carry this out efficiently and is available at https://www.fruitid.com/#help; there is also a user guide for using the tool. After nearly 1500 investigations, it is time for someone else to have fun. Here are a few results….

Comparison of parentage derived from DNA SNP and SSR

Of the 115 SNP parents-progeny ‘trios’, DNA SSR data were unequivocally and immediately matched in 52 cases. Parents of a further 59 varieties were identified quite easily, though it did require care and used additional information, such as ploidy and provenance. Together that’s 111 of the 115 tested, or about 97%.

Just four cases gave a wrong parentage: Golden Melon, Laxton's Superb, Norfolk Beauty and Rubens. All were traceable to data mismatches likely from experimental glitches in what are delicate and subtle data differences.

Don’t expect it to be infallible, but it is a potentially useful tool. Cheap and cheerful: that’s what DNA SSR is compared with whole genome studies.

Comparison between parentage from Plant Breeder records and that derived from DNA SSR

Of the 183 varieties with two parents given in the NAR, for only 87 do both parents appear plausible; perhaps another 16 trios might be plausible if there were experimental ‘glitches’ in DNA fingerprints. Eighty were clearly implausible, which is a polite way of saying they are wrong. It suggests that either frequent mistakes were made in the original ID of trees, or record keeping was poor, or parentages were retro-actively assessed from morphology. Among breeders it is notable that Wastie failed to get one right out of 13. By contrast, Research Stations such as East Malling, Long Ashton, Merton, Wageningen, and Ottawa are pretty reliable. It can be done. Parentage records should be viewed with considerable caution.

It is quite often quoted that parentage of Cox’s Orange Pippin may be , possibly crossed with . Is this reasonable? No. Firstly, note that both those suggested parents are triploid, so it is most unlikely that both could be true, as one parent has little viable pollen and the other produces weak seeds. Secondly, the SNP study mentioned above has found one parent: Margil. It was known in England before 1750 according to the National Apple Register, whereas Cox’s Orange Pippin was raised about 1825.

Margil has a lovely synonym, Fail-me-Never.

Of those with parents wrongly assigned, DNA SSR showed 73 trios for which alternative pairs of parents seemed more probable. Overall 87% of the 181 end up having plausible parents.

Suggested parentage of varieties in GOT’s collection from fingerprint results

A total of 148 DNA samples submitted by GOT, of 94 different varieties, have been investigated.

Both the study with varieties in the DNA SNP dataset and those of the plant breeders have explicitly selected cultivars that are likely to have parent-progeny relationships. If anything, as with the MAN dataset, the GOT samples are biased away from ‘normal’ cultivars towards rarer varieties and ones that may be seedlings, or whose parents we haven’t (yet) found.

Of the 148 samples with DNA there are about 70 varieties for which no suggestion has been made for either parent, usually because none seem remotely likely. About one-third may have parents revealed, well at least to some level of confidence (yes, some are really doubtful).

Parentage                 DNA samples   Varieties
Total                     148           102
Having high confidence    3             2
That are probable         8             6
That are possible         28            19
That are unlikely         6             5
No confidence at all      103           70

Just as was found among MAN’s samples, ones for which parents could not be identified were in the majority. The parents aren’t in our dataset (yet). Perhaps they are lost, or not yet found, or not yet fingerprinted, or perhaps elsewhere in the world awaiting connections to be made. However, it does seem likely that many parents were themselves ephemeral seedlings, and our collections hold seedlings of seedlings (of seedlings of…). That is perhaps the most obvious statement which can be made. It all sounds a bit like wheels within wheels, doesn’t it?

Examples of parent-progeny relationships

A few examples from among the samples submitted by GOT are described below. In some cases there are photographs available, with thanks to Jim Chapman, John Savidge and the NFC.

Emneth Early, Grenadier, Lord Grosvenor and Lord Suffield. The SNP study showed that Emneth Early (1899) and Grenadier (1862) were progeny of Hawthornden (1780) x Keswick Codlin (1793); now it is seen that so too are Lord Grosvenor (1872) and Lord Suffield (1836). All are diploids. This is a rather neat conclusion and looks consistent with morphology. Since this was written I have seen that Gold Medal may also be a sibling, as alternative parents are likely excluded because they were bred after Gold Medal (1882).

Falstaff A2323(diploid,196) from (diploid,1893) x Golden Delicious (diploid 1890).

Fon’s Spring A228 (diploid,1948) from John Standish (diploid,1873) x Cox’s Orange Pippin (diploid,1825). It is either a synonym or sport of Eden, as suggested by Muriel Smith in the NAR, or a sibling as given by Joan Morgan, since both have the same parents.

Now let’s look at the DNA of these parents and progeny. For ease of reading (!) it’s been split into two blocks. The second and third rows of each block are the fingerprints of parents, and Eden and Fon’s Spring are shown in the fourth and fifth rows. Both progeny contain one or other of the alleles (which here are PK1 or PK2) in their parents. So both can be progeny of these parents. If they are synonyms or sports of one another, their fingerprints should be the same. They aren’t. Only 16 of 24 non-zero alleles are the same. They are siblings.

Cultivar - test for parents-progeny and sibling relationships

CH04c07_PK1 CH04c07_PK2 CH04c07_PK3 CH04c07_PK4 CH01h10_PK1 CH01h10_PK2 CH01h10_PK3 CH01h10_PK4 CH01h01_PK1 CH01h01_PK2 CH01h01_PK3 CH01h01_PK4 Hi02c07_PK1 Hi02c07_PK2 Hi02c07_PK3 Hi02c07_PK4 CH01f02_PK1 CH01f02_PK2 CH01f02_PK3 CH01f02_PK4 CH01f03b_PK1 CH01f03b_PK2 CH01f03b_PK3 CH01f03b_PK4
Cox's Orange Pippin (Vinson) 106 112 0 0 96 0 0 0 117 129 0 0 116 150 0 0 203 205 0 0 158 0 0 0
John Standish 120 133 0 0 88 103 0 0 129 0 0 0 116 118 0 0 178 193 0 0 170 178 0 0
Eden 112 133 0 0 88 96 0 0 129 0 0 0 116 118 0 0 193 205 0 0 158 178 0 0
Fon's Spring 112 120 0 0 96 103 0 0 129 0 0 0 116 150 0 0 193 203 0 0 158 178 0 0

Cultivar - test for parents-progeny and sibling relationships

GD12_PK1 GD12_PK2 GD12_PK3 GD12_PK4 GD147_PK1 GD147_PK2 GD147_PK3 GD147_PK4 CH04e05_PK1 CH04e05_PK2 CH04e05_PK3 CH04e05_PK4 CH02d08_PK1 CH02d08_PK2 CH02d08_PK3 CH02d08_PK4 CH02c11_PK1 CH02c11_PK2 CH02c11_PK3 CH02c11_PK4 CH02c09_PK1 CH02c09_PK2 CH02c09_PK3 CH02c09_PK4
Cox's Orange Pippin (Vinson) 148 153 0 0 139 150 0 0 173 198 0 0 254 0 0 0 215 0 0 0 232 256 0 0
John Standish 148 153 0 0 131 147 0 0 173 0 0 0 212 228 0 0 205 0 0 0 232 242 0 0
Eden 148 153 0 0 147 150 0 0 173 0 0 0 212 254 0 0 205 215 0 0 242 256 0 0
Fon's Spring 148 153 0 0 131 150 0 0 173 198 0 0 228 254 0 0 205 215 0 0 232 242 0 0
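The sibling-versus-synonym comparison of Eden and Fon’s Spring can be sketched as a count of shared alleles. The function names are hypothetical, and counting a single listed length as two identical alleles (both varieties being diploid) is my assumption about how the figure of 24 alleles is reached.

```python
from collections import Counter


def diploid_alleles(four_numbers):
    """Expand one marker pair's four numbers into the two alleles of a diploid."""
    non_zero = [n for n in four_numbers if n != 0]
    if len(non_zero) == 1:
        non_zero = non_zero * 2  # one listed length means both alleles are equal
    return Counter(non_zero)


def shared_alleles(row_a, row_b):
    """Return (alleles in common, alleles compared) across all marker pairs."""
    a = [int(x) for x in row_a.split()]
    b = [int(x) for x in row_b.split()]
    shared = compared = 0
    for i in range(0, len(a), 4):
        common = diploid_alleles(a[i:i + 4]) & diploid_alleles(b[i:i + 4])
        shared += sum(common.values())
        compared += 2
    return shared, compared


# Both blocks of the table joined into one row of 48 numbers per variety.
eden = ("112 133 0 0 88 96 0 0 129 0 0 0 116 118 0 0 193 205 0 0 158 178 0 0 "
        "148 153 0 0 147 150 0 0 173 0 0 0 212 254 0 0 205 215 0 0 242 256 0 0")
fons_spring = ("112 120 0 0 96 103 0 0 129 0 0 0 116 150 0 0 193 203 0 0 158 178 0 0 "
               "148 153 0 0 131 150 0 0 173 198 0 0 228 254 0 0 205 215 0 0 232 242 0 0")

print(shared_alleles(eden, fons_spring))
```

A synonym or sport would be expected to share all 24 alleles, give or take an experimental glitch; sharing well short of that, while both fit the same parents, is what points to siblings.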

Tewkesbury Baron A1154 (diploid,1883) from Devonshire Quarrenden (diploid,1678) x Rushock (diploid,1821), though there is a mismatch of 2 base pairs in one allele, considered within experimental margins (see below).

An unnamed variety from Robinswood, A728, (diploid) maybe from Maidstone Favourite (diploid,1913) x Newton Wonder (diploid,1887).

Lodgemore Nonpareil (diploid,1808) is likely from Nonpareil (diploid,1500s) as one parent. The second might be Brown Crofton (diploid,1835) if the latter was raised at least 50 years earlier and had a connection nearer to Gloucestershire than Ireland.

This next one gave me some pleasure to unravel. A2975, Chapman’s Delight, a diploid, is a seedling planted 15 years ago as part of a Countryside Environmental Scheme. It is a cross between Gloster 69 (1951) and Golden Delicious (diploid,1890); just a pity that Gloster 69 doesn’t hail from Gloucestershire! To be fair, there is a tiny mismatch of 1 base pair between Chapman’s Delight and A2975, which is within reasonably likely experimental error.

This highlights the limitation of the method: when is the DNA of the progeny unlikely to have been the result of a random scrambling of the DNA of its parents? At the last Register of Local Cultivars meeting at Reading University, it was suggested that just one mismatching marker-pair with a difference of 2 base pairs was about the limit of what might be expected within the high quality dataset being assembled by East Malling Research. This experimental uncertainty is broadly supported by the comparison between the high certainty of parents revealed from the whole genome SNP study mentioned above and the simple DNA fingerprint method. There are a few cases where the fingerprint needs a small shift of 1-4 base pairs in a single allele to reveal the same parents found by the SNP study.
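That limit can be built into a parentage test as a tolerance. The sketch below is only an illustration under stated assumptions: diploid progeny, the suggested limit of one mismatching marker pair and 2 base pairs as defaults, hypothetical function names, and invented data for three unnamed marker pairs (M1 to M3).

```python
def explains(child_pair, parent_a, parent_b, tolerance):
    """True if each child allele lies within `tolerance` bp of an allele in a different parent."""
    def near(allele, parent):
        return any(abs(allele - p) <= tolerance for p in parent)

    first, second = child_pair
    return ((near(first, parent_a) and near(second, parent_b))
            or (near(second, parent_a) and near(first, parent_b)))


def classify_trio(child, parent_a, parent_b, tolerance=2, max_shifted_markers=1):
    """Label a trio 'exact', 'within tolerance' or 'implausible' (assumed thresholds)."""
    exact_failures = [m for m in child
                      if not explains(child[m], parent_a[m], parent_b[m], 0)]
    hard_failures = [m for m in child
                     if not explains(child[m], parent_a[m], parent_b[m], tolerance)]
    if not exact_failures:
        return "exact"
    if not hard_failures and len(exact_failures) <= max_shifted_markers:
        return "within tolerance"
    return "implausible"


# Invented data: both alleles of the child are listed for each marker pair.
parent_a = {"M1": {106, 112}, "M2": {96}, "M3": {106, 150}}
parent_b = {"M1": {106, 133}, "M2": {96, 107}, "M3": {116, 118}}
child_exact = {"M1": (112, 133), "M2": (96, 96), "M3": (118, 150)}
child_shifted = {"M1": (112, 133), "M2": (96, 96), "M3": (118, 152)}
child_wrong = {"M1": (112, 133), "M2": (96, 96), "M3": (118, 160)}

for child in (child_exact, child_shifted, child_wrong):
    print(classify_trio(child, parent_a, parent_b))
```

Loosening the tolerance finds more candidate parents but also more false ones, which is why the limit matters and why provenance and ploidy are checked as well.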

Rubens, A254 & A727, is an apparent exception where the mismatch was on eight marker-pairs, making parentage impossible! It seems that Rubens and Saltcote Pippin were at some stage interchanged by the NFC/EMR; both have Cox’s Orange Pippin as one parent. Rubens should have Reinette Rouge Étoilée as its second parent, while Saltcote Pippin has Knobby Russet.

Finally, there is the intriguing case of triploids, varieties that have three sets of the 17 apple chromosomes, two sets from one parent and one from the other. It seems that the Hungarian variety Herceg Batthyanyi Alma, which is diploid, has produced quite a few triploid progeny; these contain all the DNA of one or other diploid parent. In this case triploid progeny may include Carter's Pearmain, , Fremy, Lady Hopetown, , , NFC 1947-076… and GOT A724. Other cultivars that have produced triploid progeny are , Dutch Mignonne, Nonpareil and Cox’s Orange Pippin.

In summary

Assessing parentage from DNA fingerprints was previously thought a step too far; I hope you are now minded to believe it may be just about possible for some. Recall though that it is effective at refuting implausible parentage; another example where it is easier to demolish than create.

No worry, there are still another 3000 varieties in the NFC/fruitID inventory for you to work on.

Both Peter Laws and I record our fulsome thanks to Dr Matthew Ordidge, Curator of the National Fruit Collection, for kind technical advice and encouragement along our journeys.

Stephen Ainsleigh Rice
Marcher Apple Network
15th September, 2020