UvA-DARE (Digital Academic Repository)

DNA markers for forensic identification of non-human biological traces

Wesselink, M.

Publication date: 2018
Document version: Final published version
License: Other

Citation for published version (APA): Wesselink, M. (2018). DNA markers for forensic identification of non-human biological traces.


DNA markers for forensic identification of non-human biological traces

Monique Wesselink

ACADEMIC DISSERTATION

to obtain the degree of Doctor at the Universiteit van Amsterdam, by authority of the Rector Magnificus, Prof. dr. ir. K.I.J. Maex, before a committee appointed by the Doctorate Board (College voor Promoties), to be defended in public in the Agnietenkapel on Thursday 26 April 2018 at 12:00

by

Monique Wesselink

born in Zwolle

Doctoral committee:

Supervisor: Prof. dr. A.D. Kloosterman, Institute for Biodiversity and Ecosystem Dynamics, Universiteit van Amsterdam

Co-supervisor: Dr. I. Kuiper, Niet-Humane Biologische Sporen (Non-Human Biological Traces), Nederlands Forensisch Instituut

Other members:

Prof. dr. A.M.T. Linacre, Biological Sciences, Flinders University

Prof. dr. mr. M.E. de Meijer, Amsterdam Centre on the Legal Professions, Universiteit van Amsterdam

Prof. dr. M. Schilthuizen, Institute of Biology Leiden, Universiteit Leiden

Prof. dr. M.J. Sjerps, Korteweg-de Vries Institute for Mathematics, Universiteit van Amsterdam

Prof. dr. P.H. van Tienderen, Institute for Biodiversity and Ecosystem Dynamics, Universiteit van Amsterdam

Faculty: Faculteit der Natuurwetenschappen, Wiskunde en Informatica (Faculty of Science)

The research described in this thesis was performed within the Non-Human Biological Traces group of the Netherlands Forensic Institute.

Printing of this thesis was financially supported by the Netherlands Forensic Institute, the Co van Ledden Hulsebosch Centrum (CLHC), Amsterdam Center for Forensic Sciences and Medicine and the Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam.

DNA markers for forensic identification of non-human biological traces Copyright © 2018 Monique Wesselink

PhD thesis, University of Amsterdam (IBED), The Netherlands ISBN/EAN 978-94-91407-58-1

Cover design: the author
Cover image: Vikpit
Printing: GildePrint

Table of contents

Outline of this thesis

Chapter 1 Molecular species identification of “Magic Mushrooms”

Chapter 2 Forensic utility of the feline mitochondrial control region - A Dutch perspective

Chapter 3 Forensic analysis of mitochondrial control region DNA from single cat hairs

Chapter 4 Local populations and inaccuracies: Determining the relevant mitochondrial haplotype distributions for North West European cats

Chapter 5 DNA typing of birch: Development of a forensic STR system for Betula pendula and Betula pubescens

Chapter 6 The forensic potential of DNA typing of birch (Betula) seeds

Chapter 7 General discussion

Epilogue

Summary

Samenvatting (summary in Dutch)

Overview of author contributions

Acknowledgements

Outline of this thesis

M. Wesselink and I. Kuiper

The majority of investigative questions encountered in forensic case work involve a certain level of classification, often referred to as ‘source level’ questions. Such questions can seem quite trivial, examples being “Is this blood human?”, “Is this white powder cocaine?” and “Are these wounds caused by a baseball bat?”. However, answering such questions is generally far from trivial. Apart from the specific biological, chemical or physical knowledge required, additional knowledge of, for example, the relevant classes, the within and between class variation, and the reliability of the measurements and features under investigation is always needed to provide an answer that may reliably be used in a criminal proceeding. After the questions of classification have been addressed, additional knowledge of how traces or specific features are transferred from one object to another, and how these persist after transfer has occurred, can further influence the certainty with which other forensic questions, such as ‘activity level’ questions, may be answered.

One of the first steps in providing an answer that may be used in a criminal proceeding is to determine the relevant and reliable level of classification, where relevant and reliable are often far from identical. Relevant classes may be very narrow, in which case the relevance of such investigative questions is readily seen (one specific chemical compound in “Is this white powder cocaine?”). Alternatively, the classes may be broader, in which case additional (case) information may be required to correctly see the relevance (the biological species human in “Is this blood human?” and the group of bats designed to play baseball in “Are these wounds caused by a baseball bat?”). Independent of whether the defined groups are narrow or broad, once the desired level of classification has been determined, the next challenge is to determine whether class specific characteristics are expected to exist. Biological groups that have a common ancestor (monophyletic groups such as all individuals of a species) or chemical compounds that are derived from a certain precursor (such as certain opiates) are more likely to exhibit readily recognizable class specific features than paraphyletic groups that are better described by use (such as bats used to play a specific ball game; remains from wooden (baseball/softball/cricket) bats may be easier to discriminate from remains from metal bats than remains from baseball bats in general are from remains from all other bats). When class specific characteristics are expected to exist or can theoretically be distinguished, irrespective of whether these are based on usage or origin, the next effort will be to determine whether identification of these characteristics is practically possible in the relevant groups, more specifically whether the identified characteristics can reliably be measured in forensic case samples. If this question can again be answered affirmatively, analyzing test samples, creating databases, and sharing these data with the forensic community for review and comparison are generally the final steps that precede application of the identified characteristics in case work.

When biological traces are under investigation, morphological traits and DNA characteristics are generally utilized. Although morphological classification has many benefits, its application is limited in forensic cases, where often only small or poorly preserved traces are available. Additionally, the number of species for which classification at the level of a single organism is possible based solely on morphological features is limited, even when relatively large samples are available. Since analysis of DNA markers has become common practice, identification of individuals of at least one species (Homo sapiens) is well known in forensics, illustrating the forensic potential of such markers. Comparable methods may be designed for other species, but biological factors such as reproductive strategies (e.g. sexual, self-fertilizing or clonal) may influence their application. Another biological factor that influences both the technical applicability of DNA markers and the interpretation of results is the (cellular) origin of the DNA markers. The different types of DNA present in cells (i.e. autosomal, mitochondrial, chloroplast) have their own potentials and pitfalls that should be taken into account when determining whether a marker should be pursued for a forensic application. As the history and relatedness of the DNA of relevant classes can often be studied theoretically, the existence of characteristics that are relatively constant within a class (the intraclass variation) but differ between classes (the interclass variation) can often be predicted. From a biological point of view, if a certain species is the class of interest, a valuable characteristic for identification would display a small intraspecies variation and a larger interspecies variation, enabling one species to be distinguished from other species.

In this thesis, DNA markers are described that enable forensically relevant classification of three groups of non-human biological traces: (A) fungi (chapter 1), (B) domestic cats (chapters 2, 3 and 4) and (C) birch trees (chapters 5 and 6). Because the forensic questions associated with these traces require different levels of classification and the theoretical availability of class specific characteristics varies between the groups studied in these chapters, the different chapters of this thesis illustrate the variation in levels of classification. Additionally, the inheritance of the DNA available for forensic investigations and the reproductive strategies differ between the studied species, thereby influencing the value of being included as a member of a certain class. Determining the forensic value of inclusion into a class is therefore included in all chapters and varies considerably.

Chapter 1 of this thesis describes the search for a DNA marker that is capable of discriminating between species of mushrooms that are controlled by law and their neighbor species that are not controlled by law. Although from a legal point of view these are distinct classes, these groups are not necessarily biologically distinguishable due to the paraphyletic nature of the classes (comparable to the group of bats designed to play one specific ball game). Chapter 1 focusses on the search for class specific DNA markers for the relevant species, after which the testing of these markers is described and an ideal marker is proposed.

The DNA markers studied in chapter 2 have the potential to discriminate between groups of individuals within the species Felis catus (domestic cat). Identification of only the species Felis catus is not often of forensic interest as such. However, when cat hairs are encountered as trace evidence, distinguishing between, for example, the victim’s cat and the suspect’s cat can be highly relevant, potentially providing a link to a crime scene. In chapter 2, the value of several potentially distinguishing characteristics within the domestic Dutch cat population is described, and a marker is proposed for forensic use. Due to apparent differences between the Dutch cat population and other cat populations described in literature, testing of additional samples from other origins is advised.

The technical note included as chapter 3 describes the adaptation of the laboratory procedures, so that the DNA markers proposed in chapter 2 can be reliably characterized not only in high quality samples, but also in forensic case samples such as single shed undercoat hairs.

Chapter 4 addresses the testing of Belgian, German and additional Dutch cat samples with the DNA markers proposed in chapter 2, through the improved method described in chapter 3, and incorporates published data from the United States of America, Canada, the United Kingdom and Poland. The DNA marker proposed in chapter 2 is considered sufficient for the desired discrimination in North West continental European cats and superior to other described markers. However, major differences between the North West continental European cat population and cats from the United Kingdom, Canada and parts of the United States of America are recognized. This leads to the conclusion that although a suitable marker has been found, case specific testing of relevant cat populations may be necessary prior to application in case work, especially in regions and localities not covered by the present studies. Additional analysis of the large combined dataset gave rise to certain technical recommendations to improve the nature of the data recorded and published.

Analogous to the forensic relevance of distinguishing between cat hairs, indicating that certain botanical traces may have originated from a specific tree can provide information about someone’s prior whereabouts. Chapter 5 describes the identification and technical validation of DNA markers suitable to distinguish between individual birch (Betula) trees. Testing these DNA markers on larger numbers of individuals of the two species of birch naturally occurring in the Netherlands showed greater differences between these two species than between the different locations in which the trees had been sampled, calling for the recording of population data in two different databases. As one of the two species is diploid and the other tetraploid, different models to estimate the evidential value of a DNA profile are considered for these two species of trees. Apart from this genetic influence on the estimation of the evidential value, the natural versus human influenced propagation of trees is found to be a factor that needs consideration.

Refinement of the method developed in chapter 5, to enable application not only to botanical debris such as leaves and twigs but also to seeds, is described in chapter 6. Although the DNA markers advised for the DNA typing of birch are no different from the markers described in chapter 5, all technical components involved in obtaining and interpreting a DNA profile have been reconsidered and altered to enable efficient typing of birch seeds and to allow comparison to profiles from reference trees. The application of this method to an actual forensic case is described.

In chapter 7, the findings described in chapters 1-6 are discussed from an applied forensic biological perspective and integrated with perceptions from molecular biology and ecology on the one hand, and law, policy making and criminalistics on the other. The scientific advances described in this thesis, and how these could influence court proceedings, will also be touched upon.

Chapter 1

Molecular species identification of “Magic Mushrooms”

M. Wesselink, Esther M. van Ark and I. Kuiper

Abstract

The use of DNA markers for the identification of fungal species has been described for diverse applications. However, the ideal marker for forensic identification of hallucinogenic species of fungi has been the cause of debate. As species identification of seized samples is required for law enforcement, and seizures of magic mushrooms often lack the morphological features required to identify the species, a better understanding of the performance of different DNA markers is necessary. Markers ITS and LSU were sequenced from authenticated specimens of several Psilocybe, Deconica and Panaeolus species, as well as from seized samples and samples sold in shops prior to the banning of these mushrooms in the Netherlands. Sequences were obtained for both markers from all samples. However, due to amplicon length, ITS is expected to outperform LSU when DNA degradation of samples has occurred. Inter- and intraspecies variation was calculated based on the obtained sequences, complemented with all Psilocybe, Deconica and Panaeolus ITS and LSU sequences present in GenBank. Use of these sequences revealed the presence of several incorrectly labeled or misidentified sequences, demonstrating both the pitfalls and value of such public databases. The between species variability of markers ITS1 and ITS2 is demonstrated to be far greater than that of marker LSU, caused by differences in both sequence length and composition. The vast majority of intraspecies variation detected for Psilocybe and Deconica species was due to single differences in nucleotide composition, or insertion/deletion of single nucleotides. Of the studied DNA markers, the complete ITS1-5.8S-ITS2 region was shown to be the most informative for identification at the species level, although several groups of closely related species were found that could not be distinguished due to insufficient interspecies variability. A comparable performance was noted for markers ITS1 and ITS2 separately. The interspecies variability of marker LSU was lower than that of the ITS regions, rendering this region less suited for species identification.

1. Introduction

For centuries the hallucinogenic properties of certain species of plants, animals and fungi have been known to man, such species being appreciated for their value in ritual and medicinal proceedings (reviewed in [1]). In more recent times, some of these species of fungi have reached a wider audience and have become more widely used as drugs of abuse [1,2]. Although the health risks of using such “magic mushrooms” are not considered to exceed the risks of, for example, drinking alcohol [3,4], most countries have some form of legislation in place to prevent wide scale production, sales and consumption of hallucinogenic mushrooms. Regulation of the pure chemicals responsible for the hallucinogenic properties of most “magic mushrooms” (MM), psilocin and psilocybin, is clear, as these are prohibited in most UN member countries after being placed on Schedule 1 of the United Nations Convention on Psychotropic Substances. Regulation of the fungi capable of producing these components is far more diffuse, as regulations can depend on the phase of the lifecycle that is encountered (spores, mycelium, sclerotia and fruiting bodies), on the species of fungus that is encountered, on the intrinsic capability of the encountered material to produce the regulated components and in some cases on a combination of the above (e.g. [5,6]). Although many fungal species in the order Agaricales have been recognized that can produce hallucinogenic components, the majority of the abused species of “magic mushrooms” belong to the fungal genera Panaeolus and Psilocybe [2]. In the recent past, the genus Psilocybe consisted of species with and without hallucinogenic capacities, with the non-hallucinogenic Psilocybe montana being the lectotype. Since the proposal of taxonomical redesignation of the genus Psilocybe, the name Psilocybe has been given to the species with hallucinogenic properties (Psilocybe semilanceata as type), whilst species without these properties have been placed in the genus Deconica [7,8].

To identify forensically relevant samples, from a scientific point of view, identification of the genus Psilocybe would be sufficient. However, in many countries, correctly identifying a species as opposed to this genus is still necessary from a legal point of view, as legislation precedes this scientific advancement, and identification of the species of fungal material may be required for prosecution. In the Netherlands, for example, fruiting bodies (mushrooms) of 188 species of fungi, mainly belonging to the genera Psilocybe and Panaeolus, have been forbidden by law since December 1st 2008 [5]. The import, export, trade, cultivation and possession of mushrooms of these species has thereby become illegal, calling for a robust technique to correctly identify samples of these species, to enable distinction between these and other (legal) species of mushrooms for regulatory purposes and to prosecute offenders. When complete fungi in an informative phase of their lifecycle are the subject of investigation, morphological identification of the species can be performed. However, other lifeforms of fungi (such as sclerotia or mycelium) or material that has been dried, shredded or otherwise treated are not easily identified morphologically and may require either cultivation or molecular identification. As the majority of seized samples have been dried, ground or powdered, DNA based identification of forensic samples seems a potent method, as has been described repeatedly for animal and plant samples [e.g. 9-13].

In fungi, DNA markers have been applied for several distinct purposes, each imposing different requirements on the DNA markers selected. In phylogenetic studies of fungi, the internal transcribed spacers (ITS1 and/or ITS2) and the ribosomal large subunit (LSU) are used as markers, where ITS is mainly used to study relationships at the species/genus level, and LSU is generally used to study genus/family level relationships [14-17]. In many fungi, including the order Agaricales, the internal transcribed spacers display variations not only in sequence composition, but also in sequence length, which hampers meaningful alignment of these sequences, leading these sequences to be omitted from comparative analyses (e.g. [16-18]).

Studies describing the DNA based identification of fungi from a DNA barcoding or species descriptive perspective generally rely on markers ITS2 and/or ITS1, supplemented with additional markers for certain orders or families of fungi [17-18]. In these studies, ease of alignment of sequences is of less importance than a (much) larger between species variation than within species variation, the ‘barcoding gap’. Length variations that may hamper alignment efficiency often even increase the informative value of a species identification marker by adding many unique characters. Apart from the sequence information, robust amplification and sequencing conditions throughout a large group of organisms (i.e. fungi) with ‘universal’ primers are of importance for DNA based identification efforts. Additionally, the availability of reliable reference sequences is a valuable criterion, although additional databases can obviously be built for specific purposes. Several studies have focused on the DNA based identification of specific species of (hallucinogenic) fungi for forensic applications [19-22]. Identification of magic mushrooms has been reported favoring either marker ITS [19,21,22] or LSU [20,21]. However, as some of the results of these studies seem contradictory, there is no consensus yet on which marker is most suitable for MM identification for forensic applications [23]. Obviously, to enable the use of DNA based identification of seized fungal samples, discussions concerning the applied DNA markers are undesirable. Therefore this study aims to clarify the differences between previously published (forensic) studies, and to explore whether markers ITS1 and ITS2 together, or single markers ITS1 or ITS2, are the most appropriate for forensic identification of magic mushrooms or whether these spacer regions should be abandoned in favor of marker LSU. To this end, these markers were examined extensively for their amplification and sequencing ease in both reference samples and seizures of MM, the extent of intra- and interspecific variation and the presence of reliable reference sequences in the public database GenBank [24].
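The ‘barcoding gap’ criterion mentioned above can be illustrated with a minimal Python sketch. It assumes that pairwise distances have already been sorted into within species and between species comparisons; the function name and all distance values are hypothetical and serve only as an illustration, not as results of this study.

```python
def barcoding_gap(intra, inter):
    """Return (largest intraspecies distance, smallest interspecies distance, gap).

    A positive gap means every between species distance exceeds
    every within species distance for the marker under consideration.
    """
    max_intra = max(intra)
    min_inter = min(inter)
    return max_intra, min_inter, min_inter - max_intra


# Hypothetical pairwise distances (substitutions per site), for illustration only.
intra = [0.000, 0.002, 0.004]
inter = [0.031, 0.058, 0.120]

max_intra, min_inter, gap = barcoding_gap(intra, inter)
has_gap = gap > 0
print(f"max intra = {max_intra}, min inter = {min_inter}, gap present: {has_gap}")
```

A marker for which the gap is zero or negative for some pair of species would not be able to separate those species on distance alone, which is the situation reported in the abstract for several groups of closely related species.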

2. Material and methods

2.1. Sample material

DNA extracts of seven specimens of Psilocybe, one specimen of Deconica and one specimen of Panaeolus were purchased from the Centraalbureau voor Schimmelcultures, Utrecht, the Netherlands. A single Deconica tissue sample was obtained from Gent Herbarium, Belgium (Table 1). Two species of Psilocybe mushrooms were obtained from a store before December 2008, as were two species of Panaeolus mushrooms. Three Canadian accessions of Psilocybe cubensis, Deconica montana and Panaeolus sphinctrinus kept at the Canadian National Mycological Herbarium (DAOM) could not be used for additional DNA extraction or sequencing, but resequencing of these samples was performed in Canada and the sequences were kindly communicated by Scott A. Redhead (Canadian Collection of Fungal Cultures). At a later stage of the study, ten unidentified seizures were used that were submitted to our laboratory by the National Police of the Netherlands for either species identification or identification of chemical substances. These seizures consisted of dried mushrooms, fragments of mushrooms, ground material and sclerotia.

Table 1. Psilocybe, Deconica and Panaeolus samples used in this study

Sample name         ID (a)                   GenBank accession nr

Psilocybe
Ps caerulescens     CBS strain 837.87        HM035072
Ps coprophilla      CBS strain 417.82        HM035073
Ps cubensis         CBS strain 140.85        HM035074
Ps cubensis         CBS strain 590.79        HM035075
Ps cyanescens       CBS strain 295.94        HM035076
Ps mexicana         CBS strain 831.87        HM035077
Ps montana          CBS strain 101791        HM035078
Ps montana          Gent 3330                HM035079
Ps semilanceata     CBS strain 101868        HM035080

Deconica
D montana           CBS strain 101791        HM035078
D montana           Gent 3330                HM035079

Panaeolus
Pa sphinctrinus     CBS strain 582.79        HM035081

Market samples
“Ps cubensis”       Production line 1.002    HM035082
“Ps mexicana”       Production line 9596     HM035083
“Pa cyanescens”     Production line 1.007    HM035084
“Pa”                Production line 1.045    HM035085

(a) CBS: Centraalbureau voor Schimmelcultures (Utrecht, the Netherlands); Gent: Herbarium Universitatis Gandavensis (Gent, Belgium).

2.2. Extraction of DNA and quantification

Fungal tissue was ground using a TissueLyser (Qiagen), after which DNA was extracted using the DNeasy Plant Mini kit (Qiagen) following the manufacturer’s protocol. Total DNA concentration was estimated using a NanoDrop ND-1000 (NanoDrop Technologies).

2.3. Amplification, visualization and sequencing

ITS was amplified using the fungal specific forward primer ITS1-F [25] and the universal reverse primer ITS4-R [26] (Figure 1). PCR reactions were performed in a 25 μl reaction mix containing either (I) 1x reaction Buffer II, 3 mM MgCl2, 0.2 mM each dNTP, 1 unit of AmpliTaq Gold (all Applied Biosystems), 20 pmol each primer (Biolegio, the Netherlands), 5 μg BSA (New England Biolabs) and < 10 ng of template DNA or (II) 1x HotStarTaq PCR reaction mix including 1.5 mM MgCl2, 1.5 mM additional MgCl2, 0.2 mM each dNTP, 0.5 units of HotStarTaq (all Qiagen), 10 pmol each primer (Biolegio, the Netherlands) and < 10 ng of template DNA.

LSU was amplified using universal fungal primers 5.8SR and LR7 [27] (Figure 1). PCR reactions were performed in 25 μl containing either (I) 1x reaction Buffer II, 2.5 mM MgCl2, 0.2 mM each dNTP, 1 unit of AmpliTaq Gold, 5 pmol each primer and < 10 ng of template DNA or (II) 1x HotStarTaq PCR reaction mix including 1.5 mM MgCl2, 1 mM additional MgCl2, 0.2 mM each dNTP, 0.625 units of HotStarTaq (all Qiagen), 5 pmol each primer (Biolegio, the Netherlands) and < 10 ng of template DNA.

PCRs were performed on MyCycler (BioRad Laboratories, CA, USA) or GeneAmp 9700 (Applied Biosystems) thermocyclers. The following cycling parameters were used for ITS: (I) 10 min denaturation step at 94 °C; 32 cycles of 94 °C for 1 min, 50 °C for 1 min, and 72 °C for 2 min; 10 min elongation step at 72 °C or (II) 15 min denaturation step at 95 °C; 35 cycles of 94 °C for 1 min, 50 °C for 1 min, and 72 °C for 1 min; 10 min elongation step at 72 °C. For amplification of LSU the following cycling parameters were used: (I) 10 min denaturation step at 94 °C; 35 cycles of 94 °C for 1 min, 55 °C for 1 min, and 72 °C for 2 min; 10 min elongation step at 72 °C or (II) 15 min denaturation step at 95 °C; 35 cycles of 94 °C for 1 min, 55 °C for 90 s, and 72 °C for 90 s; 10 min elongation step at 72 °C.
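As a worked illustration of the quantities above, the following Python sketch derives the final primer concentration and the programmed hold time of each cycling program from the stated values. The function names are hypothetical, and the run times ignore temperature ramping, so actual instrument run times will be longer.

```python
def primer_micromolar(pmol, reaction_ul):
    """Final primer concentration; 1 pmol per microlitre equals 1 micromolar."""
    return pmol / reaction_ul


def program_minutes(initial_min, cycles, step_seconds, final_min):
    """Programmed hold time in minutes, ignoring ramping between temperatures."""
    return initial_min + cycles * sum(step_seconds) / 60 + final_min


# Primer amounts per 25 microlitre reaction, as stated in the text.
its_primer_I = primer_micromolar(20, 25)   # ITS, condition I
its_primer_II = primer_micromolar(10, 25)  # ITS, condition II
lsu_primer = primer_micromolar(5, 25)      # LSU, both conditions

# Cycling programs: (initial denaturation, cycles, per-cycle steps in s, final elongation).
its_I = program_minutes(10, 32, (60, 60, 120), 10)
its_II = program_minutes(15, 35, (60, 60, 60), 10)
lsu_I = program_minutes(10, 35, (60, 60, 120), 10)
lsu_II = program_minutes(15, 35, (60, 90, 90), 10)

print(its_primer_I, its_primer_II, lsu_primer)
print(its_I, its_II, lsu_I, lsu_II)
```

Under these assumptions the primer concentrations are 0.8, 0.4 and 0.2 μM, and the programmed hold times are 148, 130, 160 and 165 minutes respectively.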

Figure 1. Relative positions of DNA markers and primers (arrows) used in this study.

PCR success was determined by gel electrophoresis followed by ethidium bromide staining and UV detection. Positive and negative controls were used throughout. PCR products were sequenced using the abovementioned primers ITS1-F and ITS4-R for ITS and primers 5.8SR, LR0R, LR16 and LR7 for LSU [27,28], either commercially (BaseClear B.V., Leiden, the Netherlands) or in house. In-house sequencing consisted of purification of PCR products with ExoSAP-IT® (Affymetrix) following the manufacturer’s protocol; cycle sequencing with 0.5-2 μl of purified PCR product in a 10 μl sequencing reaction containing 1x BigDye sequencing buffer, 0.2x BigDye Terminator v3.1 Ready Reaction Mix (Life Technologies) and 2.5 pmol of primer, for 1 min at 96 °C followed by 28 cycles of 95 °C for 10 s, 50 °C for 5 s and 60 °C for 4 min in the abovementioned thermocyclers; removal of residual sequencing components with the BigDye XTerminator® Purification Kit following the manufacturer’s protocol; and separation of sequencing products on an AB3500 XL genetic analyzer containing 50 cm capillaries using POP-7™ polymer (all Life Technologies).

2.4. Data analysis

Evaluation of sequences and assembly of forward and reverse sequences were performed with Geneious 4.7.6 [29]. Sequence data were compared with GenBank [24] through BlastN in March 2014. To compare sequences from this study to previously published sequences and sequences submitted to GenBank, a local copy was made of all ITS1, 5.8S, ITS2 and LSU sequences named Psilocybe, Deconica or Panaeolus present in GenBank on March 23rd 2014. For readability, all species considered a (non-potent) Deconica species [18,24,30-32] will be referred to as Deconica, independent of whether these sequences were originally deposited in GenBank as a Psilocybe or Deconica sequence. To further investigate a peculiarity, three ITS1, 5.8S, ITS2 sequences named clavata were additionally downloaded. Several sequences were removed from the dataset prior to analyses: a sequence containing more than 25 N’s (GU565174); ITS1-5.8S-ITS2 sequences that differed from other sequences from that genus in the 5.8S region by more than 5 nucleotides (EU870081, KF429543, JF961375 and JF961370); LSU sequences shorter than 500 nucleotides or containing only LSU regions other than the studied 5’ D1-D2 region (multiple); and one accession that differed from all other Psilocybe, Deconica and Panaeolus sequences in LSU length and sequence (FJ755221). Alignments of ITS1 and ITS2 sequences were manually edited in Geneious to accommodate length differences: the readily alignable ‘conserved’ 5’ and 3’ regions were not edited, whereas the highly variable regions were added to the 5’ ‘conserved’ region, thereby reflecting sequence length more than sequence composition. The obtained alignments were used to create maximum likelihood trees in Mega 5.2.2 [33] (using the Tamura-Nei model of evolutionary distances with invariable sites [34], 1000 bootstrap replicates, without deletion of missing data/gaps) and to calculate pairwise distances between sequences (Tamura-Nei model [34], with gamma distribution, pairwise removal of ambiguous positions). The distance matrices were ordered by species and genus to enable selection of intraspecies distances, interspecies but intragenus distances and intergenus distances.
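Two of the steps described above, the exclusion of low quality or short sequences and the sorting of pairwise distances into three comparison categories, can be sketched in Python as follows. This is a minimal illustration and not the procedure used in this study, which relied on Geneious and Mega; the function names, sample identifiers and distance values are hypothetical, and the distances are assumed to have been calculated beforehand.

```python
from itertools import combinations


def passes_filters(seq, marker, max_n=25, min_lsu_len=500):
    """Apply two of the exclusion rules: too many N's, or an LSU sequence that is too short."""
    if seq.upper().count("N") > max_n:
        return False
    if marker == "LSU" and len(seq) < min_lsu_len:
        return False
    return True


def partition_distances(labels, dist):
    """Sort pairwise distances into intraspecies, interspecies/intragenus and intergenus.

    labels: dict mapping sample name -> (genus, species)
    dist:   dict mapping frozenset({name_a, name_b}) -> precomputed distance
    """
    groups = {"intraspecies": [], "interspecies_intragenus": [], "intergenus": []}
    for a, b in combinations(sorted(labels), 2):
        (genus_a, species_a), (genus_b, species_b) = labels[a], labels[b]
        d = dist[frozenset((a, b))]
        if genus_a != genus_b:
            groups["intergenus"].append(d)
        elif species_a != species_b:
            groups["interspecies_intragenus"].append(d)
        else:
            groups["intraspecies"].append(d)
    return groups


# Hypothetical samples and distances, for illustration only.
labels = {
    "s1": ("Psilocybe", "cubensis"),
    "s2": ("Psilocybe", "cubensis"),
    "s3": ("Psilocybe", "mexicana"),
    "s4": ("Panaeolus", "sphinctrinus"),
}
dist = {
    frozenset(("s1", "s2")): 0.00,
    frozenset(("s1", "s3")): 0.05,
    frozenset(("s2", "s3")): 0.05,
    frozenset(("s1", "s4")): 0.20,
    frozenset(("s2", "s4")): 0.20,
    frozenset(("s3", "s4")): 0.21,
}
groups = partition_distances(labels, dist)
print(groups)
```

Keying the distances on unordered pairs avoids storing both halves of a symmetric matrix; the three resulting lists correspond to the three categories of distances selected from the ordered matrices.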

3. Results and discussion

3.1. Amplification of ITS and LSU

Markers ITS and LSU were amplified directly in known samples and seizures with the ‘universal’ primers and both conditions described. Amplicons of roughly 500 to 800 bp for ITS and 1800 bp for LSU were obtained. Although differences have been noted in PCR performance of these markers [16,17], such differences are not expected in the studied species as these are closely related. Positive PCR results were obtained with as little as 0.5 ng total DNA as template per reaction. Whilst sample or DNA availability was not limiting in amplification of known samples, the detection of such small amounts of DNA is of great value in forensic investigations, where samples are often poorly preserved, powdered, or otherwise formulated, leading to degradation of DNA. This is demonstrated by amplification of both markers in the investigated seizures.

3.2. ITS and LSU sequence data

Using the described primers, markers ITS1, 5.8S, ITS2 and LSU (1st 900 nucleotides, variable regions D1 and D2) were sequenced for all samples (see Table 1 for accession numbers of sequences deposited in GenBank). The sequence of the complete ITS amplicon was obtained in a single sequencing reaction, whilst at least three sequencing reactions were needed to obtain the complete LSU sequence (see Figure 1 for primer locations). Neither long mononucleotide stretches nor other challenging internal structures were encountered. The sequences of all samples were aligned, both pairwise and as a group. The LSU region of the sequences was easily aligned for all Psilocybe, Deconica and Panaeolus species. The 5.8S region of all sequences was also readily aligned for all tested species. This is in agreement with the nature of the LSU and 5.8S regions, as these parts of the DNA are under evolutionary constraint, as has been demonstrated in large scale studies of fungi [17]. Alignment of ITS1 and ITS2 of different samples of the same species was also straightforward as no large length differences were observed within a species. Alignment of ITS1 and ITS2 of samples of different species proved more troublesome, as not only nucleotide but also length differences were present between the different species. This effect was more pronounced in ITS1 than in ITS2. However, by aligning the sequences as described, reflecting sequence length differences, it was possible to further investigate the differences between the different regions.

Between the two ITS1-5.8S-ITS2-LSU sequences of Psilocybe cubensis that were generated from authenticated samples (HM035074 and HM035075), the market sample sold as Psilocybe cubensis (HM035082) and multiple seized samples, no differences in sequence length or composition were detected, demonstrating absence of intraspecies variability at least within the available Psilocybe cubensis samples. The sequences of the two Deconica montana strains (HM035078 and HM035079) differed from each other by one position in ITS1 and one position in ITS2. No differences were detected between the 5.8S or LSU regions of these sequences, illustrating minor intraspecies variability in Deconica montana. Comparison of the sequence obtained from

19 Chapter 1

the authenticated Psilocybe mexicana strain (HM035077), the market sample sold as Psilocybe mexicana (HM035083) and all seized sclerotia demonstrated the presence of three variable positions in ITS1, two in ITS2 and none in both 5.8S and LSU. No intraspecies length differences were observed for these three species. Through interspecies comparison of pairs of sequences both large differences in sequence length and sequence composition were detected.

3.3. BLAST results

The sequences obtained from the known samples (Table 1) were compared to all sequences present in NCBI GenBank to find the deposited sequences that were most similar at both the individual nucleotide and length level. These comparisons were hampered by the fact that only very few ITS1-LSU sequences were present in GenBank, and that blasting our long ITS1-LSU sequences therefore automatically led to a high bit-score (and low E-value) with the long sequences present, even though the actual percentage of similarity was too low to reflect sequences from the same species. By dividing our sequences into smaller portions that correspond to portions more frequently deposited in GenBank (i.e. ITS1, ITS1-5.8S-ITS2 and LSU separately), this artifact was circumvented. However, comparison of these different regions led to the finding of significantly different results for the different regions of the Psilocybe cubensis, Deconica montana and Panaeolus sphinctrinus sequences.
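The length artifact described above can be made concrete with a toy calculation. The sketch below is not BLAST's actual statistic: it assumes a simple ungapped score with +1 per match and -2 per mismatch, and the lengths and identities are invented. It only shows why a long alignment of moderate identity can outscore a short alignment of perfect identity.

```python
def raw_score(length, identity, match=1, mismatch=-2):
    """Toy ungapped alignment score.

    The +1/-2 scoring scheme is an assumption for illustration only;
    real BLAST bit-scores are normalized from raw scores and allow gaps.
    """
    matches = round(length * identity)
    mismatches = length - matches
    return matches * match + mismatches * mismatch


# Hypothetical hits: a long hit at 90% identity
# versus a short hit at 100% identity.
long_hit = raw_score(length=1500, identity=0.90)
short_hit = raw_score(length=250, identity=1.00)
```

Under these assumptions the long hit scores 1050 against 250 for the short one, although only the short hit is identical to the query. Comparing region by region, as done here, keeps alignment lengths comparable so that percentage identity becomes informative again.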

The ITS1 region of the Psilocybe cubensis sequences HM035074 and HM035075 obtained in this study was identical in length and sequence composition to ITS1 sequences deposited in GenBank with species descriptions Psilocybe cubensis (four accessions), Deconica montana (AY129360), Panaeolus sphinctrinus (AY129348) and Galerina clavata (AY281021). Moreover, the ITS1 region of the Psilocybe cubensis sequences HM035074 and HM035075 differed from two other sequences deposited in GenBank as Psilocybe cubensis (AY281023 and AY129351) both in length and sequence. Comparison of the ITS1-5.8S-ITS2 region of the Psilocybe cubensis sequences HM035074 and HM035075 to GenBank only returned sequences with identical length and composition that had been deposited as Psilocybe cubensis (two accessions). Comparison of the LSU region of these Psilocybe cubensis sequences returned a large number of hits with sequences of the same length. The accessions in the database with the most comparable sequence have been deposited as Psilocybe cubensis (three accessions) or the closely related Psilocybe subcubensis (two accessions).

Comparable results were found for the Deconica montana sequences HM035078 and HM035079. ITS1 sequences with comparable length and sequence present in GenBank have been deposited as Deconica montana (four accessions) or Psilocybe cubensis (AY129351). ITS1 accession AY129360, deposited as Deconica montana, differed in both length and sequence composition from the ITS1 region of HM035078 and HM035079. However all sequences present in GenBank matching the ITS1-5.8S-ITS2 region or the LSU region have been deposited as Deconica montana or closely related Deconica species.

For the ITS1 region of Panaeolus sphinctrinus sequence HM035081, all sequences with comparable length and sequence present in GenBank have been deposited as Panaeolus sphinctrinus. One ITS1 accession named Panaeolus sphinctrinus (AY129348) differed in both length and sequence from the ITS1 region of HM035081. All accessions matching the ITS1-5.8S-ITS2 region or the LSU region have been deposited as Panaeolus sphinctrinus or closely related Panaeolus species.

These results may indicate that marker ITS1 is insufficient to correctly identify certain species of interest due to absence of a barcoding gap as identical sequences are found in (at least) four genera, as suggested by Nugent and Saville [20]. To further investigate the similarities and differences between the different markers in these genera, especially as the sequences causing this finding have been generated in two studies [20,35], all ITS1, ITS1-5.8S-ITS2 and LSU sequences of Psilocybe, Deconica and Panaeolus present in GenBank were downloaded, as were all Galerina clavata sequences for these regions. These deposited sequences, together with our sequences, were aligned as described for ITS1, ITS1-5.8S-ITS2 and LSU.

3.4. Evaluation of markers

3.4.1 ITS1

Alignment of ITS1 sequences was performed as described, enabling the distinction of several groups that displayed minimal variability in both length and DNA sequence. Because several sequences were significantly shorter than others and the sequences were not all trimmed to the same starting and ending position, the placement of these sequences in the Maximum Likelihood tree was influenced. In the Maximum Likelihood tree (Supplementary Figure S1) four groups could roughly be distinguished: the Panaeolus species, the Deconica species, a diverse Psilocybe group harboring the Galerina clavata sequences and a small group containing sequences of the two Psilocybe species Psilocybe calongei and Psilocybe magnivelaris. Although the majority of sequences was placed in the expected group, several sequences were placed in unexpected positions in this tree. Firstly one accession of Psilocybe fasciata (DQ001401) was placed in the Deconica group whereas the other accession of this Psilocybe species (AB158635) was placed in the Psilocybe group. Secondly, the Psilocybe australiana sequence (AY129366) was placed in the Deconica group where placement near Psilocybe subaeruginosa would be expected.
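Because untrimmed sequences of unequal coverage influenced tree placement, one possible remedy is to restrict an alignment to the columns covered by every sequence. The sketch below was not part of this study; it assumes gapped sequences of equal length in which leading and trailing '-' characters mark uncovered positions, and the function name and toy sequences are hypothetical.

```python
def trim_to_shared_region(aligned):
    """Trim an alignment to the columns covered by every sequence.

    aligned: dict mapping name -> gapped sequence, all of equal length.
    Leading and trailing '-' are treated as missing coverage; internal
    gaps are left untouched.
    """
    length = len(next(iter(aligned.values())))
    start, end = 0, length
    for seq in aligned.values():
        if len(seq) != length:
            raise ValueError("sequences must be aligned to equal length")
        missing_left = len(seq) - len(seq.lstrip("-"))
        missing_right = len(seq) - len(seq.rstrip("-"))
        start = max(start, missing_left)
        end = min(end, length - missing_right)
    return {name: seq[start:end] for name, seq in aligned.items()}


# Toy alignment, invented for illustration.
toy = {
    "full": "ACGTTGCAAC",
    "short_5prime": "--GTTGCAAC",
    "short_3prime": "ACGTTGCA--",
}
trimmed = trim_to_shared_region(toy)
```

Trimming discards informative positions at both ends, so whether it is appropriate depends on how many short accessions are present.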

A third incongruity was found in that three groups of near identical sequences were recognized in the tree that consisted of multiple species that are not closely related: (I) the “Psilocybe cubensis subgroup”, consisting of Psilocybe cubensis, Psilocybe subcubensis, Deconica montana, Panaeolus sphinctrinus and Galerina clavata sequences placed within the Psilocybe group; (II) the “Deconica montana subgroup” within the Deconica group, consisting of Deconica montana and Psilocybe cubensis sequences; and (III) the “Galerina clavata subgroup”, consisting of Galerina clavata and Psilocybe cubensis sequences. A separate comparison of the ITS1 sequences of Psilocybe cubensis, Deconica

montana and Galerina clavata supplemented with the available Panaeolus sphinctrinus sequences (Figure 2), shows that four groups of near identical sequences can be identified that do not correspond to the four groups based on the species name of the accessions. We hypothesize that the groups with near identical sequences actually are single species groups, and that the heterogeneity of the groups is caused by misidentification, contamination or accidental swapping of samples, DNA extracts, PCR products or sequence data of AY129348 (Panaeolus sphinctrinus), AY129351 (Psilocybe cubensis) and AY129360 (Deconica montana) [20] and AY281021 (Galerina clavata) and AY281023 (Psilocybe cubensis) [35].
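A simple screen for this kind of heterogeneity is to group identical sequences and flag every group that carries more than one species name. The sketch below is a simplification: it uses exact identity rather than near identity, and the accession labels, species names and sequences are placeholders, not GenBank records.

```python
from collections import defaultdict


def mixed_name_groups(records):
    """Group identical sequences and return the groups that carry
    more than one species name.

    records: iterable of (accession, species_name, sequence) tuples.
    """
    groups = defaultdict(list)
    for accession, species, sequence in records:
        groups[sequence.upper()].append((accession, species))
    return [members for members in groups.values()
            if len({species for _, species in members}) > 1]


# Placeholder records, invented for illustration.
records = [
    ("ACC1", "Species A", "ACGTACGT"),
    ("ACC2", "Species A", "acgtacgt"),
    ("ACC3", "Species B", "ACGTACGT"),
    ("ACC4", "Species B", "TTGTACGA"),
]
flagged = mixed_name_groups(records)
```

A flagged group does not by itself show which name is wrong; as in this study, resequencing of the underlying specimens is needed to decide.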

Figure 2. Alignment of ITS1 sequences deposited in GenBank as Psilocybe cubensis, Deconica montana, Panaeolus sphinctrinus and Galerina clavata. Grey: identical positions; green: adenine; blue: cytosine, black: guanine, red: thymine.

To test this hypothesis, samples of Panaeolus sphinctrinus DAOM 180389, Psilocybe cubensis DAOM 169061 and Deconica montana DAOM 167409 held by the Canadian Collection of Fungal Cultures were requested, from which sequences AY129348, AY129351 and AY129360 are said to have been derived. Unfortunately samples or DNA could not be transferred, however the original samples were resequenced and these sequences kindly provided by Scott A. Redhead. Comparison of these unpublished sequences to the previously deposited sequences [20] strengthens our hypothesis that the published ITS1 sequences are erroneous. We therefore removed the abovementioned sequences from our dataset, prior to estimation of the ITS1 intra and interspecies variation.

Pairwise distances were calculated between all pairs of ITS sequences; identical sequences resulting in a distance of 0. The highest difference in the dataset (0.765) was found between Psilocybe caerulipes AY129371 and Psilocybe subcubensis KC669297. All observed pairwise distances between ITS1 sequences obtained from the same Psilocybe species were below 0.17 (Figure 3A), with the vast majority of pairwise distances (95%) below 0.05. Within the genus Deconica, more variation was detected (Figure 3B) with only 55% of distances lower than 0.05. Closer inspection of the data showed that this was mainly due to the various Deconica montana and Deconica coprophila accessions differing by multiple nucleotides. For the majority of Psilocybe and Deconica species the interspecies variation proved far higher than 0.05 (Figures 3A-B), due to both differences in sequence length and composition. Although a higher interspecies than intraspecies variation is visible, a true barcoding gap is absent. Upon further inspection of

Figure S1 and the sequence alignments, several groups without interspecies variation were detected within the genus Psilocybe (e.g. Psilocybe cyanescens – Psilocybe subaeruginosa, Psilocybe serbica – Psilocybe moravica and Psilocybe arcana – Psilocybe bohemica, Supplementary Figure S1). Additionally, groups were detected with low interspecies variation. These consisted of sequences with identical sequence length, but slight differences in nucleotide composition (e.g. Psilocybe cubensis – Psilocybe subcubensis and Psilocybe allenii – Psilocybe cyanescens).
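The notion of a barcoding gap used here can be expressed as a simple check: a gap exists only if the smallest interspecies distance exceeds the largest intraspecies distance. The sketch below assumes two plain lists of distances; the values are invented and chosen only to mimic the situation described, where most intraspecies distances are small but the ranges overlap.

```python
def barcoding_gap(intra, inter):
    """Return (largest intraspecies distance, smallest interspecies
    distance, whether a gap separates the two distributions)."""
    max_intra = max(intra)
    min_inter = min(inter)
    return max_intra, min_inter, min_inter > max_intra


def fraction_below(distances, threshold):
    """Share of distances strictly below the threshold."""
    return sum(d < threshold for d in distances) / len(distances)


# Toy distances, invented for illustration.
intra = [0.0, 0.0, 0.01, 0.02, 0.16]
inter = [0.0, 0.04, 0.20, 0.35, 0.60]
max_intra, min_inter, has_gap = barcoding_gap(intra, inter)
share_small = fraction_below(intra, 0.05)
```

In this toy case 80% of intraspecies distances lie below 0.05, yet no gap is present because one pair of different species is identical.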

Figure 3. Frequency of intraspecies (light blue), intragenus (red) and intergenus (green and purple) pairwise differences between ITS1 sequences. A: Psilocybe, B: Deconica, C: Panaeolus.
Panel title: Pairwise distances of ITS1 sequences. Horizontal axis: pairwise distance; vertical axis: frequency of observed pairwise distance. Value printed in the figure: 0.74.
Number of comparisons: Psilocybe intraspecies n=142, Psilocybe interspecies n=3018; Deconica intraspecies n=40, Deconica interspecies n=236; Panaeolus intraspecies n=21, Panaeolus interspecies n=279; between Deconica sp. and Psilocybe sp. n=1920; between Deconica sp. and Panaeolus sp. n=634; between Psilocybe sp. and Panaeolus sp. n=2000.

In the genus Panaeolus, a different pattern was observed (Figure 3C). Upon inspection of Figure S1 and the sequence alignments, two Panaeolus sphinctrinus sequences (FJ55227 and HE819397) were found to be identical to a Panaeolus rickenii sequence (JF908516), but this group of sequences differed from a group of two other Panaeolus sphinctrinus sequences (DQ182503 and HM035081) and a second Panaeolus rickenii sequence (JF908523) in sequence length and composition. The same was observed for two different sequences of Panaeolus campanulatus and Panaeolus retirugis. As Panaeolus sphinctrinus is the only species for which more than two accessions are available, the available dataset was considered insufficient to calculate intra and interspecies variation. Sequencing of Panaeolus specimens is advisable to evaluate the potential for sequence based identification of this genus.

3.4.2 ITS1-5.8S-ITS2

Although fewer complete ITS1-5.8S-ITS2 sequences were available than ITS1 sequences (124 and 151 respectively), the overall alignment and obtained trees were in agreement with one another. Alignment of the 5.8S region was straightforward, with only two sequences (KC66926 and KC669300) differing from all others in sequence length and only two additional variable positions in the complete dataset. Alignment of ITS2 proved less troublesome than alignment of ITS1, as less sequence length variation was observed. In the Maximum Likelihood tree (Supplementary Figure S2) the same groups could be distinguished as described for ITS1 alone.

The patterns of intraspecies sequence variation that were described when marker ITS1 was considered were also present when both markers were considered (Figure 4), or when only marker ITS2 was considered (data not shown). The calculated pairwise distances between all pairs of ITS1-5.8S-ITS2 sequences were between 0 and 0.348 (Figure 4). As for marker ITS1, the lowest

intraspecies variation was found within the genus Psilocybe with 95% of differences below 0.02. The highest Psilocybe intraspecies difference was 0.067 (Figure 4A). The highest intraspecies values observed for Deconica and Panaeolus were 0.094 and 0.103 respectively (Figures 4B and 4C). As described for ITS1, the histogram reflecting the frequency of observed pairwise difference values demonstrates a higher between species variation than within species variation for Psilocybe and Deconica, although a true barcoding gap is absent.
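Histograms of this kind can be derived from a list of pairwise distances by counting the values per bin and dividing by the number of comparisons. The following is a minimal sketch under assumed inputs: the bin width, number of bins and distance values are invented, and values beyond the last bin are pooled into it.

```python
def relative_frequencies(distances, bin_width, n_bins):
    """Relative frequency of distances per bin [k*w, (k+1)*w).

    Values at or beyond the upper limit are counted in the last bin.
    Values lying exactly on a bin edge are subject to floating point
    rounding, so the toy data below avoid such values.
    """
    counts = [0] * n_bins
    for d in distances:
        k = min(int(d / bin_width), n_bins - 1)
        counts[k] += 1
    total = len(distances)
    return [c / total for c in counts]


# Toy distances, invented for illustration.
toy_distances = [0.0, 0.005, 0.012, 0.015, 0.048]
freqs = relative_frequencies(toy_distances, bin_width=0.01, n_bins=5)
```

Plotting the intraspecies and interspecies lists with the same bins makes any overlap between the two distributions directly visible.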

Figure 4. Frequency of intraspecies (light blue), intragenus (red) and intergenus (green and purple) pairwise differences between ITS1-5.8S-ITS2 sequences. A: Psilocybe, B: Deconica, C: Panaeolus.
Panel title: Pairwise distances of ITS1-5.8S-ITS2 sequences. Horizontal axis: pairwise distance; vertical axis: frequency of observed pairwise distance. Value printed in the figure: 0.82.
Number of comparisons: Psilocybe intraspecies n=105, Psilocybe interspecies n=2040; Deconica intraspecies n=33, Deconica interspecies n=138; Panaeolus intraspecies n=16, Panaeolus interspecies n=215; between Deconica sp. and Psilocybe sp. n=1254; between Deconica sp. and Panaeolus sp. n=418; between Psilocybe sp. and Panaeolus sp. n=1452.

Further inspection of Figure S2 and sequence alignments showed that accessions that were (near) identical in the ITS1 region, were (near) identical in the ITS2 region and sequences differing in sequence length in the ITS1 region were easily distinguished in the ITS2 region. The species with a moderate ITS1 intraspecies variation (e.g. D. coprophila, D. montana) also displayed a large ITS2 intraspecies variation. The species that showed no or little interspecies variation (e.g. Psilocybe cyanescens - Psilocybe subaeruginosa and Psilocybe cubensis and Psilocybe subcubensis) did not display interspecies variation for ITS2 either. The species that could not be identified based on their ITS1 sequence due to large intraspecies variation (e.g. several Panaeolus species), could also not be identified based on their ITS2 sequences. As described for ITS1, further investigation of such species groups should be performed to determine the true nature of these findings.

3.4.3 LSU

Alignment and grouping of the 129 LSU sequences was easily performed as no significant length differences were present. Independent of the chosen distance model and tree building method (data not shown), three main groups could be distinguished: the Psilocybe species, the Deconica species and the Panaeolus species (Maximum Likelihood tree shown in Supplementary Figure S3). The groups are in agreement with the ITS1 and ITS1-5.8S-ITS2 trees (Supplementary Figures S1 and S2). Where a small fourth Psilocybe group was recognized for ITS1 and ITS1-5.8S-ITS2, no LSU sequences are available for these species, preventing further comparison.

Within the LSU tree, three incongruities were observed. One accession of Psilocybe silvatica (AF042618) was placed in the Deconica group whereas the other accession of this MM species (AY129383) was placed in the Psilocybe group. Secondly, the only Panaeolus uliginosus (AY129384)

accession was placed within the Psilocybe group. Also, a Psilocybe cyanescens accession (EU029946) was placed in the Panaeolus group, between three Panaeolus cyanescens sequences. Several groups were also discovered that consisted of more than one species that did not show any sequence differences. Moreover, other accessions carrying the same species names were present in different groups. Whether these groups consist of wrongly named sequences or of closely related species displaying less between than within species variation could not be determined without resequencing multiple samples. Unfortunately, due to strict regulations governing the transport and possession of the species of interest, obtaining these samples for additional investigation is nearly impossible, even when collection curators are willing to contribute material. However, as marker LSU is often considered uninformative for species identification due to lack of a sufficient barcoding gap [17], we hypothesize that LSU accessions AF042618 (Psilocybe silvatica), AY129384 (Panaeolus uliginosus) and EU029946 (Psilocybe cyanescens) may have been wrongly identified, that contamination occurred during laboratory procedures or that sequence mix-up has occurred. These questioned sequences were removed from our dataset prior to estimation of the LSU intra and interspecies variation. The histogram reflecting the observed pairwise difference values of LSU sequences (Figure 5) shows a higher interspecies than intraspecies variation; however, even less of a barcoding gap is observed than for ITS1 and ITS1-5.8S-ITS2.

Figure 5. Frequency of intraspecies (light blue), intragenus (red) and intergenus (green and purple) pairwise differences between LSU sequences. A: Psilocybe, B: Deconica, C: Panaeolus.
Panel title: Pairwise distances of LSU sequences. Horizontal axis: pairwise distance; vertical axis: frequency of observed pairwise distance. Values printed in the panels: 0.65 (A), 0.42 (B), 0.42 (C).
Number of comparisons: Psilocybe intraspecies n=89, Psilocybe interspecies n=1808; Deconica intraspecies n=50, Deconica interspecies n=621; Panaeolus intraspecies n=12, Panaeolus interspecies n=133; between Deconica sp. and Psilocybe sp. n=2366; between Deconica sp. and Panaeolus sp. n=646; between Psilocybe sp. and Panaeolus sp. n=1071.

The calculated pairwise distances between all LSU sequence pairs ranged between 0 and 0.089 (Figure 5). LSU differed from ITS1 and ITS1-5.8S-ITS2 in that the frequency of identical sequences obtained within a species was higher than for the other markers: 65%, 42% and 42% for Psilocybe, Deconica and Panaeolus respectively (Figure 5). Also, the pattern observed for Panaeolus as a whole was more comparable to the patterns of Psilocybe and Deconica. Upon further evaluation of Figure S3 and the sequence alignments, the within species variation of Psilocybe semilanceata and Panaeolus sphinctrinus was found to be due to one single sequence differing from the other available sequences (AY129380 and FH755227 respectively). As shown in Figure S3, these two accessions were placed in other subgroups than the other sequences of these species and therefore require further evaluation.

3.4.4 Comparison of markers

If all sequences that in our hypothesis are incorrectly named are omitted from the comparisons, only single species or related species groups with near identical sequences are found for all studied regions within the genera Psilocybe and Deconica. The resulting inter- and intraspecific variation for ITS1, ITS1-5.8S-ITS2 and LSU in this selection of accessions present in GenBank is in agreement with the within species variation presented by others studying the usability of ITS as a marker for barcoding and species identification [16,17,36,37]. Additionally, the authors of the study rejecting the use of ITS in favor of LSU for the forensic identification of magic mushrooms [20] based their conclusion on the sequences that we consider erroneous after resequencing of these samples. Extrapolation of these results for fungi in general, and for the genera Psilocybe and Deconica in particular, to the genus Panaeolus would imply that markers ITS1 and ITS2 are suitable to identify at least groups of closely related species. However, the (unvalidated) Panaeolus sequences currently present in GenBank do not support this statement, as sequences of several unrelated species are identical whilst sequences bearing the same species name differ significantly.

The presence of incorrectly identified or described sequences in GenBank has been demonstrated for fungi [37,38] and other species [11], as has the presence of imprecise sequence editing of fungal sequences (e.g. [39]). For the future, alternative databases with more stringent quality control mechanisms for sample identification and trace file availability, including the Barcode of Life database [40] and UNITE [37], have been suggested to improve the feasibility of DNA based identification. Although these efforts will certainly improve the overall reliability of DNA databases for species identification, human errors including contamination and accidental swapping of samples or sequences will unfortunately remain potential causes of incorrect sequences being submitted to databases. Additionally, at present GenBank contains such a wealth of information that ignoring such databases for this reason would be a waste of resources, even though such errors may be hard to detect when only minute numbers of sequences of a certain taxonomic group are available. The interpretation of comparisons with databases therefore remains a crucial step when molecular markers and public databases are used to identify unknown samples, especially when the results are to be used in forensic investigations. Accurate and transparent records of such comparisons should be kept, including the value assigned to the obtained results, such as previously described for the DNA based identification of plants [13,41]. Additionally, many laboratories will maintain an additional collection of sequences from known samples that may be used to validate public databases or obtained results when using such databases for forensic purposes.

4. Conclusions

Comparison of the sequences obtained in this study and sequences present in GenBank illustrates advantages and disadvantages of the studied DNA regions for answering different forensic questions. Both ITS and LSU amplicons were obtained from all samples in this study, including police seizures consisting of dried and powdered samples. Due to amplicon length, amplification of marker ITS is expected to still be feasible when amplification of LSU fails. Additionally, due to the length of the informative region, the LSU sequences used in this study can only be obtained through multiple sequencing reactions, which gives rise to higher costs and more opportunities for error, whilst the complete ITS amplicon is readily sequenced in a single reaction.

Of the studied markers, ITS1-5.8S-ITS2 is shown to be the most informative region for identification at the species level, outperforming the separate ITS1 and ITS2 regions and far superior to LSU. The sequences described in this study illustrate that previous disqualification of marker ITS1 for the forensic identification of magic mushrooms [20] is based on at least three erroneous sequences and that marker ITS1 does not perform differently in the genera Psilocybe and Deconica than in other species of fungi. The length of both markers ITS1 and ITS2 was constant within different accessions of a species, whereas variation of up to 50 bp was observed in ITS1 length between species of Psilocybe. Although alignment of such sequences is challenging and could be considered arbitrary, if the sequence of an unknown sample is identical to a reference sequence in both length and sequence, length differences do not hamper the identification process, but add to the variation, thereby increasing the certainty with which a sample can be identified. Several groups of closely related species have been identified that, based on the current sequences available, cannot be distinguished due to insufficient interspecies variation. Only the addition of ITS, LSU and other marker sequences obtained from authenticated reference samples, ideally generated by multiple laboratories with stringent quality control measures, will help discern whether these groups of species can be distinguished by molecular markers.
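Identification by identity in both length and sequence requires no alignment and can be sketched as a plain string comparison against a reference collection. The reference names and sequences below are placeholders; a real collection would hold validated sequences from authenticated samples.

```python
def identify(unknown, references):
    """Return the species names whose reference sequence is identical
    to the unknown in both length and composition.

    references: iterable of (species_name, sequence) tuples.
    """
    query = unknown.upper()
    return sorted({species for species, seq in references
                   if seq.upper() == query})


# Placeholder reference collection, invented for illustration.
references = [
    ("Species A", "ACGTACGT"),
    ("Species B", "ACGTACGT"),   # identical to Species A
    ("Species C", "ACGTTACGT"),  # one nucleotide longer
]
hits = identify("acgtacgt", references)
longer = identify("ACGTTACGT", references)
no_match = identify("AAAA", references)
```

The first query returns two names, which mirrors the groups of closely related species that cannot be separated on current data; the length difference alone is enough to single out the third reference.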

In cases where the ITS1 and ITS2 sequences of an unknown sample do not match any known sequences, species identification is impossible, and the presence of the conserved 5.8S region in the obtained ITS1-5.8S-ITS2 sequence may enable genus identification through alignment to sequences of related species. Marker LSU is equally well suited for genus identification, and is sufficiently variable to distinguish between groups of closely related fungi. Through alignment of LSU sequences, an unknown sample may readily be placed in the Deconica or the Psilocybe genus, enabling categorization of non-potent or hallucinogenic species. However, as only few informative nucleotides have been identified in this marker that may be used to discriminate between closely related species, usage of only marker LSU will lead to failure to identify closely related species in more cases than when marker ITS is used, especially when only a smaller

section of this marker has been sequenced. If the legal requirement is not species identification but categorization of a sample as Psilocybe sp. as opposed to Deconica sp., marker LSU outperforms marker ITS.

At present nucleotide databases like GenBank contain a wealth of information but those who use this information to classify or identify unknown samples should be well aware of the absence of quality control of sequences submitted to such databases and the potential pitfalls accompanying this fact. To use such public databases for forensic investigations a thorough understanding of the deposited sequences is required and in some cases it may be beneficial to generate valid reference sequences instead of or in addition to publicly available sequences.

Acknowledgements We thank Scott A. Redhead (Canadian Collection of Fungal Cultures) for resequencing of DAOM 169061, DAOM 167409 and DAOM 180389.

References

[1] G. Guzmán, The hallucinogenic mushrooms: diversity, traditions, use and abuse with special reference to the genus Psilocybe, Fungi from different environments, edited by J.K. Misra, S.K. Deshmukh, Progress in mycological research, Science Publishers (2009) 256-277
[2] P. Stamets, Psilocybin mushrooms of the world. Ten Speed Press (1996)
[3] J. van Amsterdam, A. Opperhuizen, M. Koeter, W. van den Brink, Ranking the harm of alcohol, tobacco and illicit drugs for the individual and the population. European addiction research, 16(4) (2010) 202-207, DOI: 10.1159/000317249
[4] A. Winstock, M. Barratt, J. Ferris, L. Maier, Global Drugs Survey 2017, https://www.globaldrugsurvey.com/gds2017-launch/results-released/ [accessed December 26th 2017]
[5] Opium Act Schedule II, Staatsblad van het Koninkrijk der Nederlanden [Bulletin of Acts and Decrees] 2008, 486.
[6] European Monitoring Centre for Drugs and Drug Addiction (EMCDDA) http://www.emcdda.europa.eu/html.cfm/index17341EN.html [accessed December 26th 2017]
[7] S.A. Redhead, J.M. Moncalvo, R. Vilgalys, P.B. Matheny, L. Guzmán-Dávalos, G. Guzmán, (1757) Proposal to conserve the name Psilocybe () with a conserved type, Taxon 56 (1) (2007) 255–257.
[8] L.L. Norvell, Report of the nomenclature committee for fungi: 15, Taxon 59 (1) (2010) 291-293.
[9] W. Parson, K. Pegoraro, H. Niederstatter, M. Foger, M. Steinlechner, Species identification by means of the cytochrome b gene. Int J Legal Med., 114(1–2) (2000) 23–8, DOI: 10.1007/s004140000
[10] W. Branicki, T. Kupiec, R. Pawlowski, Validation of cytochrome b sequence analysis as a method of species identification. J Forensic Sci. 48(1) (2003) 83–7, DOI: 10.1520/JFS2002128
[11] N. Dawnay, R. Ogden, R. McEwing, G.R. Carvalho, R.S. Thorpe, Validation of the barcoding gene COI for use in forensic genetic species identification. Forensic Sci Int. 173(1) (2007) 1–6, DOI: 10.1016/j.forsciint.2006.09.013
[12] L. Tsai, Y. Yu, H. Hsieh, J. Wang, A. Linacre, J. Lee, Species identification using sequences of the trnL intron and the trnL-trnF IGS of chloroplast genome among popular plants in Taiwan, For. Sci. Int. 164 (2006) 193-200, DOI: 10.1016/j.forsciint.2006.01.007
[13] M. Wesselink, I. Kuiper, Species identification of botanical trace evidence using molecular markers, For. Sci. Int.: Genetics Supplement Series 1 (2008) 630-632, DOI: 10.1016/j.fsigss.2007.10.211
[14] J.S. Hopple, R. Vilgalys, Phylogenetic relationships in the genus Coprinus and dark-spored allies based on sequence data from the nuclear gene coding for the large ribosomal subunit RNA: divergent domains, outgroups, and monophyly. Molecular phylogenetics and evolution, 13(1) (1999) 1-19, DOI: 10.1006/mpev.1999.0634.
[15] J.M. Moncalvo, R. Vilgalys, S.A. Redhead, J.E. Johnson, T.Y. James, M.C. Aime, et al., One hundred and seventeen clades of euagarics, Mol. Phylogenet. Evol. 23 (2002) 357-400, DOI: 10.1016/S1055-7903(02)00027-1
[16] K.A. Seifert, Progress toward DNA barcoding of fungi. Mol Ecol. Resources 9 (2009) (Suppl. 1) 83-89.
[17] C.L. Schoch, B. Robbertse, V. Robert, D. Vu, G. Cardinali, L. Irinyi, et al., Finding needles in haystacks: linking scientific names, reference specimens and molecular data for Fungi. Database 2014 (2014), DOI: 10.1093/database/bau061
[18] V. Ramírez-Cruz, G. Guzmán, A.R. Villalobos-Arámbula, A. Rodríguez, P.B. Matheny, M. Sánchez-García, L. Guzmán-Dávalos, Phylogenetic inference and trait evolution of the psychedelic mushroom genus Psilocybe sensu lato (Agaricales). Botany, 91(9) (2013) 573-591, DOI: 10.1139/cjb-2013-0070
[19] J. Chun-I Lee, M. Cole, A. Linacre, Identification of members of the genera Panaeolus and Psilocybe by a DNA test. A preliminary test for hallucinogenic fungi, For. Sci. Int. 112 (2000) 123-133, DOI: 10.1016/S0379-0738(00)00181-X
[20] K.G. Nugent, B. Saville, Forensic analysis of hallucinogenic fungi: a DNA-based approach, For. Sci. Int. 140 (2004) 147-157.
[21] T. Maruyama, N. Kawahara, K. Yokoyama, Y. Makino, T. Fukihara, Y. Goda, Phylogenetic relationship of psychoactive fungi based on rRNA gene for a large subunit and their identification using the TaqMan assay (II), For. Sci. Int. 163 (2006) 51-58, DOI: 10.1016/j.forsciint.2004.10.028
[22] M. Kowalczyk, A. Sekuła, P. Mleczko, Z. Olszowy, A. Kujawa, S. Zubek, T. Kupiec, Practical aspects of genetic identification of hallucinogenic and other poisonous mushrooms for clinical and forensic purposes. Croatian medical journal, 56(1) (2015) 32-40, DOI: 10.3325/cmj.2015.56.32
[23] D.L. Hawksworth, P.E. Wiltshire, Forensic mycology: the use of fungi in criminal investigations. Forensic Science International, 206(1) (2011) 1-11, DOI: 10.1016/j.forsciint.2010.06.012
[24] D.A. Benson, I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, E.W. Sayers, GenBank, Nucleic Acids Res. 37(Database issue) (2009) D26-31, DOI: 10.1093/nar/gks1195
[25] M. Gardes and T.D. Bruns. 1993. ITS primers with enhanced specificity for basidiomycetes – application to the identification of mycorrhizae and rusts. Molecular Ecology 2, 113-118, DOI: 10.1111/j.1365-294X.1993.tb00005.x
[26] T.J. White, T. Bruns, S. Lee and J. Taylor. 1990. Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics. In: M.A. Innis, D.H. Gelfand, J.J. Sninsky, T.J. White (eds) PCR protocols: a guide to methods and applications, pp 315-322. Academic Press, New York.
[27] R. Vilgalys, M. Hester. Rapid genetic identification and mapping of enzymatically amplified ribosomal DNA from several Cryptococcus species. Journal of Bacteriology 172 (1990) 4238-4246, DOI: 10.1128/jb.172.8.4238-4246
[28] Vilgalys website: http://sites.biology.duke.edu/fungi/mycolab/primers.htm [accessed 2008, expired, reposted in July 2017 as https://sites.duke.edu/vilgalyslab/rdna_primers_for_fungi/]
[29] A.J. Drummond, B. Ashton, M. Cheung, J. Heled, M. Kearse, R. Moir, et al., Geneious v4.7, Biomatters Ltd, Auckland, New Zealand, 2009.
[30] P.W. Crous, W. Gams, J.A. Stalpers, V. Robert, G. Stegehuis, MycoBank: an online initiative to launch mycology into the 21st century. Studies in Mycology, 50(1) (2004) 19-22
[31] M.E. Noordeloos, The genus Deconica (WG SM.) P. KARST. in Europe–new combinations. Öst. Zeit. Pilzk., 18 (2009) 207-210.
[32] V. Robert, D. Vu, A.B.H. Amor, N. van de Wiele, C. Brouwer, B. Jabas, et al., MycoBank gearing up for new horizons. IMA fungus, 4(2) (2013) 371-379, DOI: 10.5598/imafungus.2013.04.02.16
[33] K. Tamura, D. Peterson, N. Peterson, G. Stecher, M. Nei, S. Kumar, MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology and Evolution 28 (2011): 2731-9, DOI: 10.1093/molbev/msr121
[34] K. Tamura, M. Nei, Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution 10 (1993) 512-526, DOI: 10.1093/oxfordjournals.molbev.a040023
[35] L. Guzmán-Dávalos, G.M. Mueller, J. Cifuentes, A.N. Miller, A. Santerre, Traditional infrageneric classification of is not supported by ribosomal DNA sequence data. Mycologia, 95(6) (2003) 1204-1214, DOI: 10.1080/15572536.2004.11833028
[36] P.M. Brock, H. Döring, M.I. Bidartondo, How to know unknown fungi: the role of a herbarium, New Phytologist 181 (2009) 719-724, DOI: 10.1111/j.1469-8137.2008.02703.x
[37] U. Kõljalg, R.H. Nilsson, K. Abarenkov, L. Tedersoo, A.F. Taylor, M. Bahram, et al., Towards a unified paradigm for sequence‐based identification of fungi. Molecular ecology, 22(21) (2013) 5271-5277, DOI: 10.1111/mec.12481
[38] R.H. Nilsson, M. Ryberg, E. Kristiansson, K. Abarenkov, K-H. Larsson, U. Kõljalg, Taxonomic reliability of DNA sequences in public sequence databases: a fungal perspective, PLoS ONE 1 (2006) e59, DOI: 10.1371/journal.pone.0000059
[39] J. Borovička, M. Oborník, J. Stříbrný, M.E. Noordeloos, L.P. Sánchez, M. Gryndler, Phylogenetic and chemical studies in the potential psychotropic species complex of Psilocybe atrobrunnea with taxonomic and nomenclatural notes. Persoonia: Molecular Phylogeny and Evolution of Fungi, 34 (2015) 1, DOI: 10.3767/003158515X685283
[40] S. Ratnasingham, P.D. Hebert, BOLD: The Barcode of Life Data System (http://www.barcodinglife.org). Molecular Ecology Resources, 7(3) (2007) 355-364, DOI: 10.1111/j.1471-8286.2007.01678.x
[41] ENFSI-BPM-APS-01: Best Practice Manual for the application of molecular methods for the forensic examination of non-human biological traces (version 01-November 2015) [Internet] Available from: http://enfsi.eu/documents/best-practice-manuals/

Supplementary Figure S1: Maximum Likelihood tree of 151 Psilocybe, Deconica, Panaeolus and Galerina clavata ITS1 sequences extracted from GenBank. Sequences clustering in unexpected genera are highlighted. Clades labelled in the figure: Psilocybe (incl. Galerina clavata), Deconica, Panaeolus.

Supplementary Figure S2: Maximum Likelihood tree of 124 Psilocybe, Deconica, Panaeolus and Galerina clavata ITS1-5.8S-ITS2 sequences extracted from GenBank. Sequences clustering in unexpected genera are highlighted. Clades labelled in the figure: Psilocybe (incl. Galerina clavata), Deconica, Panaeolus.

Supplementary Figure S3: Maximum Likelihood tree of 129 Psilocybe, Deconica and Panaeolus LSU sequences extracted from GenBank. Sequences clustering in unexpected genera are highlighted. Clades labelled in the figure: Psilocybe, Panaeolus, Deconica.

Chapter 2

Forensic utility of the feline mitochondrial control region – A Dutch perspective

M. Wesselink, L. Bergwerff, D. Hoogmoed, A.D. Kloosterman and I. Kuiper

Forensic Science International: Genetics 17 (2015) 25-32, http://dx.doi.org/10.1016/j.fsigen.2015.03.004 ¹

Abstract

Different portions of the feline mitochondrial DNA control region (CR) were evaluated for their informative value in forensic investigations. The 402 bp region located between RS2 and RS3, described most extensively in the past, is not efficient for distinguishing between the majority of Dutch cats, as illustrated by a random match probability (RMP) of 41%. Typing of the whole region between RS2 and RS3, with additional typing of the 5′ portion of the feline CR, decreases the RMP to 29%, increasing the applicability of such analyses for forensic investigations. The haplotype distribution in Dutch random bred cats (N = 113) differs greatly from the distributions reported for other countries, with a single haplotype, NL-A1, present in 54% of the population. The three investigated breeds showed haplotype distributions differing from each other and from the random bred cats, with haplotype NL-A1 accounting for 4%, 29% and 32% of Maine Coon, Norwegian Forest and Siamese & Oriental cats, respectively. These results indicate the necessity of validating haplotype frequencies within continents and regions prior to reporting the value of an mtDNA match. In cases where known purebred cats are involved, further investigation of the breed may be valuable.

1 Supplementary data associated with this article can be found in the online version


1. Introduction

In the Netherlands, a country with almost 17 million human inhabitants, the number of cats is estimated to fluctuate around 3 million [1,2]. As in other parts of the world, approximately 95% of these domestic cats are random bred, also referred to as European shorthair or non-breed, the remaining 5% consisting of fancy breed cats [1,3]. In the densely inhabited parts of the Netherlands, some cats live indoors (mainly fancy breed cats), while others spend time outdoors (mainly random bred cats). Few cats live independently of humans, due to human control of feral cat populations [4,5]. Due to this lifestyle and the continuous, year-round shedding of hairs, cat hairs are easily spread to the belongings of people living in proximity to cats.

When these people become involved in a crime, whether as a victim, witness or perpetrator, cat hairs can be transferred. Like hairs from other species, in this manner cat hairs can be silent witnesses of a crime, when for example they are transferred from a suspect’s clothes to tape used to restrain a victim. If nuclear STR markers can be amplified from such trace evidence, the link between the trace evidence (a hair) and the donor (the cat) can be extremely informative [6]. In our experience, as also pointed out by others [7,8], amplifying nuclear DNA (nDNA) markers from animal hairs encountered in case work is generally unsuccessful. The amplification of mitochondrial DNA (mtDNA) from shed hairs has significantly higher success rates, mainly due to the high copy number and greater stability of the mtDNA molecule and the distribution of mtDNA in hairs [9]. This makes mtDNA the DNA molecule of choice for forensic investigation of cat hairs, despite the fact that it is less informative than nDNA due to its maternal inheritance and lack of recombination [10,11].

To use the mtDNA from hairs as trace evidence, a reliable and robust technique is necessary with which a maximum of information can be obtained from such small traces containing only a limited amount of mtDNA. The mtDNA control region (CR), also referred to as displacement loop, D-loop and hypervariable regions (HV), has proven to be the most informative part of the mtDNA in many species including humans [12], dogs [13] and cats [7,14]. In humans and dogs specific portions of the CR have been recognized that have diverse characteristics and therefore are of different value for forensic purposes. The extent to which the applicability of these regions has been investigated varies.

The tandem repeat structures present in the feline CR (RS2 and RS3) have been shown to be too variable for forensic applications, as the number of repeats may vary within an individual, often prohibiting complete sequencing of both strands of DNA [15]. Due to the presence of a numt and the possibility of amplifying nDNA instead of mtDNA [16], the information content of the 3′ portion of the feline CR has only been investigated briefly [17]. The remaining two portions of the feline CR, the 5′ portion located before RS2 (also referred to as HVI and HV1a) and the middle portion located between RS2 and RS3 (also referred to as HVII and HV1b), have both been described as forensically informative regions, but their informative value has not yet been compared.

One of the factors in determining the value of a portion of the mtDNA CR may be the availability of relevant reference samples for which this region has been typed [8]. Geographical origin, random bred vs fancy breed cats and human factors all influence whether publicly available data can be used. Several studies have focused on specific parts of the mtDNA in specific breeds, random bred cats or wild cats in different regions. Portions of the mtDNA have been described for cats from islands ( and the United Kingdom) [18,19] and from several mainland European countries (Italy, and Poland) [17,20]; however, as different regions of the mtDNA were used, the data from these studies cannot be compared. Analysis of a large number of cats from different regions of the United States resulted in comparable haplotype distributions, which differed from those of a relatively small sampling of non-US cats [21], and analysis of non-US cats was advised.

To investigate whether cat breeds could be distinguished by their mtDNA, several individuals (4–25) of different breeds, mainly from the US, were typed; however, no breed-specific haplotypes were identified [21]. This fits the observation that the development of cat breeds is a relatively recent process, with most breeds originating within the last 50–100 years, and the fact that crossbreeding of cats from different breeds is permitted within breed standards, as some breeds are defined by coat pattern, hair length or color mutations alone. The genetic relationship and admixture between several breeds has been clearly shown by autosomal DNA studies [22–25]. To be able to apply feline mtDNA in forensic casework in the Netherlands, a better understanding of random bred cats and fancy breed cats in the Netherlands and North West Europe is essential. Therefore we evaluated the value of the different portions of the feline mitochondrial control region for nucleotide diversity and random match probability within random bred cats from the Netherlands. We additionally explored substructuring of the domestic cat population in the Netherlands to establish a framework for interpretation of mtDNA matches in Dutch casework.

2. Material and methods

2.1. Sample collection

A random sampling of Dutch random bred cats was obtained by collecting buccal swabs of cats owned by (relatives of) personnel of the Netherlands Forensic Institute living throughout the Netherlands. In cases where swabs of multiple littermates or parents and offspring had been collected, only one sample of each known maternal group was included in this study. This random sampling resulted in the collection of 113 random bred cats (RBC, 88%), 4 mixed breed cats, 3 British Shorthairs, 2 Maine Coons, 2 Persians, 2 Siamese, 1 Bengal, 1 Birman, and 1 Siberian Forest cat, the majority collected in the most densely inhabited part of the country (distribution of RBC samples in Fig. 1).


2.2. Laboratory procedures

DNA was extracted from buccal swabs using the DNeasy Blood and Tissue kit (Qiagen GmbH, Germany) following the manufacturer’s protocol. Total DNA concentration was estimated through spectrophotometry with a NanoDrop ND-1000 (NanoDrop Technologies, DE, USA). A portion of the feline mtDNA CR was amplified using forward primer FCB-Z [18] and reverse primer JHmtR3 [26] (Fig. 2). PCR reactions were performed in 25 μl containing 1x PCR Buffer (containing 1.5 mM MgCl2), 0.5 mM MgCl2, 0.75 unit of HotStarTaq, 0.1 mM each dNTP (all Qiagen), 5 pmol of each primer and 0.5–25 ng of template DNA. PCRs were performed on MyCycler (BioRad Laboratories) or GeneAmp 9700 (Applied Biosystems) thermocyclers using the following cycling parameters: 15 min at 95 °C; 35 cycles of 94 °C for 45 s, 58 °C for 30 s, and 72 °C for 1 min; 10 min at 72 °C. PCR success was determined by gel-electrophoresis followed by ethidium bromide staining and UV detection. Positive and negative controls were used throughout. ExoSAP-IT (Affymetrix) was used following the manufacturer’s protocol for removal of unincorporated primers and dNTPs. 1 μl of each purified PCR product was sequenced with the PCR primers and JHmtF3 [26] in a total of 10 μl, containing 1x BigDye sequencing buffer, 0.2x BigDye Terminator v3.1 Ready Reaction Mix (Life Technologies) and 5 pmol primer. Cycling parameters were 1 min at 96 °C; 28 cycles of 95 °C for 10 s, 50 °C for 5 s and 60 °C for 3 min. Residual sequencing components were removed using the BigDye XTerminator Purification Kit (Life Technologies) following the manufacturer’s protocol. Sequencing products were separated on an AB3500 XL genetic analyzer, containing 50 cm capillaries using POP-7™ polymer (all Life Technologies).

Fig. 1. Distribution of households, cats and RBC samples. Map generated with LocalFocus.

2.3. Sequence analysis

Sequences were evaluated using Geneious 4.7.6 [27]. Forward and reverse sequences were assembled into contigs (containing the RS2 region) and basecalling of the non-RS2 region was checked and edited manually. Sequences containing ambiguities and sequence assemblies producing ambiguities were resequenced. Positions remaining heteroplasmic were denoted according to IUPAC nomenclature. After assembly, consensus sequences were aligned and trimmed to remove non-control region DNA and the RS2 repeat using Geneious. Nucleotide positions (NP) 16314-16504 and 16780-220 (corresponding to GenBank U20753 and NC_001700 [16]) were retained.
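Because the mitochondrial genome is circular, a retained region such as the one between RS2 and RS3 runs past the end of the reference numbering and continues at its start. The short Python sketch below illustrates how such a wrap-around region can be extracted. It is an illustration only: the function name `circular_region` is hypothetical, the toy sequence is invented, and the sketch assumes 1-based inclusive coordinates without a position 0, which may differ from the numbering convention of the reference sequence used here.

```python
def circular_region(genome, start, end):
    """Return the region start..end (1-based, inclusive) of a circular sequence.

    When end < start the region is taken to wrap around the origin.
    """
    n = len(genome)
    if not (1 <= start <= n and 1 <= end <= n):
        raise ValueError("coordinates must lie within 1..len(genome)")
    if start <= end:
        return genome[start - 1:end]
    # Wrap-around: tail of the sequence followed by its beginning.
    return genome[start - 1:] + genome[:end]


# Invented 10-base "genome", for illustration only.
toy = "ACGTACGTAC"
print(circular_region(toy, 2, 4))   # ordinary region
print(circular_region(toy, 8, 3))   # region spanning the origin
```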


Variable nucleotide positions were identified in comparison with the first published feline mitochondrial DNA sequence [16]. The nucleotide positions of the published “Sylvester reference sequence” (SRS) [21,26] are included to facilitate comparisons with studies considering NP 16813-206, but were not used otherwise.

Fig. 2. Domestic cat control region with nucleotide positions (NP) 16315–17009/0–866 as described by Lopez et al. [16]. RS2 consists of multiple repeats of appr. 80 bp; RS3 consists of multiple repeats of appr. 10 bp. The Numt begins at NP 529 and extends to NP 8454. Sequenced portions of the control region used in this study and in [17–19], [21] and [26] are indicated, as are the positions of amplification primers FCB-Z [18] and JHmtR3 [26].

2.4. Data analysis

jModelTest 2.1.3 [28,29] was used to select the model best describing the complete 641 bp region. The HKY model [30] with invariable sites, which assumes differences in base frequencies and differences between the rates of transitions and transversions, described the data most precisely. This model was used for construction of maximum likelihood trees in MEGA 5.2 [31] (all sites, all haplotypes, 10000 bootstrap replicates). Random match probability (RMP) was calculated as $\sum p_i^2$, where $p_i$ is the frequency of the $i$-th observed haplotype, and discrimination capacity as $1 - \mathrm{RMP} = 1 - \sum p_i^2$. The unbiased haplotype diversity was calculated as $\frac{N}{N-1}\left(1 - \sum p_i^2\right)$, where $N$ is the number of samples [32]. A phylogeographic network of haplotypes was constructed with the median joining algorithm implemented in Network 4.6 (Fluxus Technology Ltd) [33].
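As an illustration of these summary statistics, the following Python sketch computes them from haplotype counts. It takes RMP as the sum of squared haplotype frequencies, which reproduces the values reported for the random bred cats in Table 3. The function name `haplotype_stats` and the dictionary layout are assumptions made for this example; the counts are those listed for random bred cats in Table 3.

```python
def haplotype_stats(counts):
    """Summary statistics for a dict mapping haplotype name -> observed count."""
    n = sum(counts.values())
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return {
        "N": n,
        "haplotypes": len(counts),
        "singletons": sum(1 for c in counts.values() if c == 1),
        # Pairs of samples sharing a haplotype, out of all sample pairs.
        "matches": sum(c * (c - 1) // 2 for c in counts.values()),
        "comparisons": n * (n - 1) // 2,
        "rmp": sum_p2,                           # random match probability
        "dc": 1 - sum_p2,                        # discrimination capacity
        "diversity": n / (n - 1) * (1 - sum_p2)  # unbiased haplotype diversity
    }


# Random bred cat counts as listed in Table 3.
RBC = {
    "NL-A1": 61, "NL-A2": 1, "NL-A3": 4, "NL-A4": 3, "NL-A5": 1, "NL-A6": 1,
    "NL-A7": 1, "NL-A8": 1, "NL-A9": 1, "NL-A10": 1, "NL-A12": 5, "NL-A14": 1,
    "NL-A15": 1, "NL-A16": 1, "NL-D1": 1, "NL-D2": 1, "NL-B1": 7, "NL-B3": 1,
    "NL-B4": 1, "NL-B5": 2, "NL-B6": 1, "NL-B10": 2, "NL-B11": 1, "NL-B12": 8,
    "NL-C1": 2, "NL-C4": 1, "NL-C5": 1, "NL-F1": 1,
}

stats = haplotype_stats(RBC)
for key, value in stats.items():
    print(key, round(value, 4) if isinstance(value, float) else value)
```

With these counts the sketch returns 1901 matches in 6328 comparisons, an RMP of 0.3066, a discrimination capacity of 0.6934 and an unbiased diversity of 0.6996, in agreement with Table 3.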

3. Results

3.1. Characteristics of the different regions

Amplicons of 1.2 to 1.6 kb were obtained from all buccal samples. From the majority of samples, fragments with varying lengths were obtained as visualized after gel-electrophoresis. Sequencing was not hampered by the presence of these different fragments. Sequences of on average 641 bp were obtained from 129 randomly selected cats (RSC). The different regions of the feline CR were analyzed using this randomly sampled cat data to evaluate the forensic utility of these regions. The four regions of interest were: the 190 bp of the 5′ portion preceding RS2 (NP 16314-16504); the 402 bp corresponding to SRS; the 451 bp between RS2 and RS3 (NP 16780-223); and the combination of the 190 bp before RS2 and 451 bp between RS2 and RS3 (Table 1). The lowest discrimination capacity of approximately 53% was observed in the 5′ region (NP 16314-16504), followed by 59% in the 402 bp SRS region. The discrimination capacity of NP 16780-222 exceeded that of the other regions (65%); however, the highest discrimination capacity was observed when the 5′ region preceding RS2 and the region between RS2 and RS3 were combined (71%). The increase in informative value when comparing the regions before and after RS2 with these regions combined is also demonstrated by the number of haplotypes, number of singletons and gene diversity (Table 1). Based on these data, the combined region of 641 bp was used for further analyses.

Table 1. Comparison of segments of feline mtDNA CR based on 129 randomly sampled cats (RSC).

Region | 190 bp before RS2 | 402 bp [19,21,26] | 451 bp (RS2-RS3) | Total 641 bp
Nucleotide positions | 16315-16504 | 16813-206 | 16780-222 | 16315-16504 + 16780-222
# haplotypes | 11 | 18 | 23 | 30
# singletons | 7 | 11 | 15 | 21
Unbiased gene diversity (SD) | 0.5294 ± 0.0437 | 0.5919 ± 0.0495 | 0.6578 ± 0.0465 | 0.7184 ± 0.0428
# matches / 8256 comparisons | 3885 | 3369 | 2825 | 2325
Random match probability | 0.4747 | 0.4127 | 0.3473 | 0.2872
Discrimination capacity | 0.5253 | 0.5873 | 0.6527 | 0.7128
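The match counts and random match probabilities in Table 1 are linked by a simple identity. If M is the number of matching pairs among N samples, then the sum of squared haplotype counts equals 2M + N, so the sum of squared frequencies is (2M + N) / N². The Python sketch below checks this against the table; the function names are hypothetical and chosen for this example only.

```python
def rmp_from_matches(matches, n):
    """Sum of squared haplotype frequencies from the number of matching pairs.

    matches = sum of n_i * (n_i - 1) / 2, hence sum of n_i^2 = 2 * matches + n.
    """
    return (2 * matches + n) / n ** 2


def diversity_from_matches(matches, n):
    """Unbiased gene diversity N / (N - 1) * (1 - sum p_i^2)."""
    return n / (n - 1) * (1 - rmp_from_matches(matches, n))


N = 129  # randomly sampled cats in Table 1
# Matching pairs per region, in the column order of Table 1.
for region, m in [("190 bp", 3885), ("402 bp", 3369), ("451 bp", 2825), ("641 bp", 2325)]:
    print(region, round(rmp_from_matches(m, N), 4), round(diversity_from_matches(m, N), 4))
```

For the four regions this reproduces the tabulated random match probabilities (0.4747, 0.4127, 0.3473 and 0.2872) and gene diversities.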

3.2. Analysis of haplogroups

The 641 bp sequences of the combined regions (preceding RS2 and between RS2 and RS3) of 113 random bred cats (RBC), 51 Maine Coon cats (MC), 41 Norwegian Forest cats (NF) and 44 Siamese and Oriental Shorthair cats (S–O) were aligned with the first published feline mtDNA sequence U20753, as is common practice for human and canine mitochondrial DNA studies. This resulted in the recognition of 39 different haplotypes. Through phylogenetic analysis of these haplotypes, six haplogroups (NL-A, NL-B, NL-C, NL-D, NL-E and NL-F) were recognized (Network diagram in Fig. 3, Maximum Likelihood tree in supplementary data). Most of the haplotypes differed from the three most frequently occurring haplotypes by only one (67%) or two (8%) nucleotide substitutions. Two haplotypes, derived from 2 individuals, contained a heteroplasmic position. One haplotype (NL-A10) was found to contain a 14 bp deletion. Additional typing of a maternally related individual confirmed this deletion. All haplotypes were numbered, and nucleotide positions that are polymorphic when compared to U20753 are displayed in Table 2. The nucleotide position numbering of SRS is shown to facilitate comparison with other studies.

Fig. 3. Network diagram of all mtDNA CR sequences without heteroplasmic positions encountered in this study. U20753 [16] is shown as a reference. Circles are scaled to frequencies for the different breed groups: RBC (white), MC (black), NF (dark gray) and S–O (light grey). All differences between haplotypes are indicated.

Table 2. Haplotype defining polymorphic nucleotide positions.

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 7 5 6 6 6 1 1 1 1 1 1 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 9 2. 3 5 3 3 5 6 7 8 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 1 0 4 9 9 3 1 1 1 6 8 1 1 2 2 2 3 3 5 6 7 7 8 8 0 0 8 8 8 8 8 9 9 9 9 9 0 0 0 0 2 2 5 5 6 5 7 8 8 8 9 5 6 3 1 0 0. 2 7 9 0 3 4 0 5 9 8 9 1 3 0 1 2 3 8 0 1 5 7 8 1 2 3 4 0 4 5 9 7 7 3 5 6 8 7 1
[16] U20753 G C A G A Δ C A A C C T C C T G G C C C A C C T C T A A G T T C T T A G T C C A G T C G A T - T T T T T C G G
NL-A1 C . . C . . . . G T . . T . . T T . . T G . . A . C ...... C . . . C . . T A C . . . C . . . C A . T A A
NL-A2 C . . C . A . . G T . . T . . T T . . T G . . A . C ...... C . . . C . . T A C . . . C . . . C A . T A A
NL-A3 C . . C . . . . G T . . T . . T T . . T . . . A . C ...... C . . . C . . T A C . . . C . . . C A . T A A
NL-A4 C . . C Δ . . . G T . . T . . T T . . T G . . A . C ...... C . . . C . . T A C . . . C . . . C A . T A A
NL-A5 C . . C . . . . G T . C T . . T T . . T G . . A . C ...... C . . . C . . T A C . . . C . . . C A . T A A
NL-A6 C . . C . . . . G T . . T . . T T . . . G . . A . C ...... C . . . C . . T A C . . . C . . . C A . T A A
NL-A7 C . . C . . . . G T . . T . . T T . . T G . . A . C . . . C . . C . . . C . . T A C . . . C . . . C A . T A A
NL-A8 C . . C . . . . G T . . T . . T T . . T G . . A . C ...... C . . T A C . . . C . . . C A . T A A
NL-A9 C . T C . . . . G T T . T . . T T . . T G . . A . C ...... C . . . C . . T A C . . . C . . . C A . T A A
NL-A10 C . . C . . . ‡ ‡ ‡ ‡ . T . . T T . . T G . . A . C ...... C . . . C . . T A C . . . C . . . C A . T A A
NL-A11 C . . C . . . . G T . . T . . T T . T T G . . A . C ...... C . . . C . . T A C . . . C . . . C A . T A A
NL-A12 C . . C . . . . G T . . T . . T T . . T G . . A . C ...... C . . . C . . T A C . . . C . . . C A . T . A
NL-A13 C . . C . . . . G T . . T . . T T . . T G . . A . C ...... C . . . C T . T A C . . . C . . . C A . T A A
NL-A14 C . . C . . . . G T . . T . . T T . . T G . . A . C ...... C . . . C . . T A C . A . C . . . C A . T A A
NL-A15 C . . C . . . . G T . . T . . T T . . T G . . A . C ...... C . . . C . . T A C . . . C . . . C A . . A A
NL-A16 C . . C . . . . G T . . T . . T T . . T G . . A . C ...... C . . . C . . T A C . . . C . . . C A . Y A A
NL-D1 C . . C . . . . G T . . T . . T T . . T . . . A . C . . A . . . C C . . . . T T A C . . G C T A . C A . T A A
NL-D2 C . . C Δ . . . G T . . T . . T T . . T . . . A . C . . A . . . C C . . . . T T A C . . G C . . A C A . T A A
NL-B1 C . . C . . A . G T . . . A C T T ...... A . C G G A . . . . . G . C . . T A . . . . C . A . . A C T . A
NL-B2 C . . C . . A . G T . . . A C T T . . . . T . A . C G G A . . . . . G . C . . T A . . . . C . A . . A C T . A
NL-B3 C . . C . . A . G T . . . A C T T ...... A . C G G A . . T . . G . C . . T A . . . . C . A . . A C T . A
NL-B4 C . . C . . A . G T . . . A C T T ...... A T C G G A . . . . . G . C . . T A . . . . C . A . . A C T . A
NL-B5 C . . C . . A . G T . . . A C T T ...... A . C G G A . . . . . G . C . . T A . . A . C . A . . A C T A A
NL-B6 C . . C . . A . G T . . . A C T T ...... A . C G G A . . . . . G . C T . T A . . . . C . A . . A C T . A
NL-B7 C . . C . . A . G T . . . A C T T ...... A . C G G A . . . . . G . C . . T A C . . . C . A . . A C T . A
NL-B8 C . . C . . A . G T . . . A C T T ...... A . C G G A ...... C . . T A . . . . C . A . . A C T . A
NL-B9 C . . C . . A . G T . . . A C T T ...... A . C G G A . . . . . G . C . . T A . . . . C . A . . A C . . A
NL-B10 C . . C . . A . G T . . . A C T T ...... C G G A . C . . . G . C . . T A . . . . C . A . . A C . . A
NL-B11 C T . C . . A . G T . . . A C T T ...... C G G A . C . . . G . C . . T A . . . . C . A . . A C . . A
NL-B12 C . . C . . A . G T . . . A C T T ...... C G G A . C . . . . . C . . T A . . . . C . A . . A . . . A
NL-C1 C . . C . . . . G T . . . . . T T . . T . . T A . C G G . . C . . . G . . . . T ...... A . . A C T . A
NL-C2 C . . C . . . . G T . . . . . T T . . T . . T A . C G G . . C Y . . G . . . . T ...... A . . A C T . A
NL-C3 C . . C . . . . G T . . . . . T T . . T . . T A . C G G . . C . . C G . . . . T ...... A . . A C T . A
NL-C4 C . . C . . . . G T . . . . . T T T . T . . T A . C G G . . C . . . G . . . . T C ...... A . . A C T . A
NL-C5 C . . C . . . . G T . . . . . T T . . T . . T A . C G G . . C . . . G . C . . T ...... A . . A C T . A
NL-C6 C . . C . . . . G T . . . . . T T . . T . . T A . C G G . . C . . . G A . . . T ...... A . . A C T . A
NL-C7 C . . C . . . . G T . . . . . T T . . T . . T A . C G G . . C . . . G . . . . T ...... A . . A . T . A
NL-E1 C . . C . . . . G T . . . . . T T . . T . . . A . C G G . . C . . C . . . . T T A . T A . C T A . . A . . . A
NL-F1 C . . C ...... T T ...... A . . . A
1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 SRS [26] ------1 4 4 5 4 6 7 7 7 8 0 5 5 6 6 2 3 5 6 7 7 7 1 2 6 4 4 0 2 3 5 4 3 5 9 0 2 7 1 6 6 0 8

Dots indicate positions identical to the reference sequence U20753 [16]. Base calls (IUPAC nomenclature) show the substitutions at specified positions (Y=C/T), Δ indicates a deletion, ‡ indicates a 14 bp deletion from NP 16427-16440 encountered in a single haplotype. Grey nucleotide positions are the same in all encountered haplotypes, but differ from the reference U20753. Nucleotide positions of SRS [26] are indicated for comparative purposes only.


3.3. Comparison of breed groups

The distribution of the haplotypes within the random bred cats, MC, NF and S–O is shown in Table 3. A large variation in haplotype distribution, haplogroup distribution and genetic diversity was observed between the four groups of cats. The vast majority of random bred cats (74%) have a haplotype in haplogroup A and 20% have a haplotype in haplogroup B, whereas haplogroups C, D, E and F together are represented by only 6% of the random bred cats. This is in contrast with the three breed groups. In MC, only 12% of cats are placed in haplogroup A, 22% in haplogroup B and 16% in haplogroup E, with all others (51%) in haplogroup C. In NF, haplogroups A, C and D each account for approximately 30% of cats, with the remaining 7% belonging to haplogroup B. Siamese and Oriental Shorthairs exhibited only haplotypes belonging to haplogroups A (41%) and B (59%). To examine a possible effect of mainly sampling RBC from the western part of the Netherlands, frequency distributions from the western, middle and eastern parts of the Netherlands were compared. No pronounced differences were observed (data not shown). Of the 39 haplotypes, 6 were present in more than 15% of individuals within at least one breed. The most frequently encountered haplotype, NL-A1, is carried by more than half of all random bred cats. This haplotype is encountered less frequently in Maine Coon, Norwegian Forest and Siamese & Oriental Shorthairs (4%, 29% and 32%, respectively). Haplotype NL-B1 was encountered in 18% of Maine Coons and 36% of Siamese & Oriental Shorthairs, while it was only seen in 6% of random bred cats and 5% of Norwegian Forest cats. Haplotypes NL-C3 and NL-E1 were observed in 18% and 16% of Maine Coons, respectively, but not in the other cat breeds or random bred cats. Haplotype NL-D1 was found in 34% of Norwegian Forest cats and in a single random bred cat. Of the 39 haplotypes, 22 were encountered only in a single individual.
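The haplogroup shares of the three breed groups follow directly from the haplotype counts in Table 3. The Python sketch below shows one way to aggregate them; the function name `haplogroup_shares` and the dictionary layout are assumptions for this example, and the haplogroup is simply taken as the letter following the "NL-" prefix of each haplotype name.

```python
from collections import defaultdict

# Haplotype counts per breed group as listed in Table 3.
TABLE3_COUNTS = {
    "MC": {"NL-A1": 2, "NL-A13": 4, "NL-B1": 9, "NL-B7": 1, "NL-B8": 1,
           "NL-C1": 17, "NL-C3": 9, "NL-E1": 8},
    "NF": {"NL-A1": 12, "NL-A11": 1, "NL-D1": 14, "NL-B1": 2, "NL-B3": 1,
           "NL-C1": 8, "NL-C2": 1, "NL-C6": 1, "NL-C7": 1},
    "S-O": {"NL-A1": 14, "NL-A2": 4, "NL-B1": 16, "NL-B2": 4, "NL-B9": 5,
            "NL-B10": 1},
}


def haplogroup_shares(counts):
    """Fraction of individuals per haplogroup for one breed group."""
    totals = defaultdict(int)
    for haplotype, count in counts.items():
        group = haplotype.split("-")[1][0]  # "NL-A13" -> "A"
        totals[group] += count
    n = sum(totals.values())
    return {group: totals[group] / n for group in sorted(totals)}


for breed, counts in TABLE3_COUNTS.items():
    shares = haplogroup_shares(counts)
    print(breed, {group: round(share, 2) for group, share in shares.items()})
```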

4. Discussion

In this study sequenceable amplicons were obtained from all tested buccal DNA extracts. Although the presence of the RS2 repeat in the center of the amplicon caused amplicon length to increase to as much as 1.6 kb, this did not have a negative effect on the PCR or sequencing success of the tested samples. Amplification of this large fragment from single hair DNA extracts, as encountered in casework, can be hampered when only a limited amount of possibly degraded DNA is present (data not shown). In such cases the use of multiple smaller amplicons is needed to obtain both regions of the CR. Ideally, a single amplicon is preferred to minimize costs and the risk of laboratory sample mix-up. However, the increase in information when both the CR region prior to RS2 and the region directly after RS2 are used outweighs these disadvantages. Although the portion of the feline mtDNA CR located between RS2 and RS3 has been studied most extensively and this region is more informative than the region before RS2, it is not sufficient to distinguish between the majority of the Dutch cats. To increase the information that can be retrieved from a single cat hair, especially when a sample is typed as haplogroup NL-A, it is advisable to include the region before RS2 even if this implies amplification and sequencing of additional amplicons. By addition of this region, the frequency of the largest haplogroup (70%) is reduced to 55%. As this one group is still very large, the added value of even more regions of the mtDNA should be considered. Whole genome sequencing of the mtDNA could be used to determine regions or SNPs of interest, but other regions previously used to distinguish felines should not be forgotten. Cytb and NADH are logical candidate regions, as is the 3’ region of the CR, especially when applied to hairs with no nDNA present. The typing of this region in such hairs should not pose any difficulties, although typing reference buccal swabs might well cause difficulties due to the numt present in the nDNA. The collection of not only reference buccal swabs but also reference hair samples could circumvent this problem.

Table 3. Haplotype distribution within random bred cats (RBC) and three breed groups: Maine Coon (MC), Norwegian Forest cats (NF) and Siamese & Oriental (S–O). # = number of cats; freq. = frequency within the group.

| Haplotype | RBC # | RBC freq. | MC # | MC freq. | NF # | NF freq. | S–O # | S–O freq. | Total # |
|---|---|---|---|---|---|---|---|---|---|
| NL-A1 | 61 | 0.540 | 2 | 0.039 | 12 | 0.293 | 14 | 0.318 | 89 |
| NL-A2 | 1 | 0.009 | | | | | 4 | 0.091 | 5 |
| NL-A3 | 4 | 0.035 | | | | | | | 4 |
| NL-A4 | 3 | 0.027 | | | | | | | 3 |
| NL-A5 | 1 | 0.009 | | | | | | | 1 |
| NL-A6 | 1 | 0.009 | | | | | | | 1 |
| NL-A7 | 1 | 0.009 | | | | | | | 1 |
| NL-A8 | 1 | 0.009 | | | | | | | 1 |
| NL-A9 | 1 | 0.009 | | | | | | | 1 |
| NL-A10 | 1 | 0.009 | | | | | | | 1 |
| NL-A11 | | | | | 1 | 0.024 | | | 1 |
| NL-A12 | 5 | 0.044 | | | | | | | 5 |
| NL-A13 | | | 4 | 0.078 | | | | | 4 |
| NL-A14 | 1 | 0.009 | | | | | | | 1 |
| NL-A15 | 1 | 0.009 | | | | | | | 1 |
| NL-A16 | 1 | 0.009 | | | | | | | 1 |
| NL-D1 | 1 | 0.009 | | | 14 | 0.341 | | | 15 |
| NL-D2 | 1 | 0.009 | | | | | | | 1 |
| NL-B1 | 7 | 0.062 | 9 | 0.176 | 2 | 0.049 | 16 | 0.364 | 34 |
| NL-B2 | | | | | | | 4 | 0.091 | 4 |
| NL-B3 | 1 | 0.009 | | | 1 | 0.024 | | | 2 |
| NL-B4 | 1 | 0.009 | | | | | | | 1 |
| NL-B5 | 2 | 0.018 | | | | | | | 2 |
| NL-B6 | 1 | 0.009 | | | | | | | 1 |
| NL-B7 | | | 1 | 0.02 | | | | | 1 |
| NL-B8 | | | 1 | 0.02 | | | | | 1 |
| NL-B9 | | | | | | | 5 | 0.114 | 5 |
| NL-B10 | 2 | 0.018 | | | | | 1 | 0.023 | 3 |
| NL-B11 | 1 | 0.009 | | | | | | | 1 |
| NL-B12 | 8 | 0.071 | | | | | | | 8 |
| NL-C1 | 2 | 0.018 | 17 | 0.333 | 8 | 0.195 | | | 27 |
| NL-C2 | | | | | 1 | 0.024 | | | 1 |
| NL-C3 | | | 9 | 0.176 | | | | | 9 |
| NL-C4 | 1 | 0.009 | | | | | | | 1 |
| NL-C5 | 1 | 0.009 | | | | | | | 1 |
| NL-C6 | | | | | 1 | 0.024 | | | 1 |
| NL-C7 | | | | | 1 | 0.024 | | | 1 |
| NL-E1 | | | 8 | 0.157 | | | | | 8 |
| NL-F1 | 1 | 0.009 | | | | | | | 1 |

| Statistic | RBC | MC | NF | S–O | Total |
|---|---|---|---|---|---|
| N | 113 | 51 | 41 | 44 | 249 |
| # haplotypes | 28 | 8 | 9 | 6 | 39 |
| # singletons | 19 | 2 | 5 | 1 | 22 |
| proportion of singleton samples | 0.17 | 0.04 | 0.12 | 0.02 | 0.09 |
| unbiased gene diversity (SD) | 0.6996 ± 0.0478 | 0.8094 ± 0.0297 | 0.7732 ± 0.0372 | 0.7537 ± 0.0379 | |
| # matches/# comparisons | 1901/6328 | 243/1275 | 186/820 | 233/946 | |
| random match probability | 0.3066 | 0.2065 | 0.2457 | 0.2634 | |
| discrimination capacity | 0.6934 | 0.7935 | 0.7543 | 0.7366 | |
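The summary statistics at the foot of Table 3 can be recomputed from the haplotype counts alone. The sketch below is illustrative and assumes definitions that are consistent with the tabulated values rather than taken from the methods: the random match probability as the sum of squared haplotype frequencies, the discrimination capacity as one minus that sum, and the unbiased gene diversity as n/(n − 1) times the discrimination capacity. The function name is hypothetical and standard deviations are not computed.

```python
def match_statistics(counts):
    """Return (matches, comparisons, rmp, capacity, diversity) for haplotype counts."""
    n = sum(counts)
    matches = sum(c * (c - 1) // 2 for c in counts)   # pairs sharing a haplotype
    comparisons = n * (n - 1) // 2                    # all pairwise comparisons
    rmp = sum((c / n) ** 2 for c in counts)           # sum of squared frequencies
    capacity = 1 - rmp                                # discrimination capacity
    diversity = n / (n - 1) * capacity                # unbiased gene diversity
    return matches, comparisons, rmp, capacity, diversity


# Random bred cats (Table 3): nine haplotypes seen more than once plus 19 singletons.
random_bred = [61, 4, 3, 5, 7, 2, 2, 8, 2] + [1] * 19
# Maine Coon (Table 3).
maine_coon = [2, 4, 9, 1, 1, 17, 9, 8]

for label, counts in (("RBC", random_bred), ("MC", maine_coon)):
    m, c, rmp, cap, div = match_statistics(counts)
    print(f"{label}: {m}/{c} matches, RMP {rmp:.4f}, DC {cap:.4f}, diversity {div:.4f}")
```

Under these assumed definitions the sketch reproduces the match counts, random match probabilities, discrimination capacities and gene diversities listed for both groups.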


4.1. Geographical comparisons

The 402 bp SRS region of the mtDNA CR of the random bred cats was used to enable comparison of our dataset with previously published haplotype frequencies from the United States (n = 493) [21], European mainland (Germany n = 21 and Italy n = 23) [21] and the United Kingdom (n = 152) [19]. As shown in Fig. 4A, the frequencies of the major haplotypes encountered in this study differ greatly from the frequencies of these haplotypes in other studies. Some resemblance can be seen between the data from this study and a small sampling of European mainland cats published previously, but the discrepancy between the observed frequency of haplogroup A and the frequency of unique haplotypes is pronounced. Although no statistical analyses were performed, the greatest difference between the frequencies observed in this study and in the US and UK studies is in the frequencies of haplogroups A and C. When calculated for this 402 bp region, the unbiased gene diversity of the 113 Dutch RBC is only 0.5490 ± 0.0551, which is far lower than the values calculated for European mainland (0.9799 ± 0.0117, n = 44) and American RBC (0.8321 ± 0.0095, n = 493) [21].

The haplotype distribution of the Dutch RBCs was also compared to that of domestic cats from Japan (Tsushima Islands, n = 50) [18], utilizing only the 5’ portion of the CR sequences (Fig. 4B). This gave comparable results, with haplogroup NL-A highly overrepresented in the Dutch population and NL-B and NL-C underrepresented. Several new haplotypes were detected in the Dutch population, and a type found in 10% of the Japanese samples was not observed in The Netherlands. When calculated for this 109 bp region, the unbiased gene diversity of the 113 Dutch RBC is only 0.4997 ± 0.0496, which is substantially lower than the 0.7118 ± 0.0332 calculated for the Japanese cats (n = 50) [18].

Fig. 4. Comparison of haplotype distribution of 113 RBCs with previous feline mtDNA CR studies [18,19,21]. 4A: 402 bp region between RS2 and RS3. 4B: 5’ sequence before RS2.

Multiple biological and reproductive factors related to human and cat population density, sterilization and breeding practices, which may well differ between countries or regions, as well as the number of wild-living/stray cats, may contribute to the observed differences between sample sets. As cat movement between The Netherlands and the United Kingdom is thought to be less frequent than between The Netherlands and other mainland North-West European countries, additional typing of cats from the eastern part of The Netherlands, Belgium and Germany is desirable. This may indicate whether the frequencies described in this study are typical for a ‘North-West European’ distribution, or whether the distribution in The Netherlands differs from that in its neighboring countries.

4.2. Breed comparisons

In line with previous studies [21,26], no direct relationship between breed and haplogroup or haplotype was observed. This fits the description of the domestication of cats and the relatively recent origin of fancy cat breeds. However, the haplogroup (and haplotype) distribution differed notably between the studied cat breeds. This has implications for forensic casework if the same mtDNA profile is found in a crime scene sample and in a reference cat, and the reference cat is known to be a fancy breed cat. As only three breed groups were tested, no conclusions can be drawn about fancy cat breeds in general. As many different cat breeds are recognized and some are (intentionally) cross bred with others, obtaining a reasonable number of samples of all these breeds seems unfeasible. However, if such a situation occurs during a case investigation, it may be valuable to type additional unrelated cats of the same breed to allow better evaluation of the evidence.

5. Conclusion

Our results show that mtDNA haplotype NL-A1 is highly overrepresented in random bred cats from The Netherlands. As the vast majority of Dutch cats are random bred, chances of encountering a hair with haplotype NL-A1 in casework are high. When examination of a reference animal (or other trace evidence) shows that this animal also has haplotype NL-A1, the evidential value of such a match will generally be deemed small, as the random match probability is high (~30%). However, if a casework hair and reference cat both have a non-NL-A1 haplotype, the evidential value can rise dramatically, as the probability of encountering an uncommon mtDNA haplotype in a Dutch cat is approximately 1%. The impact of cat hairs in forensic investigations may increase even further when hairs from multiple unrelated cats are transferred during a crime. Although the portion of the feline mtDNA CR located between RS2 and RS3 has been studied most extensively and this region is more informative than the region before RS2, it is not sufficient to distinguish between the majority of the Dutch cats. To increase the information that can be retrieved from cat hairs, especially when a sample is typed as haplogroup NL-A, it is advisable to additionally type the region before RS2. Comparison of the frequency distributions of Dutch cat mtDNA haplotypes with distributions from other studies indicates that it is undesirable to utilize frequency distributions from other continents or regions without validating these frequencies for the particular region of interest.


Acknowledgements

We thank all private cat owners and catteries who contributed cat DNA samples and pedigrees for this study. We also thank Jon Wetton, University of Leicester, UK, for permission to reproduce the UK mtDNA CR frequency distributions. Arnoud Kal (NFI) and two anonymous reviewers are thanked for valuable comments on the manuscript.

References

[1] HAS Kennistransfer. Feiten & cijfers gezelschapsdierensector. Hogeschool HAS Den Bosch; 2011. Dutch.
[2] FEDIAF facts & figures 2012 [Internet]. The European Pet Food Industry Federation; Brussels, Belgium. [cited 2014 October 02]. Available from: www.fediaf.org/fact-figures/.
[3] What Kind of Cat Do You Want? [Internet]. The Cat Fanciers' Association; United States of America. [cited 2014 May 01]. Available from: www.cfainc.org/FutureOwners/FindingtheKittenofYourDreams.aspx.
[4] D.R. Lammertsma, R. Janssen, J. van der Hout, H.A.H. Jansman, Huiskatten in natuurgebieden; Kan TNR hybridisatie met de Wilde kat voorkomen? Alterra, Wageningen University, Dutch, 2011.
[5] Ministry of Economic Affairs. Uitvoering motie afschieten van katten. 00000001003214369000. March 2014. Dutch.
[6] M.A. Menotti-Raymond, V.A. David, S.J. O’Brien, Pet cat hair implicates murder suspect, Nature 386 (1997) 774, doi:http://dx.doi.org/10.1038/386774a0.
[7] J.L. Halverson, C. Basten, Forensic DNA identification of animal-derived trace evidence: tools for linking victims and suspects, Croat. Med. J. 46 (2005) 598–605.
[8] L.A. Lyons, R.A. Grahn, T.J. Kun, L.R. Netzel, E.E. Wictum, J.L. Halverson, Acceptance of domestic cat mitochondrial DNA in a criminal proceeding, Forensic Sci. Int. Genet. 13 (2014) 61–67, doi:http://dx.doi.org/10.1016/j.fsigen.2014.07.007.
[9] C.A. Linch, Degeneration of nuclei and mitochondria in human hairs, J. Forensic Sci. 54 (2009) 346–349, doi:http://dx.doi.org/10.1111/j.1556-4029.2008.00972.x.
[10] B. Budowle, M.W. Allard, M.R. Wilson, R. Chakraborty, Forensics and mitochondrial DNA: applications, debates and foundations, Annu. Rev. Genom. Hum. Genet. 4 (2003) 119–141.
[11] R.E. Giles, H. Blanc, H.M. Cann, D.C. Wallace, Maternal inheritance of human mitochondrial DNA, Proc. Natl. Acad. Sci. U. S. A. 77 (1980) 6715–6719.
[12] S. Horai, K. Hayasaka, Intraspecific nucleotide sequence differences in the major noncoding region of human mitochondrial DNA, Am. J. Hum. Genet. 46 (4) (1990) 828–842.
[13] P. Savolainen, B. Rosén, A. Holmberg, T. Leitner, M. Uhlén, J. Lundeberg, Sequence analysis of domestic dog mitochondrial DNA for forensic use, J. Forensic Sci. 42 (1997) 593–600.
[14] J.L. Halverson, L.A. Lyons, Forensic DNA identification of feline hairs: casework and a mitochondrial database, Proc. Am. Acad. Forensic Sci. 10 (2004) B150.
[15] F. Fridez, S. Rochat, R. Coquoz, Individual identification of cats and dogs using mitochondrial DNA tandem repeats? Sci. Justice 39 (1999) 167–171, doi:http://dx.doi.org/10.1016/S1355-0306(99)72042-3.
[16] J.V. Lopez, S. Cevario, S.J. O’Brien, Complete nucleotide sequences of the domestic cat (Felis catus) mitochondrial genome and a transposed mtDNA tandem repeat (Numt) in the nuclear genome, Genomics 33 (2) (1996) 229–246, doi:http://dx.doi.org/10.1006/geno.1996.0188.
[17] W. Branicki, A. Olszanska, M. Konopinski, Sequence variation in the control region of mitochondrial DNA within a population sample of domestic cats Felis catus linnaeus–implications for domestic and wild cats differentiation, Prob. Forensic Sci. 67 (2006) 279–288.
[18] T. Tamada, N. Kurose, R. Masuda, Genetic diversity in domestic cats Felis catus of the Tsushima Islands, based on mitochondrial DNA cytochrome b and control region nucleotide sequences, Zool. Sci. 22 (6) (2005) 627–633, doi:http://dx.doi.org/10.2108/zsj.22.627.
[19] J.H. Wetton, B. Ottolini, G. Matharu Lall, M.A. Jobling, A Forensic Database of Cat Mitochondrial DNA Variants, DNA in Forensics, Brussels, 2014, pp. P48.
[20] E. Randi, M. Pierpaoli, M. Beaumont, B. Ragni, A. Sforzi, Genetic identification of wild and domestic cats (Felis silvestris) and their hybrids using Bayesian clustering methods, Mol. Biol. Evol. 18 (9) (2001) 1679–1693.
[21] R.A. Grahn, J.D. Kurushima, N.C. Billings, J.C. Grahn, J.L. Halverson, E. Hammer, et al., Feline non-repetitive mitochondrial DNA control region database for forensic evidence, Forensic Sci. Int. Genet. 5 (1) (2011) 33–42, doi:http://dx.doi.org/10.1016/j.fsigen.2010.01.013.
[22] M.J. Lipinski, L. Froenicke, K.C. Baysac, N.C. Billings, C.M. Leutenegger, A.M. Levy, et al., The ascent of cat breeds: genetic evaluations of breeds and worldwide random-bred populations, Genomics 91 (1) (2008) 12–21, doi:http://dx.doi.org/10.1016/j.ygeno.2007.10.00.
[23] M. Menotti-Raymond, V.A. David, S.M. Pflueger, K. Lindblad-Toh, C.M. Wade, S.J. O’Brien, W.E. Johnson, Patterns of molecular genetic variation among cat breeds, Genomics 91 (1) (2008) 1–11, doi:http://dx.doi.org/10.1016/j.ygeno.2007.08.008.
[24] J.D. Kurushima, M.J. Lipinski, B. Gandolfi, L. Froenicke, J.C. Grahn, R.A. Grahn, L.A. Lyons, Variation of cats under domestication: genetic assignment of domestic cats to breeds and worldwide random-bred populations, Anim. Genet. 44 (3) (2013) 311–324, doi:http://dx.doi.org/10.1111/age.12008.
[25] H. Alhaddad, R. Khan, R.A. Grahn, B. Gandolfi, J.C. Mullikin, S.A. Cole, et al., Extent of linkage disequilibrium in the domestic cat, Felis silvestris catus, and its breeds, PLoS One 8 (1) (2013) e53537, doi:http://dx.doi.org/10.1371/journal.pone.0053537.
[26] C.R. Tarditi, R.A. Grahn, J.J. Evans, J.D. Kurushima, L.A. Lyons, Mitochondrial DNA sequencing of cat hair: an informative forensic tool, J. Forensic. Sci. 56 (s1) (2011) S36–S46, doi:http://dx.doi.org/10.1111/j.1556-4029.2010.01592.x.
[27] A.J. Drummond, B. Ashton, M. Cheung, J. Heled, M. Kearse, R. Moir, et al., Geneious v4.7, Biomatters Ltd, Auckland, New Zealand, 2009.
[28] D. Darriba, G.L. Taboada, R. Doallo, D. Posada, jModelTest 2: more models, new heuristics and parallel computing, Nat. Methods 9 (8) (2012) 772, doi:http://dx.doi.org/10.1038/nmeth.2109.
[29] S. Guindon, O. Gascuel, A simple, fast and accurate method to estimate large phylogenies by maximum-likelihood, Syst. Biol. 52 (2003) 696–704, doi:http://dx.doi.org/10.1080/10635150390235520.
[30] M. Hasegawa, H. Kishino, T. Yano, Dating of the human-ape splitting by a molecular clock of mitochondrial DNA, J. Mol. Evol. 22 (2) (1985) 160–174, doi:http://dx.doi.org/10.1007/BF02101694.
[31] K. Tamura, D. Peterson, N. Peterson, G. Stecher, M. Nei, S. Kumar, MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods, Mol. Biol. Evol. 28 (2011) 2731–2739, doi:http://dx.doi.org/10.1093/molbev/msr121.
[32] D.A. Jones, Blood samples: probability of discrimination, J. Forensic Sci. Soc. 12 (2) (1972) 355–359, doi:http://dx.doi.org/10.1016/S0015-7368(72)70695-7.
[33] H.-J. Bandelt, P. Forster, A. Röhl, Median-joining networks for inferring intraspecific phylogenies, Mol. Biol. Evol. 16 (1999) 37–48.


Chapter 3

Forensic analysis of mitochondrial control region DNA from single cat hairs

M. Wesselink, L. Bergwerff and I. Kuiper

Forensic Science International: Genetics Supplement Series 5 (2015) e564-e565 http://dx.doi.org/10.1016/j.fsigss.2015.09.223

Abstract

Although different parts of the mitochondrial control region (CR) have been described as forensically relevant, factors such as geographical location and breed may influence their forensic value. Analysis of multiple CR regions has been shown to increase the evidential value for certain populations, but due to the large amplicon size, PCR success could be affected when single shed hairs are used for analysis. A new strategy was developed to circumvent amplification of the repeat element RS2 without dramatically decreasing the number of potentially informative nucleotide positions sequenced bidirectionally. DNA extracts of 40 shed cat hairs were typed to compare the PCR success of this duplex strategy to that of the single amplicon approach. Amplicons were obtained from all tested hairs using the duplex strategy, compared to 60% using the single amplicon, illustrating the applicability of this strategy to case type samples.


1. Introduction

The use of cat hairs in forensics has gained attention in recent years, most often in investigations where either suspect or victim lives in proximity to cats. Given that the hairs transferred during crimes are mainly shed hairs, hardly any nuclear DNA (nDNA) is available for typing. However, as mitochondrial DNA (mtDNA) is present in larger quantities than nDNA and is less prone to degradation, mtDNA typing can still be feasible. The part of the feline mtDNA CR located between repeat elements RS2 and RS3 has been described as a forensically informative region [1,2]. In cats from The Netherlands, however, this region was insufficient to distinguish between 70% of cats. Additional typing of the 5’ region of the CR prior to RS2, and of the 5’ region between RS2 and RS3, reduced the size of the largest group by 20% and reduced the random match probability (RMP) from 0.41 to 0.29 [3]. Amplification of these two regions in a single amplicon is only achieved by coamplifying the uninformative RS2, which results in a large amplicon size. As hairs from crime scenes or items of evidence generally contain little and potentially degraded DNA, this large amplicon size may limit the number of hairs from which information is obtained. To circumvent amplification of RS2 and facilitate bidirectional sequencing of all potentially informative nucleotide positions (NP), a new amplification and sequencing strategy was designed and tested on single shed cat hairs.

2. Material and methods

Shed undercoat and guard hairs were collected from a household where no indoor pets other than cats resided. DNA was extracted using the DNeasy Blood and Tissue kit (Qiagen GmbH, Germany) following the manufacturer's guidelines with modifications: DTT was added during the proteinase K/ATL incubation phase and elution was performed in 25% AE following 5 min incubation at 70 °C. Amplification was performed in 12.5 µl reactions containing 1x Multiplex Master Mix (Qiagen), 2.5 pmol of each amplification primer (either FCB-Z [4] and JHmtR3 [1], or FCB-Z, F16483-M13R 5’-CAGGAAACAGCTATGACCGTCTTATGTATATGGGTGTATAATACAACCTGGG-3’, M13F-JHmtF [1] 5’-GTAAAACGACGGCCAGTGATAGTGCTTAATCGTGC-3’ and JHmtR3, see Fig. 1) and 2.5 µl DNA extract. Thermocycling was performed on a MyCycler (BioRad Laboratories) for 15 min at 95 °C, followed by 35 cycles of 45 s at 94 °C, 90 s at 58 °C and 60 s at 72 °C, and a final extension of 10 min at 72 °C. PCR success was determined by gel electrophoresis followed by ethidium bromide staining and UV detection. Positive and negative controls were used throughout. Sequencing was performed as described [3] using primers FCB-Z, M13R (AMBeR), M13F (AMBeR) or JHmtR3. Sequence analysis was performed using Chromas or Geneious 4.7.6.
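As a quick plausibility check on the reaction set-up and cycling programme, the arithmetic can be written out as below. This is a sketch only: it assumes the reaction volume is in microlitres and ignores ramping between temperature steps; the variable names are illustrative.

```python
# Reaction set-up (assumption: the 12.5 reaction volume is in microlitres).
reaction_volume_ul = 12.5
primer_amount_pmol = 2.5

# 1 pmol per microlitre equals 1 micromolar, so this ratio is the final
# concentration of each primer in micromolar.
primer_conc_um = primer_amount_pmol / reaction_volume_ul

# Thermocycling programme, all durations in seconds.
initial_denaturation_s = 15 * 60
cycle_steps_s = [45, 90, 60]      # 94, 58 and 72 degrees C steps
cycles = 35
final_extension_s = 10 * 60

total_s = initial_denaturation_s + cycles * sum(cycle_steps_s) + final_extension_s

print(f"each primer: {primer_conc_um:.2f} uM")
print(f"programmed hold time: {total_s / 60:.2f} min (ramping excluded)")
```

Under these assumptions each primer is present at 0.2 µM and the programmed hold time is just under 2 h 20 min.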


Fig. 1. Domestic cat mtDNA nucleotide positions 16315-17009/0-270 as described [5]. Relative positions of amplification primers FCB-Z, R16483-M13R, M13F-JHmtF3 and JHmtR3, amplicons, sequencing primers FCB-Z, M13R (AMBeR), M13F (AMBeR) and JHmtR3 are shown.

3. Results and discussion

3.1. Duplex amplification of feline mtDNA CR

Duplex amplification and sequencing of both fragments was performed with 10 previously typed DNA extracts from domestic cats [3]. The duplex amplification strategy with M13 tailed amplification primers and AMBeR M13 sequencing primers enabled bidirectional sequencing of potentially informative NP 16315-16483 and 16780-195, partly facilitated by the improved readability of electropherograms obtained with AMBeR primers. No differences were observed between the sequences obtained with the two strategies. Two previously described NP (16501 and 16503), variable in two Dutch haplotypes, cannot be determined using the duplex strategy due to the position of primer R16483. No suitable primers could be designed to incorporate these two positions in the amplicon. The absence of these potentially informative positions does not hamper the distinction between 38 of the previously described 39 haplotypes [3]. As only haplotype NL-A11, encountered in a single Norwegian Forest cat, became indistinguishable from the abundant haplotype NL-A1, the random bred RMP for The Netherlands is not altered by using the duplex amplification strategy.
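The effect of losing nucleotide positions on haplotype resolution can be illustrated with a small sketch that counts how many haplotypes remain distinguishable once certain positions are ignored. The haplotype names, base calls and position 16400 below are hypothetical toy data, not the published Dutch haplotypes; only the two masked positions (16501 and 16503) are taken from the text.

```python
def distinct_haplotypes(haplotypes, masked_positions=()):
    """Count haplotypes that stay distinguishable after ignoring some positions."""
    masked = set(masked_positions)
    reduced = set()
    for variants in haplotypes.values():
        # Keep only (position, base) pairs at positions that are still typed.
        kept = tuple(sorted(
            (pos, base) for pos, base in variants.items() if pos not in masked
        ))
        reduced.add(kept)
    return len(reduced)


# Hypothetical toy haplotypes: position -> base call.
toy_haplotypes = {
    "toy-1": {16400: "C", 16501: "A", 16503: "T"},
    "toy-2": {16400: "C", 16501: "G", 16503: "T"},  # differs from toy-1 only at 16501
    "toy-3": {16400: "T", 16501: "A", 16503: "C"},
}

print(distinct_haplotypes(toy_haplotypes))                   # all positions typed
print(distinct_haplotypes(toy_haplotypes, (16501, 16503)))   # two positions lost
```

In this toy set, masking the two positions collapses toy-1 and toy-2 into one type, analogous to NL-A11 becoming indistinguishable from NL-A1.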

3.2. Amplification success

Extracts from 20 undercoat and 20 guard hairs were used as template for the single (1.2–1.6 kb) amplicon and the duplex (2x ~0.5 kb) amplification strategy. As shown in Fig. 2, significantly more hairs could be typed using the duplex strategy. This effect was more pronounced for undercoat than for guard hairs.


Fig. 2. Amplification success of single shed hairs. +: enough PCR product for sequencing; +/-: product visible but additional steps needed for sequencing; -: no product visible.

4. Conclusions

Bidirectional sequencing of almost all previously described potentially informative positions in the regions between RS2 and RS3 and prior to RS2 was feasible using M13 tailed amplification primers and AMBeR M13 sequencing primers. The duplex strategy thus enables distinction of 38 of the 39 Dutch feline haplotypes. When compared to typing of only the region between RS2 and RS3, the duplex strategy increases the evidential value of cat hairs in forensics. When compared to the single amplicon approach, the duplex amplification strategy increases the number of single shed hairs that are successfully typed, illustrating the value of this approach for case samples.

References

[1] C.R. Tarditi, R.A. Grahn, J.J. Evans, et al., Mitochondrial DNA sequencing of cat hair: an informative forensic tool, J. Forensic. Sci. 56 (s1) (2011) S36–S46.
[2] R.A. Grahn, J.D. Kurushima, N.C. Billings, et al., Feline non-repetitive mitochondrial DNA control region database for forensic evidence, Forensic Sci. Int. Genet. 5 (1) (2011) 33–42.
[3] M. Wesselink, L. Bergwerff, D. Hoogmoed, et al., Forensic utility of the feline mitochondrial control region - A Dutch perspective, Forensic Sci. Int. Genet. 17 (2015) 25–32.
[4] T. Tamada, N. Kurose, R. Masuda, Genetic diversity in domestic cats Felis catus of the Tsushima Islands, based on mitochondrial DNA cytochrome b and control region nucleotide sequences, Zool. Sci. 22 (6) (2005) 627–633.
[5] J.V. Lopez, S. Cevario, S.J. O’Brien, Complete nucleotide sequences of the domestic cat (Felis catus) mitochondrial genome and a transposed mtDNA tandem repeat (Numt) in the nuclear genome, Genomics 33 (2) (1996) 229–246.


Chapter 4

Local populations and inaccuracies: Determining the relevant mitochondrial haplotype distributions for North West European cats

M. Wesselink, S. Desmyter and I. Kuiper

Forensic Science International: Genetics, 30 (2017) 71-80 http://dx.doi.org/10.1016/j.fsigen.2017.05.011 (see footnote 1)

Abstract

Typing of different portions of the feline mitochondrial control region has illustrated pronounced differences in haplotype distributions between cats from the Netherlands and other parts of the world. To gain a better understanding of the haplotype distribution of North West Continental Europe, 605 bp of mitochondrial DNA was typed from randomly selected cats from the Netherlands (N = 146), Belgium (N = 64) and South West Germany (N = 128). The genetic differences between these randomly sampled European populations correlate with the geographical distances, with the Dutch and the South West German populations furthest apart and the Belgian population as an intermediate (Fst values 0.01–0.03). Comparison of North West European mainland distributions to published feline mitochondrial haplotype distributions illustrated moderate to large genetic differentiation (Fst values 0.01–0.32). In this comparison, the correlation between geographical and genetic distance was absent, which led to founder effects and human impact on cat population structure and dispersion being considered as important parameters. When an accurate estimation of feline haplotype distribution is required in forensics, care should be taken when deciding whether extrapolating the frequency data from a certain source to a larger area (country/continent) is justified or whether additional typing of local populations is necessary. This may differ from case to case, as local frequencies can be relevant but can also be misleading. To improve the applicability of forensic feline mitochondrial DNA studies, documentation and publishing of sampling strategies is advised, as is the implementation of measures to help eliminate potentially erroneous haplotypes.

1 Supplementary data associated with this article (S1-S3) can be found in the online version


1. Introduction

The domestic cat is an extremely popular household pet throughout many parts of the world, including Europe. In the North West European countries Germany, Belgium and the Netherlands, 20%, 26% and 24% of households, respectively, are estimated to have at least one cat. In these countries, the cat population is estimated to be approximately one sixth of the human population [1]. An inevitable by-product of living in the proximity of cats is the accumulation of shed cat hairs on humans and their belongings. As the transfer of such hairs, both in friendly contact and during crimes, is unavoidable [2], cat hairs have played a part in the investigation of a variety of human crimes [3–5].

Due to the DNA content of shed hairs, mitochondrial DNA (mtDNA) is most often used to investigate hair despite the fact that it is less informative than nuclear DNA due to its maternal inheritance and lack of recombination [6,7]. To reliably use mtDNA from hairs as trace evidence, a robust technique, with sufficient discriminating capacity and knowledge of the haplotype distribution of relevant populations is a prerequisite. Different portions of the feline mitochondrial control region (CR) have been studied in cats sampled in different parts of the world [8–10]. Most of the available public data is limited to the 402 bp conserved core region located between tandem repeat structures RS2 and RS3 in the feline CR designated ‘Sylvester Reference Sequence’ (SRS) [10,11]. So far the SRS region has been typed for European cats from Germany (n = 21) and Italy (n = 25) [11], Poland (N = 181) [12] and the United Kingdom (N = 152) [13]. A larger portion of the feline control region, 605 bp encompassing the SRS region, was typed for 113 random bred cats and 136 fancy breed cats from the Netherlands [14]. All these studies observed pronounced differences between the genetic composition of the mainland European cats and cats sampled in other parts of the world, mainly in the proportion of singleton haplotypes [11] or the abundance of dominant haplotypes [12–14].

To apply feline mtDNA in forensic casework in different European countries, a better understanding of the European cat population is required. Biological and reproductive factors that are related to human and cat populations, as well as sterilisation and breeding practices, are thought to be comparable between densely populated North West European countries. As the Netherlands shares land borders with Belgium and Germany and a sea border with the United Kingdom, these countries were chosen to determine whether the previously established haplotype distribution for the Netherlands is specific for that country or whether it could be representative of North-West Europe. As movement of cats between the United Kingdom and Continental Europe occurs by human facilitation (both coincidental and intentional), the movement of cats between mainland European countries is thought to occur more frequently. Therefore this study focusses on cats from the mainland European countries of the Netherlands, Belgium and Germany. To evaluate the presence of micropopulations and the influence of sampling strategies on haplotype distribution estimates, cats were sampled randomly throughout the Dutch, Belgian and German populations and from two micropopulations: a single municipality in Belgium and a city in Germany.

2. Material and methods

2.1. Sample collection

Random sampling of the Dutch population was performed through buccal sampling of cats living in close contact with (friends or relatives of) personnel of the Netherlands Forensic Institute (The Hague, the Netherlands) throughout the country (sampling locations provided as Supplementary material S1). Breeds and maternal relatedness were assigned by the owners. If maternal relatedness was indicated, only one individual from each maternal group was included. The previously described 129 randomly sampled cats [14] were supplemented, giving a total of 146 cats. The majority of these samples were collected in the areas of the country most densely populated by humans. In total, swabs were obtained from 125 random bred cats, 6 mixed breed cats, 3 British Shorthairs, 3 Maine Coons, 1 Persian, 1 Siamese/Oriental, 1 Bengal, 1 Birman, 3 Siberian Forest Cats and 2 Norwegian Forest Cats. Sampling of the Belgian cat population was performed as described for the Dutch population, by (friends or relatives of) personnel of the National Institute of Criminalistics and Criminology (Brussels, Belgium) (sampling locations provided as Supplementary material S1). One hair sample and 63 buccal swabs were collected from cats living in close contact with humans throughout Belgium. Of these samples, 60 originated from random bred cats, 2 from mixed breed cats, 1 from a British Shorthair and 1 from an Egyptian Mau.

Sampling of the south west German feline population consisted of two sampling efforts. Random sampling of cats living in close contact with humans from the feline micropopulation around the city of Wiesbaden, in the south west of Germany, was performed by (friends or relatives of) personnel of the Bundeskriminalamt, Forensic Science Institute (Wiesbaden, Germany). Buccal swabs from 79 cats from the Wiesbaden area were collected. Swabs were obtained from 61 random bred cats, 5 mixed breed cats, 2 British Shorthairs, 3 Maine Coons, 3 Persians, 1 Siamese/Oriental, 1 Birman and 3 Chartreux. Additionally, samples were obtained from 49 not further specified cats sent to the Institute of Veterinary Pathology, Justus-Liebig-Universität Giessen (Giessen, Germany) for post mortem examination.

A stray cat micropopulation in Putte, Belgium was sampled. Buccal swabs were collected from 19 cats caught in Putte Municipality that had been brought to a veterinarian in a trap-neuter-release program. The familial relationship of these individuals is unknown. Sequences published by others and used for comparison originated from a random selection matching human population density in the United Kingdom (UK population, n = 120), supplemented with samples from a city in the south of the United Kingdom (UK micropopulation, n = 32, sum N = 152) [13], from random bred cats from seven different areas of the United States (total N = 493, four micropopulations, , New York and not specified) [11], from three Canadian micropopulations (total N = 96) [15] and from eight Polish micropopulations (total N = 181) [12].

2.2. Laboratory procedures

DNA extraction of the majority (n = 129) of the Dutch samples has been described previously [14]. The additional Dutch samples and the Belgian samples were processed following this protocol. DNA extraction of the German samples was performed as described elsewhere [16]. Primer pairs FCB-Z [8] & F16483-M13R [17] and M13F-JHmtF & JHmtR3 [10] were used for duplex amplification of the control region positions 16065–16483 and 16756–17009/0–223 (numbers corresponding to GenBank U20753/NC_001700 [18]) as described in [17]. Evaluation of PCR success through gel electrophoresis, purification of PCR products and sequencing were performed as described in [14] using FCB-Z, JHmtR3 and AMBeR M13 sequencing primers.

2.3. Sequence analysis

All sequences were manually evaluated, and forward and reverse sequences were assembled in Geneious 4.7.6 [19]. Following assembly, sequences were aligned and trimmed to nucleotide positions (NP) 16315–16483 and 16780–17009/0–206, resulting in sequences of approximately 605 bp (Supplementary material S1). A phylogeographic network of haplotypes was constructed with the median joining algorithm implemented in Network 5.0 (Fluxus Technology Ltd.) [20]. Arlequin 3.5.2.2 [21] was used to calculate population statistics including gene diversity, nucleotide diversity, AMOVA, fixation indices (Fst) and population pairwise differences (within population, between population and corrected values/Nei’s distance). Pairwise Fst p values were adjusted by the sequential Bonferroni correction [22].

Sequences of all haplotypes were aligned with the SRS and with previously described SRS haplotypes [10–13,15]. Following trimming of all sequences to SRS length (NP 16814–17009/0–206), the unique sequences were retained and population statistics were recalculated using Arlequin as described for the 605 bp sequences. The corrected population pairwise differences (Nei’s distances) were visualized using the Neighbor Joining (NJ) method [23] implemented in MEGA5 [24], after which the phylogram was unrooted to present only the degree of kinship (as opposed to an evolutionary path), and edited in MrEnt 2.5 [25].
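The sequential Bonferroni adjustment of the pairwise Fst p values can be sketched as follows. This is a minimal illustration that assumes the step-down form of the procedure (the smallest p value is tested against α/m, the next against α/(m−1), and so on); the function name and the example p values are hypothetical and are taken neither from Arlequin nor from this study.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Return one boolean per test: True if it stays significant.

    Assumed step-down procedure: sort the p values, compare the k-th
    smallest (k = 0, 1, ...) with alpha / (m - k), and stop at the first
    comparison that fails.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # every larger p value is non-significant as well
    return significant


# Hypothetical p values for three pairwise population comparisons.
example_p = [0.004, 0.030, 0.200]
print(sequential_bonferroni(example_p))
```

In this hypothetical case only the first comparison remains significant: 0.004 is below 0.05/3, whereas 0.030 exceeds 0.05/2.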

3. Results

3.1. Haplotype designation

Sequences of on average 605 bp were obtained from all 338 randomly selected cats living in contact with humans from the Netherlands, Belgium and the South-West of Germany and from the 19 captured stray cats from Putte Municipality, Belgium. Alignment of the sequences of these 357 individuals revealed 53 different sequences, of which 32 have previously been described [14]. For the following analyses, these 53 haplotypes were complemented with all previously described haplotypes encountered in fancy breed cats from the Netherlands (NL-B2, NL-B3, NL-C2, NL-C3, NL-C6 and NL-C7) that were not found in this study. All 59 haplotypes were given EU names (EU-haplogroup-number), based on the previously published names (NL-haplogroup-number) and phylogenetic analysis (network diagram depicted in Fig. 1). To facilitate easy translation between NL and EU haplotypes, and because NL-A11 and NL-A01 are indistinguishable when the EU sequence length is used [17], haplotype EU-A11 was not assigned. The polymorphic nucleotides defining the different haplotypes, numbered according to the first published feline mtDNA sequence U20753, are displayed in Table 1. To enable comparisons with other studies, translation to SRS nucleotide positions and described haplotypes [10–13,15] is available as Supplementary material S2.
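Collapsing aligned sequences into haplotypes, as done above to arrive at 53 distinct sequences from 357 individuals, can be illustrated with a short sketch. It assumes that all sequences have already been aligned and trimmed to the same region, so that identical strings represent the same haplotype; the toy sequences and function names are hypothetical.

```python
from collections import Counter


def haplotype_counts(aligned_sequences):
    """Count how often each distinct aligned sequence (haplotype) occurs."""
    return Counter(aligned_sequences)


def singleton_haplotypes(counts):
    """Return the haplotypes that were observed exactly once."""
    return sorted(seq for seq, n in counts.items() if n == 1)


# Hypothetical aligned sequences from five individuals.
toy_sequences = ["ACGT", "ACGT", "ACGA", "ACGT", "TCGT"]
counts = haplotype_counts(toy_sequences)

print(len(counts))                   # number of distinct haplotypes
print(singleton_haplotypes(counts))  # haplotypes seen once
```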

Six distinct haplogroups were recognized (EU-A, EU-B, EU-C, EU-D, EU-E and EU-F), corresponding to the previously described Dutch haplogroups (NL-A, NL-B, NL-C, NL-D, NL-E and NL-F). The largest haplogroup based on occurrence was EU-A. The largest group based on the number of different haplotypes was EU-B. As displayed in Fig. 1, the haplogroups EU-A, EU-C, EU-D and EU-F each consist of a single frequently occurring haplotype and other, less frequent haplotypes that differ from this most frequent haplotype by only one or two nucleotides. Three exceptions to this star-like pattern are visible. First, the novel haplotype EU-A20 is an intermediate between haplogroups EU-A and EU-D, differing from EU-A01 by 4 nucleotides and from EU-D01 by 6 nucleotides. Second, haplogroup EU-E is made up of a single rare haplotype. Third, haplogroup EU-B has a more complex structure than the other haplogroups; it consists of multiple frequently occurring haplotypes, organized in subgroups, differing from the EU-B01 haplotype by multiple nucleotides (maximum EU-B16, 6 nucleotides).

Fig. 1. Network diagram of all mtDNA sequences described in Table 1, excluding those with heteroplasmic positions (EU-A16 and EU-C02). Circles are scaled to the total number of observations in Dutch, Belgian and German randomly selected cats. Coloured circles indicate one or zero observations in these groups. Nodes and crossings indicate one nucleotide difference unless indicated otherwise.

Table 1 Haplotype defining polymorphic nucleotide positions.

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 7 5 6 6 6 1 1 1 1 1 1 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 9 2. 3 9 3 3 5 6 7 8 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 1 0 4 9 9 3 1 1 1 6 7 8 8 1 1 2 2 2 3 3 4 5 6 6 7 7 8 8 8 8 8 8 9 9 9 9 9 0 0 0 0 0 2 2 5 5 6 5 6 7 8 8 8 8 9 5 6 3 6 1 6 0 0. 2 7 9 0 3 0 4 0 9 5 9 0 1 2 3 6 8 0 1 5 7 8 0 1 2 3 4 0 4 5 9 7 7 3 3 5 6 8 9 7 [18] 1
U20753 G C A G G T A Δ C A A C C C T C A C T C A C C A T C T A A G C T T C T T A G T C C A A G T C T G A T Δ T T T T T C G G
EU-A01 C . . . C . . . . . G T . . . T . . . T G . . . A . C ...... C . . . C . . . T A C . . . . C . . . C A . T A A
EU-A02 C . . . C . . A . . G T . . . T . . . T G . . . A . C ...... C . . . C . . . T A C . . . . C . . . C A . T A A
EU-A03 C . . . C . . . . . G T . . . T . . . T . . . . A . C ...... C . . . C . . . T A C . . . . C . . . C A . T A A
EU-A04 C . . . C . Δ . . . G T . . . T . . . T G . . . A . C ...... C . . . C . . . T A C . . . . C . . . C A . T A A
EU-A05 C . . . C . . . . . G T . . C T . . . T G . . . A . C ...... C . . . C . . . T A C . . . . C . . . C A . T A A
EU-A06 C . . . C . . . . . G T . . . T . . . . G . . . A . C ...... C . . . C . . . T A C . . . . C . . . C A . T A A
EU-A07 C . . . C . . . . . G T . . . T . . . T G . . . A . C . . . . C . . C . . . C . . . T A C . . . . C . . . C A . T A A
EU-A08 C . . . C . . . . . G T . . . T . . . T G . . . A . C ...... C . . . T A C . . . . C . . . C A . T A A
EU-A09 C . T . C . . . . . G T T . . T . . . T G . . . A . C ...... C . . . C . . . T A C . . . . C . . . C A . T A A
EU-A10 C . . . C . . . . ‡ ‡ ‡ ‡ ‡ . T . . . T G . . . A . C ...... C . . . C . . . T A C . . . . C . . . C A . T A A
EU-A12 C . . . C . . . . . G T . . . T . . . T G . . . A . C ...... C . . . C . . . T A C . . . . C . . . C A . T . A
EU-A13 C . . . C . . . . . G T . . . T . . . T G . . . A . C ...... C . . . C T . . T A C . . . . C . . . C A . T A A
EU-A14 C . . . C . . . . . G T . . . T . . . T G . . . A . C ...... C . . . C . . . T A C . . A . C . . . C A . T A A
EU-A15 C . . . C . . . . . G T . . . T . . . T G . . . A . C ...... C . . . C . . . T A C . . . . C . . . C A . . A A
EU-A16 C . . . C . . . . . G T . . . T . . . T G . . . A . C ...... C . . . C . . . T A C . . . . C . . . C A . Y A A
EU-A17 C . . . C . . . . . G T . . . T . . . T G . . . A . C ...... C . . . C . . . T A C . . . . C . . . C A C T A A
EU-A18 C . . . C . . . . . G T . . . T . . . T G . . . A . C . . . T . . . C . . . C . . . T A C . . . . C . . . C A . T A A
EU-A19 C . . . C . . . . . G T . . . T . . . T G . . G A . C ...... C . . . C . . . T A C . . . . C . . . C A . T A A
EU-A20 C . . . C . . . . . G T . . . T . . . T . . . . A . C ...... C ...... T A C . . . . C . . . C A . T A A
EU-D01 C . . . C . . . . . G T . . . T . . . T . . . . A . C . . A . . . . C C . . . . T . T A C . . . G C T A . C A . T A A
EU-D02 C . . . C . Δ . . . G T . . . T . . . T . . . . A . C . . A . . . . C C . . . . T . T A C . . . G C T A Δ C A . T A A
EU-D03 C . . . C . . . . . G T . . . T . . . T . . . . A . C . . A . . . . C C . . . . T . T A C . . . . C T A . C A . T A A
EU-D04 C . . . C . . . . . G T . . . T . . . T . . . . A . C . . A . . . T C C . . . . T . T A C . . . G C T A . C A . T A A
EU-B01 C . . . C . . . A . G T . . . . . A C . . . . . A . C G G A ...... G . C . . . T A . . . . . C . A . . A C T . A
EU-B02 C . . . C . . . A . G T . . . . . A C . . T . . A . C G G A ...... G . C . . . T A . . . . . C . A . . A C T . A
EU-B03 C . . . C . . . A . G T . . . . . A C . . . . . A . C G G A . . . T . . G . C . . . T A . . . . . C . A . . A C T . A
EU-B04 C . . . C . . . A . G T . . . . . A C . . . . . A T C G G A ...... G . C . . . T A . . . . . C . A . . A C T . A
EU-B05 C . . . C . . . A . G T . . . . . A C . . . . . A . C G G A ...... G . C . . . T A . . . A . C . A . . A C T A A
EU-B06 C . . . C . . . A . G T . . . . . A C . . . . . A . C G G A ...... G . C T . . T A . . . . . C . A . . A C T . A
EU-B07 C . . . C . . . A . G T . . . . . A C . . . . . A . C G G A ...... G . C . . . T A C . . . . C . A . . A C T . A
EU-B08 C . . . C . . . A . G T . . . . . A C . . . . . A . C G G A ...... C . . . T A . . . . . C . A . . A C T . A
EU-B09 C . . . C . . . A . G T . . . . . A C . . . . . A . C G G A ...... G . C . . . T A . . . . . C . A . . A C . . A
EU-B10 C . . . C . . . A . G T . . . . . A C ...... C G G A . . C . . . G . C . . . T A . . . . . C . A . . A C . . A
EU-B11 C T . . C . . . A . G T . . . . . A C ...... C G G A . . C . . . G . C . . . T A . . . . . C . A . . A C . . A
EU-B12 C . . . C . . . A . G T . . . . . A C ...... C G G A . . C . . . . . C . . . T A . . . . . C . A . . A . . . A
EU-B13 C . . A C . . . A . G T . . . . . A C . . T . . A . C G G A ...... G . C . . . T A . . . . . C . A . . A C T . A
EU-B14 C . . . C . . . A . G T . . . . . A C . . . . . A . C G G A ...... G . C . . . T A . . C A . C . A . . A C T . A
EU-B15 C . . . C . . . Δ . G T . . . . . A C . . . . . A . C G G A ...... G . C . . . T A . . . . . C . A . . A C T . A
EU-B16 C . . . C . Δ . A . G T . . . . . A C ...... C G G A . . C . . . . . C . . . T A . . . . . C . A . . A . . . A
EU-B17 C . . . C . . . A . G T . . . . . A C . . . . . A . C G G A ...... G . C . . . T A . . . A . C . A . . A C T . A
EU-B18 C . . . C . . . A . G T . . . . . A C . . T . . A . C G G A . . . . C . G . C . . . T A . . . . . C . A . . A C T . A
EU-B19 C . . . C . . . A . G T . . . . . A C . . . . . A . C G G A . . C . . . G . C . . . T A . . . . . C . A . . A C T . A
EU-B20 C . . . C . . . A . G T . . . . . A C ...... C G G A . . C . . . G . C . . . T A ...... A . . A C . . A
EU-B21 C . . . C . . . A . G T . . . . . A C ...... C G G A . . C T . . G . C . . . T A . . . . . C . A . . A C . . A
EU-B22 C . . . C . . . A . G T . . . . . A C . . . . . A . C G . A ...... G . C . . . T A . . . . . C . A . . A C T . A
EU-C01 C . . . C . . . . . G T ...... T . . T . A . C G G . . . C . . . G . . . . . T ...... A . . A C T . A
EU-C02 C . . . C . . . . . G T ...... T . . T . A . C G G . . . C Y . . G . . . . . T ...... A . . A C T . A
EU-C03 C . . . C . . . . . G T ...... T . . T . A . C G G . . . C . . C G . . . . . T ...... A . . A C T . A
EU-C04 C . . . C . . . . . G T ...... T . . T . A . C G G . . . C . . . G . . . . . T C ...... A . . A C T . A
EU-C05 C . . . C . . . . . G T ...... T . . T . A . C G G . . . C . . . G . C . . . T ...... A . . A C T . A
EU-C06 C . . . C . . . . . G T ...... T . . T . A . C G G . . . C . . . G A . . . . T ...... A . . A C T . A
EU-C07 C . . . C . . . . . G T ...... T . . T . A . C G G . . . C . . . G . . . . . T ...... A . . A . T . A
EU-C08 C . . . C . . . . . G T . . . . T . . T . . T . A . C G G . . . C . . . G . . . . . T ...... A . . A C T . A
EU-C09 C . . . C . Δ . . . G T ...... T . A . C G G . . . C . . . G . . . . . T ...... A . . A C T . A
EU-C10 C . . . C . . . . . G T ...... T . . T . A . C G G . . . C . . . G . . . . . T ...... C . A . . A C T . A
EU-C11 C . . . C . . . . . G T ...... T . . T . A . C G G . . . C . . . G . . . . G T ...... A . . A C T . A
EU-E01 C . . . C . . . . . G T ...... T . . . . A . C G G . . . C . . C . . . . T . T A . T . A . C T A . . A . . . A
EU-F01 C . . . C ...... A . . . A
EU-F02 C . . . C C ...... A . . . A
1 1 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 SRS [10] ------1 4 4 5 4 5 6 7 7 7 7 8 0 5 5 6 6 2 3 5 6 7 7 7 1 2 6 4 4 0 0 2 3 5 6 4 3 5 9 0 2 7 1 6 6 0 8

Dots indicate positions identical to the reference sequence U20753 [18]. Base calls (IUPAC nomenclature) show the substitutions at specified positions (Y = C/T), Δ indicates a deletion, ‡ indicates a 14 bp deletion from NP 16427–16440 encountered in a single haplotype. Grey nucleotide positions are the same in all encountered haplotypes, but differ from the reference U20753. Nucleotide positions of SRS [10] are indicated for comparative purposes only.

Table 2 Haplotype distribution within randomly selected cats from the Netherlands, Belgium and South-West Germany and a local stray cat population from Belgium.

Haplotype  Name in [14]       Netherlands       Belgium           S-W Germany (Σ)   Local - Belgium
                              #    Frequency    #    Frequency    #    Frequency    #    Frequency
EU-A01     NL-A1 & NL-A11     71   0.4863       27   0.4219       50   0.3906       4    0.2105
EU-A02     NL-A2              1    0.0068       1    0.0156       -    -            -    -
EU-A03     NL-A3              4    0.0274       1    0.0156       -    -            -    -
EU-A04     NL-A4              3    0.0205       -    -            1    0.0078       -    -
EU-A05     NL-A5              1    0.0068       -    -            -    -            -    -
EU-A06     NL-A6              1    0.0068       1    0.0156       1    0.0078       -    -
EU-A07     NL-A7              1    0.0068       -    -            -    -            -    -
EU-A08     NL-A8              1    0.0068       -    -            -    -            -    -
EU-A09     NL-A9              1    0.0068       -    -            -    -            -    -
EU-A10     NL-A10             1    0.0068       -    -            -    -            -    -
EU-A12     NL-A12             5    0.0342       1    0.0156       1    0.0078       -    -
EU-A13     NL-A13             1    0.0068       3    0.0469       1    0.0078       -    -
EU-A14     NL-A14             1    0.0068       -    -            2    0.0156       -    -
EU-A15     NL-A15             1    0.0068       -    -            1    0.0078       -    -
EU-A16     NL-A16             1    0.0068       -    -            -    -            -    -
EU-A17     -                  1    0.0068       -    -            1    0.0078       1    0.0526
EU-A18     -                  -    -            -    -            1    0.0078       -    -
EU-A19     -                  -    -            -    -            1    0.0078       -    -
EU-A20     -                  1    0.0068       1    0.0156       -    -            -    -
EU-D01     NL-D1              4    0.0274       3    0.0469       7    0.0547       1    0.0526
EU-D02     NL-D2              1    0.0068       -    -            -    -            -    -
EU-D03     -                  1    0.0068       -    -            -    -            -    -
EU-D04     -                  -    -            1    0.0156       -    -            -    -
EU-B01     NL-B1              12   0.0822       5    0.0781       15   0.1172       3    0.1579
EU-B02     NL-B2              -    -            -    -            -    -            -    -
EU-B03     NL-B3              1    0.0068       2    0.0313       -    -            -    -
EU-B04     NL-B4              1    0.0068       -    -            1    0.0078       -    -
EU-B05     NL-B5              2    0.0137       -    -            -    -            -    -
EU-B06     NL-B6              1    0.0068       1    0.0156       2    0.0156       -    -
EU-B07     NL-B7              -    -            -    -            -    -            -    -
EU-B08     NL-B8              1    0.0068       -    -            -    -            -    -
EU-B09     NL-B9              -    -            -    -            4    0.0313       -    -
EU-B10     NL-B10             4    0.0274       -    -            2    0.0156       -    -
EU-B11     NL-B11             1    0.0068       -    -            -    -            -    -
EU-B12     NL-B12             11   0.0753       1    0.0156       5    0.0391       -    -
EU-B13     -                  1    0.0068       -    -            -    -            -    -
EU-B14     -                  -    -            1    0.0156       -    -            -    -
EU-B15     -                  -    -            1    0.0156       -    -            -    -
EU-B16     -                  -    -            1    0.0156       2    0.0156       -    -
EU-B17     -                  -    -            -    -            1    0.0078       -    -
EU-B18     -                  -    -            -    -            1    0.0078       -    -
EU-B19     -                  -    -            -    -            1    0.0078       -    -
EU-B20     -                  -    -            -    -            4    0.0313       -    -
EU-B21     -                  -    -            -    -            5    0.0391       -    -
EU-B22     -                  -    -            -    -            -    -            10   0.5263
EU-C01     NL-C1              6    0.0411       9    0.1406       11   0.0859       -    -
EU-C02     NL-C2              -    -            -    -            -    -            -    -
EU-C03     NL-C3              -    -            -    -            -    -            -    -
EU-C04     NL-C4              1    0.0068       -    -            -    -            -    -
EU-C05     NL-C5              1    0.0068       -    -            -    -            -    -
EU-C06     NL-C6              -    -            -    -            -    -            -    -
EU-C07     NL-C7              -    -            -    -            -    -            -    -
EU-C08     -                  -    -            1    0.0156       -    -            -    -
EU-C09     -                  -    -            -    -            1    0.0078       -    -
EU-C10     -                  -    -            -    -            1    0.0078       -    -
EU-C11     -                  -    -            -    -            1    0.0078       -    -
EU-E01     NL-E1              -    -            1    0.0156       1    0.0078       -    -
EU-F01     NL-F1              1    0.0068       1    0.0156       3    0.0234       -    -
EU-F02     -                  -    -            1    0.0156       -    -            -    -

                                        Netherlands        Belgium            S-W Germany (Σ)    Local - Belgium
N                                       146                64                 128                19
# haplotypes                            34                 21                 29                 5
# singletons (per population)           24                 15                 16                 2
# singletons (combined populations)     13                 5                  8                  0
% singleton samples (combined pops.)    8.90%              7.81%              6.25%              0%
Unbiased gene diversity ± SD            0.7494 ± 0.0374    0.7996 ± 0.0462    0.8222 ± 0.0299    0.6842 ± 0.0917
Random match probability (%)            25.58              21.29              18.42              35.18
Discrimination capacity (%)             74.42              78.71              81.58              64.82
# polymorphic sites                     56 *               40                 41                 21
Maximum # of pairwise differences       34 *               22                 22                 18
Mean # pairwise differences ± SD        8.5418 ± 3.9714    10.1032 ± 4.6771   10.7799 ± 4.9377   7.4620 ± 3.6481
Nucleotide diversity ± SD               0.0141 ± 0.0072    0.0166 ± 0.0085    0.0178 ± 0.0090    0.0123 ± 0.0067

Haplotypes in italics: Encountered in previous study of fancy bred cat(s) from the Netherlands [14]. * 11 sites are due to a deletion at NP 16427–16440 in EU-A10. Σ indicates sum of populations. A dash indicates that the haplotype was not observed in that population or had no previous name.
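The summary statistics at the bottom of Table 2 can be related to the haplotype counts with a short sketch. It assumes the usual definitions: random match probability as the sum of squared haplotype frequencies, discrimination capacity as its complement, and unbiased gene diversity as n/(n−1) times that complement. These definitions are an assumption here (Arlequin's exact implementation is not shown in the text), but applied to the Putte counts from Table 2 they reproduce the tabulated values. The function name is hypothetical.

```python
def match_statistics(counts):
    """Return (random match probability, discrimination capacity,
    unbiased gene diversity) for a list of haplotype counts.

    Assumed definitions:
      RMP = sum(p_i^2), DC = 1 - RMP, H = n / (n - 1) * (1 - sum(p_i^2)).
    """
    n = sum(counts)
    sum_sq = sum((c / n) ** 2 for c in counts)
    rmp = sum_sq
    dc = 1.0 - sum_sq
    gene_diversity = n / (n - 1) * (1.0 - sum_sq)
    return rmp, dc, gene_diversity


# Putte stray cats (Table 2): EU-A01, EU-A17, EU-D01, EU-B01, EU-B22.
putte_counts = [4, 1, 1, 3, 10]
rmp, dc, h = match_statistics(putte_counts)
print(f"RMP {100 * rmp:.2f}%  DC {100 * dc:.2f}%  H {h:.4f}")
```

For the Putte sample this gives a random match probability of 35.18%, a discrimination capacity of 64.82% and a gene diversity of 0.6842, in agreement with the last column of Table 2.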

3.2. European feline haplotype distribution

The haplotype distributions in the randomly selected cats living in close contact with humans from three different countries, and in the captured stray cats, are described in Table 2. The haplotype distribution of the smallest group (the 19 captured stray cats from a single Belgian micropopulation) differs strikingly from the distributions of the randomly selected cats. Haplotype EU-B22 is encountered in 53% of the local stray cat population, whereas this haplotype is not encountered in any of the other populations. The major haplotype in the randomly selected cats, EU-A01, was found in 39% of the South-West German, 42% of the Belgian and 49% of the Dutch randomly selected cats, but in only 21% of the local stray Belgian population.

The percentage of singleton samples differs considerably between the populations, from 9% in the Dutch population, through 8% in the Belgian and 6% in the German population, to 0% in the local stray cat population. As the haplotype statistics of the local Belgian population differed notably from those of the randomly selected cats, this small subpopulation was not included in further comparisons. To compare the haplotype distributions of the randomly selected cats from the three North-West European countries more accurately, the unbiased gene diversity, nucleotide diversity and the maximum and average number of pairwise differences within populations were determined (Table 2). For all these parameters, intermediate values were obtained for the Belgian cats, whereas the Dutch cats were least diverse and the German cats were most diverse. Through AMOVA and pairwise Fst calculation, significant genetic differentiation from the random expectation at the 0.05 level (with and without Bonferroni correction) was detected between the Dutch and the German randomly selected cats (Table 3), with 3.45% of the variation found between populations. No statistically significant genetic differentiation was detected between the Dutch and the Belgian cats, or between the Belgian and the German cats. These differences are also expressed in the corrected average pairwise difference (Nei’s distance) between populations (Table 3), which is highest between the Dutch and the South-West German populations (0.3414).
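The within-population parameters compared above can be illustrated with a small sketch. It assumes that the mean number of pairwise differences is the average number of differing positions over all pairs of sampled sequences, and that nucleotide diversity is this mean divided by the sequence length; this is consistent with Table 2, where 8.5418 / 605 ≈ 0.0141 for the Dutch sample. The toy sequences and function names are hypothetical, and indels are simply treated as ordinary characters.

```python
from itertools import combinations


def n_differences(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def mean_pairwise_differences(sequences):
    """Average number of differences over all pairs of individuals."""
    pairs = list(combinations(sequences, 2))
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)


def nucleotide_diversity(sequences):
    """Mean pairwise differences per site (all sequences equal length)."""
    return mean_pairwise_differences(sequences) / len(sequences[0])


# Hypothetical aligned sequences from four individuals.
toy_population = ["AAAA", "AAAT", "AATT", "AAAA"]
print(mean_pairwise_differences(toy_population))
print(nucleotide_diversity(toy_population))
```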

Table 3 Population pairwise Fst values based on pairwise difference (above diagonal) and Nei's distance (corrected average pairwise difference; below diagonal) based on 605 bp region. Bold indicates Fst values significant at 0.05 level after Bonferroni correction. Σ indicates sum of (micro) populations.

                          N-W Continental EU (Σ)   the Netherlands   Belgium    S-W Germany (Σ)
N-W Continental Europe    -                        0.0049            -0.0022    0.0067
the Netherlands           0.0489                   -                 0.0098     0.0345
Belgium                   -0.0233                  0.0823            -          0.0101
S-W Germany               0.0658                   0.3414            0.1103     -
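The corrected average pairwise difference reported below the diagonal of Table 3 can be sketched as the between-population mean number of differences minus the average of the two within-population means. This definition is assumed here to match the one used by Arlequin; the two toy populations and the function names are hypothetical.

```python
from itertools import combinations, product


def n_differences(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def mean_within(population):
    """Mean number of differences between pairs from one population."""
    pairs = list(combinations(population, 2))
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop_x, pop_y):
    """Mean number of differences between sequences of two populations."""
    pairs = list(product(pop_x, pop_y))
    return sum(n_differences(a, b) for a, b in pairs) / len(pairs)


def corrected_distance(pop_x, pop_y):
    """Assumed form: pi_XY - (pi_X + pi_Y) / 2."""
    return mean_between(pop_x, pop_y) - (mean_within(pop_x) + mean_within(pop_y)) / 2


# Hypothetical aligned sequences from two populations.
pop_x = ["AAAA", "AAAA", "AAAT"]
pop_y = ["AATT", "AATT", "AAAT"]
print(corrected_distance(pop_x, pop_y))
```

Because the within-population means are subtracted, the corrected value can become slightly negative when two samples are very similar, as seen for some pairs in Table 3.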

3.3. European vs. American feline haplotype distribution

To evaluate the North-West European haplotype distribution with respect to the distributions of cats sampled on the American continent and in other parts of Europe, the 605 bp sequences of the Dutch, Belgian and German randomly selected cats were trimmed to the 402 bp SRS region and compared to published haplotypes. Of our 338 continental European samples, 86% corresponded to ten of the twelve ‘worldwide major haplotypes’ A, B, C, D, E, F, G, H, J, and K described by [11], 5% to the ‘minor haplotypes’ A6, B4 and I [11], 1% corresponded to CA17587 [15], 5% were novel minor haplotypes and 2% were novel singleton haplotypes (Supplementary material S2). None of the other two major, 30 minor or 106 unique haplotypes described by [11] and [15] were encountered.

Population pairwise Fst values and average numbers of pairwise differences were calculated for (micro)populations sampled in North-West Continental Europe, Eastern Continental Europe, the United Kingdom, the United States of America and Canada (Table 4). The Dutch population differed significantly from the UK population, the USA (micro)populations and the Canadian micropopulations. After Bonferroni correction, no significant difference was found between the Dutch population and the other mainland European populations, including the combined Polish micropopulations. The differences between the Belgian and German populations on the one hand and the UK, USA and Canadian populations on the other hand were less pronounced, based on both Fst values and Nei’s distances. Based on these parameters, the population from the United Kingdom was more similar to populations sampled on the American continent than to those sampled on the European mainland (Supplementary material S3). The combined North-West Continental European population differed least from the Hawaiian cat population (Fst 0.0443, Nei’s distance 0.1995) and the cat population from New York (Fst 0.0612, Nei’s distance 0.2879). As depicted in Fig. 2, the Dutch, Belgian, German, Polish, New York and Hawaiian populations have a high frequency of haplotype A and a relatively low frequency of haplotype C in common.

Table 4 Population pairwise Fst values (bold indicates significant at 0.05 level after Bonferroni correction, * significant prior to Bonferroni correction) and population average pairwise differences based on 402 bp SRS region. Σ indicates sum of (micro) populations.

                            N-W Continental EU (Σ)  the Netherlands        Belgium                S-W Germany (Σ)        Poland (Σ)             United Kingdom (Σ)
Population        n         FST       Nei's dist.   FST       Nei's dist.  FST       Nei's dist.  FST       Nei's dist.  FST       Nei's dist.  FST       Nei's dist.
the Netherlands   146       0.0082    0.0352
Belgium           64        -0.0021   -0.0129       0.0201    0.0740
S-W Germany       Σ=128     0.0070    0.0303        0.0425*   0.1808       0.0037    0.0174
Poland^a          Σ=181     0.0384    0.1669        0.0146*   0.0554       0.0410*   0.1727       0.0839    0.3894
United Kingdom^b  Σ=152     0.1457    0.7637        0.2242    1.2122       0.1138    0.6280       0.0795    0.4176       0.2691    1.60006
USA^c             Σ=493     0.0964    0.5012        0.1574    0.8673       0.0769    0.4133       0.0436    0.2253       0.2084    1.23264      0.0056    0.0285
CaliforniaN^c     712       0.1035    0.5194        0.1754    0.8970       0.0786    0.4264       0.0453*   0.2329       0.2249    1.26935      0.0011    0.0055
CaliforniaS^c     99        0.1334    0.6714        0.2182    1.0910       0.1077    0.5659       0.0682    0.3433       0.2666    1.50079      -0.0034   -0.0156
Florida^c         50        0.1497    0.7620        0.2447    1.2066       0.1202    0.6489       0.0796*   0.4091       0.2856    1.60861      -0.0030   -0.0123
Hawaii^c          59        0.0443*   0.1996        0.1031    0.4235       0.0389*   0.1878       0.0097    0.0478       0.1585    0.75189      0.0471*   0.2373
Missouri^c        24        0.1238*   0.5923        0.2177    1.0166       0.0851*   0.4670       0.0550*   0.2687       0.2510    1.34468      -0.0078   -0.0568
New York^c        100       0.0610    0.2874        0.1249    0.5791       0.0423*   0.2182       0.0176*   0.0870       0.1727    0.8889       0.0214*   0.1075
Texas^c           28        0.1846    0.9764        0.2768    1.4739       0.1346    0.8246       0.1066*   0.5824       0.3126    1.89153      0.0097    0.0252
Canada^d          Σ=96      0.1381    0.7028        0.2213    1.1236       0.1062    0.5677       0.0756    0.3880       0.2666    1.51565      -0.0005   -0.0025
Ottawa^d          28        0.2073    1.1252        0.3180    1.6377       0.1765    0.9884       0.1310    0.7066       0.3529    2.11692      0.0088    0.0582
Winnipeg^d        53        0.0916    0.4397        0.1680    0.7876       0.0570*   0.3062       0.0409*   0.2073       0.2093    1.10443      0.0052    0.0240
Vancouver^d       15        0.1841*   0.9766        0.2986    1.4838       0.1470*   0.8391       0.1038*   0.5645       0.3368    1.97914      -0.0211   -0.0763

a Sequences and frequencies from [12]. b Sequences and frequencies from [13]. c Sequences and frequencies from [11]. d Sequences and frequencies from [15].

Fig. 2. Unrooted phylogram inferred using the NJ method based on Nei’s distances (402 bp SRS region; this study and [11–13,15]; Table 4 and Supplementary material S3). Pie charts indicate haplotype distribution (A, A subtypes, B, B subtypes, C, C subtypes, D, E, F, G, H, I, J, K, L, OL1, OL2, CA17587, other types, unique types; Supplementary material S2), arrowheads indicate sampling location, colour corresponds to major haplotype (A, B, B6 or C).

4. Discussion

4.1. Variation within North West Continental Europe

Based on the 605 bp control region sequences, the randomly sampled cats from the Netherlands, Belgium and South West Germany have several characteristics in common. The most abundant haplotype in all three populations is EU-A01, with frequencies of 49, 42 and 39% respectively. The abundance of haplotype EU-B01 was also comparable in the three populations (8, 8 and 11% respectively). However, other haplotypes that were frequent in one population were rarer in others. The most striking examples are EU-B12 and EU-C01. EU-B12 was encountered in 8% of the Dutch cats (both random bred cats and mixed breed cats from the western part of the country, data not shown), whilst this haplotype was encountered in only a single individual from Belgium and in only 4% of German cats. EU-C01 was encountered in 14% of the Belgian cats, in 9% of the German cats and in only 4% of the Dutch cats. Apart from haplotype EU-D01, which was encountered in 5% of the Belgian and German populations, all other haplotypes were seen in less than 5% of the different populations. When these abovementioned ‘frequent’ haplotypes and the minor/rare haplotypes are considered as two groups, approximately 70% of the three populations (both separately and as a group) is described by the five main haplotypes, whereas the remaining 30% is described by haplotypes occurring in less than 5% of the population. Of the combined North West Continental European samples, 8% have a haplotype encountered only once in this dataset. As seen in Fig. 1, all singleton sequences except EU-A10 differ in only one or two nucleotides from their nearest, more frequently occurring neighbour. The star-like pattern, indicating recent evolution of these haplotypes, is clearly visible for the EU-A and EU-C haplogroups. Interestingly, most EU-A singletons were encountered in cats from the Netherlands, the country with the highest frequency of haplogroup EU-A (66% vs. 53% and 47%), possibly indicating a population structuring effect in the Dutch population that was not experienced, to the same extent, by populations in neighbouring countries. The more complex structure of haplogroup EU-B sets this haplogroup apart from the other groups.

Based on the calculated population pairwise distances, the genetic variation between cats from the Netherlands, Belgium and Germany is small but pronounced (Fst 0.01–0.03, Table 3). The variation between the North West mainland European cats as a whole and the local stray cat population from Putte, on the other hand, is moderate to large (Fst 0.14). Such local populations, potentially reflecting strong parent-offspring relationships, small territories and only minimal natural dispersion, may influence haplotype frequency estimations. Even though sample sizes of 50–150 cats from a region have been demonstrated to representatively describe populations [26], haplotypes may be over- or underrepresented if all samples are taken from a small geographical area or a certain feline social structure (i.e. stray cats, pure bred cats, etc.) and extrapolated to a larger country or continent. This implies that the sampling strategy used to estimate haplotype frequencies in a certain area, country or continent should be well documented, and that care should be taken when a large number of samples are derived from a small geographic area such as a single city, veterinary clinic or animal shelter, as has previously been recommended for other species [27–29]. In certain forensic cases, or to answer specific questions, additional typing of case specific local populations may be necessary.
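The effect of a local micropopulation on frequency estimates can be made concrete with the counts from Table 2. The sketch below compares the frequency of two haplotypes in the 338 randomly selected cats, in the 19 Putte stray cats, and in a hypothetical pooled database that mixes both; the pooling itself is only an illustration and was not performed in this study. The dictionary layout and function name are assumptions.

```python
def pooled_frequency(samples, haplotype):
    """Frequency of a haplotype after pooling several samples."""
    total = sum(s["n"] for s in samples)
    observed = sum(s["counts"].get(haplotype, 0) for s in samples)
    return observed / total


# Counts taken from Table 2 (EU-A01: 71 + 27 + 50 randomly selected cats).
random_cats = {"n": 338, "counts": {"EU-A01": 148, "EU-B22": 0}}
putte_cats = {"n": 19, "counts": {"EU-A01": 4, "EU-B22": 10}}

for hap in ("EU-A01", "EU-B22"):
    f_random = pooled_frequency([random_cats], hap)
    f_local = pooled_frequency([putte_cats], hap)
    f_pooled = pooled_frequency([random_cats, putte_cats], hap)
    print(f"{hap}: random {f_random:.4f}, local {f_local:.4f}, pooled {f_pooled:.4f}")
```

The estimate for EU-B22 ranges from 0 in the random sample to more than half of the local sample, which illustrates why the choice of reference population matters for the question at hand.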

4.2. Variation between European and American populations

The portion of the feline control region studied in samples from across the world is smaller than the portion described in this study. To enable comparisons, sequences were trimmed from 605 to 402 bp. This reduction decreases the exclusion capacity in the Dutch, Belgian and German populations by 14%, 7% and 5% respectively (Supplementary material S2), demonstrating the added value of sequencing 605 bp compared to 402 bp. On a sequence level, comparison of the trimmed European data from this study to previously published haplotypes demonstrated that 86% of the EU data corresponded to ten of the twelve ‘worldwide major haplotypes’, whilst the two other ‘worldwide major haplotypes’ A6a and L [11] were not encountered. Haplotype I [11], found in only 1% of USA samples, was encountered in 4% of our North-West European samples and is even the second largest haplotype in Eastern European cats [12], with a frequency of 10%. The haplotype described as a new haplotype in several studies (CA17587 [15], B-UK1 [13], PL13 [12]) was already encountered once in our Dutch cat study [14] and was additionally encountered in two cats from Germany and one cat from Belgium in this study, placing its frequency in the same order of magnitude as ‘worldwide major haplotypes’ H, J and K and multiple novel European haplotypes. Ten sequences were detected only once in the European samples, of which eight are novel singletons and two correspond to haplotypes B4 and G (Supplementary material S2).

In the worldwide study by [11], 44 European samples were included, to which we separately compared our sequences in more detail. The last nucleotide position found to be variable in our studies (N = 357 this study, N = 136 fancy breed cats [14]) and in 429 other cats (N = 181 [12], N = 152 [13], N = 96 [15]) was NP 173, whilst 66% of the German and Italian cats from [11] contained variable nucleotide positions after NP 173 (types A3, B5, C5 and 23 singletons). Although different explanations may be offered for this striking difference, sample type derived differences, or PCR or sequencing derived errors, should be considered as possible causes, especially in this 3’ area of the amplicon. As recommended for human and canine mtDNA population studies [29,30], phylogenetic analysis of sequences can help identify such errors. We did not succeed in describing a phylogenetically sound relationship between the deviant German and Italian sequences [11] and the other sequences [this study, 12–15], due to the large number (2–6) of variable nucleotide positions in the last 33 nucleotides, most of which are encountered only once, compared to 1–19 variable positions in the first 369 nucleotides. We therefore omitted these previously published German and Italian sequences from our comparisons. The presence of erroneous sequences in mtDNA population studies has been described for other species and has proven a delicate matter when such population data is applied in forensic casework [29–31].
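The loss of exclusion capacity when sequences are trimmed from 605 to 402 bp, mentioned at the start of this section, arises because haplotypes that differ only outside the shorter region collapse into one. A minimal sketch with hypothetical six-letter sequences follows; the trimming window and the use of 1 minus the sum of squared frequencies as a measure of discrimination are assumptions made for illustration.

```python
from collections import Counter


def random_match_probability(sequences):
    """Sum of squared haplotype frequencies in a sample."""
    n = len(sequences)
    return sum((c / n) ** 2 for c in Counter(sequences).values())


def trim(sequences, start, end):
    """Keep only alignment columns start..end-1 of every sequence."""
    return [s[start:end] for s in sequences]


# Hypothetical full-length sequences from five individuals.
full_length = ["AAGTCA", "AAGTCA", "ATGTCA", "AAGTCC", "TAGTCC"]
shortened = trim(full_length, 2, 6)  # drop the first two columns

for label, seqs in (("full", full_length), ("trimmed", shortened)):
    rmp = random_match_probability(seqs)
    print(label, len(set(seqs)), "haplotypes, discrimination", round(1 - rmp, 2))
```

In this toy example four haplotypes collapse into two after trimming, and the discrimination drops from 0.72 to 0.48.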

Comparison of European mainland data with data from America and the United Kingdom on a population statistical level, revealed interesting similarities and differences especially when comparing geographical and genetic distances. The Continental European populations shared several features, most notably the presence of a single dominant haplotype (haplotype A, ranging from 42% in the random sampling from South West Germany to 63% in the combined micropopulations from Poland). Variations between the haplotype distributions of Polish cats versus the North-West Continental European cats, may represent true differences but may also be influenced by differences in sampling strategies. The population geographically closest to the Continental European populations, the UK population, was sampled in the same way as the North-West Continental population. Nevertheless the UK population differed significantly from this European population [Table 4, Fig. 2]. The Continental European populations differed least from the Hawaiian and New York cat (micro)populations, whilst the UK population on the other hand was not distinguishable from the cat populations from the United States and Canada. The largest differences found between the UK and non-European cats, were between the UK cat populations and the Hawaiian and New York cat (micro)populations. Differences between the Hawaiian and New York cat populations have also been described when comparing Canadian populations to United States cat populations


[15]. Although geographically speaking, the distance between continental Europe and the United Kingdom is obviously much smaller than between the United Kingdom and the American continent, their cat populations indicate otherwise. Genetic studies based on phenotypic characteristics have described similar findings, when comparing cat populations from (port) cities in different parts of the world. The finding that gene frequencies of cats from the Dutch port cities Amsterdam and Rotterdam were similar to those of cats from New York (once Nieuw Amsterdam) and cities in South has been correlated with human (and cat) emigration patterns in those regions. The same was described for the cat populations studied in (port) cities in the United Kingdom, , Spain and Portugal on one hand and the cities in the United States, Canada and Southern American cities on the other hand [32–37].

The domestication of cats is often described as both recent and incomplete when compared to other domestic or livestock animals. Although intentional breeding of cats has been described as early as the Egyptian dynasties, intentional breeding of coat colour variations or breeds is relatively young, occurring mainly within the last 50–100 years. The influence of humans on cat population structure in the timeframe between these events may be underestimated, perhaps because of cats’ independent attitude towards humans. The castration of cats, intentional persecution of cats, reduction of cat numbers, selection for or against specific phenotypes, human (and vessel) facilitated migration of cats have been described as early as the 17th century (e.g. [38]). When these human induced effects are taken into account, isolation and founder effects may be an issue especially in the New World and on islands where humans in the past and present strongly influence the cat population. Based on the typing of nearly 1400 cats from around the world, the necessity of establishing “local” cat population databases has been questioned when using cat haplotypes as exclusionary tools in forensics [26]. However cat micropopulations do exist as illustrated by the Putte samples in this study and the differences between eight different Polish micropopulations [12]. We therefore argue that although nowadays isolation of cat populations may not be perceived as so severe, depending on the forensic question to be answered, local populations may be more or less relevant. Sampling of local micropopulations, or targeted sampling of larger local populations should be considered in order to enable the use of cat mtDNA not only as an exclusionary tool, but also for the evaluation of the likelihood of randomly obtaining a certain haplotype from the relevant population. 
The definition of both relevant and local should obviously depend on the forensic question to be answered, but should take into account the relevant historical human movement and actions, founder effects, selective breeding and strong parent offspring (matriarchal) relationships.


5. Conclusion
The haplotype data obtained from randomly sampled cats from the Netherlands, Belgium and South West Germany indicate small differences in haplotype distribution between these cats from North West Continental Europe, but profound differences with European cats described in several other studies. The largest genetic differences between these randomly sampled European populations were found between the Dutch and the German populations. Based on the calculated population statistics, the Belgian population can be described as an intermediate population, which corresponds to the geographical distances between these populations. Although geographical distance may correlate with genetic distance, the presence of strong local differences is demonstrated through the typing of a local stray cat micropopulation derived from a single municipality in Belgium. The haplotype present in more than half of this population would be considered a minor or singleton haplotype when sampling on a nationwide or North West European scale, illustrating the effect of different sampling strategies. The discrepancy between geographical and genetic distance in domestic cat populations is also illustrated by comparing the populations from different parts of Europe and the American continent. These comparisons illustrate that knowledge of feline population structure and dispersion, and perhaps more importantly the past and present human impact on these features, is relevant in deciding whether extrapolation of haplotype frequencies from other sources or locations is justified.

To maximize the forensic potential of feline mitochondrial population studies, the following steps have been applied in this study, and are advised for future studies. When sequences are obtained that do not correspond to previously described haplotypes, or when new variable nucleotide positions are encountered, measures such as phylogenetic analysis are advised to help eliminate potentially erroneous haplotypes. Furthermore, documentation of the sampling strategy used to estimate haplotype frequencies in a certain area is considered critical. Additionally, care should be taken when extrapolating frequency data from a small geographical area such as a single city, veterinary clinic, animal shelter or certain feline social structure (e.g. stray cats, pure bred cats, etc.) to a larger country or continent, as sampling of such local areas has illustrated the presence of micropopulations and may not be representative for a larger area. In some forensic cases, or to answer specific questions, extrapolation of haplotype frequencies from other sources or populations may be justified, or can be proven valid by testing of a relatively small sample. In other cases, or when available statistics are invalid, additional typing of relevant local (micro)populations may be necessary to provide an accurate estimation of these frequencies.


Acknowledgements
We appreciate the collection of samples by private cat owners through the Netherlands Forensic Institute (the Netherlands), the National Institute of Criminalistics and Criminology (Belgium) and the Bundeskriminalamt Forensic Science Institute (Germany). We appreciate sample contribution by veterinarians Marie Borre (Belgium) and Werner Hecht (Germany). Members of the KT46 section of the Bundeskriminalamt are acknowledged for graciously providing DNA extracts. Ate D. Kloosterman, Irene O’Sullivan and two anonymous reviewers are recognized for their valuable comments on the manuscript.

References
[1] FEDIAF Facts & Figures 2014 [Internet], The European Pet Food Industry Federation, Brussels, Belgium (2014) [cited 2016 September 07]. Available from: http://www.fediaf.org/facts-figures/.
[2] F. D’Andrea, F. Fridez, R. Coquoz, Preliminary experiments on the transfer of animal hair during simulated criminal behaviour, J. Forensic Sci. 43 (6) (1998) 1257–1258, doi:http://dx.doi.org/10.1520/JFS14399J.
[3] M.A. Menotti-Raymond, V.A. David, S.J. O’Brien, Pet cat hair implicates murder suspect, Nature 386 (1997) 774, doi:http://dx.doi.org/10.1038/386774a0.
[4] J.L. Halverson, C. Basten, Forensic DNA identification of animal-derived trace evidence: tools for linking victims and suspects, Croat. Med. J. 46 (2005) 598–605.
[5] L.A. Lyons, R.A. Grahn, T.J. Kun, L.R. Netzel, E.E. Wictum, J.L. Halverson, Acceptance of domestic cat mitochondrial DNA in a criminal proceeding, Forensic Sci. Int. Genet. 13 (2014) 61–67, doi:http://dx.doi.org/10.1016/j.fsigen.2014.07.007.
[6] B. Budowle, M.W. Allard, M.R. Wilson, R. Chakraborty, Forensics and mitochondrial DNA: applications, debates and foundations, Annu. Rev. Genom. Hum. Genet. 4 (2003) 119–141.
[7] R.E. Giles, H. Blanc, H.M. Cann, D.C. Wallace, Maternal inheritance of human mitochondrial DNA, Proc. Natl. Acad. Sci. U. S. A. 77 (1980) 6715–6719.
[8] T. Tamada, N. Kurose, R. Masuda, Genetic diversity in domestic cats Felis catus of the Tsushima Islands, based on mitochondrial DNA cytochrome b and control region nucleotide sequences, Zool. Sci. 22 (6) (2005) 627–633, doi:http://dx.doi.org/10.2108/zsj.22.627.
[9] W. Branicki, A. Olszanska, M. Konopinski, Sequence variation in the control region of mitochondrial DNA within a population sample of domestic cats Felis catus Linnaeus–implications for domestic and wild cats differentiation, Prob. Forensic Sci. 67 (2006) 279–288.
[10] C.R. Tarditi, R.A. Grahn, J.J. Evans, J.D. Kurushima, L.A. Lyons, Mitochondrial DNA sequencing of cat hair: an informative forensic tool, J. Forensic Sci. 56 (s1) (2011) S36–S46, doi:http://dx.doi.org/10.1111/j.1556-4029.2010.01592.x.
[11] R.A. Grahn, J.D. Kurushima, N.C. Billings, J.C. Grahn, J.L. Halverson, E. Hammer, et al., Feline non-repetitive mitochondrial DNA control region database for forensic evidence, Forensic Sci. Int. Genet. 5 (1) (2011) 33–42, doi:http://dx.doi.org/10.1016/j.fsigen.2010.01.013.
[12] I. Głażewska, T. Kijewski, A new view on the European feline population from mtDNA analysis in Polish domestic cats, Forensic Sci. Int. Genet. 27 (2017) 116–122, doi:http://dx.doi.org/10.1016/j.fsigen.2016.12.010.
[13] B. Ottolini, G.M. Lall, F. Sacchini, J.H. Wetton, M.A. Jobling, Application of a mitochondrial DNA control region frequency database for UK domestic cats, Forensic Sci. Int. Genet. 27 (2017) 149–155, doi:http://dx.doi.org/10.1016/j.fsigen.2016.12.008.
[14] M. Wesselink, L. Bergwerff, D. Hoogmoed, A.D. Kloosterman, I. Kuiper, Forensic utility of the feline mitochondrial control region - A Dutch perspective, Forensic Sci. Int. Genet. 17 (2015) 25–32, doi:http://dx.doi.org/10.1016/j.fsigen.2015.03.004.
[15] M. Arcieri, G. Agostinelli, Z. Gray, A. Spadaro, L.A. Lyons, K.M. Webb, Establishing a database of Canadian feline mitotypes for forensic use, Forensic Sci. Int. Genet. 22 (2016) 169–174, doi:http://dx.doi.org/10.1016/j.fsigen.2016.02.013.
[16] N. Schury, U. Schleenbecker, A.P. Hellmann, Forensic animal DNA typing: Allele nomenclature and standardization of 14 feline STR markers, Forensic Sci. Int. Genet. 12 (2014) 42–59, doi:http://dx.doi.org/10.1016/j.fsigen.2014.05.002.
[17] M. Wesselink, L. Bergwerff, I. Kuiper, Forensic analysis of mitochondrial control region DNA from single cat hairs, Forensic Sci. Int. Genet. Suppl. Ser. 5 (2015) e564–e565, doi:http://dx.doi.org/10.1016/j.fsigss.2015.09.223.
[18] J.V. Lopez, S. Cevario, S.J. O’Brien, Complete nucleotide sequences of the domestic cat (Felis catus) mitochondrial genome and a transposed mtDNA tandem repeat (Numt) in the nuclear genome, Genomics 33 (2) (1996) 229–246, doi:http://dx.doi.org/10.1006/geno.1996.0188.
[19] A.J. Drummond, B. Ashton, M. Cheung, J. Heled, M. Kearse, R. Moir, et al., Geneious v4.7, Biomatters Ltd, Auckland, New Zealand, 2009.
[20] H.-J. Bandelt, P. Forster, A. Röhl, Median-joining networks for inferring intraspecific phylogenies, Mol. Biol. Evol. 16 (1999) 37–48.
[21] L. Excoffier, H.E.L. Lischer, Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows, Mol. Ecol. Resour. 10 (2010) 564–567, doi:http://dx.doi.org/10.1111/j.1755-0998.2010.02847.x.
[22] W.R. Rice, Analyzing tables of statistical tests, Evolution 43 (1989) 223–225.
[23] N. Saitou, M. Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees, Mol. Biol. Evol. 4 (4) (1987) 406–425, doi:http://dx.doi.org/10.1093/oxfordjournals.molbev.a040454.
[24] K. Tamura, D. Peterson, N. Peterson, G. Stecher, M. Nei, S. Kumar, MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods, Mol. Biol. Evol. 28 (2011) 2731–2739, doi:http://dx.doi.org/10.1093/molbev/msr121.
[25] D. Zuccon, MrEnt v.2.5, Program distributed by the authors (2013), http://www.mrent.org.
[26] R.A. Grahn, H. Alhaddad, P.C. Alves, E. Randi, N.E. Waly, L.A. Lyons, Feline mitochondrial DNA sampling for forensic analysis: When enough is enough!, Forensic Sci. Int. Genet. 16 (2015) 52–57, doi:http://dx.doi.org/10.1016/j.fsigen.2014.11.017.
[27] W. Parson, H.J. Bandelt, Extended guidelines for mtDNA typing of population data in forensic science, Forensic Sci. Int. Genet. 1 (2007) 13–19, doi:http://dx.doi.org/10.1016/j.fsigen.2006.11.003.
[28] W. Parson, L. Roewer, Publication of population data of linearly inherited DNA markers in the International Journal of Legal Medicine, Int. J. Legal Med. 124 (2010) 505–509, doi:http://dx.doi.org/10.1007/s00414-010-0492-y.
[29] S. Verscheure, T. Backeljau, S. Desmyter, Reviewing population studies for forensic purposes: Dog mitochondrial DNA, in: Z.T. Nagy, T. Backeljau, M. De Meyer, K. Jordaens (Eds.), DNA Barcoding: a Practical Tool for Fundamental and Applied Biodiversity Research, ZooKeys 365 (2013) 381–411, doi:http://dx.doi.org/10.3897/zookeys.365.5859.
[30] H.J. Bandelt, P. Lahermo, M. Richards, V. Macaulay, Detecting errors in mtDNA data by phylogenetic analysis, Int. J. Legal Med. 115 (2001) 64–69, doi:http://dx.doi.org/10.1007/s004140100228.
[31] J.-J. Song, W.-Z. Wang, N.O. Otecko, M.-S. Peng, Y.-P. Zhang, Reconciling the conflicts between mitochondrial DNA haplogroup trees of Canis lupus, Forensic Sci. Int. Genet. 23 (2016) 83–85, doi:http://dx.doi.org/10.1016/j.fsigen.2016.03.008.
[32] N.B. Todd, Gene frequencies in the cat population of New York City, J. Hered. 57 (1966) 185–187.
[33] A.T. Lloyd, Mutant allele frequencies in the domestic cat populations of the Netherlands, Genetica 58 (1982) 223–228.
[34] R.J. van Aarde, B.J. Erasmus, B. Blumenberg, Frequencies of mutant alleles in the cat populations of Cape Town and Pretoria, South Afr. J. Sci. 77 (1981) 168–171.
[35] M. Ruiz Garcia, Mutant allele frequencies in domestic cat populations in Catalonia, Spain, and their genetic relationships with Spanish and English colonial cat populations, Genetica 82 (1990) 209–214.
[36] A.E. Vinogradov, Fine structure of gene frequency landscapes in domestic cat: The Old and New Worlds compared, Hereditas 126 (1) (1997) 95–102.
[37] G.G. Goncharenko, S.A. Zyat’kov, The level of genetic differentiation in cats (Felis catus L.) in Western European, North American, and Eastern European populations, Russ. J. Genet.: Appl. Res. 2 (1) (2012) 47–52, doi:http://dx.doi.org/10.1134/S207905971201008X.
[38] C. Jansz, De verstandige huys-houder, voor-schryvende de alderwijste wetten om profijtelijck, gemackelijck en vermakelijck te leven, so inde stadt als op ’t landt, Amsterdam (1660).


Chapter 5

DNA typing of birch: Development of a forensic STR system for Betula pendula and Betula pubescens

M. Wesselink‡, A. Dragutinović‡, J.W. Noordhoek, L. Bergwerff and I. Kuiper

Submitted to Forensic Science International: Genetics

Abstract
Although botanical trace evidence is often encountered in case investigations, the utilization of such traces in forensic investigations is still limited. Development of a forensic STR system for the two species of Betula (birch) indigenous to and abundant in North West Europe is a step in enhancing the applicability of traces from these species. We describe six microsatellite markers developed for birch species in detail, including repeat structure, and we propose a nomenclature for the encountered alleles. To assess the population characteristics, the genetic composition of wild, planted and intermediate populations of Betula pendula (a diploid species) and Betula pubescens (a tetraploid species) was investigated. The genetic differences between these two species were larger than the differences between populations of one species, even when both species co-occurred at one location. Therefore allele frequencies were estimated for both species separately. General, conservative random match probabilities were estimated for wild trees based on these allele frequencies (5∙10⁻⁶ for the diploid B. pendula and 1∙10⁻¹³ for the tetraploid B. pubescens), illustrating the potential relevance if trace evidence secured from a suspect is found to match a birch tree growing on or near a crime scene. Apart from wild trees, planted Betula trees also occur that may not originate from seeds, but may have been propagated through cloning. Based on the studied Betula trees, the random match probability of a potentially planted profile might be as high as 1.4∙10⁻².

‡ Both authors contributed equally to this publication.


1. Introduction
Botanical traces are encountered in many different types of investigations, ranging from burglary to sexual assault and murder. Although many botanical traces originate from outdoor environments, they may be transferred to a variety of tools, vehicles or clothing. Their forensic utility is comparable to other types of trace evidence like fibres and hairs, being silent witnesses of crimes. Although botanical traces are often encountered during crime scene investigations, their application still seems limited. The biological diversity of such traces, the variety of forensic applications in which botanical traces can be used and investigators’ unfamiliarity with their potential have hampered the use of plant remains in forensic investigations. However, the use of present-day DNA techniques is helping to fulfil the potential of such traces.

Generic methods for plant genotyping such as RAPD and AFLP have successfully been applied to forensic cases involving plants such as marihuana [1-3], Palo Verde [4, 5], strawberries [6], [7] and knotgrass [8]. The application of such techniques to new species is time consuming though: although the methods are generic, every new species group requires optimization and validation of laboratory techniques, and population testing for interpretation purposes. Additionally, interpretation of the results obtained by these techniques is ambiguous compared to the interpretation of STRs. The process to optimize and validate forensic STR systems for species for which STR markers have been described is hardly more time consuming than optimizing and validating generic systems, and has been done for plants such as Cannabis (marijuana) [9-11] and Quercus (oak) [12]. Such systems are often developed for a single species, and traditionally species specificity is valued when a panel for forensic genotyping is developed. However, cross-species amplification potential is considered a valuable characteristic for plant genotyping, as this enables a single system to be used for multiple species and thereby potentially in a larger number of cases.

Based on their prevalence in the Netherlands, the occurrence of wild populations with natural reproduction, and the frequency with which traces have been observed in casework in recent years, birch (Betula) was chosen as a genus to investigate in more detail. As in other parts of North West Europe, two species of birch occur in the wild in the Netherlands, Betula pendula (silver birch) and Betula pubescens (downy birch). Both are monoecious wind-pollinated pioneer trees or shrubs that have a wide climatic tolerance and occur in a variety of habitats. B. pendula is a diploid species (2n=28) that prefers dry sandy soils, whereas B. pubescens is a tetraploid species (4n=56) that prefers wetter soils [13]. Introgression and hybridization between these species have been shown to occur and have played an important role in the evolution of these species [14-17]. Because the two species and their hybrids are morphologically very similar and occur in overlapping habitats, distinguishing between the species and their hybrids can be ambiguous, even when whole trees are available [13, 18, 19]. Distinguishing between the species and their hybrids when only plant debris, such as trace evidence, is available is even more complicated. Apart from these naturally occurring species and their hybrids, several exotic species including B. papyrifera are also cultivated in the Netherlands, the majority of which have specific morphological characteristics. The genetics of these exotic species are diverse, ranging from diploids to hexaploids.

Previously described microsatellite markers developed for birch species [20-23] were compared, and those with cross-species amplification potential and a large variety of alleles were selected for testing on the two naturally occurring Betula species and several cultivated varieties. Based on this first screening, the potential to distinguish between Betula trees in different populations in the Netherlands was investigated. The guidelines for the forensic DNA-typing of non-human biological traces [24-26] formed the basis for our analytical validation. After describing the markers in detail, we propose a nomenclature for the application of these markers. Based on the marker descriptions, an allelic ladder was not considered to provide added value for our within-laboratory testing. To compensate for the lack of an allelic ladder, positive control samples covering the expected range were included throughout the study. The developed method was then used to assess the genetic compositions of Betula trees in different populations in the Netherlands. Based on these results, the value of Betula traces in forensic investigations in the Netherlands is estimated. To our knowledge this is the first study describing the forensic utility of STRs in both a tetraploid and a diploid plant species from the same genus.

2. Material and Methods
2.1 Sample collection
Five geographically distinct plots or populations were selected that are expected to be uninfluenced by humans and are therefore denoted as ‘wild’ (Wa-We) (Table 1). Two plots are expected to originate from arborists or nurseries and are categorized as ‘planted’ (Pa, Pb), whereas two populations were expected to consist of planted trees, wild trees and crosses between wild and planted trees and are therefore denoted as ‘intermediate’ (Ia, Ib). Additionally, samples were collected randomly from different urban areas, parks and nurseries. These were grouped together in population Pc because only a few trees per geographical location were sampled and their origin is unknown. As specified in Table 1, some plots were sampled exhaustively within specified boundaries (road, water, highway, etc.), whilst in other locations approximately 50 trees within a plot were randomly selected.


Table 1. Samples used in this study

Population  Sampling      Sampling location              Sampling            N      B. pendula*  B. pubescens*  other*
code        type                                         strategy            total
Wa          wild          N 51°47’34.4”, E 5°47’55.3”    exhaustive          70     58           12             -
Wb          wild          N 52°12’07.0”, E 5°51’11.0”    random              49     38           11             -
Wc          wild          N 52°12’40.3”, E 5°52’13.0”    random              49     47           -              2
Wd          wild          N 52°12’04.0”, E 5°35’28.0”    random              50     1            49             -
We          wild          N 52°22’20.1”, E 5°50’09.9”    exhaustive          36     35           1              -
Ia          intermediate  N 52°16’49.0”, E 5°11’57.0”    exhaustive          115    103          10             2
Ib          intermediate  N 52°21’48.2”, E 4°46’20.2”    exhaustive          34     15           16             3
Pa          planted       N 51°54’39.6”, E 4°18’10.9”    random              50     39           10             1
Pb          planted       N 51°46’19.3”, E 5°33’46.8”    exhaustive          16     12           1              3
Pc          planted       throughout NL                  2-11 per location   78     48           25             5
Total                                                                        547    396          135            16

* species differentiation based on genetic composition (2n, 4n or other)

As morphological identification of B. pendula, B. pubescens and their hybrids can be ambiguous, samples were collected without attempting to determine the species. The ploidy and species of each sample were determined after STR typing, based on the allelic dosage (see 2.3 for more details). Samples with two or fewer peaks per marker were considered diploid (and thus B. pendula). Samples with four or fewer peaks for all markers, and at least one marker with either four peaks or two or three unbalanced peaks, were considered tetraploid (and thus B. pubescens). Samples with three or fewer peaks per marker and at least one marker with three well-balanced peaks, or with more than four peaks at a marker, were considered hybrids or exotic trees and denoted ‘other’.
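The peak-count rules above can be sketched in Python as follows. This is only an illustration, not the procedure used in the study: the balance threshold `BALANCE_RATIO`, the function names, the precedence of the rules and the example peak heights are all assumptions introduced here, since the text does not give a numerical definition of "balanced".

```python
from typing import Dict, List

# Hypothetical threshold: peaks count as "balanced" when the lowest peak
# reaches at least this fraction of the highest peak (not from the source).
BALANCE_RATIO = 0.6


def is_balanced(heights: List[float]) -> bool:
    """Return True when all peak heights at one marker are of similar size."""
    return min(heights) / max(heights) >= BALANCE_RATIO


def classify_sample(peaks: Dict[str, List[float]]) -> str:
    """Classify a sample as 'diploid', 'tetraploid' or 'other'.

    `peaks` maps each marker name to the list of peak heights (rfu)
    observed at that marker; every marker must have at least one peak.
    """
    per_marker = list(peaks.values())
    # More than four peaks at any marker: hybrid or exotic tree.
    if any(len(h) > 4 for h in per_marker):
        return "other"
    has_four = any(len(h) == 4 for h in per_marker)
    has_unbalanced = any(
        len(h) in (2, 3) and not is_balanced(h) for h in per_marker
    )
    # Four peaks, or two or three unbalanced peaks, at one or more markers.
    if has_four or has_unbalanced:
        return "tetraploid"
    # Three well-balanced peaks (and nothing pointing to a tetraploid).
    if any(len(h) == 3 for h in per_marker):
        return "other"
    # Only one or two balanced peaks per marker.
    return "diploid"
```

With made-up peak heights, `classify_sample({"L13.1": [1000, 950], "L63": [1200]})` gives `"diploid"`, whereas a sample with two clearly unbalanced peaks at one marker is assigned `"tetraploid"`.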

2.2 DNA extraction
DNA was extracted using the DNeasy Plant mini kit (Qiagen GmbH, Germany) according to the manufacturer’s protocol with the following modifications. Prior to the extraction of DNA, approximately 0.5 cm² of fresh leaf or bud material was placed in a 2 ml reaction tube containing 400 μl of buffer AP1 and a 5 mm stainless steel bead (both Qiagen) and ground twice for 30 s at 30 Hz. The optional centrifugation step advised by the manufacturer was performed in all cases. The extraction was either performed manually or by Qiacube (Qiagen). Total DNA concentration was estimated through spectrophotometry with a NanoDrop ND-1000 (NanoDrop Technologies, DE, USA).

2.3 Fragment length analysis
Microsatellite loci L13.1, L63, L5.4 and L1.10 (Table 2) were amplified in a 10 μl multiplex reaction containing 1x Type-it Multiplex PCR Master Mix (Qiagen), 1.25 pmol of both L13.1 primers, 2.5 pmol of both L5.4 primers, 10 pmol of both L1.10 primers, 5 pmol of both L63 primers (Biolegio) and 0.5-5 ng of template DNA. PCRs were performed on MyCycler (BioRad Laboratories) or GeneAmp 9700 (Applied Biosystems) thermocyclers using the following cycling parameters: 15 min at 95 °C; 26 cycles of 94 °C for 1 min, 54 °C for 90 s, and 72 °C for 2 min; 30 min at 72 °C. Loci Bo.F330 and Bo.F394 (Table 2) were amplified in two separate reactions under the same conditions as the multiplex reaction with either 10 pmol of both Bo.F330 primers (Biolegio) or 5 pmol of both Bo.F394 primers (Biolegio). Fluorescently labeled amplicons were mixed with 0.4 μl internal standard GeneScan 500 ROX and 8.6 μl Hi-Di™ Formamide (both Life Technologies) and denatured by heating (4 min at 95 °C) and cooling on ice. Separation and detection were performed on an AB3500 XL genetic analyzer, containing 50 cm capillaries using POP-7™ polymer (all Life Technologies).

Table 2. Primer sequences and published characteristics of the six Betula microsatellite loci used in this study.

Marker   Dye    Primer sequence (5'-3')          Developed for  Accession nr.  Published       Reference
                                                 species                       repeat motif
L13.1    FAM    F - CAC CAC CAC AAC CAC CAT TA   B. pendula     AF310871       (CA)3(GA)14     [20]
                R - AAC ACC CTT TGC AAC AAT GA
L5.4     FAM    F - GAA AGC ATG AGA CCC GTC TT   B. pubescens   AF310862       (TC)26          [21]
                R - AAC CTA AAC AGC CTG CCA AA
L1.10    HEX    F - TTT CCA ACG CTT TCT TGA TG   B. pubescens   AF310856       (AG)4AA(AG)11   [21]
                R - TGG ATA AGG AAG GGC ATG TC
L63      TAMRA  F - AAT CCA CCG AGC ATT TCA AC   B. pendula     AF310873       (GAT)6          [20]
                R - CTA CAA CAG CGC CAA GGA AT
Bo.F330  TAMRA  F - TGG CAG CAC GAA AGT          B. pubescens   AY423611       (TC)14          [21]
                R - TGG GAA TGA GAG AAC AAG
Bo.F394  HEX    F - AAT GCA GCA TCT CTT ACC      B. pubescens   AY423608       (TC)13          [21]
                R - CAC GCA ATA ATA TGG AAA

Allele calling was performed using GeneMapper 4.1 (Life Technologies) with a peak threshold of 100 rfu. Primer concentrations and annealing temperature were optimized for allele balance of heterozygous individuals. This balance enabled recognition of diploids and tetraploids, but also enabled allele scoring in partially homozygous tetraploids. If one or two balanced alleles were observed for each marker, an individual was considered diploid and allele scoring was straightforward. If four balanced alleles were observed for at least one marker, and one to four were observed for other markers, an individual was considered tetraploid. Scoring of such individuals was performed in the following manner: when completely heterozygous or homozygous for any marker, copy number assignment, as in diploids, was straightforward. If two or three alleles were observed in tetraploid individuals, the peak balance was used to assign the number of copies per allele: two balanced peaks correspond to two copies of each allele, two unbalanced peaks to one copy of one allele and three copies of the other allele (representative examples in Figures 1C, 2C, 3C, 4C, 5C and 6C). Although tetraploid individuals completely homozygous at all six markers, or carrying two alleles in two copies, would wrongly be considered diploid using this approach, the variability of the complete set of six markers is such that the frequency of such misidentifications is negligible. When three unbalanced alleles were observed, two alleles would be scored with one copy, and the other allele with two copies. Individuals with no more than three balanced alleles per marker were considered triploid.
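The copy-number assignment described for tetraploids can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the balance threshold `BALANCE_RATIO`, the function name and the example allele labels and peak heights are hypothetical, and the tallest peak is simply taken to carry the extra copies.

```python
from typing import Dict

# Hypothetical threshold for calling peaks "balanced" (not from the source).
BALANCE_RATIO = 0.6


def assign_tetraploid_dosage(peaks: Dict[str, float]) -> Dict[str, int]:
    """Assign copy numbers (summing to four) to the alleles at one marker.

    `peaks` maps allele labels to peak heights (rfu) for a sample that has
    already been classified as tetraploid.
    """
    # Sort alleles from tallest to shortest peak.
    alleles = sorted(peaks, key=lambda a: peaks[a], reverse=True)
    n = len(alleles)
    if n == 1:                      # AAAA
        return {alleles[0]: 4}
    if n == 4:                      # ABCD
        return {a: 1 for a in alleles}
    balanced = min(peaks.values()) / max(peaks.values()) >= BALANCE_RATIO
    if n == 2:
        if balanced:                # AABB
            return {a: 2 for a in alleles}
        # AAAB: the taller peak is scored with three copies.
        return {alleles[0]: 3, alleles[1]: 1}
    if n == 3:
        if balanced:
            # Three balanced peaks do not fit a tetraploid profile.
            raise ValueError("three balanced peaks: not a tetraploid pattern")
        # AABC: the tallest peak is scored with two copies.
        return {alleles[0]: 2, alleles[1]: 1, alleles[2]: 1}
    raise ValueError("expected one to four alleles at a marker")
```

For example, two peaks of 1500 and 500 rfu would be scored as three copies and one copy, while two peaks of 1000 and 950 rfu would be scored as two copies each.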


2.4 Allele sequencing
To enable sequencing of the complete variable and flanking regions, the primers in Table 2 were extended with M13F tails (forward primers) or M13R tails (reverse primers). Per marker, fragments with the most frequently observed length, fragments from small to large, and multiple intermediate fragments were selected and amplified in a 10 μl reaction containing 1x Type-it Multiplex PCR Master Mix (Qiagen), 5 pmol of both M13 tailed primers, and 0.5-10 ng of template DNA. PCRs were performed on the previously described thermocyclers using the same cycling parameters for 30 cycles. PCR products were visualized by gel-electrophoresis, ethidium bromide staining and UV detection. When heterozygous individuals were used for allele sequencing, bands were excised from the gel, after which the products were recovered using the QIAquick gel extraction kit (Qiagen) according to the manufacturer’s protocol. These products were either directly sequenced, or amplified for up to an additional 17 cycles. PCR products that had not been purified through gel-electrophoresis were purified with ExoSAP-IT® (Affymetrix) following the manufacturer’s protocol. 1 μl of purified PCR product was added to a 10 μl sequencing reaction containing 1x BigDye sequencing buffer, 0.2x BigDye Terminator v3.1 Ready Reaction Mix (Life Technologies) and 2.5 pmol of either M13F (AMBeR) or M13R (AMBeR) sequencing primer (Biolegio) and subjected to the following cycling parameters: 1 min at 96 °C; 28 cycles of 95 °C for 10 s, 50 °C for 5 s and 60 °C for 150 s. Residual sequencing components were removed using the BigDye XTerminator® Purification Kit (Life Technologies) following the manufacturer’s protocol, after which sequencing products were separated on the same AB3500 XL genetic analyzer as described for fragment length analysis. Evaluation of sequences and assembly of forward and reverse sequences was performed with Geneious 4.7.6 [27].
Characterization of the repeat structures was performed following the recommendations for human STR typing of the International Society for Forensic Genetics [25, 28, 29] based on the number of full tandem repeat motifs. The observed fragment lengths were grouped, based on average values and Δ values calculated (difference between minimal and maximal fragment length) and compared with Δ values of the alleles of our positive control sample. This approach was feasible for markers L13.1, L5.4 and L63. Alternative approaches had to be adopted for L1.10, Bo.F330 and Bo.F394 which could not solely be described by dinucleotide repeat motifs. Some alleles in marker L1.10 were found to contain a 10 bp insertion/deletion motif, whilst other alleles contained a duplication of the complete repeat structure and/or the flanking regions, after which the regular dinucleotide repeat pattern was continued. Marker Bo.F330 contained a mononucleotide repeat, in which an even number of mononucleotides was considered an integer dinucleotide repeat, whilst an odd number of mononucleotides was considered an intermediate dinucleotide repeat. Part of the variation in marker Bo.F394 was due to insertion/deletion motifs of 2, 4, 8 and 26 nucleotides.

For all markers, repeat numbers were determined from the sequenced alleles, and allele numbers for other alleles were interpolated from the sequenced alleles. In allele numbering, the encountered insertions were counted as a multiple of dinucleotide repeats.
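The interpolation of allele numbers from sequenced alleles can be illustrated with a minimal sketch. It assumes that a fragment length is first rounded to a whole-bp offset from a sequenced reference allele, and that any base left over after division by the motif length is written as an intermediate allele (for example 15.1). The function name and all reference values used here are hypothetical and are not taken from the data set.

```python
def interpolate_allele(length_bp, ref_length_bp, ref_repeats, motif_bp=2):
    """Allele designation for a fragment, relative to a sequenced reference allele.

    length_bp     -- measured fragment length of the allele to be named
    ref_length_bp -- measured fragment length of the sequenced reference allele
    ref_repeats   -- integer repeat number of the reference allele
    motif_bp      -- repeat motif length (2 for dinucleotide, 3 for trinucleotide)
    """
    # Whole-bp offset between the fragment and the reference allele.
    offset_bp = round(length_bp - ref_length_bp)
    # Total repeat-region length expressed in bp, then split into full
    # repeats and leftover bases (the ".1" part of an intermediate allele).
    total_bp = ref_repeats * motif_bp + offset_bp
    full_repeats, extra_bp = divmod(total_bp, motif_bp)
    if extra_bp == 0:
        return str(full_repeats)
    return f"{full_repeats}.{extra_bp}"
```

With a hypothetical sequenced allele 14 at 180.0 bp, `interpolate_allele(183.0, 180.0, 14)` gives `"15.1"`, and a fragment that is 26 bp longer is named 27, so a 26 bp insertion counts as thirteen dinucleotide repeats under this rule.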

2.5 Population statistics

In diploids, genotype frequency is calculated as $p^2$ for homozygotes and $2pq$ for heterozygotes. In autotetraploids, genotype frequency is calculated as $p^4$ for complete homozygotes (AAAA), $4p^3q$ for partial heterozygote AAAB, $6p^2q^2$ for partial heterozygote AABB, $12p^2qr$ for partial heterozygote AABC, and $24pqrs$ for complete heterozygotes (ABCD). In allotetraploids with non-overlapping alleles, genotype frequencies are calculated as $p^2r^2$ for AACC, $2p^2rs$ for AACD, $2pqr^2$ for ABCC and $4pqrs$ for complete heterozygotes (ABCD). Allotetraploids with overlapping alleles follow the same calculation, but distinction between the genotypes is less straightforward. The calculations for autotetraploids were used for the tetraploid birches, as the estimates obtained in this manner are the most conservative from a forensic point of view.

Expected heterozygosity in diploids was calculated as $H_e = 1 - \sum_{i=1}^{s} p_i^2$, where $p_i$ is the frequency of the $i$th of $s$ alleles. Applying this formula to tetraploids causes the heterozygosity estimate to respond only to changes in the number of full homozygotes, ignoring the presence of partial heterozygotes. Therefore in tetraploids, when allele dosage can be determined, heterozygosity can be described as 1 − (the likelihood of any two alleles being identical by descent). The heterozygosity expected under random mating, assuming random chromosomal segregation [$H_e(C_e)$], was calculated using AUTOTET [30], with genotypic frequencies AAAA = $p^4$, AAAB = $4p^3q$, AABB = $6p^2q^2$, AABC = $12p^2qr$ and ABCD = $24pqrs$, where genotypes are weighted inversely to the probability of any two of their alleles being identical by descent (AAAA = 0, AAAB = 1/2, AABB = 2/3, AABC = 5/6 and ABCD = 1). The same weights are assigned to the five possible classes of genotypes to calculate the observed heterozygosity $H_o$. Fst values were calculated using these $H_e$ and $H_o$ values. Allele linkage was estimated for both tetraploid and diploid ‘wild’ and ‘intermediate’ locations with more than 25 individuals using LD4X [31] under R [32]. P threshold values were adjusted by the sequential Bonferroni correction [33] incorporated in LD4X.
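The coefficients of the diploid and autotetraploid genotype frequencies, and the heterozygosity weights of the five tetraploid genotype classes, follow from simple counting. The sketch below reproduces them; it is not the AUTOTET implementation, and the function names and example allele frequencies are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod


def genotype_frequency(genotype, freqs):
    """Expected frequency of an unordered genotype under random union of alleles.

    genotype -- sequence of allele labels (2 for diploids, 4 for autotetraploids)
    freqs    -- mapping from allele label to allele frequency
    """
    counts = Counter(genotype)
    # Multinomial coefficient: n! / (c1! * c2! * ...), e.g. 12 for AABC.
    coefficient = factorial(len(genotype))
    for c in counts.values():
        coefficient //= factorial(c)
    return coefficient * prod(freqs[a] ** c for a, c in counts.items())


def heterozygosity_weight(genotype):
    """1 minus the chance that two alleles drawn from the genotype are the same."""
    n = len(genotype)
    identical_pairs = sum(c * (c - 1) // 2 for c in Counter(genotype).values())
    return 1 - Fraction(identical_pairs, n * (n - 1) // 2)


# Hypothetical allele frequencies, exact fractions for easy checking.
example_freqs = {
    "A": Fraction(1, 2),
    "B": Fraction(1, 4),
    "C": Fraction(1, 8),
    "D": Fraction(1, 8),
}
```

For example, `heterozygosity_weight("AABC")` returns `Fraction(5, 6)`, and `genotype_frequency("ABCD", example_freqs)` applies the coefficient 24 of the complete heterozygote.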

3. Results and Discussion

3.1 Locus characteristics

All six microsatellite markers were analysed for all 547 samples, and the observed fragment lengths (in bp) were recorded. The observed fragment lengths were plotted, after which distinct groups of comparable fragment lengths could clearly be distinguished for all markers except Bo.F330 (see below and 3.1.4). Calculation of the Δ values of the alleles of our positive control sample, each analysed more than 60 times in recent years, indicated a between-injection variation of 0.63 to 0.81 bp. Based on these values, the defined groups with comparable fragment lengths were considered a single allele if the difference between the smallest and largest observed value was smaller than 0.9 bp. This led to successful designation of alleles differing from each other by approximately 2 bp for markers L13.1 and L5.4, and by approximately 3 bp for marker L63. For markers L1.10 and Bo.F394 distinct alleles could also be defined, but several intermediate alleles differing by less than 2 bp were recognized. Although this way of grouping fragment lengths was successful for marker Bo.F330, the identified alleles turned out to be only 1 bp apart, indicating the presence of intermediate alleles and potentially a complex repeat motif.
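The grouping criterion (a group is a single allele if its smallest and largest value differ by less than 0.9 bp) can be sketched as follows. The exact grouping procedure is not specified in the text, so the single-pass approach over sorted lengths is an assumption, and the function name and example lengths are hypothetical.

```python
def group_fragment_lengths(lengths_bp, max_delta=0.9):
    """Group fragment lengths so that each group spans less than max_delta bp.

    Lengths are sorted and a new group is started as soon as a value lies
    max_delta or more above the smallest value of the current group.
    """
    groups = []
    for length in sorted(lengths_bp):
        if groups and length - groups[-1][0] < max_delta:
            groups[-1].append(length)
        else:
            groups.append([length])
    return groups


def delta(group):
    """Difference between the largest and smallest fragment length in a group."""
    return max(group) - min(group)


# Hypothetical measurements (bp) for a dinucleotide marker.
example_groups = group_fragment_lengths([131.9, 132.3, 132.1, 134.0, 134.4, 136.2])
```

Here the six hypothetical measurements fall into three groups roughly 2 bp apart, each with a Δ value well below the 0.9 bp threshold.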

Allele frequencies were calculated using a) all individuals with the same ploidy within each of the 10 populations, b) all individuals within each of the 10 populations, c) all individuals with the same ploidy (irrespective of population) and d) all individuals combined. The allele frequency variation within a ploidy level (method a vs. c) was found to be far smaller than that within a population (method a vs. b) (Exact test of sample differentiation and Euclidean distances between populations based on allele frequencies per locus are available in Supplementary Data S1). This is consistent with the allotetraploid nature of B. pubescens, of which B. pendula has been suggested to be one of the progenitor species [16,17].
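Allele frequencies for one group of individuals (for instance one ploidy level within one population) amount to counting allele copies. The sketch below assumes that allele dosage has been determined, so that each tetraploid genotype lists four allele copies; the function name and genotypes are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from a list of genotypes with known allele dosage.

    genotypes -- iterable of tuples of allele designations, e.g. (17, 20)
                 for a diploid or (6, 6, 15, 20) for a tetraploid
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


# Hypothetical diploid genotypes at one marker.
example_frequencies = allele_frequencies([(17, 20), (17, 17), (12, 20)])
```

Applying the same function separately to each ploidy level, each population, or their combinations would give the four sets of frequencies (a to d) that are compared in the text.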

3.1.1 Marker L13.1

Fragments with thirteen different lengths, ranging from 76 to 124 bp, were sequenced for marker L13.1. All the sequenced alleles displayed the (CA)x(GA)x compound repeat motif initially described (Table 2) [20]. The majority of the variation in this compound repeat could be assigned to the number of (GA) repeats, varying from 5 to 28, whereas the number of (CA) repeats only varied between 1 and 3 (Figure 1A). In the diploid individuals, two dominant alleles were identified (17 and 20), both occurring at approximately 25%. Alleles 12, 14 and 16 occurred at frequencies between 7.5 and 12.5%, whilst other alleles were found at (much) lower frequencies. In the tetraploid individuals no dominant alleles were identified. Multiple alleles (6, 15, 16, 19, 20 and 21) were found to occur at frequencies between 7.5 and 12.5%; all other alleles were encountered at lower frequencies (Figure 1B). Representative chromatograms for homozygotes and (partial) heterozygotes are displayed in Figure 1C.

3.1.2 Marker L5.4

Marker L5.4 has previously been described as a dinucleotide repeat with a simple (TC)x repeat motif (Table 2) [21]. Fragments with seventeen different lengths were sequenced, ranging from 132 to 197 bp. As shown in Figure 2A, approximately a quarter of the alleles that were sequenced consisted of this single (TC)x repeat motif, whereas all others contained at least one additional (CC)1 motif. Approximately one third of these alleles were found to harbour an additional CC or AC within the (TC)x motif. As these more complex patterns are recognized in a substantial part of the alleles, we consider marker L5.4 to contain a compound repeat motif, consisting of a number of (TC) repeats, in some instances split by a (CX) repeat, followed by a single CC. In diploid individuals, a single allele was encountered at a relatively high frequency (allele 13, approximately 30%). Three alleles were found to occur in approximately 10-15% of the diploid alleles (alleles 15, 17 and 22), two in 5-10% (18 and 20), whilst the remaining alleles were encountered at frequencies of less than 5%. A different pattern was observed in tetraploid individuals, where no dominant allele was observed, the smaller alleles (7-12) were encountered and the frequency of alleles 13, 14, 16, 20 and 22 differed from the frequency of these alleles in diploids by more than 5% (Figure 2B). Representative chromatograms for homozygotes and (partial) heterozygotes are displayed in Figure 2C.

3.1.3 Marker L63

Initially, marker L63 was described as a trinucleotide repeat with a simple (GAT)x repeat motif preceded by a (GAA)2 structure (Table 2) [20]. Fragments of 138, 148, 152, 154 and 157 bp were sequenced. Multiple sequences showed variation not only in the number of (GAT) repeats, but also in the number of (GAA) repeats and in the inclusion of other trinucleotide motifs (Figure 3A). The allele distributions of diploid and tetraploid individuals are similar in that allele 8 is dominant in both groups (almost 60% and 40% respectively); however, the frequency distribution of the other alleles differs significantly. Alleles 5 and 9 account for more than 20% and 10% of tetraploid alleles respectively, whereas these alleles are hardly encountered in diploids. The opposite is true for allele 11, which constitutes more than 20% of diploid alleles but is only rarely found in tetraploids (Figure 3B). Representative chromatograms for homozygotes and (partial) heterozygotes are displayed in Figure 3C.

Figure 1. Characteristics of marker L13.1. Underlined: primer sequences, bold: polymorphic regions, italics: constant regions. A. Allele designation based on sequenced number of dinucleotide repeats, fragment length in (bp) as determined through CE, number of alleles sequenced per fragment length and sequence of original marker description (accession number) values in black: measured values, values in grey italics: interpolated from black values. B. Allele frequencies for diploid and tetraploid individuals. C. Representative examples of electropherograms enabling estimation of allelic dosage (repeat number and RFU shown).

Figure 2. Characteristics of marker L5.4. Underlined: primer sequences, bold: polymorphic regions, italics: constant regions. A. Allele designation based on sequenced number of dinucleotide repeats, fragment length in (bp) as determined through CE, number of alleles sequenced per fragment length and sequence of original marker description (accession number) values in black: measured values, values in grey italics: interpolated from black values. B. Allele frequencies for diploid and tetraploid individuals. C. Representative examples of electropherograms enabling estimation of allelic dosage (repeat number and RFU shown).

Figure 3. Characteristics of marker L63. Underlined: primer sequences, bold: polymorphic regions, italics: constant regions. A. Allele designation based on sequenced number of trinucleotide repeats, fragment length in (bp) as determined through CE, number of alleles sequenced per fragment length and sequence of original marker description (accession number) values in black: measured values, values in grey italics: interpolated from black values. B. Allele frequencies for diploid and tetraploid individuals. C. Representative examples of electropherograms enabling estimation of allelic dosage (repeat number and RFU shown).

Figure 4. Characteristics of marker Bo.F330. Underlined: primer sequences, bold: polymorphic regions, italics: constant regions. A. Allele designation based on sequenced number of dinucleotide repeats, an even number of mononucleotides (C)e resulting in an integer allele designation, an odd number of mononucleotides resulting in an allele.1 designation, fragment length in (bp) as determined through CE, number of alleles sequenced per fragment length and sequence of original marker description (accession number) values in black: measured values, values in grey italics: interpolated from black values. B. Allele frequencies for diploid and tetraploid individuals. C. Representative examples of electropherograms enabling estimation of allelic dosage (repeat number and RFU shown).

Figure 5. Characteristics of marker L1.10. Underlined: primer sequences, bold: polymorphic regions, italics: constant regions. A. Allele designation based on sequenced number of dinucleotide repeats, each (AAGAGAGAGA)1 considered as five dinucleotide repeats, each (GTATATTT)1 considered as four dinucleotide repeats, fragment length in (bp) as determined through CE, number of alleles sequenced per fragment length and sequence of original marker description (accession number) values in black: measured values, values in grey italics: interpolated from black values. B. Allele frequencies for diploid and tetraploid individuals. C. Representative examples of electropherograms enabling estimation of allelic dosage (repeat number and RFU shown).

Figure 6. Characteristics of marker Bo.F394. Underlined: primer sequences, bold: polymorphic regions, italics: constant regions. A. Allele designation based on sequenced number of dinucleotide repeats, each (GCAT)1 considered as two dinucleotide repeats, each (GAGCTCTC)1 considered as four dinucleotide repeats, each (GCATTTACTTTCATTTCTGCTCGTTC)1 considered as thirteen dinucleotide repeats, fragment length in (bp) as determined through CE, number of alleles sequenced per fragment length and sequence of original marker description (accession number) values in black: measured values, values in grey italics: interpolated from black values. B. Allele frequencies for diploid and tetraploid individuals. C. Representative examples of electropherograms enabling estimation of allelic dosage (repeat number and RFU shown).

3.1.4 Marker Bo.F330

Although marker Bo.F330 has previously been described as a dinucleotide repeat with a simple (TC)x repeat motif (Table 2) [21], the repeat structure was suspected to be more complex as most alleles were identified to be 1 bp apart. Fragments with eleven different lengths were sequenced, ranging from 173 to 205 bp, as shown in Figure 4A. Multiple motifs were identified in these sequences. In the shorter fragments the majority of the variation within the repeat structure was due to a variable number of (TC) repeats, whilst in longer fragments (21.5 repeats and higher) the (TC)x repeat was split by a (CC)1 motif. The cause of the large number of intermediate alleles was found to be a mononucleotide C stretch prior to the (TC)x motif. Alleles with (C)6 and (C)8 were considered integer alleles, whilst the alleles with (C)7 were considered intermediate alleles. Other variations did not impact the repeat structure, but were noted and described as they may influence mobility and thereby fragment length determination. Especially for markers with intermediate alleles, such subtle mobility shifts may hamper correct allele designation. In the upstream flanking region two positions were found to be variable; AT and GC were encountered most often, but AC was also observed. Within the (TC)x motif, both TA and TT variants were observed. Downstream of the (TC)x motif either (TT)1 or (TT)2 was encountered. In the diploid individuals, almost half of the alleles were found to be allele 14 (Figure 4B), although of the seven sequenced alleles with this fragment length, only two displayed an identical sequence (Figure 4A). Approximately one eighth of all alleles was allele 23.1; alleles 18, 19, 21.1, 22.1, 24 and 24.1 were each found in approximately 5-7% of all alleles, and the remaining alleles were encountered at frequencies of less than 1%. For tetraploid individuals, a completely different distribution was encountered. Three alleles reached frequencies of 10% or more (14 in 18%, 23.1 in 12% and 9.1 in 10%), three alleles occurred in 5-7% of alleles (12, 13 and 15) and most other alleles were encountered in 1-5% of alleles (Figure 4B). Representative chromatograms for homozygotes and (partial) heterozygotes are displayed in Figure 4C.

3.1.5 Marker L1.10

Marker L1.10 has been described as a compound dinucleotide repeat with an (AG)xAA(GA)x motif (Table 2) [21]. Although this motif was certainly recognized in the sequences obtained from fragments with lengths varying from 155 to 213 bp, the complete motif was found to be complex and hypervariable. The last constant nucleotides flanking the first repeat were (GTATATTT) (Figure 5A). The repeat structure consisted of the presence or absence of a single (AA(GA)4) motif followed by an AA(GA)x motif, in some cases followed by a single (GG) repeat. In the longer alleles, the (GTATATTT) flanking region, the presence or absence of the motif (AA(GA)4) and the repeat motif AA(GA)x were duplicated. In both diploids and tetraploids, smaller alleles were more frequent than larger alleles. However, the distributions of alleles in diploids and tetraploids varied considerably, especially within this smaller allele range. In diploid individuals, allele 16 was the most frequent, reaching a frequency of 20%, followed by allele 20 at a frequency of almost 15%. In tetraploids, none of the alleles exceeded a frequency of 11% (Figure 5B). Representative chromatograms for homozygotes and (partial) heterozygotes are displayed in Figure 5C.

3.1.6 Marker Bo.F394

In the shorter alleles of marker Bo.F394, a simple (TC)x repeat motif was found, as was previously described (Table 2) [21]. However, a large portion of the overall fragment length variation of marker Bo.F394, ranging from 109 to 255 bp, could be attributed to the insertion/deletion of several motifs. A 2 bp, a 4 bp and a 26 bp fragment directly downstream of the (TC)x repeat were identified. Additionally, in the longest alleles (>200 bp) a (GAGCTCTC)x repeat motif was encountered (Figure 6A). The two intermediate alleles that were sequenced both contained the 26 bp downstream fragment, but with a single base pair deletion causing the fragment to be 25 bp, thereby creating allele 27.1. The most common alleles in diploid and tetraploid individuals are the integer alleles 26 to 33 with intermediate fragment lengths; the smaller, larger and intermediate alleles are considerably rarer in both groups (Figure 6B). However, when examined in more detail, different distributions can be recognized when the two groups are compared, especially in these abundant alleles. In both groups, allele 28 is found at a frequency of approximately 20%; however, other common alleles in diploids (29 and 32, at 20% and 12% respectively) are encountered less often in tetraploids (12% and 7% respectively). The opposite is true for allele 30, which is relatively common in tetraploids (9%) and is only found in 5% of diploids. Representative chromatograms for homozygotes and (partial) heterozygotes are displayed in Figure 6C.

3.2 Population characteristics

STR profiles consisting of the alleles of six markers were obtained for all 547 trees. The majority (72%) of these samples turned out to originate from diploid individuals, 25% from tetraploid individuals and 3% from non-diploid, non-tetraploid individuals. The distribution of diploid, tetraploid and other-ploidy trees differed per population/sampling location (Table 1).

The trees sampled at wild locations Wc and We and at planted location Pb were almost exclusively diploid individuals (B. pendula). One location, wild location Wd, consisted almost exclusively of tetraploids (B. pubescens). The other locations (Wa, Wb, Ia, Ib and Pa) and the random sampling Pc consisted of both diploid and tetraploid individuals, with the percentage of tetraploids ranging from 9% in Ia to 48% in Ib, illustrating the overlapping habitats of these species.

As described previously, the allele frequency variation determined within either diploids or tetraploids was found to be far smaller than within a location (both diploids and tetraploids) indicating that species boundaries influence gene flow more than location. Therefore, Fst values were separately calculated within and between the diploid (Table 3 A) and tetraploid fractions (Table 3 B) of the wild, intermediate and planted populations (He, Ho, local inbreeding coefficients Fs and number of alleles A are shown in Supplementary Data 2).

Table 3. Weighted average pairwise population type Fst values ± standard deviation over six markers for A diploid and B tetraploid individuals. Within population type values on diagonal, between population type values below diagonal.

A Diploids
         Wabce              Iab                Pabc
Wabce    0.0146 ± 0.0029
Iab      0.0039 ± 0.0018    0.0068 ± 0.0030
Pabc     0.0061 ± 0.0025    0.0065 ± 0.0028    0.1731 ± 0.0066

B Tetraploids
         Wabd               Iab                Pac
Wabd     0.0134 ± 0.0030
Iab      0.0035 ± 0.0018    0.0120 ± 0.0016
Pac      0.0058 ± 0.0025    0.0065 ± 0.0025    0.0150 ± 0.0060

The highest Fst value (0.1731) was found when comparing the diploid fraction of the three planted populations (Pa, Pb, Pc), which may be explained by inbreeding at nurseries and additionally by the nature of Pc: a collection of many different trees of unknown origin. The five other within population type Fst values were considerably lower (0.0068-0.0150). All six between population type Fst values were found to be lower (0.0035-0.0065) than the within population type Fst values, indicating that only a small portion of the genetic variation can be attributed to between population/location differences. This is in accordance with the results from the Exact test of sample differentiation and Euclidean distances between populations, and enables using population data derived from one population for frequency estimates of other locations, at least within the western and central area of the Netherlands.

Comparison of all 547 DNA profiles showed that all 254 profiles from ‘wild’ populations could be distinguished from all other profiles. Four pairs of trees sampled in intermediate or planted locations were found to share DNA profiles: two diploid individuals from Ia; two diploid individuals from Pa; one diploid individual from Ib and one diploid individual from Pc; and two tetraploid individuals from Pc. Unique DNA profiles were obtained from all other trees. These observations fit the reproductive strategy of birch where, in the wild without human interference, new individuals arise from pollinated seeds. In more urbanized areas, humans more often interfere with plants’ breeding strategies. Desirable (phenotypic) traits may inspire humans to select, specifically breed or even propagate birch trees through cuttings. Selective breeding or inbreeding may eventually lead to individuals that are not identical but cannot be distinguished with the six described loci, whereas cloning produces genetically identical individuals that may be dispersed over a large distance. Although these differences do not influence the use of birch DNA as an exclusionary tool in forensic investigations, discriminating between a ‘wild’ or ‘planted’ location will influence how the likelihood of obtaining a certain DNA profile by random match is calculated and interpreted.

Testing linkage disequilibrium between all pairs of alleles in ‘wild’ and ‘intermediate’ individuals per ploidy, per sampling location (with N>25) and per sample type resulted in 7 out of 41435 pairs with a P value below the Bonferroni-corrected threshold value (Supplementary Data S3). Four of these seven pairs contained at least one rare allele (occurring at a frequency below 3%), potentially influencing the linkage study; therefore these pairs were not considered further. The three remaining pairs of loci in which linkage between certain alleles may be suspected differed between the three populations in which they were found. No indications for linkage between alleles or loci were detected in more than one location, sample type or ploidy.
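For readers unfamiliar with the sequential Bonferroni correction [33], the following minimal sketch shows the general step-down idea: P values are ranked from small to large and each is compared with a progressively less strict threshold until the first test fails. It is not the LD4X implementation; the function name, the significance level of 0.05 and the example P values are assumptions made for illustration.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Indices of tests that remain significant after sequential Bonferroni.

    The smallest P value is compared with alpha / k, the next with
    alpha / (k - 1), and so on; testing stops at the first P value that
    exceeds its threshold.
    """
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    significant = []
    for rank, index in enumerate(order):
        if p_values[index] <= alpha / (k - rank):
            significant.append(index)
        else:
            break
    return significant


# Hypothetical P values for four allele pairs.
example_significant = sequential_bonferroni([0.001, 0.04, 0.012, 0.03])
```

In this hypothetical example the first and third allele pairs remain significant, whereas the pair with P = 0.03 fails its threshold of 0.025 and ends the procedure.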

3.3 Forensic value of matching birch STR profiles

The value of matching birch DNA profiles in a forensic investigation depends on several factors, including the number of birch fragments found, where they were found, which explanations may be given for the presence of such fragments, and the random match probability of the profile itself. As mentioned previously, apart from the allele frequencies, the random match probability of a birch DNA profile also depends on the suspected origin of a tree (wild vs. planted) and on the ploidy of the birch encountered (diploid vs. tetraploid). For wild trees, the allele frequencies for diploids and tetraploids were used to calculate the most conservative random match probability (using the most common alleles; maximum value in Table 4), the average random match probability (excluding singleton alleles) and the minimum random match probability (using the rarest alleles encountered more than once) (Table 4).

Table 4. Maximum, average and minimum random match probabilities (minimum based on lowest non-singleton allele frequencies).

Diploid
Marker     Genotype   Alleles (maximum)   Maximum value   Average value   Minimum value
L13.1      2pq        17, 20              1.37E-01        4.93E-02        6.47E-06
L5.4       p2         13, 13              9.40E-02        3.47E-02        6.47E-06
L1.10      2pq        16, 20              5.71E-02        1.42E-02        4.05E-05
L63        p2         8, 8                3.44E-01        2.35E-01        6.47E-06
Bo.F330    p2         14, 14              2.39E-01        8.91E-02        6.47E-06
Bo.F394    2pq        28, 29              8.38E-02        2.67E-02        6.47E-06
combined                                  5.05E-06        1.35E-08        4.60E-31

Tetraploid
Marker     Genotype   Alleles (maximum)   Maximum value   Average value   Minimum value
L13.1      24pqrs     6, 15, 20, 21       3.37E-03        6.79E-04        1.88E-10
L5.4       24pqrs     13, 14, 15, 17      6.64E-03        1.06E-03        1.88E-10
L1.10      24pqrs     15, 17, 18, 19      1.61E-03        2.22E-04        1.88E-10
L63        12p2qr     5, 8, 8, 10         1.13E-01        4.98E-02        1.88E-10
Bo.F330    12p2qr     9.1, 14, 14, 23.1   4.64E-03        4.22E-04        1.88E-10
Bo.F394    12p2qr     28, 28, 29, 30      5.91E-03        9.20E-04        1.88E-10
combined                                  1.12E-13        3.08E-18        4.44E-59

Although B. pubescens is considered to have an allotetraploid origin [16,17], all alleles were treated equally in this calculation, as if this species had an autotetraploid origin. As can be seen from the formulas in 2.5, this leads to more conservative (and thus less incriminating) genotype frequency estimates (for example $24pqrs$ vs. $4pqrs$ for complete heterozygotes). For diploids, the frequencies of the most common genotypes ranged from 0.344 (marker L63) to 0.0571 (marker L1.10) (Table 4). For tetraploids, values of 0.113 (marker L63) to 0.00161 (marker L1.10) were observed. We did not find any indication of linkage disequilibrium that recurred in different populations, which allows the markers to be treated independently and the per-marker frequencies to be multiplied to obtain the combined genotype frequency. Combination of all six markers results in conservative values of $5 \cdot 10^{-6}$ for diploids and $1 \cdot 10^{-13}$ for tetraploids. This corresponds to a random match probability of one in two hundred thousand and one in 9 trillion respectively. As illustrated in Table 4, these values are far more conservative than the average value of a genotype, thereby correcting for (unknown) genetic influences such as inbreeding, double reduction in tetraploids, the presence of null alleles, etc. However, of greater influence than such genetic factors is whether a tree has a ‘natural’ origin, or whether human selection and/or propagation have been involved. To estimate a random match probability for non-wild trees, the 275 unique genotypes encountered in 279 ‘intermediate’ and ‘planted’ samples were considered. Based on these figures, the random match probability of a potentially planted profile might be as high as $1.4 \cdot 10^{-2}$, corresponding to one match in approximately seventy individuals. Further forensic validation of this method, including the typing of case-specific reference trees, is advised if such a method is to be applied in case investigations.
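The combined values can be retraced by multiplying the per-marker maximum values of Table 4, as in the sketch below. Because the tabulated per-marker values are rounded, the products differ slightly from the tabulated combined values. The calculation for planted trees as (279 − 275)/279 is our reading of the figures given in the text and should be regarded as an assumption; all variable and function names are hypothetical.

```python
from math import prod

# Per-marker maximum (most conservative) genotype frequencies from Table 4.
DIPLOID_MAX = {
    "L13.1": 1.37e-1, "L5.4": 9.40e-2, "L1.10": 5.71e-2,
    "L63": 3.44e-1, "Bo.F330": 2.39e-1, "Bo.F394": 8.38e-2,
}
TETRAPLOID_MAX = {
    "L13.1": 3.37e-3, "L5.4": 6.64e-3, "L1.10": 1.61e-3,
    "L63": 1.13e-1, "Bo.F330": 4.64e-3, "Bo.F394": 5.91e-3,
}


def combined_rmp(per_marker):
    """Product of per-marker genotype frequencies (markers treated as independent)."""
    return prod(per_marker.values())


diploid_rmp = combined_rmp(DIPLOID_MAX)        # roughly 5e-06
tetraploid_rmp = combined_rmp(TETRAPLOID_MAX)  # roughly 1.1e-13

# Assumed reading for non-wild trees: 279 samples, 275 unique genotypes.
planted_rmp = (279 - 275) / 279                # roughly 1.4e-02
```

Taking the reciprocal, `1 / diploid_rmp` is close to two hundred thousand, `1 / tetraploid_rmp` close to nine trillion and `1 / planted_rmp` close to seventy, in line with the figures quoted in the text.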

4. Conclusion

A six STR marker set was made suitable for forensic analysis of birch DNA through characterisation of the alleles, proposition of an allele nomenclature and allele frequency determination. The ability of the method to distinguish between individual birch trees of the species indigenous to the Netherlands, Betula pubescens and Betula pendula, was assessed. The genetic differences between these two species were found to be larger than the differences between populations of the same species, illustrating the need for allele frequency estimates from both species if an exact profile-based random match probability is to be calculated. If a general, conservative profile frequency is to be estimated for a wild tree, a random match probability of $5 \cdot 10^{-6}$ for the diploid B. pendula and $1 \cdot 10^{-13}$ for the tetraploid B. pubescens is reached with the described six markers. Calculation of profile-specific random match probabilities leads to an average of $1 \cdot 10^{-8}$ for B. pendula and $3 \cdot 10^{-18}$ for B. pubescens. This illustrates the potential relevance of finding that trace evidence secured from a suspect matches a birch tree growing on or near a crime scene, and demonstrates the value of a six STR marker system in typing botanical traces. Although the majority of birch trees and shrubs in the Netherlands originate from seeds and are thus genetically distinct individuals, care should be taken when planted, possibly cloned trees are relevant to an investigation, as random match probabilities (RMPs) will be significantly higher.

References [1] V. Jagadish, J. Robertson, A. Gibbs, RAPD analysis distinguishes Cannabis sativa samples from different sources. Forensic Science International, 79(2) (1996) 113-121. DOI: 10.1016/0379- 0738(96)01898-1 [2] H. Miller Coyle, T. Palmbach, N. Juliano, C. Ladd, H.C. Lee, An overview of DNA methods for the identification and individualization of marijuana, Croatian Medical J. 44(3) (2003) 315-321. [3] H. Miller Coyle, G. Shutler, S. Abrams, J. Hanniman, S. Neylon, C. Ladd, et al., A simple DNA extraction method for marijuana samples used in amplified fragment length polymorphism (AFLP) analysis, J. Forensic Sci. 48(2) (2003) 343-7. [4] C.K. Yoon, Botanical witness for the prosecution, Science, 260(5110) (1993) 894–895. [5] R. Mestel, Murder trial features trees genetic fingerprint, New Sci., 138(1875) (1993) 6. [6] L. Congiu, M. Chicca, R. Cella, R. Rossi, G. Bernacchia, The use of random amplified polymorphic DNA (RAPD) markers to identify strawberry varieties: a forensic application. Molecular Ecology, 9(2) (2000) 229-232. DOI: 10.1046/j.1365-294X.2000.00811.x [7] H. Korpelainen, V. Virtanen, DNA fingerprinting of mosses, J. Forensic Sci., 48(4) (2003) 804-807. [8] W.J. Koopman, I. Kuiper, D.J. Klein-Geltink, G.J. Sabatino, M.J. Smulders, Botanical DNA evidence in criminal cases: knotgrass (Polygonum aviculare L.) as a model species, Forensic Sci. Int. Genet. 6(3) (2012) 366-374. DOI:10.1016/j.fsigen.2011.07.013 [9] H.-M. Hsieh, R.-J. Hou, L.-C. Tsai, C.-S. Wei, S.-W. Liu, L.-H. Huang et al., A highly polymorphic STR locus in Cannabis sativa, Forensic Sci. Int. 131(1) (2003) 53–8. DOI: 10.1016/S0379- 0738(02)00395-X [10] C. Howard, S. Gilmore, J. Robertson, R. Peakall, Developmental validation of a Cannabis sativa STR multiplex system for forensic analysis. J Forensic Sci., 53(5) (2008) 1061–7, DOI: 10.1111/j.1556- 4029.2008.00792.x

94 Development of a forensic STR system for Betula pendula and Betula pubescens

95 Chapter 5 DNA typing of birch:


Supplementary material S2: Population characteristics
M. Wesselink, A. Dragutinovic, J.W. Noordhoek, L. Bergwerff & I. Kuiper 2017
All data based on allele frequencies of six STR markers
------
Table groups: Diploid populations - per sampling location; Diploid populations - per type (wild, intermediate, planted); Tetraploid populations - per sampling location; Tetraploid populations - per type (wild, intermediate, planted)

Wa, diploid (N 58)
Marker   Ho          He          Fs=(He-Ho)/He  A   Average AI
L13.1    0,86206897  0,81420927  -0,0587806     13  1,86
L5.4     0,93103448  0,83739596  -0,1118211     14  1,06
L1.10    0,9137931   0,90651011  -0,0080341     19  1,08
L63      0,60344828  0,5789239   -0,042362      5   1,39
Bo.F330  0,70689655  0,67553508  -0,0464246     11  1,29
Bo.F394  0,82758621  0,88064804  0,06025316     17  1,17
weighted average (Fs)  -0,032303

Wa, tetraploid (N 12)
Marker   Ho     He     Fs=(He-Ho)/He  A   Average AI
L13.1    0,958  0,909  -0,05391       14  3,75
L5.4     0,903  0,886  -0,01919       16  3,5
L1.10    0,950  0,905  -0,04972       15  3,58
L63      0,778  0,707  -0,10042       4   2,91
Bo.F330  0,917  0,921  0,004343       18  3,58
Bo.F394  0,931  0,898  -0,03675       16  3,58
weighted average (Fs)  -0,04038

Wb, diploid (N 38)
Marker   Ho     He     Fs=(He-Ho)/He  A   Average AI
L13.1    0,842  0,804  -0,0478242     11  1,84
L5.4     0,895  0,852  -0,0504065     13  1,10
L1.10    0,895  0,836  -0,0708661     16  1,10
L63      0,605  0,543  -0,1140854     4   1,39
Bo.F330  0,789  0,724  -0,0898662     12  1,21
Bo.F394  0,895  0,852  -0,0504065     14  1,10
weighted average (Fs)  -0,0673676

Wb, tetraploid (N 11)
Marker   Ho     He     Fs=(He-Ho)/He  A   Average AI
L13.1    0,909  0,889  -0,0225        13  3,45
L5.4     0,955  0,901  -0,05993       13  3,72
L1.10    0,967  0,915  -0,05683       14  3,72
L63      0,712  0,740  0,037838       4   2,54
Bo.F330  0,924  0,878  -0,05239       13  3,54
Bo.F394  0,883  0,88   -0,00341       14  3,36
weighted average (Fs)  -0,02825

Wc, diploid (N 47)
Marker   Ho     He     Fs=(He-Ho)/He  A   Average AI
L13.1    0,830  0,851  0,02448111     10  1,82
L5.4     0,872  0,882  0,01052632     13  1,12
L1.10    0,872  0,855  -0,0206568     16  1,12
L63      0,660  0,644  -0,0235335     4   1,34
Bo.F330  0,702  0,648  -0,0827225     11  1,29
Bo.F394  0,872  0,878  0,00618876     12  1,12
weighted average (Fs)  -0,0107046

Wd, tetraploid (N 49)
Marker   Ho     He     Fs=(He-Ho)/He  A   Average AI
L13.1    0,915  0,906  -0,00993       18  3,48
L5.4     0,881  0,876  -0,00571       17  3,38
L1.10    0,976  0,948  -0,02954       27  3,83
L63      0,738  0,688  -0,07267       6   2,69
Bo.F330  0,892  0,915  0,025137       25  3,42
Bo.F394  0,856  0,869  0,01496        21  3,26
weighted average (Fs)  -0,01077

We, diploid (N 35)
Marker   Ho     He     Fs=(He-Ho)/He  A   Average AI
L13.1    0,771  0,797  0,03225806     8   1,77
L5.4     0,857  0,826  -0,0380623     10  1,14
L1.10    0,943  0,891  -0,0581768     15  1,05
L63      0,686  0,622  -0,103086      4   1,31
Bo.F330  0,600  0,670  0,10420475     9   1,40
Bo.F394  0,971  0,856  -0,1354962     15  1,02
weighted average (Fs)  -0,0359926

Wabce, diploid, per type (N 178)
Marker   Ho     He     Fs=(He-Ho)/He  Hs        Fst       A
L13.1    0,831  0,829  -0,00249       0,818215  0,013477  13
L5.4     0,893  0,863  -0,03475       0,849851  0,015531  18
L1.10    0,904  0,892  -0,01367       0,874626  0,019802  23
L63      0,635  0,605  -0,04944       0,597004  0,013097  5
Bo.F330  0,702  0,687  -0,02193       0,677691  0,013804  16
Bo.F394  0,882  0,879  -0,00365       0,868788  0,011415  20
weighted average  Fs -0,01944  Fst 0,014655

Wabd, tetraploid, per type (N 72)
Marker   Ho     He     Fs=(He-Ho)/He  Hs        Fst       A
L13.1    0,921  0,914  -0,00766       0,903903  0,011047  19
L5.4     0,896  0,892  -0,00448       0,881486  0,011787  19
L1.10    0,971  0,946  -0,02643       0,935792  0,010791  28
L63      0,741  0,709  -0,04513       0,699111  0,013948  6
Bo.F330  0,901  0,928  0,029095       0,910347  0,019022  27
Bo.F394  0,873  0,888  0,016892       0,875514  0,014061  25
weighted average  Fs -0,00493  Fst 0,013426

Ia, diploid (N 102)
Marker   Ho     He     Fs=(He-Ho)/He  A   Average AI
L13.1    0,784  0,795  0,01312209     11  1,77
L5.4     0,863  0,827  -0,0431145     16  1,13
L1.10    0,882  0,918  0,03874346     20  1,11
L63      0,637  0,582  -0,0954151     4   1,37
Bo.F330  0,716  0,707  -0,0120285     14  1,27
Bo.F394  0,843  0,872  0,03274892     19  1,15
weighted average (Fs)  -0,0053474

Ia, tetraploid (N 10)
Marker   Ho     He     Fs=(He-Ho)/He  A   Average AI
L13.1    0,817  0,849  0,037691       12  3,1
L5.4     0,917  0,901  -0,01776       13  3,5
L1.10    0,783  0,895  0,12514        13  3
L63      0,633  0,616  -0,0276        4   2,1
Bo.F330  0,950  0,889  -0,06862       16  3,7
Bo.F394  0,907  0,870  -0,04253       14  3,5
weighted average (Fs)  0,00259

Ib, diploid (N 15)
Marker   Ho     He     Fs=(He-Ho)/He  A   Average AI
L13.1    0,667  0,747  0,10714        9   1,66
L5.4     0,933  0,884  -0,05528       13  1,06
L1.10    0,800  0,869  0,07928        19  1,20
L63      0,533  0,498  -0,07143       3   1,46
Bo.F330  0,800  0,822  0,02703        14  1,20
Bo.F394  0,800  0,844  0,05263        12  1,20
weighted average (Fs)  0,02811

Ib, tetraploid (N 16)
Marker   Ho     He     Fs=(He-Ho)/He  A   Average AI
L13.1    0,937  0,915  -0,02404       15  3,68
L5.4     0,889  0,889  0              17  3,12
L1.10    0,906  0,922  0,017354       18  3,43
L63      0,760  0,698  -0,08883       5   2,75
Bo.F330  0,906  0,904  -0,00221       21  3,43
Bo.F394  0,896  0,894  -0,00224       18  3,43
weighted average (Fs)  -0,01379

Iab, diploid, per type (N 117)
Marker   Ho     He     Fs=(He-Ho)/He  Hs        Fst       A
L13.1    0,769  0,796  0,033191       0,788579  0,008873  11
L5.4     0,872  0,840  -0,03756       0,834439  0,0069    16
L1.10    0,872  0,916  0,048212       0,911631  0,004721  20
L63      0,624  0,572  -0,0899        0,570982  0,002594  4
Bo.F330  0,726  0,730  0,005102       0,721929  0,011356  14
Bo.F394  0,838  0,874  0,041264       0,868192  0,006256  19
weighted average  Fs 0,005778  Fst 0,006858

Iab, tetraploid, per type (N 26)
Marker   Ho     He     Fs=(He-Ho)/He  Hs        Fst       A
L13.1    0,891  0,902  0,012195       0,889615  0,01373   15
L5.4     0,900  0,905  0,005525       0,893615  0,01258   19
L1.10    0,859  0,920  0,066304       0,911615  0,009114  20
L63      0,712  0,675  -0,05481       0,666462  0,01265   5
Bo.F330  0,923  0,910  -0,01429       0,898231  0,012933  25
Bo.F394  0,900  0,895  -0,00559       0,884769  0,011431  23
weighted average  Fs 0,004225  Fst 0,01204

Pa, diploid (N 39); the A column is empty in the source
Marker   Ho     He     Fs=(He-Ho)/He  Average AI
L13.1    0,923  0,840  -0,09902       1,92
L5.4     0,795  0,714  -0,11275       1,20
L1.10    0,974  0,902  -0,08017       1,02
L63      0,513  0,433  -0,18541       1,48
Bo.F330  0,846  0,767  -0,10377       1,15
Bo.F394  0,795  0,799  0,00535        1,20
weighted average (Fs)  -0,08789

Pabc, diploid, per type (N 99); the A column is empty in the source
Marker   Ho     He     Fs=(He-Ho)/He  Hs        Fst
L13.1    0,878  0,836  -0,05015       0,695478  0,167732
L5.4     0,816  0,831  0,017544       0,68202   0,179182
L1.10    0,929  0,907  -0,02323       0,755022  0,168008
L63      0,469  0,516  0,090304       0,422735  0,18072
Bo.F330  0,714  0,776  0,079812       0,646639  0,166959
Bo.F394  0,776  0,852  0,090209       0,699733  0,179108
weighted average  Fs 0,029039  Fst 0,173149

Pac, tetraploid, per type (N 35)
Marker   Ho     He     Fs=(He-Ho)/He  Hs        Fst       A
L13.1    0,858  0,916  0,063319       0,898143  0,019495  17
L5.4     0,809  0,901  0,102109       0,891714  0,010306  19
L1.10    0,934  0,916  -0,01965       0,907571  0,009201  24
L63      0,538  0,724  0,256906       0,705714  0,025257  6
Bo.F330  0,833  0,903  0,077519       0,890429  0,013922  29
Bo.F394  0,875  0,914  0,04267        0,901     0,014223  23
weighted average  Fs 0,080963  Fst 0,01506

Pb, diploid (N 12); the A column is empty in the source
Marker   Ho     He     Fs=(He-Ho)/He  Average AI
L13.1    0,833  0,788  -0,05727       1,83
L5.4     0,750  0,813  0,07692        1,25
L1.10    0,917  0,847  -0,08197       1,08
L63      0,583  0,601  0,02890        1,41
Bo.F330  0,417  0,531  0,21569        1,58
Bo.F394  0,917  0,743  -0,23364       1,08
weighted average (Fs)  -0,02169

Pa, tetraploid (N 10)
Marker   Ho     He     Fs=(He-Ho)/He  A   Average AI
L13.1    0,926  0,886  -0,04515       13  3,5
L5.4     0,926  0,906  -0,02208       16  3,5
L1.10    0,938  0,889  -0,05512       14  3,43
L63      0,683  0,655  -0,04275       6   2,56
Bo.F330  0,944  0,924  -0,02165       20  3,56
Bo.F394  0,889  0,881  -0,00908       15  3,25
weighted average (Fs)  -0,03209

Pc, diploid (N 48); the A column is empty in the source
Marker   Ho     He     Fs=(He-Ho)/He  Average AI
L13.1    0,851  0,816  -0,0432852     1,85
L5.4     0,851  0,879  0,03167654     1,14
L1.10    0,894  0,896  0,00227445     1,10
L63      0,404  0,529  0,23544521     1,59
Bo.F330  0,681  0,821  0,1702069      1,31
Bo.F394  0,723  0,871  0,16900676     1,27
weighted average (Fs)  0,0843725

Pc, tetraploid (N 25)
Marker   Ho     He     Fs=(He-Ho)/He  A   Average AI
L13.1    0,833  0,903  0,077519       17  3,28
L5.4     0,767  0,886  0,134312       17  3,04
L1.10    0,933  0,915  -0,01967       21  3,6
L63      0,480  0,726  0,338843       5   2,08
Bo.F330  0,793  0,877  0,095781       24  3,16
Bo.F394  0,870  0,909  0,042904       21  3,36
weighted average (Fs)  0,103528
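The Fs column in the tables above is headed (He-Ho)/He, so it can be recomputed from the tabulated heterozygosities. The sketch below does this in Python for the diploid Wa population at L13.1. The function names are illustrative. The Fst relation (Ht-Hs)/Ht, with Ht taken as the pooled He, is an assumption: it is not stated in the tables, although it reproduces the pooled values to within the rounding of He.

```python
def parse_decimal(text: str) -> float:
    """Convert a decimal-comma string such as '0,862' to a float."""
    return float(text.replace(",", "."))


def fs(ho: float, he: float) -> float:
    """Inbreeding coefficient as given in the column header: (He - Ho) / He."""
    return (he - ho) / he


def fst(ht: float, hs: float) -> float:
    """Assumed relation (not stated in the source): (Ht - Hs) / Ht."""
    return (ht - hs) / ht


# Wa, diploid, marker L13.1 (values copied from the table above)
wa_fs = fs(parse_decimal("0,86206897"), parse_decimal("0,81420927"))

# Wabce, diploid, marker L13.1: pooled He used as Ht (assumption)
wabce_fst = fst(parse_decimal("0,829"), parse_decimal("0,818215"))

print(f"Fs  = {wa_fs:.7f}")
print(f"Fst = {wabce_fst:.4f} (table: 0,013477; He is rounded to 3 decimals)")
```

Because He is tabulated to three decimals for most populations, a recomputed Fst can differ from the tabulated value in the fourth decimal.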

Supplementary material S3: Linkage characteristics Wild and Intermediate populations
M. Wesselink, A. Dragutinovic, J.W. Noordhoek, L. Bergwerff & I. Kuiper 2017
All data based on six STR markers
------
Diploid populations - per sampling location & per type (N>25)

Columns: population; N; # of pairs; # of pairs with P value <0,05, <0,01, <0,001; # pairs significant after Bonferroni corr. (original P value 0,05); # of those with allele freq > 0,03; locus (allele repeat nr)

Pop    N    pairs  <0,05  <0,01  <0,001  Bonferroni  freq>0,03  locus (allele repeat nr)
Wa     58   2540   61     8      0       0           0          -
Wb     38   1999   53     8      0       0           0          -
Wc     47   1775   72     16     3       1           1          L63 (10) x Bo.F394 (32)
We     35   1505   34     4      0       0           0          -
Wabce  178  3892   128    23     4       1           1          L5.4 (21) x Bo.F330 (24)
Ia     103  2847   105    30     6       2           1          L1.10 (19) x L63 (8)
Iab    118  3202   117    36     7       3           1          L1.10 (19) x L63 (8)

Tetraploid populations - per location & per type & total (N>25)
Columns: population; N; # of pairs; # of pairs with P value <0,05, <0,01, <0,001; # pairs significant after Bonferroni corr. (original P value 0,05); # of those with allele freq > 0,03; locus (allele)

Pop       N   pairs  <0,05  <0,01  <0,001  Bonferroni  freq>0,03  locus (allele)
Wd        49  5281   176    39     2       0           0          -
Wabde     73  6339   263    48     10      0           0          -
Iab       26  4555   113    14     1       0           0          -
W&Itotal  99  7500   290    64     6       0           0          -
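The linkage tables report how many allele pairs remain significant after a Bonferroni correction of an original P value of 0,05. A minimal Python sketch of that correction follows. It assumes that the number of tests equals the number of pairs listed for the population, which the table does not state explicitly. The P values in the example are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def count_significant(p_values, alpha: float, n_tests: int) -> int:
    """Count P values that stay below the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return sum(1 for p in p_values if p < threshold)


# Wa, diploid: 2540 pairs tested (taken from the table above)
wa_threshold = bonferroni_threshold(0.05, 2540)
print(f"corrected threshold for Wa: {wa_threshold:.3e}")

# Hypothetical P values, for illustration only
example_p = [1e-6, 4e-4, 0.03]
print(count_significant(example_p, 0.05, 2540))
```

With several thousand pairs per population the corrected threshold falls well below 0,001, which is consistent with only a few pairs in the <0,001 column surviving the correction.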


Supplementary material S1a: Exact test of population differentiation
M. Wesselink, A. Dragutinovic, J.W. Noordhoek, L. Bergwerff & I. Kuiper 2017
All data based on allele frequencies of six STR markers

------

Non-differentiation exact P values
Exact test of sample differentiation based on allele frequencies, sorted by ploidy, per marker
1000000 steps in Markov chain, 10000 dememorization steps
Significance level 0,005
Calculated with Arlequin 3.5.2.2

L13.1     Wa-di Wb-di Wc-di We-di Ia-di Ib-di Pa-di Pb-di Pc-di Wa-tetra Wb-tetra Wd-tetra Ia-tetra Ib-tetra Pa-tetra Pc-tetra
Wa-di
Wb-di     0,94023
Wc-di     0,00338 0,03944
We-di     0,11815 0,15382 0,00101
Ia-di     0,00591 0,02229 0,00001 0,14102
Ib-di     0,11231 0,27005 0,0344 0,48142 0,06185
Pa-di     0,00003 0,01185 0,00002 0,00854 0 0,25154
Pb-di     0,91844 0,94803 0,14407 0,83987 0,90677 0,76524 0,58375
Pc-di     0,01487 0,21241 0,00907 0,03017 0,00031 0,69231 0,00476 0,76243
Wa-tetra  0 0 0 0 0 0,00018 0,00006 0,04647 0,00003
Wb-tetra  0 0,00007 0 0 0 0,00038 0,00043 0,02868 0,00001 0,02613
Wd-tetra  0 0 0 0 0 0,00002 0 0,00075 0 0,08497 0,00141
Ia-tetra  0 0 0 0 0 0,00007 0 0,00027 0 0,11708 0,00054 0,05771
Ib-tetra  0 0 0 0 0 0,00024 0 0,00769 0 0,79197 0,02764 0,34942 0,59816
Pa-tetra  0 0,00002 0 0,00004 0 0,00076 0 0,02959 0 0,0427 0,00247 0,00488 0,03016 0,30627
Pc-tetra  0 0 0 0 0 0 0 0,0006 0 0,30172 0,00949 0,00404 0,00204 0,17482 0,00112

L5.4      Wa-di Wb-di Wc-di We-di Ia-di Ib-di Pa-di Pb-di Pc-di Wa-tetra Wb-tetra Wd-tetra Ia-tetra Ib-tetra Pa-tetra Pc-tetra
Wa-di
Wb-di     0,0202
Wc-di     0,00876 0,0038
We-di     0,30948 0,00424 0,00072
Ia-di     0,58189 0,00586 0,00112 0,14021
Ib-di     0,04357 0,28542 0,23781 0,05384 0,0198
Pa-di     0,06024 0,12523 0,00014 0,00029 0,04743 0,0034
Pb-di     0,29154 0,05829 0,27069 0,09416 0,48705 0,36555 0,03932
Pc-di     0,32946 0,00071 0,06477 0,1116 0,04474 0,27996 0,00027 0,44584
Wa-tetra  0,00012 0 0,00012 0,00001 0 0,01242 0 0,16735 0,07704
Wb-tetra  0,00054 0,00035 0,00992 0,0005 0,00002 0,09011 0,00001 0,61331 0,02474 0,37665
Wd-tetra  0 0 0 0 0 0,00017 0 0,2033 0 0,03141 0,00708
Ia-tetra  0 0 0,00135 0 0 0,06696 0 0,14198 0,00033 0,06598 0,28743 0,01271
Ib-tetra  0,00238 0,00017 0,00232 0,00002 0,00005 0,01786 0,00006 0,51026 0,01929 0,49066 0,39033 0,47954 0,15357
Pa-tetra  0,00128 0,00265 0,00134 0,00033 0,00021 0,16397 0,00001 0,29623 0,05371 0,79704 0,56666 0,02169 0,19639 0,73224
Pc-tetra  0 0 0 0 0 0 0 0,00596 0 0,00306 0,00349 0 0,00365 0,04025 0,04285

L1.10     Wa-di Wb-di Wc-di We-di Ia-di Ib-di Pa-di Pb-di Pc-di Wa-tetra Wb-tetra Wd-tetra Ia-tetra Ib-tetra Pa-tetra Pc-tetra
Wa-di
Wb-di     0,02133
Wc-di     0,00001 0,00076
We-di     0,38238 0,04445 0
Ia-di     0,20841 0,07826 0 0,03124
Ib-di     0,69005 0,11815 0,15213 0,18415 0,63736
Pa-di     0,00651 0 0 0,01132 0 0,03783
Pb-di     0,3976 0,03439 0,00377 0,50405 0,09055 0,16937 0,29315
Pc-di     0,0249 0,00019 0 0,0411 0 0,00156 0,01395 0,12426
Wa-tetra  0 0 0 0 0 0,00001 0,00002 0,00005 0
Wb-tetra  0,0001 0 0 0,00002 0 0,00126 0,00757 0,00632 0,00159 0,12605
Wd-tetra  0 0 0 0 0 0,00749 0,00082 0,28391 0,00062 0,12248 0,42657
Ia-tetra  0,00007 0 0 0 0 0,00002 0,00002 0,00003 0,00064 0,13727 0,43191 0,37999
Ib-tetra  0 0 0 0 0 0,00007 0,00069 0,00222 0,00029 0,36908 0,60586 0,33744 0,46854
Pa-tetra  0,00116 0 0 0 0 0,0003 0,00449 0,00501 0,00711 0,09277 0,71366 0,75841 0,60502 0,91585
Pc-tetra  0 0 0 0 0 0 0 0,00092 0 0,0256 0,27432 0,00001 0,03919 0,06475 0,20296

L63       Wa-di Wb-di Wc-di We-di Ia-di Ib-di Pa-di Pb-di Pc-di Wa-tetra Wb-tetra Wd-tetra Ia-tetra Ib-tetra Pa-tetra Pc-tetra
Wa-di
Wb-di     0,74625
Wc-di     0,0613 0,17624
We-di     0,05771 0,251 0,94068
Ia-di     0,0071 0,07292 0,08513 0,20685
Ib-di     0,55455 0,90074 0,41816 0,55137 0,67783
Pa-di     0,00408 0,1303 0,01088 0,07218 0,02671 0,6993
Pb-di     0,77426 0,36686 0,39784 0,25629 0,2494 0,21654 0,00917
Pc-di     0,00436 0,21695 0,00523 0,05179 0,0003 0,89268 0,32404 0,10131
Wa-tetra  0 0 0 0 0 0 0 0 0
Wb-tetra  0 0 0 0 0 0 0 0 0 0,72756
Wd-tetra  0 0 0 0 0 0 0 0 0 0,0138 0,02149
Ia-tetra  0 0 0,00001 0,00002 0 0,00099 0,00004 0,00002 0,00128 0,15451 0,0625 0,74951
Ib-tetra  0 0 0 0 0 0,00001 0 0 0 0,03692 0,09825 0,63334 0,39992
Pa-tetra  0 0 0 0,00006 0 0,00864 0,00004 0,00037 0,00711 0,14819 0,11099 0,24461 0,81908 0,3563
Pc-tetra  0 0 0 0 0 0 0 0 0 0,01923 0,1961 0,04141 0,02458 0,42583 0,03552

Bo.F330   Wa-di Wb-di Wc-di We-di Ia-di Ib-di Pa-di Pb-di Pc-di Wa-tetra Wb-tetra Wd-tetra Ia-tetra Ib-tetra Pa-tetra Pc-tetra
Wa-di
Wb-di     0,06734
Wc-di     0,00015 0,13847
We-di     0,57537 0,0683 0,0001
Ia-di     0,00296 0,00075 0,0002 0,00064
Ib-di     0,01781 0,00277 0,00001 0,01878 0,00039
Pa-di     0,03161 0,03962 0,00038 0,12482 0,00004 0,03648
Pb-di     0,13069 0,18573 0,09957 0,12214 0,0693 0,12717 0,5761
Pc-di     0 0,00046 0 0,00144 0,00001 0,09888 0,19738 0,89044
Wa-tetra  0 0 0 0 0 0,00061 0 0,00004 0
Wb-tetra  0 0 0 0 0 0,00131 0 0 0 0,00603
Wd-tetra  0 0 0 0 0 0,0005 0 0,00576 0 0 0,00067
Ia-tetra  0 0,00002 0 0,00002 0 0,03858 0,00003 0,02174 0,00029 0,00365 0,00008 0,01931
Ib-tetra  0 0 0 0 0 0,00142 0 0,00089 0 0,00022 0,00214 0,43462 0,1774
Pa-tetra  0 0 0 0 0 0,09624 0 0,00328 0,00031 0,09252 0,07792 0,095 0,25633 0,6722
Pc-tetra  0 0 0 0 0 0,00986 0 0,00683 0,00001 0,00003 0,00019 0,00188 0,06799 0,27795 0,03063

Bo.F394   Wa-di Wb-di Wc-di We-di Ia-di Ib-di Pa-di Pb-di Pc-di Wa-tetra Wb-tetra Wd-tetra Ia-tetra Ib-tetra Pa-tetra Pc-tetra
Wa-di
Wb-di     0,19125
Wc-di     0,19477 0,10498
We-di     0,04384 0,04548 0,00149
Ia-di     0,05647 0,1276 0,00635 0,00036
Ib-di     0,58535 0,24962 0,25885 0,0943 0,26244
Pa-di     0,0011 0,03088 0,00039 0,00369 0,00043 0,33652
Pb-di     0,08644 0,01 0,00186 0,09801 0,29041 0,1468 0,28804
Pc-di     0,02597 0,77694 0,03313 0,24416 0,02109 0,15277 0,0054 0,05995
Wa-tetra  0,00014 0,00407 0,00002 0,00021 0,00005 0,07364 0 0,00504 0,01189
Wb-tetra  0,00072 0,00541 0,00056 0,0025 0,00142 0,12852 0,00059 0,08052 0,07611 0,66344
Wd-tetra  0 0,00057 0,00017 0,00093 0,00001 0,70767 0,01621 0,41322 0,00049 0,00006 0,09866
Ia-tetra  0,3682 0,71248 0,04147 0,10944 0,01851 0,51403 0,04301 0,03479 0,63677 0,2324 0,11031 0,02527
Ib-tetra  0,00153 0,06794 0,00016 0,00092 0,00466 0,16216 0,00591 0,52753 0,05785 0,20702 0,43654 0,00015 0,335
Pa-tetra  0,40813 0,03988 0,00923 0,1456 0,05828 0,34185 0,01083 0,5847 0,13131 0,35646 0,29299 0,03089 0,54753 0,43718
Pc-tetra  0,00163 0,02804 0,00512 0,00085 0,00001 0,33633 0,00002 0,09036 0,05313 0,01299 0,45441 0 0,25031 0,15063 0,1045

Non-differentiation exact P values
Exact test of sample differentiation based on allele frequencies, sorted by sampling location, per marker
1000000 steps in Markov chain, 10000 dememorization steps

L13.1     Wa-di Wa-tetra Wb-di Wb-tetra Wc-di Wd-tetra We-di Ia-di Ia-tetra Ib-di Ib-tetra Pa-di Pa-tetra Pb-di Pc-di Pc-tetra
Wa-di
Wa-tetra  0
Wb-di     0,94023 0
Wb-tetra  0 0,02613 0,00007
Wc-di     0,00338 0 0,03944 0
Wd-tetra  0 0,08497 0 0,00141 0
We-di     0,11815 0 0,15382 0 0,00101 0
Ia-di     0,00591 0 0,02229 0 0,00001 0 0,14102
Ia-tetra  0 0,11708 0 0,00054 0 0,05771 0 0
Ib-di     0,11231 0,00018 0,27005 0,00038 0,0344 0,00002 0,48142 0,06185 0,00007
Ib-tetra  0 0,79197 0 0,02764 0 0,34942 0 0 0,59816 0,00024
Pa-di     0,00003 0,00006 0,01185 0,00043 0,00002 0 0,00854 0 0 0,25154 0
Pa-tetra  0 0,0427 0,00002 0,00247 0 0,00488 0,00004 0 0,03016 0,00076 0,30627 0
Pb-di     0,91844 0,04647 0,94803 0,02868 0,14407 0,00075 0,83987 0,90677 0,00027 0,76524 0,00769 0,58375 0,02959
Pc-di     0,01487 0,00003 0,21241 0,00001 0,00907 0 0,03017 0,00031 0 0,69231 0 0,00476 0 0,76243
Pc-tetra  0 0,30172 0 0,00949 0 0,00404 0 0 0,00204 0 0,17482 0 0,00112 0,0006 0

L5.4      Wa-di Wa-tetra Wb-di Wb-tetra Wc-di Wd-tetra We-di Ia-di Ia-tetra Ib-di Ib-tetra Pa-di Pa-tetra Pb-di Pc-di Pc-tetra
Wa-di
Wa-tetra  0,0001
Wb-di     0,01984 0
Wb-tetra  0,00075 0,37947 0,00035
Wc-di     0,00878 0,00014 0,00396 0,01056
Wd-tetra  0 0,03337 0 0,0078 0
We-di     0,30364 0,00007 0,0049 0,00057 0,00106 0
Ia-di     0,58272 0 0,00672 0,00005 0,00122 0 0,13551
Ia-tetra  0 0,06561 0 0,28683 0,00114 0,01155 0,00002 0
Ib-di     0,04154 0,01455 0,28157 0,09256 0,23874 0,00005 0,05506 0,02034 0,06449
Ib-tetra  0,00195 0,50361 0,00025 0,38548 0,00253 0,47839 0,00002 0,00004 0,15252 0,01636
Pa-di     0,05914 0 0,12755 0,0001 0,00028 0 0,00038 0,04565 0 0,00366 0,00001
Pa-tetra  0,00081 0,79433 0,00252 0,5653 0,00153 0,02335 0,00024 0,00006 0,19958 0,16127 0,73003 0,00003
Pb-di     0,29096 0,17 0,05696 0,60925 0,27728 0,20281 0,08948 0,49755 0,13776 0,36394 0,50183 0,03863 0,29998
Pc-di     0,32828 0,0758 0,00032 0,02073 0,06406 0 0,11576 0,04557 0,00033 0,28001 0,02103 0,00014 0,05322 0,43669
Pc-tetra  0 0,00403 0 0,00355 0 0 0 0 0,00314 0,00002 0,04134 0 0,04133 0,00569 0

L1.10     Wa-di Wa-tetra Wb-di Wb-tetra Wc-di Wd-tetra We-di Ia-di Ia-tetra Ib-di Ib-tetra Pa-di Pa-tetra Pb-di Pc-di Pc-tetra
Wa-di
Wa-tetra  0
Wb-di     0,02292 0
Wb-tetra  0,00025 0,12592 0
Wc-di     0,00001 0 0,00094 0
Wd-tetra  0 0,12223 0 0,42515 0
We-di     0,38545 0 0,04678 0 0 0
Ia-di     0,22187 0 0,07741 0 0,00001 0 0,02922
Ia-tetra  0,00001 0,13932 0 0,42738 0 0,36841 0 0
Ib-di     0,68075 0,00002 0,11624 0,00131 0,15337 0,00699 0,18005 0,6408 0,00002
Ib-tetra  0 0,37035 0 0,60944 0 0,33449 0 0 0,47546 0,00005
Pa-di     0,00536 0,00001 0 0,00835 0 0,00067 0,00909 0 0,00003 0,03779 0,00069
Pa-tetra  0,00077 0,09169 0 0,71058 0 0,74821 0 0,00001 0,61131 0,00015 0,91484 0,00368
Pb-di     0,39868 0,00008 0,03459 0,00504 0,00345 0,28413 0,50526 0,09232 0,00002 0,17193 0,00216 0,29101 0,00475
Pc-di     0,02494 0 0,00018 0,00159 0 0,00073 0,04091 0 0,00054 0,00135 0,00024 0,01398 0,00727 0,12665
Pc-tetra  0 0,02577 0 0,2672 0 0 0 0 0,0413 0 0,05771 0 0,20723 0,00101 0

L63       Wa-di Wa-tetra Wb-di Wb-tetra Wc-di Wd-tetra We-di Ia-di Ia-tetra Ib-di Ib-tetra Pa-di Pa-tetra Pb-di Pc-di Pc-tetra
Wa-di
Wa-tetra  0
Wb-di     0,74639 0
Wb-tetra  0 0,72873 0
Wc-di     0,06249 0 0,18096 0
Wd-tetra  0 0,0142 0 0,01951 0
We-di     0,06094 0 0,25276 0 0,94086 0
Ia-di     0,00642 0 0,06994 0 0,08415 0 0,20647
Ia-tetra  0 0,15601 0 0,06492 0 0,74864 0,00004 0
Ib-di     0,55868 0,00002 0,90094 0 0,41888 0 0,54695 0,68344 0,00114
Ib-tetra  0 0,03752 0 0,09737 0 0,63676 0 0 0,39791 0,00001
Pa-di     0,00375 0 0,12789 0 0,0093 0 0,06958 0,03102 0,00004 0,69458 0
Pa-tetra  0 0,14041 0,00001 0,11235 0 0,24735 0,00014 0 0,81815 0,00884 0,35935 0,00014
Pb-di     0,77708 0 0,36673 0 0,39786 0 0,25199 0,24702 0,00006 0,21684 0 0,00891 0,00044
Pc-di     0,00471 0 0,21409 0 0,00504 0 0,05035 0,00059 0,00107 0,89214 0 0,32529 0,00658 0,10376
Pc-tetra  0 0,02131 0 0,19669 0 0,04171 0 0 0,02716 0 0,42638 0 0,03438 0 0

Bo.F330   Wa-di Wa-tetra Wb-di Wb-tetra Wc-di Wd-tetra We-di Ia-di Ia-tetra Ib-di Ib-tetra Pa-di Pa-tetra Pb-di Pc-di Pc-tetra
Wa-di
Wa-tetra  0
Wb-di     0,06548 0
Wb-tetra  0 0,00617 0
Wc-di     0,00011 0 0,13935 0
Wd-tetra  0 0 0 0,00048 0
We-di     0,57112 0 0,06722 0 0,00011 0
Ia-di     0,00289 0 0,00135 0 0,00021 0 0,00083
Ia-tetra  0 0,00313 0,00003 0,00004 0 0,01878 0,00003 0
Ib-di     0,01787 0,00066 0,00319 0,00135 0,00002 0,00014 0,01783 0,0004 0,03848
Ib-tetra  0 0,00007 0 0,00181 0 0,43408 0 0 0,18349 0,00202
Pa-di     0,03595 0 0,03549 0 0,00018 0 0,13565 0,00001 0,00007 0,03614 0
Pa-tetra  0 0,09604 0 0,07203 0 0,08897 0 0 0,25691 0,09882 0,6687 0
Pb-di     0,13337 0,00002 0,17505 0 0,09924 0,00588 0,12506 0,07032 0,02287 0,13061 0,0007 0,57095 0,00289
Pc-di     0 0 0,00034 0 0 0 0,00222 0 0,00055 0,09952 0 0,20242 0,00035 0,88945
Pc-tetra  0 0,00002 0 0,0002 0 0,00246 0 0 0,07054 0,00808 0,28433 0 0,02873 0,00763 0

Bo.F394   Wa-di Wa-tetra Wb-di Wb-tetra Wc-di Wd-tetra We-di Ia-di Ia-tetra Ib-di Ib-tetra Pa-di Pa-tetra Pb-di Pc-di Pc-tetra
Wa-di
Wa-tetra  0,00011
Wb-di     0,18951 0,00435
Wb-tetra  0,00058 0,66134 0,00552
Wc-di     0,19068 0 0,10011 0,00071
Wd-tetra  0 0,00001 0,00058 0,09327 0,00016
We-di     0,04303 0,00018 0,04526 0,0033 0,00127 0,00045
Ia-di     0,06319 0 0,12358 0,00184 0,00757 0,00002 0,00009
Ia-tetra  0,36165 0,23579 0,71677 0,11341 0,04342 0,02687 0,10909 0,02043
Ib-di     0,58444 0,06991 0,25443 0,12713 0,26228 0,70605 0,09177 0,26052 0,52347
Ib-tetra  0,00134 0,20949 0,06444 0,44155 0,00012 0,00007 0,0007 0,00536 0,33916 0,16169
Pa-di     0,00048 0 0,0283 0,00044 0,00046 0,01756 0,00421 0,00033 0,04197 0,33861 0,00556
Pa-tetra  0,40003 0,35407 0,03962 0,29191 0,00963 0,03198 0,15045 0,05866 0,55174 0,34021 0,43995 0,01006
Pb-di     0,08889 0,00398 0,01132 0,07731 0,00201 0,41151 0,09362 0,28295 0,03174 0,14447 0,52893 0,28872 0,58661
Pc-di     0,0263 0,01244 0,77698 0,07871 0,03485 0,00063 0,24778 0,01879 0,63025 0,14699 0,05482 0,00549 0,1341 0,06135
Pc-tetra  0,00225 0,01385 0,0255 0,44927 0,00642 0 0,00115 0 0,24944 0,3356 0,15717 0,00006 0,10761 0,09669 0,05762
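The exact-test matrices above are lower-triangular and use decimal commas, so screening them against the stated significance level of 0,005 is easiest after parsing. The Python sketch below lists the population pairs whose non-differentiation P value falls below that level. The input is a four-population excerpt of the ploidy-sorted L13.1 matrix. The function and variable names are illustrative and are not part of the Arlequin workflow.

```python
def differentiated_pairs(labels, rows, alpha=0.005):
    """Return (row population, column population, P) for every P value below alpha.

    labels: population labels in matrix order
    rows:   maps a row label to its whitespace-separated lower-triangle
            values (decimal commas), one value per preceding population
    """
    pairs = []
    for label in labels:
        values = rows.get(label, "").split()
        for column_index, text in enumerate(values):
            p_value = float(text.replace(",", "."))
            if p_value < alpha:
                pairs.append((label, labels[column_index], p_value))
    return pairs


# Excerpt of the L13.1 matrix (sorted by ploidy), first four diploid populations
labels = ["Wa-di", "Wb-di", "Wc-di", "We-di"]
rows = {
    "Wb-di": "0,94023",
    "Wc-di": "0,00338 0,03944",
    "We-di": "0,11815 0,15382 0,00101",
}

for row_pop, col_pop, p_value in differentiated_pairs(labels, rows):
    print(f"{row_pop} vs {col_pop}: P = {p_value}")
```

In this excerpt two pairs fall below 0,005 (Wc-di against Wa-di, and We-di against Wc-di). A value printed as 0 in the tables is treated here as a P value below the reporting precision.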


Supplementary material S1b: Distances between populations
M. Wesselink, A. Dragutinovic, J.W. Noordhoek, L. Bergwerff & I. Kuiper 2017
All data based on allele frequencies of six STR markers

------

Euclidean distance, sorted by ploidy
Frequency distributions of STR markers L13.1, L5.4, L1.10, L63, Bo.F330 and Bo.F394

             Wa-di Wb-di Wc-di We-di Ia-di Ib-di Pa-di Pb-di Pc-di total-di Wa-tetra Wb-tetra Wd-tetra Ia-tetra Ib-tetra Pa-tetra Pc-tetra total-tetra
Wa-di        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Wb-di        0,37846 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Wc-di        0,44409 0,4659 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
We-di        0,36558 0,45571 0,50513 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ia-di        0,32388 0,4534 0,44729 0,40949 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ib-di        0,51398 0,51692 0,59605 0,53227 0,53861 0 0 0 0 0 0 0 0 0 0 0 0 0
Pa-di        0,59435 0,58145 0,69407 0,60062 0,55695 0,61511 0 0 0 0 0 0 0 0 0 0 0 0
Pb-di        0,54174 0,676 0,67648 0,59127 0,56194 0,73324 0,68267 0 0 0 0 0 0 0 0 0 0 0
Pc-di        0,44071 0,4327 0,55503 0,45352 0,4429 0,52787 0,54765 0,71146 0 0 0 0 0 0 0 0 0 0
total-di     0,22552 0,30985 0,36591 0,30446 0,21481 0,43829 0,46538 0,52877 0,32506 0 0 0 0 0 0 0 0 0
Wa-tetra     0,94658 0,97926 0,91611 0,93535 0,88242 0,95214 0,95856 1,13995 0,80636 0,86344 0 0 0 0 0 0 0 0
Wb-tetra     0,92251 0,92075 0,91284 0,89211 0,91064 0,90789 0,91817 1,06967 0,80606 0,84738 0,53759 0 0 0 0 0 0 0
Wd-tetra     0,78781 0,81506 0,77837 0,76209 0,74662 0,78233 0,75203 0,93478 0,67494 0,69448 0,537 0,50308 0 0 0 0 0 0
Ia-tetra     0,86013 0,86002 0,7849 0,81514 0,80773 0,83786 0,82269 1,0499 0,72328 0,75501 0,61917 0,69501 0,50385 0 0 0 0 0
Ib-tetra     0,8545 0,85916 0,83127 0,84685 0,81921 0,86962 0,8384 1,02615 0,72006 0,76783 0,5152 0,52084 0,32105 0,49797 0 0 0 0
Pa-tetra     0,81294 0,83414 0,84598 0,80959 0,79104 0,81556 0,846 1,02571 0,67586 0,74251 0,55886 0,5947 0,43799 0,5517 0,42002 0 0 0
Pc-tetra     0,94288 0,98311 0,92388 0,94127 0,92848 0,98709 1,02972 1,12437 0,83231 0,88894 0,62256 0,6117 0,52299 0,65759 0,46391 0,62773 0 0
total-tetra  0,79648 0,82012 0,77761 0,77782 0,75994 0,80485 0,80245 0,97629 0,66417 0,70866 0,42645 0,4238 0,19898 0,44188 0,22301 0,37448 0,38872 0
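The distances above are described as Euclidean distances between allele frequency distributions. As a minimal sketch of that calculation in Python, the function below compares two populations at one marker. Alleles missing from one population are counted as frequency 0. The allele frequencies in the example are hypothetical. Combining per-marker distances as the square root of the sum of their squares is an assumption; the source does not state how the six-marker distance was obtained.

```python
import math


def euclidean_distance(freq_a, freq_b):
    """Euclidean distance between two allele frequency distributions.

    freq_a, freq_b: dicts mapping allele (e.g. repeat number) to frequency.
    Alleles absent from one population contribute with frequency 0.
    """
    alleles = set(freq_a) | set(freq_b)
    return math.sqrt(
        sum((freq_a.get(a, 0.0) - freq_b.get(a, 0.0)) ** 2 for a in alleles)
    )


def combine_markers(per_marker_distances):
    """Assumed combination rule: square root of the summed squared distances."""
    return math.sqrt(sum(d * d for d in per_marker_distances))


# Hypothetical frequencies at a single marker, for illustration only
population_1 = {10: 0.5, 11: 0.5}
population_2 = {10: 0.5, 12: 0.5}
print(round(euclidean_distance(population_1, population_2), 4))

# Hypothetical per-marker distances combined into one value
print(round(combine_markers([0.3, 0.4]), 4))
```

Under this definition two populations that share no alleles at a marker, each with several alleles, can still lie closer together than two populations that each carry a single, different, fixed allele, so the distance reflects the frequency spread as well as allele sharing.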

------

Euclidean distance, sorted by ploidy, per marker

L13.1        Wa-di Wb-di Wc-di We-di Ia-di Ib-di Pa-di Pb-di Pc-di total-di Wa-tetra Wb-tetra Wd-tetra Ia-tetra Ib-tetra Pa-tetra Pc-tetra total-tetra
Wa-di        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Wb-di        0,13671 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Wc-di        0,17523 0,2129 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
We-di        0,13492 0,16661 0,21273 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ia-di        0,16468 0,23827 0,22867 0,15337 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ib-di        0,21047 0,18642 0,27006 0,1543 0,25133 0 0 0 0 0 0 0 0 0 0 0 0 0
Pa-di        0,26752 0,21921 0,2486 0,22875 0,27887 0,23878 0 0 0 0 0 0 0 0 0 0 0 0
Pb-di        0,13254 0,14517 0,23713 0,0995 0,13249 0,16202 0,22431 0 0 0 0 0 0 0 0 0 0 0
Pc-di        0,14739 0,14434 0,20016 0,15355 0,21387 0,15976 0,22666 0,15399 0 0 0 0 0 0 0 0 0 0
total-di     0,09179 0,13029 0,15588 0,09203 0,1278 0,17031 0,19914 0,08934 0,11462 0 0 0 0 0 0 0 0 0
Wa-tetra     0,36208 0,36601 0,29897 0,38182 0,36998 0,4457 0,31208 0,37616 0,34356 0,33097 0 0 0 0 0 0 0 0
Wb-tetra     0,32177 0,30258 0,32058 0,34718 0,40476 0,39916 0,28706 0,35939 0,32064 0,32021 0,26999 0 0 0 0 0 0 0
Wd-tetra     0,35866 0,37019 0,32044 0,36374 0,40177 0,42416 0,32559 0,38293 0,34382 0,3414 0,17986 0,20701 0 0 0 0 0 0
Ia-tetra     0,4702 0,46969 0,36988 0,43239 0,47033 0,47566 0,34034 0,4683 0,43925 0,42029 0,28632 0,37291 0,26427 0 0 0 0 0
Ib-tetra     0,37386 0,3768 0,3139 0,37962 0,4012 0,43329 0,31829 0,39412 0,3532 0,3464 0,1449 0,24272 0,12908 0,22874 0 0 0 0
Pa-tetra     0,33373 0,38154 0,30975 0,34602 0,36181 0,43613 0,38453 0,37795 0,36678 0,33372 0,26673 0,30511 0,21081 0,32307 0,1972 0 0 0
Pc-tetra     0,40758 0,41109 0,37243 0,41881 0,43658 0,47476 0,36593 0,43209 0,36604 0,38537 0,18654 0,24905 0,17747 0,33786 0,18634 0,30046 0 0
total-tetra  0,34841 0,35625 0,30072 0,35599 0,38452 0,41852 0,30501 0,37308 0,3297 0,32587 0,13669 0,19908 0,0635 0,24887 0,08382 0,20154 0,14553 0

L5.4         Wa-di Wb-di Wc-di We-di Ia-di Ib-di Pa-di Pb-di Pc-di total-di Wa-tetra Wb-tetra Wd-tetra Ia-tetra Ib-tetra Pa-tetra Pc-tetra total-tetra
Wa-di        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Wb-di        0,16083 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Wc-di        0,18516 0,19167 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
We-di        0,16217 0,23493 0,22318 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ia-di        0,08617 0,17495 0,21298 0,17673 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ib-di        0,22489 0,19121 0,17021 0,20079 0,22775 0 0 0 0 0 0 0 0 0 0 0 0 0
Pa-di        0,22399 0,25779 0,33493 0,32064 0,20039 0,36331 0 0 0 0 0 0 0 0 0 0 0 0
Pb-di        0,17441 0,24143 0,24011 0,2312 0,16206 0,28916 0,21405 0 0 0 0 0 0 0 0 0 0 0
Pc-di        0,14187 0,21225 0,16412 0,18018 0,15588 0,18958 0,32415 0,23837 0 0 0 0 0 0 0 0 0 0
total-di     0,06119 0,13732 0,15706 0,1522 0,06755 0,18465 0,22012 0,1605 0,12612 0 0 0 0 0 0 0 0 0
Wa-tetra     0,30399 0,33339 0,29894 0,32718 0,31322 0,31639 0,4632 0,36443 0,21649 0,29575 0 0 0 0 0 0 0 0
Wb-tetra     0,24329 0,25836 0,21195 0,26375 0,26117 0,25442 0,37506 0,2588 0,20335 0,22474 0,22574 0 0 0 0 0 0 0
Wd-tetra     0,18417 0,21954 0,22703 0,25347 0,19488 0,26078 0,32716 0,23434 0,18807 0,18211 0,20425 0,1993 0 0 0 0 0 0
Ia-tetra     0,3232 0,32884 0,25181 0,32869 0,34063 0,28747 0,47346 0,34278 0,27124 0,30595 0,25406 0,19992 0,21609 0 0 0 0 0
Ib-tetra     0,20862 0,2416 0,20762 0,28341 0,23882 0,26812 0,37566 0,26285 0,18846 0,21192 0,1973 0,18923 0,1056 0,20509 0 0 0 0
Pa-tetra     0,28647 0,28667 0,27703 0,31125 0,30612 0,28419 0,45921 0,36257 0,22287 0,28067 0,16884 0,21353 0,21134 0,24367 0,18122 0 0 0
Pc-tetra     0,35097 0,37067 0,32686 0,37823 0,3691 0,35462 0,50012 0,38734 0,29387 0,34569 0,27538 0,28238 0,26964 0,2673 0,22433 0,23995 0 0
total-tetra  0,22112 0,24808 0,21606 0,26752 0,24038 0,2543 0,38541 0,27065 0,17706 0,21434 0,16436 0,15911 0,09767 0,16895 0,0779 0,15246 0,19079 0

L1.10        Wa-di Wb-di Wc-di We-di Ia-di Ib-di Pa-di Pb-di Pc-di total-di Wa-tetra Wb-tetra Wd-tetra Ia-tetra Ib-tetra Pa-tetra Pc-tetra total-tetra
Wa-di        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Wb-di        0,21465 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Wc-di        0,24934 0,22545 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
We-di        0,11852 0,21007 0,25955 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ia-di        0,12313 0,20983 0,20762 0,15886 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ib-di        0,18227 0,25472 0,21922 0,21843 0,19669 0 0 0 0 0 0 0 0 0 0 0 0 0
Pa-di        0,21346 0,30507 0,29996 0,23042 0,22407 0,29673 0 0 0 0 0 0 0 0 0 0 0 0
Pb-di        0,209 0,2656 0,31405 0,19783 0,24333 0,27988 0,27287 0 0 0 0 0 0 0 0 0 0 0
Pc-di        0,17323 0,21958 0,28743 0,18189 0,19417 0,3048 0,20804 0,23826 0 0 0 0 0 0 0 0 0 0
total-di     0,08971 0,16771 0,18759 0,11596 0,08326 0,18361 0,18791 0,20333 0,14675 0 0 0 0 0 0 0 0 0
Wa-tetra     0,31875 0,45788 0,40271 0,36074 0,29932 0,39401 0,3167 0,41771 0,32782 0,32221 0 0 0 0 0 0 0 0
Wb-tetra     0,2535 0,35328 0,33686 0,29646 0,25102 0,3445 0,24756 0,34898 0,25164 0,24792 0,21619 0 0 0 0 0 0 0
Wd-tetra     0,20545 0,33529 0,3021 0,24792 0,19675 0,29322 0,21496 0,29406 0,21309 0,19933 0,19066 0,14825 0 0 0 0 0 0
Ia-tetra     0,32925 0,40604 0,38218 0,37471 0,30248 0,41366 0,33071 0,42015 0,29944 0,3153 0,22119 0,19979 0,20967 0 0 0 0 0
Ib-tetra     0,27466 0,36668 0,31796 0,31276 0,25541 0,35775 0,25595 0,356 0,2519 0,25489 0,178 0,15161 0,14659 0,18833 0 0 0 0
Pa-tetra     0,28753 0,37862 0,36687 0,34312 0,28526 0,38279 0,28224 0,3678 0,26399 0,28282 0,23578 0,1746 0,17101 0,18371 0,14948 0 0 0
Pc-tetra     0,26105 0,39001 0,37116 0,31119 0,2694 0,35699 0,30907 0,35158 0,28131 0,27666 0,22072 0,17493 0,18192 0,21331 0,20078 0,18371 0 0
total-tetra  0,23111 0,35367 0,31995 0,27923 0,22051 0,32505 0,2402 0,32609 0,22989 0,22665 0,1576 0,11383 0,07938 0,15902 0,10463 0,12932 0,13007 0

L63 W a - di Wb - di W c - di W e - di Ia - di Ib - di Pa - di Pb - di Pc - di total - di Wa - tetra Wb - tetra Wd - tetra Ia - tetra Ib - tetra Pa - tetra Pc - tetra total - tetra W a - di 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - di 0,1036 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 W c - di 0,16104 0,17514 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 W e - di 0,17738 0,15671 0,05709 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Ia - di 0,14679 0,10761 0,09881 0,06358 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Ib - di 0,17704 0,07942 0,20887 0,17077 0,11526 0 0 0 0 0 0 0 0 0 0 0 0 0 Pa - di 0,28163 0,17946 0,29156 0,24203 0,20017 0,1072 0 0 0 0 0 0 0 0 0 0 0 0 Pb - di 0,13394 0,23343 0,19416 0,23966 0,2343 0,30069 0,40671 0 0 0 0 0 0 0 0 0 0 0 Pc - di 0,21456 0,11915 0,2374 0,19667 0,15508 0,0781 0,09864 0,33947 0 0 0 0 0 0 0 0 0 0 total - di 0,12137 0,06573 0,12119 0,09402 0,04405 0,09059 0,18529 0,22916 0,12802 0 0 0 0 0 0 0 0 0 Wa - tetra 0,49046 0,46867 0,37613 0,36279 0,39761 0,45981 0,47298 0,51958 0,42999 0,41727 0 0 0 0 0 0 0 0 Wb - tetra 0,52781 0,51584 0,42738 0,4215 0,4546 0,51535 0,53457 0,54753 0,47821 0,46928 0,10582 0 0 0 0 0 0 0 Wd - tetra 0,44307 0,41202 0,34118 0,32143 0,34989 0,40008 0,40842 0,48778 0,35968 0,36436 0,20875 0,21485 0 0 0 0 0 0 Ia - tetra 0,41063 0,35773 0,30244 0,26453 0,28662 0,32553 0,31499 0,47624 0,29301 0,30645 0,21945 0,27959 0,13588 0 0 0 0 0 Ib - tetra 0,47683 0,45151 0,38539 0,36995 0,39556 0,44287 0,45318 0,51639 0,39892 0,40737 0,22905 0,20662 0,06341 0,19177 0 0 0 0 Pa - tetra 0,36394 0,31268 0,29092 0,25734 0,2671 0,28964 0,28793 0,43843 0,2403 0,27353 0,23724 0,27113 0,14059 0,1118 0,17782 0 0 0 Pc - tetra 0,55649 0,54921 0,46772 0,46503 0,49516 0,55241 0,5742 0,5719 0,51007 0,50643 0,26514 0,19146 0,18502 0,3144 0,13429 0,29597 0 0 total - tetra 0,45846 0,43305 0,35762 0,34194 0,37131 0,42455 0,43669 0,49673 0,38364 0,38556 0,17102 0,1627 0,05418 0,16363 0,06127 0,15743 0,15394 0

Bo.F330 W a - di Wb - di W c - di W e - di Ia - di Ib - di Pa - di Pb - di Pc - di total - di Wa - tetra Wb - tetra Wd - tetra Ia - tetra Ib - tetra Pa - tetra Pc - tetra total - tetra W a - di 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - di 0,15151 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 W c - di 0,16994 0,18109 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 W e - di 0,0901 0,14956 0,20096 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Ia - di 0,14498 0,19822 0,16527 0,20132 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Ib - di 0,242 0,2511 0,34754 0,24953 0,27238 0 0 0 0 0 0 0 0 0 0 0 0 0 Pa - di 0,17132 0,16402 0,2314 0,16645 0,20446 0,21097 0 0 0 0 0 0 0 0 0 0 0 0 Pb - di 0,213 0,28843 0,21109 0,22393 0,24365 0,39352 0,27967 0 0 0 0 0 0 0 0 0 0 0 Pc - di 0,22605 0,21275 0,27495 0,22591 0,20968 0,19885 0,12542 0,314 0 0 0 0 0 0 0 0 0 0 total - di 0,08556 0,12391 0,14584 0,11956 0,10322 0,22513 0,12429 0,21962 0,16069 0 0 0 0 0 0 0 0 0 Wa - tetra 0,51911 0,46994 0,53769 0,53086 0,49577 0,38707 0,42479 0,64213 0,38138 0,47014 0 0 0 0 0 0 0 0 Wb - tetra 0,51243 0,46056 0,56409 0,50468 0,51585 0,36884 0,44492 0,63643 0,39354 0,47709 0,27203 0 0 0 0 0 0 0 Wd - tetra 0,4082 0,35858 0,44246 0,4018 0,39986 0,29224 0,32821 0,5283 0,26578 0,36064 0,27643 0,26487 0 0 0 0 0 0 Ia - tetra 0,34105 0,31524 0,38385 0,3485 0,336 0,26273 0,27695 0,47302 0,24255 0,30241 0,30556 0,34898 0,17714 0 0 0 0 0 Ib - tetra 0,45387 0,39894 0,50273 0,44723 0,45899 0,3347 0,37363 0,58575 0,32144 0,41508 0,29683 0,27745 0,12695 0,22517 0 0 0 0 Pa - tetra 0,47042 0,41366 0,52355 0,45955 0,47436 0,31502 0,37483 0,60398 0,32447 0,42835 0,24587 0,24914 0,16253 0,24622 0,16322 0 0 0 Pc - tetra 0,43832 0,4071 0,47316 0,43955 0,43186 0,34367 0,37484 0,55234 0,32213 0,39971 0,32325 0,33897 0,2008 0,23822 0,18945 0,27042 0 0 total - tetra 0,41513 0,3654 0,45597 0,41195 0,41002 0,28787 0,33227 0,54366 0,27388 0,36999 0,23477 0,23536 0,07215 0,17368 0,10398 0,14111 0,16282 0

Bo.F394 W a - di Wb - di W c - di W e - di Ia - di Ib - di Pa - di Pb - di Pc - di total - di Wa - tetra Wb - tetra Wd - tetra Ia - tetra Ib - tetra Pa - tetra Pc - tetra total - tetra W a - di 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - di 0,13752 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 W c - di 0,12345 0,1439 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 W e - di 0,18846 0,18348 0,22156 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Ia - di 0,11237 0,15261 0,14912 0,20764 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Ib - di 0,21488 0,24829 0,20225 0,28292 0,22114 0 0 0 0 0 0 0 0 0 0 0 0 0 Pa - di 0,27799 0,26719 0,28146 0,25734 0,24534 0,21284 0 0 0 0 0 0 0 0 0 0 0 0 Pb - di 0,37217 0,41158 0,40356 0,37354 0,31384 0,32275 0,22763 0 0 0 0 0 0 0 0 0 0 0 Pc - di 0,15885 0,11693 0,16273 0,16385 0,14285 0,28043 0,27337 0,39374 0 0 0 0 0 0 0 0 0 0 total - di 0,09269 0,11125 0,11738 0,15645 0,07565 0,19076 0,20826 0,32213 0,11346 0 0 0 0 0 0 0 0 0 Wa - tetra 0,2467 0,25235 0,26027 0,28012 0,22403 0,30219 0,32005 0,41037 0,21795 0,2226 0 0 0 0 0 0 0 0 Wb - tetra 0,28785 0,29404 0,26696 0,29241 0,24787 0,28267 0,2738 0,34848 0,24225 0,23603 0,18225 0 0 0 0 0 0 0 Wd - tetra 0,23009 0,26138 0,21956 0,24084 0,1924 0,18077 0,17824 0,26222 0,23777 0,17332 0,24073 0,17949 0 0 0 0 0 0 Ia - tetra 0,14466 0,13672 0,17604 0,19395 0,16925 0,21683 0,22703 0,37203 0,14925 0,12251 0,21538 0,25325 0,20863 0 0 0 0 0 Ib - tetra 0,19514 0,20085 0,2168 0,24091 0,14398 0,24591 0,22361 0,30343 0,17973 0,14648 0,18237 0,18412 0,18339 0,17506 0 0 0 0 Pa - tetra 0,1761 0,23622 0,22285 0,20901 0,15298 0,254 0,2322 0,29238 0,19971 0,15129 0,20087 0,22116 0,16524 0,18371 0,15518 0 0 0 Pc - tetra 0,17257 0,2016 0,16618 0,24745 0,176 0,26924 0,3325 0,41438 0,17125 0,17155 0,23119 0,22416 0,24833 0,21383 0,18956 0,22743 0 0 total - tetra 0,16781 0,19498 0,17142 0,20523 0,13142 0,19794 0,20871 0,3034 0,16134 0,11464 0,16421 0,14083 0,1076 0,15017 0,10548 0,12231 0,16247 0


Euclidean distance, sorted by sampling location Frequency distributions STR markers L13.1, L5.4, L1.10, L63, Bo.F330 and Bo.F394

W a - di Wa - tetra Wb - di Wb - tetra W c - di Wd - tetra W e - di Ia - di Ia - tetra Ib - di Ib - tetra Pa - di Pa - tetra Pb - di Pc - di Pc - tetra total - di total - tetra W a - di 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wa - tetra 0,94658 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - di 0,37846 0,97926 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - tetra 0,92251 0,53759 0,92075 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 W c - di 0,44409 0,91611 0,4659 0,91284 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wd - tetra 0,78781 0,537 0,81506 0,50308 0,77837 0 0 0 0 0 0 0 0 0 0 0 0 0 W e - di 0,36558 0,93535 0,45571 0,89211 0,50513 0,76209 0 0 0 0 0 0 0 0 0 0 0 0 Ia - di 0,32388 0,88242 0,4534 0,91064 0,44729 0,74662 0,40949 0 0 0 0 0 0 0 0 0 0 0 Ia - tetra 0,86013 0,61917 0,86002 0,69501 0,7849 0,50385 0,81514 0,80773 0 0 0 0 0 0 0 0 0 0 Ib - di 0,51398 0,95214 0,51692 0,90789 0,59605 0,78233 0,53227 0,53861 0,83786 0 0 0 0 0 0 0 0 0 Ib - tetra 0,8545 0,5152 0,85916 0,52084 0,83127 0,32105 0,84685 0,81921 0,49797 0,86962 0 0 0 0 0 0 0 0 Pa - di 0,59435 0,95856 0,58145 0,91817 0,69407 0,75203 0,60062 0,55695 0,82269 0,61511 0,8384 0 0 0 0 0 0 0 Pa - tetra 0,81294 0,55886 0,83414 0,5947 0,84598 0,43799 0,80959 0,79104 0,5517 0,81556 0,42002 0,846 0 0 0 0 0 0 Pb - di 0,54174 1,13995 0,676 1,06967 0,67648 0,93478 0,59127 0,56194 1,0499 0,73324 1,02615 0,68267 1,02571 0 0 0 0 0 Pc - di 0,44071 0,80636 0,4327 0,80606 0,55503 0,67494 0,45352 0,4429 0,72328 0,52787 0,72006 0,54765 0,67586 0,71146 0 0 0 0 Pc - tetra 0,94288 0,62256 0,98311 0,6117 0,92388 0,52299 0,94127 0,92848 0,65759 0,98709 0,46391 1,02972 0,62773 1,12437 0,83231 0 0 0 total - di 0,22552 0,86344 0,30985 0,84738 0,36591 0,69448 0,30446 0,21481 0,75501 0,43829 0,76783 0,46538 0,74251 0,52877 0,32506 0,88894 0 0 total - tetra 0,79648 0,42645 0,82012 0,4238 0,77761 0,19898 0,77782 0,75994 0,44188 0,80485 0,22301 0,80245 0,37448 0,97629 0,66417 0,38872 0,70866 0

Euclidean distance, sorted by sampling location, per marker

L13.1 W a - di Wa - tetra Wb - di Wb - tetra W c - di Wd - tetra W e - di Ia - di Ia - tetra Ib - di Ib - tetra Pa - di Pa - tetra Pb - di Pc - di Pc - tetra total - di total - tetra W a - di 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wa - tetra 0,36208 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - di 0,13671 0,36601 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - tetra 0,32177 0,26999 0,30258 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 W c - di 0,17523 0,29897 0,2129 0,32058 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wd - tetra 0,35866 0,17986 0,37019 0,20701 0,32044 0 0 0 0 0 0 0 0 0 0 0 0 0 W e - di 0,13492 0,38182 0,16661 0,34718 0,21273 0,36374 0 0 0 0 0 0 0 0 0 0 0 0 Ia - di 0,16468 0,36998 0,23827 0,40476 0,22867 0,40177 0,15337 0 0 0 0 0 0 0 0 0 0 0 Ia - tetra 0,4702 0,28632 0,46969 0,37291 0,36988 0,26427 0,43239 0,47033 0 0 0 0 0 0 0 0 0 0 Ib - di 0,21047 0,4457 0,18642 0,39916 0,27006 0,42416 0,1543 0,25133 0,47566 0 0 0 0 0 0 0 0 0 Ib - tetra 0,37386 0,1449 0,3768 0,24272 0,3139 0,12908 0,37962 0,4012 0,22874 0,43329 0 0 0 0 0 0 0 0 Pa - di 0,26752 0,31208 0,21921 0,28706 0,2486 0,32559 0,22875 0,27887 0,34034 0,23878 0,31829 0 0 0 0 0 0 0 Pa - tetra 0,33373 0,26673 0,38154 0,30511 0,30975 0,21081 0,34602 0,36181 0,32307 0,43613 0,1972 0,38453 0 0 0 0 0 0 Pb - di 0,13254 0,37616 0,14517 0,35939 0,23713 0,38293 0,0995 0,13249 0,4683 0,16202 0,39412 0,22431 0,37795 0 0 0 0 0 Pc - di 0,14739 0,34356 0,14434 0,32064 0,20016 0,34382 0,15355 0,21387 0,43925 0,15976 0,3532 0,22666 0,36678 0,15399 0 0 0 0 Pc - tetra 0,40758 0,18654 0,41109 0,24905 0,37243 0,17747 0,41881 0,43658 0,33786 0,47476 0,18634 0,36593 0,30046 0,43209 0,36604 0 0 0 total - di 0,09179 0,33097 0,13029 0,32021 0,15588 0,3414 0,09203 0,1278 0,42029 0,17031 0,3464 0,19914 0,33372 0,08934 0,11462 0,38537 0 0 total - tetra 0,34841 0,13669 0,35625 0,19908 0,30072 0,0635 0,35599 0,38452 0,24887 0,41852 0,08382 0,30501 0,20154 0,37308 0,3297 0,14553 0,32587 0

L5.4 W a - di Wa - tetra Wb - di Wb - tetra W c - di Wd - tetra W e - di Ia - di Ia - tetra Ib - di Ib - tetra Pa - di Pa - tetra Pb - di Pc - di Pc - tetra total - di total - tetra W a - di 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wa - tetra 0,30399 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - di 0,16083 0,33339 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - tetra 0,24329 0,22574 0,25836 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 W c - di 0,18516 0,29894 0,19167 0,21195 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wd - tetra 0,18417 0,20425 0,21954 0,1993 0,22703 0 0 0 0 0 0 0 0 0 0 0 0 0 W e - di 0,16217 0,32718 0,23493 0,26375 0,22318 0,25347 0 0 0 0 0 0 0 0 0 0 0 0 Ia - di 0,08617 0,31322 0,17495 0,26117 0,21298 0,19488 0,17673 0 0 0 0 0 0 0 0 0 0 0 Ia - tetra 0,3232 0,25406 0,32884 0,19992 0,25181 0,21609 0,32869 0,34063 0 0 0 0 0 0 0 0 0 0 Ib - di 0,22489 0,31639 0,19121 0,25442 0,17021 0,26078 0,20079 0,22775 0,28747 0 0 0 0 0 0 0 0 0 Ib - tetra 0,20862 0,1973 0,2416 0,18923 0,20762 0,1056 0,28341 0,23882 0,20509 0,26812 0 0 0 0 0 0 0 0 Pa - di 0,22399 0,4632 0,25779 0,37506 0,33493 0,32716 0,32064 0,20039 0,47346 0,36331 0,37566 0 0 0 0 0 0 0 Pa - tetra 0,28647 0,16884 0,28667 0,21353 0,27703 0,21134 0,31125 0,30612 0,24367 0,28419 0,18122 0,45921 0 0 0 0 0 0 Pb - di 0,17441 0,36443 0,24143 0,2588 0,24011 0,23434 0,2312 0,16206 0,34278 0,28916 0,26285 0,21405 0,36257 0 0 0 0 0 Pc - di 0,14187 0,21649 0,21225 0,20335 0,16412 0,18807 0,18018 0,15588 0,27124 0,18958 0,18846 0,32415 0,22287 0,23837 0 0 0 0 Pc - tetra 0,35097 0,27538 0,37067 0,28238 0,32686 0,26964 0,37823 0,3691 0,2673 0,35462 0,22433 0,50012 0,23995 0,38734 0,29387 0 0 0 total - di 0,06119 0,29575 0,13732 0,22474 0,15706 0,18211 0,1522 0,06755 0,30595 0,18465 0,21192 0,22012 0,28067 0,1605 0,12612 0,34569 0 0 total - tetra 0,22112 0,16436 0,24808 0,15911 0,21606 0,09767 0,26752 0,24038 0,16895 0,2543 0,0779 0,38541 0,15246 0,27065 0,17706 0,19079 0,21434 0

L1.10 W a - di Wa - tetra Wb - di Wb - tetra W c - di Wd - tetra W e - di Ia - di Ia - tetra Ib - di Ib - tetra Pa - di Pa - tetra Pb - di Pc - di Pc - tetra total - di total - tetra W a - di 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wa - tetra 0,31875 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - di 0,21465 0,45788 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - tetra 0,2535 0,21619 0,35328 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 W c - di 0,24934 0,40271 0,22545 0,33686 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wd - tetra 0,20545 0,19066 0,33529 0,14825 0,3021 0 0 0 0 0 0 0 0 0 0 0 0 0 W e - di 0,11852 0,36074 0,21007 0,29646 0,25955 0,24792 0 0 0 0 0 0 0 0 0 0 0 0 Ia - di 0,12313 0,29932 0,20983 0,25102 0,20762 0,19675 0,15886 0 0 0 0 0 0 0 0 0 0 0 Ia - tetra 0,32925 0,22119 0,40604 0,19979 0,38218 0,20967 0,37471 0,30248 0 0 0 0 0 0 0 0 0 0 Ib - di 0,18227 0,39401 0,25472 0,3445 0,21922 0,29322 0,21843 0,19669 0,41366 0 0 0 0 0 0 0 0 0 Ib - tetra 0,27466 0,178 0,36668 0,15161 0,31796 0,14659 0,31276 0,25541 0,18833 0,35775 0 0 0 0 0 0 0 0 Pa - di 0,21346 0,3167 0,30507 0,24756 0,29996 0,21496 0,23042 0,22407 0,33071 0,29673 0,25595 0 0 0 0 0 0 0 Pa - tetra 0,28753 0,23578 0,37862 0,1746 0,36687 0,17101 0,34312 0,28526 0,18371 0,38279 0,14948 0,28224 0 0 0 0 0 0 Pb - di 0,209 0,41771 0,2656 0,34898 0,31405 0,29406 0,19783 0,24333 0,42015 0,27988 0,356 0,27287 0,3678 0 0 0 0 0 Pc - di 0,17323 0,32782 0,21958 0,25164 0,28743 0,21309 0,18189 0,19417 0,29944 0,3048 0,2519 0,20804 0,26399 0,23826 0 0 0 0 Pc - tetra 0,26105 0,22072 0,39001 0,17493 0,37116 0,18192 0,31119 0,2694 0,21331 0,35699 0,20078 0,30907 0,18371 0,35158 0,28131 0 0 0 total - di 0,08971 0,32221 0,16771 0,24792 0,18759 0,19933 0,11596 0,08326 0,3153 0,18361 0,25489 0,18791 0,28282 0,20333 0,14675 0,27666 0 0 total - tetra 0,23111 0,1576 0,35367 0,11383 0,31995 0,07938 0,27923 0,22051 0,15902 0,32505 0,10463 0,2402 0,12932 0,32609 0,22989 0,13007 0,22665 0

L63 W a - di Wa - tetra Wb - di Wb - tetra W c - di Wd - tetra W e - di Ia - di Ia - tetra Ib - di Ib - tetra Pa - di Pa - tetra Pb - di Pc - di Pc - tetra total - di total - tetra W a - di 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wa - tetra 0,49046 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - di 0,1036 0,46867 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - tetra 0,52781 0,10582 0,51584 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 W c - di 0,16104 0,37613 0,17514 0,42738 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wd - tetra 0,44307 0,20875 0,41202 0,21485 0,34118 0 0 0 0 0 0 0 0 0 0 0 0 0 W e - di 0,17738 0,36279 0,15671 0,4215 0,05709 0,32143 0 0 0 0 0 0 0 0 0 0 0 0 Ia - di 0,14679 0,39761 0,10761 0,4546 0,09881 0,34989 0,06358 0 0 0 0 0 0 0 0 0 0 0 Ia - tetra 0,41063 0,21945 0,35773 0,27959 0,30244 0,13588 0,26453 0,28662 0 0 0 0 0 0 0 0 0 0 Ib - di 0,17704 0,45981 0,07942 0,51535 0,20887 0,40008 0,17077 0,11526 0,32553 0 0 0 0 0 0 0 0 0 Ib - tetra 0,47683 0,22905 0,45151 0,20662 0,38539 0,06341 0,36995 0,39556 0,19177 0,44287 0 0 0 0 0 0 0 0 Pa - di 0,28163 0,47298 0,17946 0,53457 0,29156 0,40842 0,24203 0,20017 0,31499 0,1072 0,45318 0 0 0 0 0 0 0 Pa - tetra 0,36394 0,23724 0,31268 0,27113 0,29092 0,14059 0,25734 0,2671 0,1118 0,28964 0,17782 0,28793 0 0 0 0 0 0 Pb - di 0,13394 0,51958 0,23343 0,54753 0,19416 0,48778 0,23966 0,2343 0,47624 0,30069 0,51639 0,40671 0,43843 0 0 0 0 0 Pc - di 0,21456 0,42999 0,11915 0,47821 0,2374 0,35968 0,19667 0,15508 0,29301 0,0781 0,39892 0,09864 0,2403 0,33947 0 0 0 0 Pc - tetra 0,55649 0,26514 0,54921 0,19146 0,46772 0,18502 0,46503 0,49516 0,3144 0,55241 0,13429 0,5742 0,29597 0,5719 0,51007 0 0 0 total - di 0,12137 0,41727 0,06573 0,46928 0,12119 0,36436 0,09402 0,04405 0,30645 0,09059 0,40737 0,18529 0,27353 0,22916 0,12802 0,50643 0 0 total - tetra 0,45846 0,17102 0,43305 0,1627 0,35762 0,05418 0,34194 0,37131 0,16363 0,42455 0,06127 0,43669 0,15743 0,49673 0,38364 0,15394 0,38556 0

Bo.F330 W a - di Wa - tetra Wb - di Wb - tetra W c - di Wd - tetra W e - di Ia - di Ia - tetra Ib - di Ib - tetra Pa - di Pa - tetra Pb - di Pc - di Pc - tetra total - di total - tetra W a - di 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wa - tetra 0,51911 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - di 0,15151 0,46994 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - tetra 0,51243 0,27203 0,46056 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 W c - di 0,16994 0,53769 0,18109 0,56409 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wd - tetra 0,4082 0,27643 0,35858 0,26487 0,44246 0 0 0 0 0 0 0 0 0 0 0 0 0 W e - di 0,0901 0,53086 0,14956 0,50468 0,20096 0,4018 0 0 0 0 0 0 0 0 0 0 0 0 Ia - di 0,14498 0,49577 0,19822 0,51585 0,16527 0,39986 0,20132 0 0 0 0 0 0 0 0 0 0 0 Ia - tetra 0,34105 0,30556 0,31524 0,34898 0,38385 0,17714 0,3485 0,336 0 0 0 0 0 0 0 0 0 0 Ib - di 0,242 0,38707 0,2511 0,36884 0,34754 0,29224 0,24953 0,27238 0,26273 0 0 0 0 0 0 0 0 0 Ib - tetra 0,45387 0,29683 0,39894 0,27745 0,50273 0,12695 0,44723 0,45899 0,22517 0,3347 0 0 0 0 0 0 0 0 Pa - di 0,17132 0,42479 0,16402 0,44492 0,2314 0,32821 0,16645 0,20446 0,27695 0,21097 0,37363 0 0 0 0 0 0 0 Pa - tetra 0,47042 0,24587 0,41366 0,24914 0,52355 0,16253 0,45955 0,47436 0,24622 0,31502 0,16322 0,37483 0 0 0 0 0 0 Pb - di 0,213 0,64213 0,28843 0,63643 0,21109 0,5283 0,22393 0,24365 0,47302 0,39352 0,58575 0,27967 0,60398 0 0 0 0 0 Pc - di 0,22605 0,38138 0,21275 0,39354 0,27495 0,26578 0,22591 0,20968 0,24255 0,19885 0,32144 0,12542 0,32447 0,314 0 0 0 0 Pc - tetra 0,43832 0,32325 0,4071 0,33897 0,47316 0,2008 0,43955 0,43186 0,23822 0,34367 0,18945 0,37484 0,27042 0,55234 0,32213 0 0 0 total - di 0,08556 0,47014 0,12391 0,47709 0,14584 0,36064 0,11956 0,10322 0,30241 0,22513 0,41508 0,12429 0,42835 0,21962 0,16069 0,39971 0 0 total - tetra 0,41513 0,23477 0,3654 0,23536 0,45597 0,07215 0,41195 0,41002 0,17368 0,28787 0,10398 0,33227 0,14111 0,54366 0,27388 0,16282 0,36999 0

Bo.F394 W a - di Wa - tetra Wb - di Wb - tetra W c - di Wd - tetra W e - di Ia - di Ia - tetra Ib - di Ib - tetra Pa - di Pa - tetra Pb - di Pc - di Pc - tetra total - di total - tetra W a - di 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wa - tetra 0,2467 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - di 0,13752 0,25235 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wb - tetra 0,28785 0,18225 0,29404 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 W c - di 0,12345 0,26027 0,1439 0,26696 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Wd - tetra 0,23009 0,24073 0,26138 0,17949 0,21956 0 0 0 0 0 0 0 0 0 0 0 0 0 W e - di 0,18846 0,28012 0,18348 0,29241 0,22156 0,24084 0 0 0 0 0 0 0 0 0 0 0 0 Ia - di 0,11237 0,22403 0,15261 0,24787 0,14912 0,1924 0,20764 0 0 0 0 0 0 0 0 0 0 0 Ia - tetra 0,14466 0,21538 0,13672 0,25325 0,17604 0,20863 0,19395 0,16925 0 0 0 0 0 0 0 0 0 0 Ib - di 0,21488 0,30219 0,24829 0,28267 0,20225 0,18077 0,28292 0,22114 0,21683 0 0 0 0 0 0 0 0 0 Ib - tetra 0,19514 0,18237 0,20085 0,18412 0,2168 0,18339 0,24091 0,14398 0,17506 0,24591 0 0 0 0 0 0 0 0 Pa - di 0,27799 0,32005 0,26719 0,2738 0,28146 0,17824 0,25734 0,24534 0,22703 0,21284 0,22361 0 0 0 0 0 0 0 Pa - tetra 0,1761 0,20087 0,23622 0,22116 0,22285 0,16524 0,20901 0,15298 0,18371 0,254 0,15518 0,2322 0 0 0 0 0 0 Pb - di 0,37217 0,41037 0,41158 0,34848 0,40356 0,26222 0,37354 0,31384 0,37203 0,32275 0,30343 0,22763 0,29238 0 0 0 0 0 Pc - di 0,15885 0,21795 0,11693 0,24225 0,16273 0,23777 0,16385 0,14285 0,14925 0,28043 0,17973 0,27337 0,19971 0,39374 0 0 0 0 Pc - tetra 0,17257 0,23119 0,2016 0,22416 0,16618 0,24833 0,24745 0,176 0,21383 0,26924 0,18956 0,3325 0,22743 0,41438 0,17125 0 0 0 total - di 0,09269 0,2226 0,11125 0,23603 0,11738 0,17332 0,15645 0,07565 0,12251 0,19076 0,14648 0,20826 0,15129 0,32213 0,11346 0,17155 0 0 total - tetra 0,16781 0,16421 0,19498 0,14083 0,17142 0,1076 0,20523 0,13142 0,15017 0,19794 0,10548 0,20871 0,12231 0,3034 0,16134 0,16247 0,11464 0
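The relation between the six-marker table and the per-marker tables can be checked numerically. The minimal Python sketch below assumes, for illustration only and not as a statement of the procedure used to build the tables, that the six-marker distance is the Euclidean norm over the concatenated allele frequency vectors, so that squared per-marker distances add up. The function name `combine_marker_distances` is hypothetical. The input values are the "W a - di" versus "Wa - tetra" entries of the six per-marker tables sorted by sampling location.

```python
import math

def combine_marker_distances(per_marker):
    """Combine per-marker Euclidean distances into one overall distance.

    Assumption: the overall distance is taken over the concatenated
    allele-frequency vectors, so squared distances add across markers.
    """
    return math.sqrt(sum(d * d for d in per_marker.values()))

# 'W a - di' vs 'Wa - tetra', copied from the per-marker tables
# (decimal commas converted to decimal points).
wa_di_vs_wa_tetra = {
    "L13.1": 0.36208,
    "L5.4": 0.30399,
    "L1.10": 0.31875,
    "L63": 0.49046,
    "Bo.F330": 0.51911,
    "Bo.F394": 0.24670,
}

combined = combine_marker_distances(wa_di_vs_wa_tetra)
print(round(combined, 4))  # compare with the tabulated six-marker value 0,94658
```

For this pair the result agrees with the tabulated six-marker value to within rounding, which is consistent with the assumption but does not prove it for every entry.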


Chapter 6

The forensic potential of DNA typing of birch (Betula) seeds

M. Wesselink‡, A. Dragutinović‡, E.M. van Ark and I. Kuiper

Submitted to Forensic Science International: Genetics

Abstract

Silver birch (Betula pendula, a diploid species) and downy birch (Betula pubescens, a tetraploid species) are two common tree species in North West Europe. Because these species are ubiquitous and each tree produces large numbers of light, wind-dispersed seeds, birch seeds are often encountered as trace evidence in forensic investigations. Birches produce fertilized seeds, which contain both embryonic and maternal DNA, as well as unfertilized seeds, which contain only maternal DNA. When DNA profiles of unfertilized seeds are obtained, these profiles can be compared to potential trees of origin, resulting in evidence ranging from exclusions to matches with random match probabilities in the order of one in one million. However, because unfertilized birch seeds contain only minute amounts of DNA, existing techniques designed for DNA typing of birch are insufficient. To obtain a method capable of robustly typing DNA from unfertilized birch seeds, several parameters were optimized. Two methods to determine the fertilization state of birch seeds were compared. The method based on the intensity of DNA profiles obtained from the DNA extract of whole seeds was preferred over the dissection of seeds prior to DNA extraction. DNA extraction was optimized to maximize the DNA yield. Subsequently, the DNA was concentrated so that the total DNA yield could be used as template for triplicate multiplex PCRs, each amplifying the four DNA markers L13.1, L5.4, L1.10 and L63, to generate a robust consensus profile. Finally, considerations for the interpretation of (partial) consensus profiles are described, including the differences in the calculation of random match probabilities between full diploid profiles and partial tetraploid profiles. The application of this optimized method for DNA typing of birch seeds is illustrated with a case report in which matches were reported between birch seeds found in a suspect’s car and birch trees at the crime scene where the victim was discovered.

‡ Both authors contributed equally to this publication.


1. Introduction

In North West Europe, silver birch (Betula pendula) and downy birch (Betula pubescens) are widely occurring species. Both species are wind-pollinated pioneer trees or shrubs with wide climatic and habitat tolerance that easily lose fragments such as twigs, leaves and catkin scales on impact or due to environmental conditions such as storms. Additionally, large numbers of very light, winged seeds (samaras) are produced and dispersed by wind [1]. Although the presence of a birch fragment on a piece of evidence by itself has little evidential value, the ability to link such a fragment to a specific birch tree growing at a crime-related location may be highly relevant. Traditional morphological investigation of botanical traces cannot determine whether a fragment originated from a specific tree rather than from another tree of the same species. With the development of DNA profiling techniques for trees, comparisons between botanical debris such as seeds and potential trees of origin can be performed [2-5].

The comparison of DNA profiles obtained from leaves and catkin scales to a possible tree of origin is generally straightforward. However, comparing seeds to a tree of origin is more complex, as much less material is available and fertilized seeds contain embryos possessing their own DNA in the embryonic tissue, in addition to the maternal tree’s DNA in the surrounding tissues of the seed. Single birch trees have been described to produce seed rains of between 2·10² and 5·10⁴ seeds per m² per year, with large fluctuations observed between trees and between years [1,6]. Because of these large numbers, seeds are in our experience the most abundant type of birch trace evidence encountered in crime scene investigations, which increased our interest in forensic analyses of these seeds. One method to compare seeds to trees would be population profiling, in which the embryonic DNA profile is assigned to potential parent populations, as performed for other species and with other techniques [7]. However, interpopulation differences within the Netherlands are considered too small to perform such analyses with a forensically relevant discriminatory value [5], other than for exclusionary purposes. Fortunately for forensic applications, birch seeds are also produced in the absence of pollination, resulting in the production of both fertilized and unfertilized seeds. Fertilization percentages ranging between <1% and 20% have been described for both B. pendula and B. pubescens in the plants and years studied [1,8]. When unfertilized seeds are used for DNA profiling, the only DNA profile that will be obtained is that of the tree that produced the seeds, enabling one-to-one comparison of DNA profiles from seeds and potential donor trees.
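As a rough illustration of why unfertilized seeds are expected to dominate birch seed traces, the sketch below combines the seed rain range with the fertilization range cited above. Combining the extremes of both ranges is an assumption made here for illustration only, since the figures come from different trees and years. The value "<1%" is approximated as 1%, and the function name is hypothetical.

```python
def unfertilized_seed_rain(seeds_per_m2, fertilized_fraction):
    """Expected unfertilized seeds per m2 per year (illustrative only)."""
    if not 0.0 <= fertilized_fraction <= 1.0:
        raise ValueError("fertilized_fraction must lie between 0 and 1")
    return seeds_per_m2 * (1.0 - fertilized_fraction)

# Cited ranges: 2e2 to 5e4 seeds per m2 per year; <1% to 20% fertilized.
low = unfertilized_seed_rain(2e2, 0.20)   # sparse seed rain, high fertilization
high = unfertilized_seed_rain(5e4, 0.01)  # dense seed rain, low fertilization
print(f"{low:.0f} to {high:.0f} unfertilized seeds per m2 per year")
```

Even at the lower end of both ranges, unfertilized seeds remain the large majority of the seed rain under these assumptions.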

Although seeds are produced to be dispersed away from the tree they originated from, both studies on birch seed dispersion [9-12] and dispersion models for particles (Sutton’s formula [13], mechanistic GJ model [14]) applied to birch seed dispersion field data have shown that most seeds are found within twice the height of the tree they originated from, which on average corresponds to 40-80 m from the mother tree. Tree height, the number of released seeds, the physical properties of the seeds that influence falling velocity, and wind speed during the dispersion season are the main factors determining the distance of seed distribution. Although seed densities of 2 seeds per m² have been observed in field studies at a distance of 141 m from the mother tree [12], the dispersion models predict that more than 97% of birch seeds are found within 100 m of the tree of origin. In addition to this primary dispersal, secondary transport of birch seeds (and of other birch debris, including catkin scales) may be relevant. Botanical traces may travel vast distances when carried by, for example, humans, cars, soil under a person’s shoes, or flowing water. It is therefore important to consider the context in which botanical traces have been discovered. Especially in investigative situations in which a suspect claims “never having been near the scene”, a distance of 100 m to a specific tree may be of great relevance.

The methods to generate DNA profiles from birch [5,15,16] were developed with botanical material of relatively high quality and quantity, such as leaves or buds. This study describes the specific adjustments that are required to robustly apply such methods to birch samaras, as especially the unfertilized seeds contain only minute amounts of DNA. Several aspects, including determination of fertilization status, DNA extraction, and DNA profile generation and interpretation, are examined specifically for seeds, as are the challenges involved in interpreting potentially partial or mixed maternal/embryonic STR profiles. Finally, the application of these methods to case samples is illustrated through a case report.

2. Materials and Methods

2.1 Sampling

Fresh catkins containing mature seeds were collected from several diploid Betula pendula trees that had previously been genotyped as described [5]. The seeds were dried and stored at room temperature or -20°C. A sample of seeds was dissected to determine whether fertilization had taken place. When fertilized seeds were encountered, the maternal material was separated from the embryonic material and the two fractions were treated as separate samples. Apart from these seeds with known fertilization status, several other seeds (both fresh and older) with unknown fertilization status were used for method optimization and testing.

2.2 Laboratory procedures

Prior to DNA extraction, seeds and seed fragments were ground two to four times in Lysing Matrix A tubes (MP Biomedicals) containing 400 μl of buffer AP1 (DNeasy Plant mini kit, Qiagen) in a FastPrep machine (MP Biomedicals) run at 4.5 for 30 sec. Further DNA extraction was performed using the DNeasy Plant mini kit according to the manufacturer’s protocol with the following modifications. The optional centrifugation step advised by the manufacturer was performed in all cases. Final elution was performed in 100% AE or 25% AE, at room temperature or after incubation at 70°C for 5 minutes. To maximize the amount of DNA available for subsequent PCR, in some cases the total elution volumes were concentrated to 10 μl using standard salt/ethanol precipitation with GlycoBlue coprecipitant (ThermoFisher).

The results of DNA extraction were assessed by PCR success, measured either by gel electrophoresis followed by ethidium bromide staining and UV detection (comparison of extraction protocols) or by capillary electrophoresis (all other experiments). Birch microsatellite loci L13.1, L63, L5.4 and L1.10 were amplified in multiplex and analysed as previously described [5], with the exception that the number of PCR cycles was increased from 26 to 30. Amplification of the single markers Bo.F330 and Bo.F394 described in [5] was not performed. PCRs were performed in triplicate. To create a consensus profile from these triplicate PCRs of a single seed, only those alleles scored in more than one reaction were designated true alleles and incorporated in the consensus profile, as has previously been described for other species [17,18].
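The consensus rule for triplicate PCRs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in this study: the function name, the dictionary-of-sets data layout and the allele sizes are all hypothetical. Only the rule itself (an allele must be scored in more than one reaction) comes from the text.

```python
from collections import Counter

def consensus_profile(replicates, min_reactions=2):
    """Build a consensus profile from replicate PCR profiles.

    replicates: list of dicts mapping locus name -> set of scored alleles.
    An allele is kept when it was scored in at least `min_reactions`
    replicates (2 of 3 corresponds to "more than one reaction").
    """
    loci = sorted({locus for rep in replicates for locus in rep})
    consensus = {}
    for locus in loci:
        counts = Counter()
        for rep in replicates:
            # Count each allele at most once per reaction.
            counts.update(set(rep.get(locus, ())))
        consensus[locus] = {a for a, n in counts.items() if n >= min_reactions}
    return consensus

# Hypothetical allele sizes for one seed typed in triplicate.
pcr1 = {"L13.1": {84, 90}, "L5.4": {201}, "L1.10": {176, 180}, "L63": {230}}
pcr2 = {"L13.1": {84, 90}, "L5.4": {201, 205}, "L1.10": {176}, "L63": set()}
pcr3 = {"L13.1": {84}, "L5.4": {205}, "L1.10": {176, 182}, "L63": {230}}

profile = consensus_profile([pcr1, pcr2, pcr3])
print(profile)
```

In this made-up example, alleles 180 and 182 at L1.10 each appear in a single reaction and are therefore excluded, whereas allele 230 at L63 is retained although it dropped out of one reaction.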

3. Results & Discussion

3.1 Sample pre-processing & dissection

Although birch seeds are easily recognized based on their morphological features, determination of fertilization status based on visual characteristics proved more ambiguous. After dissection, most fresh seeds could be classified as fertilized or unfertilized based on the visible embryonic structures (Figure 1). However, in older, dried or more brittle seeds, these structures were harder to recognize, which complicated the identification of fertilization state.

Figure 1. Morphological features of seeds of Betula pendula. A: undissected seed, B: dissected fertilized seed, embryonic tissue indicated, remainder maternal tissue, C: dissected unfertilized seed, only tissues of maternal origin.

Manual separation of seeds into maternal and embryonic fractions is a laborious procedure. Furthermore, dissection of unfertilized seeds can lead to loss of material, especially when seeds are brittle or poorly preserved, as are many seeds encountered in forensic casework. In addition, separate DNA extraction and typing of both maternal and embryonic fractions showed that maternal tissues are prone to contamination by embryonic tissue (see also 3.3). This fits the theoretical expectation, as embryonic DNA is present in high concentration and quality compared to maternal DNA. Therefore, as dissection of fertilized seeds does not improve the ability to obtain a maternal DNA profile, dissection of seeds is not considered advantageous.

3.2 Optimization of DNA extraction and PCR

Comparison of the results of DNA extraction from multiple single birch seeds showed that increasing the number of mechanical grinding steps from two to four increased the DNA yield. An additional increase in DNA yield was obtained by incubating the silica spin column with elution buffer for 5 min at 70°C instead of at room temperature (data not shown).

To maximize the DNA input in the multiplex PCR, the volume of DNA extract added as template to the reaction was increased. Although this increases the amount of DNA per reaction and was therefore expected to improve PCR success, large volumes of elution buffer AE were found to inhibit the reaction, thereby decreasing PCR success. To circumvent this inhibitory effect of buffer AE, elution was performed in 25% AE, which did not negatively influence PCR success (data not shown). Alternatively, elution volumes were concentrated to obtain more DNA in a smaller volume, thereby enabling the use of all available DNA for PCR without inhibition of the reaction.

To further increase the success of obtaining a DNA profile from a birch seed, the number of amplification cycles was increased from the routinely used 26 to 30 cycles. This increased the number of seeds from which a (partial) DNA profile could be obtained. Large differences in allele drop-out (absence of alleles) were observed between seeds, irrespective of the number of amplification cycles. We did not observe an increase in allele drop-in or peak imbalance when the number of cycles was increased from 26 to 30, and as PCR product yield in general increased, amplification with 30 cycles was used for seeds.

3.3 Interpretation of DNA-profiles

The obtained electropherograms could be divided into two categories: those generated from embryonic tissue and those generated from maternal tissue. The electropherograms obtained from embryonic material of fertilized seeds and complete fertilized seeds had a mean profile peak height of more than 10,000 rfu (Table 1). Of the detected alleles, at least one allele per locus corresponded to the known profile of the mother tree. Several profiles with three or more alleles per locus were observed.

Peaks in electropherograms obtained from the maternal fraction of fertilized seeds were generally comparable to those of unfertilized seeds (mean profile intensity in the 100-1,500 rfu range), although in one instance deviant alleles were detected, possibly due to contamination with embryonic tissue during dissection (Table 1). Although estimating the DNA profile of the maternal tree from the maternal fraction of fertilized seeds may be possible in some cases, accurate dissection of fresh birch seeds proved labour intensive and ambiguous. In addition, our attempts to dissect dried or more brittle seeds were unsuccessful, visibly causing the admixture of embryonic and maternal material. The electropherograms obtained from unfertilized seeds had a mean profile peak height ranging between 100 and 1,500 rfu (Table 1), with a mean of 7.7 alleles per profile. For the tested well conserved and fresh seeds, differentiating between fertilized and unfertilized seeds based on average profile intensity is unambiguous, thereby reducing the need to dissect seeds prior to DNA extraction.

Table 1. Peak heights and standard deviation of all alleles detected (markers L13.1, L5.4, L1.10 and L63) of embryonic and maternal tissues from fertilized B. pendula seeds and complete unfertilized B. pendula seeds (results of single PCR with fixed input volume).

Tissue                                  N    Number of alleles*    Mean profile peak height
                                             (mean ± SD)           (rfu, all alleles; mean ± SD)
fertilized seed - total                 9    10.3 ± 0.5            11,445 ± 4,730
fertilized seed - embryonic fraction    9    10.0 ± 0.0            15,792 ± 5,452
fertilized seed - maternal fraction     9    7.8 ± 0.3             1,065 ± 517
unfertilized seed - total (maternal)    3    7.7 ± 0.5             1,045 ± 69

* In tissues of the maternal tree, 8 alleles were detected.
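The intensity-based distinction between embryonic and maternal profiles can be illustrated with a minimal sketch. The cut-off values are assumptions taken from the ranges reported in the text (maternal-type profiles up to 1,500 rfu, embryonic-type profiles above 10,000 rfu); they are not validated decision thresholds, and the function name and labels are hypothetical.

```python
def classify_profile(mean_peak_height_rfu,
                     maternal_max=1500.0,     # assumed cut-off, upper end of the reported 100-1,500 rfu range
                     embryonic_min=10000.0):  # assumed cut-off, from "more than 10,000 rfu"
    """Label a profile by its mean peak height (rfu).

    Values between the two cut-offs are reported as inconclusive
    rather than forced into either category.
    """
    if mean_peak_height_rfu <= maternal_max:
        return "maternal-type"
    if mean_peak_height_rfu > embryonic_min:
        return "embryonic-type"
    return "inconclusive"


# Mean profile peak heights from Table 1
print(classify_profile(11445))  # complete fertilized seed
print(classify_profile(1045))   # complete unfertilized seed
```

Keeping an explicit "inconclusive" band reflects that the reported ranges were obtained on fresh, well conserved seeds; trace material may well fall between them.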

As the profile of the birch tree from which the seeds originated was known, the profiles obtained from the unfertilized seeds could be compared to their true profile. Although some full profiles were obtained, the drop-out of one or multiple alleles was also observed (Table 1). To minimize the influence of stochastic effects and increase the robustness of the DNA profile whilst maximizing the completeness of the profile, the consensus method was used. The total DNA extract of a birch seed was split into three equal parts and used to perform triplicate PCRs, after which a consensus profile was generated from those alleles scored more than once. The completeness of the observed single profiles (observed alleles/true alleles per profile) varied considerably, illustrating the need to address stochastic effects. Application of the consensus method resulted in profiles with more alleles than the poorest single profiles, but fewer alleles than the richest single profiles. As only allele drop-out, and no allele drop-in, was observed in these experiments, the consensus method is considered a valid but conservative way to estimate the true profile of single birch seeds. However, as allele drop-in is known to occur in low template amplification of STRs in other species [17,18], addition of all alleles detected at least once to the consensus profile was not considered a valid method.
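The consensus rule described above (an allele enters the consensus profile only when it is scored in more than one of the three replicate PCRs) can be sketched as follows. This is a minimal illustration, not the laboratory's software; the function name, the data layout and the allele calls in the example are hypothetical.

```python
from collections import Counter


def consensus_profile(replicates, min_count=2):
    """Per marker, keep the alleles scored in at least `min_count` replicates.

    `replicates` is a list of dicts mapping marker name -> list of allele calls.
    """
    markers = set()
    for profile in replicates:
        markers.update(profile)

    consensus = {}
    for marker in sorted(markers):
        counts = Counter()
        for profile in replicates:
            # Count each allele at most once per replicate
            counts.update(set(profile.get(marker, ())))
        consensus[marker] = sorted(a for a, n in counts.items() if n >= min_count)
    return consensus


# Invented triplicate results for one seed, showing drop-out in single PCRs
replicates = [
    {"L13.1": [17, 20], "L5.4": [13], "L1.10": [16],     "L63": [8]},
    {"L13.1": [17],     "L5.4": [13], "L1.10": [16, 20], "L63": [8]},
    {"L13.1": [17, 20], "L5.4": [],   "L1.10": [20],     "L63": [8]},
]
profile = consensus_profile(replicates)
print(profile)
```

Setting `min_count=1` would correspond to adding every allele detected at least once, the approach the text rejects because drop-in cannot be excluded.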

An alternative approach could be to concentrate all DNA from a single seed into an even smaller volume, and utilize the complete DNA yield from a seed for one single PCR. Because this would cause a threefold increase in the amount of DNA available for amplification, the threshold below which stochastic effects play a considerable role may be avoided, thereby enabling more reliable DNA profiling. However, because PCR failure or issues with control samples may disqualify a certain PCR reaction, the risk of obtaining no results from a single birch seed would increase. In the near future, quantification of both this risk and the increased profiling potential will help decide whether a single PCR should be considered a valid alternative to the triplicate PCR/consensus strategy for DNA-profiling of birch seeds.

3.4 Considerations in calculation of random match probabilities

Based on the previously described six marker system, the random match probability (RMP) of a complete DNA profile obtained from a wild diploid B. pendula individual ranges between $5 \cdot 10^{-6}$ and $5 \cdot 10^{-31}$, whereas that of a wild tetraploid B. pubescens individual ranges between $1 \cdot 10^{-13}$ and $4 \cdot 10^{-59}$ when singleton alleles are not included in calculation of this RMP [5]. When only the four markers incorporated in the multiplex system can be typed, using the previously described allele frequencies, RMPs between $3 \cdot 10^{-4}$ and $1 \cdot 10^{-20}$ for B. pendula and between $4 \cdot 10^{-9}$ and $1 \cdot 10^{-39}$ for B. pubescens are reached for complete profiles (Table 2).

Table 2. Maximum, average and minimum random match probabilities (minimum based on lowest non-singleton allele frequencies).

Diploid
Marker     Genotype frequency   Alleles       Maximum value   Average value   Minimum value
L13.1      2pq                  17,20         1.37E-01        4.93E-02        6.47E-06
L5.4       p^2                  13,13         9.40E-02        3.47E-02        6.47E-06
L1.10      2pq                  16,20         5.71E-02        1.42E-02        4.05E-05
L63        p^2                  8,8           3.44E-01        2.35E-01        6.47E-06
combined                                      2.53E-04        5.69E-06        1.10E-20

Tetraploid
Marker     Genotype frequency   Alleles       Maximum value   Average value   Minimum value
L13.1      24pqrs               6,15,20,21    3.37E-03        6.79E-04        1.88E-10
L5.4       24pqrs               13,14,15,17   6.64E-03        1.06E-03        1.88E-10
L1.10      24pqrs               15,17,18,19   1.61E-03        2.22E-04        1.88E-10
L63        12p^2qr              5,8,8,10      1.13E-01        4.98E-02        1.88E-10
combined                                      4.08E-09        7.94E-12        1.25E-39
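The combined values in Table 2 are consistent with multiplying the per-marker genotype frequencies, which assumes that the markers are independent. The short sketch below reproduces two of the combined values from the per-marker entries of the table; the function and variable names are illustrative only.

```python
import math


def combined_rmp(per_marker_frequencies):
    """Product of per-marker genotype frequencies (assumes independent markers)."""
    return math.prod(per_marker_frequencies.values())


# Per-marker values copied from Table 2
diploid_max = {"L13.1": 1.37e-1, "L5.4": 9.40e-2, "L1.10": 5.71e-2, "L63": 3.44e-1}
tetraploid_min = {"L13.1": 1.88e-10, "L5.4": 1.88e-10, "L1.10": 1.88e-10, "L63": 1.88e-10}

print(f"{combined_rmp(diploid_max):.2E}")     # compare with 2.53E-04 in Table 2
print(f"{combined_rmp(tetraploid_min):.2E}")  # compare with 1.25E-39 in Table 2
```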

Ideally, ploidy is determined before calculating the RMP. However, the profiles obtained from birch seeds through the consensus profile method do not always allow accurate identification of diploid or tetraploid individuals. The underlying results of the triplicate amplification can be informative to distinguish between profiles of tetraploids with allele drop-out and true diploid profiles. If three or four alleles are observed for at least one marker in the consensus profile of an unfertilized seed, the possibility of this seed originating from a diploid individual can be excluded and RMPs can be calculated as for a tetraploid individual if the actual RMP of that specific (partial) profile is required. In contrast to profiles obtained from reference trees, estimating the allelic dosage from the DNA profile is not feasible due to the observed stochastic effects and the possibility of allelic drop-out. This implies that only when a complete heterozygote profile is observed for a marker (ABCD), the genotype frequency of this marker will be the same when calculated for a reference tree (where estimation of allelic dosage is possible) as for an unfertilized seed (where estimation of allelic dosage is not possible), which equals $24pqrs$. When only one, two or three alleles are observed, all possible combinations of alleles including at least the observed alleles should be considered, which dramatically reduces the discriminating power of the resulting RMPs (the calculated values become higher). For example, in a reference profile obtained from high quality botanical material, the genotypic frequency of a complete homozygote AAAA equals $p^4$, whilst in a consensus profile of an unfertilized tetraploid seed where only a single allele is observed, the frequency is best approximated by $p$. When two alleles are observed in the consensus profile of a presumably unfertilized tetraploid seed, the combinations AABB (equalling $6p^2q^2$), AAAB (equalling $4p^3q$) and ABBB (equalling $4pq^3$) should be considered as allelic dosage cannot be estimated, but also all other ABXX combinations should be considered as allelic drop-out of one or two alleles cannot be excluded. Therefore, when only two alleles are observed, the genotypic frequency is approximated by $12pq$. The same is true when three alleles are observed in the consensus profile: AABC (equalling $12p^2qr$), ABBC (equalling $12pq^2r$) and ABCC (equalling $12pqr^2$) should be considered as allelic dosage cannot be estimated, and ABCX should be considered as allelic drop-out cannot be excluded. Therefore, when three alleles are observed, the genotypic frequency is approximated by $24pqr$.
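The genotype frequencies listed above follow the multinomial pattern (number of distinct arrangements of the alleles times the product of their frequencies), and the approximations for partial profiles use the coefficients stated in the text. The sketch below is a minimal illustration under those assumptions; the function names and the allele frequencies in the example are hypothetical and not taken from the birch data.

```python
from collections import Counter
from math import factorial, prod


def genotype_frequency(genotype, freqs):
    """Frequency of a genotype with known allelic dosage, e.g. "AABB" or "AB".

    Multinomial coefficient times the product of allele frequencies.
    """
    dosage = Counter(genotype)
    coefficient = factorial(len(genotype))
    for copies in dosage.values():
        coefficient //= factorial(copies)
    return coefficient * prod(freqs[a] ** n for a, n in dosage.items())


# Coefficients as given in the text for tetraploid consensus profiles in which
# allelic dosage is unknown and drop-out cannot be excluded: p, 12pq, 24pqr, 24pqrs
TETRAPLOID_COEFFICIENT = {1: 1, 2: 12, 3: 24, 4: 24}


def partial_tetraploid_frequency(observed_alleles, freqs):
    """Conservative approximation based only on the distinct alleles observed."""
    alleles = set(observed_alleles)
    return TETRAPLOID_COEFFICIENT[len(alleles)] * prod(freqs[a] for a in alleles)


# Hypothetical allele frequencies, for illustration only
freqs = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4}
print(genotype_frequency("AABB", freqs))             # 6 p^2 q^2
print(partial_tetraploid_frequency("AB", freqs))     # 12 p q
```

With these example frequencies, the sum of the three fully specified two-allele genotypes (AABB, AAAB, ABBB) stays well below the $12pq$ approximation, which illustrates why the approximation is conservative. The same `genotype_frequency` function also returns $p^2$ and $2pq$ for diploid genotypes such as "AA" and "AB".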

If no more than two alleles are described for each marker in the consensus profile of an unfertilized seed, the profile may represent a complete profile of a diploid individual, or a partial profile of a tetraploid individual. As mentioned previously, the observation of additional alleles in the underlying profiles may indicate a tetraploid origin, whilst the absence of such peaks may point towards a diploid origin. If a complete diploid profile can be assumed, for example when the triplicate PCRs result in three identical diploid profiles, or variation is only observed in the presence or absence of the allele with the longest fragment length, the frequency for a homozygous marker would be $p^2$, whilst the frequency for a heterozygous marker would be $2pq$. When allelic drop-out is to be considered for a specific marker, a homozygote is approximated by $p$ (as for tetraploids), but in such cases consideration of a tetraploid origin may be as appropriate.

3.5 Case example

A man (victim) living in a small town was reported missing at the end of a cold winter day. As his last known appointment was with a business partner, the business partner became a suspect. During a police search of the suspect’s car, bloodstains and clothing were found as well as a new but visibly used tarpaulin sheet. Many different traces were secured, including more than twenty birch seeds and catkin scales. Several weeks later when the snow started to melt, a body was found under a tree near a road through a nature reserve, not far from the victim’s home town. In this part of the reserve, the main vegetation consisted of grassland with birch and oak trees and patches of heath, heather and broom with an overgrown appearance. To compare the birch fragments from the car to the birch trees at the nature reserve, DNA was extracted and the described method with triplicate PCRs was applied. DNA profiles that contained enough information for comparative purposes were obtained from seven samples. Complete DNA profiles were obtained from five single fertilized seeds and from one cluster of three unfertilized diploid seeds. A partial consensus profile containing information from four loci was obtained from one presumably tetraploid unfertilized catkin scale. Analysis of the other birch fragments resulted in the finding of several alleles, but the consensus profiles did not contain enough alleles for meaningful comparisons.


Figure 2. Sampled plots (1A-5B) on both sides of the road near the location where the victim was found (star).

To compare the profiles to relevant references, birch trees were sampled near the suspect’s house (n=1), at a site where the victim and suspect were seen arguing (n=1) and in the nature reserve where the victim was eventually found. Due to the large number of birch trees in this reserve, an area of approximately 2,500 m² surrounding the spot where the victim’s body was found was divided into 10 plots (Figure 2). All birch trees and shrubs in these plots were sampled (n=107). As the risk of potentially missing a tree was seen as a greater problem than double sampling of trees, trees possibly sprouting from the same root system, or with branches in more than one plot, were all sampled, thereby increasing the incidence of trees sampled more than once. When possible, fresh leaves or leaf buds were sampled to enable routine typing of high quality reference material. In a few instances wood samples were collected from the stems of trees, where leaves or twigs were too high to be sampled safely, or when branches of multiple trees were growing through one another, thereby masking which leaves belonged to which trees.

Typing of all 109 reference samples resulted in 74 different DNA profiles. Both urban trees had a unique DNA profile; 46 different diploid profiles, 3 different triploid profiles and 23 different tetraploid profiles were obtained from the 107 samples from the nature reserve. Not all samples from the nature reserve resulted in a unique DNA profile: 19 genotypes were observed in more than one sample. The chosen sampling strategy (sampling a tree more than once was preferred over missing a tree), together with the fact that the samples that resulted in identical DNA profiles originated from the same plot (16 profiles) or adjacent plots (3 profiles), demonstrates the need for rigorous documentation of sample collection and tree identification. Based on our documentation of sample collection, we had no reason to doubt that the samples resulting in identical DNA profiles originated from the same root system and are therefore multiple samplings of one single tree.

Comparison of the two relevant profiles obtained from the suspect’s tarpaulin to the reference profiles resulted in the finding of two matching reference profiles. The DNA profile of the cluster of unfertilized diploid seeds matched a diploid profile found in plot 1A where the victim had been found. The partial consensus profile of the unfertilized, presumably tetraploid, catkin scale corresponded to a tetraploid profile also found in plot 1A. This partial profile differed from all other 73 profiles obtained from the reference trees. The birch trees in the nature reserve where the victim was found were considered a ‘wild’ population, based on the overgrown appearance, the known human interference with tree propagation in this location and the observed variety in DNA profiles. Random match probabilities were calculated based on both local allele frequencies (72 reference genotypes from the nature reserve) and the conservative estimates for the four markers described in 3.4. This resulted in reporting of random match probabilities of less than one in one million for encountering two pieces of birch debris with these two specific DNA profiles.

Although during trial the suspect denied having any involvement with the victim’s death, he was convicted of murder and of hiding the victim’s whereabouts, in part based on the performed birch DNA-investigation [19]. During appeal he confessed to killing the victim in a rage after a business conflict. He was eventually convicted of manslaughter [20].

4. Conclusion

Debris from birch trees is frequently encountered when botanical traces are secured in forensic investigations in The Netherlands. Although larger fragments such as twigs and leaves can be transferred during crimes, smaller and lighter fragments such as catkin scales and seeds are observed more frequently as trace evidence. Although models for primary dispersion of birch fragments differ for leaves, catkin scales and seeds, mainly due to the different falling velocity of the different structures, more than 97% of all birch debris (including seeds) is expected to be found within 100 m of the tree of origin. Through transfer of birch fragments to humans or materials, secondary transfer may occur and such debris can be encountered at great distances from the tree of origin, but can still imply previous presence within a short range of a certain tree. As the ease with which seeds or catkin scales are transferred in this way differs from the ease with which leaves and twigs are transferred, the forensic relevance of different types of botanical traces also differs. Additionally, seeds have their own genetic characteristics in comparison to other types of birch debris. To address the fact that smaller fragments, containing less DNA, are most often encountered in forensic investigations, and that seeds can have a different genetic composition than their mother tree, the previously described method for forensic birch DNA profiling [5] was adapted to improve the profiling of birch seeds and birch debris in general.

The DNA extraction protocol was altered to increase DNA yield. Parameters that were varied included grinding steps, incubation temperature, elution buffer and additional salt/ethanol precipitation. Additionally, the number of PCR cycles was increased, and different methods for interpretation of the resulting low template electropherograms were compared. With these optimized conditions, partial profiles are obtained from almost all fresh or well conserved unfertilized seeds. These adapted laboratory protocols may also enable the DNA profiling of catkin scales and other small or deteriorated fragments from birch trees, thereby increasing the number of cases in which birch trace evidence may be of value.

If a DNA profile is obtained from a fertilized birch seed, population comparative methods may be suitable to compare seeds to potential parental populations; however, within the Netherlands this is only feasible for exclusionary purposes. When a DNA profile is obtained from an unfertilized seed, one to one comparisons with potential mother trees can be performed, enabling not only exclusionary conclusions, but also reporting of matches. To distinguish between fertilized and unfertilized seeds, both dissection prior to DNA extraction and deduction from the intensity of the obtained DNA profile perform comparably. However, as dissection of unfertilized seeds leads to loss of material, and dissection of fertilized seeds does not positively influence the ability to obtain a maternal DNA-profile, this procedure was not advantageous in our hands. Although both methods perform better on fresh material than on average trace evidence, deduction of the fertilization status through DNA profiling of poorly conserved samples leads to better results than dissection of such seeds. Moreover, if a low intensity profile from a fertilized seed is mistaken for a profile from an unfertilized seed, the chances of incorrectly identifying a matching mother tree are as low as the random match probability of the obtained profile.

The case report illustrates that typing birch seeds can be more difficult in practice, because samples are generally less well conserved when exposed to harsh environmental conditions. Nevertheless, when (partial) DNA profiles are obtained from trace material, comparisons with relevant reference trees can result in very valuable evidence in forensic investigations, ranging from exclusions to random match probabilities in the order of one in one million. When random match probabilities are calculated, the effect of using local allele frequencies as opposed to frequencies derived from larger population studies seems marginal when the previously described conservative estimates are used. This fits our previous observation that populations are not easily distinguished based on allele frequencies. The effect of calculating random match probabilities from partial consensus profiles, where estimation of the allelic dosage is generally not feasible and allelic drop out may need to be considered, is far greater.

Acknowledgements

We thank Stefan C.A. Uitdehaag (NFI) for his support in evaluation of particle distribution models.

References

[1] M.D. Atkinson, Betula pendula Roth (B. verrucosa Ehrh.) and B. pubescens Ehrh., J. Ecol. 80 (1992) 837–870.
[2] C.K. Yoon, Botanical witness for the prosecution, Science 260(5110) (1993) 894–895.
[3] R. Mestel, Murder trial features tree’s genetic fingerprint, New Sci. 138(1875) (1993) 6.
[4] K.J. Craft, J.D. Owens, M.V. Ashley, Application of plant DNA markers in forensic botany: genetic comparison of Quercus evidence leaves to crime scene trees using microsatellites, Forensic Sci. Int. 165(1) (2007) 64-70, DOI: 10.1016/j.forsciint.2006.03.002
[5] M. Wesselink, A. Dragutinović, J.W. Noordhoek, L. Bergwerff, I. Kuiper, DNA typing of birch: Development of a forensic STR system for Betula pendula and Betula pubescens, Submitted to Forensic Sci. Int. Genet.
[6] G. Houle, S. Payette, Seed dynamics of Betula alleghaniensis in a deciduous forest of north-eastern , J. Ecol. (1990) 677-690.
[7] W.J. Koopman, I. Kuiper, D.J. Klein-Geltink, G.J. Sabatino, M.J. Smulders, Botanical DNA evidence in criminal cases: knotgrass (Polygonum aviculare L.) as a model species, Forensic Sci. Int. Genet. 6(3) (2012) 366-374, DOI: 10.1016/j.fsigen.2011.07.013
[8] S.O. Holm, Reproductive variability and pollen limitation in three Betula taxa in northern Sweden, Ecography 17(1) (1994) 73-81, DOI: 10.1111/j.1600-0587.1994.tb00078.x
[9] R. Sarvas, A research on the regeneration of birch in southern Finland, Communicationes Instituti Forestalis Fenniae 35(4) (1948) 1-91p [in Finnish with English summary]
[10] R.H. Ford, T.L. Sharik, P.P. Feret, Seed dispersal of the endangered Virginia round-leaf birch (Betula uber), Forest Ecol. Management 6 (1983) 115-128.
[11] A. Karlsson, An analysis of successful natural regeneration of downy and silver birch on abandoned farmland in Sweden, Silva Fennica 32 (1998) 229-240.
[12] H. Tanaka, M. Shibata, T. Nakashizuka, A mechanistic approach for evaluating the role of wind dispersal in tree population dynamics, J. Sust. Forest. 6(1-2) (1998) 155-174, DOI: 10.1300/J091v06n01_10
[13] C.R. Janssen, Verkenningen in de palynologie, Oosthoek’s Uitgeversmaatschappij B.V. (1974) p.176. [in Dutch]
[14] D.F. Greene, E.A. Johnson, A model of wind dispersal of winged or plumed seeds, Ecol. 70(2) (1989) 339-347, DOI: 10.2307/1937538
[15] K.K.M. Kulju, M. Pekkinen, S. Varvio, Twenty‐three microsatellite primer pairs for Betula pendula (Betulaceae), Mol. Ecol. Notes 4(3) (2004) 471-473, DOI: 10.1111/j.1471-8286.2004.00704.x
[16] C. Truong, A.E. Palmé, F. Felber, Recent invasion of the mountain birch Betula pubescens ssp. tortuosa above the treeline due to climate change: genetic and ecological study in northern Sweden, J. Evol. Biol. 20(1) (2007) 369-380, DOI: 10.1111/j.1471-8286.2004.00848.x
[17] P. Taberlet, S. Griffin, B. Goossens, S. Questiau, V. Manceau, N. Escaravage, et al., Reliable genotyping of samples with very low DNA quantities using PCR, Nucleic Acids Res. 24(16) (1996) 3189-3194, DOI: 10.1093/nar/24.16.3189
[18] C.C. Benschop, C.P. van der Beek, H.C. Meiland, A.G. van Gorp, A.A. Westen, T. Sijen, Low template STR typing: effect of replicate number and consensus method on genotyping reliability and DNA database search results, Forensic Sci. Int. Genet. 5(4) (2011) 316-328, DOI: 10.1016/j.fsigen.2010.06.006
[19] ECLI:NL:RBZUT:2011:BP8108, Available from: http://www.rechtspraak.nl [in Dutch]
[20] [Last update 2012 March 02] Available from: http://www.gelrenieuws.nl/2012/03/eerbekenaar-bekent-moord-op-autohandelaar.html [in Dutch]


Chapter 7

General discussion

M. Wesselink, A.D. Kloosterman and I. Kuiper

1. Introduction

The DNA markers that are described in this thesis enable classification of specific non-human biological traces at the species level (chapter 1), within species level (chapters 2, 3 and 4) and at the individual level (chapters 5 and 6). In this chapter, the forensic relevance of these classifications is described in a general perspective. This includes general insights from molecular biology, taxonomy, population genetics and ecology on the one hand and forensic science, law, policy making and criminalistics on the other hand.

2. Identification at the species level

2.1. Biological perspective

In biology, naming organisms, describing their characteristics, and determining the relatedness of organisms is considered a classical, well founded science. Since Carl Linnaeus contributed to taxonomy by introducing the binomial nomenclature system in the eighteenth century, organisms have been named with a genus name (e.g. Psilocybe), followed by a species name (e.g. cubensis), with type specimens being stored for future reference. Additionally, he introduced groupings based on both similarities and differences, implying that species within a genus are more closely related than species in different genera. This is also true for different genera within a family and for different subgroups within a species: subspecies in animal species, varieties in plant species and strains in species of microorganisms. One of the main advantages of this system for the identification of unknown samples is that when inadequate features are available for identification at a certain taxonomic level (e.g. species), identification at a higher taxonomic level (e.g. genus) may still be performed. The same holds when scientific consensus has not yet been reached about a certain taxonomic level, but agreement exists about the higher taxonomic level. Some biologists for example consider all marijuana plants to belong to one highly variable species (Cannabis sativa), whilst others recognize multiple species of Cannabis (reviewed in [1]). Identification of a sample to the species level could therefore give rise to debate, whereas both parties would agree to identification at the genus level.

From a biological point of view it is accepted and highly valued that scientific advances will lead to new insights which, when applied to taxonomy, may lead to the renaming or reordering of previously described and accepted species, genera or even families or orders. Examples range from the partitioning of the former genus Psilocybe into the genera Psilocybe and Deconica [2,3] (Chapter 1), to recognition of Psilocybe subacutipilea as a synonym of Psilocybe mexicana [4], and reconsidering whether species should be placed in a genus indicating the capability to produce specific metabolites or not (e.g. Psilocybe goniospora, a species mentioned on the 2008 Opium Act [5], is no longer considered a Psilocybe species and is denoted Deconica goniospora [4]). In addition, new species that have characteristics comparable to regulated species are continuously being discovered and described [6,7].

Non-biologists, merely using the classification system for other purposes than to describe biology, may be oblivious, annoyed or puzzled by such alterations, especially in the transition period when not even all biologists have fully embraced and implemented such changes. Ideally laws, decrees and descriptions using biological nomenclature would regularly be reviewed and revised when appropriate, with input from biologists. In practice this does not always occur, potentially leading to discussions and confusion in court (e.g. [8]).

In the last decades, taxonomy and classification no longer solely rely on morphological features, but differences and similarities in DNA composition are also considered. Where DNA composition was initially used as an additional feature to discriminate between species and determine the relatedness of species, molecular phylogeny and “DNA barcoding” have at present become mature fields of research. Large barcoding initiatives such as ECBOL (European Consortium for the Barcode of Life), FISH-BOL (Fish Barcode of Life Campaign) and HealthBOL, and many other initiatives, are increasing the number of publicly available DNA sequences that may be used to identify unknown specimens. The data produced in these barcoding initiatives have given rise to new discussions of species composition, species concepts and the value of sequencing for the discovery and description of new species [9-13]. Since the sequencing capacity has increased exponentially and the capacity of taxonomists has remained unchanged or has even decreased in recent years, many samples are now sequenced that are not associated with a described type specimen. One of the proposed solutions to enable use of such sequence data, whilst acknowledging that a description is not available, is the Barcode Index Number (BIN) [14]. Although this system is considered useful in biological studies [12,15-17], its application in forensics is limited for the time being. This is in part due to the relatively large DNA markers that the BIN system relies on, which will often not be available for analysis in forensic samples. Additionally, until specific curated BIN databases have been composed and accredited that enable incrimination by BIN-identification (see 2.2, II), the usage of the BIN system in forensics is expected to be limited to investigations where questioned materials are to be compared to reference materials (see 2.2, I).

2.2. Forensic application

Within forensics, DNA based taxon identification is mainly performed in two instances:

I. As a first step in the investigation of a biological sample, when the focus is to determine if a sample originates from a taxon that may be suitable for further investigation;
II. As a final step in the investigation of a biological sample, when identification of the sample determines whether possessing or selling a certain sample should be considered a legal offence or not.

When taxon identification is performed as a first (screening) step (I), the described drawbacks surrounding nomenclature or misidentification will not intrinsically influence a legal case, as further investigation into a sample will be performed. Misidentification of a sample could result in the collection of irrelevant reference materials, but this would be observed when comparisons to these materials are performed, so that such a mistake would be recognized and rectified. Although time and resources would be wasted, the course of justice would not be influenced.

When taxon identification is a crucial step in determining whether a crime has been committed (II), misidentification of a sample, or the abovementioned taxonomic or nomenclature discussions, could lead to misunderstandings in court and potentially to wrongful verdicts. The majority of cases in which identification of a taxon is of such importance involve the possession or selling of biological drugs of abuse (e.g. marijuana, magic mushrooms and khat) and plant or animal taxa protected by CITES (Convention on International Trade in Endangered Species) or local flora and fauna regulations. Additionally, cases dealing with food safety, mislabelling of consumer products and animal cruelty are gaining attention (reviewed in [18]). In all these cases, the possession or trade of certain taxa of organisms is controlled or prohibited. To define which taxa are controlled, lists of taxa are generally included in the relevant laws. Whole orders or families of organisms may be placed on such lists (e.g. Primates on CITES appendix II [19]), but also genera (e.g. Cannabis on the Dutch Opium Act list II [5]), species (e.g. Psilocybe cubensis on the Dutch Opium Act list II [5]), subspecies (e.g. Hippotragus niger variani on CITES appendix I [19]), or only specific populations (e.g. Panax ginseng from the Russian Federation on CITES appendix II [19]) may be specified. As the policy makers commissioned with defining which biological entities are placed on such lists do not necessarily have a background in biology, the way in which the controlled taxa are described may not always be logical from a biological point of view. However, as these descriptions dictate which level of taxon identification is needed in a certain legal framework to provide a relevant answer in a criminal investigation, these descriptions also prescribe between which taxa discrimination should be possible. Identifying all Psilocybe species enumerated on the Dutch Opium Act list II [5] requires a different approach than identification of the genus Psilocybe encompassing all Psilocybe species. As described in Chapter 1, multiple DNA markers would enable straightforward identification of the genus Psilocybe, whilst identification of separate species is more challenging.

3. Within species differentiation – mitochondrial DNA evidence

3.1. Biological perspective

In classification and taxonomy, organisms that share the majority of their characteristic features and display only minor differences are likely to be placed in closely related groups. Within the species Homo sapiens (human), much attention has gone into the identification of maternal and paternal lineages, with the goal of retracing human evolution [19]. Mitochondrial DNA, more specifically the control region (also known as the displacement loop, D-loop, or hypervariable regions), has been the marker of choice to investigate maternal lineages. The DNA sequences (known as haplotypes or mitotypes), initially obtained through DNA sequencing, are at present often obtained by SNP analysis or MPS (massively parallel sequencing). With these advances in technology, the number of sequences available for comparative analysis has increased tremendously over recent years. These technological advances have also enabled determining the sequence of an increasing number of informative nucleotides per sample. Additionally, the existence of a curated database for the storage of these haplotypes for forensic application [20] enables the use of these data not only for evolutionary and population studies, but also for the evaluation of results in forensic investigations. Although the use of comparable techniques is being explored for other forensically relevant species such as dogs (Canis lupus familiaris) (e.g. [21-24]) and cats (Felis catus) (Chapters 2, 3 and 4 and references therein), fewer resources have been allocated, techniques have not yet been standardized, the number of samples studied is relatively small, and to date no curated database exists for the collection of the data that incorporates quality control measures as discussed in Chapter 4. Additionally, notable differences exist between the evolution, population structure, haplotype distribution and thereby forensic potential of these species.

In humans, several haplotypes have been identified that are shared by larger groups of people; however, the majority of haplotypes occur at low frequencies. Furthermore, these haplotypes can be divided into a large number of main groups with mutual cohesion, each consisting of multiple closely related haplotypes that differ from one another by only one or a few nucleotides [25]. These patterns are explained by the migration of humans in the distant past, spreading from Africa to the rest of the world starting as early as 200,000 years ago [19,26]. The correlation between these evolutionary routes and haplotypes has enabled the use of such DNA markers to predict the ancestry of an individual, and thereby of the donor of a crime scene sample, as haplotypes occur at different frequencies in, for example, different ethnic groups [27].

The structure of the documented haplotypes in humans contrasts with that of the haplotypes documented for both dogs and cats. In dogs, only six main groups have been identified to date, which differ so strongly from each other that evolutionary routes between these groups have not been suggested. Instead, these groups have been linked to the domestication of at least 51 female wolves (Canis lupus) less than 16,300 years ago in East Asia, south of the Yangtze River [28,29]. Additionally, hybridization between dogs and wolves has been suggested [30,31]. Within these main groups, multiple haplotypes differing from one another by only one or a few nucleotides have been described (e.g. [22-24]). Furthermore, in many parts of the world, reproduction of dogs has long been influenced by humans. In the past, individuals were mainly selected for their behavioural properties and bred selectively; only in the last centuries has selective breeding to enhance or maintain phenotypic properties been described [32]. As these breeds are young on an evolutionary scale, insufficient differentiation has occurred to accurately predict dog breeds based on haplotypes, although notably different haplotype distributions have been documented in several breeds [33].

Cat haplotype structure has not yet been fully elucidated; however, to date fewer main haplogroups have been identified in cats than in humans and dogs. Additionally, the majority of the documented haplotypes within a group differ from each other by only a single nucleotide (Chapter 4, Figure 1), and in European samples one dominant haplotype was identified reaching frequencies of more than 50% (Chapters 2 and 4 and references therein). Furthermore, studies focussing on both the nuclear and mitochondrial DNA of cats hypothesize that cats were first domesticated in the Mediterranean around 10,000 years ago and spread over the world along with humans ([34,35], Chapter 4 and references therein), in a relationship that would be beneficial for both. However, unlike with dogs, humans did not significantly interfere with the breeding of cats for centuries. Although humans have selected against certain phenotypes in the recent past (e.g. witch-hunts), breeding of cats for their phenotypic characteristics has only been described since the nineteenth century (illustrated in [36]). Recent evolution of breeds has been detected with nuclear DNA markers [35,36] that may enable attribution of a sample to a certain breed. However, between-breed differences are at present too small to enable breed attribution using mitochondrial haplotypes (Chapter 2).

3.2. Forensic application

Due to the described differences, when matching haplotypes are encountered in a forensic investigation, the value of such a match differs considerably between species. When a reference individual and a crime scene sample have completely different haplotypes, exclusion of this individual as the donor of the crime scene sample is straightforward for all species. However, the chances of this occurring differ between species due to (great) differences in haplotype distribution (Chapters 2 and 4 vs [20]). When a reference individual and a crime scene sample have identical haplotypes, the relevant population from which to derive haplotype frequencies should be determined. This may range from all individuals in a certain area (e.g. random match with any other cat in the Netherlands), to only those individuals with specific phenotypic characteristics in a small defined area (e.g. random match with any other golden retriever in The Hague), or to either one or another individual (e.g. the suspect's cat or the victim's cat). If such information is available, this should influence which population is chosen to determine the value of matching haplotypes. When samples from species such as dogs and cats are used in forensics, the collection and typing of additional reference samples or population samples may be needed to accurately estimate the random match probability of matching haplotypes in a specified relevant population, as the databases for these species are far less complete than for humans. Additionally, due to advances in technology, sequencing of the entire mitochondrial DNA of not only human crime scene samples but also of other species may become feasible in the near future, identifying more (or different) positions of the mitochondrial DNA as informative. This may render present-day databases less useful and increase the need for case-specific population studies.
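
The influence of the chosen reference population on the value of a match can be illustrated with a minimal Python sketch. The haplotype labels, the sample counts and the (x + 1)/(n + 1) count estimator are assumptions made for illustration only; they are not data or methods taken from this thesis.

```python
from collections import Counter


def match_probability(haplotype, reference_sample):
    """Estimate how often `haplotype` is expected in a reference population.

    Uses the (x + 1) / (n + 1) count estimator (an assumption of this
    sketch): the crime scene observation is added to the sample, so a
    haplotype that was never observed does not receive a frequency of zero.
    """
    counts = Counter(reference_sample)
    return (counts[haplotype] + 1) / (len(reference_sample) + 1)


# Hypothetical reference samples; labels and counts are invented.
national_cats = ["A"] * 60 + ["B"] * 25 + ["C"] * 14  # 99 cats, whole country
local_cats = ["A"] * 4 + ["C"] * 15                   # 19 cats, one neighbourhood

p_national = match_probability("C", national_cats)  # (14 + 1) / (99 + 1)
p_local = match_probability("C", local_cats)        # (15 + 1) / (19 + 1)
```

In this invented example the same haplotype is uncommon in the national sample but dominant in the local one, so the value of a match depends strongly on which population is considered relevant.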

Another important aspect when evaluating the value of mitochondrial DNA evidence obtained from samples such as animal hairs is knowledge of the prevalence of such evidence. Animal hairs are easily transferred during a crime [37], and as many households have multiple pets (almost half of Dutch cat owners own at least two cats [38]), encountering hairs from more than one pet is not unlikely. Therefore, once the value of matching mitochondrial haplotypes has been determined, combination of incriminating or exculpatory evidence becomes possible, as different individual dogs and/or cats may be considered independent, thereby enabling multiplication of the individual evidential values. In such cases the evidential value of mitochondrial DNA evidence obtained from multiple matching animal hairs may reach the same order of magnitude as would be obtained if matching nuclear DNA evidence from one individual were available. It is conceivable that in the future, the topic of discussion in court will not be the fact that a specific cat hair has been secured as trace evidence; instead, innocent explanations will be put forward describing how the hair ended up at the crime scene, as has become relevant with human DNA evidence [39].
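
The multiplication of independent evidential values can be sketched as follows. The sketch assumes that each evidential value is expressed as a likelihood ratio equal to one divided by the random match probability, and that the pets are unrelated; the probabilities are hypothetical and serve only to show the arithmetic.

```python
import math


def likelihood_ratio(match_probability):
    """Likelihood ratio for one matching haplotype.

    Assumes (for this sketch) that a match is certain if the reference
    animal is the donor, so the ratio reduces to 1 / match probability.
    """
    return 1.0 / match_probability


def combined_likelihood_ratio(match_probabilities):
    """Multiply the per-animal ratios; only valid for independent animals."""
    return math.prod(likelihood_ratio(p) for p in match_probabilities)


# Hypothetical random match probabilities for hairs matching three pets
# from the same household (values invented for illustration).
pet_match_probabilities = [0.05, 0.10, 0.02]
combined = combined_likelihood_ratio(pet_match_probabilities)  # 20 * 10 * 50
```

Three individually modest values combine to a ratio in the order of ten thousand, which illustrates how several matching hairs together may approach the magnitude usually associated with nuclear DNA evidence. Maternally related pets share their haplotype and would violate the independence assumption.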

4. Individual identification – nuclear DNA evidence

4.1. Biological perspective

The use of a standardized set of nuclear STR DNA markers for the identification of humans has become a universal forensic standard, enabling the exchange of both population data and crime scene information or reference profiles. However, such markers are in part selected for their species specificity: generally, a profile will only be obtained in the presence of DNA of the selected species. In non-human biological forensics, countless species may be of interest, implying that countless methods would have to be developed, with at least as many population studies being performed to interpret the data. As all these species would require their own technical validation of the selected markers, the resources needed for such an endeavor would be vast.

At present, only a few other species have gained sufficient forensic interest for techniques to have been developed and forensically validated, and for their repeat structures and population data to have been described in the literature, including dog [40,41], cat [42], badger [43], rhinoceros [44], marijuana [45] and birch (Chapter 5). Many of these markers were not selected for their species specificity and are known to amplify not only in the species they were developed for, but also in closely related species, enabling their application in a larger number of forensic cases. However, whether laboratory procedures and databases developed for one species can be transferred to another species needs to be determined for every STR marker separately. Reaction conditions may need to be adapted to compensate for differences in the DNA sequences between the species. Furthermore, markers variable in one species may be monomorphic in another species (e.g. variable in dog, monomorphic in fox [46]), or alleles may occur at completely different frequencies (Chapter 5), severely influencing the interpretation of results. Even the inheritance patterns of markers may not be identical in different species.
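
Why a marker that is variable in one species may be uninformative in another can be made concrete with expected heterozygosity, a common summary of marker variability. The sketch below is illustrative only: the allele names and counts are invented, and expected heterozygosity is used here as an assumed measure rather than as the statistic applied in the cited studies.

```python
def expected_heterozygosity(allele_counts):
    """Return 1 - sum(p_i^2) for the alleles observed at one marker.

    A monomorphic marker (a single allele) gives 0, meaning it cannot
    distinguish between individuals.
    """
    total = sum(allele_counts.values())
    return 1.0 - sum((count / total) ** 2 for count in allele_counts.values())


# Hypothetical allele counts for the same STR marker in two species.
marker_in_species_a = {"10": 20, "11": 50, "12": 30}  # three alleles observed
marker_in_species_b = {"11": 100}                     # monomorphic

he_a = expected_heterozygosity(marker_in_species_a)  # 1 - (0.04 + 0.25 + 0.09)
he_b = expected_heterozygosity(marker_in_species_b)  # 1 - 1.0
```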

Fortunately, many STR markers for a great variety of species have been developed for other purposes such as ecological studies, conservation studies and parentage analysis. Although these markers have been tested to a certain extent, a certain level of validation is still needed prior to their application in forensics. As described in the outline of this thesis, determining whether the markers provide the desired level of discrimination between the relevant groups will be the first point to be addressed. Whether these markers can be reliably measured in forensic samples additionally needs to be determined, especially when markers have been designed to investigate high-quality material. Furthermore, at least some samples should be analyzed to demonstrate the applicability of the technique. At this point it may be feasible to analyze crime scene samples and reference samples, as this may be sufficient validation when only distinguishable DNA profiles are obtained. However, when a match between a crime scene sample and reference material is found, analysis of population samples will be essential to enable interpretation of this match. This generally involves the creation of databases, and eventually sharing of the data with the forensic community for review and comparison. As described for fungal and feline DNA sequences in Chapters 1 and 4, present-day databases do not always allow adequate curation and revision to enable both incorporation of new findings and removal of erroneous data.

4.2. Forensic application

As described for animal hairs in 3.2, determination of the relevant population(s) for comparison is a major challenge for the application of non-human biological traces. Forensic intelligence is of great importance to identify potentially relevant populations, but biological knowledge of populations, for example of the propagation of a species, is additionally important. Whilst reproduction of mammals is generally well understood by all involved in the forensic community, the different modes of reproduction of plants pose different challenges and opportunities. Additionally, the mobility of plants and animals differs greatly. Furthermore, different forensic cases, with diverse hypotheses to be tested, may call for different levels of certainty (e.g. DNA profiling of an animal species in an animal cruelty case versus DNA profiling of an abundant species of trees in a murder case).

As described previously, when DNA profiles of crime scene samples and reference individuals differ from one another, interpretation of such comparisons does not require extensive population data. However, when identical DNA profiles are obtained and the evidential value of such a match is to be determined, the variability of the applied markers in the studied population becomes important (Chapter 5). Fortunately, for many (plant and animal) species, much is known about natural reproduction strategies, as well as about human use of the species and breeding or cultivation strategies. Moreover, within the Netherlands, breeders of most species of plants and animals and municipal services in charge of landscaping have proven to be valuable resources in providing information about breeding/cultivation and planting locations, but also in providing samples (personal observations by the authors). Obviously, this is in contrast with cultivators of illegal species (e.g. Cannabis), who will generally not provide trustworthy information about their business affairs. Interpretation of matching nuclear DNA profiles of various species may therefore call for completely different approaches, varying from mostly literature-based studies of genotype frequencies, to typing of one or a few reference individuals or populations to estimate local genotype frequencies, to designing a complete population study to establish the local genotype frequencies needed to interpret the results.
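
How marker variability in the studied population translates into the expected frequency of a matching profile can be sketched under textbook assumptions. The sketch assumes random mating (Hardy-Weinberg proportions) and independence between loci, which do not hold for clonally propagated or selfing species; the locus names and allele frequencies are hypothetical.

```python
def genotype_frequency(allele_a, allele_b, allele_frequencies):
    """Expected genotype frequency at one locus under Hardy-Weinberg.

    Homozygotes: p^2; heterozygotes: 2pq.
    """
    p = allele_frequencies[allele_a]
    q = allele_frequencies[allele_b]
    return p * p if allele_a == allele_b else 2 * p * q


def profile_frequency(profile, population):
    """Multiply genotype frequencies over loci (assumes independent loci)."""
    frequency = 1.0
    for locus, (allele_a, allele_b) in profile.items():
        frequency *= genotype_frequency(allele_a, allele_b, population[locus])
    return frequency


# Hypothetical allele frequencies for two loci in a local population.
local_population = {
    "locus_1": {"10": 0.2, "12": 0.8},
    "locus_2": {"7": 0.5, "9": 0.5},
}
matching_profile = {"locus_1": ("10", "12"), "locus_2": ("7", "7")}

expected = profile_frequency(matching_profile, local_population)  # 0.32 * 0.25
```

With allele frequencies from a different population, the same profile would receive a different expected frequency, which is why local genotype frequencies may need to be established before a match can be interpreted.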

Clonal propagation, such as applied in the cultivation of Cannabis, has been proposed to enable the direct comparison of “mother plants” and their derivatives [47,48], although these plants are often highly mobile, as humans are responsible for their cultivation and transportation. Plant species that naturally reproduce through selfing have the potential to lose their genetic variation in small areas, enabling comparison of samples to their populations of origin [49]. Plant species mainly reproducing in a sexual manner can be considered as mammal species would be considered (Chapters 5 and 6), with the difference that plants are naturally less mobile than mammals. However, the human movement of complete plants should not be excluded, and the potential transfer of parts of plants should be seriously evaluated. In mammals, human interference with reproduction is readily illustrated by one single male (or female) stimulated to produce great numbers of offspring, whilst other individuals are not allowed to reproduce. The production of both fertilized and unfertilized seeds by birch trees is another illustration of a striking reproductive strategy (Chapter 6).

5. Non-human biological traces – concluding remarks

To successfully apply non-human biological DNA typing in forensics, the added value of such traces to investigations should be known to those involved in all aspects of the forensic and legal system. This starts with crime scene officers, who, if unaware of the possibilities, may not sufficiently recognize and secure traces such as animal hairs or birch seeds. Fortunately, if these traces are not damaged or lost, many non-human traces can still be investigated after a prolonged period of time. Additionally, the police officers, prosecutors and investigative magistrates in charge of formulating questions for forensic scientists may define questions that cannot readily be answered. Moreover, unawareness of the possibilities and pitfalls associated with non-human biological traces may lead to failure to obtain relevant reference materials, or to failure to pursue investigation of these traces altogether. If investigation of non-human biological traces is initiated, knowledge of the evidential value of these traces, and of how this value may be influenced by the context of the case in which they are evaluated, facilitates the use of such traces by magistrates. Selecting the most appropriate hypotheses, or requesting additional evaluation of the findings in the light of new hypotheses, may help avoid (potentially confusing) discussions in court. Finally, policy makers with an increased knowledge of biology may be able to formulate the biological component of legal documents in such a way that enforcement of these laws is made easier for all parties involved.

Although dissemination of the applications and pitfalls of the use of non-human biological traces is relatively straightforward within the forensic scientific community through scientific literature, reaching other members of the forensic community, both within the Netherlands and on a European scale (personal communication ENFSI APST working group), generally takes place on a case-by-case basis. Moreover, police officers, prosecutors, magistrates, policy makers and members of the general public who have not personally experienced the value of these traces are at present still surprised by the possibilities when informed about the subject (personal observations by the authors, 2004-2017). As more cases involving non-human biological traces receive attention within the scientific community, the forensic community and the general public, the potential of these traces and the need to investigate their vast opportunities will further increase the interest in and application of these traces.

References

[1] K. Hillig, Genetic evidence for speciation in Cannabis (Cannabaceae), Genet Resourc Crop Evol. 52 (2005) 161-180, DOI: 10.1007/s10722-003-4452-y
[2] S.A. Redhead, J.M. Moncalvo, R. Vilgalys, P.B. Matheny, L. Guzmán-Dávalos, G. Guzmán, (1757) Proposal to conserve the name Psilocybe (Basidiomycota) with a conserved type, Taxon 56 (1) (2007) 255–257.
[3] L.L. Norvell, Report of the nomenclature committee for fungi: 15, Taxon 59 (1) (2010) 291-293.
[4] V. Ramirez-Cruz, G. Guzman, L. Guzman-Davalos, Type studies of Psilocybe sensu lato (, Agaricales), Sydowia 65 (2) (2013) 277-319.
[5] Opium Act Schedule II, Staatsblad van het Koninkrijk der Nederlanden [Bulletin of Acts and Decrees] 2008, 486.
[6] J. Borovička, A. Rockefeller, P.G. Werner, Psilocybe allenii – a new bluing species from the Pacific Coast, USA, Czech Mycology 64 (2012) 181-195.
[7] T. Ma, Y. Feng, X.F. Lin, S.C. Karunarathna, W.F. Ding, K.D. Hyde, Psilocybe chuxiongensis, a new bluing species from subtropical China. Phytotaxa, 156(4) (2014) 211-220, DOI: 10.11646/phytotaxa.156.4.3
[8] ECLI:NL:HR:2016:522, ECLI:NL:PHR:2015:2720, ECLI:NL:GHAMS:2014:1691, Available from: http://www.rechtspraak.nl [in Dutch]
[9] D.E. Schindel, S.E. Miller, DNA barcoding a useful tool for taxonomists. Nature, 435(7038) (2005) 17-17.
[10] M. Hajibabaei, G.A. Singer, P.D. Hebert, D.A. Hickey, DNA barcoding: how it complements taxonomy, molecular phylogenetics and population genetics. TRENDS in Genetics, 23(4) (2007) 167-172, DOI: 10.1016/j.tig.2007.02.001
[11] J. Waugh, DNA barcoding in animal species: progress, potential and pitfalls. BioEssays, 29(2) (2007) 188-197, DOI: 10.1002/bies.20529
[12] M. Kekkonen, P.D. Hebert, DNA barcode-based delineation of putative species: efficient start for taxonomic workflows. Mol. Ecol. resources, 14(4) (2014) 706-715, DOI: 10.1111/1755-0998.12233
[13] P.D. Hebert, P.M. Hollingsworth, M. Hajibabaei, From writing to reading the encyclopedia of life, (2016) DOI: 10.1098/rstb.2015.0321
[14] S. Ratnasingham, P.D. Hebert, A DNA-based registry for all animal species: the Barcode Index Number (BIN) system, PloS one, 8(7) (2013) e66213, DOI: 10.1371/journal.pone.0066213
[15] A. Hausmann, H.C.J. Godfray, P. Huemer, M. Mutanen, R. Rougerie, E.J. van Nieukerken, et al., Genetic patterns in European geometrid moths revealed by the Barcode Index Number (BIN) system. PloS one, 8(12) (2013) e84518, DOI: 10.1371/journal.pone.0084518
[16] R.A. Collins, R.H. Cruickshank, Known knowns, known unknowns, unknown unknowns and unknown knowns in DNA barcoding: a comment on Dowton et al. Systematic biology, 63(6) (2014) 1005-1009, DOI: 10.1093/sysbio/syu060
[17] T. Knebelsberger, A.R. Dunz, D. Neumann, M.F. Geiger, Molecular diversity of Germany's freshwater fishes and lampreys assessed by DNA barcoding. Mol. Ecol. resources, 15(3) (2015) 562-572, DOI: 10.1111/1755-0998.12322
[18] M. Arenas, F. Pereira, M. Oliveira, N. Pinto, A.M. Lopes, V. Gomes, et al., Forensic genetics and genomics: Much more than just a human affair. PLoS Genetics, 13(9) (2017) e1006960, DOI: 10.1371/journal.pgen.1006960
[18] Convention on International Trade in Endangered Species of Wild Flora and Fauna, https://www.cites.org/eng/disc/species.php [accessed on January 15th 2018]
[19] R.L. Cann, M. Stoneking, A.C. Wilson, Mitochondrial DNA and human evolution. Nature, 325(6099) (1987) 31-36.
[20] W. Parson, A. Dür, EMPOP—a forensic mtDNA database. Forensic Science International: Genetics, 1(2) (2007) 88-92, DOI: 10.1016/j.fsigen.2007.01.018
[21] P. Savolainen, J. Lundeberg, Forensic evidence based on mtDNA from dog and wolf hairs, J. Forensic Sci., 44 (1) (1999) 77-81, DOI: 10.1520/JFS14414J
[22] J.H. Wetton, J.E. Higgs, A.C. Spriggs, C.A. Roney, C.S. Tsang, A.P. Foster, Mitochondrial profiling of dog hairs, Forensic Sci. International, 133(3) (2003) 235-241, DOI: 10.1016/S0379-0738(03)00076-8
[23] H. Angleby, P. Savolainen, Forensic informativity of domestic dog mtDNA control region sequences. Forensic Sci Int 154 (2-3) (2005) 99–110, DOI: 10.1016/j.forsciint.2004.09.132
[24] K.M. Webb, M.W. Allard, Mitochondrial genome DNA analysis of the domestic dog: identifying informative SNPs outside of the control region. Journal of forensic sciences, 54(2) (2009) 275-288, DOI: 10.1111/j.1556-4029.2008.00953.x
[25] M. van Oven, M. Kayser, Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Human mutation, 30(2) (2009) E386–E394, DOI: 10.1002/humu.20921
[26] M. Nei, Genetic support for the out-of-Africa theory of human evolution. Proc Natl Acad Sci USA 92 (1995) 6720–6722.
[27] L. Chaitanya, M. van Oven, N. Weiler, J. Harteveld, L. Wirken, T. Sijen et al., Developmental validation of mitochondrial DNA genotyping assays for adept matrilineal inference of biogeographic ancestry at a continental level. Forensic Science International: Genetics, 11 (2014) 39-51, DOI: 10.1016/j.fsigen.2014.02.010
[28] P. Savolainen, Y.P. Zhang, J. Luo, J. Lundeberg, T. Leitner, Genetic evidence for an East Asian origin of domestic dogs. Science, 298(5598) (2002) 1610-1613, DOI: 10.1126/science.1073906
[29] J.F. Pang, C. Kluetsch, X.J. Zou, A.B. Zhang, L.Y. Luo, H. Angleby, et al., mtDNA data indicate a single origin for dogs south of Yangtze River, less than 16,300 years ago, from numerous wolves. Mol. Biol. Evol., 26(12) (2009) 2849-2864, DOI: 10.1093/molbev/msp195
[30] A. Ardalan, C.F. Kluetsch, A.B. Zhang, M. Erdogan, M. Uhlén, M. Houshmand, et al., Comprehensive study of mtDNA among Southwest Asian dogs contradicts independent domestication of wolf, but implies dog–wolf hybridization. Ecology and evolution, 1(3) (2011) 373-385, DOI: 10.1002/ece3.35
[31] C.F. Klütsch, E.H. Seppälä, T. Fall, M. Uhlén, Å. Hedhammar, H. Lohi, P. Savolainen, Regional occurrence, high frequency but low diversity of mitochondrial DNA haplogroup d1 suggests a recent dog-wolf hybridization in Scandinavia. Animal genetics, 42(1) (2011) 100-103, DOI: 10.1111/j.1365-2052.2010.02069.x
[32] H.G. Parker, L.V. Kim, N.B. Sutter, S. Carlson, T.D. Lorentzen, T.B. Malek, et al., Genetic structure of the purebred domestic dog. Science, 304(5674) (2004) 1160-1164, DOI: 10.1126/science.1097406
[33] S. Desmyter, L. Gijsbers, Belgian canine population and purebred study for forensics by improved mitochondrial DNA sequencing. Forensic Science International: Genetics, 6(1) (2012) 113-120, DOI: 10.1016/j.fsigen.2011.03.011
[34] J.D. Vigne, J. Guilaine, K. Debue, L. Haye, P. Gerard, Early taming of the cat in Cyprus, Science, 304 (2004) 259, DOI: 10.1126/science.1095335
[35] M.J. Lipinski, L. Froenicke, K.C. Baysac, N.C. Billings, C.M. Leutenegger, A.M. Levy, et al., The ascent of cat breeds: genetic evaluations of breeds and worldwide random-bred populations. Genomics, 91(1) (2008) 12-21, DOI: 10.1016/j.ygeno.2007.10.009
[36] J.D. Kurushima, M.J. Lipinski, B. Gandolfi, L. Froenicke, J.C. Grahn, R.A. Grahn, L. Lyons, Variation of cats under domestication: genetic assignment of domestic cats to breeds and worldwide random-bred populations. Animal genetics, 44(3) (2013) 311-324, DOI: 10.1111/age.12008
[37] F. D’Andrea, F. Fridez, R. Coquoz, Preliminary experiments on the transfer of animal hair during simulated criminal behaviour, J. Forensic Sci., 43 (6) (1998) 1257–1258, DOI: 10.1520/JFS14399J
[38] HAS Kennistransfer. Feiten & cijfers gezelschapsdierensector. Hogeschool HAS Den Bosch; 2011. [in Dutch]
[39] F. Taroni, A. Biedermann, J. Vuille, N. Morling, Whose DNA is this? How relevant a question? (a note for forensic scientists), Forensic Science International: Genetics, 7(4) (2013) 467-470, DOI: 10.1016/j.fsigen.2013.03.012
[40] C. Eichmann, B. Berger, W. Parson, A proposed nomenclature for 15 canine-specific polymorphic STR loci for forensic purposes. International journal of legal medicine, 118(5) (2004) 249-266, DOI: 10.1007/s00414-004-0452-5
[41] M. Dayton, M.T. Koskinen, B.K. Tom, A.M. Mattila, E. Johnston, J. Halverson, et al., Developmental validation of short tandem repeat reagent kit for forensic DNA profiling of canine biological material. Croatian medical journal, 50(3) (2009) 268-285, DOI: 10.3325/cmj.2009.50.268
[42] N. Schury, U. Schleenbecker, A.P. Hellmann, Forensic animal DNA typing: Allele nomenclature and standardization of 14 feline STR markers, Forensic Sci. Int. Genet., 12 (2014) 42-59, DOI: 10.1016/j.fsigen.2014.05.002
[43] N. Dawnay, R. Ogden, R.S. Thorpe, L. Pope, D.A. Dawson, R. McEwing, A forensic STR profiling system for the Eurasian badger: a framework for developing profiling systems for wildlife species. Forensic Science International: Genetics, 2(1) (2008) 47-53, DOI: 10.1016/j.fsigen.2007.08.006
[44] K.L. Dicks, L.M.I. Webster, I. McDowall, S.M. Muya, J. Hopper, P. O’Donoghue, P. Validation studies on dinucleotide STRs for forensic identification of black rhinoceros Diceros bicornis. Forensic Science International: Genetics, 26 (2017) e25-e27, DOI: 10.1016/j.fsigen.2016.10.016
[45] C. Howard, S. Gilmore, J. Robertson, R. Peakall, Developmental validation of a Cannabis sativa STR multiplex system for forensic analysis. Journal of forensic sciences, 53(5) (2008) 1061-1067, DOI: 10.1111/j.1556-4029.2008.00792.x
[46] M. Wesselink, I. Kuiper, Individual identification of fox (Vulpes vulpes) in forensic wildlife investigations. Forensic Science International: Genetics Supplement Series, 3(1) (2011) e214-e215, DOI: 10.1016/j.fsigss.2011.08.107
[47] H.M. Coyle, T. Palmbach, N. Juliano, C. Ladd, H.C. Lee, An overview of DNA methods for the identification and individualization of marijuana. Croatian medical journal, 44(3) (2003) 315-321.
[48] C. Howard, S. Gilmore, J. Robertson, R. Peakall, A Cannabis sativa STR genotype database for Australian seizures: forensic applications and limitations. Journal of forensic sciences, 54(3) (2009) 556-563, DOI: 10.1111/j.1556-4029.2009.01014.x
[49] W.J. Koopman, I. Kuiper, D.J. Klein-Geltink, G.J. Sabatino, M.J. Smulders, Botanical DNA evidence in criminal cases: knotgrass (Polygonum aviculare L.) as a model species, Forensic Sci. Int. Genet. 6(3) (2012) 366-374, DOI: 10.1016/j.fsigen.2011.07.013

Epilogue

Summary Samenvatting Overview of author contributions Acknowledgements

Summary

DNA markers for forensic identification of non-human biological traces

Although “the application of DNA in forensics” is often considered “the application of human DNA in forensics”, many other species can be of importance in forensic investigations. In this thesis, DNA markers are described that enable forensically relevant classification of three groups of non-human biological traces: fungi (Chapter 1), domestic cats (Chapters 2, 3 and 4) and birch trees (Chapters 5 and 6). The biological and forensic context applying to one or more of these groups is described in Chapter 7.

Obvious types of crimes in which non-human DNA is relevant are the possession of biological drugs such as marijuana, magic mushrooms or khat, and the possession of endangered species of plants or animals, or products thereof, such as ivory, rhinoceros horn or bush meat. In such cases, classification of biological samples is generally needed to establish whether an illegal action has taken place. However, as samples are often manipulated in ways that prevent identification by traditional morphological features, techniques utilizing DNA to identify their origin can answer questions that would otherwise remain unresolved. Although even minute samples may contain sufficient DNA to identify the origin of a sample, reliable methods need to be available that enable discrimination between one class and the others.

To discriminate between hallucinogenic species of mushrooms that are controlled by law and their non-potent neighbour species, several DNA markers were compared (Chapter 1). Following a theoretical examination of these markers, the DNA sequences of the markers ITS1, ITS1-5.8S-ITS2 and LSU were generated from multiple authenticated samples, market samples and police seizures. For the studied species, little or no within-species variation was detected for any of these markers. Different patterns of interspecies variation were detected for the different markers; the lowest interspecies variation was found for marker LSU. Therefore, this marker was considered less suitable than the other two markers to distinguish between species of mushrooms as closely related as some hallucinogenic species and their non-potent neighbours. Consequently, the use of ITS1-5.8S-ITS2 is advised. In an effort to incorporate previously published relevant fungal DNA sequences in the estimations of intra- and interspecies variation, DNA sequences deposited in the international database GenBank were added to the comparative analyses. Although this database is a valuable source of information, multiple misidentified or mislabelled sequences were recognized that have in the past led to invalid conclusions concerning marker suitability, and could potentially lead to misidentification of unknown samples.
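
The comparison of within-species and between-species variation for a marker can be sketched as follows. The sequences and species labels are invented toy data, and the uncorrected proportion of differing positions (p-distance) on pre-aligned sequences is an assumed, simplified measure, not necessarily the one applied in Chapter 1.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def marker_variation(samples):
    """Return (largest within-species, smallest between-species) distance.

    `samples` is a list of (species_label, aligned_sequence) pairs. A marker
    separates species well when the first value is clearly below the second.
    """
    within, between = [], []
    for (species_1, seq_1), (species_2, seq_2) in combinations(samples, 2):
        target = within if species_1 == species_2 else between
        target.append(p_distance(seq_1, seq_2))
    return max(within, default=0.0), min(between, default=0.0)


# Invented toy alignment for one marker in two hypothetical species.
toy_samples = [
    ("species_x", "ACGTACGTAC"),
    ("species_x", "ACGTACGTAC"),
    ("species_y", "ACGTTCGTAA"),
    ("species_y", "ACGTTCGTAA"),
]
max_within, min_between = marker_variation(toy_samples)
```

A single mislabelled database sequence would appear in such a comparison as an unexpectedly large within-species distance, or as a between-species distance of zero.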

A different type of classification is necessary when biological traces are not the main subject of an investigation. Non-human biological traces can be secured in the investigation of many different types of crime, for example when traces have been transferred from a victim or crime scene to a perpetrator, vehicle or tool. After such a trace is secured, the name of the species of origin no longer answers a forensically relevant question, but is merely the starting point of an investigation. Determining how large the class is from which the biological trace originated then becomes important. Based on the prevalence in case investigations, techniques were developed to obtain such information from cats (Chapters 2, 3 and 4) and birch (Chapters 5 and 6) for the two sample types that are most often encountered: cat hairs and birch seeds. Although parallels exist between developing DNA typing methods for these two types of trace evidence, the biological nature of the traces implies that completely different criteria must be met.

Shed animal hairs are a source of mitochondrial DNA, which is inherited maternally and is therefore indistinguishable between maternally related individuals. Several portions of the mitochondrial control region were selected and tested for their potential to differentiate between Dutch cats. Analysis of multiple portions of this control region proved to be the most informative. However, after applying this method to a large group of reference cats, one mitochondrial DNA type was found to be overrepresented in the Dutch population, a finding that had not previously been described for other cat populations (Chapter 2). To enable not only the efficient typing of reference cats, but also the typing of samples containing DNA of lesser quantity and quality, the technique was further improved (Chapter 3).

To further investigate the differences between the published distributions of feline mitochondrial DNA types, additional samples were sourced from the Netherlands, Belgium and Germany. A detailed comparison of these North West continental European cats with data from the United States of America, Canada, the United Kingdom and Poland demonstrated the value of the previously proposed DNA marker, as well as several peculiarities. Differences were recognized between the distributions of the North West continental European cat population on the one hand, and cats from the United Kingdom, Canada and parts of the United States of America on the other, as were differences between micropopulations in Europe. Analysis of this combined dataset also demonstrated the need to improve the nature of the data recorded and published, for which certain technical recommendations were proposed. Additionally, because of the observed variation between populations and datasets, the need to define the relevant reference population for evaluating data in a forensic investigation was pointed out (Chapter 4).
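Why the choice of reference population matters can be shown with a small sketch. It estimates the frequency of a mitochondrial DNA type as a plain proportion in a reference sample, which is a simplifying assumption. The haplotype labels and counts below are hypothetical and do not reproduce any dataset from the thesis.

```python
from collections import Counter


def haplotype_frequency(haplotypes, target):
    """Plain proportion of `target` among the observed haplotypes.

    This is a naive estimate; casework typically applies corrections
    for sampling uncertainty that are not shown here.
    """
    if not haplotypes:
        raise ValueError("reference sample is empty")
    return Counter(haplotypes)[target] / len(haplotypes)


# Hypothetical reference samples from two populations.
population_a = ["H1"] * 6 + ["H2"] * 3 + ["H3"] * 1
population_b = ["H1"] * 2 + ["H2"] * 2 + ["H3"] * 6

freq_a = haplotype_frequency(population_a, "H1")
freq_b = haplotype_frequency(population_b, "H1")
print(freq_a, freq_b)
```

In this toy example the same haplotype is three times as common in one population as in the other, so the weight assigned to a matching hair would change with the reference population chosen.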

Birch trees are ubiquitous in the Netherlands and traces from birch trees are often encountered as trace evidence, with seeds constituting the majority of traces. As opposed to cat hairs, seeds contain nuclear DNA, which can potentially be used to distinguish individual organisms. Therefore a method was developed and validated to type the nuclear DNA of birch trees. This method was tested on the two species of birch naturally occurring in the Netherlands, to determine the power of the technique to discriminate between individuals and to investigate whether differences could be observed between sampling locations. The variation between locations but within species was found to be smaller than the variation between species but within a location. Therefore separate databases were established for the two species to record the population data needed to evaluate evidence. Furthermore, to determine the evidential value of matching birch DNA profiles, different models were considered for these two species of trees, as one species is diploid and the other is tetraploid. For both species, the resulting random match probabilities were found to be highly relevant for forensic investigations. Additionally, the effect of humans interfering with birch reproduction through cloning or breeding of birch varieties was estimated, as clones have DNA profiles identical to their “mother’s” profile and therefore have a lower evidential value (Chapter 5).
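As a rough illustration of what a random match probability is, the sketch below applies the textbook product rule to a diploid profile, assuming Hardy-Weinberg proportions and independence between loci. This is not the model used in the thesis, which considered different models for the diploid and the tetraploid species. The locus names, alleles and frequencies are hypothetical.

```python
def locus_match_probability(genotype, freqs):
    """Genotype frequency at one diploid locus under Hardy-Weinberg.

    Homozygote (a, a): p^2. Heterozygote (a, b): 2pq.
    """
    a, b = genotype
    p, q = freqs[a], freqs[b]
    return p * p if a == b else 2 * p * q


def random_match_probability(profile, allele_freqs):
    """Multiply per-locus genotype frequencies (assumes independent loci)."""
    rmp = 1.0
    for locus, genotype in profile.items():
        rmp *= locus_match_probability(genotype, allele_freqs[locus])
    return rmp


# Hypothetical allele frequencies and profile, for illustration only.
allele_freqs = {
    "locus1": {"10": 0.5, "12": 0.25, "14": 0.25},
    "locus2": {"7": 0.5, "9": 0.5},
}
profile = {"locus1": ("10", "12"), "locus2": ("9", "9")}

rmp = random_match_probability(profile, allele_freqs)
print(rmp)
```

Each additional informative locus multiplies the probability down further. Clonally propagated trees break the assumption behind this calculation, since clones share a profile regardless of allele frequencies, which is why they carry lower evidential value.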

After the value of the developed technique was established, the type of birch trace most often encountered was further investigated. Both fertilized and unfertilized seeds are produced in large numbers by birch trees. Whilst fertilized seeds contain embryonic DNA and maternal DNA, unfertilized seeds contain only maternal DNA, making unfertilized seeds the most suitable traces to link to their trees of origin. However, as only minute amounts of DNA are present in these seeds, the developed technique required adaptation before robust DNA profiles could be obtained from seeds. Additionally, the framework for estimating the value of matching DNA profiles is affected when partial profiles are obtained and it is not feasible to determine whether a profile originated from a diploid or a tetraploid individual. The value of this method in a forensic investigation is illustrated by a case report, demonstrating both exclusions and matches with random match probabilities in the order of one in one million (Chapter 6).

Several general observations concerning the identification of species, the value of mitochondrial DNA profiles, and the development and validation of nuclear DNA typing methods hold not only for the species described in Chapters 1 to 6, but are also relevant for other (related) species. These include the value of curated databases and relevant reference collections, but also of laws that have a biological scientific basis and that are revised periodically. Extrapolation of these findings to other species, sample types and crime scenes will increase the applicability of non-human biological traces in forensic investigations (Chapter 7).


Summary (Samenvatting)

DNA markers for forensic identification of non-human biological traces

Within the forensic field, “DNA analysis” is often interpreted as “human DNA analysis”, even though many other kinds of organisms can play a significant role in forensic investigations. This thesis describes DNA markers for the forensically relevant classification of three groups of non-human biological traces: fungi (Chapter 1), domestic cats (Chapters 2, 3 and 4) and birch trees (Chapters 5 and 6). The general biological and forensic aspects that apply to one or more of these groups are described in Chapter 7.

Analysis of non-human DNA is an obvious choice in cases where the possession of specific biological material is forensically relevant. Examples include the possession of biological drugs such as marijuana, magic mushrooms or khat, as well as the possession of material from protected plant and animal species, such as ivory, rhinoceros horn or bushmeat. In such investigations, classification of biological samples is often necessary to determine whether or not a criminal offence has been committed. The samples to be examined have often been processed to such an extent that too few external characteristics remain for morphological identification. DNA analysis can then provide a solution. Although very small samples can already contain enough DNA to determine their origin, reliable methods that can distinguish one class from the others are required.

To distinguish between hallucinogenic mushrooms that are prohibited by law and the legal, most closely related species without hallucinogenic properties, several DNA markers were compared (Chapter 1). Following a theoretical examination, the DNA sequences of markers ITS1, ITS1-5.8S-ITS2 and LSU were generated from fungi of known origin and from seized samples. Within the fungal species studied, little or no within-species variation was observed. Different patterns of between-species variation were observed for the different markers. Marker LSU showed the lowest between-species variation, and was therefore considered less suitable for distinguishing between fungal species as closely related as hallucinogenic species and their non-hallucinogenic relatives. Ultimately, the use of marker ITS1-5.8S-ITS2 was recommended. In an attempt to include previously published DNA sequences of relevant fungal species in the estimates of within- and between-species variation, DNA sequences deposited in GenBank were included in the comparisons. Although this database is a valuable source of information, multiple sequences were found that had been misidentified or mislabelled. In the past these sequences have led to incorrect conclusions about the suitability of DNA markers, and they could potentially lead to misidentification of unknown samples.


A different kind of classification is necessary when a biological trace is not the main subject of an investigation. Non-human biological traces can also be secured in other types of crime, for example when they have been transferred from a victim or crime scene to a perpetrator, vehicle or tool. Once such a trace has been secured, the name of the species of origin no longer answers a forensically relevant question, but forms the starting point for a follow-up investigation. Determining the size of the class from which a biological trace originated then becomes valuable. Based on how frequently they occur in casework, techniques were developed to obtain such information for cats (Chapters 2, 3 and 4) and birches (Chapters 5 and 6), for the two types of traces encountered most often: cat hairs and birch seeds. Although parallels exist between setting up DNA typing methods for these two kinds of traces, the differences between hairs on the one hand and seeds on the other mean that different requirements must be met.

Shed animal hairs are a source of mitochondrial DNA, which is inherited through the female line and is therefore indistinguishable between mother and offspring, and hence identical between siblings and other maternal relatives. Various parts of the mitochondrial control region were selected and tested to determine whether they could be used to distinguish between Dutch cats. Analysis of several of these parts proved to provide the most information. After testing a large group of reference cats, one mitochondrial DNA type turned out to be overrepresented in the Dutch population, something that had not previously been described for other cat populations (Chapter 2). To allow not only efficient typing of reference cats but also application of the technique to samples containing less DNA or DNA of poorer quality, the technique was improved (Chapter 3).

To investigate the differences between published frequency distributions of feline mitochondrial DNA types, additional cat samples were collected in the Netherlands, Belgium and Germany. Comparison of these North West continental European cats with data collected in the United States, Canada, the United Kingdom and Poland illustrated the value of the previously described DNA marker. In addition, several peculiarities were observed. The mitochondrial DNA type distributions of cats from the Netherlands, Belgium and Germany on the one hand, and from the United Kingdom, Canada and parts of the United States on the other, differed from each other. Differences were also observed between micropopulations in Europe. Moreover, comparison of this combined dataset showed that it is desirable to improve the recording and publication of feline mitochondrial DNA data and metadata. Several technical recommendations were put forward for this purpose. Finally, the observed differences between populations and datasets demonstrated the need to define the relevant reference population that is used to evaluate observations in a forensic investigation (Chapter 4).


Birch trees are ubiquitous in the Netherlands. Traces from birch trees are regularly encountered in forensic investigations, with birch seeds making up the majority of these traces. Unlike cat hairs, birch seeds contain nuclear DNA, which has the potential to distinguish individual organisms. A method was therefore developed and validated to type the nuclear DNA of birch trees. This method was tested on the two birch species that occur naturally in the Netherlands, to determine how well the technique can discriminate between individual trees and to investigate whether differences can be observed between sampling locations. The differences between locations but within a species proved to be smaller than the differences between species but within a location. For this reason two separate databases were set up to record the population data needed to evaluate forensic results. Furthermore, different models were considered for determining the value of matching birch DNA profiles, as one species is diploid and the other tetraploid. For both species the calculated random match probabilities proved to be so low that they are forensically relevant. In addition to estimating the random match probability for naturally occurring birches, the influence of human propagation of birches by cuttings and by breeding was estimated, as a cutting has the same DNA profile as its “mother”, which leads to a lower evidential value (Chapter 5).

After the value of this technique was established, the type of birch trace most often encountered was investigated further. Birch trees produce large numbers of fertilized and unfertilized seeds. Whereas fertilized seeds contain both embryonic and maternal DNA, unfertilized seeds contain only maternal DNA. Unfertilized seeds are therefore the most suitable for linking to a tree of origin. Because only a small amount of DNA is present in such seeds, the developed technique had to be optimized further before robust DNA profiles could be obtained from seeds. In addition, a method was developed to determine the value of matching DNA profiles when only partial profiles have been obtained, particularly because such profiles do not allow an unambiguous determination of whether they originate from a diploid or a tetraploid tree. The added value of this method in a forensic investigation is demonstrated with a case example, which features both exclusions and matches (Chapter 6).

Several observations on the identification of species, the value of mitochondrial DNA profiles, and the development and validation of nuclear DNA typing methods apply not only to the species described in Chapters 1 to 6, but also to other (related) species. Examples are the value of curated databases and relevant reference collections, but also laws with a scientific biological basis that are revised regularly. Extrapolating these findings to other species, sample types and types of crime will increase the applicability of non-human traces in forensic investigations (Chapter 7).


Overview of author contributions


Outline of this thesis

M. Wesselink (PhD Candidate): Authored text of this chapter.
I. Kuiper: Reviewed the final manuscript.

Chapter 1 Molecular species identification of “Magic Mushrooms”

The work in this chapter has sparked an international collaboration. At the time of submission of this dissertation to the Doctorate Committee, a manuscript describing this extended study is in preparation.
M. Wesselink (PhD Candidate): Development of the concept, performed laboratory procedures in the Netherlands, comparative studies, authored text of this chapter.
E.M. van Ark: Assistance with laboratory procedures.
I. Kuiper: Discussion and development of the concept and supervision of the research, review of the manuscript.

Chapter 2 Forensic utility of the feline mitochondrial control region - A Dutch perspective

This chapter was originally published as: M. Wesselink, L. Bergwerff, D. Hoogmoed, A.D. Kloosterman, I. Kuiper, Forensic utility of the feline mitochondrial control region - A Dutch perspective, Forensic Science International: Genetics, 17 (2015) 25-32.
M. Wesselink (PhD Candidate): Development of the concept, (supervision of) laboratory procedures, comparative studies, authored text of this chapter.
L. Bergwerff: Performed the majority of the laboratory procedures.
D. Hoogmoed: Assistance with laboratory procedures.
A.D. Kloosterman: Discussion of the most important messages based on an early draft of the manuscript.
I. Kuiper: Discussion and development of the concept and supervision of the research, review of the manuscript.


Chapter 3 Forensic analysis of mitochondrial control region DNA from single cat hairs

This chapter was originally published as: M. Wesselink, L. Bergwerff, I. Kuiper, Forensic analysis of mitochondrial control region DNA from single cat hairs, Forensic Sci. Int.: Genetics Supplement Series 5 (2015) e564-e565.
M. Wesselink (PhD Candidate): Development of the concept, laboratory procedures, authored text of this chapter.
L. Bergwerff: Assistance with laboratory procedures.
I. Kuiper: Discussion and development of the concept and supervision of the research, review of the manuscript.

Chapter 4 Local populations and inaccuracies: Determining the relevant mitochondrial haplotype distributions for North West European cats

This chapter was originally published as: M. Wesselink, S. Desmyter, I. Kuiper, Local populations and inaccuracies: Determining the relevant mitochondrial haplotype distributions for North West European cats, Forensic Science International: Genetics, 30 (2017) 71-80.
M. Wesselink (PhD Candidate): Development of the concept, laboratory procedures, data analysis, authored text of this chapter. Performed revisions to satisfy reviewer concerns.
S. Desmyter: Discussion and development of the concept, reviewed the final manuscript.
I. Kuiper: Discussion and development of the concept and supervision of the research, review of the manuscript.

Chapter 5 DNA typing of birch: Development of a forensic STR system for Betula pendula and Betula pubescens

At the time of submission of this dissertation to the Doctorate Committee, the work in this chapter is under review as a revised manuscript submitted to the journal Forensic Science International: Genetics.
M. Wesselink (PhD Candidate): Co-authored text of this chapter, development of the concept, (supervision of) sequencing and genotyping, data analysis.
A. Dragutinović: Co-authored text of this chapter, development of the concept, (supervision of) genotyping, data analysis.
J. W. Noordhoek: Performed sampling and genotyping of the majority of the population samples.
L. Bergwerff: Performed sequencing of alleles.
I. Kuiper: Discussion and development of the concept and supervision of the research, review of the manuscript.


Chapter 6 The forensic potential of DNA typing of birch (Betula) seeds

At the time of submission of this dissertation to the Doctorate Committee, the work in this chapter is under review as a manuscript submitted to the journal Forensic Science International: Genetics.

M. Wesselink (PhD Candidate): Co-authored text of this chapter, development of the concept, laboratory procedures, data analysis.
A. Dragutinović: Co-authored text of this chapter, development of the concept, laboratory procedures.
E.M. van Ark: Assistance with laboratory procedures.
I. Kuiper: Discussion and development of the concept and supervision of the research, review of the manuscript.

Chapter 7 General discussion

M. Wesselink (PhD Candidate): Authored text of this chapter.
A.D. Kloosterman: Discussion of the most important messages based on an early draft of the manuscript.
I. Kuiper: Reviewed the final manuscript.
