<<

Forensic Science International: Genetics 23 (2016) 197–209

Contents lists available at ScienceDirect

Forensic Science International: Genetics

journa l homepage: www.elsevier.com/locate/fsig

Forensic timber identification: a case study of a CITES listed ,

Gonystylus bancanus ()

a, a a a

Kevin Kit Siong Ng *, Soon Leong Lee , Lee Hong Tnah , Zakaria Nurul-Farhanah ,

a a b c d e

Chin Hong Ng , Chai Ting Lee , Naoki Tani , Bibian Diway , Pei Sing Lai , Eyen Khoo

a

Genetics Laboratory, Research Institute , 52109 Kepong, Selangor Darul Ehsan, Malaysia

b

Forestry Division, Japan International Research Center for Agricultural Sciences, Ohwashi, Ibaraki 305-8686, Japan

c

Sarawak Forestry Corporation, Botanical Research Centre Semenggoh, KM20, Jalan Puncak , 93250 Kuching, , Malaysia

d

Malaysia Pepper Board, Lot 1115, Jalan Utama, Tanah Putih, 93916 Kuching, Sarawak, Malaysia

e

Forest Research Centre, KM 23, Labuk Road, Sepilok, 90715 Sandakan, Sabah, Malaysia

A R T I C L E I N F O A B S T R A C T

Article history:

Received 25 April 2015 and smuggling of (Thymelaeaceae) poses a serious threat to this

Received in revised form 6 May 2016 fragile valuable peat swamp timber species. Using G. bancanus as a case study, DNA markers were used to

Accepted 7 May 2016 develop identification databases at the species, population and individual level. The species level

Available online 10 May 2016

database for Gonystylus comprised of an rDNA (ITS2) and two cpDNA (trnH-psbA and trnL) markers based

on a 20 Gonystylus species database. When concatenated, taxonomic species recognition was achieved

Keywords:

with a resolution of 90% (18 out of the 20 species). In addition, based on 17 natural populations of

DNA barcoding

G. bancanus throughout West (Peninsular Malaysia) and East (Sabah and Sarawak) Malaysia, population

DNA profiling

and individual identification databases were developed using cpDNA and STR markers respectively. A

Ramin

haplotype distribution map for Malaysia was generated using six cpDNA markers, resulting in 12 unique

Random match probability

multilocus haplotypes, from 24 informative intraspecific variable sites. These unique haplotypes suggest

Short tandem repeats (STRs)

a clear genetic structuring of West and East regions. A simulation procedure based on the composition of

Timber tracking

the samples was used to test whether a suspected sample conformed to a given regional origin. Overall,

the observed type I and II errors of the databases showed good concordance with the predicted 5%

threshold which indicates that the databases were useful in revealing provenance and establishing

conformity of samples from West and East Malaysia. Sixteen STRs were used to develop the DNA profiling

databases for individual identification. Bayesian clustering analyses divided the 17 populations into two

main genetic clusters, corresponding to the regions of West and East Malaysia. Population substructuring

(K = 2) was observed within each region. After removal of bias resulting from sampling effects and

population subdivision, conservativeness tests showed that the West and East Malaysia databases were

conservative. This suggests that both databases can be used independently for random match probability

estimation within respective regions. The reliability of the databases was further determined by

independent self-assignment tests based on the likelihood of each individual’s multilocus genotype

occurring in each identified population, genetic cluster and region with an average percentage of

correctly assigned individuals of 54.80%, 99.60% and 100% respectively. Thus, after appropriate validation,

the genetic identification databases developed for G. bancanus in this study could support forensic

applications and help safeguard this valuable species into the future.

ã 2016 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

In recent years, large areas of forest have been lost due to

conversion for agricultural use or degradation through poor

* Corresponding author.

E-mail addresses: [email protected] (K.K.S. Ng), [email protected] (S.L. Lee), logging practices neglectful of sustainability and biodiversity [1].

[email protected] (L.H. Tnah), [email protected] (Z. Nurul-Farhanah), As tropical forest degradation and deforestation continue, the loss

[email protected] (C.H. Ng), [email protected] (C.T. Lee), [email protected]

of biodiversity in the tropics also occurs in tandem. Illegal logging

(N. Tani), [email protected] (B. Diway), [email protected] (P.S. Lai),

and the international trade in illegally sourced timber are also

[email protected] (E. Khoo).

http://dx.doi.org/10.1016/j.fsigen.2016.05.002

1872-4973/ã 2016 Elsevier Ireland Ltd. All rights reserved.

198 K.K.S. Ng et al. / Forensic Science International: Genetics 23 (2016) 197–209

major contributors to deforestation and biodiversity losses. It has polymorphic nuclear short tandem repeat (STR) markers could be

been estimated that 50% of the timber traded in the market from used to generate DNA profiling databases, in which a suspected log

the Amazonian Basin, Central Africa, Southeast Asia and the could be matched with its original stump [16,17]. To ensure

Russian Federation are from illegal sources [2]. valuable species are not lost forever due to illegal logging, all three

The Convention on International Trade in Endangered Species DNA-based methods require swift development of large compre-

of Wild Fauna and Flora (CITES) is the most prevalent convention in hensive databases to be incorporated into the traceability systems.

relation to rare and threatened species. Currently, there are 181 In principle, the species of unknown samples can be

parties in CITES seeking to build the capacity of more effective law identified easily with a set of cpDNA and/or rDNA markers based

enforcement to restrict trade of species listed under the Conven- on sequence variation to a known species database [18].

tion. Listings on CITES are currently the legal mechanism by which Subsequently, the geographical origin of wood samples can then

importing countries can take action to halt shipments of such be examined with a number of cpDNA markers that clear

items. Consuming countries must only allow the import of timber geographical patterns within the species distribution range. A

and timber products with official CITES permits from exporting previous study on odorata throughout Mesoamerica using

parties. For example, an exporting country would require a CITES cpDNA markers indicated a strong geographical pattern that can be

export permit and a certificate of origin guaranteeing the product used to determine the geographical origin of C. odorata timbers

did not originate elsewhere. All CITES parties are required to issue [19]. It was also proven possible on European [14], where the

CITES certificates for any re-exportation, and must only issue these strong geographical structure and differentiation of the western

if they are satisfied that the wood has been imported legally with versus eastern populations were used to develop an oak wood

CITES documents. Unfortunately, due to fraud and poor gover- traceability system. In addition, a recent study demonstrated that

nance, the implementation of reliable certification methods Neobalanocarpus heimii, a dipterocarp timber species, could be

presents a challenge in many countries [3]. distinguished from the northern and southern regions of

Numerous other initiatives have been implemented to deal Peninsular Malaysia using cpDNA markers [15]. In that study,

with illegally sourced timber and timber products. In 2008, the partitioning of individuals specifically to the source population was

USA amended its Lacey Act to prohibit all trade in timber and rather limited due to the paucity of variable sites in the cpDNA

timber products that are illegally sourced within or outside its used. In another related study, partitioning of N. heimii individuals

borders. In 2003, the European Union (EU) developed the Forest to specific populations was achieved because of the availability of

Law Enforcement, Governance and Trade (FLEGT) Action Plan an individual identification database generated using STR markers,

which provides a number of measures to exclude illegal timber along with the use of individual assignment test based on

from the markets, improve the supply of legal timber and increase multilocus genotypes [16]. An assignment test can provide a

the demand for wood products from legal sources. The two main likelihood that an unknown sample originated in a given

elements of the action plan are the European Union Timber population [20–23].

Regulations (EUTR) and Voluntary Partnership Agreements (VPAs). Herein, DNA-based technology was employed for species,

On 3 March 2013, the new EUTR came into effect to stop the population and individual identification of a CITES listed species,

circulation of illegally logged wood in the EU. Under the EUTR, only Gonystylus bancanus distributed throughout Malaysia. This study

wood carrying a FLEGT license or CITES permit is considered legally aimed to develop a timber tracking in the context of illegal

sourced. VPAs are trade agreements with timber exporting logging and chain-of-custody certification, specifically to (1)

countries that help prevent illegal timber from being placed on identify markers suitable for species identification databases;

the European market. There are currently six countries (Cameroon, (2) identify specific chloroplast haplotypes that could be used to

Central African Republic, Ghana, , Liberia and Republic of generate a population identification database; (3) generate a DNA

the Congo) that have signed VPAs with the EU. profiling database for individual identification of G. bancanus based

Low technology such as (conventional and on a subpopulation-cum-inbreeding model; and (4) evaluate the

ultraviolet) have been widely used to mark and track the flow of reliability of the individual identification database as a reference

timber. However, these relatively inexpensive tools are not database using Bayesian assignment approaches.

particularly useful in broader-based or national-level efforts to

control illegal logging or trade as they are easily reproduced by 2. Materials and methods

unscrupulous individuals. More stringent national tracking sys-

tems involving a combination of databases have been imple- 2.1. Species description

mented, such as the physical tagging of logs and spot-checking [4].

Unfortunately, most of these timber tracking systems rely mostly Gonystylus is one of the three genera of under the

on tags or certificates of origin issued at source which are Gonystylus subfamily of the Thymelaeaceae family. Gonys-

vulnerable to falsification in many countries [5]. Although, fourier tylus consists of approximately 30 species of tall and shrubs

transform near-infra-red spectroscopy and isotopic analyses have [24]. The most common generic name for Gonystylus and also the

been used to reveal provenance [6–9], these are limited by trade name of its timber is ‘Ramin’ [25] with the most important

inconsistent results [6,8]. source being the species Gonystylus bancanus (Miq.) Kurz native to

DNA markers could be applied for forensic timber identification , Indonesia (Kalimantan and Sumatra) and Malaysia (Malay

in three ways, namely (1) for species identification; ribosomal Peninsula, Sabah and Sarawak). Gonystylus bancanus grows mainly

nuclear DNA (rDNA) markers such as the internal transcribed in peat swamp , where it dominates the vegetation and can

spacer (ITS) is a powerful phylogenetic marker at the species level be locally common. Decline in peat swamp vegetation and logging

showing high levels of interspecific divergence [10,11] and have led to concerns about the conservation status of this species;

chloroplast DNA (cpDNA) markers have been widely used for assigned as Vulnerable (VU) and at high risk of extinction in the

DNA barcoding technology due to their highly conserved region wild in the medium-term future (A1 cd) under the IUCN Red List

and relatively low evolutionary rates compared with nuclear Categories and Criteria (version 2.3) [26]. Gonystylus bancanus

markers [12,13]. (2) for population identification; cpDNA markers populations are becoming increasingly fragmented and scattered

that are variable enough to reveal geographical structure could be due to deforestation. Owing to the high demand for its timber in

used to differentiate the origin of one source of timber from the international market, illegal harvesting and smuggling of this

another [14,15], and (3) for individual identification; a set of highly species have become rampant [27]. It is expected that the species

K.K.S. Ng et al. / Forensic Science International: Genetics 23 (2016) 197–209 199

might become threatened in the near future. Presently, it is listed in Specifically, 8 mL of ExoSAP-IT (50X dilution) was added to 2 mL of

CITES Appendix II, which means that exporting G. bancanus wood each PCR product and incubated at 37 C for 30 min to degrade the

has to be accompanied by a CITES export permit indicating it is remaining primers and nucleotides. The mixture was incubated at

legally acquired and its export not detrimental to the survival of 80 C for 20 min to inactivate the ExoSAP-IT reagent.

the species in the wild [27]. The cleaned products were sequenced in both directions using

the BigDye Terminator Sequencing Kit v.3.1 (Applied Biosystems)

2.2. Sample collection and DNA extraction based on the standard dideoxy-mediated chain termination

method. The sequencing thermal profile was 50 cycles at 96 C

For species identification, a total of 20 Gonystylus species were for 10 s, 50 C for 5 s and 60 C for 4 min on a GeneAmp PCR System

used (Table S1). Each species was represented by two to five 9700. Sequencing reactions were purified by ethanol precipitation

samples. Voucher specimens for each of the samples are kept at the and run on the ABI 3130l Genetic Analyzer (Applied Biosystems).

herbarium of the Botanical Research Centre (BRC), Kuching, Sequencing data were edited and assembled using SEQUENCHER

Sarawak, Malaysia. v.5.0 (Gene Code Corporation). The aligned sequence from each

A total of 747 G. bancanus individuals with more than 1 cm sample was trimmed to the same length.

diameter at breast height (dbh) were sampled from 17 natural

populations throughout Malaysia of which seven populations 2.3.2. Species identification database

comprising 280 individuals from West Malaysia and ten popula- The informative variable sites, both substitutions and insertions

tions comprising 467 individuals from East Malaysia (Table 1 and and deletions (indels), which distinguish Gonystylus species, were

Fig. 1). extracted (Table S3). The sequences of each region (trnH-psbA, trnL

All samples were collected either in the form of inner bark or and ITS2) and a concatenated sequence comprising all three

leaf tissues. Genomic DNA was extracted using the Cetyl regions (trnH-psbA + trnL + ITS2) were analyzed using three differ-

trimethylammonium bromide (CTAB) procedure [28] with modi- ent approaches to ascertain the resolution of the species

fication (2x CTAB), and further purified using High Pure PCR identification database. First, a nearest-neighbour approach;

Template Preparation Kit (Roche Diagnostics GmbH). sequences were aligned using MUSCLE [29] and a Neighbour-

Joining (NJ) was constructed using the Kimura 2-parameter

2.3. Species identification (K2P) model with 1000 bootstrap replicates implemented in

MEGA6 [30]. Secondly, a barcoding gap approach, in which

2.3.1. PCR amplification and sequencing intraspecific variation and inter-specific divergences are quantified

All extracted DNA from the 20 species, comprising 76 samples across Gonystylus species using the generated K2 P distance matrix

underwent PCR amplification using the Type-it Microsatellite-PCR [31]. In this approach, a species is considered to be resolved if the

Kit (Qiagen). One rDNA, internal transcribed spacer 2 (ITS2) and minimum interspecific distance is greater than the maximum

three cpDNA non-coding intergenic regions (trnL and trnH-psbA) intraspecific distance [32]. Finally, the standalone Basic Local

were selected for species identification (Table S2). The PCR Alignment Search Tool (BLAST) approach which uses the tradi-

amplifications were performed in 10 mL reactions consisting of tional BLAST algorithm to compute pairwise alignments between

approximately 10 ng of template DNA, 1X Type-it Mutiplex PCR reference and query sequences, and provides an alignment score

Master Mix (Qiagen), and 0.3 mM of each primer. The reaction for interpreting similarity. Briefly, for this approach, a self database

mixture was subjected to amplification using GeneAmp PCR was prepared consisting of a set of sequences generated from

System 9700 (Applied Biosystems) for an initial denaturing step of different Gonystylus species and then the same set of sequences

94 C for 5 min, 35 cycles of 95 C for 30 s, 50–55 C annealing was assigned as query sequences. The query sequences were

temperature for 90 s, and 72 C for 1 min. This was followed by blasted against the database. Successful species identification

further primer extension at 60 C for 30 min. The PCR products occurred when an alignment score of 100% identity with sequences

were purified using ExoSAP-IT PCR Product Cleanup (Affymetrix). of the same species.

Table 1

Locations of the 17 populations of Gonystylus bancanus and the number of samples used for individual and population identification.

a

Region Locality Population acronyms Latitude Longitude Number of samples used

Individual ID Population ID

0 0

West Malaysia 1. Pekan FR, Pahang PE N03 25 08.1” E103 21 17.8” 189 25

0 0

2. Nenasi FR, Pahang NE N03 03 32.2” E103 22 08.9” 15 15

0 0

3. Resak FR, Pahang RE N03 01 19.2” E103 20 56.3” 10 10

0 0

4. Kedondong FR, Pahang KE N03 16 17.4” E103 12 57.7” 4 4

0 0

5. Air Hitam FR, Johor AH N02 01 55.1” E102 48 33.1” 28 16

0 0

6. Sg. Karang FR, Selangor SK N03 42 20.5” E101 08 13.8” 14 14

0 0

7. Raja Musa FR, Selangor RM N03 26 59.9” E101 16 21.3” 20 16

0 0

East Malaysia 8. Sedilu FR, Sarawak SE N01 22 24.3” E110 49 39.9” 31 16

0 0

9. Lingga FR, Sarawak LI N01 21 13.1” E111 10 29.6” 30 16

0 0

10. Serapau-Lingga FR, Sarawak SL N01 19 08.6” E111 12 00.1” 30 16

0 0

11. Maludam NP, Sarawak MA N01 37 44.5” E111 03 29.2” 36 16

0 0

12. Manggut FR, Sarawak MN N01 28 31.2” E111 15 54.3” 53 16

0 0

13. Betong FR, Sarawak BE N01 25 12.4” E111 23 06.6” 40 16

0 0

14. Naman FR, Sarawak NA N02 11 02.8” E111 50 15.6” 55 16

0 0

15. Loagan Bunut NP, Sarawak LB N03 47 29.9” E114 14 10.6” 101 32

0 0

16. Kayangeran FR, Sarawak KG N04 53 12.7” E115 25 19.7” 29 16

0 0

17. Klias FR, Sabah KL N05 17 12.1” E115 37 44.8” 62 16

a

FR-Forest Reserve and NP-National Park.

200 K.K.S. Ng et al. / Forensic Science International: Genetics 23 (2016) 197–209

Fig. 1. Locations of the 17 Gonystylus bancanus populations and the number of samples collected (n) throughout Malaysia.

2.4. Population identification reference to the populations of origin based on the significant

intraspecific variable sites.

2.4.1. PCR amplification and sequencing A statistical procedure based on either presence or absence of a

A total of 31 cpDNA universal primer pairs for higher plants [33] given haplotype in the sample was used to test whether a sample

were initially screened for 8 individuals from different popula- conformed to its stated geographical origin [14]. The null

tions. Subsequently, a subset of 276 of the 747 G. bancanus samples hypothesis HO: ‘the sample is of the presumed origin’ was defined.

from the 17 populations (Table 1 and Fig. 4) was subjected to For each haplotype i identified in the samples, p(i) indicates its

cpDNA analysis to identify intraspecific variability. Six cpDNA non- frequency in the region of presumed origin, while N denotes the

coding regions were found to be informative in the characteriza- number of samples. The probability P(i) of observing a rare

tion of haplotypes in G. bancanus: the intergenic spacers of haplotype at least once is an indication of non-conformity, but the

trnK-rps16 [34], rpoC1-2 [33], trnE-trnT [35,36], ycf9-trnG [37,33], value would depend on the sample size. By using the classical

psbM-trnD [37,33] and trnG-rps14 [35]. PCR amplifications were threshold of p(i) = 0.05 as the criterion for rarity:

m

performed in 20 L reactions consisting of approximately 10 ng of N

if p(i) < 0.05, then P(i) = 1 [1 p(i)] ,

template DNA, 50 mM of KCl, 20 mM of Tris-HCl (pH 8.0), 1.5 mM of

MgCl2, 0.4 mM of each primer, 0.2 mM of each dNTP and 1 U of

GoTaq Flexi DNA polymerase (Promega). The reaction mixture was

if p(i) 0.05, then P(i) = 1.

subjected to amplification using GeneAmp PCR System 9700

(Applied Biosystems) for an initial denaturing step of 94 C for The probability P to observe a particular configuration of

5 min, 40 cycles of 94 C for 1 min, 50 C annealing temperature for haplotypes in the sample is given by P = PP(i). If P < S, with S being

1 min, and 72 C for 1 min. This was followed by a nal primer the maximum risk accepted and set to 0.05, with a risk a S (a is

extension at 72 C for 8 min. the probability of type I error), the hypothesis HO is rejected and

fi fi

PCR products were puri ed using the MinElute PCR Puri cation the sample is declared as non-conforming.

Kit (Qiagen) and sequenced in both directions using the BigDye Type I (false positive) and type II (false negative) errors of the

Terminator Sequencing Kit v.3.1 (Applied Biosystems) based on the population identification database (West and East Malaysia) were

standard dideoxy-mediated chain termination method. The estimated by simulating 10,000 artificial sample origins from

sequencing thermal pro le was 50 cycles at 96 C for 10 s, 50 C either the western or eastern provenance. The artificial samples

for 5 s and 60 C for 4 min on a GeneAmp PCR System 9700. were designed to have random haplotypes that were proportional

Sequencing reactions were puri ed by ethanol precipitation and to the haplotype frequencies determined in the 17 populations of

run on the ABI 3130xl Genetic Analyzer (Applied Biosystems). G. bancanus in Malaysia. The conformity of each artificial sample

Sequence data were edited and assembled using SEQUENCHER that originated from the western or eastern provenance was tested

v.5.0 (Gene Code Corporation). Haplotypes were determined based against the population identification databases. The proportion of

on nucleotide substitutions and indels. samples that were considered as non-conforming was measured.

2.4.2. Characterization of the population identi cation database and 2.5. Individual identification

conformity of origin

As a result of its uniparental inheritance, cpDNA was expected 2.5.1. PCR amplifications and reproducibility

to show pronounced levels of population differentiation with a All 747 G. bancanus DNA samples obtained from the 17

large proportion of population speci c haplotypes. Using the populations were used to generate a comprehensive individual

cpDNA markers, a haplotype distribution map throughout identification database for forensic DNA profiling (Table 1). Sixteen

Malaysia was generated. The population identi cation database STR loci developed specifically for G. bancanus: WGb06, WGb17,

was constructed based on 276 samples, which was assigned as WGb22, WGb24, WGb29, WGb32, WGb37, WGb38 and WGb39 [38];

K.K.S. Ng et al. / Forensic Science International: Genetics 23 (2016) 197–209 201

and Gba028, Gba092, Gba108, Gba129, Gba147, Gba348 and Gba430 of individuals sampled from a population and 2n is the number of

[39], which showed specific amplification, single-locus mode of chromosomes counted because autosomes are in pairs due to

inheritance, absence of mononucleotide repeat motifs and null inheritance of one allele each from the paternal and maternal

alleles were selected for DNA profiling [38–40]. Multiplex-PCR parents [57].

amplification was carried out using the Type-it Microsatellite-PCR

Kit (Qiagen). The multiplex-PCR amplifications were performed in 2.5.4. Conservativeness of the individual identification databases

8 mL reactions, with 5 ng template DNA, 1X Type-it Multiplex PCR The profile frequency or random match probability was

Master Mix and 0.067 mM of each primer. The reaction mixture calculated based on the subpopulation-cum-inbreeding model

was subjected to amplification using a GeneAmp PCR System 9700 [58] for the individuals from West Malaysia and East Malaysia;

(Applied Biosystems), for an initial denaturing step of 5 min at excluding those with missing alleles. The profile frequency was

95 C, followed by 35 cycles of 95 C (30 s), 50 C to 57 C (90 s), and calculated by first considering the genotype frequency for each

72 C (30 s), with a 30 min extension at 60 C. The PCR products locus and then multiplying the frequencies across all loci, while the

were electrophoresed along with GeneScan ROX 400 (Applied random match probability is simply the reciprocal of the profile

Biosystems) internal size standard on an ABI 3130xl Genetic frequency [54,59]. To examine the conservativeness of the West

Analyzer (Applied Biosystems). Allele sizes were assigned against and East Malaysia databases, the full profile frequency was

the internal size standard and individuals were genotyped using calculated based on the cognate database (Porigin) against the

GENEMAPPER v.4.0 (Applied Biosystems). Reproducibility of all profile frequency calculated using the West Malaysia database

STR markers was tested by performing five independent PCR (Pcombined). Similarly, the full profile frequency was calculated for

amplifications on one individual and comparing genotype data East Malaysia database. The differences (d) between the databases

[41]. were defined as: d = log10(Porigin/Pcombined). If the d value was

negative, then Porigin was less than Pcombined suggesting that the

2.5.2. Population structure and genetic clustering database was conservative [60].

A model-based clustering approach, employing a Bayesian For a non-conservative database, this could be ascribed to the

algorithm using STRUCTURE v.2.3.3 [42,43] was applied to infer the effect of sampling or population subdivision [61]. To negate the

genetic structure of the G. bancanus populations in West Malaysia problem of subpopulation that is either part of, or not represented

(7 populations), East Malaysia (10 populations), and combined in the working database, adjusting the profile frequency by the use

Malaysia (17 populations). Three independent runs were per- of u is recommended [62,63]. A series of u were applied to correct

formed by setting the number of groups (K), each run consisting of data comprising different subpopulations, of which Porigin was not

a burn-in period of 250,000 iterations followed by 850,000 Monte adjusted but Pcombined was recalculated. The effectiveness of u

Carlo Markov Chain iterations, assuming an admixture model and adjustment was then measured by calculating mean d, its standard

correlated allele frequencies. For other settings, the default deviation and proportion of samples where d < 0.

program was applied and no prior information was used to define

the groups. The most likely number of groups (K) was chosen based 2.5.5. Individual assignment test

on the Delta K statistic [44] and the data was analyzed by the online To ensure the reliability of the generated G. bancanus databases

version of STRUCTURE HARVESTER [45]. After the relevant K value (West and East Malaysia) in inferring the origin of any individuals,

was verified, individual admixture proportions were sorted and a three independent self-assignment approaches were established:

graphical representation of the clustering was generated using (i) by population (existing designated populations), (ii) by group

DISTRUCT v.1.1 [46]. (groups defined by population genetic structure and genetic

Genetic relatedness among populations was quantified using relatedness analyses) and (iii) by region (West and East Malaysia).

Chord distance, DC [47], and cluster analysis on DC via the The likelihood of each reference individual’s multilocus STR

neighbour-joining method [48]. Chord distance was selected genotype occurring in each identified population, group or region

because it has been shown by simulation analysis to generate was determined by performing independent assignment tests on

correct tree topologies regardless of the STR mutation model [49]. the reference databases using the leave-one-out procedure (self-

The relative strengths of the nodes were determined by boot- assignment) in GENECLASS2 [20], assuming random mating and

strapping analysis (1000 replicates) using the program POWER- independence between loci [21].

MARKER [50].

2.5.3. Characterization of the individual identification databases 3. Results

The allele frequency of each locus was computed using the

program MICROSATELLITE TOOLKIT [51]. The information content 3.1. Species identi cation

of the STR loci were assessed by calculating the number of alleles

per locus (A), observed heterozygosity (Ho) and expected High quality sequences for all three regions, two cpDNA (trnH-

heterozygosity (He) using the program GDA v.1.0 [52]. The psbA and trnL) and an rDNA (ITS2) were obtained in 93.42% of the

polymorphic information content (PIC), power of discrimination tested samples (71 out of 76 samples). All sequences have been

(PD) and match probability (MP) were estimated using the deposited in GenBank under Accession numbers KU990894

program POWERSTATS v.1.2 [53]. The match probability refers to KU991106. In the NJ tree method, the identity of a species is

the probability that two randomly selected individuals will have an considered to be resolved if the accessions under the species form a

identical genotype [54]. The evaluations of Hardy-Weinberg monophyletic group. In this method, based on the individual

equilibrium (HWE) and linkage equilibrium (LE) were performed regions used, the species discrimination was the highest for ITS2 at

using the program GENEPOP v.3.4 [55]. A sequential Bonferroni 80% (16 of the 20 Gonystylus species), however the remaining two

test was then used to compensate for multiple comparisons regions at 15% (3 of the 20 Gonystylus species) each. Higher species

between loci [56]. The coancestry coefficient (u) and inbreeding discrimination was achieved with 90% (18 of the 20 Gonystylus

coefficient (f) were estimated using the program GDA v.1.0 [52] and species) by concatenating the sequences of the three regions (Fig. 2

tested for significance by bootstrapping of 1000 replicates across and Table S4). The two unresolved species were G. augescens and G.

loci. The minimum allele frequency of 5/2n was calculated for the xylocarpus. Failure to achieve a 100% rate of species discrimination

adjustments of random match probability, where n is the number was due to the absence of sequence variation in these two species.

202 K.K.S. Ng et al. / Forensic Science International: Genetics 23 (2016) 197–209

Fig. 2. Neighbor joining tree of 20 Gonystylus species (Kimura-2-parameter, 1000 bootstraps).

Barcoding gap analysis on trnH-psbA and trnL showed 15% (3 of with BLAST E-value parameter. The result showed that 18 out of the

the 20 Gonystylus species) species resolution compared to that of 20 Gonystylus species were correctly identified to the same species

ITS2 with 80% (16 of the 20 Gonystylus species). The concatenated and the remaining two unresolved species were G. augescens and G.

sequences of the three regions showed high sequence variation xylocarpus.

between species (interspecific divergence) and low sequence

variation within species (intraspecific variation) for 18 out of the 20 3.2. Population identification

Gonystylus species (Fig. S1). This showed that 90% of the Gonystylus

species had higher interspecific distance compared to intraspecific 3.2.1. Haplotype distribution

distance. Similar to the NJ method, the two unresolved species In total, 24 intraspecific variable sites were detected within the

were G. augescens and G. xylocarpus (intraspecific = interspecific). six noncoding regions. Among these variable sites, seven sites were

The results for the standalone BLAST methodology (requiring an found in the trnK-rps16 spacer, six in the ycf9-trnG spacer, four in

identity score of 100% to assume species resolution) followed the trnE-trnT spacer, three in both the psbM-trnD and trnG-rps14

closely those of the previous two methods, with both the spacers and one in the rpoC1-2 spacer. In particular, 12 variable

concatenated sequences and ITS2 performing well (90% species sites were caused by base substitutions and the other 12 were the

resolution). Again, as with the previous two methodologies, trnH- result of indels. The sequences have been deposited in GenBank

psbA and trnL performed less well, establishing species resolution with Accession numbers KT964030–KT964054.

in only 15% of cases. Although both the concatenated sequences A total of 12 haplotypes were identified from the 24 variable

and ITS2 produced similar species resolution, the optimum result sites (Table S5). The most common haplotype, P1, was present in

was produced by the concatenated sequences when combined 47.6% of the samples whereas the second most common, P2, was

K.K.S. Ng et al. / Forensic Science International: Genetics 23 (2016) 197–209 203

found in 34.7%. Both haplotypes displayed disjunct distributions. populations was set at K = 2. This suggests that all the individuals

Haplotype P1 was widespread throughout East Malaysia whereas can be clustered into one of the two larger populations (Fig. 4a),

haplotype P2 was solely found in West Malaysia. Of the 12 corresponding to the two regions that are separated by the South

haplotypes, eight (P5-P12) were endemic to specific populations, China Sea; West (Peninsular Malaysia) and East (Sabah and

as shown in Fig. 3. Based on the cpDNA haplotypes, the results Sarawak) Malaysia. Nevertheless, further population substructur-

showed that the West and East Malaysia regions were clearly ing (K = 2) was observed within each of the regions (Fig. 4b); the

distinguishable. East Malaysia region was subdivided into two groups, East

Malaysia I (Sedilu FR, Lingga FR, Serapau-Lingga FR, Maludam

3.2.2. Test of conformity of origin NP, Manggut FR, Betong FR and Naman FR) and East Malaysia II

Type I and II errors were estimated for the population (Klias FR, Kayangeran FR and Loagan Bunut NP). The West Malaysia

identification database of West Malaysia. Type I error was region was similarly subdivided into West Malaysia I (Pekan FR,

estimated by simulating 10,000 samples of western provenance Nenasi FR, Resak FR, Kedondong FR and Air Hitam FR) and West

against the database of West Malaysia showed all samples Malaysia II (Sg. Karang FR and Raja Musa FR). The neighbour-

conformed (Fig. S2a). For the type II error, 100% of those from joining cluster analysis also divided the 17 populations into two

the eastern provenance resulted in non-conformity. Likewise, the main genetic clusters with substructuring within each main cluster

type I error for the population identification database of East (Fig. 4c). Based on the clustering results, West and East Malaysia

Malaysia was estimated by simulating 10,000 samples that databases were established with 280 and 467 individuals

originated from the eastern provenance of Malaysia against the respectively.

database of East Malaysia (Fig. S2b). The number of rejected

samples varied between 0 and 0.77%, which showed non- 3.3.3. Characterization of the individual identification database

conformity at the 5.0% threshold. In order to estimate the type The distribution of the allele frequencies and genetic param-

II error, a similar simulation was performed with samples that eters for West and East Malaysia are shown in Table 2, Tables S6

originated from the western provenance tested against the and S7. A total of 157 and 340 alleles were observed in West and

database of East Malaysia. Those samples that originated from East Malaysia, respectively. Overall, WGb32 was the most

the western provenance showed 100% non-conformity. Overall, the discriminating locus at all levels, the PD values being 0.969 (West

observed types I and II errors resulting from the simulation in Malaysia) and 0.984 (East Malaysia).

population identification database showed good concordance About 18.8% and 75.0% of the STR loci were found to depart

within the predicted 5.0% threshold. significantly from HWE in West and East Malaysia respectively

after employing Bonferroni corrections. To rule out the possibility

3.3. Individual identification that genotyping anomalies or the presence of null alleles were

responsible for the high proportion of STRs that exhibited

3.3.1. Reproducibility of the STR markers deviation from HWE, we tested each subpopulation independently.

All sixteen STR markers gave consistent genotypes for the five The observed deviations from HWE in these subpopulations was

independent PCR amplifications. Base pair size differences substantially lower than for the entire cluster, just six and two loci

between maximum and minimum fragment lengths were less for East Malaysia I and East Malaysia II, respectively. This suggests

than 0.5. significant population substructuring within East Malaysia and

provides some indication that deviation from HWE was due to the

3.3.2. Population structure and genetic clustering Wahlund effect rather than genotyping inconsistencies or STR

The model-based program STRUCTURE, which applied a issues (i.e. null alleles). The LE tests showed that 10.0% (West

Bayesian clustering algorithm without prior population informa- Malaysia) and 94.2% (East Malaysia) of the 120 pairwise loci were

tion, produced the highest likelihood scores when the number of significant after Bonferroni correction for multiple comparisons.

Fig. 3. Geographic distributions of the 12 chloroplast DNA haplotypes of Gonystylus bancanus shown in different colours. The total pie chart size reflects sample size (n) at each location.

204 K.K.S. Ng et al. / Forensic Science International: Genetics 23 (2016) 197–209

Fig. 4. Bayesian inferences of the number of clusters, K estimated by STRUCTURE among individuals for Gonystylus bancanus in Malaysia. (A) Two clusters (K = 2) of genotypes

were identified. (B) Population substructuring (K = 2) was observed within each of the larger regions (West and East Malaysia). (C) A neighbour-joining tree indicating the

relationship between the 17 populations of G. bancanus in Malaysia. For population abbreviations, see Table 1.

This might indicate that the genotypes were not independent and values, the highest value was observed in East Malaysia (0.091),

some of the loci might be linked. However, deviations from the followed by East Malaysia II (0.067). The estimated u and f values

independence test could also be due to population substructuring were subsequently used to correct the calculations of the random

and inbreeding. match probability, as most of the STR loci in the databases showed

The u and f values calculated for each of the hierarchical levels deviation from HWE and LE.

are listed in Table 3. Bootstrapping analysis showed that all the

calculated u values were significantly greater than zero, as the 95% 3.3.4. Conservativeness of the individual identification databases

confidence interval did not overlap with zero. Similarly, all f values The conservativeness of the West and East Malaysia databases

were also significantly greater than zero except for West Malaysia from the full profile frequency for each of the 245 (out of 246) and

II (0.019), which was mainly due to the smaller sample size 33 (out of 34) individuals from West Malaysia I and West Malaysia

compared to the other groups. The highest value of u was observed II respectively was calculated based on the cognate database

for Malaysia (0.121), followed by West Malaysia (0.067). For f (Porigin) against the profile frequency calculated using the West

K.K.S. Ng et al. / Forensic Science International: Genetics 23 (2016) 197–209 205

Table 2

a

Genetic parameters of Gonystylus bancanus databases in West and East Malaysia for the 16 STR loci.

Gba028 Gba092 Gba108 Gba129 Gba147 Gba348 Gba430 WGb06 WGb17 WGb22 WGb24 WGb29 WGb32 WGb37 WGb38 WGb39

West Malaysia

A 14 10 15 7 22 10 12 2 13 9 9 5 15 4 3 7

Ho 0.779 0.757 0.721 0.625 0.621 0.757 0.667 0.004 0.796 0.646 0.696 0.525 0.792 0.521 0.479 0.404

He 0.789 0.798 0.758 0.651 0.634 0.776 0.696 0.004 0.827 0.732 0.728 0.560 0.852 0.528 0.462 0.450

PIC 0.761 0.772 0.727 0.582 0.604 0.742 0.672 0.004 0.805 0.691 0.684 0.502 0.835 0.452 0.392 0.410

b b b

HWE 0.016 0.799 0.058 0.650 0.708 0.047 0.206 1.000 0.510 0.000 0.644 0.422 0.000 0.157 0.313 0.002

MP 0.076 0.067 0.089 0.121 0.166 0.082 0.117 0.993 0.051 0.112 0.113 0.247 0.031 0.306 0.356 0.347

PD 0.924 0.933 0.911 0.879 0.834 0.918 0.883 0.007 0.949 0.888 0.887 0.753 0.969 0.694 0.644 0.653

East Malaysia

A 35 14 35 26 38 17 20 12 23 15 18 27 28 10 11 11

Ho 0.852 0.680 0.839 0.819 0.857 0.748 0.789 0.503 0.769 0.579 0.763 0.737 0.855 0.738 0.662 0.680

He 0.914 0.734 0.910 0.882 0.900 0.844 0.884 0.545 0.869 0.717 0.844 0.857 0.914 0.770 0.757 0.724

PIC 0.907 0.699 0.903 0.871 0.892 0.824 0.872 0.502 0.855 0.688 0.827 0.846 0.907 0.732 0.719 0.676

b b b b b b b b b b b b

HWE 0.000 0.036 0.000 0.000 0.005 0.000 0.000 0.028 0.000 0.000 0.000 0.000 0.000 0.131 0.000 0.000

MP 0.016 0.103 0.019 0.026 0.020 0.045 0.027 0.249 0.032 0.124 0.041 0.042 0.016 0.089 0.100 0.124

PD 0.984 0.897 0.981 0.974 0.980 0.955 0.973 0.751 0.968 0.876 0.959 0.958 0.984 0.911 0.900 0.876

a

A: Total number of alleles; Ho: Observed heterozygosity; He: Expected heterozygosity; PIC: Polymorphic information content; HWE: Hardy-Weinberg equilibrium; MP:

Match probability; PD: Power of discrimination.

b

Deviation from Hardy-Weinberg equilibrium (probability test) was tested after Bonferroni correction (P < 0.05/16 = 0.003).

Table 3

Coancestry coefficients (u) and inbreeding coefficients (f) of Gonystylus bancanus

calculated according to hierarchical levels. Probabilities of the mean u and f were

determined using bootstrap analysis (1000 replications) with a 95% confidence

interval.

A. Conserva tiveness of West Malaysia Database

Hierarchical level Coancestry coefficient (u) Inbreeding coefficient (f) 6

Mean u lower upper Mean f lower upper

4

Malaysia 0.121 0.106 0.137 0.027 0.020 0.034

(N = 747)

2

West Malaysia 0.067 0.050 0.090 0.044 0.029 0.062

(N = 280)

0

East Malaysia 0.065 0.055 0.077 0.091 0.074 0.110

-20 -15 -10 -5 0

(N = 467)

d -2

West Malaysia II group 0.037 0.003 0.079 0.046 0.019 0.120

(N = 34)

West Malaysia I group 0.036 0.018 0.055 0.020 0.005 0.036 -4

(N = 246)

East Malaysia II group 0.073 0.056 0.091 0.067 0.050 0.086 -6

(N = 192)

East Malaysia I group 0.018 0.015 0.021 0.058 0.044 0.074 -8

(N = 275)

N is the number of samples Log (Profile frequency of West Malaysia I and II) -10

West Malaysia I West Malaysia II

Malaysia database (Pcombined). Likewise, the full profile frequency

for each of the 256 (out of 275) and 183 (out of 192) individuals

from East Malaysia I and East Malaysia II respectively was

B. Conserva tiveness of East Malaysia Database

calculated based on the cognate database (Porigin) against the 4

profile frequency calculated using the East Malaysia database

(Pcombined). The conservativeness of the West and East Malaysia 2

databases was tested by estimating the parameter d for every DNA

pro le in the database using the subpopulation-cum-inbreeding 0

u

model (Fig. 5). When the was set as 0.067, 245 tests performed on -30 -25 -20 -15 -10 -5 0

West Malaysia I showed negative d values (mean d = 1.032

d -2

0.329), indicating that the West Malaysia database was

conservative. Yet, 33 tests performed on West Malaysia II showed

-4

non-conservativeness, with positive d values (mean

d = 2.675 0.779). Accordingly, when the u was set as 0.065 for

the East Malaysia database, all d values were negative (mean -6

d = 3.382 0.876) and this implied that the database was

conservative when tested on East Malaysia I. However, 183 tests Log (Profile frequency of East Malaysia I and II) -8

performed on East Malaysia II showed non-conservativeness, with East Malaysia I East Malaysia II

positive d values (mean d = 2.456 0.517). Notably, the conserva-

Fig. 5. The conservativeness of (A) West and (B) East Malaysia databases tested by

tiveness tests showed that the West and East Malaysia databases

estimating the parameter d for every DNA profile frequency in the database using

were non-conservative when they were used on West Malaysia II

the subpopulation-cum-inbreeding model. X-axis: log (Profile frequency) and Y-

and East Malaysia II.

axis: mean number of differences between origin and combined database (d).

206 K.K.S. Ng et al. / Forensic Science International: Genetics 23 (2016) 197–209

For the series of u adjustments, the estimated mean profile Malaysia (ranging from 99.27 to 99.95%) with an average of 99.60%.

frequency based on the subpopulation-cum-inbreeding model, At the regional scale, both West and East Malaysia regions could

mean differences between origin and combined database (d) and achieve 100% correct individual assignments (Table 5), confirming

proportion of d < 0 are shown in Table 4 and Fig. S3. Generally, in coherence to the STRUCTURE clusters (see Fig. 4a).

cases of non-conservativeness, profile frequencies derived from

the combined databases tend to be lower than those derived from 4. Discussion

cognate population database. For example, for West Malaysia II,

12 9

mean Pcombined was 2.878 10 and Porigin was 1.465 10 . On For forensic timber identification or authenticity testing, the

the other hand, for West Malaysia I, mean Pcombined was establishment of a comprehensive genetic identification database

12 13

1.343 10 and Porigin was 2.362 10 , implying that the is necessary to facilitate the identification of a suspicious sample to

Pcombined was conservative. reveal its species, geographic origin and individual level identifi-

As the value of u used in the estimation increased, the mean cation. Hence, this study strongly demonstrates how DNA-based

profile frequency increased and d decreased. For example, in East techniques can serve as an important authenticity tool to identify

Malaysia II, when u was iterated from 0.000 to 0.200, the mean species, trace the source of origin and verify the legality of

21 12

Pcombined increased from 1.675 10 to 1.159 10 and mean d unknown samples.

decreased gradually from 7.626 0.960 to 2.282 0.723. This An ideal species barcode should hold adequate variation among

adjustment of u was observed to improve the conservativeness of the sequences to discriminate species; yet, it also needs to be

the database. To obtain 100% negative d value in both databases, adequately conserved so that there is less variability within a

the u readjustment needed to be increased to 0.30 and 0.20 in the species than between species [18]. In the present study, a species

West and East Malaysia databases respectively. It is standard identification database was established using 20 out of the 30

practice to adjust profile frequency to take into account both known Gonystylus species. The markers trnH-psbA, trnL and ITS2,

sampling error and population substructuring effects as the can all be sequenced easily from leaf or inner bark (cambium)

adjustments will increase the profile frequency with the intention tissue for the 20 Gonystylus species. Using the 20 Gonystylus species

of under, rather than overstating the weight of the DNA evidence dataset, with concatenated sequences of the three markers, correct

against the defendant [60]. In this case, with closer inspection of matching between barcodes and taxonomic species was achieved

these u adjustments, the mean profile frequencies for West with a resolution of 90% (18 out of the 20 species). However, the

12

Malaysia II and East Malaysia II increased from 2.878 10 to ITS2 region had a higher success rate for discriminating between

8 17 12

6.737 10 and from 2.536 10 to 1.159 10 , respectively, closely related species compared to trnH-psbA and trnL. Recently,

and was therefore demonstrably conservative. several researchers have demonstrated the potential of using ITS2

for taxonomic classification and phylogenetic reconstruction at

3.3.5. Individual assignment test both the genus and species levels for eukaryotes due to its variation

In order to correctly assign a source of origin to each G. bancanus in primary sequence and secondary structures in a way that

individual within the population, group or region reference correlates highly with taxonomic classification [64–66]. The ITS2

databases, independent self-assignment (leave-one-out) tests sequence length for Gonystylus species is about 341 bp, suggesting

were performed on all individuals in the reference database using that it is sufficiently short to allow PCR amplification of even

all available loci. The average percentage of correctly assigned degraded DNA (unpublished data). This thus provides useful

individuals at each population was 54.80% (Table 5), with 0% at biological information for alignment and can be considered as a

Resak and Kedondong due to their limited sample sizes of 10 and 4, molecular character for Gonystylus species identification.

respectively. The percentage of correctly assigned individuals was Ideally, a population identification database should be able to

higher for self-assignment of individuals belonging to the groups specifically partition unknown samples from any population or

within West Malaysia (ranging from 99.19 to 100%) and East forest reserve to the level of specific group. In this study, the

Table 4

Series of coancestry coefficients (u) adjustments applied on the West and East Malaysia databases of Gonystylus bancanus. Mean profile frequency values, mean differences d

and P (d < 0) were calculated for individuals derived from West Malaysia I and II as well as East Malaysia I and II using respective databases. Values in parentheses denote

standard deviations.

West Malaysia Database

West Malaysia I West Malaysia II

u Mean profile frequency Mean d P (d < 0) Mean profile frequency Mean d P (d < 0)

15 15

0.000 6.916 10 2.266 (0.605) 0.00 9.248 10 5.211 (1.179) 0.00

12 12

0.067 1.343 10 1.032 (0.329) 1.00 2.878 10 2.675 (0.779) 0.00

12 11

0.100 9.095 10 1.997 (0.461) 1.00 2.343 10 1.595 (0.764) 0.00

11 10

0.150 9.869 10 3.124 (0.615) 1.00 3.161 10 0.334 (0.780) 0.36

10 9

0.200 6.858 10 3.995 (0.728) 1.00 2.608 10 0.638 (0.810) 0.85

9 8

0.250 3.448 10 4.696 (0.817) 1.00 1.513 10 1.419 (0.843) 0.91

8 8

0.300 1.357 10 5.276 (0.888) 1.00 6.737 10 2.066 (0.875) 1.00

East Malaysia Database

East Malaysia I East Malaysia II

u Mean profile frequency Mean d P (d < 0) Mean profile frequency Mean d P (d < 0)

24 21

0.000 7.058 10 3.825 (1.221) 0.00 1.675 10 7.626 (0.960) 0.00

19 17

0.065 2.877 10 3.382 (0.876) 1.00 2.536 10 2.456 (0.517) 0.00

17 16

0.100 1.161 10 5.346 (0.974) 1.00 8.940 10 0.830 (0.566) 0.09

16 14

0.150 7.249 10 7.400 (1.090) 1.00 4.936 10 0.931 (0.651) 0.93

14 12

0.200 1.863 10 8.936 (1.177) 1.00 1.159 10 2.282 (0.723) 1.00

K.K.S. Ng et al. / Forensic Science International: Genetics 23 (2016) 197–209 207

Table 5

Self-assignment (leave-one-out) test on Gonystylus bancanus individuals based on population, genetic cluster and region implemented in GENECLASS2.

Region Group Population Correctly assigned individuals

By population By group By region

West Malaysia West Malaysia I Pekan 86.24% 99.19% 100.00%

Nenasi 33.34%

Resak 0.00%

Kedondong 0.00%

Air Hitam 78.57%

West Malaysia II Sg. Karang 50.00% 100.00%

Raja Musa 70.00%

East Malaysia East Malaysia I Sedilu 54.83% 99.27% 100.00%

Lingga 23.34%

Serapau-Lingga 36.67%

Maludam 44.45%

Manggut 56.60%

Betong 32.50%

Naman 83.63%

East Malaysia II Loagan Bunut 98.02% 99.95%

Kayangeran 93.10%

Klias 90.32%

Mean 54.80% 99.60% 100.00%

Type I error (alpha) = 0.05.

presence of a single prevalent chloroplast haplotype in all the suggests that the West and East Malaysia databases can be used

seven populations from West Malaysia was sufficient to partition independently for random match probability estimation within

West (Peninsular Malaysia) and East (Sabah and Sarawak) Malaysia their respective regions. For example, if a confiscated G. bancanus

into two regions. This clearly divided the 17 populations into two log is found to have originated from Pekan in West Malaysia, the

distinct clusters, corresponding to the two regions that are random match probability can be estimated using the West

separated by the South China Sea. The finding therefore clearly Malaysia database for evidence interpretation. Thus, with the

demonstrates that the population identification database of G. availability of the West and East Malaysia databases that cover the

bancanus could be used to identify individuals with unknown whole distribution range of G. bancanus, it is feasible to estimate

origin at the regional level. the random match probability of a suspected log to its original

Similarly, partitioning into West and East Malaysia at the stump.

highest hierarchical level of population structuring was observed The self-assignment tests showed a lower percentage of

based on a Bayesian model approach by STRUCTURE and the correctly assigned individuals to their population of origin. This

neighbour-joining cluster analysis. Nonetheless, the larger pop- suggests that the existing population designations by humans do

ulations were split further into two subpopulations each at K = 2. not reflect true population genetic boundaries. In contrast, self-

The STRUCTURE results did therefore hint at the presence of two assignment of individuals by group showed a higher percentage of

subclusters in each region. Limited pollen and seed dispersal correctly assigned individuals, suggesting that the grouping of

mechanisms might provide a reason for the observed within individuals by population genetic structure and genetic related-

population substructuring. Although the coancestry coefficient (u) ness analyses are more realistic, with the assumption of correlated

values calculated for West (0.065) and East (0.067) Malaysia were alleles among populations being due to gene flow.

significantly greater than zero, they were generally lower than the Recently, an ongoing investigation in Malaysia by the Selangor

u according to the 17 populations (0.121). This suggests that State Forest Department in collaboration with the Forest Research

populations within each region may have shared a common gene Institute Malaysia was able to identify the species of seized logs

pool or have experienced extensive gene flow in the past. and link them back to the stumps of illegally felled trees, with

Consequently, individual identification databases were established evidence obtained through DNA barcoding and an individual

separately according to these regions in order to ascertain accurate identification database developed for Neobalanocarpus heimii [16].

estimates of random match probabilities. Regions with inbreeding With this initial breakthrough, the use of forensic timber

coefficient (f) value significantly greater than zero indicate the identification approach to curb illegal logging in Malaysia is

presence of inbreeding. Empirical evidence reveals that indepen- beginning to gain confidence among local enforcement authorities.

dence within and among loci is violated in plants because most Consequently, with appropriate validation, the generated G.

populations tend to have substantial population substruc- bancanus databases from the current study could be used for

turing and inbreeding among individuals [16,67,68]. Similarly, in forensic timber identification of confiscated or unknown logs in at

the present study, most populations of G. bancanus had signifi- least three scenarios: (1) to identify the species of the unknown

cantly departed from the independence test. Hence, to evaluate log, (2) to authenticate the source of origin of an unknown sample,

DNA evidence that accounts for subpopulation and inbreeding and (3) to match an unknown log to its original stump. In brief, the

effects, the ‘subpopulation-cum-inbreeding model’ [58] was Gonystylus species identification database will be useful for

employed to correct for these effects although its use in forensic authenticating the species of the unknown logs. Subsequently,

casework may not always be conservative [41]. the population identification database would be useful to reveal

The conservativeness tests showed that the East and West the source of origin of an unknown G. bancanus sample from West

Malaysia databases were conservative when used on West or East Malaysia. Once the source region is determined, the

Malaysia I and II, as well as East Malaysia I and II. This further individual identification database can be applied with an

208 K.K.S. Ng et al. / Forensic Science International: Genetics 23 (2016) 197–209

assignment test using multilocus STR genotypes to assign the listings. Donors to the collaborative program include the EU, the

unknown sample to its source of origin (population, group or USA, Japan, Norway, New Zealand and Switzerland.

region).

An intact form of log or unprocessed wood is usually seized Appendix A. Supplementary data

from logging sites, while that is supplied in the trading

market is processed in the form of sawn and/or wood products. Supplementary data associated with this article can be found, in

Thus, the fundamental challenge for forensic timber identification the online version, at http://dx.doi.org/10.1016/j.

applications is the possibility of extracting DNA from dry wood. fsigen.2016.05.002.

Previously, several researchers have developed different DNA

extraction protocols for unprocessed and processed wood from G. References

bancanus [13,69] and other tree species [70–72]. These protocols

[1] R.T. Corlett, The Ecology of Tropical East Asia, 2nd edn., Oxford University

have a common goal to obtain sufficient yield and good quality

Press, UK, 2014.

DNA from wood for PCR amplification of short cpDNA regions. In

[2] M.P. Goncalves, M. Panjer, T.S. Greenberg, W.B. Magrath, Justice for Forests—

addition, heating diminishes the presence of DNA in wood tissue, Improving Criminal Justice Efforts, World Bank Study, 2012. http://siteresour-

ces.worldbank.org/EXTFINANCIALSECTOR/Resources/Illegal_Logging.pdf.

however a small amount of quality DNA can still be retrieved from

[3] Environmental Investigation Agency (EIA), Telapak, Profiting from Plunder,

heat-treated wood from 60 to 140 C [73]. Thus, the feasibility of

Environmental Investigation Agency, 2004. http://www.telapak.rog/wp-

retrieving good quality DNA from processed wood signifies the content/uploads/2013/10/profitingfromplunder.pdf.

[4] W. Smith, Undercutting sustainability: the global problem of illegal logging

potential for performing downstream analysis related to forensic

and trade, in: R.M. Ravenel, I.M.E. Granoff, C.A. Magee (Eds.), Illegal Logging in

timber identification. Recognizing this essential gap in G. bancanus,

the Tropics: Strategies for Cutting Crime, The Haworth Press, New York, 2004,

our current ongoing tasks are to determine a suitable DNA pp. 7–30.

extraction method for dry wood from stumps and logs of G. [5] M. Carr, Nothing to declare: DNA ngerprinting cracks down on illegal timber

harvest, Conserv. Mag. 8 (2007) 37.

bancanus and subsequently evaluate the extracted DNA with PCR

[6] A. Sandak, J. Sandak, M. Negri, Relationship between near-infrared (NIR)

amplifications on both cpDNA and nSTR markers based on a similar

spectra and the geographical provenance of timber, Wood Sci. Technol. 45

approach used in our previous study on N. heimii [72]. In addition, (2011) 35 48.

[7] M. Horacek, M. Jakusch, H. Krehan, Control of origin of wood:

for forensic application of the G. bancanus STRs, a full develop-

discrimination between European (Austrian) and Siberian origin by stable

mental validation study following the Scientific Working Group on

isotope analysis, Rapid Commun. Mass Spectrom. 23 (2009) 3688–3692.

DNA Analysis Methods (SWGDAM 2012) [74] is currently being [8] A. Kagawa, S.W. Leavitt, Stable carbon isotopes of tree rings as a tool to

pinpoint the geographic origin of timber, J.Wood Sci. 56 (2010) 175–183.

carried out to address the reliability of the STRs used.

[9] A. Kagawa, K. Kuroda, H. Abe, T. Fuji, Y. Itoh, Stable isotopes and inorganic

Law enforcement in accordance with CITES regulations is being

elements as potential indicators of geographic origin of Southeast Asian timber,

used to control the trade of threatened and endangered species, Proceedings of the International Symposium on Development of Improved

Methods to Identify Species Wood and Its Origin (2007) 39–44.

but illegal harvesting and laundering activities still persist. There

[10] I. Alvarez, J.F. Wendel, Ribosomal ITS sequences and plant phylogenetic

are no swift solutions to the problems faced by law enforcement to

inference, Mol. Phylogenet. Evol. 29 (2003) 417–434.

combat illegal logging as it is always a challenging task when [11] W.J. Kress, K.J. Wurdack, E.A. Zimmer, L.A. Weigt, D.H. Janzen, Use of DNA

barcodes to identify flowering plants, Proc. Nat. Acad. Sci. U. S. A. 102 (2005) dealing with many valuable threatened and endangered timber

8369–8374.

species. Timber is so hard to identify compared to other wildlife

[12] W. Dong, J. Liu, J. Yu, L. Wang, S. Zhou, Highly variable chloroplast markers for

products and there is such a high volume of legal trade that makes evaluating plant phylogeny at low taxonomic levels and for DNA barcoding,

it even more difficult to distinguish between the legal and illegal PLoS One 7 (2012) e35071.

[13] R. Ogden, H.N. McGough, R.S. Cowan, L. Chua, M. Groves, R. McEwing, SNP-

trade. Clearly, concerted effort on prevention and deterrence is

based method for the genetic identification of ramin Gonystylus spp. timber

urgently needed to halt this destructive practice before the species

and products: applied research meeting CITES enforcement needs, Endang.

become extinct. Increasing awareness and acceptance of DNA Species Res. 9 (2009) 255 261.

[14] M.F. Deguilloux, M.H. Pemonge, L. Bertel, A. Kremer, R.J. Petit, Checking the

technology applied to criminal forensics suggest that consumers

geographical origin of oak wood: molecular and statistical tools, Mol. Ecol. 12

and buyers recognize the capabilities of DNA testing, whilst at the

(2003) 1629–1636.

same time deterring illegal timber harvesting through successful [15] L.H. Tnah, S.L. Lee, K.K.S. Ng, N. Tani, S. Bhassu, R.Y. Othman, Geographical

traceability of an important tropical timber (Neobalanocarpus heimii) inferred

prosecutions. Thus, with appropriate validation, we are optimistic

from chloroplast DNA, For. Ecol. Manage. 258 (2009) 1918–1923.

that the genetic identification databases developed for G. bancanus

[16] L.H. Tnah, S.L. Lee, K.K.S. Ng, Q.Z. Faridah, H.I. Faridah, Forensic DNA-profiling

in this study can support forensic applications and help safeguard of tropical timber species in Peninsular Malaysia, For. Ecol. Manage. 259 (2010)

1436–1446.

this valuable timber species into the future.

[17] B. Degen, S.E. Ward, M.R. Lemes, C. Navarro, S. Cavers, A.M. Sebbenn, Verifying

the geographic origin of ( King) with DNA-

fingerprints, Forensic Sci. Int. Genet. 7 (2013) 55–62.

Acknowledgements [18] W.J. Kress, D.L. Erickson, DNA barcodes: methods and protocols, in: W.J. Kress,

D.L. Erickson (Eds.), DNA Barcodes, Humana Press, New York, 2012, pp. 3–8.

[19] S. Cavers, C. Navarro, A.J. Lowe, Chloroplast DNA phylogeography reveals

The authors would like to extend their appreciation to the staff

colonization history of a Neotropical tree, Cedrela odorata L., in Mesoamerica,

of the Forest Research Institute Malaysia (Ghazali Jaafar, Yahya Mol. Ecol. 12 (2003) 1451 1460.

[20] S. Piry, A. Alapetite, J.M. Cornuet, D. Paetkau, L. Baudouin, A. Estoup,

Marhani, Ramli Ponyoh, Rosman Ibrahim, Yasri Baya, Mariam Din

Geneclass2: a software for genetic assignment and first-generation migrant

and Sharifah Talib), the Sarawak Forestry Corporation (Rantai Jawa,

detection, J. Hered. 95 (2004) 536–539.

Tinjan Kuda, Army Kapi, Suliana Charles and Stephanie Fairchild) [21] B. Rannala, J.L. Mountain, Detecting immigration by using multilocus

genotypes, Proc. Nat. Acad. Sci. U. S. A. 94 (1997) 9197–9201.

and the Forest Research Centre, Sepilok (Richard Joseph, Azman

[22] J.M. Cornuet, S. Piry, G. Luikart, A. Estoup, M. Solihnac, New methods

Mahali, Rozaime Abd Rasid, Amir Hamzah and Men Segunting) for

employing multilocus genotypes to select or exclude populations as origins of

their assistance in the eld and laboratory and to the Pahang, individuals, Genetics 153 (1999) 1989–2000.

[23] R. Ogden, A. Linacre, Wildlife forensic science: a review of genetic geographic

Selangor, Johor, Sarawak and Sabah State Forest Departments for

origin assignment, Forensic Sci. Int. Genet. 18 (2015) 152–159.

granting permission to work in the forests under their tenure. This

[24] C.S. Tawan, Thymelaeaceae, in: E. Soepadmo, L.G. , R.C.K. Chung (Eds.), Tree

study was primarily funded by the Ministry of Plantation Flora of Sabah and Sarawak, vol. 5, Forest Research Institute Malaysia, Kuala

Lumpur, 2004, pp. 433–484.

Industries and Commodities Malaysia under the Timber Levy

[25] I. Soerianegara, R.H.M.J. Lemmens, Plant resources of South-East Asia 5(1),

Fund of Peninsular Malaysia and made possible by grants from the

‘Timber Trees: Major Commercial Timbers’, Pudoc Scientific Publishers,

ITTO to the Sarawak Forestry Corporation under its collaborative Wageningen, 1993.

program with CITES to build capacity for implementing timber [26] IUCN Red List Categories and Criteria version 2.3. IUCN, Gland, 1994.

K.K.S. Ng et al. / Forensic Science International: Genetics 23 (2016) 197–209 209

[27] International Tropical Timber Organisation (ITTO), in: H.K. H. Aminah, L.S.L. [51] S.D.E. Park, The Excel microsatellite toolkit. Website http://animalgenomics.

Chen, K.C. Khoo. Chua (Eds.), Report of the ITTO Expert Meeting on the ucd.ie/sdepark/ms-toolkit/, 2001 (accessed August 2011).

Effective Implementation of the Inclusion of Ramin (Gonystylus spp.) in [52] P.O. Lewis, D. Zaykin, Genetic data analysis-gda (version 1.0 d16c): a computer

Appendix II of CITES, Kuala Lumpur, Malaysia, 2007. program for the analysis of allelic data. http://lewis.eeb.uconn.edu/lewis-

[28] M. Murray, W.F. Thompson, Rapid isolation of high molecular weight plant home/software.html, 2001.

DNA, Nuc. Acids Res. 8 (1980) 4321–4325. [53] Powerstats (version 1.2). Promega corporation website. http://www.promega.

[29] R.C. Edger, MUSCLE: multiple sequence alignment with high accuracy and high com/geneticidtools/powerstats/.

throughput, Nucleic Acids Res. 32 (2004) 1792–1797. [54] W. Goodwin, A. Linacre, S. Hadi, An Introduction to Forensic Genetics, John

[30] K. Tamura, G. Stecher, D. Peterson, A. Filipski, S. Kumar, MEGA6: molecular Wiley and Sons Ltd., West Sussex, 2007.

evolutionary genetics analysis version 6.0, Mol. Biol. Evol. 30 (2013) [55] M. Raymond, F. Rousset, Genepop (version 3.4): population genetics software

2725–2729. for exact tests and ecumenicism, J. Hered. 86 (1995) 248–249.

[31] R. Meier, G.Y. Zhang, F. Ali, The use of mean instead of smallest interspecific [56] W.R. Rice, Analysing tables of statistical tests, Evolution 43 (1989) 223–225.

distances exaggerates the size of the barcoding gap and leads to [57] National Research Council Report (NRC II), The Evaluation of Forensic DNA

misidentification, Syst. Biol. 57 (2008) 809–813. Evidence, National Academy Press, Washington, DC, 1996.

[32] CBOL Plant Working Group, A DNA barcode for land plants, Proc. Natl. Acad. Sci. [58] K.L. Ayres, A.D.L. Overall, Allowing for within-subpopulation inbreeding in

U. S. A. 106 (2009) 12794–12797. forensic match probabilities, Forensic Sci. Int. 103 (1999) 207–216.

[33] B. Heinze, A database of PCR primers for the chloroplast genomes of higher [59] J.M. Butler, Forensic DNA Typing: Biology, Technology and Genetics of STR

plants, Plant Methods 3 (2007) 4. Marker, 2nd edn, Elsevier Academic Press, Burlington, Massachusetts, 2005.

[34] S. Dumolin-Lapeque, B. Demesure, S. Fineschi, V. Le Corre, R.J. Petit, [60] P. Gill, L.A. Foreman, J.S. Buckleton, C.M. Triggs, H. Allen, A comparison of

Phylogeographic structure of white throughout the European continent, adjustment methods to test the robustness of a STR database comprised of 24

Genetics 146 (1997) 1475–1487. European populations, Forensic Sci. Int. 131 (2003) 184–196.

[35] J.J. Doyle, J.I. Davis, R.J. Soreng, D. Gavin, M.J. Anderson, Chloroplast DNA [61] D.E. Krane, R.W. Allen, S.A. Sawyer, D.A. Petrov, D.L. Hartl, Genetic differences

inversions and the origin of the grass family (Poaceae), Proc. Natl. Acad. Sci. U. at four DNA typing loci in Finnish Italian, and mixed Caucasian populations,

S. A. 89 (1992) 7722–7726. Proc. Nat. Acad. Sci. U. S. A. 89 (1992) 10583–10587.

[36] B. Demesure, N. Sodzi, R.J. Petit, A set of universal primers for amplification of [62] D.J. Balding, R.A. Nichols, DNA profile match probability calculations: how to

polymorphic non-coding regions of mitochondrial and chloroplast DNA in allow for population stratification, relatedness, database selection and single

plants, Mol. Ecol. 4 (1995) 129–131. bands, Forensic Sci. Int. 64 (1994) 125–140.

[37] B. Heinze, PCR-based chloroplast DNA assays for the identification of native [63] D.J. Balding, R.A. Nichols, A method for quantifying differentiation between

PopulusnigraandintroducedpoplarhybridsinEurope,For.Genet.5(1998)31–38. populations at multi-allelic loci and its implications for investigating identity

[38] M.J.M. Smulder, B. van’T Westende, G.D. Diway, P.J. van Der Meer, W.J.M. and paternity, in: B. Weir (Ed.), Human Identification: the Use of DNA Markers,

Koopman, Development of microsatellite markers in Gonystylus bancanus Kluwer Acad, Dordrecht Netherlands, 1995.

(Ramin) useful for tracing and tracking of wood of this protected species, Mol. [64] J. Schultz, S. Maisel, D. Gerlach, T. Muller, M. Wolf, A common core of secondary

Ecol. Resour. 8 (2008) 168–171. structure of the internal transcribed spacer 2 (ITS2) throughout the eukaryota,

[39] K.K.S. Ng, S.L. Lee, C.H. Ng, L.H. Tnah, C.T. Lee, N. Tani, Microsatellite markers of RNA 11 (2005) 361–364.

Gonystylus bancanus (Thymelaeaceae) for population genetic studies and DNA [65] T. Muller, N. Philippi, T. Dandekar, J. Schultz, M. Wolf, Distinguishing species,

fingerprinting, Conserv. Genet. Resour. 1 (2009) 153–157. RNA 13 (2007) 1469–1472.

[40] N.F. Zakaria, Population Genetics Study of Gonystylus bancanus (Ramin [66] J. Schultz, M. Wolf, ITS2 sequence-structure analysis in phylogenetics: a how-

Melawis) Using Microsatellite Markers, Universiti Kebangsaan Malaysia, to manual for molecular systematics, Mol. Phylogenet. Evol. 52 (2009)

Malaysia, 2014 (MSc. Thesis). 520–523.

[41] N. Dawnay, R. Ogden, R.S. Thorpe, L.C. Pope, D.A. Dawson, R. McEwing, A [67] J.L. Hamrick, M.J.W. Godt, S.L. Sherman-Broyles, Factors influencing levels of

forensic STR profiling system for the Eurasian badger: a framework for genetic diversity in woody plant species, New For. 6 (1992) 95–124.

developing profiling systems for wildlife species, Forensic Sci. Int. Genet. 2 [68] L. Waits, G. Luikart, P. Taberlet, Estimating the probability of identity among

(2008) 47–53. genotypes in natural populations: cautions and guidelines, Mol. Ecol.10 (2001)

[42] J.K. Pritchard, M. Stephens, P. Donnelly, Inference of population structure using 249–256.

multilocus genotype data, Genetics 155 (2000) 945–959. [69] M.J. Asif, C.H. Cannon, DNA extraction from processed wood: a case study for

[43] D. Falush, M. Stephens, J.K. Pritchard, Inference of population structure: the identification of an endangered timber species (Gonystylus bancanus),

extensions to linked loci and correlated allele frequencies, Genetics 164 (2003) Plant Mol. Biol. Rep. 23 (2005) 1–8.

1567–1587. [70] M.F. Deguilloux, M.H. Pemonge, R.J. Petit, Novel perspectives in wood

[44] G. Evanno, S. Reqnaut, J. Goudet, Detecting the number of clusters of certification and forensics: dry wood as a source of DNA, Proc. R. Soc. Lond. Ser.

individuals using the software structure: a simulation study, Mol. Ecol. 14 B-Biol. Sci. 269 (2002) 1039–1046.

(2005) 2611–2620. [71] Y. Rachmayanti, L. Leinemann, O. Gailing, R. Finkeldey, Extraction:

[45] D.A. Earl, B.M. vonHoldt, Structure harvester: a website and program for amplification and characterization of wood DNA from Dipterocarpaceae, Plant

visualizing structure output and implementing the Evanno method, Conserv. Mol. Biol. Rep. 24 (2006) 45–55.

Genet. Resour. 4 (2012) 359–361. [72] L.H. Tnah, S.L. Lee, K.K.S. Ng, S. Bhassu, R.Y. Othman, DNA extraction from dry

[46] N.A. Rosenberg, distruct: a program for the graphical display of population wood of Neobalanocarpus heimii (Dipterocarpaceae) for forensic DNA profiling

structure, Mol. Ecol. Notes 4 (2004) 137–138. and timber tracking, Wood Sci. Technol. 48 (2012) 814–825.

[47] L.L. Cavalli-Sforza, A.W.F. Edwards, Phylogenetic analysis: models and [73] A. Yoshida, Extraction and Detection of DNA from Wood for Species

estimation procedures, Am. J. Hum. Genet. 19 (1967) 233–257. Identification. In: Proceedings of the International Symposium on

[48] N. Saitou, M. Nei, The neighbour-joining method: a new method for Development of Improved Methods to Identify Shorea Species Wood and Its

reconstructing phylogenetic trees, Mol. Biol. Evol. 11 (1987) 553–570. Origin, Forestry and Forest Products Research Institute, Ibaraki, Japan, 2007,

[49] N. Takezaki, M. Nei, Genetic distances and reconstruction of phylogenetic trees pp. 27–34.

from microsatellite DNA, Genetics 144 (1996) 389–399. [74] SWGDAM, Validation guidelines for DNA analysis methods. Available at http://

[50] K. Liu, S.V. Muse, Powermarker: integrated analysis environment for genetic media.wix.com/ugd/4344b0_cbc27d16dcb64fd88cb36ab2a2a25e4c.pdf, 2012

marker data, Bioinformatics 21 (2005) 2128–2129. (accessed 29.03.16).