COPYRIGHT AND CITATION CONSIDERATIONS FOR THIS THESIS/DISSERTATION

o Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

o NonCommercial — You may not use the material for commercial purposes.

o ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

How to cite this thesis

Surname, Initial(s). (2012) Title of the thesis or dissertation. PhD. (Chemistry)/M.Sc. (Physics)/M.A. (Philosophy)/M.Com. (Finance) etc. [Unpublished]: University of Johannesburg. Retrieved from: https://ujdigispace.uj.ac.za (Accessed: Date).

The assessment of DNA barcoding as an identification tool for traded and protected trees in southern Africa: Mozambican commercial timber species as a case study

By

Ronny Mukala Kabongo

Dissertation presented in the fulfilment of the requirements for the degree

MAGISTER SCIENTIAE

in

BOTANY

in the

FACULTY OF SCIENCE

DEPARTMENT OF BOTANY AND BIOTECHNOLOGY

at the

UNIVERSITY OF JOHANNESBURG

SUPERVISOR: PROF. MICHELLE VAN DER BANK

CO-SUPERVISOR: DR. OLIVIER MAURIN

January 2014

I hereby declare that this dissertation has been composed by me and that the work contained within, unless stated otherwise, is my own.

Signed: Ronny Mukala Kabongo

Date: 30 January 2013

Table of Contents

Abstract
Acknowledgements
List of abbreviations
Chapter 1
1. General introduction and objectives
1.1 Illegal logging and the global timber market
1.2 Effects of illegal logging
1.3 The fight against illegal logging
1.4 Convention on International Trade in Endangered Species
1.5 Other technical means of timber identification
1.6 Scale and effectiveness of global response
1.7 Molecular genetics tool for wood identification
1.8 DNA extraction from wood materials
1.9 DNA Barcoding
1.10 Diagnostic markers for land plants
1.11 Tree-BOL Africa initiative
1.12 Study site: Mozambican forest
1.13 State of Mozambican Forests
1.14 Research objectives
Chapter 2
2. Materials and Methods
2.1 Specimen collection and reference samples
2.2 DNA extraction and sequencing
2.3 Sequence editing, alignment and broad analysis
2.4 Assessment of core DNA barcodes identification efficiency
2.5 Phylogenetic analysis
2.6 Correspondence of query sequences to the database
Chapter 3
3. Results
3.1 Summary statistics
3.2 Genetic divergence and barcode gap analyses
3.3 DNA barcode identification success rates using distance-based analysis
3.4 Cumulative error and threshold optimization
3.5 DNA barcode identification success rates using tree-based analysis
3.6 DNA barcode query assignment
Chapter 4
4. Discussion
4.1 Development of DNA barcode reference library
4.2 Genetic divergence and implication for identification
4.3 Identification success rates
4.4 DNA barcoding: practical considerations
Chapter 5
5. Conclusion
6. References
7. Supplementary information

Figure 7-1 Specimen illustrations as submitted on BOLD Systems database; an additional scan of the herbarium voucher specimen is available for every specimen sampled.

Abstract

Global efforts to protect the world’s forests from unsustainable and inequitable exploitation have been undermined in recent years by rampant illegal logging in many timber-producing countries. A prerequisite for efficient control and seizure of illegally harvested forest products is a rapid, accurate and tamper-proof method of species identification. DNA barcoding is one such tool, and it is relatively simple to apply. It is acknowledged to bring accuracy and efficiency to species identification. In this study a DNA barcode reference library for traded and protected tree species of southern Africa was developed, comprising 81 species and 48 genera. Four primary analyses were conducted to assess the suitability of the core barcodes as a species identification tool using the R package Spider 1.2-0. Lastly, to evaluate this identification tool, query specimens independently sampled at a Mozambican logging concession were identified using DNA barcoding techniques. Using the core plant barcodes, the nearest neighbour (k-NN) and best close match (BCM) distance-based methods yielded identification success rates of 90% and 85%, respectively. DNA barcoding identification of query specimens maintained a constant 83% accuracy over the single-marker dataset and the combined dataset. This database can serve as the backbone of a control mechanism based on DNA techniques for species identification and can also advance the ability of relevant authorities to identify timber species rapidly at entry and exit points between countries with simple, fast, and accurate DNA techniques.


Acknowledgements

The work presented in this document would not have been possible without the contribution of a small group of people; therefore, I would like to take this opportunity to acknowledge them. First and foremost, I would like to express my appreciation to my research supervisor, Prof. Michelle van der Bank, for her constant guidance and enthusiastic encouragement. Your kind, patient and enthusiastic nature is admirable, and these are qualities I wish to take with me. Secondly, I wish to sincerely thank my co-supervisor, Dr. Olivier Maurin, for his contribution and assistance during the planning and development of this research project. Your constant willingness to give your time so generously has been very much appreciated. I am further indebted to my colleagues, both past and present, at the African Centre for DNA Barcoding. Your support and motivation kept me going. I would also like to thank my family, who have been by my side throughout my studies.

Lastly, I thank the Government of Canada through Genome Canada and the Ontario Genomics Institute (2008-OGI-ICI-03), the International Development Research Centre (IDRC), Canada, and the University of Johannesburg for financial support.


List of abbreviations

°C = Degree Celsius
ABI = Applied Biosystems, Inc.
ACDB = African Centre for DNA Barcoding
BL = Bayesian Likelihood
BLAST = Basic Local Alignment Search Tool
BOLD = Barcode of Life Data Systems
BP = Bootstrap Percentages
CBOL = Consortium for the Barcode of Life
CCDB = Canadian Centre for DNA Barcoding
CITES = Convention on International Trade in Endangered Species
CTAB = Hexadecyltrimethylammonium Bromide
DNA = Deoxyribonucleic Acid
F = Forward Primer
FSC = Forest Stewardship Council
g = Gram
GenBank (NCBI) = National Center for Biotechnology Information
Inc = Incorporated
JRAU = Herbarium of the University of Johannesburg (UJ)
matK = Maturase K
min = Minute(s)
MUSCLE = Multiple Sequence Comparison by Log-Expectation
NJ = Neighbour Joining
No = Number
NRF = National Research Foundation
PAUP = Phylogenetic Analysis Using Parsimony Software Program
PCR = Polymerase Chain Reaction
PP = Posterior Probabilities
PVP = Polyvinylpyrrolidone
R = Reverse Primer
rbcLa = Subunit ‘a’ of Ribulose-Bisphosphate Carboxylase
sec = Second
TBR = Tree-Bisection-Reconnection
psbA-trnH = Spacer between trnH and psbA Genes
UJ = University of Johannesburg
USFDA = U.S. Food and Drug Administration


Chapter 1

1. General introduction and objectives

1.1 Illegal logging and the global timber market

Global efforts to protect the world’s forests from unsustainable and inequitable exploitation have been undermined in recent years by rampant illegal logging in many timber-producing countries. According to Seneca Creek Associates, Wood Resources International (2004), illegal logging includes theft, harvesting without the required approvals, the purchase, sale or acquisition of harvesting or trading authorisations through corruption, and the trade or processing of timber in breach of national and international laws. The immediate impacts of illegal logging include loss of biodiversity, soil erosion, water pollution, forest fires, flash flooding and landslides (Lawson and MacFaul 2010). Furthermore, illegal logging threatens the livelihoods of millions of forest-dependent people and starves cash-strapped governments of billions of dollars in revenue. It undermines the rule of law, promotes corruption in the forest sector, and creates and fuels armed conflict in developing nations (Global Witness 2002).

To date, the extent of illegal logging remains debatable; however, it is universally accepted that some form of illegal logging takes place in all producing countries. According to most forest associations, illegal logging represents 10% of the total global harvest (American Forest & Paper Association 2009). This estimate is based on documented incidents, despite the fact that most illegal logging activities go unnoticed. On the other hand, environmental non-governmental organizations (NGOs) estimate that illegal logging accounts for as much as 80% of production in some producing nations (Centre for International Economics 2009), with the majority of estimates falling between these extremes. According to Chatham House (2007), just under half of the tropical logs, sawn timber and plywood traded worldwide in 2004 were believed to have been illegally sourced.

In 2006 the World Bank considered that significant proportions of timber production in Asia, central Africa, South America, and Russia were illegal. According to the same source, on average over 50% of timber produced in Africa is illegal. Brack (2007) confirmed that half of the logging operations taking place in countries judged to be at high risk of illegal logging are actually illegal. The same source further estimated that illegal logging accounts for a revenue loss of US$15 billion (R165 billion) per annum for producing nations globally, based on the approximation that illegal logging represents 10% of the total timber traded worldwide.

1.2 Effects of Illegal logging

In recent years, climate change and environmental problems have come into view and gained global recognition. Numerous studies highlighting the contribution of deforestation to greenhouse gas emissions, biodiversity loss, and soil and water degradation have been published (Global Environment Facility 2009, UNCCD 2012). However, establishing the impact of illegal logging on these problems requires assessing the contribution of illegal logging to deforestation.


Developing countries rich in raw materials are prone to deforestation. Forest destruction and clearing currently stand as the biggest threat to global biodiversity (Finkeldey et al. 2007). Forest degradation around the world ranges between 14 and 16 million hectares per annum, and the majority of these crimes are perpetrated in biodiversity-rich tropical forest hotspots. The total contribution of deforestation to global warming is between 20 and 25% of CO2 emissions. Although numerous factors contribute to deforestation (agricultural expansion, settlement or infrastructure, mining exploration, uncontrolled burning, and charcoal production) (Zahnen 2008, Food and Agriculture Organization 2007), a third of global deforestation is due to legal and illegal logging activities combined, with illegal logging accounting for more than 50% of timber harvested (Brack 2003).

Other than environmental losses, deforestation and the associated illegal logging activities can displace forest-dependent communities from their livelihoods and from their cultural and spiritual values. It is, however, difficult to assess the social costs of illegal logging. According to the World Bank, forest-dependent communities residing near or within tropical forests around the world amount to 350 million people, of whom roughly 60 million are entirely dependent on forest products for their daily livelihood. According to Global Witness (2002), the former rebel leader Charles Taylor of Liberia famously exploited the lucrative revenues of illegal logging to fuel his civil war, abusing human rights in the process. Therefore, the impact of illegal logging on people is not limited to forest dwellers or those living near forests but can span nations and even continents.

As mentioned previously, illegal logging costs producing nations worldwide approximately US$15 billion per year. The Organization for Economic Co-operation and Development (OECD) states that US$10 billion per year is lost in developing countries due to illegal logging. The remaining US$5 billion is lost due to the subsequent tax and royalty evasion, according to the World Bank. The financial effects of illegal logging are also manifested in consumer countries, where the timber market is unbalanced by the influx of illegally sourced timber. Legally produced timber cannot compete with cheaper illegally sourced timber of equal if not better quality, as certified timber costs up to 10% more than uncertified timber (Newsom et al. 2008). In 2008, the state of Pennsylvania in the United States of America reported additional revenues due to the pricing of FSC-certified timber. Forest Stewardship Council (FSC) certified timber refers to timber produced under environmentally appropriate, socially beneficial, and economically viable management (FSC 2013). However, this accreditation comes at an added cost: buyers of FSC-certified timber paid, on average, $198 more per thousand board feet (mbf) for black cherry, and further price differences of $138, $49 and $35 per mbf were observed for sugar maple, red oak, and red maple, respectively (Newsom et al. 2008). Therefore, markets are easily seduced by illegal timber. Furthermore, illegally sourced logs are thought to depress world timber prices by as much as 16% (Seneca Creek Associates, Wood Resources International 2004).


If one looks beyond the direct financial losses of illegal logging and considers the cost of the illegal deforestation caused by such activities, the loss of forest supplies such as food, construction materials, medicine, wood fuel, and shelter has to be included. This further increases the total financial cost of illegal logging. According to the OECD, the financial cost of the global damage caused by illegal loggers amounts to approximately 150 billion Euros (R2 trillion) per year (Degen and Fladung 2007). The Food and Agriculture Organization (2005) has valued non-wood forest products at US$4.7 billion per year.

1.3 The fight against illegal logging

Recognizing the importance of stopping illegal logging, NGOs, along with governments of producer and consumer countries, donors, communities, and wood producers, have put in place several initiatives to combat this growing problem. For instance, numerous wood-producing associations have developed policies binding members to ensure the authenticity of the wood or timber traded (American Forest & Paper Association 2002, The Confederation of European Paper Industries 2002). Governments have adopted the same approach. The Forest Law Enforcement, Governance and Trade (FLEGT) is one such example. The actions of FLEGT span both producer countries (commonly developing countries) and consumer countries (processing and first world nations) and aim to eliminate illegal timber trade within the European Union (Commission of the European Communities 2003).


Greenpeace, Global Forest Watch and many other NGOs are instrumental in implementing sustainable forest management practices and raising awareness, and owing to their credibility as third parties they have served as watchdogs of the timber trade industry (Smith 2002). Furthermore, numerous certification schemes have been developed, adopting a variety of techniques. As mentioned, the Forest Stewardship Council (FSC) is one such scheme; it has proven popular and is the most widely recognized certification programme (Cauley et al. 2001).

Prior to the above-mentioned comprehensive controls, enforcement activities against illegally sourced wood in producer and consumer countries were largely limited to timber species listed under the Convention on International Trade in Endangered Species (CITES). However, only four significant commercial timber species were listed under CITES, representing less than 0.5% of primary wood products in international trade (Keong 2006). Although over 300 species are now protected or considered for protection, only 23 significant commercial timber species, with mention of “lookalike” species, are listed (CITES 2002). Despite the increase in timber species on the CITES appendices, they still represent less than 3% of primary wood traded; therefore, the level of enforcement activity is relatively limited and the data set is far too small to draw firm conclusions about its effects.

1.4 Convention on International Trade in Endangered Species

In 1973 CITES was established as the international agreement and tracking mechanism for endangered species. The aim of CITES is to protect endangered species from over-exploitation by controlling their international trade. For species listed in CITES Appendices I, II and III, the Management Authority of the exporting country must be satisfied that the specimen was not obtained in contravention of the state’s laws for the protection of fauna and flora. Because the identification of plants is particularly complex, as they are seldom traded as whole or fresh specimens, CITES has designed a guide to assist with identification, and most consumer countries provide some guidance and training to customs officials to assist them in enforcing the CITES listing of timber species.

The CITES identification technique relies on a multi-step documentation cross-checking procedure: verification of the identification of the declared wood on the CITES permit, then determination of the validity of the CITES permit, and lastly identification of wood that is not accompanied by a CITES permit (CITES 2002). The technical tools available to conduct such inspections are macroscopic and microscopic identification techniques. Generally, macroscopic analysis involves the analysis of gross features (colour, density, odour, fluorescence, and the burning splinter test), tools (10x flash magnifier), surface preparation and identification keys. Microscopic identification of suspicious timber cargo arriving at ports involves sample collection, sectioning and staining, followed by microscopic analysis and identification (CITES 2006). Enforcement officers rely on photographic guides of the leaves and bark of tree species, field guides that may provide identification keys (dichotomous keys, cards and some computer-assisted keys) to the trees of a specific area, encyclopaedias of timber cross-sections and even xylarium collections to assist in the identification of trees.


As much as each of these tools provides some assistance to the user, they also have limitations. Border control officials are often left helpless in monitoring illegal trafficking of products derived from CITES-protected tree species. These products are generally difficult or impossible to identify for two reasons: firstly, traditional taxonomic procedures are hardly definitive owing to the array of close lookalikes of a given specimen; secondly, and even more importantly, specimens whose diagnostic features have been removed are morphologically indistinguishable, which amplifies the difficulty of reaching conclusive species identifications.

1.5 Other technical means of timber identification

To enhance the use of legally sourced timber and improve the sustainable use of forests proposed by recent international policies, it is imperative to develop technological tools able to trace the chain of custody of wood or wood products from producers to consumers and to identify them rapidly and accurately. Numerous technologies (listed below) have been developed to counteract the various illegal activities associated with timber, such as logging nationally protected species, avoiding CITES restrictions, under-grading and misreporting harvests, under-valuing exports, and misclassifying species to avoid trade restrictions or higher taxes (Brack et al. 2002).

• Microtaggant tracer: the technology relies on paint and microscopes. It is recommended by the US Forest Service for covert environmental monitoring owing to its fast and easy application and high security level. However, a lack of practicality in the field (such as difficulty of reading and impaired marker uptake) and high installation costs for developing nations have limited the implementation of the tool by producing nations.

• Chemical tracer paint: a cheaper alternative to the microtaggant tracer, but with a lower security level, limited accountability control and potential false readings due to paint degradation.

• Bar-coded tags and scanners: provide good security and are very easy to read, and a combination with the previous two is possible. However, tags may break, fall off or be cut out. The metal staples used can damage milling equipment and vice versa. The installation costs are moderate, as is the security level.

• Radio frequency identification (RF/ID) tags: highly accurate and reliable; the transponder emits a signal for over eight years. The data recorded are computer-linked. However, an applicator and reader cost $3 000, with every additional reader costing $800. Furthermore, there are additional costs of $3-8 per transponder as well as equipment maintenance.

• Brand hammers: a very low-cost and fairly practical identification tool. However, the marks can easily be copied and illegally distributed. Although the tool requires minimal training to operate, its reliability is poor, it is difficult to read and the level of information it carries is unsatisfactory.

• CIRAD-Foret: an improved version of the brand hammer, with relatively good security. There is no possibility of substituting logs and it is easy to apply. The level of information relies on the completion of the accompanying forms. However, audits are required to enforce security, especially as forms as well as hammer marks may be counterfeited.

• Unique reflector identifiers (URI): still an experimental procedure at present; at optimum functioning, URI will have an above-average security and reliability level. Nonetheless, as it currently stands the technology is impractical for field use (the laser device is not yet robust enough), costly for producing countries ($500 for a laser measuring device) and low in information (although this is improving, with the potential incorporation of memory cards with unique identifiers).

• Satellite-borne sensors: mostly applicable to large-scale monitoring for the detection of potential illegal activities. Very reliable and efficient; however, its large-scale nature demands equally large-scale expenses.

• Ground video surveillance cameras and automatic activation devices: the preferred choice for monitoring popular transportation routes; video surveillance is highly informative and reliable. Cameras are equipped with light, sound, and motion detectors able to transmit signals to enforcement personnel. The drawback of video surveillance is that it is not practical for monitoring individual logs, and cameras are difficult to hide for covert surveillance. Furthermore, the high cost of approximately $5 000 per unit and conditions in most developing countries (lack of electricity, theft, etc.) do not allow for the implementation of such surveillance.

1.6 Scale and effectiveness of global response

In an attempt to measure the scale and effectiveness of the global response to illegal logging, Lawson and MacFaul (2010) assessed enforcement data recorded all along the chain of custody. At first glance, the overall conclusion drawn from the illegal logging data of several producing nations (such as Ghana, Cameroon, Brazil, and Indonesia) is that there has recently been significant improvement in the enforcement response, which has led to global reductions in illegal logging activities.

However, they also show that while detection has improved in these countries, follow-up such as prosecutions, convictions, and the issuance and collection of fines remains poor. For instance, at the time of that study only 39% of fines issued by the Cameroonian government had been collected, and the IBAMA office of Brazil had 1 025 outstanding illegal logging cases, of which 93% were at least two years old, and some more than twenty years old (Barreto et al. 2008). Therefore, the dramatic increase in random roadside checkpoints, cargo inspections, arrests and court cases initiated has created a general misconception that illegal logging is on the decrease. Prosecutions have failed to keep up with the recent assault on illegal logging activities. Not only are cases backing up, but several are being rejected or dropped owing to a lack of clarity in the system of forest laws and management, or a lack of credible evidence and of capacity within the forestry commission to contest legal arguments (Lawson and MacFaul 2010).

As far as the African continent is concerned, timber tracking regulations are generally weak. For instance, French authorities have persistently recorded significantly larger volumes (by 30-40%) of log imports from Central African countries than the volumes recorded as having been legally exported. Data on trade in round logs between Mozambique and China have shown increasingly large discrepancies in recent years. By 2008, 80% of the logs reported by Chinese customs as imports from Mozambique (120 000 cubic metres of timber worth around US$60 million) were not recorded as legally exported (Lawson and MacFaul 2010).

In 2008 the independent observer concluded, “in its current configuration, the timber inspection system along transport routes is not uniform and lacks effectiveness” (Resource Extraction Monitoring 2008). The system does not track back to the stump; logs are often first marked many kilometres from the location of harvesting. The system also does not allow for easy reconciliation. Export content is rarely checked against licensed production records. Although the transport documents include some form of tamper-proofing measures, they are not sufficient to prevent counterfeiting. More timber transportation permits are often issued to logging companies than are required, and unused permits are not recovered; this has allowed companies to launder illegal logs into the system (Resource Extraction Monitoring 2008). It is thus clear that the timber-tracking system lacks a comprehensive verification measure and suffers from leakage and inefficiency.

1.7 Molecular genetics tool for wood identification

Molecular genetic markers have numerous potential applications in environmental forensics. Forest certification schemes, environmental management, forest enterprises, and state agencies could all profit from molecular genetic tools. They would benefit greatly from a reliable method of identification able to trace timber, identify false declarations, and pinpoint geographic regions of origin. Molecular markers have been suggested as a tool to identify timber species because of the stability of DNA and the impossibility of manipulating the DNA contained in tissue samples (Finkeldey et al. 2007).

However, before molecular genetics can be applied, basic requirements need to be met. Firstly, methods for isolating DNA of sufficient quantity and quality from unconventional materials (timber logs and wood materials) need to be developed. Secondly, informative diagnostic markers must be identified. Finally, a global genetic inventory of economically important timber species must be established. Although molecular genetic tools are highly reliable (for the identification of unknown specimens) and confer a tamper-proof level of security, they are viewed as impractical for monitoring individual logs, and they require specialist support and carry high implementation costs, especially in developing a genetic database (Brack et al. 2002).

1.8 DNA extraction from wood materials

DNA extraction from the standard soft and living tissues of terrestrial plants has been standardized (Doyle and Doyle 1987). However, in recent times a strong demand for the analysis of unconventional plant tissue such as hard seed coats and dead plant tissue has attracted the interest of paleogeneticists, forensic scientists, and geneticists. Genetic material from fossilised or dead plant material may be of great phylogenetic value (Savolainen et al. 1995). However, the genomic DNA from these samples is extremely fragmented and in some cases highly contaminated (Rachmayanti 2009, Cooper and Wayne 1998).


In their review “Molecular genetics tools to infer the origin of forest plants and wood”, Finkeldey et al. (2010) conclude that four aspects of wood and wood products mainly influence DNA extraction: (1) Physical: although optimal isolation requires optimal cell disruption, the hardness of the material calls for unconventional methods such as sawing and drilling, which usually cause overheating and must therefore be applied with meticulous care (Finkeldey et al. 2010); (2) Chemical: the presence of strong inhibitors such as phenolic compounds may inhibit DNA extraction or render the extract unsuitable for amplification (Lee and Cooper 1995, Rachmayanti et al. 2009); (3) Biological: during the biological conversion to woody tissue, cells die and become empty, so no organelles or nuclei are detectable in heartwood. Wood is also subject to decomposition by microorganisms and fungi, which may lead to contamination by foreign DNA (Finkeldey et al. 2007, Lindahl 1993); (4) Ageing: as soon as a tree is cut or its cells die, the process of degradation begins. The quantity and quality of DNA are expected to diminish, as the DNA is likely to be fragmented (Deguilloux et al. 2002).

Various DNA extraction protocols have been successfully modified to extract DNA from wood materials. Despite the obvious setbacks, the successful extraction of genomic DNA from wood material has been described for a few species, for example oaks (Quercus species) by Dumolin-Lapègue et al. (1999) and Cyclobalanopsis species of the family Fagaceae by Ohyama et al. (2001). Asif and Cannon (2005) describe a method for the extraction of genomic DNA from the endangered tropical species Gonystylus bancanus (Miq.) Kurz, and Rachmayanti et al. (2006) describe one for the family Dipterocarpaceae. Deguilloux and colleagues (2002, 2003, and 2004) have demonstrated the usefulness of oak wood as a source of material for DNA isolation. Fragments of genomic DNA five hundred base pairs long have been isolated and amplified from a 3 600-year-old Japanese cedar, Cryptomeria japonica (Thunb. ex L.f.) D. Don (Gugerli et al. 2005).

Based on the factors listed above, adaptations such as careful mechanical disruption of wood, extended incubation periods, and exclusion of potential contaminants and inhibitors are thus needed for the successful extraction of wood DNA.

DNA extraction followed by successful amplification of numerous markers has been described using CTAB protocols (Ohyama et al. 2001, Reynolds and Williams 2004, Rogers and Kaya 2006). The commercially available Qiagen DNeasy Plant Mini Kit has also been modified to extract DNA from ancient wood up to 1 000 years old (Liepelt et al. 2006). Asif and Cannon (2005) described the successful extraction of DNA from Gonystylus bancanus using N-phenacylthiazolium bromide, an extraction protocol primarily used for DNA extraction from ancient bone in paleontological studies.

The factors listed above may vary widely depending on the species investigated. Both genetic and environmental factors may influence the physical, chemical, and biological nature of the species. Furthermore, treatments such as heating, pressing, pesticide application, and processing of wood may interfere with DNA extraction procedures (Finkeldey et al. 2010).

1.9 DNA Barcoding

DNA barcoding is a genetic approach that relies on the nucleotide diversity in short DNA strands to distinguish between different species (Hebert et al. 2003, Savolainen et al. 2005, Chase et al. 2005, Cameron et al. 2007, Lahaye et al. 2008). With the selection of the universal gene cytochrome c oxidase subunit 1 (COI) as the standard barcode for animals, DNA barcoding has demonstrated outstanding species identification capabilities. However, finding a plant equivalent of COI has proven much more challenging (Hebert et al. 2003). Despite the complications in establishing a universally acceptable barcode for plants, the Consortium for the Barcode of Life (CBOL) Plant Working Group selected a combination of the two chloroplast genes matK and rbcLa as the standard core barcodes for plants, with ITS and psbA-trnH as supplementary barcodes (CBOL - Plant Working Group 2009; Hollingsworth et al. 2011). This has made it possible for plant scientists to undertake barcoding initiatives across the planet (Vijayan and Tsou 2010).

DNA barcoding dates back no further than 2003, when Hebert et al. used the COI gene to discriminate individual species of lepidopterans from a set of 200 closely related species with 100% accuracy. More importantly, they subsequently devised a novel sequencing application able to assign identifications to unknown specimens with a speed and accuracy exceeding those of morphological taxonomy (Hebert et al. 2003, Hebert and Gregory 2005, Taberlet et al. 2007). Soon after these findings (by May 2004) the iBOL initiative was jump-started with the aim of barcoding all eukaryotic life on the planet. The barcode sequence from a short DNA segment, stored in an online reference database, serves as a species identification tool: comparing the sequence from an unknown organism against the database instantly reveals an identification (Shneyer 2009).

Despite its recent beginnings, DNA barcoding has proven to be a triumph in animals. This success is due to mitochondrial DNA and the COI gene, which meets all the basic requirements of a barcoding sequence. The first is universality: a DNA barcode should cover a wide range of taxa. The second is sequence quality: a DNA barcode should be easy to obtain with a single set of primers, and bidirectional reads should be obtainable. The last is discriminatory power: variation should be high between species but low within species (Hebert et al. 2003), as this determines how effectively the barcode tells species apart. At 648 base pairs long, COI is a short DNA fragment located between conserved regions of the mitochondrial DNA.

These characteristics allow universal primers to amplify the fragment easily, so that a short DNA segment can be sequenced and analysed. Furthermore, COI confidently distinguishes between species because variation is high between closely related animal species but low within species (Hebert et al. 2003).

According to the CBOL - Plant Working Group (2009), DNA barcoding has the potential to reduce the complications that accompany traditional taxonomic identification as performed under CITES, which relies on morphological characters for species identification; examples include misidentification due to phenotypic plasticity and the unreliability of characters due to an extended maturity period. Hammond (1992) stated that “since few taxonomists can critically identify more than 0.01% of the estimated 10-15 million species, a community of 15 000 taxonomist, in perpetuity, will be require to identify life on earth.” DNA barcoding, however, can serve taxonomists, forensic scientists, conservationists, and even the commercial food industry as an efficient identification tool, just as barcodes are used to distinguish supermarket products (Blaxter 2003, Schindel and Miller 2005). More importantly, it will be exceptionally useful for identifying morphologically indistinguishable material, for instance extremely damaged specimens, specimens with delayed expression of characters, or processed goods (Lahaye et al. 2008). Furthermore, given the rate of economic development and population growth and their effect on flora and fauna, there is an essential need for an effective method to assess and record biodiversity in order to safeguard rare and endangered species.

1.10 Diagnostic markers for land plants

The expectations met by COI in animals could not be adequately fulfilled by plant mtDNA. Mitochondrial genes evolve more slowly in plants than they do in animals. The rate of substitution in plant mtDNA is 50 to 100 times lower than in mammals, two to three times lower than in chloroplast genes, and 10 to 20 times lower than in nuclear genes (Wolfe et al. 1987, Drouin et al. 2008). Furthermore, the vast differences in substitution rate and the major structural rearrangements between plants have led scientists to avoid using plant mtDNA to evaluate evolution at species level (Sanjur et al. 2002).

The CBOL - Plant Working Group, charged with searching for a suitable DNA barcode for plants, assessed an array of loci of both nuclear and chloroplast origin. Based on previous phylogenetic studies, 12 loci were proposed as potential plant barcodes. Of these 12 loci four were shortlisted: three plastid loci (rbcLa, matK, and psbA-trnH) and one nuclear segment (the internal transcribed spacer; ITS). In 2009 the CBOL - Plant Working Group recommended a core barcode of the two coding chloroplast genes rbcLa and matK.

The recommendation of rbcLa and matK as the standard plant barcode was based on the fact that the rbcLa gene was already well established in phylogenetic studies: its discriminatory power is modest but, most importantly, it is easily retrieved. matK was selected to compensate for the shortcomings of rbcLa because it is one of the fastest evolving chloroplast genes and therefore has high discriminatory power. Together rbcLa and matK achieve approximately 70% species discrimination and represent the best balance between universality, sequence quality, and discrimination in a DNA barcode.

The rbcL gene is a chloroplast gene of about 1 430 base pairs; it codes for the important photosynthetic enzyme RuBisCO and was the first gene to be sequenced in plants. To date it therefore remains the most thoroughly characterized plant gene (Zurawski et al. 1981), and it provided informative data that helped establish the current phylogeny of flowering plants (The Angiosperm Phylogeny Group 2003). For the purpose of DNA barcoding, shorter segments within rbcL holding sufficient variability were isolated. The short rbcLa segment has since been readily amplified across a vast span of plant taxa. These characteristics of rbcLa, along with earlier proposals in plant phylogenetics to use rbcL to reconstruct a phylogenetic tree of all land plants, have led the CBOL - Plant Working Group to recognize rbcLa as a plant barcode (Soltis and Soltis 1998). However, rbcLa can only provide confident grouping up to the generic level; a combination of rbcLa with a different locus useful at species level was therefore proposed (CBOL - Plant Working Group 2009; Hollingsworth et al. 2011).

Of the wide range of loci proposed as potential DNA barcodes, matK was considered the most complementary to rbcLa. Like rbcLa, matK is a chloroplast gene; it is about 1 550 base pairs long and codes for the RNA-splicing enzyme maturase (Neuhaus and Link 1987). matK is located within the intron of the trnK gene, so conserved regions of trnK can be used to design universal primers. Furthermore, owing to its fast rate of evolution, matK has proven useful for resolving interspecific relationships in a number of angiosperms (Johnson and Soltis 1995). Hence, taking into account the universality and high discriminatory power of matK, the CBOL - Plant Working Group recommended matK and rbcLa as the core plant barcodes.

Although matK and rbcLa represent the best combination of species discrimination and universality found so far, other loci are under review with the aim of improving the species discrimination power of plant barcoding while maintaining universality and ease of amplification (CBOL - Plant Working Group 2009).

1.11 Tree-BOL Africa initiative

A widespread failure to enforce forest laws has meant that the progressive forest management regulations implemented in many producer countries have failed to live up to their promise of sustainable and socially equitable forest use. The pressure on sub-Saharan Africa’s biodiversity continues to increase as the growing (and largely impoverished) human population relies on forest trees for fuel, shelter, fruits, nuts, and various medicinal compounds (Scholes et al. 2008). The Food and Agriculture Organization (FAO) estimates that 13 000 square kilometres of African forest disappear every year through forest clearing, with the West African rain forest in particular being depleted. The invasion of the forests by logging companies, and subsequently by farmers, could drive species sensitive to change and disturbance in forest structure to local extinction.

Given the current situation, although species are constantly being discovered, many more may become extinct before they are known to science. In 2002 the Convention on Biological Diversity (CBD), which promotes the fair and equitable sharing of the benefits arising from the utilization of genetic resources, adopted a global strategy for plant conservation. The CBD’s ultimate and long-term objective is to halt the current and continuing loss of plant biodiversity.

The Tree-BOL Africa initiative is an ambitious undertaking by the University of Johannesburg to address the targets of the CBD and to springboard the African continent into the International Barcode of Life (iBOL) initiative. Tree-BOL is a derivative of the larger principal organization, the Consortium for the Barcode of Life (CBOL). The main objective of the Tree-BOL initiative is to create a reference database of trees by establishing DNA barcodes for all tree species in the world. Situated at the University of Johannesburg, the regional working group for Africa aims to facilitate the transfer of accurate and reliable information between the continent’s tree collections and the rest of the world.

Through assessments of the conservation status, distribution, and DNA barcodes of the tree species of Africa, Tree-BOL Africa will: (1) reconstruct a phylogenetic tree for the estimated 1 700 tree species native to southern Africa and complete a DNA barcode database of all southern African trees; (2) determine the causes of variation in regional biodiversity (for science-based conservation proposals); (3) generate phylogenies from DNA barcodes (as DNA barcode libraries and phylogenetic data both consist of sequence information from a collection of species). Furthermore, the Tree-BOL initiative aims to build capacity in African countries, raise awareness, and promote the use of DNA barcoding data for the conservation and sustainable use of African tree species.

1.12 Study site: Mozambican forest

According to the Direcção Nacional de Terras e Florestas (2007), Mozambican forest vegetation covers approximately 40.6 million hectares. These forests have served as stores of subsistence goods for rural communities. They provide firewood, charcoal, medicinal plants, bamboo, reeds, and veldt foods such as wild vegetables, fruit, and tubers, and are also a source of nutrients and soil fertilisers through fire and the recycling of material. Local communities also use timber and precious wood for the construction of houses and for arts and crafts, more specifically carvings and sculptures. The people of Mozambique have also developed a cultural and spiritual relationship with the forests (Ribeiro 2008).

Two main forest types, Miombo and Mopane, make up the majority of Mozambican forest vegetation. The most extensive is the Miombo forest. Home to approximately 334 tree species, it stretches across the vast central and northern regions of Mozambique. It is characterized by the prevalence of fires and by dense vegetation cover, with deciduous and semi-deciduous trees often reaching 10 to 20 metres in height. This forest type occupies close to two thirds of the country and includes some of Mozambique’s most important river systems, notably the Zambezi River. The Miombo forest type is dominated by species such as Brachystegia spiciformis Benth. and Julbernardia globiflora (Benth.) Troupin (Ribeiro 2008).

The second largest forest type is the Mopane forest. It is mostly found in the Limpopo area and towards the upper Zambezi valley. This vegetation type is characterized by dry savannahs at low and medium altitudes and the presence of deciduous trees and . It is low in species richness, with no more than 283 species per 625 km2. A few medicinal plants can be found. The forest type is dominated by species such as Colophospermum mopane (Benth.) Leonard, Adansonia digitata L., and Afzelia quanzensis Welw. The Mopane forests are situated on low-quality soil and regenerate poorly after degradation. As a result, numerous areas of the forest, along with their abundant fauna, have been placed under conservation (Hatton and Munguambe 1998).

Mozambique remains one of the poorest countries in the world. Approximately 54% of the country’s population lives below the poverty line (Instituto Nacional de Estatistica 2013), with an average annual income of US$ 250 per person (World Bank 2005). The wealth of the nation is unevenly distributed, mostly favouring urban dwellers and the southern provinces in particular (Instituto Nacional de Estatistica 2013). However, 70% of Mozambicans remain in rural areas, relying on natural resources for their daily livelihoods, mainly through subsistence farming. Approximately 7% of the population has access to electricity, leaving the rest heavily reliant on firewood and charcoal for energy. In fact, firewood and charcoal represent 85% of energy consumption in the country, with 17 million m3 of biomass consumed every year (Hanlon 2007).

The Mozambican government has recently introduced a number of laws and reforms to protect forests and to develop their rational and sustainable use. The land law of 1997 and the forestry and wildlife law of 1999 were only passed in 2002. The land law of 1997 asserts the right of local communities to land and requires second parties to consult local communities before exploitation. The communities therefore have first preference in habitation and subsistence use of the land and can negotiate agreements with commercial entities that could bring development and prosperity to the community (Johnstone et al. 2004).

The forest and wildlife law is centred on the concept of community-based natural resource management. It is intended to establish sustainable use of forest resources and to institute an efficient system for generating revenue and distributing the taxes collected (Nhantumbo and Macqueen 2003). According to Johnstone et al. (2004), the laws governing forest utilization contradict each other. Although the land law allows the transfer of land rights to the community, the forest and wildlife law approves this only for non-commercial purposes and makes a licence compulsory for commercial uses, thus placing the local community on the same footing as the private sector and multinational companies.

1.13 State of Mozambican Forests

The rate of deforestation in Mozambique is 219 000 hectares per year (Direcção Nacional de Terras e Florestas 2007). The 2007 national inventory attributed this fast rate of deforestation to the growing population. Forest fires, open cultivation areas, firewood collection, and charcoal production were highlighted as the main causes of deforestation. However, according to Ribeiro (2008), numerous studies indicate that illegal and unsustainable logging are the main contributors to the fast rate of deforestation.

The prevalence of illegal logging has been well documented in Mozambique. Current estimates suggest that 50 to 70% of timber produced in Mozambique is illegal. This corresponds to between 90 000 and 140 000 m3 of illegally produced round-wood per year and amounts to US$ 15 to 24 million in gross economic terms (Del Gatto 2003). The present extraction rate of valuable commercial timber species such as Afzelia quanzensis, Dalbergia melanoxylon Guill. & Perr., and Millettia stuhlmannii Taub. (commonly known as chanfuta, pau preto, and jambirre, respectively) may be two to four times above the sustainable potential. Clandestine illegal logging is also made easier by the low density of the Mozambican forest: the openness of these forests allows easy access to precious timber (Reyes 2003).

The mechanism for channelling forest revenues to local communities has been poorly implemented. Local communities have complained of extensive logging, a high number of abandoned logs, and indiscriminate logging practices around villages. Furthermore, illegal logging promotes a system that abuses the rights of local communities. It denies them the vitally needed employment and skills development that could come from sustainable forest management, processing industries, and community-based enterprises (Ribeiro 2008). The fortunate few who obtain employment perform gruelling work for salaries below the minimum wage, with salary backlogs and inhumane working conditions (Ribeiro 2008).

Despite the new legislation, illegal trade continues to flourish. The new control measures involve a wide range of stakeholders (including customs) and a complicated process of forms, inspections, and payments (Mosse 2007). Clearance with customs occurs through a shipping agent and involves the cross-checking of a number of documents. The most important is the Boletim de Mercadoria, the exporter’s declaration of the cargo content, destination, and importer. Serious irregularities take place at this point. Firstly, cross-checking of the documents against each other is hampered by practices such as the theft and sale of blank documents. Secondly, because most precious commercial timber can only be exported from Mozambique in processed or semi-processed form, identification is almost impossible. The products are identified by type, not by species. Hence enforcement officers seldom cross-check the documents against the shipment, leaving room for practices such as the undervaluing of cargo. The only form of species identification takes place at the logging site, where a pre-harvest inventory is made in which all exploitable trees are marked. However, falsified documents, illegal payments, over-harvesting, and illegal harvesting of protected precious timber species are very common (Mackenzie 2006). The inability to correctly identify the species in question is therefore a huge loophole in the system and an opportunity criminals will not overlook.

1.14 Research objectives

The development of a genetic-based tool for tree species identification should provide unparalleled opportunities for scientists, consumers, and enforcement officers. DNA barcoding is one such tool, acknowledged to bring accuracy and efficiency to species identification. The technique works by comparing the sequence of DNA nucleotides from a short, standardized segment of the plant genome (Hebert et al. 2003, Savolainen et al. 2005, Chase et al. 2005, Cameron et al. 2007, Lahaye et al. 2008) with a reference barcode database.

As a contribution to the Tree-BOL Africa initiative this project (DNA barcoding of traded and protected trees of southern Africa) will make use of DNA barcoding data as a foundation to address the following specific objectives:

1. Develop a DNA barcode reference library for traded and protected tree species of southern Africa, with special emphasis on the trees of the Sofala province in Mozambique and protected trees of , especially those valued economically as sources of fuel, shelter, fibre, food, , and medicine, as well as ecologically as carbon sinks.

2. Channel the developed DNA barcodes into the DNA barcoding pipeline, which implies uploading relevant data such as genetic DNA barcodes, , properties, geographic distributions, and other biodiversity data relevant to the sustainable use of trees in southern Africa to the globally accessible and user-friendly BOLD Systems website (www.boldsystems.org). Furthermore, submit the specimens obtained to Tree-BOL to expand the current reference library.

3. Assess DNA barcode species identification success using different identification parameters: the “best close match” analysis of Meier et al. (2006) and Meier et al. (2008), the “nearNeighbour” analysis, the BOLD “threshID” analysis, and lastly a tree-based test of species monophyly.

4. Compare the identification success of matK and rbcLa. Does the gain in information obtained by combining the two genes justify the extra cost and labour?

5. Assess the practical performance of DNA barcoding as a species identification tool by extracting and sequencing the standardized DNA barcodes from timber logs. This will be followed by an attempt to identify the generated sequences against the developed database of traded and protected trees.

Chapter 2

2. Materials and Methods

2.1 Specimen collection and reference samples.

A list of southern African traded and protected tree species was generated from several sources, namely: Annex 1 of the Mozambican forestry and wildlife regulations (Tudo Legal 2009), the South African list of protected tree species (Republic of South Africa 2012), the CITES list of timber species from Appendix I, Appendix II, and Appendix III (CITES 2006), and The Wood Explorer species list (The Wood Explorer 2012) (see table 2-1).

Specimens used in this study were identified and collected under two different scenarios. The first involved field expeditions in which morphological species identification was made using scientific literature (see figure 2-1) with the assistance of local and regional tree specialists (see table 2-2) and trained field assistants. The specimens were collected from localities spanning several provinces of South Africa and several countries within southern Africa (see figure 2-2). For every sample collected, herbarium vouchers were prepared and deposited at the University of Johannesburg herbarium (JRAU). A project entitled “Traded and Protected Trees of Africa (TPTA)” was created on the Barcode of Life Data Systems (BOLD; www.boldsystems.org) under the project tag IDRC-SA WG 1.2 Land Plants. Collection details, taxonomy, voucher numbers, GPS coordinates, specimen photographs, and sequence data (matK and rbcLa) were generated and deposited online on BOLD Systems (see figure 2-3).

The second sample collection scenario consisted of sampling timber log material. These morphologically indistinguishable samples were included to test whether the standard barcodes could be amplified from degraded material and used to assign a correct identification. The specimens were randomly collected at the TCT Dalmann logging concession located in Catapu, Mozambique. Voucher specimens for the taxa used in this study and GenBank accession numbers are listed in table 2-3.

2.2 DNA extraction and sequencing.

DNA extraction, polymerase chain reaction (PCR) amplification, and sequencing of the core barcode regions (matK and rbcLa) were done at the Canadian Centre for DNA Barcoding (CCDB, Canada) and the African Centre for DNA Barcoding (ACDB, South Africa). Leaf samples were sent to CCDB for DNA extraction and sequencing. Molecular procedures conducted at CCDB (DNA extraction, amplification, and sequencing) followed several steps. DNA extraction was performed using a semi-automated protocol (Ivanova et al. 2006). PCR used a cocktail including 5-trehalose as a PCR enhancer and was run in two rounds, the first being essentially the “proper” PCR and the second mainly failure tracking. Different primers were used in each round. For rbcLa, the primers rbcLa-F/rbcLa-R were used during the first round and rbcLa-F/rbcLa-jf634R during the second, whereas for matK 1R Kim-f/3F-Kim-r were used in the first round and matK-390f/matK-1326r in the second. These two sequential PCR rounds with different primer sets improve sequencing success for both rbcLa and matK. Sequencing of the cleaned PCR products was conducted using the standard CCDB sequencing protocols described by Hajibabaei et al. (2005).

The protocols used at ACDB were as follows. Total genomic DNA was isolated from either 0.1 – 0.3 g of silica-dried leaf material or 0.5 – 1.0 g of fresh leaves using the 2× CTAB (hexadecyltrimethylammonium bromide) extraction method of Doyle and Doyle (1987). Polyvinylpyrrolidone (2% PVP) was added to reduce the effect of high polysaccharide concentrations in the samples. After precipitation with 100% ethanol, the DNA was stored at -20°C for a minimum of two weeks (Fay et al. 1998). DNA extracts were purified using QIAquick silica columns (Qiagen Inc., Hilden, Germany) according to the manufacturer’s protocol. The total genomic DNA (tDNA) extracted was stored at -80°C in the DNA bank of ACDB. Slight modifications were made for the timber log material. For each sample, 1.0 g of the innermost tissue was used for DNA extraction; where samples displayed relatively fresh phloem tissue, this was used instead. For cell disruption a clean, ethanol-treated electric rock mill was used, and liquid nitrogen was added to prevent overheating (see figure 2-5). Furthermore, incubation in the CTAB lysis buffer was extended from the recommended 60 min to overnight. Lastly, precipitation was done with isopropanol.

Primers used for PCR of the cpDNA rbcLa and matK regions are listed in table 2-4. All PCRs were performed using ReadyMix Master (Advanced Biotechnologies, Epsom, Surrey, UK). Bovine serum albumin (3.2% BSA) was added to both plastid reactions. This additive serves as a stabiliser for enzymes, reduces problems with secondary structure, and improves annealing (Palumbi 1996). The PCR amplifications were performed using either the 9800 Fast Thermal Cycler or the GeneAmp PCR System 9700. The programs used for PCR amplification were as follows: (a) for rbcLa, pre-melt at 94°C for 60 sec, followed by 28 cycles of denaturation at 94°C for 60 sec, annealing at 48°C for 60 sec, and extension at 72°C for 60 sec, and a final extension at 72°C for 7 min; (b) for matK, pre-melt at 94°C for 3 min, followed by 30 cycles of denaturation at 94°C for 60 sec, annealing at 52°C for 60 sec, and extension at 72°C for 2 min, and a final extension at 72°C for 7 min. Cycle sequencing reactions were carried out in a GeneAmp PCR System 9700 thermal cycler using the ABI PRISM® BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Inc., California, USA). Cycle sequencing products were precipitated in ethanol and sodium acetate to remove excess dye terminators. They were then re-suspended in 10 µL HiDi formamide (ABI) before being sequenced on an ABI 3130 xl Genetic Analyser (ABI).
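As a rough illustration of the two PCR programs described above, the following Python sketch encodes each program as a small record and sums the programmed hold times. It assumes that the stated cycle counts apply to the denaturation, annealing, and extension steps together, and it ignores temperature ramp times, so the totals are lower bounds on the actual run time. The class and field names are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrProgram:
    """Hold times (seconds) of a simple three-step PCR program (illustrative only)."""
    pre_melt_s: int
    denature_s: int
    anneal_s: int
    extend_s: int
    cycles: int
    final_extend_s: int
    anneal_temp_c: int

    def hold_time_s(self) -> int:
        # Programmed hold time only; ramping between temperatures is ignored.
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return self.pre_melt_s + self.cycles * per_cycle + self.final_extend_s


# Values taken from the programs described in the text.
PROGRAMS = {
    "rbcLa": PcrProgram(pre_melt_s=60, denature_s=60, anneal_s=60, extend_s=60,
                        cycles=28, final_extend_s=7 * 60, anneal_temp_c=48),
    "matK": PcrProgram(pre_melt_s=3 * 60, denature_s=60, anneal_s=60, extend_s=2 * 60,
                       cycles=30, final_extend_s=7 * 60, anneal_temp_c=52),
}

for locus, program in PROGRAMS.items():
    minutes = program.hold_time_s() / 60
    print(f"{locus}: {minutes:.0f} min of programmed hold time")
```

Under these assumptions the rbcLa program amounts to 92 minutes of hold time and the matK program to 130 minutes.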

2.3 Sequence editing, alignment and broad analysis.

Complementary strands were assembled and manually edited using Sequencher 3.1 (Gene Codes, Ann Arbor, Michigan, USA). The rbcLa and matK sequences were first aligned automatically (rbcLa with MUSCLE and matK with MEGA 5.2.1), followed by manual correction of the alignment in PAUP* (version 4.0b10) (Swofford 2002). Summary statistics for the compiled dataset were generated in R version 3.0.0.

Summary statistics were calculated using the “dataStat”, “seqStat”, and “is.ambig” functions implemented in the R package Spider 1.2-0 (Brown et al. 2012). These statistics include the number of species and genera in the dataset, the number of individuals collected for each species, the number and length of sequences, and the proportion of ambiguous bases. Ambiguous bases are bases coded by International Union of Pure and Applied Chemistry (IUPAC) DNA codes other than A, C, G, or T.
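The definition of ambiguous bases given above can be illustrated with a short Python sketch. This is not the Spider implementation; it is a minimal stand-alone function, with hypothetical names, that counts IUPAC ambiguity codes and assumes that gap and missing-data symbols ("-" and "?") are excluded from the denominator.

```python
UNAMBIGUOUS = set("ACGT")
# IUPAC nucleotide codes that stand for more than one possible base.
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")


def ambiguous_proportion(seq: str) -> float:
    """Proportion of IUPAC-coded bases in seq that are not A, C, G, or T.

    Symbols outside the IUPAC DNA alphabet (e.g. '-' or '?') are ignored.
    """
    bases = [b for b in seq.upper() if b in UNAMBIGUOUS or b in IUPAC_AMBIGUOUS]
    if not bases:
        return 0.0
    return sum(b in IUPAC_AMBIGUOUS for b in bases) / len(bases)
```

For example, a sequence such as "ACGTNNRY" contains four ambiguous codes among eight bases, giving a proportion of 0.5.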

2.4 Assessment of core DNA barcodes identification efficiency.

Four primary analyses were conducted to assess the suitability of the core barcodes as a species identification tool using the R package Spider 1.2-0 (Brown et al. 2012). Suitability was assessed first for the matK data frame, then for the rbcLa data frame, and finally for the combined data frame. The barcode gap is essential in assessing the suitability of the core barcodes for identification purposes. Using the generated DNA data, the presence of a barcode gap was tested to determine whether genetic variation within species is smaller than the variation between species (Meyer and Paulay 2005). The barcode gap was evaluated by comparing intraspecific distances with the smallest, rather than the mean, interspecific distance. The significance of the difference between the two sets of distances was tested using the Wilcoxon rank-sum test. The region that exhibits a significant barcode gap is henceforth referred to as the “best barcode”.
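The barcode gap comparison described above can be sketched in Python as follows. This is a simplified illustration rather than the analysis pipeline used in this study: it uses uncorrected p-distances on already aligned toy sequences, reports for each specimen the largest intraspecific and smallest interspecific distance, and omits the Wilcoxon rank-sum test. All function names and the example data are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance over aligned sites where both bases are A, C, G, or T."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites between the two sequences")
    return sum(x != y for x, y in pairs) / len(pairs)


def barcode_gap(seqs, species):
    """Return (max intraspecific, min interspecific) distance for each specimen.

    An entry is None when the specimen has no conspecific (or no heterospecific)
    partner in the dataset.
    """
    n = len(seqs)
    result = []
    for i in range(n):
        intra = [p_distance(seqs[i], seqs[j]) for j in range(n)
                 if j != i and species[j] == species[i]]
        inter = [p_distance(seqs[i], seqs[j]) for j in range(n)
                 if species[j] != species[i]]
        result.append((max(intra) if intra else None,
                       min(inter) if inter else None))
    return result


# Toy alignment: two specimens each of two hypothetical species.
SEQS = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCAAAAA", "CCCCCAAAAT"]
SPECIES = ["sp1", "sp1", "sp2", "sp2"]

GAPS = barcode_gap(SEQS, SPECIES)
# A specimen shows a barcode gap when its nearest heterospecific is further
# away than its most distant conspecific.
HAS_GAP = [intra is not None and inter is not None and inter > intra
           for intra, inter in GAPS]
```

In this toy dataset every specimen has a maximum intraspecific distance of 0.1 and a minimum interspecific distance of 0.5, so a barcode gap is present for all four specimens.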

The identification accuracy of specimens in the dataset was assessed using four primary metrics, each quantifying different properties of the data. In these tests, a real identification problem was simulated by treating each individual as an identification query. In essence, each specimen sequenced in the dataset is considered unknown while the remaining sequences represent the DNA barcoding database used for identification. The primary metrics assessed were the “best close match” analysis of Meier et al. (2006) and Meier et al. (2008), and the “nearNeighbour” and “threshID” methods implemented in the R package Spider 1.2-0 (Brown et al. 2012). The best close match and nearest neighbour analyses measure identification accuracy by searching for the closest individuals under a specific threshold; the former focuses on a single nearest-neighbour match rather than on all matches within that threshold. The “threshID” method performs a threshold-based analysis similar to the “Specimen identification” tool provided by BOLD.

Lastly, a tree-based test of species monophyly was employed. This test reports the exclusivity of the genetic clusters in a neighbour-joining (NJ) phylogram. A bootstrap test was also incorporated, and clusters with node support greater than 70% were considered correct identifications. Identification results for queries were divided into four categories: “correct” or “incorrect”, and “no identification” or “ambiguous” where applicable to the method. For singletons (species represented by a single specimen), which have been a problem for DNA barcoding identification, a “no identification” return was considered a correct identification.

A threshold of 1% is generally used by BOLD for species identification, but this may not be appropriate for every dataset, and identification success can be increased if a better threshold for the given data can be found (Meyer and Paulay 2005). Because no single threshold is likely to suit all species, a range of thresholds was assessed for all three data frames to minimise errors. The optimum threshold was taken as the point at which the sum of the false positive and false negative error rates is at its minimum. The function “threshOpt” implemented in the R package Spider 1.2-0 (Brown et al. 2012) was used. The resulting thresholds were then applied in the primary metrics mentioned above to assess their effect on the success rates.
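The idea of choosing the threshold that minimises the cumulative error can be sketched in Python as follows. The error definitions below are simplified assumptions made for illustration and are not necessarily those used by Spider; the function names and the distance matrix are hypothetical.

```python
def count_errors(dist, species, threshold):
    """Return (false positives, false negatives) at one threshold.

    Assumed definitions for this sketch:
    - false positive: a different species lies within the threshold;
    - false negative: conspecifics exist, but none lies within the threshold.
    """
    false_pos = false_neg = 0
    n = len(species)
    for i in range(n):
        others = [(dist[i][j], species[j]) for j in range(n) if j != i]
        has_conspecific = any(sp == species[i] for _, sp in others)
        within = [sp for d, sp in others if d <= threshold]
        if any(sp != species[i] for sp in within):
            false_pos += 1
        elif has_conspecific and species[i] not in within:
            false_neg += 1
    return false_pos, false_neg


def optimise_threshold(dist, species, thresholds):
    """Return the threshold with the smallest cumulative error, plus all errors."""
    errors = {t: sum(count_errors(dist, species, t)) for t in thresholds}
    best = min(thresholds, key=lambda t: errors[t])
    return best, errors


# Hypothetical pairwise distances for specimens A1, A2, B1, B2.
species = ["sp_A", "sp_A", "sp_B", "sp_B"]
dist = [
    [0.000, 0.004, 0.030, 0.030],
    [0.004, 0.000, 0.030, 0.035],
    [0.030, 0.030, 0.000, 0.012],
    [0.030, 0.035, 0.012, 0.000],
]

best, errors = optimise_threshold(dist, species, [0.005, 0.01, 0.02, 0.04])
```

Here the conventional 1% threshold rejects both sp_B specimens because their intraspecific distance is 1.2%, whereas a 2% threshold gives no errors and a 4% threshold starts to merge the two species.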

2.5 Phylogenetic analysis

Phylogenetic analyses were conducted on the combined matK and rbcLa data frames. The combined matrix was analysed in PAUP using maximum parsimony (MP) in a heuristic search. The parameters of the analysis included 1 000 replicates of random taxon addition, with 10 trees retained per step and tree-bisection-reconnection (TBR) branch swapping. Subsequently, sequence relationships were inferred using Bayesian analysis in the program MrBayes v.3.1.2. The substitution model was set with nst=6 and the Markov Chain Monte Carlo (MCMC) analysis was run for 2 000 000 generations. Lastly, posterior probabilities (PP) were recorded on an 80% majority-rule consensus tree using PAUP, with scores classed as high (above 95 PP) or low (below 95 PP). The tree was rooted using representatives of Acrogymnospermae, namely Cycas thouarsii R.Br., Stangeria eriopus (Kunze) Baill., Encephalartos aemulans Vorster, Pinus pinaster Aiton, Podocarpus elongatus (Aiton) L'Hér. ex Pers., Podocarpus henkelii Stapf ex Dallim. & B.D.Jacks., Podocarpus latifolius (Thunb.) R.Br. ex Mirb., Afrocarpus falcatus (Thunb.) C.N.Page., Widdringtonia nodiflora (L.) E.Powrie, and Widdringtonia schwarzii (Marloth) Mast. (Cantino et al. 2007, Soltis et al. 2011).

2.6 Correspondence of query sequences to the database

Different algorithms were used to assess the performance of the query sequences when matched against the generated database. The query specimens were sampled from old timber logs exposed to the elements at the TCT-Dalmann, Catapú timber concession, in central Mozambique: Queries (1 & 2) Panga-panga, Millettia stuhlmannii Taub.; Query (3) Chanfuta, Afzelia quanzensis; Query (4) Brown-ivory, Berchemia discolor (Klotzsch) Hemsl.; Query (5) Cordyla, Cordyla africana Lour.; and Query (6) Messassa, Brachystegia spiciformis Benth. The selection of an applicable method of identification was of concern; therefore, widely accessible algorithms that use local pairwise alignments, such as sequence BLAST on GenBank and BOLD, were applied with default settings to assess the efficiency of the core barcodes for identification on a global platform. NJ trees based on the distance matrices, as implemented in the R package Spider, were generated for all three datasets (rbcLa, matK and combined). NJ uses a minimum-evolution algorithm to assess the best branching pattern between taxa based on genetic distances (Saitou and Nei 1987). Furthermore, the same trees were subjected to bootstrap analysis and assessed for reciprocal monophyly. Lastly, Bayesian analysis was used to support the query identification.

The query DNA barcode sequences generated for the Mozambican timber logs were included and aligned in the above-mentioned phylogeny of the collected traded and protected trees. Both NJ and pairwise parsimony trees were generated for tree-based identification of the query specimens on the local database. Additional Bayesian and bootstrap analyses were conducted to support the query identification. A match was considered accurate only if the hits or neighbours were of the same species (see Figure 2-4). The Bayesian likelihood method also seeks a phylogenetic topology that minimizes evolutionary assumptions under various models of nucleotide evolution (Huelsenbeck and Ronquist 2001).


Table 2-1 Compiled list of priority traded and protected timber species in southern Africa

| Scientific name | Commercial name |
|---|---|
| Adansonia digitata L. | Ala |
| Afrocarpus falcatus (Thunb.) C.N.Page. | Yellowwood |
| Afzelia quanzensis Welw. | Chanfuta |
| Androstachys johnsonii Prain | Mecrusse |
| Baikiaea plurijuga Harms | Rhodesian Teak |
| Balanites maughamii Sprague | - |
| Barringtonia racemosa (L.) Spreng. | - |
| Berchemia zeyheri (Sond.) Grubov | Pau-Rosa/Pink Ivory |
| Bobgunnia madagascariensis (Desv.) J.H.Kirkbr. & Wiersema* | Kampanga/Pau Ferro |
| Boscia albitrunca (Burch.) Gilg & Benedict | - |
| Brachystegia spiciformis Benth. | Messassa |
| Breonadia salicina (Vahl) Hepper & J.R.I.Wood | Matumi |
| Bruguiera gymnorhiza (L.) Lam. | Black Mangrove |
| Cassia abbreviata Oliv. | Rabai |
| Cassia afrofistula Brenan | Kenyan Shower |
| Cassine transvaalensis (Burtt Davy) Codd | - |
| Cassipourea swaziensis Compton* | |
| Catha edulis (Vahl) Endl. | Khat |
| Catha transvaalensis Codd* | - |
| Ceriops tagal (Perr.) C.B.Rob. | Balobalarao |
| Cleistanthus schlechteri (Pax) Hutch. | - |
| Colophospermum mopane (Benth.) Leonard | Balsam Tree |
| Colubrina nicholsonii A.E.van Wyk & Schrire* | |
| Combretum imberbe Wawra | Monzo |
| Cordyla africana Lour. | Cordyla |
| Curtisia dentata (Burm.f.) C.A.Sm. | Asgaai |
| Dalbergia melanoxylon Guill. & Perr. | African Blackwood |
| Diospyros mespiliformis Hochst. ex A.DC. | Ebano |
| Ekebergia capensis Sparrm. | Inhamarre/Cape Ash |
| Entandrophragma caudatum (Sprague) Sprague | Mbuti/Bottle Tree |
| Entandrophragma cylindricum (Sprague) Sprague* | Sapele |
| Erythrophleum suaveolens (Guill. & Perr.) Brenan | Missanda |
| Erythrophysa transvaalensis Verd.* | - |
| Euclea pseudebenus E.Mey. ex A.DC. | Cape Ebony |
| Ficus trichopoda Baker | - |
| Guibourtia coleosperma (Benth.) Leonard | African Rosewood |
| Guibourtia conjugata (Bolle) J.Leonard | Chacate |
| Guibourtia pellegriniana Leonard | Bubinga |
| Guibourtia tessmannii (Harms) J.Leonard | Bubinga |
| Julbernardia globiflora (Benth.) Troupin | Messassa Encarnada |
| Khaya anthotheca (Welw.) C.DC.* | African Mahogany |
| Khaya nyasica Stapf ex Baker f.* | African Mahogany |
| Leucadendron argenteum (L.) R. Br. | - |
| Lumnitzera racemosa Willd. | - |
| Lydenburgia abbottii (A.E.van Wyk & M.Prins) Steenkamp, A.E.van Wyk & M.Prins* | - |
| Microberlinia brazzavillensis A.Chev. | Zebrano |
| Milicia excelsa (Welw.) C.C.Berg | Iroko |
| Millettia stuhlmannii Taub. | Jambire |
| Mimusops caffra E.Mey. ex A.DC. | Kaffir Bulletwood |
| Newtonia hildebrandtii (Vatke) Torre | - |
| Ocotea bullata (Burch.) E. Meyer in Drege | Black Stinkwood |
| Ozoroa namaquensis (Sprague) I. von Teichman & A.E. van Wyk* | - |
| Philenoptera violacea (Klotzsch) Schrire | - |
| Pittosporum viridiflorum Sims | Cape Pittosporum |
| Podocarpus elongatus (Aiton) L'Hér. ex Pers. | - |
| Podocarpus henkelii Stapf ex Dallim. & B.D.Jacks. | - |
| Podocarpus latifolius (Thunb.) R.Br. ex Mirb. | East African Yellowwood |
| Pouteria altissima (A.Chev.) Baehni* | Anegre |
| Protea comptonii Beard* | - |
| Protea curvata N.E.Br.* | - |
| Prunus africana (Hook. f.) Kalkman | - |
| Pterocarpus angolensis DC. | Umbila |
| Pterocarpus soyauxii Taub.* | African Padauk |
| Rhizophora mucronata Lam.* | Bakau |
| Sclerocarya birrea subsp. caffra (Sond.) Kokwaro | Aniya |
| Securidaca longipedunculata Fresen. | - |
| Sideroxylon inerme L. | - |
| Spirostachys africana Sond. | Sandalo Africano |
| Staudtia kamerunensis var. gabonensis (Warb.) Fouilloy* | Niove |
| Tephrosia pondoensis (Codd) Schrire* | - |
| Vachellia erioloba (E. Mey.) P.J.H. Hurter | Camelthorn |
| (E. Mey.) P.J.H. Hurter | - |
| Warburgia salutaris (G.Bertol.) Chiov. | - |
| Widdringtonia schwarzii (Marloth) Mast. | - |
| Widdringtonia wallichii Endl. ex Carrière* | - |
| Xylia torreana Brenan* | - |

*Refers to species not collected for this study


Table 2-2 List of local and regional specialists on trees of southern Africa

| Specialist | Association |
|---|---|
| Johns | Cape Nature - Kogelberg Nature Reserve |
| Braam van Wyk | University of Pretoria |
| Ernst van Jaarsfeld | SANBI |
| Ernst Schmidt | - |
| Johan Hurter | - |
| John Burrows | Buffelskloof Nature Reserve Herbarium |
| Marie Jordaan | SANBI |
| Meg Coates-Palgrave | - |
| Pieter Winter | SANBI |
| Robert Archer | SANBI |
| Tony Abbott | - |


Table 2-3 List of empirical data used in this study. For each sequence, an accession number (Accession No.), voucher number (Voucher No.) and herbarium information are provided. APG III family names, scientific names and trade names (if available) are also given.

| APG III Family | Scientific Name | Common/Trade Name | Voucher No. | Herbarium | Accession No. matK | Accession No. rbcLa |
|---|---|---|---|---|---|---|
| Podocarpaceae | Podocarpus elongatus (Aiton) L'Hér. ex Pers. | Breede River Yellowwood | − | − | HM593746.1 | HM593643.1 |
| Podocarpaceae | Podocarpus elongatus (Aiton) L'Hér. ex Pers. | Breede River Yellowwood | OM2273 | JRAU | − | TSA060-10 |
| Podocarpaceae | Podocarpus henkelii Stapf ex Dallim. & B.D.Jacks. | Henkel's Yellowwood | − | − | HM593751 | HM593648 |
| Podocarpaceae | Podocarpus henkelii Stapf ex Dallim. & B.D.Jacks. | Henkel's Yellowwood | − | − | − | AF249610 |
| Podocarpaceae | Afrocarpus falcatus (Thunb.) C.N.Page. | Yellow-wood | OM1681 | JRAU | − | KNPA994-09 |
| Podocarpaceae | Afrocarpus falcatus (Thunb.) C.N.Page. | Yellow-wood | − | − | − | − |
| Podocarpaceae | Podocarpus latifolius (Thunb.) R.Br. ex Mirb. | East African Yellowwood | − | − | HM593754 | AF249612 |
| Podocarpaceae | Podocarpus latifolius (Thunb.) R.Br. ex Mirb. | East African Yellowwood | − | − | − | JF969703.1 |
| Cupressaceae | Widdringtonia nodiflora (L.) E.Powrie | Cape Cypress | OM2271 | JRAU | − | TSA058-10 |
| Cupressaceae | Widdringtonia nodiflora (L.) E.Powrie | Cape Cypress | − | − | JF725830.1 | AY988266 |
| Cupressaceae | Widdringtonia nodiflora (L.) E.Powrie | Cape Cypress | − | − | HQ245917.1 | JF725930.1 |
| Cupressaceae | Widdringtonia nodiflora (L.) E.Powrie | Cape Cypress | − | − | AY988364.1 | − |
| Cupressaceae | Widdringtonia schwarzii (Marloth) Mast. | Willowmore Cedar | OM2272 | JRAU | − | TSA059-10 |
| Cupressaceae | Widdringtonia schwarzii (Marloth) Mast. | Willowmore Cedar | − | − | AF152218 | JF725943.1 |
| Cupressaceae | Widdringtonia schwarzii (Marloth) Mast. | Willowmore Cedar | − | − | JF725843.1 | |
| Canellaceae | Warburgia salutaris (G.Bertol.) Chiov. | IsiBhaha/Pepperbark-tree | OM1853 | JRAU | KNPA088-08 | KNPA088-08 |
| Lauraceae | Ocotea bullata (Burch.) E. Meyer in Drege | Black Stinkwood | Abbott9194 | JRAU | SAFH417-10 | SAFH417-10 |
| Lauraceae | Ocotea bullata (Burch.) E. Meyer in Drege | Black Stinkwood | − | − | − | AM235002.1 |
| Proteaceae | Leucadendron argenteum (L.) R. Br. | Silver-tree | OM2263 | JRAU | TSA050-10 | TSA050-10 |
| Zygophyllaceae | Balanites maughamii Sprague | Green thorn | OM2096 | JRAU | SAFH252-10 | SAFH252-10 |
| Zygophyllaceae | Balanites aegyptiaca (L.) Delile | Desert Date | OM3548 | JRAU | SAFH3536-11 | SAFH3536-11 |
| Zygophyllaceae | Balanites maughamii Sprague | Green thorn | OM994 | JRAU | KNPA1340-09 | KNPA1340-09 |
| Zygophyllaceae | Balanites maughamii Sprague | Green thorn | OM223 | JRAU | KNPA1170-09 | KNPA1170-09 |
| Zygophyllaceae | Balanites maughamii Sprague | Green thorn | OM3412 | JRAU | SAFH3400-11 | SAFH3400-11 |
| Zygophyllaceae | Balanites pedicillaris | Small Green Thorn | OM901 | JRAU | KNPA353-09 | KNPA353-09 |
| Celastraceae | Catha edulis (Vahl) Endl. | Khat | OM1866 | JRAU | KNPA1019-09 | KNPA1019-09 |
| Celastraceae | Catha edulis (Vahl) Endl. | Khat | OM482 | JRAU | KNPA1262-09 | KNPA1262-09 |
| Celastraceae | Catha abbottii A.E.van Wyk & M.Prins. | − | Abbott9242 | JRAU | SAFH463-10 | SAFH463-10 |
| Celastraceae | Catha abbottii A.E.van Wyk & M.Prins. | − | − | − | DQ217556 | − |
| Celastraceae | Catha transvaalensis Codd | − | − | − | DQ217548 | − |
| Celastraceae | Cassine crocea (Thunb.) C.Presl | Saffronwood | Abbott9197 | JRAU | SAFH420-10 | SAFH420-10 |
| Celastraceae | Cassine crocea (Thunb.) C.Presl | Saffronwood | OM3179 | JRAU | SAFH2322-11 | SAFH2322-11 |
| Celastraceae | Cassine crocea (Thunb.) C.Presl | Saffronwood | OM3778 | JRAU | SAFH4390-12 | SAFH4390-12 |
| Celastraceae | Cassine transvaalensis (Burtt Davy) Codd | − | OM403 | JRAU | KNPA1252-09 | KNPA1252-09 |
| Celastraceae | Cassine transvaalensis (Burtt Davy) Codd | − | OM1229 | JRAU | SAFH534-10 | SAFH534-10 |
| Celastraceae | Cassine transvaalensis (Burtt Davy) Codd | − | OM241 | JRAU | KNPA533-09 | KNPA533-09 |
| Celastraceae | Cassine matabelica (Loes.) Steedman | − | − | − | DQ217537 | − |
| Euphorbiaceae | Cleistanthus polystachyus Hook.f. ex Planch. | Nom tonso | − | − | FJ439971.1 | − |
| Euphorbiaceae | Cleistanthus schlechteri (Pax) Hutch. | Bastard Tamboti/Muchite | OM2539 | JRAU | SAFH1460-11 | SAFH1460-11 |
| Euphorbiaceae | Cleistanthus schlechteri (Pax) Hutch. | Bastard Tamboti/Muchite | OM2603 | JRAU | − | SAFH1524-11 |
| Euphorbiaceae | Spirostachys africana Sond. | Sandalo Africano | OM2396 | JRAU | TSA178-10 | TSA178-10 |
| Euphorbiaceae | Spirostachys africana Sond. | Sandalo Africano | OM254 | JRAU | SAFH1464-11 | SAFH1464-11 |
| Euphorbiaceae | Spirostachys africana Sond. | Sandalo Africano | OM990 | JRAU | KNPA1338-09 | KNPA1338-09 |
| Picrodendraceae | Androstachys johnsonii Prain | Mecrusse | − | − | AY552461.1 | AF206734.1 |
| Picrodendraceae | Androstachys johnsonii Prain | Mecrusse | − | − | EF135502.1 | − |
| Picrodendraceae | Androstachys johnsonii Prain | Mecrusse | OM3385 | JRAU | − | SAFH3373-11 |
| Picrodendraceae | Androstachys johnsonii Prain | Mecrusse | OM1912 | JRAU | − | KNPA1059-09 |
| Picrodendraceae | Androstachys johnsonii Prain | Mecrusse | RBN185 | JRAU | − | − |
| Rhizophoraceae | Bruguiera gymnorhiza (L.) Lam. | Black Mangrove | OM2487 | JRAU | − | SAFH680-10 |
| Rhizophoraceae | Bruguiera gymnorhiza (L.) Lam. | Black Mangrove | − | − | AF105088 | AF006754 |
| Rhizophoraceae | Bruguiera gymnorhiza (L.) Lam. | Black Mangrove | − | − | AB233823 | AF127693 |
| Rhizophoraceae | Ceriops tagal (Perr.) C.B.Rob. | Balobalarao | − | − | AF105089 | AF006756 |
| Rhizophoraceae | Ceriops tagal (Perr.) C.B.Rob. | Balobalarao | − | − | − | AF127684 |
| Fabaceae | Afzelia quanzensis Welw. | Chanfuta/Afzelia/Doussie | CS04 | JRAU | KNPA670-09 | KNPA670-09 |
| Fabaceae | Afzelia quanzensis Welw. | Chanfuta/Afzelia/Doussie | OM2085 | JRAU | SAFH241-10 | SAFH241-10 |
| Fabaceae | Afzelia quanzensis Welw. | Chanfuta/Afzelia/Doussie | OM2113 | JRAU | SAFH269-10 | SAFH269-10 |
| Fabaceae | Afzelia quanzensis Welw. | Chanfuta/Afzelia/Doussie | OM291 | JRAU | KNPA539-09 | KNPA539-09 |
| Fabaceae | Julbernardia globiflora (Benth.) Troupin | Mtondoro/Muwa | − | − | JX850047.1 | − |
| Fabaceae | Julbernardia globiflora (Benth.) Troupin | Mtondoro/Muwa | OM2517 | JRAU | SAFH1438-11 | SAFH1438-11 |
| Fabaceae | Julbernardia globiflora (Benth.) Troupin | Mtondoro/Muwa | OM2705 | JRAU | SAFH1626-11 | SAFH1626-11 |
| Fabaceae | Brachystegia boehmii Taub. | Mfuti | OM3534 | JRAU | SAFH3522-11 | SAFH3522-11 |
| Fabaceae | Brachystegia boehmii Taub. | Mfuti | OM2516 | JRAU | − | SAFH1437-11 |
| Fabaceae | Brachystegia boehmii Taub. | Mfuti | OM2121 | JRAU | − | SAFH277-10 |
| Fabaceae | Brachystegia boehmii Taub. | Mfuti | OM2061 | JRAU | − | SAFH217-10 |
| Fabaceae | Brachystegia boehmii Taub. | Mfuti | − | − | EU361886 | − |
| Fabaceae | Brachystegia spiciformis Benth. | Messassa | − | − | EU361888.1 | − |
| Fabaceae | Brachystegia spiciformis Benth. | Messassa | OM2040 | JRAU | SAFH196-10 | SAFH196-10 |
| Fabaceae | Colophospermum mopane (Benth.) Leonard | Balsam Tree | OM778 | JRAU | KNPA1304-09 | KNPA1304-09 |
| Fabaceae | Colophospermum mopane (Benth.) Leonard | Balsam Tree | RL1558 | JRAU | SAFH1035-10 | SAFH1035-10 |
| Fabaceae | Colophospermum mopane (Benth.) Leonard | Balsam Tree | RL1611 | JRAU | KNPA1501-09 | KNPA1501-09 |
| Fabaceae | Guibourtia pellegriniana Leonard | Bubinga | − | − | EU361964 | − |
| Fabaceae | Guibourtia tessmannii (Harms) J.Leonard | Bubinga | − | − | EU361965 | − |
| Fabaceae | Guibourtia coleosperma (Benth.) Leonard | African Rosewood | − | − | EU361962 | − |
| Fabaceae | Guibourtia coleosperma (Benth.) Leonard | African Rosewood | OM2116 | JRAU | SAFH272-10 | SAFH272-10 |
| Fabaceae | Guibourtia conjugata (Bolle) J.Leonard | Chacate | M662 | JRAU | KNPA705-09 | KNPA705-09 |
| Fabaceae | Guibourtia conjugata (Bolle) J.Leonard | Chacate | OM1287 | JRAU | KNPA875-09 | KNPA875-09 |
| Fabaceae | Baikiaea plurijuga Harms | Zambesi Redwood | M660 | JRAU | KNPA703-09 | KNPA703-09 |
| Fabaceae | Cassia abbreviata Oliv. | Rabai | OM1177 | JRAU | KNPA839-09 | KNPA839-09 |
| Fabaceae | Cassia abbreviata Oliv. | Rabai | OM2047 | JRAU | SAFH203-10 | SAFH203-10 |
| Fabaceae | Cassia abbreviata Oliv. | Rabai | OM235 | JRAU | KNPA1177-09 | KNPA1177-09 |
| Fabaceae | Cassia afrofistula Brenan | Kenyan Shower | OM2629 | JRAU | SAFH1550-11 | SAFH1550-11 |
| Fabaceae | Erythrophleum suaveolens (Guill. & Perr.) Brenan | Missanda | − | − | EU361949 | − |
| Fabaceae | Erythrophleum suaveolens (Guill. & Perr.) Brenan | Missanda | OM2674 | JRAU | SAFH1595-11 | SAFH1595-11 |
| Fabaceae | Erythrophleum africanum (Benth.) Harms | African blackwood | OM2537 | JRAU | SAFH1458-11 | SAFH1458-11 |
| Fabaceae | Microberlinia brazzavillensis A.Chev. | Zebrano | − | − | EU362003 | − |
| Fabaceae | Cordyla africana Lour. | Cordyla | OM1188 | JRAU | KNPA844-09 | KNPA844-09 |
| Fabaceae | Cordyla africana Lour. | Cordyla | OM1210 | JRAU | KNPA852-09 | KNPA852-09 |
| Fabaceae | Cordyla africana Lour. | Cordyla | OM2745 | JRAU | SAFH1666-11 | SAFH1666-11 |
| Fabaceae | Dalbergia melanoxylon Guill. & Perr. | Pau-Preto/Ironwood | OM268 | JRAU | KNPA1197-09 | KNPA1197-09 |
| Fabaceae | Dalbergia melanoxylon Guill. & Perr. | Pau-Preto/Ironwood | OM2394 | JRAU | TSA176-10 | TSA176-10 |
| Fabaceae | Dalbergia melanoxylon Guill. & Perr. | Pau-Preto/Ironwood | OM984 | JRAU | KNPA1336-09 | KNPA1336-09 |
| Fabaceae | Dalbergia boehmii Taub. | − | OM2420 | JRAU | TSA245-10 | TSA245-10 |
| Fabaceae | Dalbergia boehmii Taub. | − | OM2452 | JRAU | TSA277-10 | TSA277-10 |
| Fabaceae | Dalbergia boehmii Taub. | − | OM2532 | JRAU | SAFH1453-11 | SAFH1453-11 |
| Fabaceae | Pterocarpus angolensis DC. | Umbila, Kiaat, Transvaal teak | OM1139 | JRAU | − | SAFH513-10 |
| Fabaceae | Pterocarpus angolensis DC. | Umbila, Kiaat, Transvaal teak | OM2717 | JRAU | SAFH1638-11 | SAFH1638-11 |
| Fabaceae | Pterocarpus angolensis DC. | Umbila, Kiaat, Transvaal teak | OM3312 | JRAU | SAFH3575-11 | SAFH3575-11 |
| Fabaceae | Pterocarpus angolensis DC. | Umbila, Kiaat, Transvaal teak | OM3587 | JRAU | SAFH3575-11 | SAFH3575-11 |
| Fabaceae | Pterocarpus angolensis DC. | Umbila, Kiaat, Transvaal teak | OM490 | JRAU | KNPA555-09 | KNPA555-09 |
| Fabaceae | Pterocarpus brenanii Barbosa & Torre | Padauk | − | − | JN083540.1 | JN083718.1 |
| Fabaceae | Pterocarpus brenanii Barbosa & Torre | Padauk | OM2510 | JRAU | SAFH1431-11 | SAFH1431-11 |
| Fabaceae | Pterocarpus rotundifolius (Sond.) Druce | Round-leaved bloodwood | OM418 | JRAU | KNPA1253-09 | KNPA1253-09 |
| Fabaceae | Pterocarpus rotundifolius (Sond.) Druce | Round-leaved bloodwood | RBN174 | JRAU | KNPA374-09 | KNPA374-09 |
| Fabaceae | Pterocarpus rotundifolius (Sond.) Druce | Round-leaved bloodwood | RL1181 | JRAU | KNPA1439-09 | KNPA1439-09 |
| Fabaceae | Pterocarpus rotundifolius (Sond.) Druce | Round-leaved bloodwood | OM3359 | JRAU | SAFH3347-11 | SAFH3347-11 |
| Fabaceae | Pterocarpus rotundifolius (Sond.) Druce | Round-leaved bloodwood | RL1105 | JRAU | SAFH964-10 | SAFH964-10 |
| Fabaceae | Millettia grandis (E.Mey.) Skeels | Ironwood Kaffir | − | − | AF142724 | − |
| Fabaceae | Millettia grandis (E.Mey.) Skeels | Ironwood Kaffir | OM1757 | JRAU | KNPA016-08 | − |
| Fabaceae | Millettia stuhlmannii Taub. | Jambire/Panga Panga | OM2322 | JRAU | TSA104-10 | TSA104-10 |
| Fabaceae | Millettia stuhlmannii Taub. | Jambire/Panga Panga | OM2522 | JRAU | SAFH1443-11 | SAFH1443-11 |
| Fabaceae | Millettia stuhlmannii Taub. | Jambire/Panga Panga | CS27 | JRAU | KNPA690-09 | KNPA690-09 |
| Fabaceae | Millettia stuhlmannii Taub. | Jambire/Panga Panga | OM3517 | JRAU | SAFH3505-11 | SAFH3505-11 |
| Fabaceae | Millettia stuhlmannii Taub. | Jambire/Panga Panga | OM2433 | JRAU | TSA258-10 | TSA258-10 |
| Fabaceae | Millettia usaramensis Taub. | − | OM2222 | JRAU | SAFH373-10 | SAFH373-10 |
| Fabaceae | Millettia usaramensis Taub. | − | OM1803 | JRAU | KNPA050-08 | KNPA050-08 |
| Fabaceae | Millettia mossambicensis J.B.Gillett | Jambirre/Wenge | OM2335 | JRAU | TSA117-10 | TSA117-10 |
| Fabaceae | Philenoptera violacea (Klotzsch) Schrire | Rain Tree/Chimpakasa | OM242 | JRAU | KNPA1181-09 | KNPA1181-09 |
| Fabaceae | Philenoptera violacea (Klotzsch) Schrire | Rain Tree/Chimpakasa | RL1123 | JRAU | KNPA581-09 | KNPA581-09 |
| Fabaceae | Philenoptera violacea (Klotzsch) Schrire | Rain Tree/Chimpakasa | OM3542 | JRAU | SAFH3530-11 | SAFH3530-11 |
| Fabaceae | Philenoptera bussei (Harms) Schrire | Narrow Lance-pod/Chimpakasa | OM2376 | JRAU | TSA158-10 | TSA158-10 |
| Fabaceae | Philenoptera bussei (Harms) Schrire | Narrow Lance-pod/Chimpakasa | OM2527 | JRAU | SAFH1448-11 | SAFH1448-11 |
| Fabaceae | Bobgunnia madagascariensis (Desv.) J.H.Kirkbr. & Wiersema | Kampanga/Pau-Ferro | OM3566 | JRAU | SAFH3554-11 | SAFH3554-11 |
| Fabaceae | Vachellia erioloba (E. Mey.) P.J.H. Hurter | Camelthorn | − | − | AF523193.1 | − |
| Fabaceae | Vachellia erioloba (E. Mey.) P.J.H. Hurter | Camelthorn | RL1298 | JRAU | KNPA1462-09 | KNPA1462-09 |
| Fabaceae | Vachellia haematoxylon (Willd.) Seigler & Ebinger | Grey Camelthorn | − | − | AF523189.1 | − |
| Fabaceae | Vachellia haematoxylon (Willd.) Seigler & Ebinger | Grey Camelthorn | OM1069 | JRAU | KNPA813-09 | KNPA813-09 |
| Fabaceae | Newtonia hildebrandtii (Vatke) Torre | Lebombo wattle | − | − | AF521848 | − |
| Fabaceae | Newtonia buchananii (Baker) G.C.C.Gilbert & Boutiqu | Lokundu/Nipovera | − | − | − | − |
| Fabaceae | Cylicodiscus gabunensis Harms | Okan | − | − | AF521819 | − |
| Polygalaceae | Securidaca longipedunculata Fresen. | Mufufu/Violet tree | OM3358 | JRAU | SAFH3346-11 | SAFH3346-11 |
| Polygalaceae | Securidaca longipedunculata Fresen. | Mufufu/Violet tree | OM2580 | JRAU | SAFH1501-11 | SAFH1501-11 |
| Polygalaceae | Securidaca longipedunculata Fresen. | Mufufu/Violet tree | OM1965 | JRAU | − | KNPA1092-09 |
| Polygalaceae | Securidaca longipedunculata Fresen. | Mufufu/Violet tree | CS33 | JRAU | − | KNPA695-09 |
| Moraceae | Ficus rokko Warb. & Schweinf | Giant forest Fig | OM2249 | JRAU | TSA036-10 | − |
| Moraceae | Ficus trichopoda Baker | Swamp Fig | OM1817 | JRAU | KNPA059-08 | KNPA059-08 |
| Moraceae | Ficus trichopoda Baker | Swamp Fig | OM3274 | JRAU | − | SAFH2401-11 |
| Moraceae | Ficus trichopoda Baker | Swamp Fig | OM3674 | JRAU | − | SAFH4323-12 |
| Moraceae | Ficus thonningii Blume. | Wild Fig | OM1576 | JRAU | KNPA963-09 | KNPA963-09 |
| Moraceae | Ficus thonningii Blume. | Wild Fig | OM972 | JRAU | KNPA1332-09 | KNPA1332-09 |
| Moraceae | Ficus thonningii Blume. | Wild Fig | OM2763 | JRAU | − | SAFH1684-11 |
| Moraceae | Ficus thonningii Blume. | Wild Fig | OM2754 | JRAU | − | SAFH1675-11 |
| Moraceae | Ficus thonningii Blume. | Wild Fig | OM542 | JRAU | KNPA1274-09 | KNPA1274-09 |
| Moraceae | Ficus thonningii Blume. | Wild Fig | RL1487 | JRAU | KNPA1488-09 | KNPA1488-09 |
| Moraceae | Ficus thonningii Blume. | Wild Fig | MWC20247 | JRAU | SAFH1225-10 | − |
| Moraceae | Ficus thonningii Blume. | Wild Fig | OM1850 | JRAU | KNPA086-08 | KNPA086-08 |
| Moraceae | Milicia excelsa (Welw.) C.C.Berg | Iroko | OM2696 | JRAU | SAFH1617-11 | SAFH1617-11 |
| Moraceae | Milicia excelsa (Welw.) C.C.Berg | Iroko | OM2749 | JRAU | SAFH1670-11 | SAFH1670-11 |
| Rhamnaceae | Berchemia zeyheri (Sond.) Grubov | Pau-Rosa/Pink Ivory | OM1165 | JRAU | KNPA835-09 | KNPA835-09 |
| Rhamnaceae | Berchemia zeyheri (Sond.) Grubov | Pau-Rosa/Pink Ivory | OM600 | JRAU | KNPA136-08 | KNPA136-08 |
| Rhamnaceae | Berchemia zeyheri (Sond.) Grubov | Pau-Rosa/Pink Ivory | OM3345 | JRAU | SAFH3333-11 | SAFH3333-11 |
| Rhamnaceae | Berchemia discolor (Klotzsch) Hemsl. | Brown Ivory | OM1175 | JRAU | KNPA838-09 | KNPA838-09 |
| Rhamnaceae | Berchemia discolor (Klotzsch) Hemsl. | Brown Ivory | OM2437 | JRAU | TSA262-10 | TSA262-10 |
| Rhamnaceae | Berchemia discolor (Klotzsch) Hemsl. | Brown Ivory | OM267 | JRAU | SAFH1595-11 | SAFH1595-11 |
| Rhamnaceae | Berchemia discolor (Klotzsch) Hemsl. | Brown Ivory | OM3536 | JRAU | SAFH3524-11 | SAFH3524-11 |
| Rosaceae | Prunus africana (Hook. f.) Kalkman | Red stinkwood | OM1568 | JRAU | KNPA139-08 | KNPA139-08 |
| Rosaceae | Prunus africana (Hook. f.) Kalkman | Red stinkwood | YM02 | JRAU | KNPA1517-09 | KNPA1517-09 |
| Rosaceae | Prunus africana (Hook. f.) Kalkman | Red stinkwood | − | − | HQ235064.1 | HQ235346.1 |
| Rosaceae | Prunus africana (Hook. f.) Kalkman | Red stinkwood | − | − | HQ235065.1 | HQ235347.1 |
| Rosaceae | Prunus africana (Hook. f.) Kalkman | Red stinkwood | − | − | HQ235066.1 | HQ235348.1 |
| Rosaceae | Prunus africana (Hook. f.) Kalkman | Red stinkwood | − | − | HQ235067.1 | HQ235349.1 |
| Rosaceae | Prunus persica (L.) Stokes | Peach | OM1899 | JRAU | KNPA1051-09 | KNPA1051-09 |
| Rosaceae | Prunus persica (L.) Stokes | Peach | − | − | GQ434205 | GQ436597 |
| Rosaceae | Prunus persica (L.) Stokes | Peach | − | − | JF955829.1 | JF943755.1 |
| Rosaceae | Prunus persica (L.) Stokes | Peach | − | − | JF955827.1 | JF943753.1 |
| Rosaceae | Prunus serotina Ehrh. | Wild Cherry | − | − | HQ235258.1 | DQ006123 |
| Rosaceae | Prunus serotina Ehrh. | Wild Cherry | − | − | HQ235259.1 | HQ590225.1 |
| Rosaceae | Prunus serotina Ehrh. | Wild Cherry | − | − | HQ235260.1 | HQ235404 |
| Combretaceae | Combretum imberbe Wawra | Monzo/Leadwood | OM1012 | JRAU | KNPA791-09 | KNPA791-09 |
| Combretaceae | Combretum imberbe Wawra | Monzo/Leadwood | OM2393 | JRAU | TSA175-10 | TSA175-10 |
| Combretaceae | Combretum apiculatum Sond. | Red Bushwillow | OM2066 | JRAU | SAFH222-10 | SAFH222-10 |
| Combretaceae | Combretum apiculatum Sond. | Red Bushwillow | OM2406 | JRAU | TSA231-10 | TSA231-10 |
| Combretaceae | Combretum apiculatum Sond. | Red Bushwillow | OM3522 | JRAU | SAFH3510-11 | SAFH3510-11 |
| Combretaceae | Combretum apiculatum Sond. | Red Bushwillow | OM1068 | JRAU | KNPA651-09 | KNPA651-09 |
| Combretaceae | Combretum apiculatum Sond. | Red Bushwillow | RL1100 | JRAU | KNPA647-09 | KNPA647-09 |
| Combretaceae | Combretum molle R.Br. ex G.Don | − | OM3553 | JRAU | SAFH3541-11 | SAFH3541-11 |
| Combretaceae | Combretum molle R.Br. ex G.Don | − | OM1526 | JRAU | KNPA623-09 | KNPA623-09 |
| Combretaceae | Lumnitzera racemosa Willd. | Black mangrove | − | − | AB114570.1 | AF425717.1 |
| Combretaceae | Lumnitzera racemosa Willd. | Black mangrove | OM1675 | JRAU | KNPA643-09 | KNPA643-09 |
| Combretaceae | Lumnitzera racemosa Willd. | Black mangrove | OM2478 | JRAU | SAFH671-10 | SAFH671-10 |
| Capparaceae | Boscia albitrunca (Burch.) Gilg & Benedict | Shepherd's tree | OM312 | JRAU | KNPA1222-09 | KNPA1222-09 |
| Capparaceae | Boscia albitrunca (Burch.) Gilg & Benedict | Shepherd's tree | OM1256 | JRAU | KNPA495-09 | KNPA495-09 |
| Capparaceae | Boscia albitrunca (Burch.) Gilg & Benedict | Shepherd's tree | OM1274 | JRAU | SAFH547-10 | SAFH547-10 |
| Capparaceae | Boscia angustifolia A.Rich. | Rough-leaved shepherd's tree | − | − | EU371747 | − |
| Capparaceae | Boscia angustifolia A.Rich. | Rough-leaved shepherd's tree | OM2069 | JRAU | SAFH225-10 | − |
| Capparaceae | Boscia angustifolia A.Rich. | Rough-leaved shepherd's tree | RBN268 | JRAU | − | − |
| Capparaceae | Boscia mossambicensis Klotzsch | Chimapamapane | OM250 | JRAU | KNPA1189-09 | KNPA1189-09 |
| Capparaceae | Boscia mossambicensis Klotzsch | Chimapamapane | RL1212 | JRAU | KNPA1444-09 | KNPA1444-09 |
| Capparaceae | Boscia mossambicensis Klotzsch | Chimapamapane | YBK177 | JRAU | SAFH1044-10 | SAFH1044-10 |
| Capparaceae | Boscia salicifolia Oliv. | Mudemarara/willowey | OM2404 | JRAU | TSA186-10 | TSA186-10 |
| Capparaceae | Boscia salicifolia Oliv. | Mudemarara/willowey | OM2543 | JRAU | SAFH1464-11 | SAFH1464-11 |
| Malvaceae | Adansonia digitata L. | Ala | OM1306 | JRAU | KNPA885-09 | KNPA885-09 |
| Malvaceae | Adansonia digitata L. | Ala | OM2740 | JRAU | SAFH1661-11 | SAFH1661-11 |
| Malvaceae | Adansonia digitata L. | Ala | OM3387 | JRAU | SAFH3375-11 | SAFH3375-11 |
| Malvaceae | Adansonia digitata L. | Ala | OM747 | JRAU | KNPA1297-09 | KNPA1297-09 |
| Meliaceae | Ekebergia capensis Sparrm. | Inhamarre/Cape Ash | OM1540 | JRAU | KNPA949-09 | − |
| Meliaceae | Ekebergia capensis Sparrm. | Inhamarre/Cape Ash | OM742 | JRAU | − | KNPA566-09 |
| Meliaceae | Ekebergia capensis Sparrm. | Inhamarre/Cape Ash | OM2684 | JRAU | − | SAFH1605-11 |
| Meliaceae | Entandrophragma caudatum (Sprague) Sprague | Mbuti/Bottle tree | OM3352 | JRAU | SAFH3340-11 | SAFH3340-11 |
| Meliaceae | Entandrophragma caudatum (Sprague) Sprague | Mbuti/Bottle tree | OM794 | JRAU | KNPA1307-09 | KNPA1307-09 |
| Anacardiaceae | Sclerocarya birrea subsp. caffra (Sond.) Kokwaro | Aniya | OM498 | JRAU | KNPA1266-09 | KNPA1266-09 |
| Anacardiaceae | Sclerocarya birrea subsp. caffra (Sond.) Kokwaro | Aniya | RL1117 | JRAU | SAFH966-10 | SAFH966-10 |
| Anacardiaceae | Sclerocarya birrea subsp. caffra (Sond.) Kokwaro | Aniya | OM278 | JRAU | − | KNPA1204-09 |
| Cornaceae | Curtisia dentata (Burm.f.) C.A.Sm. | Asgaai | − | − | U96901 | L11222 |
| Cornaceae | Curtisia dentata (Burm.f.) C.A.Sm. | Asgaai | OM1737 | JRAU | − | − |
| Cornaceae | Curtisia dentata (Burm.f.) C.A.Sm. | Asgaai | OM3167 | JRAU | SAFH2310-11 | SAFH2310-11 |
| Ebenaceae | Diospyros mespiliformis Hochst. ex A.DC. | Ebano | OM218 | JRAU | KNPA1167-09 | KNPA1167-09 |
| Ebenaceae | Diospyros mespiliformis Hochst. ex A.DC. | Ebano | OM2764 | JRAU | SAFH1685-11 | SAFH1685-11 |
| Ebenaceae | Diospyros mespiliformis Hochst. ex A.DC. | Ebano | OM3493 | JRAU | SAFH3481-11 | SAFH3481-11 |
| Ebenaceae | Diospyros mespiliformis Hochst. ex A.DC. | Ebano | RL1273 | JRAU | SAFH1008-10 | SAFH1008-10 |
| Ebenaceae | Diospyros squarrosa Klotzsch | − | OM3485 | JRAU | SAFH3473-11 | SAFH3473-11 |
| Ebenaceae | Diospyros squarrosa Klotzsch | − | − | − | − | − |
| Ebenaceae | Diospyros inhacaensis F.White | − | OM2225 | JRAU | SAFH376-10 | SAFH376-10 |
| Ebenaceae | Diospyros abyssinica (Hiern) F.White | − | − | − | DQ923990 | EU980646 |
| Ebenaceae | Diospyros abyssinica (Hiern) F.White | − | − | − | FJ238143 | − |
| Ebenaceae | Euclea pseudebenus E.Mey. ex A.DC. | Cape Ebony | MWC21190 | JRAU | SAFH1226-10 | SAFH1226-10 |
| Ebenaceae | Euclea natalensis A.DC. | Natal Ebony | RL1166 | JRAU | KNPA1436-09 | KNPA1436-09 |
| Ebenaceae | Euclea natalensis A.DC. | Natal Ebony | OM3239 | JRAU | SAFH2370-11 | SAFH2370-11 |
| Ebenaceae | Euclea natalensis A.DC. | Natal Ebony | OM211 | JRAU | KNPA1161-09 | KNPA1161-09 |
| Ebenaceae | Euclea divinorum Hiern | Magic Gwarra | − | − | DQ924074 | EU980790 |
| Ebenaceae | Euclea divinorum Hiern | Magic Gwarra | − | − | FJ238117 | − |
| Ebenaceae | Euclea divinorum Hiern | Magic Gwarra | OM1102 | JRAU | SAFH503-10 | SAFH503-10 |
| Ebenaceae | Euclea divinorum Hiern | Magic Gwarra | OM227A | JRAU | − | KNPA1172-09 |
| Rubiaceae | Breonadia salicina (Vahl) Hepper & J.R.I.Wood | Matumi | OM2571 | JRAU | SAFH1492-11 | SAFH1492-11 |
| Rubiaceae | Breonadia salicina (Vahl) Hepper & J.R.I.Wood | Matumi | OM3538 | JRAU | SAFH3526-11 | SAFH3526-11 |
| Rubiaceae | Breonadia salicina (Vahl) Hepper & J.R.I.Wood | Matumi | RL1194 | JRAU | KNPA143-08 | KNPA143-08 |
| Sapotaceae | Mimusops caffra E.Mey. ex A.DC. | Kaffir Bulletwood | OM1754 | JRAU | KNPA013-08 | KNPA013-08 |
| Sapotaceae | Mimusops caffra E.Mey. ex A.DC. | Kaffir Bulletwood | OM2472 | JRAU | SAFH665-10 | SAFH665-10 |
| Sapotaceae | Mimusops caffra E.Mey. ex A.DC. | Kaffir Bulletwood | OM1554 | JRAU | KNPA956-09 | KNPA956-09 |
| Sapotaceae | Mimusops obovata Sond. | Bush red-milkwood | OM3233 | JRAU | SAFH2364-11 | SAFH2364-11 |
| Sapotaceae | Mimusops zeyheri Sond. | Milkwood | OM1943 | JRAU | KNPA1080-09 | KNPA1080-09 |
| Sapotaceae | Mimusops zeyheri Sond. | Milkwood | OM_MvdB50 | JRAU | KNPA779-09 | − |
| Sapotaceae | Mimusops obtusifolia Lam. | Milkwood | OM2627 | JRAU | SAFH1548-11 | SAFH1548-11 |
| Sapotaceae | Sideroxylon inerme L. | White Milkwood | AM0232 | JRAU | SAFH1902-11 | SAFH1902-11 |
| Sapotaceae | Sideroxylon inerme L. | White Milkwood | BS0117 | JRAU | SAFH2001-11 | SAFH2001-11 |
| Sapotaceae | Sideroxylon inerme L. | White Milkwood | OM1760 | JRAU | KNPA019-08 | KNPA019-08 |
| Sapotaceae | Sideroxylon inerme L. | White Milkwood | OM266 | JRAU | KNPA117-08 | KNPA117-08 |
| Sapotaceae | Sideroxylon inerme L. | White Milkwood | RL1144 | JRAU | SAFH972-10 | SAFH972-10 |
| Lecythidaceae | Barringtonia racemosa (L.) Spreng. | Putat | OM1830 | JRAU | KNPA071-08 | KNPA071-08 |
| Lecythidaceae | Barringtonia racemosa (L.) Spreng. | Putat | OM2170 | JRAU | − | SAFH322-10 |
| Lecythidaceae | Barringtonia racemosa (L.) Spreng. | Putat | OM3733 | JRAU | − | SAFH4363-12 |
| Pittosporaceae | Pittosporum undulatum Vent. | Cheesewood | − | − | HM850708 | HM850262 |
| Pittosporaceae | Pittosporum undulatum Vent. | Cheesewood | − | − | DQ133794 | − |
| Pittosporaceae | Pittosporum undulatum Vent. | Cheesewood | − | − | AJ429374 | − |
| Pittosporaceae | Pittosporum viridiflorum Sims | Cape Pittosporum | Abbott9133 | JRAU | TSA358-10 | TSA358-10 |
| Pittosporaceae | Pittosporum viridiflorum Sims | Cape Pittosporum | OM1738 | JRAU | KNPA152-08 | KNPA152-08 |
| Pittosporaceae | Pittosporum viridiflorum Sims | Cape Pittosporum | OM1784 | JRAU | KNPA034-08 | KNPA034-08 |
| Pittosporaceae | Pittosporum undulatum Vent. | Cheesewood | OM2815 | JRAU | SAFH1697-11 | SAFH1697-11 |
| Cycadaceae | Cycas thouarsii R.Br. | Madagascar Cycad | − | − | − | − |
| Stangeriaceae | Stangeria eriopus (Kunze) Baill. | Natal Grass Cycad | PR706 | JRAU | − | CYAF057-10 |
| Zamiaceae | Encephalartos aemulans Vorster | Ngotshe Cycad | PR861 | JRAU | − | CYPAF309-11 |
| Pinaceae | Pinus pinaster Aiton | Maritime | − | − | − | − |


Table 2-4 DNA barcoding primers used to amplify the core DNA barcoding sequences

| Locus | (CCDB) Primer | Sequence (5´ – 3´) | Reference |
|---|---|---|---|
| matK | matK 390 | CGATCTATTCATTCAATATTC | Cuénoud et al. 2002 |
| matK | matK 1326 | TCTAGCACACGAAAGTCGAAGT | Cuénoud et al. 2002 |
| matK | 1R Kim-f | ACCCAGTCCATCTGGAAATCTTGGTTC | Ki-Joong Kim, pers. comm. |
| matK | 3F-Kim-r | CGTACAGTACTTTTGTGTTTACGAG | Ki-Joong Kim, pers. comm. |
| rbcLa | rbcLa-F | ATGTCACCACAAACAGAGACTAAAGC | Levin et al. 2003 |
| rbcLa | rbcLa-R | GTAAAATCAAGTCCACCYCG | Kress and Erickson 2009 |
| rbcLa | rbcLajf634R | GAAACGGTCTCTCCAACGCAT | Fazekas et al. 2008 |


Figure 2-1 Covers of scientific literature used for morphological identification. a) Field guide to trees of southern Africa (Van Wyk and Van Wyk 1997), b) Flora Zambesiaca (Launert 1978), c) Palgrave’s trees of southern Africa (Coates Palgrave et al. 2002), d) Pooley’s trees of eastern South Africa: a complete guide ( 2010), and e) Trees and shrubs of Mpumalanga and Kruger National Park (Schmidt et al. 2002).


Figure 2-2 Specimen collection localities spanning several provinces in South Africa and several countries within southern Africa.


Figure 2-3 Illustration of the DNA barcoding workflow, from specimen collection to the upload of barcodes to BOLD Systems, as recommended by iBOL. A critical component not illustrated here is the set of voucher specimens, for which electronic scans are available on the database.


Figure 2-4 Illustration of the liberal (a-b) and strict (c-d) criteria of assignment based on the location of the query sequence on the tree. a) “Query” is sister to “A. species”, a clade comprising a single taxon; therefore, the query is positive for the clade taxon. b) “Query” is sister to both “A. species” and “B. species”; the assignment is ambiguous. c) “Query” is nested in a clade comprising one taxon; hence, it is positive for the taxon.


Figure 2-5 Gradual cell disruption stages for Query 6 (Brachystegia spiciformis) before exposure to the CTAB lysis buffer. a) Collected sample preserved dry in silica. b) Timber flakes drilled slowly to prevent overheating. c) Timber flakes pulverised with mortar and pestle, ready for exposure to the lysis buffer.


Chapter 3

3. Results

3.1 Summary statistics

A total of 187 specimens of traded and protected tree species (including lookalikes) were collected for this study, representing 81 species in 48 genera. A further 62 matK and rbcLa sequences were downloaded from GenBank (refer to table 2-3). The species coverage in this study represents 20.2% of the traded tree species on the African continent. The proportion increases when restricted to trees of southern Africa: for instance, 93% of the South African protected trees and 100% of the main commercial species of Zambézia listed in the final report on forest governance in Zambézia, Mozambique (Mackenzie 2006) are included in this study.

DNA barcodes were successfully amplified from all the samples in the study using the primers reported in table 2-4. In the resulting dataset, each species was represented on average by two specimens; however, 25 species were represented by a single specimen (refer to table 3-1). The matK dataset comprises 223 specimens with an alignment length of 858 bp; the mean sequence length is 745 bp, ranging from 406 bp to 795 bp. The rbcLa dataset comprises 217 specimens with an alignment length of 552 bp; the mean sequence length is 548 bp, ranging from 478 bp to 552 bp. Lastly, the combined matK and rbcLa dataset has an aligned sequence length of 1 410 bp and a mean sequence length of 1 144 bp. A default threshold of 500 bp was assigned as the minimum length for a DNA barcode. In the matK, rbcLa and combined datasets, four, one and two specimens, respectively, have sequences shorter than this threshold. Table 3-2 provides a further summary of barcode statistics and table 2-3 provides GenBank accession numbers for all sequences used in this study.
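The length statistics described above (mean, median, range, and the number of barcodes below the 500 bp minimum) can be sketched as follows. This is a minimal illustration only: the function name `summarise_lengths` and the example lengths are hypothetical and are not taken from the study's data or scripts.

```python
from statistics import mean, median

MIN_BARCODE_LENGTH = 500  # bp, the minimum barcode length stated in the text


def summarise_lengths(lengths, min_length=MIN_BARCODE_LENGTH):
    """Return mean, median, range and the count of sequences below min_length."""
    return {
        "n": len(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "range": (min(lengths), max(lengths)),
        "below_min": sum(1 for n in lengths if n < min_length),
    }


# Hypothetical ungapped sequence lengths (bp) for five specimens.
example_lengths = [406, 745, 765, 795, 478]
summary = summarise_lengths(example_lengths)
print(summary)
```

Here two of the five hypothetical sequences fall below the 500 bp minimum, which is the kind of count reported in the "No. barcodes <500 bp" row of table 3-2.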

3.2 Genetic divergence and barcode gap analyses

Across all three data partitions, genetic divergence was generally lower within species than between species. For example, 94%, 97% and 94% of the total intraspecific variation is below 1% K2P distance in the matK, rbcLa and combined datasets respectively, while 49%, 27% and 39% of the smallest interspecific distances to the closest non-conspecific neighbour are above 1% K2P distance. The proportion of intraspecific values above 1% and 2% was also recorded. In both the matK and combined datasets, 5.8% of intraspecific values were above 1%. At the 2% cut-off, this proportion falls to 4.5% in the matK dataset and to 0% in the combined dataset. In the rbcLa dataset, the proportion falls from 2.8% at the 1% cut-off to 1.8% at the 2% cut-off. A graphical representation of the distance data is shown in the majority rule phylogram presented as figure 3-1, which shows the formation of interrelated clusters for the majority of taxa.
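For readers unfamiliar with the K2P (Kimura two-parameter) distance used throughout this section, the sketch below computes it for one pair of aligned sequences using the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The function name and the choice to skip gapped or ambiguous sites (pairwise deletion) are assumptions made for illustration; they are not a description of the software settings used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: gaps and ambiguity codes are skipped
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q)
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)


# Hypothetical 100 bp sequences differing by a single transition (A -> G).
reference = "A" * 100
variant = "G" + "A" * 99
one_transition = k2p_distance(reference, variant)
print(round(one_transition, 5))
```

A single transition in 100 sites gives a distance just above 1%, which is the scale of the 1% cut-off discussed above.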

The mean K2P distance to the closest non-conspecific across all datasets is on average ten times the mean intraspecific distance, and figure 3-2 highlights this difference. Figures 3-3 to 3-5 are graphical representations of the barcode gap; they illustrate the difference between the maximum intraspecific distance and the minimum interspecific distance. Of the 223 specimens in the matK dataset, 24% displayed no barcoding gap, as did 34% of the 217 specimens in the rbcLa dataset and 16% of the 191 specimens in the combined dataset (refer to table 3-3). Hence, most of the specimens in all three datasets display a barcoding gap.
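The barcode gap comparison described above can be sketched per specimen as the minimum interspecific distance minus the maximum intraspecific distance, with values of zero or less counted as "no gap" (the "inter-intra <= 0" convention of table 3-3). The distance matrix and labels below are hypothetical, and treating a singleton's maximum intraspecific distance as 0 is an assumption made only for this example.

```python
def barcode_gap(dist, species):
    """For each specimen return (max intraspecific, min interspecific, has_gap)."""
    n = len(species)
    results = []
    for i, sp in enumerate(species):
        intra = [dist[i][j] for j in range(n) if j != i and species[j] == sp]
        inter = [dist[i][j] for j in range(n) if species[j] != sp]
        max_intra = max(intra) if intra else 0.0  # assumption for singletons
        min_inter = min(inter) if inter else float("inf")
        results.append((max_intra, min_inter, min_inter - max_intra > 0))
    return results


# Hypothetical symmetric distance matrix for two specimens each of species A and B.
species = ["A", "A", "B", "B"]
dist = [
    [0.000, 0.002, 0.030, 0.031],
    [0.002, 0.000, 0.001, 0.029],
    [0.030, 0.001, 0.000, 0.004],
    [0.031, 0.029, 0.004, 0.000],
]
gaps = barcode_gap(dist, species)
no_gap_percent = 100 * sum(1 for g in gaps if not g[2]) / len(gaps)
print(gaps, no_gap_percent)
```

In this toy matrix the second and third specimens are closer to a non-conspecific than to their own species, so half of the specimens show no barcode gap.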

3.3 DNA barcode identification success rates using distance-based analysis

In this study the nearest neighbour criterion (k-NN) was the best performing distance-based identification parameter. For 88% of the query specimens in the matK and rbcLa datasets and 90% in the combined dataset, the closest individual to the target specimen shared the same species index as the target. The introduction of singletons (species represented by a single specimen) expectedly reduced the identification success rate across all three datasets. The most affected dataset was the combined dataset, which suffered a 17% decrease in identification success.
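The nearest neighbour criterion can be illustrated with a short sketch: each specimen is scored TRUE when its closest other specimen carries the same species label. The data are hypothetical; resolving ties by the first minimum and keeping singletons in the reference set when they are excluded as queries are assumptions for illustration only.

```python
def nearest_neighbour(dist, species):
    """Return True for each specimen whose nearest other specimen is conspecific."""
    n = len(species)
    outcomes = []
    for i, sp in enumerate(species):
        others = [j for j in range(n) if j != i]
        nearest = min(others, key=lambda j: dist[i][j])  # first minimum on ties
        outcomes.append(species[nearest] == sp)
    return outcomes


def success_rate(outcomes, species, include_singletons=True):
    """Percentage of TRUE outcomes, optionally dropping singleton queries."""
    kept = [
        ok
        for ok, sp in zip(outcomes, species)
        if include_singletons or species.count(sp) > 1
    ]
    return 100 * sum(kept) / len(kept)


# Hypothetical matrix: species A and B have two specimens each, C is a singleton.
species = ["A", "A", "B", "B", "C"]
dist = [
    [0.000, 0.002, 0.030, 0.031, 0.040],
    [0.002, 0.000, 0.028, 0.029, 0.041],
    [0.030, 0.028, 0.000, 0.003, 0.020],
    [0.031, 0.029, 0.003, 0.000, 0.021],
    [0.040, 0.041, 0.020, 0.021, 0.000],
]
outcomes = nearest_neighbour(dist, species)
rate_incl = success_rate(outcomes, species, include_singletons=True)
rate_excl = success_rate(outcomes, species, include_singletons=False)
print(outcomes, rate_incl, rate_excl)
```

A singleton has no conspecific to match, so it always scores FALSE; this is why including singletons lowers the success rate in the toy example, in line with the direction of the effect reported above.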

The BOLD 1% threshold criterion of identification is overall the worst performing identification parameter. For the bulk of the specimens in the different datasets, the matches returned within the default threshold comprised both correct and incorrect species, so an “ambiguous” result is returned. For the matK, rbcLa and combined datasets, only 53%, 32% and 38% of the queries, respectively, have all the specimens within the default threshold of 1% yielding a correct identification. When singletons are included, these values decrease for the matK and rbcLa datasets to 45% and 27% respectively, whereas a 1% increase in identification success is recorded for the combined dataset (table 3-4).
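The four outcomes of the threshold criterion described above (correct, incorrect, ambiguous, no identification) can be sketched as follows. The helper `build_matrix`, the specimen labels and the distances are hypothetical, and whether the threshold bound is strict or inclusive is an assumption of this sketch.

```python
def build_matrix(n, pairs, default=0.05):
    """Hypothetical helper: symmetric matrix with a default distance for unlisted pairs."""
    dist = [[0.0 if i == j else default for j in range(n)] for i in range(n)]
    for (i, j), d in pairs.items():
        dist[i][j] = dist[j][i] = d
    return dist


def threshold_id(dist, species, threshold=0.01):
    """Classify each query by the species of all specimens within the threshold."""
    n = len(species)
    results = []
    for i, sp in enumerate(species):
        # Assumption: "within the threshold" is taken as strictly below it.
        hits = {species[j] for j in range(n) if j != i and dist[i][j] < threshold}
        if not hits:
            results.append("no id")
        elif hits == {sp}:
            results.append("correct")
        elif sp not in hits:
            results.append("incorrect")
        else:
            results.append("ambiguous")
    return results


species = ["A", "A", "B", "B", "C", "D"]
dist = build_matrix(6, {(0, 1): 0.002, (0, 2): 0.008, (2, 3): 0.003, (2, 4): 0.006})
calls = threshold_id(dist, species, threshold=0.01)
print(calls)
```

The first specimen has both a conspecific and a non-conspecific within 1%, so it is scored as ambiguous, which is the situation the text reports for the bulk of the specimens.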

Meier’s best close match (BCM) criterion on average yielded the second highest identification success rates. For 78%, 73% and 85% of the specimens in the matK, rbcLa and combined datasets respectively, the closest individual to the query within the default threshold of 1% shared the same species index as the query specimen. However, once again the introduction of singletons in the datasets consistently reduced the successful identification rate by 5% for matK, 7% for rbcLa and finally 6% for the combined dataset. A breakdown of the identification success rate for each method and for each dataset is presented in table 3-4.
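The best close match criterion differs from the threshold criterion in that only the closest match (or tied closest matches) is considered, and only if it lies within the threshold. The sketch below follows that description; the data and the helper `build_matrix` are hypothetical, and the handling of the threshold boundary is an assumption.

```python
def build_matrix(n, pairs, default=0.05):
    """Hypothetical helper: symmetric matrix with a default distance for unlisted pairs."""
    dist = [[0.0 if i == j else default for j in range(n)] for i in range(n)]
    for (i, j), d in pairs.items():
        dist[i][j] = dist[j][i] = d
    return dist


def best_close_match(dist, species, threshold=0.01):
    """Classify each query using only its closest match(es) within the threshold."""
    n = len(species)
    results = []
    for i, sp in enumerate(species):
        others = [j for j in range(n) if j != i]
        best = min(dist[i][j] for j in others)
        if best > threshold:  # assumption: a match exactly at the threshold counts
            results.append("no id")
            continue
        closest = {species[j] for j in others if dist[i][j] == best}
        if closest == {sp}:
            results.append("correct")
        elif sp not in closest:
            results.append("incorrect")
        else:
            results.append("ambiguous")  # tied closest matches of mixed species
    return results


species = ["A", "A", "B", "B", "C", "D"]
dist = build_matrix(6, {(0, 1): 0.002, (0, 2): 0.002, (2, 3): 0.003, (2, 4): 0.006})
bcm_calls = best_close_match(dist, species)
print(bcm_calls)
```

Because more distant matches inside the threshold are ignored, fewer queries end up ambiguous than under the threshold criterion, which is consistent with the higher success rates reported for BCM.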

3.4 Cumulative error and threshold optimization

At the default 1% threshold used by BOLD, the sum of the number of false positives (no conspecific matches within the threshold of the query) and the number of false negatives (non-conspecific species within the threshold distance) resulted in 88 cumulative errors when singletons are excluded and 129 when they are included for the matK dataset (see table 3-5). Applying the function “threshOpt” systematically across a range of thresholds identified 0.1% (singletons included) and 0.4% (singletons excluded) as the best performing thresholds for the matK dataset (see table 3-5 and figure 3-6 “a” and “b”).


For the rbcLa dataset, the default threshold value of 1% used by BOLD produced 161 and 129 cumulative errors when singletons are included and excluded respectively. These values represent 74% and 68% of the respective rbcLa datasets. Whether singletons are included or excluded, 0.1% is identified as the optimum threshold. This threshold reduces the number of cumulative errors by 50% for the dataset including singletons and by 48% for the dataset excluding singletons (see figure 3-6 “c” and “d”).

At the default BOLD threshold of 1%, cumulative errors in the combined dataset amount to 122 with singletons included and 98 with singletons excluded. Figure 3-6 “e” and “f” show the optimum thresholds for the combined dataset under the two scenarios. When singletons are included, the threshold with the fewest cumulative errors is 0.1%, which reduces the cumulative errors from 122 to 59 (table 3-5), that is, to 48% of the value at the default threshold. When singletons are excluded, the fewest cumulative errors occur at a threshold of 0.2%, which reduces the cumulative errors from 98 to 45, or 46% of the value at the default threshold.
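The threshold optimisation described in this section can be sketched as a scan over candidate thresholds that keeps the one with the fewest cumulative errors. The error categories below follow the definitions given in the text (false positive: no match within the threshold although a conspecific exists; false negative: a non-conspecific within the threshold). This is an illustrative re-implementation under those stated definitions, not the code of the “threshOpt” function itself; the data and the helper `build_matrix` are hypothetical.

```python
def build_matrix(n, pairs, default=0.05):
    """Hypothetical helper: symmetric matrix with a default distance for unlisted pairs."""
    dist = [[0.0 if i == j else default for j in range(n)] for i in range(n)]
    for (i, j), d in pairs.items():
        dist[i][j] = dist[j][i] = d
    return dist


def cumulative_error(dist, species, threshold):
    """Count true/false positives and negatives as defined in the text."""
    counts = {"true_neg": 0, "true_pos": 0, "false_neg": 0, "false_pos": 0}
    n = len(species)
    for i, sp in enumerate(species):
        within = [species[j] for j in range(n) if j != i and dist[i][j] < threshold]
        is_singleton = species.count(sp) == 1
        if not within:
            counts["true_neg" if is_singleton else "false_pos"] += 1
        elif all(s == sp for s in within):
            counts["true_pos"] += 1
        else:
            counts["false_neg"] += 1
    counts["cumulative_error"] = counts["false_neg"] + counts["false_pos"]
    return counts


def optimise_threshold(dist, species, candidates):
    """Return the candidate threshold with the fewest cumulative errors."""
    return min(
        candidates,
        key=lambda t: cumulative_error(dist, species, t)["cumulative_error"],
    )


species = ["A", "A", "B", "B", "C", "D"]
dist = build_matrix(6, {(0, 1): 0.002, (0, 2): 0.008, (2, 3): 0.003, (2, 4): 0.006})
errors_default = cumulative_error(dist, species, 0.01)
best_threshold = optimise_threshold(dist, species, [0.001, 0.005, 0.01])
print(errors_default, best_threshold)
```

In the toy data a threshold that is too low produces false positives and one that is too high produces false negatives, so the optimum lies between the two, mirroring the trade-off shown in figure 3-6.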

3.5 DNA barcode identification success rates using tree-based analysis

Using the species monophyly criterion on neighbour joining trees assembled from the different datasets, the “NJ mono” tree-based identification criterion placed 74% of the specimens in the matK dataset in clades with the same number of tips as there are members of the species to be matched, when singletons are excluded. In contrast to the distance-based analysis, the inclusion of singletons improved the performance of NJ mono by 4%. The “NJ mono boot” tree-based criterion, applied with a minimum bootstrap value of 70% and 1 000 replicates, placed 72% of the specimens in such clades when singletons are excluded. The inclusion of singletons once more increased the identification success, from 72% to 74% in the matK dataset.

The pattern observed in the matK dataset is repeated in the rbcLa and combined datasets (see table 3-6). For both the “NJ mono” and “NJ mono boot” criteria, a 70% identification success rate is observed in the rbcLa dataset when singletons are excluded, with a 1% increase when singletons are included. The combined dataset, on the other hand, shows markedly higher identification success rates: 82% and 77% for “NJ mono” and “NJ mono boot” respectively, rising to 85% and 80% with the inclusion of singletons.
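The species monophyly criterion can be sketched as a check that the specimens of a species make up exactly one clade, that is, a clade with as many tips as the species has members. The example below is a simplified illustration: the tree is a hypothetical rooted tree written as nested tuples (an NJ tree is unrooted, so rooting is an assumption), and bootstrap support is not modelled.

```python
def clade_tip_sets(tree):
    """Return (tips of this subtree, tip sets of every clade inside it)."""
    if isinstance(tree, str):
        tips = frozenset([tree])
        return tips, [tips]
    tips = frozenset()
    clades = []
    for child in tree:
        child_tips, child_clades = clade_tip_sets(child)
        tips |= child_tips
        clades.extend(child_clades)
    clades.append(tips)
    return tips, clades


def species_monophyly(tree, tip_species):
    """Map each species to True if its tips form exactly one clade of the tree."""
    _, clades = clade_tip_sets(tree)
    clade_set = set(clades)
    members = {}
    for tip, sp in tip_species.items():
        members.setdefault(sp, set()).add(tip)
    return {sp: frozenset(tips) in clade_set for sp, tips in members.items()}


# Hypothetical tree: species B is split because c1 nests next to b2.
tree = ((("a1", "a2"), ("b1", ("b2", "c1"))), "d1")
tip_species = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "d1": "D"}
mono = species_monophyly(tree, tip_species)
print(mono)
```

A single tip trivially forms its own clade, so singletons score TRUE in this sketch; this may be one reason why the tree-based success rates above rise when singletons are included.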

3.6 DNA barcode query assignment

DNA of adequate quality and quantity was extracted from all six timber log query samples using the CTAB method. On average the samples contained 115.3 µg/ml of DNA and all displayed a purity reading (A260/280) above the recommended 1.8. Both core barcoding regions, matK and rbcLa, were amplified successfully in all samples (100%). Figure 3-7 illustrates the DNA barcodes generated from the timber log material and their alignments to the respective local reference datasets. Tables 3-7 and 3-8 display summary statistics of the generated barcodes for both markers. With the exception of query 1 and query 2, all the matK sequence lengths exceeded the average sequence length of the respective dataset. As for rbcLa, only query 1 failed to exceed the dataset’s average sequence length.

The rbcLa marker accurately identified five of the six query sequences on the BOLD database. However, the same marker failed to convincingly assign a positive identification to four of the six queries tested on GenBank. The matK marker provided satisfactory identifications for four of the six queries in the BOLD database but also performed poorly on GenBank; this is summarized in table 3-9. Table 3-10 displays the BLAST results on the universal databases. For every submitted query sequence, all returned matches sharing the highest bit-score are reported. The tree-based identification analysis scored an 83% success rate, correctly identifying five of the six query specimens (refer to table 3-9). Figures 3-9, 3-10, 3-11 and 3-12 show queries 1 to 5 forming distinct monophyletic clusters, while query 6 either nests in an unexpected clade (rbcLa dataset) or is sister to two different species of Brachystegia. According to Rosenberg’s probability of reciprocal monophyly, most of the identifications could be due to random assortment. However, in the combined NJ analysis queries 1 and 2 form a significant monophyletic cluster with Millettia stuhlmannii.


Table 3-1 List of species represented by a single individual in the respective datasets

Singletons matK dataset rbcLa dataset Combined dataset Baikiaea plurijuga M660 Baikiaea plurijuga M660 Cycas thouarsii Genbank Balanites aegyptica OM3548 Balanites aegyptica OM3548 Stangeria eriopus PR706 Balanites pedicillaris OM901 Balanites pedicillaris OM901 Encephalartos aemulans PR861 Barringtonia racemosa OM1830 Bobgunnia madagascariensis OM3566 Pinus pinaster Genbank Bobgunnia madagascariensis OM3566 Boscia angustifolia RBN2681 Podocarpus elongatus Genbank Cassia afrofistula OM2629 Brachystegia boehmii OM3534 Podocarpus henkelii Genbank Cassine matabelicum DQ217537 Brachystegia spiciformis OM2040 Podocarpus latifolius Genbank Catha transvaalensis DQ217548 Cassia afrofistula OM2629 Widdringtonia schwarzii Genbank Ceriops tagal AF105089 Catha abbottii Abbott9242 Warburgia salutaris OM1853 Cleistanthus polystachyus FJ439971.1 Cycas thouarsii Genbank Ocotea bullata Abbott9194 Cleistanthus schlechteri OM2539 Diospyros abyssinica EU980646 Leucadendron argenteum OM2263 Cycas thouarsii Genbank Diospyros inhacaensis OM2225 Balanites aegyptica OM3548 Cylicodiscus gabunensis AF521819 Encephalartos aemulans PR861 Balanites pedicillaris OM901 Diospyros inhacaensis OM2225 Erythrophleum africanum OM2537 Androstachys johnsonii Ekebergia capensis OM1540 Erythrophleum suaveolens OM2674 Baikiaea plurijuga M660 Encephalartos aemulans PR861 Euclea pseudebenus MWC21190 Balanites aegyptica OM3548 Erythrophleum africanum OM2537 Guibourtia coleosperma OM2116 Balanites pedicillaris OM901 Euclea pseudebenus MWC21190 Leucadendron argenteum OM2263 Barringtonia racemosa OM1830 Bobgunnia madagascariensis Ficus rokko OM2249 Millettia mossambicense OM2335 OM3566 Ficus trichopoda OM1817 Mimusops obovata OM3233 Boscia angustifolia RBN268.1


Singletons matK dataset rbcLa dataset Combined dataset Guibourtia pellegriniana EU361964 Mimusops obtusifolia OM2627 Brachystegia boehmii OM3534 Guibourtia tessmannii EU361965 Pinus pinaster Genbank Brachystegia spiciformis OM2040 Leucadendron argenteum OM2263 Stangeria eriopus PR706 Cassia afrofistula OM2629 Microberlinia brazzavillensis EU362003 Vachellia erioloba RL1298 Catha abbottii Abbott9242 Millettia mossambicense OM2335 Vachellia haematoxylon OM1069 Ceriops tagal Mimusops obovata OM3233 Warburgia salutaris OM1853 Cleistanthus schlechteri OM2539 Mimusops obtusifolia OM2627 Cycas thouarsii Newtonia buchananii Genbank Diospyros abyssinica Newtonia hildebrandtii AF521848 Diospyros inhacaensis OM2225 Ocotea bullata Abbott9194 Encephalartos aemulans PR861 Pinus pinaster Genbank Erythrophleum africanum OM2537 Podocarpus elongatus HM593746.1 Erythrophleum suaveolens OM2674 Podocarpus henkelii HM593751 Euclea pseudebenus MWC21190 Podocarpus latifolius HM593754 Ficus trichopoda OM1817 Securidaca longipedunculata OM3358 Guibourtia coleosperma OM2116 Stangeria eriopus PR706 Leucadendron argenteum OM2263 Warburgia salutaris OM1853 Millettia mossambicense OM2335


Table 3-2 Summary of descriptive barcode statistics for the three data partitions analysed in the study.

Statistic | matK | rbcLa | Combined
Individuals | 223 | 217 | 249
Sequences alignment length | 858 | 552 | 1410
Genera | 55 | 53 | 56
Species (no. unique sp.) | 99 | 90 | 101
Mean individuals per sp. (range) | 2 (1-6) | 2 (1-7) | 2 (1-8)
Median individuals per sp. | 2 | 2 | 2
Mean seq. length bp (range) | 745 (406-795) | 548 (478-552) | 1144 (406-1347)
Median seq. length bp | 765 | 552 | 1306
No. barcodes <500 bp | 4 | 1 | 2
Prop. intraspecific dist. >1% | 5.8% | 2.8% | 5.8%
Prop. intraspecific dist. >2% | 4.5% | 1.8% | 0%
Mean (%) intraspecific dist. (range) | 0.25 (0.00-2.65) | 0.13 (0.00-2.59) | 0.16 (0.00-1.74)
Mean smallest (%) interspecific dist. (range) | 2.45 (0.00-31.07) | 1.12 (0.00-7.52) | 2.04 (0.00-20.58)

Abbreviations: no. = number; seq. = sequence; sp. = species; prop. = proportion. Combined refers to a data partition comprising a combination of the generated matK and rbcLa data. Ranges are presented in parentheses.


Table 3-3 List of species per dataset displaying zero or no barcoding gap

inter-intra <= 0 matK dataset rbcLa dataset Combined dataset Boscia albitrunca OM1256 Boscia albitrunca OM1256 Boscia albitrunca OM1256 Boscia albitrunca OM1274 Boscia albitrunca OM1274 Boscia albitrunca OM1274 Boscia albitrunca OM312 Boscia albitrunca OM312 Boscia albitrunca OM312 Boscia salicifolia OM2404 Boscia angustifolia RBN268.1 Boscia salicifolia OM2404 Boscia salicifolia OM2543 Boscia mossambicensis OM250 Boscia salicifolia OM2543 Brachystegia boehmii EU361886 Boscia mossambicensis RL1212 Brachystegia boehmii OM3534 Brachystegia boehmii OM3534 Boscia mossambicensis YBK177 Brachystegia spiciformis OM2040 Brachystegia spiciformis EU361888.1 Boscia salicifolia OM2404 Cassine transvaalense OM1229 Brachystegia spiciformis OM2040 Boscia salicifolia OM2543 Cassine transvaalense OM403 Cassine croceum Abbott9197 Brachystegia boehmii OM3534 Catha edulis OM1866 Cassine croceum OM3179 Brachystegia spiciformis OM2040 Catha edulis OM482 Cassine croceum OM3778 Catha abbottii Abbott9242 Combretum apiculatum OM1068 Cassine transvaalense OM1229 Catha edulis OM1866 Combretum apiculatum OM2066 Cassine transvaalense OM241 Catha edulis OM482 Combretum apiculatum OM2406 Cassine transvaalense OM403 Combretum apiculatum OM1068 Combretum apiculatum OM3522 Catha edulis OM1866 Combretum apiculatum OM2066 Combretum apiculatum RL1100 Catha edulis OM482 Combretum apiculatum OM2406 Combretum molle OM1526 Combretum apiculatum OM1068 Combretum apiculatum OM3522 Combretum molle OM3553 Combretum apiculatum OM2066 Combretum apiculatum RL1100 Euclea natalensis OM211 Combretum apiculatum OM2406 Combretum molle OM1526 Euclea natalensis OM3239 Combretum apiculatum OM3522 Combretum molle OM3553 Euclea natalensis RL1166


inter-intra <= 0 matK dataset rbcLa dataset Combined dataset Combretum apiculatum RL1100 Euclea divinorum DQ924074 Ficus thonningii OM1576 Combretum molle OM1526 Euclea divinorum OM1102 Pterocarpus brenanii Combretum molle OM3553 Euclea natalensis OM211 Pterocarpus brenanii OM2510 Diospyros abyssinica DQ923990 Euclea natalensis OM3239 Pterocarpus rotundifolius OM3359 Euclea natalensis OM3239 Euclea natalensis RL1166 Pterocarpus rotundifolius OM418 Ficus rokko OM2249 Ficus thonningii OM1576 Pterocarpus rotundifolius RBN174 Ficus thonningii MWC20247 Ficus thonningii OM1850 Pterocarpus rotundifolius RL1105 Ficus thonningii OM1576 Ficus thonningii OM2754 Pterocarpus rotundifolius RL1181 Ficus thonningii OM1850 Ficus thonningii OM2763 Vachellia erioloba RL1298 Ficus thonningii OM542 Ficus thonningii OM542 Vachellia haematoxylon OM1069 Ficus thonningii OM972 Ficus thonningii OM972 Widdringtonia nodiflora Ficus thonningii RL1487 Ficus thonningii RL1487 Guibourtia coleosperma EU361962 Ficus trichopoda OM1817 Guibourtia coleosperma OM2116 Ficus trichopoda OM3274 Guibourtia tessmannii EU361965 Ficus trichopoda OM3674 Mimusops obovata OM3233 Julbernardia globiflora OM2517 Mimusops obtusifolia OM2627 Julbernardia globiflora OM2705 Podocarpus elongatus HM593746.1 Mimusops caffra OM1554 Podocarpus latifolius HM593754 Mimusops caffra OM1754 Pterocarpus brenanii JN083540.1 Mimusops caffra OM2472 Pterocarpus brenanii OM2510 Mimusops obovata OM3233 Pterocarpus rotundifolius OM3359 Philenoptera bussei OM2376 Pterocarpus rotundifolius OM418 Philenoptera bussei OM2527


inter-intra <= 0 matK dataset rbcLa dataset Combined dataset Pterocarpus rotundifolius RBN174 Philenoptera violacea OM242 Pterocarpus rotundifolius RL1105 Philenoptera violacea OM3542 Pterocarpus rotundifolius RL1181 Philenoptera violacea RL1123 Vachellia erioloba Genbank Philenoptera violacea RL1123.1 Vachellia erioloba RL1298 Pittosporum undulatum HM850708 Vachellia haematoxylon OM1069 Pittosporum undulatum OM2815 Widdringtonia nodiflora JF725830.1 Pittosporum viridiflorum Abbott9133 Widdringtonia nodiflora AF152218 Pittosporum viridiflorum OM1738 Widdringtonia nodiflora AY988364.1 Pittosporum viridiflorum OM1784 Podocarpus latifolius JF969703.1 Prunus africana HQ235347.1 Prunus africana HQ235348.1 Prunus africana HQ235349.1 Prunus africana OM1568 Prunus africana YM02 Pterocarpus brenanii JN083718.1 Pterocarpus brenanii OM2510 Pterocarpus rotundifolius OM3359 Pterocarpus rotundifolius OM418 Pterocarpus rotundifolius RBN174 Pterocarpus rotundifolius RL1105 Pterocarpus rotundifolius RL1181 Vachellia erioloba RL1298


inter-intra <= 0 matK dataset rbcLa dataset Combined dataset Vachellia haematoxylon OM1069 Widdringtonia nodiflora AY988266 Widdringtonia nodiflora JF725930.1 Widdringtonia nodiflora OM2271 Widdringtonia schwarzii JF725943.1 Widdringtonia schwarzii OM2272


Table 3-4 Identification success rates for each primary analytical method across three data partitions (with singletons both included and excluded from results).

Measure | Singletons | matK (%) | rbcLa (%) | Combined (%)
NJ mono. | excl. | 74 | 70 | 82
NJ mono. | incl. | 78 | 71 | 85
NJ mono. boot. | excl. | 72 | 70 | 77
NJ mono. boot. | incl. | 74 | 71 | 80
k-NN (k=1) | excl. | 88 | 88 | 90
k-NN (k=1) | incl. | 73 | 75 | 73
BOLD: 1% thresh. | excl. | 53 | 32 | 38
BOLD: 1% thresh. | incl. | 45 | 27 | 39
BOLD: opt. thresh. | excl. | 65 | 65 | 71
BOLD: opt. thresh. | incl. | 68 | 66 | 71
BCM: 1% thresh. | excl. | 78 | 73 | 85
BCM: 1% thresh. | incl. | 73 | 66 | 79
BCM: opt. thresh. | excl. | 78 | 74 | 81
BCM: opt. thresh. | incl. | 73 | 66 | 78

Abbreviations: BCM = “best close match”; boot = bootstrap (>70%); excl. = excluded; incl. = included; mono. = monophyly; opt. = optimum; thresh. = threshold; NJ = neighbour joining; k-NN = nearest neighbour

Table 3-5 A comparison of the default 1% threshold performance to the pre-determined optimum threshold across all three local databases.

Dataset | Singletons | Threshold | True − | True + | False − | False + | Cumulative error
matK | incl. | 0.01 | 16 | 78 | 123 | 6 | 129
matK | incl. | 0.001* | 31 | 114 | 36 | 42 | 78
matK | excl. | 0.01 | 0 | 98 | 82 | 6 | 88
matK | excl. | 0.004* | 0 | 121 | 54 | 11 | 65
rbcLa | incl. | 0.01 | 7 | 49 | 158 | 3 | 161
rbcLa | incl. | 0.001* | 19 | 117 | 60 | 21 | 81
rbcLa | excl. | 0.01 | 0 | 62 | 126 | 3 | 129
rbcLa | excl. | 0.001* | 0 | 124 | 45 | 22 | 67
Combined | incl. | 0.01 | 18 | 51 | 117 | 5 | 122
Combined | incl. | 0.001* | 29 | 103 | 39 | 20 | 59
Combined | excl. | 0.01 | 0 | 60 | 92 | 6 | 98
Combined | excl. | 0.002* | 1 | 112 | 31 | 14 | 45

* = optimum threshold; excl. = excluded; incl. = included; + = positive; − = negative


Table 3-6 Detailed identification success rates for all four parameters on the three different datasets (singletons included or excluded).

Dataset | Criterion | Singletons | Ambiguous | Correct / TRUE | Incorrect / FALSE | No ID | Threshold | Total # specimens | % Success
matK | Nearest neighbour (K-NN) | incl. | − | 163 | 60 | − | − | 223 | 73
matK | Nearest neighbour (K-NN) | excl. | − | 164 | 22 | − | − | 186 | 88
matK | BOLD 1% thresh | incl. | 101 | 78 | 22 | 22 | 0.01 | 223 | 45
matK | BOLD 1% thresh | excl. | 81 | 98 | 1 | 6 | 0.01 | 186 | 53
matK | BOLD 1% threshOpt | incl. | 26 | 114 | 10 | 73 | 0.001 | 223 | 68
matK | BOLD 1% threshOpt | excl. | 53 | 121 | 1 | 11 | 0.004 | 186 | 65
matK | Best Close Match | incl. | 30 | 140 | 31 | 22 | 0.01 | 223 | 73
matK | Best Close Match | excl. | 25 | 146 | 9 | 6 | 0.01 | 186 | 78
matK | Best Close Match Opt. | incl. | 30 | 140 | 31 | 22 | 0.01097358 | 223 | 73
matK | Best Close Match Opt. | excl. | 25 | 146 | 9 | 6 | 0.01180586 | 186 | 78
matK | NJ Mono | incl. | − | 175 | 48 | − | − | 223 | 78
matK | NJ Mono | excl. | − | 137 | 49 | − | − | 186 | 74
matK | NJ Mono Boot | incl. | − | 123 | 100 | − | − | 223 | 55
matK | NJ Mono Boot | excl. | − | 100 | 86 | − | − | 186 | 54
rbcLa | Nearest neighbour (K-NN) | incl. | − | 163 | 54 | − | − | 217 | 75
rbcLa | Nearest neighbour (K-NN) | excl. | − | 168 | 23 | − | − | 191 | 88
rbcLa | BOLD 1% thresh | incl. | 139 | 49 | 19 | 10 | 0.01 | 217 | 27
rbcLa | BOLD 1% thresh | excl. | 126 | 62 | 0 | 3 | 0.01 | 191 | 32
rbcLa | BOLD 1% threshOpt | incl. | 49 | 117 | 11 | 40 | 0.001 | 217 | 66
rbcLa | BOLD 1% threshOpt | excl. | 42 | 124 | 3 | 22 | 0.001 | 191 | 65
rbcLa | Best Close Match | incl. | 52 | 130 | 21 | 14 | 0.01 | 217 | 66
rbcLa | Best Close Match | excl. | 44 | 139 | 5 | 3 | 0.01 | 191 | 73
rbcLa | Best Close Match Opt. | incl. | 52 | 130 | 21 | 14 | 0.008680157 | 217 | 66
rbcLa | Best Close Match Opt. | excl. | 44 | 141 | 5 | 1 | 0.02257339 | 191 | 74
rbcLa | NJ Mono | incl. | − | 153 | 64 | − | − | 217 | 71
rbcLa | NJ Mono | excl. | − | 133 | 58 | − | − | 191 | 70
rbcLa | NJ Mono Boot | incl. | − | 134 | 83 | − | − | 217 | 62
rbcLa | NJ Mono Boot | excl. | − | 112 | 79 | − | − | 191 | 59
Combined | Nearest neighbour (K-NN) | incl. | − | 140 | 51 | − | − | 191 | 73
Combined | Nearest neighbour (K-NN) | excl. | − | 142 | 16 | − | − | 158 | 90
Combined | BOLD 1% thresh | incl. | 97 | 51 | 20 | 23 | 0.01 | 191 | 39
Combined | BOLD 1% thresh | excl. | 88 | 60 | 4 | 6 | 0.01 | 158 | 38
Combined | BOLD 1% threshOpt | incl. | 31 | 103 | 8 | 49 | 0.001 | 191 | 71
Combined | BOLD 1% threshOpt | excl. | 28 | 112 | 3 | 15 | 0.002 | 158 | 71
Combined | Best Close Match | incl. | 14 | 127 | 27 | 23 | 0.01 | 191 | 79
Combined | Best Close Match | excl. | 14 | 128 | 10 | 6 | 0.01 | 158 | 85
Combined | Best Close Match Opt. | incl. | 13 | 127 | 26 | 25 | 0.00822 | 191 | 80
Combined | Best Close Match Opt. | excl. | 13 | 128 | 9 | 8 | 0.00749 | 158 | 81
Combined | NJ Mono | incl. | − | 162 | 29 | − | − | 191 | 85
Combined | NJ Mono | excl. | − | 129 | 29 | − | − | 158 | 82
Combined | NJ Mono Boot | incl. | − | 153 | 38 | − | − | 191 | 80
Combined | NJ Mono Boot | excl. | − | 121 | 37 | − | − | 158 | 77

For the Nearest neighbour, NJ Mono and NJ Mono Boot criteria the counts are TRUE and FALSE outcomes; − = not applicable; excl. = excluded; incl. = included.


Table 3-7 matK base composition percentage and successfully generated query sequence length.

matK-Query | a | c | g | t | Sequence length
Query 1 | 0.308 | 0.149 | 0.136 | 0.407 | 685
Query 2 | 0.310 | 0.147 | 0.135 | 0.408 | 768
Query 3 | 0.307 | 0.176 | 0.149 | 0.367 | 743
Query 4 | 0.284 | 0.187 | 0.167 | 0.362 | 760
Query 5 | 0.295 | 0.163 | 0.152 | 0.390 | 765
Query 6 | 0.302 | 0.177 | 0.144 | 0.377 | 764

Table 3-8 rbcLa base composition percentage and successfully generated query sequence length.

rbcLa-Query | a | c | g | t | Sequence length
Query 1 | 0.274 | 0.208 | 0.222 | 0.296 | 504
Query 2 | 0.279 | 0.194 | 0.223 | 0.304 | 552
Query 3 | 0.264 | 0.210 | 0.230 | 0.295 | 552
Query 4 | 0.261 | 0.217 | 0.232 | 0.290 | 552
Query 5 | 0.268 | 0.203 | 0.225 | 0.304 | 552
Query 6 | 0.261 | 0.216 | 0.232 | 0.292 | 552
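Base composition values of the kind reported in tables 3-7 and 3-8 are proportions of each nucleotide among the unambiguous bases of a sequence. The sketch below shows one way to compute them; the function name is hypothetical, and the example input is the rbcLa-F primer sequence from table 2-4, used only because it is a short sequence already present in this document.

```python
from collections import Counter


def base_composition(sequence):
    """Return ({base: proportion}, number of unambiguous bases) for a DNA string."""
    counts = Counter(b for b in sequence.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    composition = {b.lower(): round(counts[b] / total, 3) for b in "ACGT"}
    return composition, total


# rbcLa-F primer from table 2-4, used here purely as a short worked example.
composition, length = base_composition("ATGTCACCACAAACAGAGACTAAAGC")
print(composition, length)
```

The four proportions sum to one (up to rounding), as do the rows of tables 3-7 and 3-8.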


Table 3-9 Query re-identification performances on universal database and local database.

Query | Scientific name | GenBank BLAST matK | GenBank BLAST rbcLa | BOLD BLAST matK | BOLD BLAST rbcLa | NJ+BP matK | NJ+BP rbcLa | NJ+BP Combined | BL Combined
1 | Millettia stuhlmannii | No | No | Yes | Yes | Yes | Yes | Yes | Yes
2 | Millettia stuhlmannii | No | No | Yes | Yes | Yes | Yes | Yes | Yes
3 | Afzelia quanzensis | No | No | No | No | Yes | Yes | Yes | Yes
4 | Berchemia discolor | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
5 | Cordyla africana | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
6 | Brachystegia spiciformis | Yes | No | No | Yes | No | No | No | No
ID% | | 50 | 33 | 67 | 83 | 83 | 83 | 83 | 83

Abbreviations: NJ = Neighbour joining, BP = Bootstrap percentages, BL = Bayesian Likelihood.


Table 3-10 Top hits (highest Bit-scores) returned for query sequences identification using BLAST method on Genbank and BOLD.

Database | Marker | Match(es) | Accession No. or BOLD status | Bit-Score | % Identical Sites | E-value | % Query Coverage

Query 1: Millettia stuhlmannii
GenBank | matK | Leptoderris hypargyrea | JX06610.1 | 1201 | 98 | 0 | 88
GenBank | rbcLa | Philenoptera violacea | JF265547.1 | 902 | 99 | 0 | 91
BOLD | matK | Millettia stuhlmannii | Unpublished | 681 | 99.71 | 0 | -
BOLD | matK | Millettia stuhlmannii | Early release | 681 | 99.71 | 0 | -
BOLD | matK | Millettia stuhlmannii | Unpublished | 681 | 99.71 | 0 | -
BOLD | rbcLa | Millettia stuhlmannii | Unpublished | 503 | 99.8 | 0 | -
BOLD | rbcLa | Millettia stuhlmannii | Unpublished | 503 | 99.8 | 0 | -
BOLD | rbcLa | Millettia stuhlmannii | Unpublished | 503 | 99.8 | 0 | -

Query 2: Millettia stuhlmannii
GenBank | matK | Leptoderris hypargyrea | JX506610.1 | 1349 | 98 | 0 | 100
GenBank | rbcLa | Philenoptera violacea | JF265547.1 | 992 | 99 | 0 | 100
BOLD | matK | Millettia stuhlmannii | Unpublished | 768 | 100 | 0 | -
BOLD | matK | Millettia stuhlmannii | Unpublished | 768 | 100 | 0 | -
BOLD | rbcLa | Millettia stuhlmannii | Unpublished | 552 | 100 | 0 | -
BOLD | rbcLa | Millettia stuhlmannii | Unpublished | 552 | 100 | 0 | -
BOLD | rbcLa | Millettia stuhlmannii | Unpublished | 552 | 100 | 0 | -

Query 3: Afzelia quanzensis
GenBank | matK | Afzelia bella | KC627408.1 | 1373 | 100 | 0 | 97
GenBank | matK | Afzelia quanzensis | JF270629.1 | 1373 | 100 | 0 | 97
GenBank | matK | Intsia bijuga | EU361981.1 | 1373 | 100 | 0 | 97
GenBank | matK | Afzelia bipindensis | EU361847.1 | 1373 | 100 | 0 | 97
GenBank | matK | Afzelia bella | EU361846.1 | 1373 | 100 | 0 | 97
GenBank | rbcLa | Afzelia bella | KC628648.1 | 1009 | 99 | 0 | 100
GenBank | rbcLa | Afzelia bella | KC628574.1 | 1009 | 99 | 0 | 100
GenBank | rbcLa | Afzelia bella | KC627950.1 | 1009 | 99 | 0 | 100
GenBank | rbcLa | Afzelia quanzensis | JF265273.1 | 1009 | 99 | 0 | 100
BOLD | matK | Afzelia quanzensis | Unpublished | 743 | 100 | 0 | -
BOLD | matK | Intsia bijuga | Unpublished | 743 | 100 | 0 | -
BOLD | matK | Intsia bijuga | Unpublished | 743 | 100 | 0 | -
BOLD | matK | Afzelia quanzensis | Unpublished | 743 | 100 | 0 | -
BOLD | matK | Afzelia quanzensis | published | 743 | 100 | 0 | -
BOLD | matK | Afzelia bella | published | 743 | 100 | 0 | -
BOLD | matK | Intsia bijuga | published | 743 | 100 | 0 | -
BOLD | matK | Afzelia bipindensis | published | 743 | 100 | 0 | -
BOLD | matK | Afzelia pachyloba | Unpublished | 743 | 100 | 0 | -
BOLD | matK | Afzelia bella | published | 743 | 100 | 0 | -
BOLD | matK | Afzelia bella | early release | 743 | 100 | 0 | -
BOLD | matK | Afzelia africana | Unpublished | 743 | 100 | 0 | -
BOLD | matK | Afzelia bella | early release | 743 | 100 | 0 | -
BOLD | rbcLa | Afzelia quanzensis | Unpublished | 550 | 99.82 | 0 | -
BOLD | rbcLa | Afzelia bella | early release | 550 | 99.82 | 0 | -
BOLD | rbcLa | Afzelia bella | early release | 550 | 99.82 | 0 | -
BOLD | rbcLa | Afzelia quanzensis | Unpublished | 550 | 99.82 | 0 | -
BOLD | rbcLa | Afzelia africana | Unpublished | 550 | 99.82 | 0 | -

Query 4: Berchemia discolor
GenBank | matK | Berchemia discolor | JF270655.1 | 1397 | 99 | 0 | 99
GenBank | rbcLa | Berchemia discolor | JF265302.1 | 1020 | 100 | 0 | 100
BOLD | matK | Berchemia discolor | Unpublished | 755 | 100 | 0 | -
BOLD | matK | Berchemia discolor | published | 755 | 100 | 0 | -
BOLD | rbcLa | Berchemia discolor | Unpublished | 552 | 100 | 0 | -
BOLD | rbcLa | Berchemia discolor | early-release | 552 | 100 | 0 | -
BOLD | rbcLa | Berchemia discolor | published | 552 | 100 | 0 | -

Query 5: Cordyla africana
GenBank | matK | Cordyla africana | JF270724.1 | 1413 | 100 | 0 | 100
GenBank | rbcLa | Cordyla africana | JF265371.1 | 1020 | 100 | 0 | 100
BOLD | matK | Cordyla africana | published | 765 | 100 | 0 | -
BOLD | matK | Cordyla africana | Unpublished | 765 | 100 | 0 | -
BOLD | rbcLa | Cordyla africana | Unpublished | 552 | 100 | 0 | -
BOLD | rbcLa | Cordyla africana | Unpublished | 552 | 100 | 0 | -
BOLD | rbcLa | Cordyla africana | early-release | 552 | 100 | 0 | -
BOLD | rbcLa | Cordyla africana | published | 552 | 100 | 0 | -
BOLD | rbcLa | Cordyla africana | Unpublished | 552 | 100 | 0 | -

Query 6: Brachystegia spiciformis
GenBank | matK | Brachystegia spiciformis | EU361888.1 | 1404 | 99 | 0 | 100
GenBank | rbcLa | Berlinia sp. | KC628668.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Anthonotha fragrans | KC628635.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Anthonotha sp. | KC628585.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Berlinia hollandii | KC628547.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Berlinia hollandii | KC628537.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Anthonotha fragrans | KC628532.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Gilbertiodendron demonstrans | KC628526.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Anthonotha fragrans | KC628515.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Anthonotha macrophylla | KC628430.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Berlinia auriculata | KC628395.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Berlinia auriculata | KC628285.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Gilbertiodendron demonstrans | KC628266.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Gilbertiodendron sp. | KC628147.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Anthonotha macrophylla | KC628145.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Anthonotha macrophylla | KC628044.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Anthonotha fragrans | KC628021.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Amherstia nobilis | KC470025.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Humboldtia brunonis | JX163310.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Aphanocalyx cynometroides | AM234241.1 | 1014 | 99 | 0 | 100
GenBank | rbcLa | Amherstia nobilis | AM23234.1 | 1014 | 99 | 0 | 100
BOLD | matK | Brachystegia boehmii | early-release | 761 | 99.74 | 0 | -
BOLD | matK | Brachystegia spiciformis | published | 761 | 99.74 | 0 | -
BOLD | rbcLa | Brachystegia spiciformis | Unpublished | 552 | 100 | 0 | -
BOLD | rbcLa | Brachystegia spiciformis | Unpublished | 552 | 100 | 0 | -


[Phylogram panel: Fabaceae (Papilionoideae and Caesalpinioideae) and Polygalaceae; tip labels and node support values omitted.]

Page | 84

Ficus rokko OM2249 Ficus trichopoda OM1817 Ficus trichopoda OM3274 1.0 Ficus trichopoda OM3674 Ficus thonningii OM1576 1.0 Ficus thonningii OM972 Ficus thonningii OM2763 Ficus thonningii OM2754 Moraceae Ficus thonningii OM542 1.0 Ficus thonningii RL1487 Ficus thonningii MWC20247 Ficus thonningii OM1850 1.0 Milicia excelsa OM2696 1.0 Milicia excelsa OM2749 Berchemia discolor OM1175 0.97 Berchemia discolor OM2437 Berchemia discolor OM267 1.0 Berchemia discolor OM3536 Rhamnaceae Berchemia zeyheri OM1165 1.0 Berchemia zeyheri OM600 1.0 Berchemia zeyheri OM3345 Prunus africana OM1568 Prunus africana YM02 1.0 Prunus africana GenbankA Prunus africana GenbankB 1.0 Prunus africana GenbankC Prunus africana GenbankD 1.0 Prunus serotina GenbankA Rosaceae 1.0 Prunus serotina GenbankB Prunus serotina GenbankC 1.0 1.0 Prunus persica GenbankB 0.99 Prunus persica GenbankC Prunus persica OM1899 Prunus persica GenbankA Androstachys johnsonii GenbankA 1.0 Androstachys johnsonii GenbankB Androstachys johnsonii OM3385 Picrodendraceae Androstachys johnsonii OM1912 0.99 Androstachys johnsonii RBN185 1 0.86 Cleistanthus schlechteri OM2539 0.59 1.0 Cleistanthus schlechteri OM2603 Cleistanthus polystachyus Genbank 0.95 Spirostachys africana OM254 Euphorbiaceae 1.0 Spirostachys africana OM990 1.0 Spirostachys africana OM2396 1.0 Bruguiera gymnorhiza OM2487 1.0 Bruguiera gymnorhiza GenbankA 1.0 Bruguiera gymnorhiza GenbankB Rhizophoraceae 1.0 Ceriops tagal GenbankA Ceriops tagal GenbankB 1.0 Cassine transvaalense OM403 0.89 Cassine transvaalense OM1229 Cassine transvaalense OM241 0.78 Cassine croceum Abbott9197 1.0 Cassine croceum OM3179 Cassine croceum OM3778 Celastraceae Cassine matabelicum Genbank 0.63 1.0 0.97 Catha abbottii Abbott9242 0.99 Catha abbottii Genbank 1.0 0.79 Catha edulis OM1866 1.0 Catha transvaalensis Genbank Catha edulis OM482 Balanites maughamii OM2096 Balanites aegyptica OM3548 1.0 Balanites maughamii OM994 Balanites maughamii OM223 Zygophyllaceae Balanites maughamii 
OM3412 Balanites pedicillaris OM901

Page | 85

Combretum apiculatum OM2066 Combretum apiculatum OM3522 0.75 Combretum apiculatum OM1068 Combretum apiculatum RL1100 1.0 Combretum molle OM3553 1.0 Combretum molle OM1526 Combretum apiculatum OM2406 Combretaceae 1.0 Combretum imberbe OM1012 1.0 Combretum imberbe OM2393 1.0 Lumnitzera racemosa OM1675 1.0 0.65 Lumnitzera racemosa OM2478 Lumnitzera racemosa Genbank Adansonia digitata OM1306 1.0 Adansonia digitata OM2740 Adansonia digitata OM3387 Malvaceae 0.65 Adansonia digitata OM747 0.89 Ekebergia capensis OM742 Ekebergia capensis OM2684 1.0 1.0 Entandrophragma caudatum OM3352 Meliaceae Entandrophragma caudatum OM794 1.0 Ekebergia capensis OM1540 0.58 Sclerocarya birrea caffra OM498 1.0 0.91 Sclerocarya birrea caffra RL1117 Anacardiaceae Sclerocarya birrea caffra OM278 0.99 Boscia mossambicensis OM250 Boscia mossambicensis RL1212 Boscia mossambicensis YBK177 0.85 Boscia angustifolia Genbank Boscia angustifolia OM2069 1.0 Boscia angustifolia RBN268 1 Capparaceae 1.0 0.86 Boscia albitrunca OM1274 Boscia salicifolia OM2543 Boscia albitrunca OM312 Boscia albitrunca OM1256 Boscia salicifolia OM2404 1.0 Diospyros inhacaensis OM2225 Diospyros abyssinica GenbankA 0.94 Diospyros abyssinica GenbankB 1.0 Diospyros squarrosa OM3485 1.0 Diospyros squarrosa Genbank 1.0 Diospyros mespiliformis OM218 1.0 Diospyros mespiliformis RL1273 Diospyros mespiliformis OM2764 Diospyros mespiliformis OM3493 Ebenaceae 1.0 Euclea divinorum GenbankA 1.0 Euclea divinorum GenbankB Euclea divinorum OM1102 1.0 Euclea divinorum OM227A 1.0 Euclea natalensis RL1166 1.0 Euclea natalensis OM211 Euclea natalensis OM3239 0.78 Euclea pseudebenus MWC21190 1.0 Mimusops caffra OM1754 0.97 Mimusops caffra OM2472 1.0 Mimusops caffra OM1554 1.0 Mimusops obovata OM3233 0.98 Mimusops zeyheri OM1943 Mimusops zeyheri OM MvdB50 1.0 1.0 Mimusops obtusifolia OM2627 Sapotaceae Sideroxylon inerme AM0232 1.0 Sideroxylon inerme BS0117 Sideroxylon inerme OM1760 Sideroxylon inerme OM266 Sideroxylon inerme RL1144 Barringtonia 
racemosa OM1830 1.0 Barringtonia racemosa OM2170 Lecythidaceae 1.0 Barringtonia racemosa OM3733 1.0 Pittosporum viridiflorum Abbott9133 1.0 1.0 Pittosporum viridiflorum OM1738 Pittosporum viridiflorum OM1784 1.0 Pittosporum undulatum GenbankA Pittosporum undulatum GenbankB Pittosporaceae 1.0 Pittosporum undulatum GenbankC Pittosporum undulatum OM2815 Breonadia salicina OM2571 0.53 1.0 Breonadia salicina OM3538 Breonadia salicina RL1194 Rubiaceae Curtisia dentata Genbank 1.0 Curtisia dentata OM1737 Curtisia dentata OM3167 Cornaceae

Page | 86

Leucadendron argenteum OM2263 Proteaceae 1.0 Ocotea bullata Abbott9194 1.0 Ocotea bullata Genbank Lauraceae Warburgia salutaris OM1853 Cannelaceae 0.87 Podocarpus latifolius GenbankA 0.96 Podocarpus latifolius GenbankB Podocarpus elongatus Genbank 0.92 Podocarpus elongatus OM2273 1.0 1.0 Podocarpus henkelii GenbankA Podocarpaceae Podocarpus henkelii GenbankB 1.0 Afrocarpus falcatus OM1681 Afrocarpus falcatus Genbank 1.0 0.66 Widdringtonia nodiflora GenbankA Widdringtonia nodiflora GenbankB Widdringtonia nodiflora OM2271 1.0 1.0 Widdringtonia nodiflora GenbankC Cupressaceae Widdringtonia schwarzii OM2272 Widdringtonia schwarzii GenbankA Widdringtonia schwarzii GenbankB Pinus pinaster Genbank Pinaceae 1.0 Stangeria eriopus PR706 Stangeriaceae Encephalartos aemulans PR861 Zamiaceae Cycas thouarsii Genbank Cycadaceae

Figure 3-1 Majority-rule consensus tree from Bayesian analysis of the combined dataset of core barcode genes rbcLa and matK rooted using representatives of the Acrogymnospermae. The numbers above the branches are BI posterior probabilities.


Figure 3-2 Box-plot comparing, for each species, the smallest interspecific distance (inter) with the furthest intraspecific distance (intra). Distances were calculated under the K2P model.
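The K2P (Kimura 2-parameter) distance named in this caption can be illustrated with a minimal Python sketch. It is not the code used in this study; the function name and the choice to skip gaps and ambiguous bases (pairwise deletion) are assumptions made for illustration only. With P the proportion of transitions and Q the proportion of transversions among the compared sites, the distance is d = -0.5 ln[(1 - 2P - Q) sqrt(1 - 2Q)].

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration: sites where either sequence has a
    gap or an ambiguous base are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For closely related sequences the K2P distance is only slightly larger than the raw proportion of differing sites; the correction grows as divergence increases.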


Figure 3-3 Line-plot of the barcode gap for the 223 tree specimens in the matK dataset. For each individual in the dataset, the blue line represents the furthest intraspecific distance (bottom of line value) and the closest interspecific distance (top of line value). The orange lines show where this relationship is reversed and the closest non-conspecific is actually closer to the query than its nearest conspecific; hence no barcode gap is present.


Figure 3-4 Line-plot of the barcode gap for the 217 tree specimens in the rbcLa dataset. For each individual in the dataset, the blue line represents the furthest intraspecific distance (bottom of line value) and the closest interspecific distance (top of line value). The orange lines show where this relationship is reversed and the closest non-conspecific is actually closer to the query than its nearest conspecific; hence no barcode gap is present.


Figure 3-5 Line-plot of the barcode gap for 191 of the 249 tree specimens in the combined (matK and rbcLa) dataset. For each individual in the dataset, the blue line represents the furthest intraspecific distance (bottom of line value) and the closest interspecific distance (top of line value). The orange lines show where this relationship is reversed and the closest non-conspecific is actually closer to the query than its nearest conspecific; hence no barcode gap is present.
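The comparison plotted in figures 3-3 to 3-5 can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis code of this study: the function name, the square distance matrix given as nested lists, and the decision to report singletons as having no measurable gap are all assumptions.

```python
def barcode_gaps(dist, species):
    """Per-specimen barcode gap from a pairwise distance matrix.

    dist    : square, symmetric matrix (list of lists) of pairwise distances
    species : species name of each specimen, in matrix order
    Returns one tuple per specimen:
    (index, furthest_intraspecific, nearest_interspecific, has_gap).
    Specimens without a conspecific (singletons) or without any
    non-conspecific get None for the last three fields.
    """
    n = len(species)
    results = []
    for i in range(n):
        intra = [dist[i][j] for j in range(n)
                 if j != i and species[j] == species[i]]
        inter = [dist[i][j] for j in range(n) if species[j] != species[i]]
        if not intra or not inter:
            results.append((i, None, None, None))
            continue
        max_intra = max(intra)
        min_inter = min(inter)
        # A gap exists only if the nearest non-conspecific is further away
        # than the furthest conspecific (blue line); otherwise the
        # relationship is reversed (orange line).
        results.append((i, max_intra, min_inter, min_inter > max_intra))
    return results
```

Under this assumption singletons cannot be plotted, since they have no intraspecific distance, which is one possible reason why a plot may show fewer specimens than the dataset contains.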


Figure 3-6 Cumulative error and threshold optimisation. False positive (blue) and false negative (orange) identification error rates summed across a range of distance thresholds from 0-2% in 0.1% increments (a = matK data; b = matK data, singletons excluded; c = rbcLa data; d = rbcLa data, singletons excluded; e = combined data; f = combined data, singletons excluded). Characterisation of the error tails follows Meyer and Paulay (2005). Optimum thresholds are indicated by the arrow.
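The idea of threshold optimisation can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used to produce figure 3-6: the function names are hypothetical, and the error definitions are assumed here as (i) a false positive when a specimen has conspecifics in the dataset but no match at all within the threshold (spurious splitting), and (ii) a false negative when at least one non-conspecific falls within the threshold (lumping). The definitions used by any particular software package may differ in detail.

```python
def cumulative_error(dist, species, thresholds):
    """Count assumed false positives/negatives at each distance threshold.

    Returns a list of (threshold, false_pos, false_neg, cumulative) tuples.
    """
    n = len(species)
    rows = []
    for t in thresholds:
        false_pos = false_neg = 0
        for i in range(n):
            others = [j for j in range(n) if j != i]
            has_conspecific = any(species[j] == species[i] for j in others)
            within = [j for j in others if dist[i][j] <= t]
            if any(species[j] != species[i] for j in within):
                false_neg += 1  # another species lies within the threshold
            elif has_conspecific and not within:
                false_pos += 1  # conspecifics exist but none is close enough
        rows.append((t, false_pos, false_neg, false_pos + false_neg))
    return rows


def optimum_threshold(rows):
    """Smallest threshold that minimises the cumulative error."""
    return min(rows, key=lambda row: row[3])[0]


# Thresholds from 0 to 2% in 0.1% increments, expressed as proportions.
THRESHOLDS = [i / 1000 for i in range(21)]
```

A low threshold inflates the false positives, a high threshold inflates the false negatives, and the optimum is taken where their sum is smallest.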


Figure 3-7 The function “seeBarcode” in the R package SPIDER produces an image that represents each base as a coloured vertical line corresponding to its nucleotide. These images can be used to assess the alignment of the sequences. Query 1 = a), query 2 = b), query 3 = c), query 4 = d), query 5 = e), query 6 = f).


[Figure 3-8: three tree images, panels a), b) and c). Tip labels are the reference specimens together with the six queries (Query 1 to Query 6).]

Figure 3-8 NJ phylogeny of all three datasets (a = matK; b = rbcLa; c = combined dataset), annotated with Rosenberg’s probability of reciprocal monophyly. The orange nodes are significant at α = 0.05 (orange nodes also represent bootstrap > 70% and blue nodes bootstrap < 70%).


Figure 3-9 Query identification by the NJ algorithm on the local matK dataset. The sub-clade presented is derived from the matK NJ tree provided in figure 3-8 a). Incorporated in this analysis is Rosenberg’s probability of reciprocal monophyly. Orange nodes indicate significantly monophyletic clusters, while blue nodes indicate that clustering may be due to chance. Correct identifications are marked by a tick and incorrect identifications by a cross.


Figure 3-10 Query identification by the NJ algorithm on the local rbcLa dataset. The sub-clades presented are derived from the rbcLa NJ tree provided in figure 3-8 b). Incorporated in this analysis is Rosenberg’s probability of reciprocal monophyly. Orange nodes indicate significantly monophyletic clusters, while blue nodes indicate that clustering may be due to chance. Correct identifications are marked by a tick and incorrect identifications by a cross.


Figure 3-11 Query identification by the NJ algorithm on the local combined dataset. The sub-clades presented are derived from the combined NJ tree provided in figure 3-8 c). Incorporated in this analysis is Rosenberg’s probability of reciprocal monophyly. Orange nodes indicate significantly monophyletic clusters, while blue nodes indicate that clustering may be due to chance. Correct identifications are marked by a tick and incorrect identifications by a cross.


Figure 3-12 Query identification by the Bayesian likelihood algorithm on the local combined dataset. The subclades presented are derived from the combined Bayesian tree provided in figure 3-1. Posterior probabilities shown on the tree are ranked as weak support from 0-95 and strong support from 95-100. Correct identifications are marked by a tick and incorrect identifications by a cross.


Chapter 4

4. Discussion

4.1 Development of DNA barcode reference library

Developing DNA barcoding as a tool to help combat illegal logging in southern Africa depends primarily on generating an accurate barcode sequence reference library. As such, this study has contributed a preliminary DNA barcode database of priority protected and traded timber species for the regulation of the timber trade industry in southern Africa. The majority of the priority species listed in table 2-1 (76%) are present in the database, which incorporates 93% of South African protected trees and all Mozambican commercial timber species.

I encountered several obstacles while generating the database. Firstly, completing the reference library depended on assigning species names accurately to voucher specimens, especially as the library is intended for use by non-taxonomists. To address this issue, relevant field guides and experts on the local vegetation, with special emphasis on local trees, were consulted. Despite these precautions, field misidentifications still occurred while building the database, which made apparent how defective the current regulation method (primarily based on morphological identification) may be. To significantly reduce the number of misidentified specimens in the database, all specimens were re-identified using DNA barcodes together with morphological identifications. Referring to figure 3-1 (a majority-rule consensus tree of the combined matK and rbcLa dataset), despite a lack of resolution in specific groups, most samples formed cohesive clusters in agreement with the updated angiosperm phylogeny classification (APG III 2009). For any specimen on our phylogeny displaying obvious disagreement with its field identification, the voucher specimen and collection data were carefully reassessed and the identification was rectified if necessary. This highlights a beneficial component of DNA barcoding data and specimen processing, which further supports its implementation.

Another challenge facing the dataset is that, in many instances, insufficient sampling of individual species (singletons) affects its performance.

It is a common trend in DNA barcoding studies that involve developing a reference library that researchers fail to exclude queries from their analysis. The best practice is to generate the database and subsequently run independently sourced sequences against it. Parmentier et al. (2013) demonstrated that the “bit score” will always be higher if a sequence is matched against itself than against a separately generated sequence, which might vary in length and contain ambiguous bases. In this study, not only is the dataset tested for practicality with a few mock identification scenarios, but it is also systematically interrogated by treating each of its constituent specimens as a query. Hence, to reduce the dataset’s susceptibility to the bias mentioned above, we mined global databases such as BOLD Systems and GenBank for duplicates where needed. A minimum of two individuals per species is a prerequisite; however, it should be noted that incorporating GenBank sequences introduced a definite uncertainty. Unlike BOLD Systems data, diagnostic photographs of the specimens are not available on GenBank, so validation of species identification could not be performed.
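The re-identification test described above, in which each specimen in turn is withdrawn from the library and treated as a query, can be sketched as follows. This is a minimal illustration in Python under stated assumptions and not the analysis pipeline used in this study: the function names, the toy sequences and the use of an uncorrected p-distance are hypothetical choices made for the example.

```python
def p_distance(a, b):
    """Uncorrected p-distance: proportion of differing sites in two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def leave_one_out(records):
    """Treat each record in turn as a query against the rest of the library.

    records: list of (species, aligned_sequence) tuples.
    Returns one boolean per record: True when the nearest remaining
    sequence carries the same species name as the query.
    """
    outcomes = []
    for i, (species, seq) in enumerate(records):
        rest = records[:i] + records[i + 1:]  # library without the query itself
        nearest = min(rest, key=lambda rec: p_distance(seq, rec[1]))
        outcomes.append(nearest[0] == species)
    return outcomes


# Hypothetical toy library: two species with duplicates and one singleton.
library = [
    ("Species A", "AAAAAAAAAA"),
    ("Species A", "AAAAAAAAAC"),
    ("Species B", "GGGGGAAAAA"),
    ("Species B", "GGGGGAAAAC"),
    ("Species S", "GGGGGTTAAA"),  # singleton: no conspecific remains once withdrawn
]

outcomes = leave_one_out(library)
success_rate = sum(outcomes) / len(outcomes)
print(outcomes, success_rate)
```

In this toy library the singleton can never be re-identified correctly, because once it is withdrawn no conspecific sequence remains, which illustrates why a minimum of two individuals per species is needed.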

Therefore, for future endeavours on the topic I strongly recommend following the sampling design of classical taxonomic studies, in which one samples and studies several individuals from multiple localities across the range of a given species to differentiate intraspecific from interspecific variation and to identify characteristics uniquely shared among all members of that species. Similarly, a reference database for the barcoding of southern African trees should include sufficient sequences for each species from across its distribution range, as well as for genetically and morphologically close “look alike” species, in order to better represent the diversity of each species.

4.2 Genetic divergence and implication for identification

It is a given that genetic diversity is the basis of species identification. Distinguishing between different species may only be achieved in the presence of significant interspecific genetic divergence and limited intraspecific variation, in other words the “barcoding gap” described by Meyer and Paulay (2005). Alternatively, tree-based analysis may isolate specimens of the same species as cohesive monophyletic clusters. In this study, both approaches were implemented to evaluate the discriminatory power of the generated reference library.

At first glance the barcoding gap appears narrow, especially when assessing figures 3-4, 3-5 and 3-6, which display the barcoding gap across the three datasets. In the summary statistics (table 3-2), a mean intraspecific distance no greater than 0,25% is reported across all datasets. The smallest mean interspecific distance was recorded in the rbcLa dataset at 1,12%, followed by the combined dataset at 2,04%, and the highest was recorded in the matK dataset at 2,45%.

The divergence between species is 10 times greater than within species, and even greater in the combined dataset. Until recently the mean interspecific distance would have been used to determine divergence; however, it has been shown that the mean interspecific distance yields exaggerated estimates of the barcoding gap (Meier et al. 2008).

In some instances the intraspecific variation exceeded the recorded mean smallest interspecific values. Despite the 10-fold difference between interspecific and intraspecific variation, the mean smallest interspecific distances in all three datasets ranged from 0 to a given value (see table 3-2), while the mean intraspecific variation ranged up to 2,65% in the matK dataset, exceeding the mean of the smallest interspecific distance. This implies that identification will not be possible for a proportion of the sequences. For instance, in the matK dataset interspecific divergence was non-existent for two or more species of the genera Pterocarpus, Vachellia, Brachystegia, Ficus, Catha, Balanites, Combretum, Boscia, Diospyros and Widdringtonia. However, Little et al. (2013) demonstrated that the presence or absence of a barcoding gap is not necessarily predictive of discrimination success. This observation is consistent with the findings of the current study. For instance, the identification success predicted from the presence or absence of the barcoding gap was 74%, 66% and 83% for the matK, rbcLa and combined datasets respectively. However, depending on the parameters used, the identification success rate varied from 53% to 88%, 27% to 88% and 38% to 90% for the matK, rbcLa and combined datasets respectively. Several of the genera listed above were found to have large intraspecific variation but notably low interspecific variation, and will therefore appear to have no barcoding gap. However, according to Little et al. (2013), a single consistent nucleotide difference is all that is needed to identify a species.
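The comparison between the largest intraspecific and the smallest interspecific distance discussed above can be illustrated with a short sketch. The code below is a simplified Python example under stated assumptions and not the procedure used to produce table 3-2: the function names, the specimen labels and the toy sequences are hypothetical, and an uncorrected p-distance stands in for whichever distance model is preferred.

```python
def p_distance(a, b):
    """Uncorrected p-distance: proportion of differing sites in two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcoding_gap(records):
    """Return {specimen_id: (max intraspecific, min interspecific)} distances.

    records: list of (specimen_id, species, aligned_sequence) tuples.
    A value is None when no comparison exists (e.g. singletons have no
    intraspecific distance).
    """
    result = {}
    for sid, sp, seq in records:
        intra = [p_distance(seq, s) for i, other, s in records
                 if other == sp and i != sid]
        inter = [p_distance(seq, s) for i, other, s in records if other != sp]
        result[sid] = (max(intra) if intra else None,
                       min(inter) if inter else None)
    return result


def has_gap(intra, inter):
    """True only when both distances exist and the nearest other species is
    further away than the most distant conspecific."""
    return intra is not None and inter is not None and inter > intra


# Hypothetical toy alignment: species C shares a haplotype with species A.
specimens = [
    ("A1", "Species A", "ACGTACGTACGTACGTACGT"),
    ("A2", "Species A", "ACGTACGTACGTACGTACGA"),
    ("B1", "Species B", "ACGTTCGTACGAACGTACGT"),
    ("B2", "Species B", "ACGTTCGTACGAACGTACGT"),
    ("C1", "Species C", "ACGTACGTACGTACGTACGT"),
]

gaps = barcoding_gap(specimens)
for sid, (intra, inter) in gaps.items():
    print(sid, intra, inter, has_gap(intra, inter))
```

In this toy case the specimens of species B show a gap, whereas those of species A do not, because species C carries a sequence identical to A1. The smallest interspecific distance is used here, rather than the mean, in line with the caution of Meier et al. (2008).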

According to the majority rule trees (figure 3-1), most taxa formed a monophyletic unit with their conspecifics, with the exception of some species from the genera mentioned above. Most of the authenticated species clades were clearly monophyletic, appearing distinctly separated from other clades with very high support.

However, the NJ trees illustrated in figure 3-8 did not recover all the authenticated species. Rosenberg’s analysis of reciprocal monophyly suggested that, although the majority of species form consistent and well-supported clades with individuals of the same species, this does not necessarily indicate that the lineages belong to the same group (Rosenberg 2007); the pattern may merely be the outcome of random branching. The colours plotted on the nodes of each clade represent significance, and I noticed that the majority of the recent branches are not significant (blue nodes), suggesting that additional sampling of specimens per species is needed for the dataset in figure 3-8 (Rosenberg 2007).

4.3 Identification success rates

The first identification parameters tested were k-NN and BCM. These are distance-based measures and, unlike tree-based methods, do not rely on reciprocal monophyly. They work by finding the individual closest to the target and returning the species index of that individual (Brown et al. 2012). When conflicting records are present, for instance when two specimens of different species are returned as the nearest-neighbour match, the BCM parameter will flag the result as ambiguous, while k-NN will bypass the indecision and allocate a single match based on majority. Accordingly, the best success rate was recorded for k-NN, at an average of 89% across all three databases, 10% higher than BCM. Nevertheless, the k-NN method has its complications. For example, where an under-sampled database is faced with a query not represented in the DNA reference library, the k-NN parameter cannot return “no identification” and will assign the query to its nearest neighbour regardless.

Despite being fundamentally similar to k-NN, Meier’s best close match parameter and the BOLD identification parameter include a default threshold, which enables a “no identification” outcome. As with most parameters in this study, the threshold faces its own fundamental criticisms (Galtier et al. 2009), but it provides a principle for separating divergence within species from divergence between species (Meier 2008).
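The contrast between the two distance-based parameters can be made concrete with a small sketch. The Python code below is an illustrative simplification and not the SPIDER implementation: the function names, the reference sequences and the 1% threshold expressed as 0.01 are assumptions made for the example.

```python
from collections import Counter


def p_distance(a, b):
    """Uncorrected p-distance: proportion of differing sites in two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_matches(query, reference):
    """Return (smallest distance, species names of all references at that distance)."""
    distances = [(p_distance(query, seq), species) for species, seq in reference]
    best = min(d for d, _ in distances)
    return best, [species for d, species in distances if d == best]


def nearest_neighbour(query, reference):
    """Always returns a species: the majority name among the closest matches."""
    _, tied = closest_matches(query, reference)
    return Counter(tied).most_common(1)[0][0]


def best_close_match(query, reference, threshold=0.01):
    """Returns a species, 'ambiguous', or 'no identification'."""
    best, tied = closest_matches(query, reference)
    if best > threshold:
        return "no identification"   # nothing in the library is close enough
    if len(set(tied)) > 1:
        return "ambiguous"           # equally close matches from several species
    return tied[0]


# Hypothetical reference library: species A and B share a haplotype.
reference_library = [
    ("Species A", "ACGTACGTACGTACGTACGT"),
    ("Species A", "ACGTACGTACGTACGTACGT"),
    ("Species B", "ACGTACGTACGTACGTACGT"),
    ("Species C", "TTTTACGTACGTACGTACGA"),
]

shared_query = "ACGTACGTACGTACGTACGT"      # identical to A and B references
unsampled_query = "GGGGGGGGACGTACGTACGT"   # species absent from the library

for q in (shared_query, unsampled_query):
    print(nearest_neighbour(q, reference_library),
          "|", best_close_match(q, reference_library))
```

For the shared haplotype, the nearest-neighbour function settles on the majority name while the best close match function reports ambiguity. For the unsampled query, only the thresholded function is able to decline an identification.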

Incorporated in the SPIDER R package is a useful tool, “threshOpt”, for assessing whether the global default threshold of 1% adequately serves the generated database. Applying “threshOpt” to the three databases returned a cumulative error analysis that emphasises the notion that not all databases are the same. Therefore, following a general threshold may underestimate the performance of a given database. More importantly, optimising the threshold can protect against the highly probable scenario of encountering unsampled species. Another noticeable fact is that the adjusted thresholds were relatively close to the 1% default, which further justifies the universal default threshold of 1%. For both the BCM and BOLD identification parameters, the change in threshold resulted in a modest improvement in identification success, especially for the BOLD parameter. An improved threshold somewhat offsets the drawback of the BOLD parameter, which requires that all hits within the threshold be conspecific. Therefore, in line with NJ monophyly, one misidentification within the threshold will yield an ambiguous result. This explains the rather disappointing performance of the BOLD parameter. Perhaps adopting an approach similar to BCM would be more suitable and would significantly increase identification rates.
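The idea behind a cumulative error analysis can be sketched as follows. This Python example is a simplified illustration and does not reproduce the exact definitions used by “threshOpt”: here a false positive is assumed to be a specimen with a heterospecific sequence within the threshold, and a false negative a specimen whose nearest conspecific lies beyond the threshold. The function names and the distance matrix are hypothetical.

```python
def cumulative_error(dist, species, threshold):
    """Count false positives plus false negatives at a given threshold.

    dist: symmetric matrix (list of lists) of pairwise distances.
    species: species name for each row of the matrix.
    """
    false_pos = false_neg = 0
    n = len(species)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        con = [dist[i][j] for j in others if species[j] == species[i]]
        het = [dist[i][j] for j in others if species[j] != species[i]]
        if het and min(het) <= threshold:
            false_pos += 1   # another species falls within the threshold
        elif con and min(con) > threshold:
            false_neg += 1   # nearest conspecific falls outside the threshold
    return false_pos + false_neg


def optimise_threshold(dist, species, candidates):
    """Return (threshold with the lowest cumulative error, error per threshold)."""
    errors = {t: cumulative_error(dist, species, t) for t in candidates}
    best = min(errors, key=errors.get)
    return best, errors


# Hypothetical pairwise distances for six specimens of three species.
labels = ["A", "A", "B", "B", "C", "C"]
matrix = [
    [0.0,   0.004, 0.03,  0.03,  0.05,  0.05],
    [0.004, 0.0,   0.03,  0.03,  0.05,  0.05],
    [0.03,  0.03,  0.0,   0.015, 0.04,  0.04],
    [0.03,  0.03,  0.015, 0.0,   0.02,  0.04],
    [0.05,  0.05,  0.04,  0.02,  0.0,   0.002],
    [0.05,  0.05,  0.04,  0.04,  0.002, 0.0],
]

best_threshold, error_table = optimise_threshold(
    matrix, labels, [0.005, 0.01, 0.015, 0.02, 0.03]
)
print(best_threshold, error_table)
```

In this toy matrix the default of 0.01 leaves the two specimens of species B unmatched, while a threshold of 0.02 or more starts admitting heterospecific matches, so the lowest cumulative error falls between the two.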

NJ clustering is a method not shy of controversy and has been scrutinised for its debatable manner of tree building (Lowenstein and Kolokotronis 2010, Meier et al. 2006, Will and Rubinoff 2004) and for topological ambiguity. Applying the method to our database generated a reasonable rate of success across all three datasets, on average 75% for NJ monophyly and 71% when bootstrap support was included. However, despite these good rates, perfectly distinguishable species may fail to form monophyletic clusters, especially where introgression is involved or where a recently diverged species still retains ancestral polymorphisms (van Velzen et al. 2012).

These results are in agreement with previous studies suggesting that in some instances barcoding fails to reflect the agreement between morphology and DNA barcode sequences. This is highlighted by the drop in success rate when bootstrap values are included, in some instances by as much as 5%. It is apparent that, despite their reasonable performance, NJ monophyly trees are prone to erroneously overlooking a diagnosable monophyletic species. However, the method is well suited to generating quick patterns within DNA barcoding data.
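The monophyly criterion applied to the trees can be illustrated with a short sketch. The Python code below is a minimal example under stated assumptions and not the method implemented in this study: the tree is assumed to be rooted and is written by hand as nested tuples, no tree is inferred, bootstrap support is ignored, and the function names and specimen labels are hypothetical.

```python
def clades(tree):
    """Return (leaves under this node, leaf sets of every clade at or below it).

    tree: a specimen label (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def monophyletic_species(tree, species_of):
    """Return {species: True/False}: True when the specimens of a species
    form a clade containing no other species. Singletons are trivially True."""
    _, all_clades = clades(tree)
    clade_set = set(all_clades)
    members = {}
    for specimen, species in species_of.items():
        members.setdefault(species, set()).add(specimen)
    return {sp: frozenset(m) in clade_set for sp, m in members.items()}


# Hypothetical rooted tree: specimen C1 is nested among species B.
toy_tree = ((("A1", "A2"), ("B1", ("B2", "C1"))), "C2")
toy_species = {
    "A1": "Species A", "A2": "Species A",
    "B1": "Species B", "B2": "Species B",
    "C1": "Species C", "C2": "Species C",
}

monophyly = monophyletic_species(toy_tree, toy_species)
print(monophyly)
```

A single misplaced specimen (C1) is enough to make both species B and species C fail the criterion, which mirrors the sensitivity of tree-based identification discussed above.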

Identification success results are described for two scenarios: first with singletons excluded, followed by results with singletons included. For DNA barcoding identification, the exclusion of singletons (taxa represented by a single individual) is ideal because the re-identification scenario is then conducted on a complete reference library (every taxon is represented by two or more individuals), meaning that if a given sequence is the query, its duplicate serves as the reference. However, this is unrealistic because in the timber trade it is highly likely that a species will be substituted with a “look alike” species not currently available in our database. Hence, the number of singletons encountered is expected to increase in the future. Because of this, parameters unable to indicate when a specimen cannot be assigned to a species must be avoided, and parameters able to return a “no ID” answer must be used instead, until the database is complete.

4.4 DNA barcoding: practical considerations

Despite the widely reported low amplification success of matK relative to rbcLa (Gonzalez et al. 2009, Kress et al. 2009, Hollingsworth et al. 2011), both the matK and rbcLa markers were easily amplified from all six timber log samples using a single primer pair. For identification on the local database, an alignment of all sequences is needed. However, the presence of indels in matK makes this task somewhat challenging, especially when dealing with a broad range of taxonomic groups across the angiosperm phylogeny. On the other hand, rbcLa is certainly the easiest marker to align because no gaps are present and only a single alignment is possible, as illustrated in figure 3-7.

One of the aims of the study was to propose an alternative means of identification for timber species of southern Africa to assist officials at border posts. The proposed method of identification was evaluated using the timber log material and techniques detailed in the methods, against both the generated database and universal databases, to compare efficiency. Both markers performed equally well on the local database using NJ monophyly (table 3-6). For five of the six query sequences, the query was sister to or nested in a monospecific cluster. In addition, the matK marker accurately identified query 6 to genus level but failed to form a monophyletic clade with Brachystegia spiciformis, while the rbcLa marker lacked enough variability to separate the genus Julbernardia from Brachystegia (see figure 3-10), resulting in an ambiguous identification. Combining the two markers did not improve identification success at species level; if anything, it reduced the resolution previously observed in matK (see figure 3-11). In the BL combined analysis, by contrast, the distinction between the genera Julbernardia and Brachystegia is re-established.

As far as BLAST identification on universal databases is concerned, query identification performed poorly on GenBank and moderately on the BOLD Systems database. The better performance on BOLD is expected considering the specificity of the regions tested (matK and rbcLa). Furthermore, sequences on BOLD are not necessarily published on GenBank. In the case of Millettia stuhlmannii (queries 1 and 2), no matK or rbcLa sequences are present on GenBank.

Chapter 5

5. Conclusion

This study explored the application of molecular techniques, more specifically DNA barcoding, for the identification of timber species, relying on its reported resolution especially in instances where morphological diagnosis is impractical or impossible. The results presented in this study show the potential of DNA barcoding for the identification of commercial timber. Figure 5-1 is a visual representation of the procedure for “DNA barcoding identification of traded and protected trees”; it represents how the technique would be used for the identification of timber species in a real authentication scenario.

The success of DNA barcoding has been estimated at 98% for animals and 70% for plants, with the relatively low success rate of the latter attributed to various causes such as the high incidence of hybrid species in angiosperms, long generation times, slow mutation rates of woody species and limited seed dispersal (Meyer and Paulay 2005). Hence the identification power of the core barcoding markers has been consistently challenged (CBOL Plant Working Group 2009, Hollingsworth et al. 2009, Clement and Donoghue 2012, Pettengill 2010), prompting the pursuit of alternative barcoding markers.

Gonzalez et al. (2009), whose study was the first to include a large number of tropical trees, recorded the best identification rate using the psbA-trnH marker. After assessing five markers, Tripathi et al. (2013) reported their highest identification rate, 60-70%, when using ITS and psbA-trnH for the land plants of India.

Recently, Parmentier et al. (2013) reported identification rates ranging from 71% to 88%, with the best performing marker being psbA-trnH. In this study, however, we report a positive identification rate of 88% for each of the core barcoding genes matK and rbcLa individually, and a 90% success rate when they are combined, for the regional traded and protected trees.

A noticeable difference between the above-mentioned studies and the current study is the sampling procedure. Parmentier et al. (2013) generated a vast regional genetic inventory with over 700 specimens collected. Similarly, the study of Gonzalez et al. (2009) involved a large number of tropical trees. Though large-scale sampling may be the ultimate goal of barcoding, it has been reported that identification success declines with an increase in the number of species per genus (Kress et al. 2009, Kress et al. 2010).

Large-scale sampling may therefore include several species per genus, which may explain the poor core barcode performance recorded in those studies. The sampling in our study remained specific to the traded and protected species of southern Africa, which emphasised the distinctness of each species rather than blurring the boundaries between species; the resulting database therefore serves as an ideal platform for the identification of southern African traded and protected trees.

In this study I also aimed to assess the species identification rate of the core barcodes using several identification parameters. From the results obtained, the nearest-neighbour approaches k-NN and BCM performed best. This is because, unlike the NJ approach, they do not require reciprocal monophyly of each species, and the universal and much-criticised threshold of the BOLD IDS is not applicable. Therefore, even when conflicting data are included, identification success remains high. However, in the case of a tie for the closest match, the k-NN technique will ignore this uncertainty and offer an ID based on majority, whereas BCM will report the result as ambiguous. This characteristic of the BCM parameter may be appropriate, especially given the constantly shifting trends in the timber industry. A good identification tool should distinguish new species from species already in the database and not wrongly assign them to a sequence in the database. Furthermore, species already present in the database should always be subject to re-identification. Recently, the BRONX identification parameter was described by Little (2011). This novel algorithm accounts for observed within-taxon variability and hierarchic relationships among taxa. BRONX is reported to perform better than algorithms depending on global multiple sequence alignment and/or local pairwise alignment. Owing to its ability to score invariant flanking regions, the BRONX method performs better when imperfect overlaps are queried (i.e. incomplete sequence fragments and mini-barcodes; Little 2011).

Though the rbcLa marker is easy to amplify and sequence, this study, in agreement with recent studies on land plants (Elansary 2013, Parmentier et al. 2013, Tripathi et al. 2013), reported a smaller barcoding gap for rbcLa than for matK. In this study, the successful recovery of high-quality matK sequences (with all but one of the six generated sequences exceeding the average sequence length of 745 bp), combined with greater variability (interspecific distance of 2,45%) and an all-round higher identification success rate, makes matK the better individual marker. In accordance with Parmentier et al. (2013), combining the two markers resulted in an average identification rate increase of 7% across the different parameters. However, it should be noted that this increase occurred despite a reduced interspecific variation and barcoding gap. This further reiterates that, though informative, the barcoding gap does not necessarily indicate delimitation.

Based on the findings of the current study, I recommend that matK initially be used as an individual marker for traded and protected trees of southern Africa. This will not only conserve the resolution of the identification tool (Kress et al. 2009, Kress et al. 2010), but also significantly decrease the high implementation costs associated with molecular tools (Brack et al. 2002).

Figure 5-1 Illustration of DNA barcoding identification using the BOLD Systems database. Stage one is the selection of adequate material for DNA extraction. Stage two is DNA extraction (CTAB). Stage three is amplification, followed by sequencing and, lastly, identification of the timber sequence on the BOLD Systems database. Although this is a mock identification, the workflow illustrates the potential processing of a real DNA barcoding identification.

6. References

American Forest & Paper Association. (2009). American Forest & Paper Association

Comments on the Draft Western Climate Initiative Reporting Requirements.

American Forest & Paper Association. Washington, D.C.: American Forest &

Paper Association.

American Forest and Paper Association. (2002, February 4). AF&PA passes illegal

logging resolution. Press Release .

APG III. (2009). An update of the Angiosperm Phylogeny Group classification for the

orders and families of flowering plants: APG III. Botanical Journal of the

Linnean Society , 161, 105-121.

Asif, M. J., & Cannon, C. H. (2005). DNA extraction from processed wood: A case

study for the identification of an endangered timber species (Gonystylus

bancanus). Plant Molecular Biology Reporter , 23, 185-192.

Barreto, P., Mesquita, M., & Mercês, H. (2008). A Destinação dos Bens Apreendidos

em Crimes Ambientais na Amazônia. Imazon.

Blaxter, M. (2003). Counting angels with DNA. Nature , 421, 122-124.

Boon, R. (2010). Pooley's Trees of the Eastern South Africa (2nd ed.). Flora and Fauna

Publication Trust.

Brack, D. (2007). Illegal Logging. London: The Royal Institute of International Affairs.

Brack, D. (2003). Illegal logging and the illegal trade in forest and timber products.

International Forestry Review , 5, 195-198.

Brack, D., Gray, K., & Hayman, G. (2002). Controlling the international trade in

illegally logged timber and wood products. the UK Department for International

Development. London: Royal Institute of International Affairs.

Brown, S. D., Collins, R. A., Stephane, B., leFort, M.-C., Malumbres-Olarte, J., Vink,

C. J., Cruickshank, R. H. (2012). SPIDER: An R package for the analysis of

species identity and evolution, with particular reference to DNA barcoding.

Molecular Ecology Resources , 12, 562–565.

Cameron, S. L., Lambkin, C. L., Barker, S. C., & Whiting, M. F. (2007). A

mitochondrial genome phylogeny of Diptera: whole genome sequence data

accurately resolve relationships over broad timescales with high precision.

Systematic Entomology , 32, 40–59.

Cantino, P. D., Doyle, J. A., Graham, S. W., Judd, W. S., Olmstead, R. G., Soltis, D. E.,

Soltis, P. S., Donoghue, M. J (2007). Towards a phylogenetic nomenclature of

Tracheophyta. Taxon , 56, 822-846.

Cauley, H. A., Peters, C. M., Donovan, R. Z., & O'Connor, J. M. (2001). Forest

stewardship council forest certification. Conservation Biology , 15, 311-312.

CBOL Plant Working Group. (2009). A DNA barcode for land plants. Proc. Natl. Acad.

Sci. USA , 106, 12794–12797.

Centre for International Economics. (2009). Proposed new policy on illegally logged

timber Issues paper. Canberra & Sydney: Centre for International Economics.

Chase, M. W., Salamin, N., Wilkinson, M., Dunwell, J. M., Kesanakurthi, R. P., Haidar,

N., Savolainen, V. (2005). Land plants and DNA barcodes: short-term and long-

term goals. Phil. Trans. R. Soc. B , 1720, 1-7.

Chatham House. (2007). Illegal Logging and Related Trade: Measuring the Global

Response. Chatham House. London: Energy, Environment and Development

Programme Chatham House.

CITES. (2002). CITES identification guide-tropical wood. Environment Canada.

Canada: Authority of the Minister of Environment Minister of Supply and

Services Canada.

CITES. (2006). CITES I-II-III timber species manual. Washington D.C.: United States

Department of Agriculture.

Clement, W., & Donoghue, M. (2012). Barcoding success as a function of phylogenetic

relatedness in Viburnum, a clade of woody angiosperms. BMC Evolutionary

Biology, 12, doi:10.1186/1471-2148-12-73.

Coates Palgrave, K. C., Drummond, R. B., Moll, E. J., & Palgrave, M. C. (2002).

Palgrave's Trees of Southern Africa (3rd ed.). Cape Town: Struik Publishers.

Commission of the European Communities. (2003). Forest Law Enforcement,

Governance and Trade (FLEGT) – proposal for an EU action plan,

communication from the commission to. Brussels.

Cooper, A., & Wayne, R. (1998). New uses for old DNA. Current Opinion in

Biotechnology , 9, 49-53.

Cuénoud, P., Savolainen, V., Chatrou, L., Powel, M., Grayer, R., & Chase, M. (2002).

Molecular phylogenetics of Caryophyllales based on nuclear 18S rDNA and

plastid rbcL, atpB, and matK DNA sequences. American Journal of Botany ,

132-144.

Degen, B., & Fladung, M. (2007). Use of DNA-markers for tracing illegal logging. In

B. Degen (Ed.), Proceedings of the International Workshop “Fingerprinting

methods for the identification of timber” (pp. 6-14). Bonn: Landbauforschung vTI

Agriculture and Forestry Research.

Deguilloux, M.-F., Pemonge, M.-H., & Petit, R. J. (2004). DNA-based control of oak

wood geographic origin in the context of the cooperage industry. Annals of

Forest Science , 61, 97-104.

Deguilloux, M.-F., Pemonge, M.-H., & Petit, R. J. (2002). Novel perspectives in wood

certification and forensics: dry wood as a source of DNA. Proceedings of the Royal Society of London B, 269, 1039–

1046.

Deguilloux, M.-F., Pemonge, M.-H., Bertel, L., Kremer, A., & Petit, R. J. (2003).

Checking the geographical origin of oak wood: molecular and statistical tools.

Molecular Ecology , 12, 1629–1636.

Del Gatto, F. (2003). Forest law enforcement in Mozambique: An Overview. Maputo:

DNFFB/FAO.

Direcção Nacional de Terras e Florestas . (2007). Integrated Assessment of Mozambican

Forests. República de Moçambique.

Doyle, J. J., & Doyle, J. L. (1987). A rapid DNA isolation procedure for small

quantities of fresh leaf tissue. Phytochemical Bulletin , 19, 11-15.

Drouin, G., Daoud, H., & Xia, J. (2008). Relative rates of synonymous substitutions in

the mitochondrial, chloroplast and nuclear genomes of seed plants. Molecular

Phylogenetics and Evolution , 49, 827–831.

Dumolin-Lapègue, S., Pemonge, M.-H., Gielly, L., Taberlet, P., & Petit, R. J. (1999).

Amplification of oak DNA from ancient and modern wood. Molecular Ecology ,

8, 2137–2140.

Elansary, H. (2013). Towards a DNA barcode library for Egyptian flora, with a

preliminary focus on ornamental trees and shrubs of two major gardens. DNA

Barcodes , 46-55.

Fay, M. F., Bayer, C., Alverson, W. S., De Bruijn, A., & Chase, M. W. (1998). Plastid

rbcL sequence data indicate a close affinity between Diegodendron and Bixa.

Taxon , 47, 43-50.

Fazekas, A. J., Burgess, K. S., Kesanakurti, P. R., Graham, S. W., Newmaster, S. G.,

Husband, B. C., Percy, D. M., Hajibabaei, M., Barrett, S. P. (2008). Multiple

multilocus DNA barcodes from the plastid genome discriminate plant species

equally well. PLoS One , e2802.

Finkeldey, R., Leinemann, L., & Gailing, O. (2010). Molecular genetic tools to infer the

origin of forest plants and wood. Applied Microbiology and Biotechnology , 85,

1251–1258.

Finkeldey, R., Rachmayanti, Y., & Gailing, O. (2007). Molecular genetic tools for the

identification of the origin of wood. In U. Kües (Ed.), Wood Production, Wood

Technology, and Biotechnological Impacts (pp. 143-158). Goettingen, Germany:

Goettinger Universitaetsverlag.

Food and Agriculture Organization. (2005). Best practices for improving law

compliance in the forest sector. Rome: FAO.

Food and Agriculture Organization. (2007). State of the World's Forests. Forestry

Department. Rome: Electronic Publishing Policy and Support Branch

Communication Division FAO.

FSC. (2013). Statutes. Oaxaca: Forest Stewardship Council.

Galtier, N., Nabholz, B., Glemin, S., & Hurst, G. (2009). Mitochondrial DNA as a

marker of molecular diversity: a reappraisal. Molecular Ecology, 18, 4541–

4550.

Global Environment Facility. (2009). Investing in land stewardship GEF's efforts to

combat land degradation and desertification globally. Global Environment

Facility.

Global Witness. (2002). Logging Off How the Liberian Timber Industry Fuels Liberia's

Humanitarian Disaster and Threatens Sierra Leone. London: Global Witness.

Gonzalez, M., Baraloto, C., Engel, J., Mori, S., & Petronelli, P. (2009). Identification of

amazonian trees with DNA barcodes. PLoS ONE , 4.

Gugerli, F., Parducci, L., & Petit, R. J. (2005). Ancient plant DNA: review and

prospects. New Phytologist , 166, 409-418.

Hajibabaei, M., deWaard, J. R., Ivanova, N. V., Ratnasingham, S., Dooh, R. T., Kirk, S.

L., Mackie, P. M., Hebert, P. D. N. (2005). Critical factors for assembling a

high volume of DNA barcodes. Phil. Trans. R. Soc. B , 360, 1959-1967.

Hammond, P. (1992). Species inventory. In B. Groombridge (Ed.), Global

biodiversity: status of the earth’s living resources (pp. 17–39). London:

Chapman & Hall.

Hanlon, J. (2007). Is poverty decreasing in Mozambique? Maputo: Instituto de Estudos

Sociais e Económicos.

Hatton, J., & Munguambe, F. (1998). The biological Diversity of Mozambique.

Ambiental, Ministério da Coordenação de Acção. Maputo: Impacto Lda.

Hebert, P. D., Cywinska, A., Ball, S. L., & deWaard, J. R. (2003). Biological

identifications through DNA barcodes. Proceedings of the Royal Society of London ,

270, 313-321.

Hebert, P. D., & Gregory, T. R. (2005). The Promise of DNA Barcoding for Taxonomy.

Systematic Biology , 54, 852-859.

Hollingsworth, M., Clark, A., Forrest, L., Richardson, J., Pennington, R., Long, D.,

Cowan, R., Chase, M. W., Gaudeul, M., Hollingsworth, P. M. (2009). Selecting

barcoding loci for plants: evaluation of seven candidate loci with species-level

sampling in three divergent groups of land plants. Molecular Ecology Resources

, 9, 439-457.

Hollingsworth, P., Graham, S., & Little, D. (2011). Choosing and using a plant DNA

barcode. PLoS ONE , 6.1.

Huelsenbeck, J. P., & Ronquist, F. (2001). MRBAYES: Bayesian inference of

phylogenetic trees. Bioinformatics , 17, 745-755.

Instituto Nacional de Estatistica. (2013). Anuario Estatistico 2012. Maputo: Instituto

Nacional de Estatistica.

Ivanova, N. V., deWaard, J. R., & Hebert, P. D. (2006). An inexpensive, automation-

friendly protocol for recovering high-quality DNA. Molecular Ecology Notes , 6,

998–1002.

Johnson, L. A., & Soltis, D. E. (1995). Phylogenetic Inference in Saxifragaceae Sensu

Stricto and Gilia (Polemoniaceae) Using matK Sequences. Annals of the

Missouri Botanical Garden , 82, 149-175.

Johnstone, R., Cau, B., & Norfolk, S. (2004). Forestry legislation in Mozambique:

compliance and the impact on forest communities. Maputo: Terra Firma Lda.

Keong, C. H. (2006). Role of CITES in Combating Illegal Logging Current and

Potential. Cambridge: TRAFFIC International, Cambridge, UK.

Kress, J., Erickson, D., Swenson, N., Thompson, J., & Uriarte, M. (2010). Advances in

the use of DNA barcodes to build a community phylogeny for tropical trees in

a Puerto Rican forest dynamics plot. PLoS ONE , 5, e00134.

Kress, W., & Erickson, D. (2007). A two locus global DNA barcode for land plants: the

coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS

One , e508.

Kress, W., Erickson, D., Jones, F., Swenson, N., & Perez, R. (2009). DNA barcodes and

a community phylogeny of a tropical forest dynamics plot in Panama . Proc Natl

Acad Sci , 106, 18621-18626.

Lahaye, R., van der Bank, M., Bogarin, D., Warner, J., Pupulin, F., Gigot, G., Maurin,

O., Duthoit, S., Barraclough, T. G., Savolainen, V. (2008). DNA barcoding the

floras of biodiversity hotspots. Proc. Natl. Acad. Sci. USA , 105, 2923–2928.

Launert, E. (1978). Flora Zambesiaca (Vol. 4). London: Flora Zambesiaca Managing

Committee.

Lawson, S., & MacFaul, L. (2010). Illegal Logging and Related Trade Indicators of the

Global Response. Chatham House. Great Britain: The Royal Institute of

International Affairs.

Lee, A. B., & Cooper, T. A. (1995). Improved direct PCR screen for bacterial colonies:

wooden toothpicks inhibit PCR amplification. Biotechniques , 18, 225-226.

Levin, R., Wagner, W., Hoch, P., Nepokroeff, M., Pires, J., Zimmer, E., Sytsma, K.

(2003). Family-level relationships of Onagraceae based on chloroplast rbcL and

ndhF data. American Journal of Botany , 107-115.

Liepelt, S., Sperisen, C., Deguilloux, M. F., Petit, R. J., Kissling, R., Spencer, M.,

Beaulieu, J. L., Taberlet, P., Gielly, L., & Ziegenhagen, B. (2006). Authenticated

DNA from Ancient Wood Remains. Annals of Botany, 98, 1107-1111.

Lindahl, T. (1993). Instability and decay of the primary structure of DNA. Nature, 362,

709-715.

Little, D. (2011). DNA barcode sequence identification incorporating taxonomic

hierarchy and within taxon variability. PLoS ONE, 6, e20552.

Little, D., Knopf, P., & Schulz, C. (2013). DNA barcode identification of

Podocarpaceae: the second largest conifer family. PLoS ONE, 8, e81008.

Lowenstein, J., & Kolokotronis, S. (2010). The real maccoyii: identifying tuna sushi

with DNA barcodes – contrasting characteristic attributes and genetic distances.

PLoS ONE, 4, e7866.

Mackenzie, C. (2006). Forest governance in Zambezia, Mozambique: Chinese

takeaway! Zambezia: Fongza.

Meier, R. (2008). DNA Sequences in Taxonomy: Opportunities and Challenges. In Q.

D. Wheeler (Ed.), The New Taxonomy (pp. 95–127). New York: CRC Press.

Meier, R., Shiyang, K., Vaidya, G., & Ng, P. K. L. (2006). DNA barcoding and taxonomy

in Diptera: a tale of high intraspecific variability and low identification success.

Systematic Biology, 55, 715-728.

Meier, R., Zhang, G., & Ali, F. (2008). The use of mean instead of smallest interspecific

distances exaggerates the size of the “barcoding gap” and leads to

misidentification. Systematic Biology, 57, 809–813.

Meyer, C. P., & Paulay, G. (2005). DNA barcoding: error rates based on comprehensive

sampling. PLoS Biology, 3, 2229–2238.

Mosse, M. (2007). Corruption and Reform in the Customs in Mozambique. Maputo:

Centro de Integridade Publica (CIP).

Neuhaus, H., & Link, G. (1987). The chloroplast tRNA Lys (UUU) gene from mustard

(Sinapis alba) contains a class II intron potentially coding for a maturase-related

polypeptide. Current Genetics, 11, 251-257.

Newsom, D., Bensel, T., & Bahn, V. (2008). Are There Economic Benefits from Forest

Stewardship Council (FSC) Certification? An Analysis of Pennsylvania State

Forest Timber Sales. Richmond: Rainforest Alliance.

Nhantumbo, I., & Macqueen, D. (2003). Direitos das Comunidades: Realidade ou

retórica. Maputo: Direcção Nacional de Floresta e Fauna Bravia (DNFFB).

Ohyama, M., Baba, K., & Itoh, T. (2001). Wood identification of Japanese

Cyclobalanopsis species (Fagaceae) based on DNA polymorphism of the

intergenic spacer between trnT and trnL 5'exon. Journal of Wood Science, 47

(2), 81-86.

Palumbi, S. R. (1996). Nucleic acids II: the polymerase chain reaction. In D. M. Hillis,

C. Moritz, & B. K. Mable (Eds.), Molecular Systematics (2nd ed., pp. 241-246).

Sunderland: Sinauer Associates, Inc.

Parducci, L., & Petit, R. J. (2004). Ancient DNA: Unlocking Plants' Fossil Secrets. New

Phytologist, 161, 335-339.

Parmentier, I., Duminil, J., Kuzmina, M., Philippe, M., Thomas, D. W., Kenfack, D., Chuyong, G.

B., Cruaud, C., & Hardy, O. J. (2013). How Effective Are DNA Barcodes in the

Identification of African Rainforest Trees? PLoS ONE, 8, e54921.

Pettengill, J. (2010). An evaluation of candidate plant DNA barcodes and assignment

methods in diagnosing 29 species in the genus Agalinis (Orobanchaceae).

American Journal of Botany, 97, 1381-1406.

Ploeg, A., Bassleer, G., & Hensen, R. (2009). Biosecurity in the Ornamental Aquatic

Industry. Ornamental Fish International, 148.

Rachmayanti, Y. (2009). Isolation of DNA from unprocessed and processed wood of

Dipterocarpaceae. Doctoral dissertation, Georg-August University of Göttingen,

Department of Forest Genetics and Forest Tree Breeding.

Rachmayanti, Y., Leinemann, L., Gailing, O., & Finkeldey, R. (2009). DNA from

processed and unprocessed wood: Factors influencing the isolation success.

Forensic Science International: Genetics, 3, 185-192.

Rachmayanti, Y., Leinemann, L., Gailing, O., & Finkeldey, R. (2006). Extraction,

amplification and characterization of wood DNA from Dipterocarpaceae. Plant

Molecular Biology Reporter, 24, 45–55.

Republic of South Africa. (2012). Notice of the list of protected trees under the National

Forests Act, 1998 (Act No. 84 of 1998). Government Gazette, 35648 (716),

8-10.

Resource Extraction Monitoring. (2008). Independent Monitoring Cameroon: Progress

in tackling illegal logging in Cameroon. Cambridge, Yaoundé: Resource

Extraction Monitoring (REM).

Reyes, D. (2003). An Evaluation of Commercial Logging in Mozambique. Cambridge:

Collaborative for Development Action.

Reynolds, M. M., & Williams, C. G. (2004). Extracting DNA from submerged pine

wood. Genome, 47, 994-997.

Ribeiro, V. (2008, August). An overview of the problems faced by Mozambique's

forests, forest-dependent peoples and forest workers. WRM Bulletin.

Rogers, S. O., & Kaya, Z. (2006). DNA From Ancient Cedar Wood From King Midas’

Tomb, Turkey, and Al-Aksa Mosque, Israel. Silvae Genetica, 55, 54-62.

Rosenberg, N. A. (2007). Statistical tests for taxonomic distinctiveness from

observations of monophyly. Evolution, 61, 317–323.

Saitou, N., & Nei, M. (1987). The Neighbor-joining Method: A New Method for

Reconstructing Phylogenetic Trees. Mol. Biol. Evol., 4, 406-425.

Sanjur, O. I., Piperno, D. R., Andres, T. C., & Wessel-Beaver, L. (2002). Phylogenetic

relationships among domesticated and wild species of Cucurbita (Cucurbitaceae)

inferred from a mitochondrial gene: Implications for crop plant evolution and

areas of origin. Proc. Natl. Acad. Sci. USA, 99, 535–540.

Savolainen, V., Cowan, R. S., Vogler, A. P., Roderick, G. K., & Lane, R. (2005).

Towards writing the encyclopaedia of life: an introduction to DNA barcoding.

Phil. Trans. R. Soc. B, 360, 1805-1811.

Savolainen, V., Cuénoud, P., Spichiger, R., Martinez, M. D., Crèvecoeur, M., & Manen,

J.-F. (1995). The use of herbarium specimens in DNA phylogenetics: evaluation

and improvement. Plant Systematics and Evolution, 197, 87-98.

Schindel, D. E., & Miller, S. E. (2005). DNA barcoding a useful tool for taxonomists.

Nature, 435, 17.

Schmidt, E., Lötter, M., & McCleland, E. S. (2002). Trees and Shrubs of Mpumalanga

and Kruger National Park (illustrated ed.). (J. E. Burrows, Ed.) Johannesburg:

Jacana Media.

Scholes, B., Ajavon, A.-L., Nyong, T., Tabo, R., Vogel, C., & Ansorge, I. (2008).

Global Environmental Change (including Climate Change and Adaptation) in

sub-Saharan Africa. ICSU Regional Office for Africa.

Seneca Creek Associates, Wood Resources International. (2004). “Illegal” Logging and

Global Wood Markets: The Competitive Impacts on the U.S. Wood Products

Industry. American Forest & Paper Association (p. 5). Seneca Creek Associates,

LLC & Wood Resources International, LLC.

Shneyer, V. S. (2009). DNA barcoding is a new approach in comparative genomics of

plants. Russian Journal of Genetics, 45, 1436–1448.

Smith, W. (2002). The global problem of illegal logging. Tropical Forest Update, 12,

3-5.

Soltis, D. E., & Soltis, P. S. (1998). Choosing an approach and an appropriate gene for

phylogenetic analysis. In D. E. Soltis, P. S. Soltis, & J. J. Doyle (Eds.),

Molecular Systematics of Plants II: DNA Sequencing (pp. 21–24). Dordrecht:

Kluwer.

Soltis, D. E., Smith, S. A., Cellinese, N., Wurdack, K. J., Tank, D. C., Brockington, S.

F., Refulio-Rodriguez, N. F., Walker, J. B., Moore, M. J., Carlsward, B., Bell, C.

D. (2011). Angiosperm phylogeny: 17 Genes, 640 Taxa. American Journal of

Botany, 98, 704–730.

Steinke, D., Zemlak, T. S., & Hebert, P. (2009). Barcoding Nemo: DNA-based

identifications for the ornamental fish trade. PLoS ONE, 4, e3600.

Swofford, D. L. (2002). PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods), Version 4. Sunderland, Massachusetts: Sinauer Associates.

Taberlet, P., Coissac, E., Pompanon, F., Gielly, L., & Miquel, C. (2007). Power and

limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding.

Nucleic Acids Research, 35, 1-8.

The Confederation of European Paper Industries. (2002, August 26). The European pulp

and paper industry’s position against illegal logging and the trade of illegally

harvested wood. Press Release.

The Wood Explorer. (2012). The Wood Explorer Species List. Retrieved February 2,

2012, from http://www.thewoodexplorer.com/specieslist1.html

Tripathi, A., Tyagi, A., Kumar, A., Singh, A., Singh, S., Chaudhary, L. B., & Roy, S.

(2013). The internal transcribed spacer (ITS) region and trnH-psbA are suitable

candidate loci for DNA barcoding of tropical tree species of India. PLoS ONE,

8, e57934.

Tudo Legal. (2009). Annex I: Classification lists of the timber-producing species

foreseen in article 11(1) of the regulations of law on forestry and wildlife. Club of

Mozambique, Lda.

UNCCD. (2012). Zero net land degradation. Bonn: Ediouro Grafica e Editora, Brazil.

van Velzen, R., Weitschek, E., Felici, G., & Bakker, F. T. (2012). DNA Barcoding of

recently diverged Species: relative performance of matching methods. PLoS

ONE, 7, e30490.

Van Wyk, B. A., & Van Wyk, P. (1997). Field guide to trees of southern Africa. Cape

Town: Struik Publishers.

Vijayan, K., & Tsou, C. H. (2010). DNA barcoding in plants: taxonomy in a new

perspective. Current Science, 99, 1530-1541.

Will, K., & Rubinoff, D. (2004). Myth of the molecule: DNA barcodes for species

cannot replace morphology for identification and classification. Cladistics, 20,

47-55.

Wolfe, K. H., Li, W.-H., & Sharp, P. M. (1987). Rates of nucleotide substitution vary

greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc. Natl.

Acad. Sci. USA, 84, 9054-9058.

World Bank, The International Bank for Reconstruction and Development. (2006).

Strengthening Forest Law Enforcement and Governance: Addressing a Systemic

Constraint to Sustainable Development. Environment and Agriculture and Rural

Development Departments, Washington.

World Bank. (2005). 2005 World Development Indicators. Washington: International

Bank for Reconstruction and Development.

Zahnen, J. (2008). Foreword from WWF - Germany: Deforestation = CO2 + Distinction

of species. In B. Degen (Ed.), Proceedings of the International Workshop

“Fingerprinting methods for the identification of timber origins” (p. 5). Bonn:

Landbauforschung vTI Agriculture and Forestry Research.

Zurawski, G., Perrot, B., Bottomley, W., & Whitfeld, P. R. (1981). The structure of the

gene for the large subunit of ribulose 1,5-bisphosphate carboxylase from spinach

chloroplast DNA. Nucleic Acids Research, 9, 3251-3270.

7. Supplementary information

Figure 7-1 Specimen illustrations as submitted to the BOLD Systems database; an additional scan of the herbarium voucher specimen is available for every specimen sampled.
