
Genet Resour Crop Evol DOI 10.1007/s10722-012-9847-1

RESEARCH ARTICLE

Genetic diversity and parentage in farmer varieties of cacao (Theobroma cacao L.) from Honduras and Nicaragua as revealed by single nucleotide polymorphism (SNP) markers

Kun Ji • Dapeng Zhang • Lambert A. Motilal • Michel Boccara • Philippe Lachenaud • Lyndel W. Meinhardt

Received: 30 September 2011 / Accepted: 23 April 2012 © Springer Science+Business Media B.V. (outside the USA) 2012

Abstract Cacao (Theobroma cacao L.) is the main source of chocolate, with an annual production of four million tons worldwide. This Neotropical crop was domesticated in Mesoamerica as far back as 3,000 years ago. Knowledge of genetic diversity and population structure in farmer varieties of cacao in the center of domestication is essential for sustainable production of fine-flavored cacao beans and contributes to in situ/on-farm conservation of farmer varieties. Based on 70 single nucleotide polymorphism markers, we analyzed 84 fine-flavored farmer varieties collected from traditional cacao farms in Honduras and Nicaragua. The study also included 31 clones from the international cacao collections to serve as references. SNP-based multilocus matching identified six synonymous groups, including 14 Criollo and two Amelonado varieties. A moderately high level of genetic diversity was observed in these farmer varieties, indicating the potential to further explore intra-population variation and breed for fine-flavored cocoa. Multivariate analysis grouped the 84 farmer accessions into five genetic groups: ancient Criollo, Amelonado, Trinitario (including Nicaragua Trinitario and Honduras Trinitario) and Upper Amazon Forastero (only one accession). The Honduras Trinitario differed from the Nicaragua Trinitario group. The clustering results largely supported the perceived classification of cacao by local farmers and researchers, which was mainly based on morphological traits. However, the well-known traditional variety “Indio” in this region was identified as synonymous with Amelonado. Parentage analysis showed that the variety “Indio” (or Amelonado) contributed more to the Trinitario type farmer varieties, whereas ancient Criollo had less influence. The present study demonstrates the efficacy of using a small set of SNP markers for cacao germplasm characterization, and further depicts the diverse origins and parentage of farmer varieties from Mesoamerica. This information will be highly useful for conservation and utilization of cacao germplasm from this region.

Keywords Cacao landraces · Chocolate · Conservation · Germplasm · Genetic diversity · Molecular markers · Mesoamerica · Theobroma cacao · Tropical tree

K. Ji · D. Zhang (✉) · L. W. Meinhardt
Beltsville Agricultural Research Center, PSI, SPCL, USDA/ARS, BARC-W, 10300 Baltimore Avenue, Bldg. 001, Rm. 223, Beltsville, MD 20705, USA
e-mail: [email protected]

K. Ji
College of […] and Landscape, Southwest University, Chongqing 400715, China

L. A. Motilal · M. Boccara
Cocoa Research Unit, The University of the West Indies, St. Augustine, Trinidad, Republic of Trinidad and Tobago, West Indies

M. Boccara · P. Lachenaud
Centre de coopération internationale en recherche agronomique pour le développement, Montpellier Cedex 5, France

Introduction

Theobroma cacao (2n = 2x = 20), also referred to as cacao, is an important tropical crop native to South America (Cuatrecasas 1964; Dias 2001; Smith 1999). Cacao is cultivated extensively as the source of cocoa butter and powder for the confectionery industry, with an annual production of four million tons (ICCO 2010). This species comprises a large number of morphologically variable populations, which can all be crossed with each other (Cheesman 1944; Kennedy 1995; Pound 1945). The Upper Amazon is generally believed to be the center of origin of cacao, judging by the great morphological diversity observed in this region (Bartley 2005; Pound 1945) and by the allelic diversity revealed by molecular markers (Zhang et al. 2012).

Cacao was domesticated in Mesoamerica as far back as 3,000 years ago. The earliest known evidence for cacao use dates to between about 1900 and 1500 BC. Researchers have investigated residues on the interior of bowls dated to the earliest Maya and Olmec cultures (Gómez-Pompa et al. 1990; Henderson et al. 2007; Powis et al. 2011). The crop was widely spread in central […] even before the Spanish arrived, but commercial production began only in the 18th century (De la Cruz et al. 1995; Young 2007; Coe and Coe 1996; Ogata et al. 2006).

Cultivated cacao has traditionally been subdivided into three main groups: Criollo, Forastero and Trinitario (Cheesman 1944; Toxopeus 1985). It was believed that Criollo was the only cacao variety cultivated in Mesoamerica before the arrival of the Europeans (Bartley 2005; Motamayor et al. 2002). Forastero cacao encompasses a diverse range of populations from South America (Bartley 2005), but these were not used in production until the middle of the 18th century. Forastero cacao was brought to the traditional cocoa producing regions, including Central America and the […], when their cacao plantations were devastated by unknown diseases.

A Trinitario population is believed to have arisen naturally in Trinidad by hybridization between remnant Criollo and the Forastero germplasm that was introduced from […] in the middle of the 18th century (Cheesman 1944; Bartley 2005).

The exact origin of Criollo cacao remains unknown, although Venezuela has been suggested as its putative home (Van Hall 1914; Motamayor et al. 2002). Today, few of the cultigens used in Mesoamerica are ancient Criollo, although this landrace can still be found in the region, for example in rainforest associated with Mayan ruins (Mooledhar et al. 1995; Motamayor et al. 2002). In addition to Criollo, many Criollo hybrids (known as Trinitario) are also renowned for their distinct aroma, which makes them preferred raw materials for fine-flavored chocolate. These fine-flavored farmer varieties, with a genetic background of Criollo and Forastero, are highly valuable germplasm for production and for future breeding of high-quality cocoa. However, until recently, research tools for accurate identification of traditional varieties have not been available, and reports on the multi-cluster structure of on-farm cacao diversity are still scarce. Using microsatellite markers, Motilal et al. (2010) characterized genetic diversity in 77 Criollo accessions collected from the Maya Mountains in Belize and identified 11 distinct Criollo genotypes among the Belizean germplasm. Trognitz et al. (2011) analyzed the allelic composition and genetic structure of cacao sampled from 44 farms in Waslala, Nicaragua. They found only three putative founder-genotype spectra (lineages) in the local accessions, one of which is likely ancient Criollo.

Single nucleotide polymorphisms (SNPs) are the most abundant class of polymorphisms in plant genomes (Buckler and Thornsberry 2002; Rafalski 2002). Compared with SSR markers, SNP assays do not require separating DNA fragments by size and can therefore be automated in high-throughput formats. The di-allelic nature of SNPs gives a much lower error rate in allele calling, and allele calls are more consistent across laboratories. While SNP markers have been widely used for varietal identification in many other crops (Ganal et al. 2009), the efficacy of SNP markers for cacao genotype identification and diversity assessment remains to be investigated. A large collection of cacao cDNA was recently sequenced through an international project (Argout et al. 2008). Forty-five cDNA libraries were constructed from a wide panel of organs, including flowers, cherelles, pod cortex, shoots, roots, germinated seeds and embryos from in vitro culture (Lanaud et al. 2006). The resulting EST collection allowed the identification of a large number of SNP markers (Lanaud et al. 2006), some of which are being used in association mapping by whole genome scanning, as well as in diversity analysis (Argout, Boccara and Lanaud, personal communication).

Large-scale genotyping with a small set of SNP markers is still in great demand in the cacao community for a broad range of research and field applications. These include identification of mislabeled accessions, parentage and sibship analysis for quality control in breeding and […] programs, and characterization of farmer selections to support the production of fine-flavored cocoa. In the present study, we used 100 EST-derived SNP markers to genotype a group of farmer varieties collected from traditional farms in Honduras and Nicaragua. The objectives of the study were (1) to assess the genetic background and population structure of the fine-flavored farmer varieties, and (2) to analyze the parentage of these farmer varieties and further elucidate the origin of Trinitario type varieties in Honduras and Nicaragua. The resulting information will help establish a baseline for conservation and utilization of cacao germplasm from this region.

Table 1 List of 84 farmer varieties from Honduras and Nicaragua and 31 reference clones from International Cacao Genebanks used in SNP genotyping

Country | Perceived variety name or germplasm group(a) | No. of accessions
Honduras | Criollo | 14
Honduras | Honduras Trinitario | 13
Honduras | Indio Amarillo | 2
Honduras | Indio Alargado Amarillo | 2
Honduras | Indio Rojo | 1
Nicaragua | Nicaragua Trinitario | 31
Nicaragua | Criollo | 16
Nicaragua | Indio Rojo | 2
Nicaragua | Forastero | 1
Nicaragua | Indio Amarillo | 1
Nicaragua | Trinitario | 1
CATIE, Costa Rica | Amelonado | 3
CATIE, Costa Rica | Trinitario | 2
CATIE, Costa Rica | Criollo | 1
ICG,T | Upper Amazon Forastero | 15
ICG,T | Trinitario | 7
ICG,T | Nacional hybrids | 2
ICG,T | Criollo | 1

The specific accession codes are given in Fig. 2.
a Germplasm groups of the reference clones were determined by SSR molecular markers.

Materials and methods

Plant materials and DNA sample preparation

A total of 84 cacao accessions were used in this experiment (Table 1). These accessions were collected by agricultural research institutes, non-governmental organizations (NGOs), private companies in the fine chocolate industry and farmer communities in Honduras and Nicaragua. The providers considered these accessions to be fine-flavored varieties, and most of them are being propagated. Samples for DNA fingerprinting were collected from individual cacao trees of various ages on local farms. Two healthy young leaves were collected from each tree, and the samples were air-dried and sent to the USDA Beltsville Agricultural Research Center, Maryland, USA, for genotyping. Cacao DNA was extracted from leaf tissue using the DNeasy Plant System (Qiagen Inc., Valencia, CA, USA) according to Saunders et al. (2004). In addition, DNA samples of 31 international clones were included in this experiment as reference clones. The genetic identities of these clones are known from an international initiative for DNA fingerprinting of cacao germplasm (Zhang et al. 2006). Most of these accessions are maintained in the two international genebanks in Trinidad and Costa Rica (Boccara and Zhang 2006; Zhang et al. 2009; Motilal et al. 2010).

SNP markers and genotyping

One hundred SNP markers were selected from 1,560 candidate SNPs developed from cDNA sequences of a wide range of cacao organs (Boccara, personal communication; Argout et al. 2008). The selection was based on the level of polymorphism and the distribution of the markers across the ten chromosomes of cacao. SNP genotyping was performed at the Human Genetics Division Genotyping Core facility, Washington University, St. Louis, using MALDI-TOF mass spectrometry (Sequenom Inc.; http://hg.wustl.edu/info/Sequenom_description.html).
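The marker screening reported in the Results (four monomorphic loci and 26 loci with more than 15 % missing data were dropped, leaving 70) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name `filter_snps`, the genotype coding (two-letter strings such as "AG", with None for a failed call) and the example loci are all hypothetical.

```python
from typing import Dict, List, Optional

def filter_snps(genotypes: Dict[str, List[Optional[str]]],
                max_missing: float = 0.15) -> List[str]:
    """Keep loci that are polymorphic and have at most `max_missing` failed calls.

    genotypes maps a (hypothetical) locus name to one diploid call per accession,
    e.g. "AG" or "GG"; None marks a missing call.
    """
    kept = []
    for locus, calls in genotypes.items():
        if not calls:
            continue                                   # no accessions scored at all
        observed = [c for c in calls if c is not None]
        missing_rate = 1 - len(observed) / len(calls)
        alleles = {a for c in observed for a in c}     # distinct alleles seen
        if len(alleles) > 1 and missing_rate <= max_missing:
            kept.append(locus)
    return kept

# Hypothetical example: one good locus, one monomorphic, one with 50 % missing data.
example = {
    "locus1": ["AA", "AG", "GG", "AG"],
    "locus2": ["CC", "CC", "CC", "CC"],
    "locus3": ["AT", None, None, "TT"],
}
print(filter_snps(example))  # ['locus1']
```

Loci are only dropped, never repaired; the 15 % threshold mirrors the cut-off stated in the Results, and `max_missing` can be changed to test how sensitive the retained panel is to that choice.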


Data analysis

Informativeness of SNP markers and genetic diversity in farmer varieties

Key descriptive statistics for measuring the informativeness of the 100 SNP markers were calculated, including minor allele frequency, observed heterozygosity, expected heterozygosity, Shannon’s information index and probability of identity (Evett and Weir 1998; Waits et al. 2001). The program GenAlEx 6.0 (Peakall and Smouse 2006) was used for computation. To identify duplicates, pair-wise multilocus matching was applied among individual varieties, as well as between varieties and the reference clones, using the same program. Accessions with different names that fully matched at the genotyped SNP loci were declared duplicates or synonymous accessions.

Distance-based multivariate analysis was used to assess the relationships among the individual farmer varieties, as well as their relationships with reference clones from international genebanks. Pair-wise Euclidean distance was computed for every pair of accessions using the genetic distance procedure in GenAlEx 6.0 (Peakall and Smouse 2006). The same program was then used to perform principal coordinates analysis (PCoA) based on the pair-wise distance matrix. Both distance and covariance were standardized. A total of 31 international accessions with established population identity were included in the analysis.

Cluster analysis was used to further examine the genetic relationships among accessions. The kinship coefficient among individual accessions (n = 115) was calculated using MSA (Microsatellite Analyser; Dieringer and Schlötterer 2003). A dendrogram was generated from the resulting distance matrix using the neighbor-joining algorithm (Saitou and Nei 1987) implemented in the NEIGHBOR program of PHYLIP (Felsenstein 1989), and visualized using TreeView version 1.6.6 (Page 1996).

Parentage analysis was applied to verify the origin of the hybrid type farmer selections. Farmer selections that were not classified as Criollo or Lower Amazon Forastero by the PCoA and cluster analysis were treated as “offspring” for parentage analysis. The “offspring” accessions also included Trinitario reference clones such as ICS 6, ICS 39, ICS 95, ICS 97, OC 77, RIM 113, TRD 37 and TRD 86. Criollo and Lower Amazon Amelonado accessions, from both the 84 farmer selections and the 31 reference clones, were used as candidate parents. In addition, reference clones of the Upper Amazon Forastero type were included as candidate parents. A likelihood-based method implemented in CERVUS 3.0 (Marshall et al. 1998; Kalinowski et al. 2007) was used for computation. For each parent–offspring pair, the natural logarithm of the likelihood ratio (LOD score) was calculated. Critical values were determined by simulation for assigning parentage when neither maternity nor paternity was known. Simulations were run for 10,000 cycles, assuming that 80 % of candidate parents were sampled, 80 % of loci were typed and the typing error rate was 1 %. The most probable single mother (or father) of each offspring was identified on the basis of the critical difference in LOD scores (Δ) between the most likely and the next most likely candidate parent, at greater than 95 or 80 % confidence (Marshall et al. 1998; Kalinowski et al. 2007).

Results

Frequency of SNP markers and descriptive statistics

Of the 100 genotyped SNP markers, four were monomorphic across the 115 cacao accessions. Another 26 markers produced SNP profiles with more than 15 % missing data, partly because of the low-quality DNA extracted from some air-dried cacao leaves. The remaining 70 SNP markers were used in the data analysis; their descriptive statistics are presented in Table 2. The mean observed and expected heterozygosity in the 84 farmer varieties were slightly lower than in the 31 reference clones, and the mean inbreeding coefficients were similar (0.360 vs. 0.370); overall, the diversity levels of the two groups are comparable (Table 3).

Duplicate identification

Pair-wise matching of individual genotypes at the 70 SNPs identified six synonymous groups comprising 16 farmer accessions (Table 4). The members of each group shared identical multilocus SNP profiles and thus met our definition of duplicates or synonymous accessions. The probability of identity among siblings (PID-sib), defined as the probability that two sibling individuals drawn at random from a population have the same multilocus genotype (Waits et al. 2001), was calculated from the 115 accessions under investigation. It predicted that the 26 most informative loci were necessary, yet sufficient, to distinguish between accessions with more than 99.999 % certainty.

Table 2 Information index, heterozygosity and minor allele frequency of the 70 SNP loci scored on 84 farmer selections from Honduras and Nicaragua and 31 reference clones from International Genebanks

SNP locus | Shannon’s information index | Observed heterozygosity | Expected heterozygosity | Minor allele frequency
TcSNP25 | 0.229 | 0.070 | 0.114 | 0.06
TcSNP75 | 0.633 | 0.202 | 0.441 | 0.33
TcSNP90 | 0.655 | 0.236 | 0.463 | 0.36
TcSNP139 | 0.692 | 0.345 | 0.499 | 0.47
TcSNP144 | 0.692 | 0.322 | 0.499 | 0.47
TcSNP150 | 0.685 | 0.319 | 0.492 | 0.44
TcSNP151 | 0.692 | 0.303 | 0.498 | 0.47
TcSNP174 | 0.693 | 0.268 | 0.500 | 0.49
TcSNP189 | 0.693 | 0.378 | 0.500 | 0.50
TcSNP193 | 0.692 | 0.266 | 0.499 | 0.48
TcSNP226 | 0.633 | 0.202 | 0.441 | 0.33
TcSNP230 | 0.693 | 0.277 | 0.500 | 0.49
TcSNP242 | 0.685 | 0.274 | 0.492 | 0.44
TcSNP290 | 0.306 | 0.043 | 0.166 | 0.09
TcSNP309 | 0.693 | 0.250 | 0.499 | 0.48
TcSNP329 | 0.688 | 0.304 | 0.495 | 0.45
TcSNP364 | 0.679 | 0.278 | 0.486 | 0.42
TcSNP372 | 0.669 | 0.261 | 0.476 | 0.39
TcSNP429 | 0.264 | 0.096 | 0.137 | 0.07
TcSNP448 | 0.556 | 0.216 | 0.369 | 0.24
TcSNP469 | 0.687 | 0.254 | 0.493 | 0.44
TcSNP480 | 0.333 | 0.117 | 0.186 | 0.10
TcSNP529 | 0.379 | 0.165 | 0.220 | 0.13
TcSNP534 | 0.690 | 0.288 | 0.497 | 0.46
TcSNP560 | 0.688 | 0.336 | 0.495 | 0.45
TcSNP573 | 0.205 | 0.052 | 0.099 | 0.05
TcSNP577 | 0.578 | 0.184 | 0.389 | 0.26
TcSNP591 | 0.637 | 0.118 | 0.444 | 0.33
TcSNP602 | 0.448 | 0.226 | 0.276 | 0.17
TcSNP619 | 0.692 | 0.296 | 0.499 | 0.48
TcSNP633 | 0.309 | 0.093 | 0.169 | 0.09
TcSNP702 | 0.665 | 0.243 | 0.472 | 0.38
TcSNP723 | 0.680 | 0.239 | 0.487 | 0.42
TcSNP731 | 0.681 | 0.184 | 0.488 | 0.42
TcSNP750 | 0.217 | 0.061 | 0.107 | 0.06
TcSNP786 | 0.088 | 0.017 | 0.034 | 0.02
TcSNP799 | 0.665 | 0.174 | 0.472 | 0.38
TcSNP823 | 0.616 | 0.330 | 0.425 | 0.31
TcSNP836 | 0.344 | 0.113 | 0.194 | 0.11
TcSNP852 | 0.335 | 0.191 | 0.187 | 0.10
TcSNP872 | 0.671 | 0.233 | 0.478 | 0.40
TcSNP878 | 0.693 | 0.254 | 0.500 | 0.50
TcSNP886 | 0.681 | 0.217 | 0.488 | 0.42
TcSNP891 | 0.438 | 0.019 | 0.267 | 0.16
TcSNP899 | 0.693 | 1.000 | 0.500 | 0.50
TcSNP917 | 0.451 | 0.140 | 0.278 | 0.17
TcSNP928 | 0.675 | 0.170 | 0.482 | 0.41
TcSNP953 | 0.289 | 0.080 | 0.154 | 0.08
TcSNP998 | 0.669 | 0.261 | 0.476 | 0.39
TcSNP999 | 0.419 | 0.010 | 0.252 | 0.15
TcSNP1038 | 0.691 | 0.278 | 0.498 | 0.47
TcSNP1060 | 0.654 | 0.252 | 0.461 | 0.36
TcSNP1062 | 0.460 | 0.080 | 0.286 | 0.17
TcSNP1063 | 0.584 | 0.194 | 0.395 | 0.27
TcSNP1075 | 0.335 | 0.143 | 0.187 | 0.10
TcSNP1111 | 0.639 | 0.301 | 0.446 | 0.34
TcSNP1126 | 0.654 | 0.200 | 0.461 | 0.36
TcSNP1159 | 0.684 | 0.205 | 0.491 | 0.43
TcSNP1253 | 0.621 | 0.250 | 0.430 | 0.31
TcSNP1270 | 0.481 | 0.118 | 0.303 | 0.19
TcSNP1280 | 0.679 | 0.261 | 0.486 | 0.42
TcSNP1309 | 0.685 | 0.313 | 0.491 | 0.43
TcSNP1331 | 0.693 | 0.270 | 0.500 | 0.49
TcSNP1378 | 0.679 | 0.254 | 0.486 | 0.42
TcSNP1414 | 0.285 | 0.078 | 0.152 | 0.08
TcSNP1439 | 0.687 | 0.283 | 0.493 | 0.44
TcSNP1442 | 0.689 | 0.239 | 0.496 | 0.46
TcSNP1453 | 0.565 | 0.266 | 0.377 | 0.25
TcSNP1458 | 0.679 | 0.226 | 0.486 | 0.42
TcSNP1484 | 0.687 | 0.243 | 0.494 | 0.44

Table 3 Comparison of genetic diversity (Shannon’s information index, observed heterozygosity, expected heterozygosity and inbreeding coefficient) between farmer varieties from Honduras and Nicaragua and reference clones

Group | Statistic | Shannon’s information index | Observed heterozygosity | Expected heterozygosity | Inbreeding coefficient
Farmer varieties (n = 84) | Mean | 0.531 | 0.206 | 0.367 | 0.360
Farmer varieties (n = 84) | SE | 0.023 | 0.017 | 0.019 | 0.046
Reference clones (n = 31) | Mean | 0.567 | 0.246 | 0.385 | 0.370
Reference clones (n = 31) | SE | 0.015 | 0.017 | 0.013 | 0.033
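For a bi-allelic SNP with allele frequencies p and q = 1 − p, the per-locus statistics in Tables 2 and 3 reduce to simple formulas: expected heterozygosity He = 1 − (p² + q²), Shannon’s index I = −(p ln p + q ln q), minor allele frequency min(p, q), and the fixation index F = 1 − Ho/He. At p = 0.5 these give He = 0.500 and I = ln 2 ≈ 0.693, the maxima seen in Table 2 (e.g. TcSNP189). The sketch below computes them for one locus; it is an illustration, not GenAlEx’s code, and assumes a hypothetical coding of each call as a two-letter string such as "AG", with None for a missing call.

```python
import math
from typing import List, Optional

def locus_stats(calls: List[Optional[str]]) -> dict:
    """Diversity statistics for one bi-allelic SNP locus (missing calls skipped)."""
    observed = [c for c in calls if c is not None]
    alleles = [a for c in observed for a in c]
    freqs = [alleles.count(a) / len(alleles) for a in sorted(set(alleles))]
    he = 1 - sum(p * p for p in freqs)                         # expected heterozygosity
    ho = sum(c[0] != c[1] for c in observed) / len(observed)   # observed heterozygosity
    shannon = -sum(p * math.log(p) for p in freqs)             # Shannon's index, natural log
    maf = min(freqs) if len(freqs) > 1 else 0.0                # minor allele frequency
    f = 1 - ho / he if he > 0 else float("nan")                # fixation index F
    return {"I": shannon, "Ho": ho, "He": he, "MAF": maf, "F": f}

stats = locus_stats(["AG", "AG", "AA", "GG"])   # hypothetical locus with p = 0.5
print({k: round(v, 3) for k, v in stats.items()})
# {'I': 0.693, 'Ho': 0.5, 'He': 0.5, 'MAF': 0.5, 'F': 0.0}
```

Group means such as those in Table 3 would be averages of these per-locus values, so a mean F is generally not equal to 1 − (mean Ho)/(mean He).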
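The PID-sib criterion can be made concrete with the single-locus formula of Waits et al. (2001), PID-sib = 0.25 + 0.5 Σpᵢ² + 0.5 (Σpᵢ²)² − 0.25 Σpᵢ⁴, multiplied across loci on the assumption that loci are independent. The sketch below adds loci from most to least informative until the cumulative value falls to 10⁻⁵ (99.999 % certainty). The function names and example frequencies are hypothetical. With every locus at minor allele frequency 0.5, the best case for a bi-allelic marker, 23 loci are needed, so a real panel with less balanced frequencies needs more (26 in this study).

```python
from typing import Iterable

def pid_sib_locus(p: float) -> float:
    """PID among siblings for one bi-allelic locus with allele frequency p (Waits et al. 2001)."""
    freqs = (p, 1 - p)
    s2 = sum(f ** 2 for f in freqs)
    s4 = sum(f ** 4 for f in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4

def loci_needed(minor_freqs: Iterable[float], target: float = 1e-5) -> int:
    """Number of most-informative loci whose combined PID-sib is <= target, or -1 if never reached."""
    per_locus = sorted(pid_sib_locus(p) for p in minor_freqs)  # smallest PID = most informative
    cumulative = 1.0
    for k, pid in enumerate(per_locus, start=1):
        cumulative *= pid                  # independence across loci assumed
        if cumulative <= target:
            return k
    return -1

print(round(pid_sib_locus(0.5), 5))   # 0.59375
print(loci_needed([0.5] * 30))        # 23
```

Because PID-sib falls only slowly per locus, it is a much more demanding criterion than the unrelated-individual PID, which is why it is the conservative choice for separating closely related farmer selections.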
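Duplicate detection by pair-wise multilocus matching can be sketched as follows: two accessions are treated as synonymous when their genotypes agree at every locus typed in both, and chains of matches are merged into groups with a union-find structure. This is an illustrative assumption, not the GenAlEx procedure. Skipping loci with missing calls can chain together accessions that do not match each other directly, so a minimum number of compared loci (`min_loci`, a hypothetical parameter) is required. All names and genotypes in the example are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Genotype = Tuple[Optional[str], ...]   # one diploid call per locus; None = missing

def synonymous_groups(profiles: Dict[str, Genotype], min_loci: int = 1) -> List[List[str]]:
    """Group accessions whose multilocus SNP profiles match at all loci typed in both."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    def same(a: Genotype, b: Genotype) -> bool:
        # Heterozygote order is ignored ("AG" == "GA"); loci missing in either are skipped.
        pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        return len(pairs) >= min_loci and all(sorted(x) == sorted(y) for x, y in pairs)

    for a, b in combinations(profiles, 2):
        if same(profiles[a], profiles[b]):
            parent[find(a)] = find(b)       # merge the two groups

    groups: Dict[str, List[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return [g for g in groups.values() if len(g) > 1]

profiles = {
    "acc1": ("AG", "CC", "TT"),
    "acc2": ("GA", "CC", None),
    "acc3": ("AA", "CT", "TT"),
}
print(synonymous_groups(profiles))  # [['acc1', 'acc2']]
```

In practice `min_loci` would be set close to the panel size (here, 70 loci) so that only accessions with nearly complete, identical profiles are declared duplicates.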
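Principal coordinates analysis turns a pair-wise distance matrix into axes whose eigenvalues give the share of variation explained, as reported for Fig. 1 (63.5, 18.4 and 7.7 % for the first three axes). The sketch below is a rough stand-in for the GenAlEx procedure, under stated assumptions: each SNP is coded as the count of one allele (0, 1 or 2), plain Euclidean distance is used, and GenAlEx’s genotypic distance definition and standardization are not reproduced. The genotype matrix is hypothetical.

```python
import numpy as np

def pcoa(dosage: np.ndarray, n_axes: int = 3):
    """Classical principal coordinates analysis on Euclidean distances.

    dosage: accessions x loci matrix of allele counts (0, 1 or 2); missing values
    are assumed to have been removed or imputed beforehand.
    Returns (coordinates, percentage of variation explained by each axis).
    """
    diff = dosage[:, None, :] - dosage[None, :, :]
    d2 = (diff ** 2).sum(axis=2)                     # squared Euclidean distances
    n = d2.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n          # centring matrix J
    b = -0.5 * centre @ d2 @ centre                   # double-centred matrix B
    eigvals, eigvecs = np.linalg.eigh(b)              # B is symmetric
    order = np.argsort(eigvals)[::-1]                 # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coords = eigvecs[:, :n_axes] * np.sqrt(np.clip(eigvals[:n_axes], 0.0, None))
    percent = 100.0 * eigvals[:n_axes] / eigvals[eigvals > 1e-10].sum()
    return coords, percent

genotypes = np.array([[0, 0, 2, 1],
                      [0, 1, 2, 1],
                      [2, 2, 0, 0],
                      [2, 1, 0, 1]], dtype=float)   # 4 hypothetical accessions x 4 SNPs
coords, pct = pcoa(genotypes, n_axes=2)
print(coords.round(3))
print(pct.round(1))
```

With all axes retained, the coordinates reproduce the input distances exactly; plotting only the first two axes, as in Fig. 1, discards the variation carried by the remaining axes.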
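Parentage assignment in CERVUS rests on a LOD score for each candidate parent and on the gap Δ between the best and the second-best candidate. The sketch below computes a single-parent LOD under Hardy–Weinberg equilibrium with a simple error term: with probability `error`, set here to the 1 % typing error assumed in the simulations, the offspring genotype is treated as random. This simplified model follows the spirit of Marshall et al. (1998); it is not the corrected likelihood of CERVUS 3.0 (Kalinowski et al. 2007). It assumes no missing data, and all function names, accessions and allele frequencies are hypothetical. Critical Δ values would still have to come from simulation, as described above.

```python
import math
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]
Freqs = Dict[str, float]   # allele -> population frequency at one locus

def hwe_prob(g: Genotype, f: Freqs) -> float:
    """Genotype probability for an unrelated individual under Hardy-Weinberg equilibrium."""
    x, y = g
    return f[x] ** 2 if x == y else 2 * f[x] * f[y]

def transmission_prob(child: Genotype, parent: Genotype, f: Freqs) -> float:
    """P(child genotype | one known parent); the other allele comes from the population."""
    x, y = child
    prob = 0.0
    for a in parent:                 # each parental allele is transmitted with prob 1/2
        if a == x:
            prob += 0.5 * f[y]       # the unknown parent must supply y
        if a == y and x != y:
            prob += 0.5 * f[x]       # the unknown parent must supply x
    return prob

def single_parent_lod(child: List[Genotype], parent: List[Genotype],
                      freqs: List[Freqs], error: float = 0.01) -> float:
    """Sum over loci of ln[P(child | candidate is parent) / P(child | unrelated)]."""
    lod = 0.0
    for c, p, f in zip(child, parent, freqs):
        h = hwe_prob(c, f)
        lod += math.log(((1 - error) * transmission_prob(c, p, f) + error * h) / h)
    return lod

def best_parent(child: List[Genotype], candidates: Dict[str, List[Genotype]],
                freqs: List[Freqs], error: float = 0.01) -> Tuple[str, float, float]:
    """Return (name, LOD, delta); delta is the LOD gap to the next most likely candidate."""
    scored = sorted(((single_parent_lod(child, g, freqs, error), name)
                     for name, g in candidates.items()), reverse=True)
    top_lod, top_name = scored[0]
    delta = top_lod - scored[1][0] if len(scored) > 1 else top_lod
    return top_name, top_lod, delta

freqs = [{"A": 0.5, "G": 0.5}, {"C": 0.7, "T": 0.3}]
offspring = [("A", "G"), ("C", "C")]
candidates = {"candA": [("A", "A"), ("C", "C")], "candB": [("G", "G"), ("T", "T")]}
print(best_parent(offspring, candidates, freqs))
```

An incompatible locus (candB at the second locus) is not fatal here but costs about ln(0.01) ≈ −4.6, which is how the error term lets a true parent survive a single mistyped genotype.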


Table 4 Identified synonymous groups in 84 farmer varieties Venezuela and RIM clones from , indicating from Honduras and Nicaragua their Trinitario background. A substantial amount of Synonymous Accessions Country genetic diversity existed in this group. The third groupa cluster was comprised exclusively of all the Lower Amazon Forastero clones, which shared high similar- 1 BE07HON 17 Honduras Criollo ity with the reference Amelonado (Amelonado 15, CR07HON 09 Honduras Criollo Amelonado 22 and SIAL 325). The fourth cluster EPI07HON 16 Honduras Criollo consisted of most of the perceived Trinitario from 2 BE07HON 18 Honduras Criollo Honduras, which differed from the Nicaragua Trini- EC07HON 12 Honduras Criollo tario varieties in cluster #3. Their closer proximity 3 CR07HON 10 Honduras Criollo with the reference clones of Upper Amazon Forastero EPI07HON 15 Honduras Criollo and Ecuadorian National hybrids (CLM 78 and JA Matagalpa 13 Nicaragua Criollo 10/33), indicated that they were not the classically Matagalpa 20 Nicaragua Criollo defined Trinitario and most likely has gene introgres- Criollo 13 Reference Criollo sion from Upper Amazon Forastero. The fifth cluster 4 MCI07HON 13 Honduras Criollo included exclusively all the reference clones of Upper TR07HON 11 Honduras Criollo Amazon Forastero and Ecuadorian Nacional hybrids. 5 Francisco Nicaragua Criollo Mercedes Nicaragua Criollo Parentage analysis Oscar Nicaragua Criollo Criollo 22 Reference Criollo Out of the 53 perceived Trinitario type farmer 6 IEZ07HON 4 Nicaragua Amelonado varieties, twenty-eight were assigned paternal or Matagalpa 99 Nicaragua Amelonado maternal parents at the confidence level of 80 %. In Amelonado 15 Reference Amelonado addition, candidate parents were also assigned to five a Duplicated or shared genotypes were identified using multi- reference Trinitario clones (Table 5). Amelonado was locus genotype matching (Waits et al. 
2001) responsible for assigned parentage of 23 farmer varieties and one reference clone (TRD 86). However, Out of the six synonymous groups, three fully no farmer varieties were found to have direct parent- matched with the reference clones of Criollo 13, age contribution from ancient Criollo, except that Criollo 22 and Amelonado 15 respectively. In total, Matagalpa 34 from Nicaragua showed a Criollo duplicated accessions accounted for 11.9 % of the parentage at 80 % confidence level. In contrast, farmer varieties. Criollo parentage was identified with high confidence ([95 %) in reference Trinitario clones OC 77, ICS 95 Genetic relationship among individual varieties and ICS 39, and with 80 % confidence in RIM 113. Four Upper Amazon Forastero clones (IMC 27, IMC The genetic relationships among the 84 farmer selec- 47, IMC 63 and JA 10/33) were identified as likely tions, as well as the 31 reference clones, are presented parents for three farmer varieties and for reference in Figs. 1 and 2. Both Euclidian distance-based clone ICS 97. The result of parent-offspring assign- principal coordinates analysis (Fig. 1) and Kinship ment is largely compatible with the kinship based Coefficient-based cluster analysis (Fig. 2) clearly cluster analysis (Fig. 2). Farmer varieties assigned as separated the 115 tested accessions into five clusters. offspring from the same parent tended to be grouped The 84 farmer selections can also be approximately together in the Neighbor Joining tree (Fig. 2). The classified into five clusters (Fig. 2). The first cluster perceive names of the analyzed 84 famer accessions was comprised exclusively of ancient Criollo, which appeared to be well agreed with the clusters identified had little intra-group diversity. The second cluster by SNP genotyping. Out of 28 trees names as Criollo, included most of the hybrid farmer varieties from twenty-two appeared to be ancient Criollo genotypes Nicaragua. 
These Nicaraguan farmer varieties were and two out of three trees perceived as Nicaragua scattered amongst the ICS and TRD clones from Trinitario clustered in the same group as Trinitario Trinidad, as well as the OC (Ocumare) clones from reference genotypes. Twelve of the 13 trees perceived 123 Genet Resour Crop Evol

MO 80 MO 121 MO 96 MO 99 5. Upper Amazon Forastero SCA 6

IMC 63 NA 232 IMC 6 IMC 47 NA 127 IMC 30 IMC 27 NA 178 IMC 65 NA 191

Coordinate 2 4. Honduras Trinitario CLM 78 1. Ancient Criollo JA 10 ICS 97

Amelonado 15 CrRIOLLO 13 TRD 37 ICS 39 Amelonado 22 CRIOLLO 22 SIAL 325 TRD 86 ICS 6 RIM 113 ICS 95 OC 77 3. Lower Amazon Forastero ICS 76 2. Nicaragua Trinitario

Coordinate 1

Reference genotypes Honduras farmer variees Nicaragua farmer variees

Fig. 1 PCoA plot of 115 cacao accessions, including 84 farmer main PCO axes accounted for 89.6 % of total variation. First varieties from Honduras and Nicaragua, and 31 reference clones axis = 63.5 % of total information, the second = 18.4 % and from the International Genebanks. The plane of the first three the third = 7.7 % as Honduras Trinitario clustered in the Honduras needed further refining in terms of selection based on Trinitario group, together with CLM78, a Nacional criteria of informativeness and minor allele frequency. hybrid genotype (Figs. 1, 2). PID computation in the present study shows that a minimum set of 26 SNPs would provide 99.999 % confidence to identify an individual cacao tree. A Discussion smaller core set of SNP markers can be selected for a broad range of application. SNP markers for cacao genotype identification Genetic diversity in farmer varieties The 100 SNP markers for cacao genotyping have been from Honduras and Nicaragua evaluated using 84 farmer varieties from Mesoamerica and 31 reference clones. The result shows that a set of A multi-cluster structure was detected in the 84 farmer 70 SNP markers were highly accurate in genotype varieties. The structure includes majority of the main identification and diversity analysis. These 70 SNPs cultivated germplasm groups i.e. Criollo, Lower constitute a cost-effective marker resource suitable for Amazon Forastero, Amelonado and Trinitario. Due cacao germplasm characterization. The genotyping to the interbreeding among the different groups, new result can be compared across different genotyping types of varieties were created and selected by farmers platforms and laboratories, therefore facilitating the (Bartley 2005). In spite of the interbreeding among the integration and interpretation of SNP data across different germplasm groups, ancient Criollo and different genebanks and cacao producing countries. 
Amelonado are still frequently found in Honduras Our result also showed that this set of SNP markers and Nicaragua. Out of the 84 farmer varieties, twenty- 123 Genet Resour Crop Evol

Fig. 2 Neighbor-joining EPI07HON 16 BE07HON 17 dendrogram depicting the Tiburcio 4 Oscar relationship between 84 Francisco Matagalpa 13 farmer varieties from Matagalpa 20 Criollo22 Nicaragua and Honduras, BE07HON 19 Criollo13 and 31 reference accessions. CR07HON 10 EPI07HON I5 Kinship coefficient was used MC07HON 13 1. Ancient Criollo TR07HON 11 as genetic distances YU07HON 6 BE07HON 18 EC07HON 12 BE07HON 20 CR07HON09 Tiburcio1 Mercedes Tiburcio 5 Tiburcio 2 Tiburcio 3 Matagalpa 34 Matagalpa 166 Matagalpa 212 ICS 95 ICS 39 Matagalpa 100 OC 77 RIM Matagalpa113 302 IC97 IC76 AM 288 RZ350 Matagalpa 228 CH 31 Matagalpa 240 JH 59 Matagalpa 217 Matagalpa 221 Matagalpa 209 GC 303 AM 60 Matagalpa 231 CH 89 2. Trinitario CH 25 Matagalpa 266 CH 21 AM 79 CH 244 Matagalpa 204 GC 2 ICS 6 Matagalpa 16 Matagalpa 62 Matagalpa 203 JH 47 AM 72 CH 68 GC 13 Matagalpa 250 CH 251 CH 263 Matagalpa 206 TRD 37 AM 294 CH 81 Matagalpa 83 TRD 86 Matagalpa 80 Amelonado 22 SIAL3 BN07H ON5CU08HON 17 3. Amelonado Amelonado 15 IEZ07 HON4 Matagalpa 99 CU08HON 3 CU08HON 8 FC07HON 07 CU08HON 6 FC07HON 2 CU08HON 7 CU08HON 9 JOI07HON 8 CLM 78 4. Honduras Trinitario CU08HON 5 CU08HON 10 CU08HON 12 CU08HON 18 CU08HON 13 CU08HON 14 CU08HON 11 CU08HON 19 JA10/33 LA07HON3 GC 26 NA 191 NA 127 NA 178 NA 232 IMC 65 IMC 6 IMC 27 5. Upper Amazon Forastero IMC 30 IMC 47 IMC 63 SCA 6 MO 96 MO 99 MO 121 MO 80

0.07

123 Genet Resour Crop Evol

Table 5 Likelihood Offspring Assigned parenta Parental type LOD scoreb assignment of parentage of 33 farmer varieties and 1 Matagalpa 62 SIAL 325 Amelonado 3.36 reference Trinitario genotypes based on 70 SNP 2 Matagalpa 206 SIAL 325 Amelonado 1.35 markers with LOD scores 3 Matagalpa 16 SIAL 325 Amelonado 12.43 above 80 % probability 4 CU08Hon 8 SIAL 325 Amelonado 10.72 5 CU08Hon 3 SIAL 325 Amelonado 10.15 6 CU08Hon 14 SIAL 325 Amelonado 9.04 7 CH-251 SIAL 325 Amelonado 7.45 8 CU08Hon 9 IEZ07Hon 4 Amelonado 6.76 (Indio Amelonado Amarillo) 9 CU08Hon 10 IEZ07Hon 4 Amelonado 11.53 (Indio Amelonado Amarillo) 10 FC07Hon 2 CU08Hon 17 Amelonado 12.22 (Indio Amelonado Rojo) 11 FC07Hon 1 CU08Hon 17 Amelonado 10.15 (Indio Alargado Amarillo) 12 CU08Hon 5 CU08Hon 17 Amelonado 9.75 13 CU08Hon 13 CU08Hon 17 Amelonado 6.82 14 CU08Hon 12 CU08Hon 17 Amelonado 16.42 15 AM-288 CU08Hon 17 Amelonado 3.57 16 CU08Hon 19 BN07Hon 5 Amelonado 9.61 (Indio Amelonado Amarillo) 17 CU08Hon 11 BN07Hon 5 Amelonado 1.56 (Indio Amelonado Amarillo) 18 TRD86 Amelonado 22 Amelonado 4.64 19 Matagalpa 250 Amelonado 22 Amelonado 12.66 20 Matagalpa 228 Amelonado 22 Amelonado 3.01 21 CU08Hon 18 Amelonado 22 Amelonado 13.36 22 CH-263 Amelonado 22 Amelonado 7.96 23 Matagalpa 100 Amelonado 15 Amelonado 8.72 24 AM-294 Amelonado 15 Amelonado 9.89 25 Matagalpa 34 Criollo 22 Criollo 8.40 a Putative parental accessions used in the 26 OC 77 Oscar Criollo 21.54 present analysis. 
Only those 27 ICS 95 Oscar Criollo 31.03 with significant LOD score 28 ICS 39 Oscar Criollo 27.31 was listed 29 RIM 113 Criollo 22 Criollo 1.21 b Critical LOD (the natural 30 CH-68 IMC 27 UAF 1.43 logarithm of the likelihood) ratio for assignment of 31 ICS 97 IMC 47 UAF 13.37 parentage are 4.58 at[95 % 32 JOI07Hon 8 JA10/33 UAF 6.81 confidence and 0.85 at 33 GC-26 IMC 63 UAF 13.32 [80 % confidence two were found as pure Criollo and five were pure group of Honduras Trinitario used in the study is an Amelonado and each group was represented by exception. This group of varieties showed their distinct homozygous SNP genotypes across the 70 resemblance with two Ecuadorian Nacional hybrids SNP loci. The SNP analysis also demonstrated that the (JA 10/33), suggesting their association with intro- traditional varieties ‘‘Indio’’ and ‘‘Indio Rojo’’ in duced seeds families of Refractario background. This Mesoamerica are synonymous to Amelonado. The observation agreed with the breeding activity in 123 Genet Resour Crop Evol

Honduras, where hybrid families from CATIE were introduced, some of whose parents are clones of Ecuadorian Nacional hybrids and Upper Amazon Forastero. Nevertheless, these varieties retained their fine-flavor character and produced decent yields.

Parentage analysis for Trinitario hybrids

Parentage analysis showed that Amelonado made an important contribution to the Trinitario farmer varieties: a total of 23 Trinitario farmer varieties (27 % of the 84 accessions) were found to have direct Amelonado parentage. This likely reflects the historical popularity of growing Amelonado cacao, such as “Indio” and “Indio Rojo”, in this region. By contrast, the direct parental contribution of ancient Criollo was surprisingly small: of the 53 Trinitario farmer varieties analyzed, only one (Matagalpa 34) was found to have an ancient Criollo parent. Among the five reference Trinitario clones, however, Criollo parentage was found in four (ICS 39, ICS 95, OC 77, and RIM 113).

Trinitario cacao is generally considered a hybrid population of Criollo and Lower Amazon Forastero (Cope 1976; Lockwood and Gyamfi 1979; Toxopeus 1985). The present results support the notion that today’s so-called “Trinitario hybrids” were derived from different genetic groups. For example, the reference clone ICS 97 had a parental contribution from Upper Amazon Forastero (Table 5). This result is compatible with the microsatellite-based analysis of Motilal et al. (2010), which indicated that the ICS clones may have ancestry of Forastero origin. Ancient Criollo and Lower Amazon Forastero can explain the origin of only a fraction of the so-called Trinitario accessions. Because of multi-wave plant introductions and movements, a complex mixture of Trinitario types is expected to arise, which may be conveniently, though artificially, sorted into groups on a geographical basis (Sounigo et al. 2005; Motilal et al. 2010). The direct parentage of the majority of the fine-flavored Trinitario varieties cultivated in Mesoamerica remains to be identified.

Implication for conservation and utilization of cacao germplasm

Farmer varieties of crop species often represent a preferable combination of natural evolution and human intervention (Brush 2000; Bellon 2004). Farmers’ fields not only provide a natural laboratory in which crop landraces continue to generate new variation; they also allow for the inclusion of farmer-selected agronomic traits (Brush 2000). In cacao, however, there is a general trend for traditional varieties in farmers’ fields to be replaced by introduced bulk varieties, which are more productive but usually inferior in quality. Moreover, multi-wave introduction of exotic cacao varieties has led to large-scale admixture in Mesoamerica (Bartley 2005).

This study showed that traditional Trinitario, ancient Criollo and Amelonado still exist in farmers’ fields in Mesoamerica. Although these varieties often have lower yields and are more susceptible to diseases and pests, they have acquired a reputation for quality and are increasingly coveted by external gourmet specialty markets. The production and marketing of differentiated, high-value cocoa provide an opportunity to conserve threatened cacao diversity through use. Higher farm-gate revenues from the premium market, which has grown at an annual rate of 16 % over the last 5 years, may provide incentives for demand-driven on-farm conservation. Molecular characterization of traditional cacao varieties at the center of cacao domestication will strengthen the credibility of conservation planning and provide a sound basis for decision-making; specifically, it provides much-needed baseline information for monitoring spatial and temporal changes in in situ/on-farm diversity.

Promoting high-yielding varieties with premium traits will create incentives for local communities and the private sector. The farmer varieties assessed in the present study not only encompass a wide range of molecular diversity but also vary greatly in morphological and agronomic traits, including productivity, disease resistance and flavor. It is therefore possible to explore intra-population diversity and develop high-yielding varieties with gourmet traits and an indigenous genetic background. The recent release of high-yielding “Nacional”-type varieties in is a good example.

Acknowledgments We thank Xavier Argout, Claire Lanaud and Mathilde Allegre of CIRAD, France, for providing the SNP marker sequences. We also thank Shenghui Duan and Cindy Helms of the Human Genetics Division Genotyping Core, Washington University School of Medicine, for SNP genotyping. We are grateful to TechnoServe for providing cacao leaf samples and passport data.
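The critical LOD values in footnote b of Table 5 can be illustrated with a minimal sketch. The Python below is a simplified, hypothetical single-parent LOD score in the spirit of Marshall et al. (1998). At each SNP locus it compares the probability of the offspring genotype given the candidate parent (with an unknown, unrelated second parent) against the genotype's Hardy-Weinberg frequency, and then sums the natural logs. It assumes biallelic loci, known allele frequencies, and no genotyping error. Kalinowski et al. (2007) describe how CERVUS additionally models genotyping error, and critical values there are obtained by simulation, so this is not the computation used in the study. The function names and the three-locus example are illustrative assumptions; only the thresholds 4.58 and 0.85 come from the footnote.

```python
import math
from typing import Dict, List, Tuple

# A diploid SNP genotype as an unordered pair of allele labels, e.g. ("A", "G").
Genotype = Tuple[str, str]

# Critical LOD values quoted in footnote b of Table 5, applied here as plain thresholds.
CRITICAL_LOD_95 = 4.58
CRITICAL_LOD_80 = 0.85


def genotype_freq(g: Genotype, freq: Dict[str, float]) -> float:
    """Hardy-Weinberg frequency of an unordered diploid genotype."""
    a, b = g
    return freq[a] ** 2 if a == b else 2.0 * freq[a] * freq[b]


def transmission_prob(offspring: Genotype, parent: Genotype,
                      freq: Dict[str, float]) -> float:
    """P(offspring | candidate parent, random unrelated second parent).

    Assumes Mendelian inheritance and no genotyping error (a simplification).
    """
    total = 0.0
    for transmitted in parent:  # each parental allele is passed on with probability 1/2
        remaining = list(offspring)
        if transmitted in remaining:
            remaining.remove(transmitted)
            # The offspring's other allele must come from the unknown parent.
            total += 0.5 * freq[remaining[0]]
    return total


def single_parent_lod(offspring: List[Genotype], parent: List[Genotype],
                      freqs: List[Dict[str, float]]) -> float:
    """Natural-log likelihood ratio: candidate is a parent vs. candidate is unrelated."""
    lod = 0.0
    for o, p, f in zip(offspring, parent, freqs):
        t = transmission_prob(o, p, f)
        if t == 0.0:
            return -math.inf  # a Mendelian mismatch excludes the candidate outright
        lod += math.log(t / genotype_freq(o, f))
    return lod


def confidence_label(lod: float) -> str:
    """Map a LOD score onto the two confidence levels named in the footnote."""
    if lod > CRITICAL_LOD_95:
        return ">95 %"
    if lod > CRITICAL_LOD_80:
        return ">80 %"
    return "not assigned"


if __name__ == "__main__":
    # Hypothetical three-locus example with equal allele frequencies at every locus.
    freqs = [{"A": 0.5, "G": 0.5}] * 3
    child = [("A", "A"), ("A", "G"), ("G", "G")]
    candidate = [("A", "A"), ("A", "G"), ("A", "G")]
    score = single_parent_lod(child, candidate, freqs)
    print(round(score, 3), confidence_label(score))  # 0.693 not assigned
```

In this toy example only the first locus is informative (it contributes ln 2 ≈ 0.693), so the score stays below both thresholds. Because no error model is included, a single Mendelian mismatch drives the LOD to minus infinity, which is the behavior an error-tolerant method such as the one described by Kalinowski et al. (2007) is meant to soften.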

References
Argout X, Fouet O, Wincker P et al (2008) Towards the understanding of the cocoa transcriptome: production and analysis of an exhaustive dataset of ESTs of Theobroma cacao generated from various tissues and under various conditions. BMC 9:512
Bartley BGD (2005) The genetic diversity of cacao and its utilization. CAB International, CABI Publishing, Wallingford
Bellon MR (2004) Conceptualizing interventions to support on-farm genetic resource conservation. World Dev 32:159–172
Boccara M, Zhang D (2006) Progress in resolving identity issues among the Parinari accessions held in Trinidad: the contribution of the collaborative USDA/CRU project. CRU Annual Report 2005, Cacao Research Unit, The University of the West Indies, St. Augustine, Trinidad and Tobago
Brush SB (2000) The issues of in situ conservation of crop genetic resources. In: Brush S (ed) Genes in the field. IDRC/IPGRI/Lewis Publishers, Ottawa, p 300
Buckler ES, Thornsberry J (2002) Plant molecular diversity and applications to genomics. Curr Opin Plant Biol 5:107–111
Cheesman EE (1944) Notes on the nomenclature, classification and possible relationships of cocoa populations. Trop Agricult 21:144–159
Coe SD, Coe MD (1996) The True . Thames & Hudson, London
Cope FW (1976) Cacao. Theobroma cacao L. (Sterculiaceae). In: Simmonds NW (ed) Evolution of Crop Plants. Longman, London, pp 207–213
Cuatrecasas J (1964) Cacao and its allies. A taxonomic revision of the Theobroma. Contributions from the United States National Herbarium 35:375–614. Smithsonian Institution Press, Washington, DC
De la Cruz M, Whitkus R, Gomez-Pompa A, Mota-Bravo L (1995) Origins of cacao cultivation. Nature 375:542–543
Dias LAS (2001) Origin and distribution of Theobroma cacao L.: A new scenario. In: Dias LAS (ed) Genetic improvement of cacao. Available at: http://ecoport.org/ep?SearchType=earticleView&earticleId=197&page=-2
Dieringer D, Schlötterer C (2003) Microsatellite analyzer (MSA): a platform independent analysis tool for large microsatellite data sets. Mol Ecol Notes 3:167–169
Evett IW, Weir BS (1998) Interpreting DNA Evidence: Statistical Genetics for Forensic Scientists. Sinauer, Sunderland
Felsenstein J (1989) PHYLIP: phylogeny inference package (version 3.2). Cladistics 5:164–166
Ganal MW, Altmann T, Röder MS (2009) SNP identification in crop plants. Curr Opin Plant Biol 12:211–217
Gómez-Pompa A, Flores JS, Fernandez MA (1990) The sacred cacao groves of the Maya. Latin Am Antiq 1:247–257
Henderson JS, Joyce RA, Hall GR, Hurst WJ, McGovern PE (2007) Chemical and archaeological evidence for the earliest cacao beverages. Proc Natl Acad Sci USA 104:18937–18940
International Cocoa Organization (ICCO) (2010) The World Cocoa Economy: Past and Present. London, UK
Kalinowski ST, Taper ML, Marshall TC (2007) Revising how the computer program CERVUS accommodates genotyping error increases success in paternity assignment. Mol Ecol 16:1099–1106
Kennedy AJ (1995) Cacao, Theobroma cacao (). In: Smartt J, Simmonds NW (eds) Evolution of crop . Longman, London, pp 472–475
Lanaud C, Fouet O, Gramacho K, Argout X et al (2006) A large EST resource for Theobroma cacao including cDNAs isolated from various organs and under various biotic and abiotic stresses. In: Proceedings of the 15th international cocoa research conference, San Jose, Costa Rica, pp 185–191
Lockwood G, Gyamfi MMO (1979) The CRIG cocoa germplasm collection with notes on codes used in the breeding programme at Tafo and elsewhere. Tech. Bull. 10. Cocoa Research Institute, , p 62
Marshall TC, Slate J, Kruuk LEB, Pemberton JM (1998) Statistical confidence for likelihood-based paternity inference in natural populations. Mol Ecol 7:639–655
Mooleedhar V, Maharaj WW, O’Brien H (1995) The collection of Criollo cocoa germplasm in Belize. Cocoa Grower’s Bull 49:26–40
Motamayor JC, Lopez PA, Ortiz CF, Moreno A, Lanaud C (2002) Cacao domestication. I. The origin of the cacao cultivated by the Mayas. Heredity 89:380–386
Motilal L, Zhang D, Umaharan P, Mischke BS, Mooleedhar V, Meinhardt LW (2010) The relic Criollo cacao in Belize: genetic diversity and relationship with Trinitario and other cacao clones held in the International Cocoa Genebank, Trinidad. Plant Genet Res Charact Util 8:106–110
Ogata N, Gomez-Pompa A, Taube KA (2006) The Domestication and Distribution of Theobroma cacao L. in the neotropics. In: McNeil CL (ed) Chocolate in Mesoamerica: a cultural history of cacao. University Press of Florida, Gainesville, pp 69–89
Peakall R, Smouse PE (2006) GenAlEx 6: genetic analysis in Excel. Population genetic software for teaching and research. Mol Ecol Notes 6:288–295
Pound FJ (1945) A note on the cocoa population of South America. In: Report and proceedings of the 1945 cocoa conference, London, pp 131–133
Powis TG, Cyphers A, Gaikwad NW, Grivetti L, Cheong K (2011) Cacao use and the San Lorenzo Olmec. Proc Natl Acad Sci USA 108:8595–8600
Rafalski A (2002) Applications of single nucleotide polymorphisms in crop genetics. Curr Opin Plant Biol 5:94–100
Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425
Saunders JA, Mischke S, Leamy EA, Hemeida AA (2004) Selection of international molecular standards for DNA fingerprinting of Theobroma cacao. Theor Appl Genet 110:41–47
Smith N (1999) The Amazon River Forest: a natural history of plants, animals, and people. Oxford University Press, USA
Sounigo O, Umaharan R, Christopher Y, Sankar A, Ramdahin S (2005) Assessing the genetic diversity in the International Cocoa Genebank, Trinidad (ICG,T) using isozyme electrophoresis and RAPD. Genet Resour Crop Evol 52:1111–1120

Toxopeus H (1985) Botany, types and populations. In: Wood GAR, Lass RA (eds) Chapter 2 in cocoa 4th edition. Blackwell Science, Oxford, pp 17–18
Trognitz B, Scheldeman X, Hansel-Hohl K et al (2011) Genetic population structure of cacao plantings within a young production area in Nicaragua. PLoS ONE 6:e16056
Van Hall CJJ (1914) Cocoa. Macmillan, London
Waits LP, Luikart G, Taberlet P (2001) Estimating the probability of identity among genotypes in natural populations: cautions and guidelines. Mol Ecol 10:249–256
Young AM (2007) The chocolate tree: a natural history of cacao. University Press of Florida, Gainesville, FL
Zhang D, Mischke S, Goenaga R, Hemeida AA, Saunders JA (2006) Accuracy and reliability of high-throughput microsatellite genotyping for cacao clone identification. Crop Sci 46:2084–2092
Zhang D, Mischke BS, Johnson ES, Mora A, Phillips-Mora W, Meinhardt LW (2009) Molecular characterization of an international cacao collection using microsatellite markers. Tree Genet Genom 5:1–10
Zhang D, Figueira A, Motilal L, Lachenaud P, Meinhardt LW (2012) Theobroma. In: Kole C (ed) Wild crop relatives: genomic and breeding resources, plantation and ornamental crops. Springer, Berlin, pp 277–296. doi:10.1007/978-3-642-21201-7_13
