Research

Methods Chemocoding as an identification tool where morphological- and DNA-based methods fall short: as a case study

Marıa-Jose Endara1,2, Phyllis D. Coley1,3, Natasha L. Wiggins4, Dale L. Forrister1, Gordon C. Younkin1, James A. Nicholls5, R. Toby Pennington6, Kyle G. Dexter6,7, Catherine A. Kidner5,6, Graham N. Stone5 and Thomas A. Kursar1,3 1Department of Biology, University of Utah, Salt Lake City, UT 84112-0840, USA; 2Centro de Investigacion de la Biodiversidad y Cambio Climatico (BioCamb) e Ingenierıa en Biodiversidad y Recursos Geneticos, Facultad de Ciencias de Medio Ambiente, Universidad Tecnologica Indoamerica, Quito EC170103, Ecuador; 3Smithsonian Tropical Research Institute, Box 0843-03092, Balboa, Ancon, Republic of Panama; 4School of Biological Sciences, University of Tasmania, Sandy Bay, TAS 7001, Australia; 5Ashworth Labs, Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JY, UK; 6Royal Botanic Garden Edinburgh, Edinburgh, EH3 5LR, UK; 7School of GeoSciences, University of Edinburgh, Edinburgh, EH9 3FF, UK

Summary Author for correspondence:  The need for species identification and taxonomic discovery has led to the development of   Marıa-Jose Endara innovative technologies for large-scale identification. DNA barcoding has been useful, Tel: +1 801 581 7086 but fails to distinguish among many species in species-rich plant genera, particularly in tropical Email: [email protected] regions. Here, we show that chemical fingerprinting, or ‘chemocoding’, has great potential Received: 4 August 2017 for plant identification in challenging tropical biomes. Accepted: 4 January 2018  Using untargeted metabolomics in combination with multivariate analysis, we constructed species-level fingerprints, which we define as chemocoding. We evaluated the utility of New Phytologist (2018) chemocoding with species that were defined morphologically and subject to next-generation doi: 10.1111/nph.15020 DNA sequencing in the diverse and recently radiated neotropical genus Inga (Leguminosae), both at single study sites and across broad geographic scales.  Key words: chemocoding, Inga, Our results show that chemocoding is a robust method for distinguishing morphologically metabolomics, species identification, similar species at a single site and for identifying widespread species across continental-scale tropical forests. ranges.  Given that species are the fundamental unit of analysis for conservation and biodiversity research, the development of accurate identification methods is essential. We suggest that chemocoding will be a valuable additional source of data for a quick identification of , especially for groups where other methods fall short.

Introduction (Gemeinholzer et al., 2006; Hollingsworth et al., 2009, 2011; Chen et al., 2010; Kress et al., 2010; China Plant BOL Group Cataloguing the world’s plant diversity has been a challenge for et al., 2011). Here we present evidence that chemical fingerprint- centuries, and because of accelerated anthropogenic extinctions, ing or ‘chemocoding’ can be another tool for species identifica- the rapid documentation of biodiversity is more critical than ever tion in confusing and/or closely related tree species in (Mace, 2004; Valentini et al., 2009; Vernooy et al., 2010; Cris- challenging, hyperdiverse biomes. tescu, 2014). This is particularly true in tropical rainforests, No single identification method is without drawbacks. Mor- where the high diversity and co-occurrence of morphologically phological methods are often labor-intensive, rely upon taxo- and ecologically similar congeneric species have presented signifi- nomic expertise and are the most prone to subjective errors, cant challenges for identification (Gonzalez et al., 2009; Dexter particularly where phenotypic plasticity and cryptic taxa are et al., 2010; Kress et al., 2015; Liu et al., 2015). Plant species have prevalent (de Carvalho et al., 2005; Costion et al., 2011). typically been identified by botanical experts based on morpho- Although DNA barcoding is rapid and straightforward, it can fail logical characteristics, or more recently by sequencing several to distinguish closely related plant species because of insufficient chloroplast DNA regions or the internal transcribed spacer of sequence divergence in standard barcode markers (Kress et al., nuclear ribosomal DNA (ITS), referred to as ‘DNA barcoding’ 2009; Dexter et al., 2010; Liu et al., 2015) and may lead to faulty

Ó 2018 The Authors New Phytologist (2018) 1 New Phytologist Ó 2018 New Phytologist Trust www.newphytologist.com New 2 Research Phytologist

identifications in genera where species are of recent origin 2012). Inga is one of the most abundant and diverse Neotropical (Razafimandimbison et al., 2004; Naciri & Linder, 2015; Pen- genera in lowland forest communities (Valencia et al., 1994; ter nington & Lavin, 2015). These limitations are often exacerbated Steege et al., 2013), is widely distributed and has undergone in highly diverse tropical systems, as the proportion of species recent, rapid diversification (Richardson et al., 2001). For Inga, belonging to young, species-rich genera is high (Richardson et al., genetic and morphological differentiation of closely related 2001). For example, in a subtropical Chinese forest, for the 44% species is low and the identification of a species can therefore be of species belonging to genera with more than two species, > 50% difficult. shared barcoding sequences and could not be distinguished (Liu We propose the use of small, defense-related chemical markers et al., 2015). This led to an overall species resolution of only 67- characterized via untargeted metabolomics in combination with %. A similar problem is encountered in New World tropical rain- multivariate analysis for the construction of a phytochemical, forests, where DNA barcoding cannot reliably discriminate species-level fingerprint, which we define as chemocoding. We species within ecologically important, species-rich genera such as evaluate the units defined by chemocoding with those defined Inga, Ficus and Piper (Gonzalez et al., 2009; Kress et al., 2009). morphologically in a recent taxonomic monograph (Pennington, While standard DNA barcoding for problematic groups can be 1997) as well as with those defined using next-generation DNA improved by adding data for additional loci, recently diverged sequencing data of many hundreds of nuclear genes (Nicholls species will always be hard to distinguish using sequence data and et al., 2015; our unpublished data). no standardized marker sets for such extended DNA barcoding exist. Materials and Methods Many studies assessing diversity in surveys or plots for conser- vation or basic science rely on identifying all individuals in a plot Study sites to species level, including the large majority of individuals that are without flowers or fruits (Dexter et al., 2010). Plots in the Samples were collected at five sites that include a wide range of tropics represent a daunting task, and are still faced with prob- soils but very similar climates throughout the Amazon and lematic identifications despite extremely well-trained botanists. A Panama (Fig. 1). Barro Colorado Island is a field station adminis- related problem is the difficulty of achieving uniform species tered by the Smithsonian Research Tropical Institute located in identifications across multiple sites. For example, in a recent, extensive analysis of the identifications for eight genera, the three genera with the highest error rates were Andira and Tachigali (c. 50%) and Inga (c. 40%; Baker et al., 2017). And yet another sub- stantial challenge is correlating the identities to species level for saplings, small trees and adult trees, because ontogenetic changes in leaf morphology may be considerable and juveniles do not bear flowers or fruits. For example, a consortium to understand forest dynamics has established 63 plots around the world where all woody plants > 1 cm diameter at breast height (DBH) are mapped and identified, the majority of which are juveniles (www.forestgeo.si.edu). Thus, the issues and errors associated with morphological species identifications of thousands of trees in the tropical forests are serious. Given that species are a fundamental unit of analysis for con- servation, for quantifying biodiversity, and for understanding ecological and evolutionary processes, the development of accu- rate methods for identifying them is essential. In this paper, we suggest that chemical fingerprinting (here termed chemocoding) can provide an additional identification tool for species identifi- cation in a species-rich tropical tree genus, particularly for mor- phologically confusing or cryptic species. We examine its utility both for distinguishing species within a single site, and for char- acterizing within-species variation over wider geographic scales. Moreover, chemocoding may be inexpensive enough to allow for every individual tree to be tested. We test the potential of chemocoding for species identification within Inga Mill. (Leguminosae, ) because species in this genus are difficult to distinguish morphologically and Fig. 1 Study sites: (1) Barro Colorado, Panama, (2) Nouragues, French show insufficient variation in barcoding sequences (Richardson Guiana, (3) Tiputini, Ecuador, (4) Los Amigos, Peru, and (5) KM41 near et al., 2001; Kress et al., 2009; Dexter et al., 2010; Dick & Webb, Manaus, Brazil.

New Phytologist (2018) Ó 2018 The Authors www.newphytologist.com New Phytologist Ó 2018 New Phytologist Trust New Phytologist Research 3 the Panama Canal (9°N, 80°W). It is a lowland moist forest with Sampling 2649 mm of precipitation a year and 4-month dry season with mean monthly temperatures of 27°C (Leigh, 1999). The other We determined the power of chemocoding for discriminating four sites do not have a pronounced dry season. The Nouragues among species and among geographically disjunct populations Ecological Research Station, French Guiana (4°N, 53°W), is within species. We included different taxa that are similar in veg- located inside the Nouragues National Reserve on the Guiana etative morphology and are very difficult to identify in the Shield. Mean annual precipitation is 2990 mm and mean annual absence of reproductive structures. Our examples include cases in temperature is 26.3°C (Grimaldi & Riera, 2001). Tiputini Bio- which these coexist at the same site or in two well-separated sites. diversity Station is located in the eastern lowland Ecuadorian We also included cases of one species that is morphologically Amazon (0°S, 75°W), inside the Yasuni Biosphere Reserve. The uniform in collections from up to four sites (Table 1). climate is humid and aseasonal, with an annual precipitation of In addition, we tested the ability of chemocoding to correctly 3320 mm and an average annual temperature of 26°C (Valencia distinguish simultaneously between populations of several dozens et al., 2004). Kilometer 41 (KM41, 2°S, 59°W) is a field station of species at a regional scale (see Random Forest Analysis, below). of the Biological Dynamics of Forest Fragments project located More examples can be found in Supporting Information Figs near Manaus, Brazil. Mean annual temperature is 26°C and aver- S2–S9. age annual precipitation is 2651 mm (Radtke et al., 2007). Los Amigos Biological Station is located in the southeastern lowland Collections Peruvian Amazon, in the Madre de Dios Department (13°S, 70°W). Mean annual rainfall is 2700–3000 mm. Due to winter For each species, samples of expanding leaves were collected from cold spells, the daily minimum temperature can drop to < 10°C; five saplings, 0.5–4 m in height, in the shaded understory. We the mean monthly temperature range is from 21 to 26°C (Pit- focused on expanding leaves as part of a study of plant–herbivore man, 2007). For simplicity in the text, each site will be referred interactions and also because secondary metabolites are at greater to by the country only. concentration during the expansion stage than in leaves that have matured and toughened (Wiggins et al., 2016). For each sapling, we collected leaves that were between 20% and 80% of the aver- Study species age maximum size. Fresh leaves were dried at room temperature We examined saplings of Inga because this size class is the most with fans and silica gel for 24–48 h, transported to the University frequently censused for ecological research and also is the size of Utah and stored at À20°C. For DNA analysis we typically class that can be very difficult to identify. Morphological identifi- included one sample per species per site. cations were based on the most recent taxonomic Inga mono- graph (Pennington, 1997), and made by four researchers (MJE, Metabolomic analysis TAK, PDC and KGD) who have worked in the field identifying Inga for about five decades collectively. They consulted with the Metabolites were extracted and analyzed following the protocol botanists working in the 50-ha CTFS (Center for Tropical Forest of Wiggins et al. (2016), specifically designed for secondary Science) plots in Panama and Ecuador and the plots in metabolites having intermediate polarity. In Inga, these are Nouragues, French Guiana. mainly phenolics and saponins. Briefly, 100 mg of ground leaves

Table 1 Study species

Figure Case study Species and sites

Fig. 2 Two morphologically confusing species within a site Inga alata Benoist and Inga pezizifera Benth, French Guiana Fig. 3 Morphological variation within one species at a site Inga acreana1 Harms, Ecuador Fig. 4 Two morphologically similar species across sites Inga cf. brachystachys Ducke and Inga obidensis Ducke, French Guiana, Brazil and Peru Fig. 5 Identification of a widespread species across its range Inga auristellae Harms, Ecuador, Brazil and Peru Supporting Two morphologically confusing species within a site Inga coruscans Humb. & Bonpl. ex Willd. and Inga laurina (SW.) Willd., Information Fig. S1 Peru Fig. S2 Two morphologically confusing species within a site Inga microcoma Harms and Inga umbellifera2 (Vahl) Steud., Ecuador Fig. S3 Two morphologically confusing species within a site Inga chartacea Poepp. and Inga sapindoides Willd., Ecuador Fig. S4 Morphological and chemical variation within one Inga leiocalycina3 Benth., Ecuador species at a site Fig. S5 Identification of a widespread species across its range Inga alata Benoist, French Guiana, Ecuador and Peru Fig. S6 Identification of a widespread species across its range Inga pezizifera Benth, Panama, French Guiana and Brazil Fig. S7 Identification of a widespread species across its range Inga alba (SW.) Willd., French Guiana, Brazil and Peru Fig. S8 Identification of a widespread species across its range Inga marginata Willd., Panama, French Guiana, Ecuador and Peru

1I. acreana includes two morphotypes in Ecuador: T28 and T56. 2I. umbellifera includes two morphotypes in Ecuador: T50 and T73. 3I. leiocalycina includes two morphotypes in Ecuador: T65 and T86.

Ó 2018 The Authors New Phytologist (2018) New Phytologist Ó 2018 New Phytologist Trust www.newphytologist.com New 4 Research Phytologist

was extracted in 1.0 ml of extraction buffer (44.4 mM ammo- fitted a hierarchical clustering model to the normalized data nium acetate (pH 4.8) : acetonitrile, 60 : 40, v/v). After extraction with 10 000 permutations using the R package PVCLUST for 5 min and centrifugation (13 793 g) for 5 min, the super- (Suzuki & Shimodaira, 2014). Hierarchical clustering was per- natant was transferred to a glass vial and the extraction repeated. formed using the Pearson’s correlation similarity measure, a The extracts were diluted fivefold by combining 200 ll of crude routine method adopted for ‘omics’ data (Reeb et al., 2015). extract with 790 ll of acetonitrile : water (60 : 40, v/v) plus 10 ll The clustering algorithm selection for each analysis was based À of internal standard (1 mg ml 1 biochanin A in acetonitrile : wa- on the correlation between the original distance matrix and ter, 50 : 50). Soluble metabolites were analyzed by ultra- the patristic distance in the hierarchical cluster diagram. Clus- performance liquid chromatography coupled to mass spectrome- ters with AU (approximately unbiased) P-values of ≥ 95% are try (UPLC-MS) using an Acquity UPLC I-Class system and a considered to be strongly supported by the data. For more Xevo G2 Q-ToF spectrometer equipped with LockSpray and an details see Wiggins et al. (2016). electrospray ionization source (Waters, Milford, MA, USA). In addition, to test the overall accuracy of chemocoding to iden- Data were collected in negative ionization mode. tify samples when presented with a very large number of species Raw data from the UPLC-MS analysis were processed for peak (classes) simultaneously, we used supervised statistical learning detection, peak alignment and peak filtering using MassLynx methods. Specifically, we chose Random Forest Analysis, which is (Waters) and the R package XCMS (Smith et al., 2006; Tauten- a powerful classification method for multivariate datasets with hahn et al., 2008; Benton et al., 2010). The parameters used many weak predictor variables along with a large number of species were: peak detection method ‘centWave’ (ppm = 15, peak- (cases). This method has been widely adopted in remote sensing, width = c(5,12), snthresh = 5); peak grouping method ‘density’ high-dimensional biological data (various ‘omics’) and ecology (bw = 2); retention time correction method ‘obiwarp’; and inte- (Breiman, 2001; Lawrence et al.,2006;Cutleret al., 2007, 2009). grate areas of missing peaks method ‘chrom’. XCMS processing was Based on models that we constructed with the metabolomics performed for each species independently, with five leaf samples data, we used Random Forest to predict how well samples can be included as replicates. The results obtained by XCMS were post- classified to species. For this, a single sample-by-compound processed in the R package CAMERA to assign the various ions matrix was generated using XCMS as described above. A total of derived from one compound (termed ‘features’) to that com- 1000 trees were generated for each Random Forest Model and pound (Kuhl et al., 2012). This uses a defined set of rules for 100 variables were used at each split, which was sufficient to linking the precursor ion with adducts and neutral losses (see arrive at a model with minimal prediction error (Breiman, 2001). Table S1 for a list of these). The parameters used were: peak We performed the analyses for 82 species with samples selected grouping after retention time ‘groupFWHM’ (perfwhm: 0.8); ver- from a single site, as well as for 26 species found at two to four ify grouping ‘groupCorr’; annotate isotopes ‘findIsotopes’; and sites. Analyses were performed using the RANDOMFOREST R pack- annotate adducts ‘findAdducts’ (polarity = ‘negative’). For each age (Liaw & Wiener, 2002). R code for all of the analyses is pro- case study, the resulting peak tables for each species were com- vided in Methods S1 and S2. bined into a single peak table (m/z and retention time for each peak) using the R package METAXCMS (Tautenhahn et al., 2011) Next-generation DNA sequence data with the following parameters: peak filtering: none; m/z and retention time tolerance: 0.05 and 12 s, respectively. Peak tables To determine the accuracy of our approach, we compared delimi- are stored in MetaboLights (study ID: MTBLS574, https:// tations based on chemocoding with the first resolved phylogeny www.ebi.ac.uk/metabolights/), a publicly available database of Inga, accomplished through targeted enrichment and sequenc- (https://www.ebi.ac.uk/metabolights/MTBLS574). ing of 194 loci (259 313 bases; Nicholls et al., 2015; our unpub- lished data). Due to Inga’s recent, rapid radiation (Richardson et al., 2001), a previous phylogeny with over 6 kb of plastid and Statistical analysis nuclear DNA sequence did not resolve species-level relationships The variation in metabolites across samples was quantified using fully (Kursar et al., 2009). unsupervised multivariate methods (no prior classification of samples), which is suitable for metabolomics data. For the first Results four case studies, we chose methods for data reduction and pat- tern recognition that group and visualize samples according to Case study 1: two morphologically confusing or cryptic their similarities without prior assignment of samples to classes species that are present at the same site (Bartel et al., 2013). To visualize grouping patterns across samples, we used hier- Inga alata Benoist and Inga pezizifera Benth are sister species archical clustering with multiple agglomerative algorithms (Nicholls et al., 2015; our unpublished data; Fig. 2c). The because this method works well for a limited number of saplings can only be successfully differentiated by expert field species (classes) and provides statistical power (Embrechts workers using subtle differences in the shape of the extrafloral et al., 2013). Peak intensities, or the total ion current (TIC), nectaries, the number of leaflets, the number of primary lateral were normalized by dividing by the sum of the TIC for all leaf veins and the color of the expanding leaves (Fig. S1). In addi- features in the chromatogram of a sample. Subsequently, we tion, they differ in the morphology of their inflorescences

New Phytologist (2018) Ó 2018 The Authors www.newphytologist.com New Phytologist Ó 2018 New Phytologist Trust New Phytologist Research 5

(a) 100 I. alata (N36)

50

0 Fig. 2 Case study 1. Two morphologically 100 confusing or cryptic species that are present I. pezizifera (N4) at the same site: Inga alata and I. pezizifera in French Guiana. (a) Total ion 50 chromatograms showing relative intensities Total ion current (%) Total of peaks from LC-QToF-MS in negative 0 mode. (b) Hierarchical cluster dendrograms 5 1015202530 based on relative abundances of UPLC-MS Time (min) metabolites. The numbers in red above each (b) (c) branch point are the Approximately pezizifera_Brazil_M40A Unbiased confidence levels; these indicate AU 100 100 pezizifera_FG_FG21_N4 the probability that the samples below that 92 pezizifera_FG_FG10825_N4 point are a cluster. Clusters with values of 95 98 pezizifera_Brazil_M51A signify P = 0.05, indicating that these clusters 100pezizifera_Peru_KD716 96 pezizifera_Peru_KD323 are strongly supported by the data. (c) The pezizifera_Panama_BCI8592 clade containing I. alata and I. pezizifera was 100 I. alata I. pezizifera pezizifera_Panama_BCI8577 adapted from a resolved phylogeny based on 100 100 100 alata_Peru_KD402 next-generation DNA sequence data 97 100 98 99 100 alata_Peru_KD1331 100 (Nicholls et al., 2015; our unpublished data). 100alata_FG_FG82_N36 Numbers in black represent bootstrap 100 alata_FG_FG63_N36 support values. Values > 95 indicate that the N4_1455 100alata_EC_TI982 N36_1501 N36_1507 N36_1502 N36_1503 alata_EC_TI1961 N4_1472 N4_1473 N4_1576 N4_1575 clade is strongly supported by the data. N4_1504 N4_1505

(Pennington, 1997), but this feature is not available in the Case study 2: a single species that shows morphological saplings studied by many ecologists. We investigated how variation within a site chemocoding might be useful to separate these two confusing species in Nouragues, French Guiana, where one species is often Inga acreana Harms is a widely distributed species across South found meters away from the other. America, from the Guyanas to the Amazon Basin in Colombia, Consistent with DNA sequence differences (Fig. 2c), Ecuador, Peru and Bolivia (Pennington, 1997; Pennington & chemocoding accurately determined species limits for five saplings Revelo, 1997). In the Tiputini Biological Station in Ecuador, it each of the two species. The profiles of secondary metabolites comprises two distinct types that have the same morphology, but showed visually evident differences between species (Fig. 2a). that differ subtly in the color of the expanding leaves (T28 and Hierarchical clustering of UPLC-MS metabolomics data (98% T56; T numbers are codes for morphotypes; Fig. S1), and co- AUP (Approximately Unbiased) P-value, Fig. 2b) clustered the occur in floodplain habitats. We used chemocoding of five samples into two distinct groups, one for each species. saplings of each of these two leaf variants to determine if they We evaluated four further groups of species that are hard to were the same or different chemotypes. separate morphologically, and coexist at a single site (Figs S2– The two morphotypes showed no consistent metabolomic dif- S5). For three of these, chemocoding-based separation agreed ferences (Fig. 3a). Hierarchical clustering models fitted to the with DNA sequence differences (Fig. S2: I. coruscans and UPLC-MS data reveal no separate clusters (Fig. 3b). These results I. laurina in Peru, Fig. S3: I. umbellifera T50, T72 and T73 in agree with DNA data; the phylogeny from targeted enrichment Ecuador (where T numbers correspond to codes for morphotypes data placed these accessions representing these two morphotypes in Tiputini, Ecuador), and Fig. S4: I. chartaceae and into the same, otherwise unstructured, monophyletic group I. sapindoides in Ecuador). For the fourth supplemental case, (Fig. 3c). The color difference between the two morphotypes is chemocoding found substantial differences in secondary metabo- probably due to a difference in anthocyanin production, a form lites between two morphotypes of I. leiocalycina (T65 and T86, of intraspecific variation sometimes seen in expanding leaves. Fig. S5). T86 occurs in terra firme forests and T65 in floodplains in the Ecuadorian Amazon, whereas DNA data placed these two Case study 3: distinguishing among two morphologically morphotypes into a monophyletic group (Fig. S5d). This may be similar species across three sites the result of plasticity in response to differences in habitat (although we found no intermediate morpho- or chemotypes). Cross-checking identifications amongst morphologically similar Alternatively, these may reflect strong selection in the face of gene tree species that occur at different sites is particularly challenging flow across this environmental gradient or these may be distinct in tropical forests (Baker et al., 2017). Inga brachystachys Ducke species. and Inga obidensis Ducke are closely related (Fig. 4d), with

Ó 2018 The Authors New Phytologist (2018) New Phytologist Ó 2018 New Phytologist Trust www.newphytologist.com New 6 Research Phytologist

(a)

Fig. 3 Case study 2. A single species that shows morphological variation within a site: Inga acreana T28 and I. acreana T56 in Ecuador. (a) Total ion chromatograms Time (min) showing relative intensities of peaks from (b) (c) LC-QToF-MS in negative mode. (b) Hierarchical clustering based on relative abundances of UPLC-MS metabolites. The numbers in red above each branch point are the Approximately Unbiased confidence levels; these indicate the probability that the samples below that point are a cluster. Clusters with values of 95 signify P = 0.05, indicating that these clusters are strongly supported by the data. (c) Clade containing I. acreana T28 and I. acreana T56 adapted from a resolved phylogeny based on next- generation DNA sequence data (Nicholls et al., 2015; our unpublished data).

overlapping distributions across the Amazon (Pennington, 1997; Amazonia, and then within each of these groups the samples clus- Pennington & Revelo, 1997). They are morphologically very ter by population (Fig. 5c). Thus, chemocoding shows consis- similar in the vegetative state, making it problematic to assign tency in species characterization, despite some evidence of genetic accurate species names (Fig. S1). population structure without corresponding structuring of chem- Chemocoding and hierarchical clustering of five saplings istry. A similar pattern was observed for another widespread from each of the three sites delimited the samples into two species: I. pezizifera (Fig. S6). A second widespread species, groups, separating the French Guiana and Brazil samples from I. alba, showed no geographic structure either in chemistry or in those collected in Peru (100% AUP value, Fig. 4c). Together, DNA (Fig. S7). DNA sequence data (Fig. 4c), chemocoding and morphology suggest that samples collected in Brazil and French Guiana are Case 5: identification of a large number of species, a single species, I. obidensis, and that the chemically distinct including comparisons among populations of widespread Peruvian samples may represent a different, as yet unidentified species species that, in its vegetative morphology, is similar to I. brachystachys. The detailed case studies presented above allow a small number of samples to be grouped by chemical similarity and statistically validated without prior classification of the samples (unbiased). Case study 4: identification of a widespread species across Additionally, we sought to evaluate how often chemocoding its range could correctly identify samples to species when we include a To assess the variation of chemocoding profiles across geographic large number of classes (species). To this end, we use statistical space in a widespread species, we collected data on five saplings learning techniques in a supervised strategy (prior classification of per population of Inga auristellae Harms, a species that occurs the samples) to build a model based on metabolomics data for across northern South America. These came from four geographi- identification of samples to species. Specifically, we used Random cally separated populations: French Guiana, Ecuador, Brazil and Forest to assess the overall accuracy of chemocoding to distin- Peru (Fig. 1). guish between the 82 species of Inga that we sampled at the five The different populations showed consistency in their chem- study sites (five saplings per species per site). Random Forest istry across their wide geographic range (Fig. 5a). The hierarchical works by creating many classification trees each trained using ran- clustering model shows no significant differences between dom bootstrapped samples from the original metabolomics data. I. auristellae populations from different geographic areas (94% A consensus classification is then chosen based on the majority AUP value, Fig. 5b). The DNA sequences analyses show that vote from all trees (Breiman et al., 1984). Of the five samples per although the four populations belong to the same species, there is species we iteratively dropped one sample to train the model with a strong geographic structure, with an initial split into east vs west four samples and test the predictive accuracy with the fifth

New Phytologist (2018) Ó 2018 The Authors www.newphytologist.com New Phytologist Ó 2018 New Phytologist Trust New Phytologist Research 7

(a) (b) AU 100 I. obidensis Brazil (M4)

50 hystachys ru I. obidens 0 French 100 I. obidensis French Guiana 100 (N8)

50 LA32_661 M4_734 LA32_1535 M4_732 LA32_659 LA32_660 N8_1249 N8_1541 N8_1670 N8_1686 N8_1689 M4_1689 N8_1542 M4_1686 LA32_1548

0 (c)

Total ion current (%) Total 100 I. brachystachys cf. Peru umbratica_Ecuador_TI2039 (LA32) 100 umbratica_Peru_KD617 50 52 100 brachystachys_cf_LA32_Peru obidensis_FG_N8 100 0 5 1015202530 obidensis_Brazil_M4C Time (min)

Fig. 4 Case study 3. Distinguishing among two morphologically similar species across three sites: Inga obidensis from Brazil and French Guiana and I. brachystachys from Peru. (a) Total ion chromatograms showing relative intensities of peaks from LC-QToF-MS in negative mode. (b) Hierarchical clustering based on relative abundances of UPLC-MS metabolites. The numbers in red above each branch point are the Approximately Unbiased confidence levels; these indicate the probability that the samples below that point are a cluster. Clusters with values of 95 signify P = 0.05, indicating that these clusters are strongly supported by the data. (c) The clade containing I. obidensis and I. cf brachystachys was adapted from a resolved phylogeny based on next-generation DNA sequence data (Nicholls et al., 2015; our unpublished data). sample, such that each sample was used once for testing and four problematic for species-rich regions and highly species-rich times for training. genera (Seberg & Petersen, 2009; Dick & Webb, 2012). And The resulting Random Forest model accurately classifies 94% it is precisely these diverse situations where accurate species of the 410 individuals representing the 82 Inga species (each with identifications are most crucial. Alternative new technologies representatives from a single site, Table S2). Twenty-six of these for plant identification in tropical trees include the use of species occurred at two to four of the study sites, so we also exam- near-infrared (NIR) leaf spectroscopy. However, its potential as ined the model’s classification accuracy when regional variation a taxonomic tool has not been assessed across broad geographic across sites was included. In this case, our analyses correctly classi- scales for widespread species (Dugarte et al., 2013; Lang et al., fied samples to species 96% of the time (Table S3). In addition, 2015; Baker et al., 2017). 90% of the time the classification model identified the correct Here, we add chemocoding to the toolbox, and show that species and site, indicating that there were regional differences in small, defense-related chemical markers characterized via untar- secondary metabolites within a species (Table S3). Overall, these geted metabolomics have great potential for species identification results demonstrate that even in the face of cross-site intraspecific and in providing additional evidence for species delimitation and variation, there is sufficiently high interspecific variation to allow taxonomic discovery. Chemocoding was a robust method for unknown samples to be efficiently classified into units that corre- species identification at a single site and across broad geographic spond to species as defined by morphology and DNA (Table S2). scales, even for Inga, where levels of interspecific morphological variation are low and DNA barcoding is ineffective (Gonzalez et al., 2009; Kress et al., 2009; Dexter et al., 2010). Discussion

The need for new tools Accuracy of identifications The urgent need to catalog, manage and understand the ecol- An effective species identification method must be diagnostic of ogy and evolution of plant diversity has led to the develop- species (recognizing all populations rather than only sub-specific ment of innovative tools to improve the discrimination and units or populations), and involve traits that are always present identification of plant species. Traditional morphological- and (rather than inducible or otherwise phenotypically plastic). Our molecular-based taxonomic identification methods have proved analysis of Inga shows that entities identified as discrete

Ó 2018 The Authors New Phytologist (2018) New Phytologist Ó 2018 New Phytologist Trust www.newphytologist.com New 8 Research Phytologist

(a) 100 (b) AU French Guiana

50 (N6)

0 100 99 Brazil (M36) 100 95 96 50 98 100 LA08_1443 M36_1572

0 N6_1300 100 M36_155 M36_763 T59_984 N6_1299 T59_991 T59_986 N6_1301 M36_761 M36_762 T59_992 T59_985 LA08_646 LA08_645 N6_1298 N6_1302 LA08_644

Ecuador (T59) LA08_1739 (c) 100 auristellae_Peru_KD10 50 auristellae_Peru_TAKPDC1501 100 100 auristellae_EC_TI404

Total ion current (%) Total auristellae_EC_TI1229 100 auristellae_FG_FGW 0 100 100 100 auristellae_FG_113 Peru (LA08) 100auristellae_Br_M36AA 94 auristellae_Br_M36A 100 cylindrica_Peru_KD29 50 cylindrica_FG_FG35 100 100 coruscans_Peru_TAKPDC1635 100 marginata_cf_EC_TI347 0 T29_EC_TI1728 5 1015202530 62 100 T29_EC_TI1978 Time (min) T29_EC_TI465

Fig. 5 Case study 4. Identification of a widespread species across its range: Inga auristellae. (a) Total ion chromatograms showing relative intensities of peaks from LC-QToF-MS in negative mode. (b) Hierarchical clustering based on relative abundances of UPLC-MS metabolites. The numbers in red above each branch point are the Approximately Unbiased confidence levels; these indicate the probability that the samples below that point are a cluster. Clusters with values of 95 signify P = 0.05, indicating that these clusters are strongly supported by the data. (c) The clade containing I. auristellae was adapted from a resolved phylogeny based on next-generation DNA sequence data (Nicholls et al., 2015; our unpublished data).

morphospecies or phylogenetic taxa can indeed show constitutive Chemocoding is unlikely to identify species reliably where differences in chemical defenses that result in distinct major components of plant chemistry show phenotypic plasticity. chemocodes. For example, our results show that chemocoding Rather than being constitutive, chemistry could be age- correctly identified species at a single site and, most importantly, dependent (ontogeny or tissue age) or inducible by herbivores, across their ranges (Figs 2–5, S2–S9; Table S3). Even with 410 pathogens, light, etc. Since this could generate significant within- individuals from 82 species of Inga, with a given species occur- site variation, we recognize that chemocoding may not work for ring at multiple sites, the Random Forest Analysis based on species where important chemical markers vary. However, studies chemocodes accurately classified 96% of the individuals with several species-rich genera in the tropics have found that (Table S2). It also appears to be a robust method for identifying inter-specific differences in the defensive metabolome are large widespread species where intraspecific geographic variation might relative to intra-specific variation, even considering factors that be problematic. For example, chemocoding of the widely scat- are recognized as generators of plastic variation such as leaf tered populations of I. auristellae resolves them as a single group, ontogeny (expanding vs mature leaves; Sedio et al., 2017; Wig- which is also resolved as monophyletic in our DNA sequence- gins et al., 2016), light environment (sun vs shade; Sinimbu et al., based phylogeny (Fig. 5). Nevertheless, it is important to con- 2012; Bixenmann et al., 2016; Sedio et al., 2017), season (dry vs sider that depending on levels of migration, polymorphism and wet; Sedio et al., 2017) and induction by herbivory (Bixenmann selection, other outcomes are possible. In particular, some species et al., 2016). Even though plasticity may be an issue in some taxa, that are widespread may show significant divergence in chemistry it does not rule out useful application of chemocoding, but high- (e.g. I. alata (Fig. S8) and I. marginata (Fig. S9)). Hence, we cau- lights the need to separate diagnostic from phenotypically plastic tion that the efficacy of chemocode-based identification should characters. be explored in each candidate taxon. However, we conclude that given the abundance and diversity of Inga species in neotropical Practicality forests, and the difficulty of identifying them using morphologi- cal characters (particularly in sterile material), chemocodes An identification method also needs to be practical. In other provide a valuable taxonomic tool. words, it needs to be accurate, rapid and inexpensive, as is the

New Phytologist (2018) Ó 2018 The Authors www.newphytologist.com New Phytologist Ó 2018 New Phytologist Trust New Phytologist Research 9 case with DNA barcoding. However, in groups where barcoding straightforward. In addition, chemical analysis is largely free of using standard markers cannot discriminate among species, contamination issues, which are a serious concern in DNA bar- sequencing of many genes may be necessary, which can be time- coding analyses, where specificity of primers may be low consuming and expensive. In the case of Inga, obtaining the (Hollingsworth et al., 2011). Other components could be simpli- resolved phylogeny took several years and hundreds of thou- fied to streamline the analysis. These include collecting mature sands of dollars (Nicholls et al., 2015). Furthermore, only a few instead of expanding leaves. Most often, rainforest plants do not individuals of each species were sequenced. However, as per- have expanding leaves, restricting chemocoding to a minority of base pair prices for sequencing are dropping in next-generation saplings at any given point in time and reducing the utility of sequencing approaches, discriminating among species based on chemocoding. We found that mature leaves have most, but not DNA sequences from many hundreds of loci may become more all, of the chemical signals found in expanding leaves (Lokvam feasible. et al., 2007; Wiggins et al., 2016), so the use of mature leaves For widespread use of chemocoding, we envision the creation should be feasible. Additionally, we used UPLC with a 150-mm of a public library of reference ‘chemocodes’, analogous to iBOL column, followed by detection with a high-resolution time-of- (ibol.org). In principal, chemocodes could be similar to barcodes flight mass spectrometer, an expensive analysis. To simplify this, in that they employ a limited set of compounds that are both we recommend using a shorter, 50-mm column, saving solvent constitutively produced and diagnostic of species. Instead, our and instrument time, followed by detection with a quadrupole method relies on the entire chemical fingerprint (typically a suite mass spectrometer. Single quadrupole, triple quadrupole or ion of > 100 compounds) to identify species. Our choice is based on trap detectors are much less expensive than a time-of-flight spec- the variation observed in single species within and across sites trometer. While these provide lower mass resolution, both nega- (e.g. I. auristellae, Fig. 5), taking into account variation caused by tive and positive mode data can be obtained in a single run, ontogeny. something that generally is not possible with a time-of-flight We have successfully tested chemocoding on taxa from other spectrometer. Based on our extensive work on Inga, only some of groups, such as species from the families Euphorbiaceae, Mal- which is presented here, most pairs of similar species should be vaceae, Moraceae, Rubiaceae and Violaceae, among others (data distinguishable using the proposed simplifications. There are not shown), suggesting that our approach works with groups some cases where two species have similar chemistry and mor- other than Inga. Our data are publicly available (see the Materials phology and the simpler chemical analyses may lump these into a and Methods section) allowing others to attempt to develop diag- single chemotype. But our experience suggests that these would nostic compounds for species identification. Nevertheless, the show high ‘within-chemotype’ variation, indicating the need for most challenging issue to address before chemocoding can be more sophisticated chemical methods that can effectively answer widely used will be the application of our approach across differ- the question at hand. In our hands, chemocoding gave clear ent laboratories. For this, one must ensure that the same results and was practical in terms of time and cost. Using the sim- metabolic traits are used in all laboratories. In contrast to DNA plifications suggested above would decrease cost and time. We barcoding, which uses standardized markers across taxa, each estimate that manual extraction and automated chromatography species is scored using different traits. At present, we do not know and data analysis could take 20 min and $20.00 per sample at the the extent to which chemical fingerprints will differ, depending time of writing. In summary, because chemocoding may work in for example on the exact column used for liquid chromatography circumstances where barcoding does not, we propose that it pre- or the exact model of mass spectrometer used. To address this sents a novel and practical approach for surveying large numbers issue such that chemocoding can be applied generally may require of individuals, possibly thousands of samples. a more rigorous approach, in particular the application of tandem MS or MS/MS (instead of simplifying the analysis as suggested Conclusions below). Another issue is that, while we used compounds with intermediate polarity for chemocoding of Inga, nonpolar com- Our aspiration is not to claim that chemocoding will replace pounds such as terpenes or highly polar compounds such as non- DNA barcoding or morphology-based identification methods, protein amino acids may work best with other clades. nor that chemocoding can determine species boundaries. Because ecological studies based on long-term monitoring Instead we suggest that it is a tool that provides a valuable addi- plots require the accurate identification of thousands of juvenile tional source of data to facilitate identification of plant species, and adult trees, we consider here whether chemocoding could be especially for groups where traditional methods fall short. Che- applied to large numbers of samples. Currently, we have used mocoding may be especially valuable in identifying species in chemocoding rather than DNA-based analyses for classifying recent radiations where morphological distinctions between over 1000 samples of Inga and have found chemocoding to be species are slight and standard barcode markers do not provide effective and convenient. sufficient resolution. However, it could also be used more In terms of scaling up, one consideration is that these analyses broadly for species identification in cases where hundreds or can be carried out more expediently. The methods used in the thousands of samples need analysis, with the added benefit of present study range from easily accomplished to more complex providing information on defensive metabolites. In general, it methods. For example, sample preparation (e.g. drying) in the can help by standardizing species names across multiple sites, field and sample extraction in the laboratory are both and even in pinpointing entities that may be species new to

Ó 2018 The Authors New Phytologist (2018) New Phytologist Ó 2018 New Phytologist Trust www.newphytologist.com New 10 Research Phytologist

science. Such is the case for our third example (case study 3: References I. obidensis and I. cf. brachystachys in Brazil, French Guiana and Baker TR, Pennington RT, Dexter KG, Fine PVA, Fortune-Hopkins H, Peru), where chemocoding and DNA suggest that the samples Honorio EN, Huamatunpa-Chuquimaco I, Klitgard BB, Lewis GP, de Lima collected in Peru might represent a new species. Our approach HC et al. 2017. Maximizing synergy among tropical plant systematists, is especially amenable for field biologists who work in networks ecologists, and evolutionary biologists. Trends in Ecology & Evolution 32: 258– of forest inventory plots since it can consistently distinguish 267. Bartel J, Krumsiek J, Theis FJ. 2013. Statistical methods for the analysis of high- amongst multiple species across geographic space (Figs 4, 5; throughput metabolomics data. Computational and Structural Biotechnology Tables S2, S3), and hence can help in taxonomic integration Journal 4: e201301009. across plots. Benton P, Want EJ, Ebbels TMD. 2010. Correction of mass calibration gaps in Experience with some taxa may show that chemocodes could liquid chromatography–mass spectrometry metabolomics data. Bioinformatics – help to distinguish groups of individuals that show similar plastic 26: 2488 2489. Bixenmann RJ, Coley PD, Weinhold A, Kursar TA. 2016. Higher herbivore responses to shared abiotic environments or natural enemies (re- pressure favors constitutive over induced defense. Ecology and Evolution 6: gardless of whether these correspond to species, e.g. Fig. S5). If 6037–6049. so, chemocoding could be a very valuable way of dividing indi- Breiman L. 2001. Random forests. Machine Learning 45:5–32. vidual plants into ecologically significant sets and improving our Breiman L, Friedman JH, Olshen RA, Stone CJ. 1984. Classification and understanding of the plant–herbivore adaptive landscape. Given regression trees. Monterey, CA, USA: Wadsworth. de Carvalho MR, Bockmann FA, Amorim DS, de Vivo M, de Toledo-Piza M, the key role of plant chemistry in many aspects of plant–herbi- – Menezes NA, de Figueiredo JL, Castro R, Gill AC, McEachran JD. 2005. vore enemy interactions, it may be tremendously valuable to see Revisiting the taxonomic impediment. Science 307: 353. plant communities in terms of chemotypes of hosts experienced Chen S, Yao H, Han J, Liu C, Song J, Shi L, Zhu Y, Ma X, Gao T, Pang X et al. by herbivores (Endara et al., 2017). 2010. Validation of the ITS2 region as a novel DNA barcode for identifying The use of metabolites as tools in systematics has a long history medicinal plant species. PLoS ONE 5: e8613. China Plant BOL Group, Li DZ, Gao LM, Li HT, Wang H, Ge XJ, Liu JQ, and many antecedents (Gibbs, 1974; Smith, 1976; Harborne & Chen ZD, Zhou SL, Chen SL et al. 2011. Comparative analysis of a large Turner, 1984). While this has traditionally been used to investi- dataset indicates that internal transcribed spacer (ITS) should be incorporated gate evolutionary relationships, based on the presence, absence into the core barcode for seed plants. Proceedings of the National Academy of and distinctive structures of specific classes of secondary metabo- Sciences, USA 108: 19641–19646. lites in different groups at all taxonomic levels (Singh, 2016), Costion C, Ford A, Cross H, Crayn D, Harrington M, Lowe A. 2011. Plant DNA barcodes can accurately estimate species richness in poorly known floras. chemocoding differs in that it is designed to quantitatively dis- PLoS ONE 6: e26841. criminate species across samples. This uses an unsupervised statis- Cristescu ME. 2014. From barcoding single individuals to metabarcoding tical approach that will probably yield consistent results biological communities: towards an integrative approach to the study of global independently of a priori ideas of species classifications. We see biodiversity. Trends in Ecology & Evolution 29: 566–571. great potential in chemocoding to assist in the inventory of Cutler A, Cutler DR, Stevens JR. 2009. Tree-based methods. In: Li X, Xu R, eds. High dimensional data analysis in cancer research. New York, NY, USA: species-rich forests and potentially in the discovery of new Springer, 83–101. species. Cutler DR, Edwards KH, Beard AC, Kyle TH, Jacob G, Joshua JL. 2007. Random forests for classification in ecology. Ecology 88: 2783–2792. Dexter KG, Pennington TR, Cunningham CW. 2010. Using DNA to assess Acknowledgements errors in tropical tree identifications: how often are ecologists wrong and when does it matter? Ecological Monographs 80: 267–286. We thank the Ministry of Environment of Ecuador and the Min- Dick CW, Webb CO. 2012. Plant DNA barcodes, taxonomic management and istry of Agriculture of Peru for granting the research and exporta- species discovery in tropical forests. In: Kress WJ, Erickson DL, eds. DNA tion permits. Valuable field assistance was provided by Zachary barcodes: methods and protocols. Totowa, NJ, USA: Humana Press, 379–383. Benavidez, Allison Thompson, Yamara Serrano and Mayra Dugarte FM, Higuchi N, Almeida A, Vicentina A. 2013. Species spectral Ninazunta. This work was supported by grants from the National signature: discriminating closely related species in the Amazon with near- infrared leaf-spectroscopy. Forest Ecology and Management 291: 240–248. Science Foundation (DEB-0640630 and DIMENSIONS of Bio- Embrechts MJ, Gatti CJ, Linton J, Roysam B. 2013. Hierarchical clustering for diversity DEB-1135733), and Nouragues Travel Grants Pro- large data sets. In: Georgieva P, Mihaylova L, Jain LC, eds. Advances in gram, CNRS, France, to T.A.K. and P.D.C., and the Secretarıa intelligent signal processing and data mining: theory and applications. New York, Nacional de Educacion Superior, Ciencia, Tecnologıa e Inno- NY, USA: Springer, 197–233. vacion del Ecuador (SENESCYT) to M-J.E. Endara MJ, Coley PD, Ghabash G, Nicholls JA, Dexter KG, Donoso DA, Stone GN, Pennington RT, Kursar TA. 2017. Coevolutionary arms race versus host defense chase in a tropical-herbivore plant system. Proceedings of the National Author contributions Academy of Sciences, USA 114: E7499–E7505. Gemeinholzer B, Oberprieler C, Bachmann K. 2006. Using GenBank data for M-J.E., P.D.C., N.L.W., D.L.F. and T.A.K. designed and con- plant identification: possibilities and limitations using the ITS 1 of Asteraceae – ducted the research. M-J.E., N.L.W. and D.L.F. designed and species belonging to the tribes Lactuceae and Anthemideae. Taxon 55: 173 187. performed the data analysis. G.C.Y. contributed to the Gibbs RD. 1974. Chemotaxonomy of flowering plants. Montreal, QC, Canada: metabolomic analysis. J.A.N., R.T.P., K.G.D., C.A.K. and McGill’s Queen’s University Press. G.N.S. contributed the next-generation DNA sequence data. Gonzalez MA, Baraloto C, Engel J, Mori SA, Petronelli P, Riera B, Roger A, M-J.E., P.D.C., N.L.W., D.L.F., J.A.N., R.T.P., K.G.D., Thebaud C, Chave J. 2009. Identification of Amazonian trees with DNA C.A.K., G.N.S. and T.A.K. wrote the manuscript. barcodes. PLoS ONE 4: e7483.

New Phytologist (2018) Ó 2018 The Authors www.newphytologist.com New Phytologist Ó 2018 New Phytologist Trust New Phytologist Research 11

Grimaldi C, Riera B. 2001. Geography and climate. In: Bongers F, Charles- Radtke MG, Da Fonseca CRV, Williamson GB. 2007. The old and young Dominique P, Forget PM, Thery M, eds. Nouragues: dynamics and plant– Amazon: dung beetle biomass, abundance and species diversity. Biotropica 39: animal interactions in a Neotropical rain forest. New York, NY, USA: Springer- 725–730. Science + Business Media. Razafimandimbison SG, Kellogg EA, Bremer B. 2004. Recent origin and Harborne JB, Turner BL. 1984. Plant chemosystematics. London, UK: Academic phylogenetic utility of divergent ITS putative pseudogenes: a case study from Press. Naucleeae (Rubiaceae). Systematic Biology 53: 177–192. Hollingsworth PM, Forrest LL, Spouge JL, Hajibabaei M, Ratnasingham S, van Reeb PD, Bramardi SJ, Steibel JP. 2015. Assessing dissimilarity measures for der Bank M, Chase MW, Cowan RS, Erickson DL, Fazekas AJ et al. 2009. A sample-based hierarchical clustering of RNA sequencing data using plasmode DNA barcode for land plants. Proceedings of the National Academy of Sciences, datasets. PLoS ONE 10: e0132310. USA 106: 12794–12797. Richardson JE, Pennington RT, Pennington TD, Hollingsworth PM. 2001. Hollingsworth PM, Graham SW, Little DP. 2011. Choosing and using a plant Rapid diversification of a species-rich genus of neotropical rain forest trees. DNA barcode. PLoS ONE 6: e19254. Science 293: 2242–2245. Kress WJ, Erickson DL, Jones FA, Swenson NG, Perez R, Sanjur O, Seberg O, Petersen G. 2009. How many loci does it take to DNA barcode a Bermingham E. 2009. Plant DNA barcodes and a community phylogeny of a Crocus? PLoS ONE 4: e4598. tropical forest dynamics plot in Panama. Proceedings of the National Academy of Sedio BE, Rojas Echeverri JC, Boya CA, Wright JS. 2017. Sources of variation Sciences, USA 106: 18621–18626. in foliar secondary chemistry in a tropical forest tree community. Ecology 98: Kress WJ, Erickson DL, Swenson NG, Thompson J, Uriarte M, Zimmerman 616–623. JK. 2010. Advances in the use of DNA barcodes to build a community Singh R. 2016. Chemotaxonomy: a tool for plant classification. Journal of phylogeny for tropical trees in a Puerto Rican forest dynamics plot. PLoS ONE Medicinal Plant Studies 4:90–93. 5: e15409. Sinimbu G, Coley PD, Lemes MR, Lokvam J, Kursar TA. 2012. Do the Kress WJ, Garcıa-Robledo C, Uriarte M, Erickson DL. 2015. DNA barcodes for antiherbivore traits of developing leaves in the Neotropical tree ecology, evolution, and conservation. Trends in Ecology & Evolution 30:25–35. Inga paraensis () vary with light availability? Oecologia 170: 669– Kuhl C, Tautenhahn R, Bottcher C, Larson TR, Neumann S. 2012. CAMERA: 676. an integrated strategy for compound spectra extraction and annotation of Smith PM. 1976. The chemotaxonomy of plants. London, UK: Edward Arnold. liquid chromatography/mass spectrometry data sets. Analytical Chemistry 84: Smith CA, Want EJ, O’Maille G, Abagyan R, Siuzdak G. 2006. XCMS: 283–289. processing mass spectrometry data for metabolite profiling using nonlinear Kursar TA, Dexter KG, Lokvam J, Pennington RT, Richardson JE, Weber MG, peak alignment, matching, and identification. Analytical Chemistry 78: Murakami ET, Drake C, McGregor R, Coley PD. 2009. The evolution of 779–787. antiherbivore defenses and their contribution to species coexistence in the ter Steege H, Pitman NCA, Sabatier D, Baraloto C, Salom~ao RP, tropical tree genus Inga. Proceedings of the National Academy of Science, USA Guevara JE, Phillips OL, Castilho CV, Magnusson WE, Moline JF 106: 18073–18078. et al. 2013. Hyperdominance in the Amazonian tree flora. Science 342: Lang C, Costa FRC, Camargo JLC, Durgante M, Vicentini A. 2015. Near 1243092. infrared spectroscopy facilitates rapid identification of both young and mature Suzuki R, Shimodaira H. 2014. pvclust: hierarchical clustering with P-values via Amazonian tree species. PLoS ONE 10: e0134521. multiscale bootstrap resampling. Bioinformatics 22: 1540–1542. Lawrence RL, Shana DW, Roger LS. 2006. Mapping invasive plants using Tautenhahn R, B€ottcher C, Neumann S. 2008. Highly sensitive feature hyperspectral imagery and Breiman Cutler Classifications (randomForest). detection for high resolution LC/MS. BMC Bioinformatics 9: 504. Remote Sensing of Environment 100: 356–362. Tautenhahn R, Patti GJ, Kalisiak E, Miyamoto T, Schmidt M, Lo FY, McBee J, Leigh EG. 1999. Tropical forest ecology: a view from Barro Colorado Island. New Baliga NS, Siuzdak G. 2011. MetaXCMS: second-order analysis of untargeted York, NY, USA: Oxford University Press. metabolomics data. Analytical Chemistry 83: 696–700. Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News Valencia R, Balslev H, Mino~ GP. 1994. High tree a diversity in Amazonian 2:18–22. Ecuador. Biodiversity and Conservation 3:21–28. Liu J, Yan HF, Newmaster SG, Pei N, Ragupathy S, Ge XJ. 2015. The use of Valencia R, Foster RB, Villa G, Condit R, Svenning J-C, Hernandez C, DNA barcoding as a tool for the conservation biogeography of subtropical Romoleroux K, Losos E, Magard E, Balslev H. 2004. Tree species forests in China. Diversity of Distributions 21: 188–199. distributions and local habitat variation in the Amazon: large forest plot in Lokvam J, Clausen TP, Grapov D, Kursar TA. 2007. Galloyl depsides of eastern Ecuador. Journal of Ecology 92: 214–229. tyrosine from young leaves of Inga laurina. Journal of Natural Products 70: Valentini A, Pompanon F, Taberlet P. 2009. DNA barcoding for ecologists. 134–136. Trends in Ecology & Evolution 24: 110–117. Mace GM. 2004. The role of in species conservation. Philosophical Vernooy R, Haribabu E, Muller MR, Vogel JH, Hebert PDN, Schindel DE, Transactions of the Royal Society of London Series B: Biological Sciences 359: 711– Shimura J, Singer GAC. 2010. Barcoding life to conserve biological diversity: 719. beyond the taxonomic imperative. PLoS Biology 8: e1000417. Naciri Y, Linder HP. 2015. Species delimitation and relationships: the dance of Wiggins NL, Forrister DL, Endara MJ, Coley PD, Kursar TA. 2016. the seven veils. Taxon 64:3–16. Quantitative and qualitative shifts in defensive metabolites define chemical Nicholls JA, Pennington RT, Koenen EJM, Hughes CE, Hearn J, Bunnefeld L, defense investment during leaf development in Inga, a genus of tropical trees. Dexter KG, Stone GN, Kidner CA. 2015. Using targeted enrichment of Ecology and Evolution 6: 478–492. nuclear genes to increase phylogenetic resolution in the neotropical rainforest genus Inga (Leguminosae: Mimosoideae). Frontiers in Plant Science 6:1–20. Pennington TD. 1997. The genus Inga: botany. Kew, UK: The Royal Botanic Supporting Information Gardens, Kew. Pennington RT, Lavin M. 2015. The contrasting nature of woody plant species Additional Supporting Information may be found online in the in different neotropical forest biomes reflects differences in ecological stability. Supporting Information tab for this article: New Phytologist 210:25–37.  Pennington TD, Revelo N. 1997. El genero Inga en el Ecuador. Kew, UK: The Fig. S1 Photographs of the Inga species for each study case. Royal Botanic Gardens, Kew. Pitman N 2007. An overview of the Los Amigos watershed, Madre de Dios, southeastern Peru. October 2007 version of an unpublished report available Fig. S2 Two morphologically confusing or cryptic species that from the author at [email protected] are present at the same site: Inga coruscans and I. laurina in Peru.

Ó 2018 The Authors New Phytologist (2018) New Phytologist Ó 2018 New Phytologist Trust www.newphytologist.com New 12 Research Phytologist

Fig. S3 Two morphologically confusing or cryptic species that Table S1 Mass difference rules used for peak annotation in nega- are present at the same site: Inga umbellifera and I. microcoma in tive mode with CAMERA Ecuador. Table S2 Species accuracy classification from Random Forest Fig. S4 Two morphologically confusing or cryptic species that Analyses for Inga species with representatives from a single site are present at the same site: Inga chartacea and I. sapindoides in Ecuador. Table S3 Species accuracy classification from Random Forest Analyses for Inga species that occurred at two to four of the study Fig. S5 Morphological and chemical variation within a site: Inga sites leiocalycina T65 and I. leiocalycina T86 in Ecuador. Methods S1 R code for LC/MS raw data pre-processing in Fig. S6 Identification of a widespread species across its range: XCMS. Inga pezizifera in Panama, French Guiana and Brazil. Methods S2 R code for Random Forest Analysis. Fig. S7 Identification of a widespread species across its range: Inga alba in French Guiana, Brazil and Peru. Please note: Wiley Blackwell are not responsible for the content or functionality of any Supporting Information supplied by the Fig. S8 Identification of a widespread species across its range: authors. Any queries (other than missing material) should be Inga alata in French Guiana, Ecuador and Peru. directed to the New Phytologist Central Office.

Fig. S9 Identification of a widespread species across its range: Inga marginata in Panama, French Guiana, Ecuador and Peru.

New Phytologist is an electronic (online-only) journal owned by the New Phytologist Trust, a not-for-profit organization dedicated to the promotion of plant science, facilitating projects from symposia to free access for our Tansley reviews and Tansley insights.

Regular papers, Letters, Research reviews, Rapid reports and both Modelling/Theory and Methods papers are encouraged. We are committed to rapid processing, from online submission through to publication ‘as ready’ via Early View – our average time to decision is <26 days. There are no page or colour charges and a PDF version will be provided for each article.

The journal is available online at Wiley Online Library. Visit www.newphytologist.com to search the articles and register for table of contents email alerts.

If you have any questions, do get in touch with Central Office (np-centraloffi[email protected]) or, if it is more convenient, our USA Office (np-usaoffi[email protected])

For submission instructions, subscription and all the latest information visit www.newphytologist.com

New Phytologist (2018) Ó 2018 The Authors www.newphytologist.com New Phytologist Ó 2018 New Phytologist Trust