

Rzanny et al. Plant Methods (2019) 15:77. https://doi.org/10.1186/s13007-019-0462-4

RESEARCH (Open Access)

Flowers, leaves or both? How to obtain suitable images for automated plant identification

Michael Rzanny1*, Patrick Mäder2, Alice Deggelmann1, Minqian Chen2 and Jana Wäldchen1

*Correspondence: mrzanny@bgc-jena.mpg.de. 1 Department of Biogeochemical Integration, Max Planck Institute for Biogeochemistry, Hans-Knöll-Str. 10, Jena, Germany. Full list of author information is available at the end of the article.

Abstract

Background: Deep learning algorithms for automated plant identification need large quantities of precisely labelled images in order to produce reliable classification results. Here, we explore what kind of perspectives and their combinations contain more characteristic information and therefore allow for higher identification accuracy.

Results: We developed an image-capturing scheme to create observations of flowering plants. Each observation comprises five in-situ images of the same individual from predefined perspectives (entire plant, flower frontal and lateral view, leaf top and back side view). We collected a completely balanced dataset comprising 100 observations for each of 101 species, with an emphasis on groups of conspecific and visually similar species, including twelve Poaceae species. We used this dataset to train convolutional neural networks and determine the prediction accuracy for each single perspective and their combinations via score level fusion. Top-1 accuracies ranged between 77% (entire plant) and 97% (fusion of all perspectives) when averaged across species. Flower frontal view achieved the highest accuracy (88%). Fusing flower frontal, flower lateral and leaf top views yields the most reasonable compromise with respect to acquisition effort and accuracy (96%). The perspective achieving the highest accuracy was species dependent.

Conclusions: We argue that image databases of herbaceous plants would benefit from multi organ observations, comprising at least the front and lateral perspective of flowers and the leaf top view.

Keywords: Species identification, Object classification, Multi-organ plant classification, Convolutional networks, Deep learning, Plant images, Computer vision, Plant observation, Plant determination, Plant leaf, Flower, Poaceae

Background

The continuing unprecedented loss of species from ecological communities strongly affects properties, functioning and stability of entire ecosystems [1, 2]. Plants form the basis for many terrestrial food webs and changes in plant composition are known to cascade up through the entire community [3, 4], affecting multiple ecosystem functions [5]. Monitoring and managing the presence or abundance of plant species is therefore a key requirement of conservation biology and sustainable development, but depends on expert knowledge in terms of species identification. However, the number of experts can hardly keep pace with the multitude of determination tasks necessary for various monitoring purposes. Automated plant identification is considered to be the key in mitigating the "taxonomic gap" [6, 7] for many professionals such as farmers, foresters or teachers, in order to improve neophyte management, weed control or knowledge transfer.

Serious proposals to automate this process were already published 15 years ago [8] but have only now become an increasingly reliable alternative [9]. Recent boosts in data availability, accompanied by substantial progress in machine learning algorithms, notably convolutional neural networks (CNNs), pushed these approaches to a stage where they are better, faster, cheaper and have the potential to significantly contribute to biodiversity and conservation research [10]. Well trained automated plant identification systems are now considered to be comparable to human experts in labelling plants on images, given the limited amount of information present in two-dimensional images [11].


A considerable hurdle in this research direction has been the acquisition of qualified training images. Today, the ubiquity of smartphones allows people to capture, digitize, and share their observations, providing large quantities of images which may be utilized for the training of classification algorithms. Worldwide citizen science platforms such as Pl@ntNet [7] and iNaturalist [12] show the great potential of crowd-sourcing vast amounts of image data. However, such images exhibit a wide range of quality. A widely known example is the PlantCLEF dataset [13], which is used as benchmark for various computer vision tasks [14–18]. In this collection, each image is assigned a posteriori to one of seven categories (entire, leaf, leaf scan, flower, fruit, stem and branch). Yet, it is not clear how the results achieved on such a dataset are affected by data imbalance towards image number per species and organs, poor image quality and misidentified species [19]. As there is no dedicated sampling protocol for generating these observations, in most cases observations consist of single images [18] of the whole plant or organs taken from undefined perspectives. Other publicly available benchmark datasets such as Oxford flower 102 [20], MK leaf [21] or LeafSnap [22] usually comprise either leaves or flowers but in no case multi organ observations. A recent approach named WTPlant uses stacked CNNs to identify plants in natural images [23]. This approach explicitly addresses multiple scales within a single picture and aims at analyzing several regions within the image separately, incorporating a preprocessing step with interactive image segmentation.

Even for experienced botanists it is sometimes impossible to provide a definite identification based on a single image [19], because important details might not be visible in sufficient resolution in order to be recognized and distinguished from similar species. Similar to humans, who increase the chance of correctly identifying a plant specimen by observing several organs at the same time, considering more than one viewing angle and taking a closer look at specific organs, combining different perspectives and organs in an automated approach is supposed to increase the accuracy of determination tasks [16, 17]. Especially, separate images of organs and different viewing angles might be beneficial to depict specific small-scaled structures. To our knowledge, the contribution of different perspectives and the fusion of several combinations have never been assessed using a controlled and completely balanced dataset. Therefore, we curated a partly crowd-sourced image dataset comprising 50,500 images of 101 species. Each individual was photographed from five predefined perspectives, and each species is represented by 100 of those completely balanced observations. The entire dataset was created using the freely available Flora Capture smartphone app [24], which was intentionally developed to routinely collect this type of multi organ observation. We reviewed each image to ensure the quality of species identification, allowing us to address our research questions largely independent of any data quality constraints. More specifically, we ask:

• RQ1 Which are the most and the least important perspectives with respect to prediction accuracy?
• RQ2 What gain in accuracy can be achieved by combining perspectives?
• RQ3 How do the accuracies differ among separate CNNs trained per image perspective in contrast to a single CNN trained on all images?
• RQ4 Is the specificity of a perspective or a combination of perspectives universal or species dependent?
• RQ5 How sensitive are identification results to the number of utilized training images?

To answer these questions we trained a convolutional neural network (CNN) classifier for each perspective and used it to explore the information contained in images from different organs and perspectives.

Methods

Image acquisition
All images were collected using the Flora Capture app [24], a freely available smartphone application intended to collect structured observations of plants in the field. Each observation is required to consist of at least five images (cp. Fig. 1). For forbs, the following five perspectives are prescribed. First, entire plant: an image capturing the general appearance of the entire plant (referring to the ramet, i.e. viable individuals within a clonal colony) taken in front of its natural background. Second, flower frontal: an image of the flower from a frontal perspective with the image plane vertical to the flower axis. Third, flower lateral: an image of the flower from a lateral perspective with the floral axis parallel to the image plane. Fourth, leaf top: an image showing the entire upper surface of a leaf. In the case of compound leaves, all leaflets shall be covered by the image. Fifth, leaf back: the same as before but referring to the leaf lower surface. In the case of composite flowers and flower heads forming a functional unity (i.e., Asteraceae, Dipsacaceae) the flower heads were treated as a single flower. In the case of grasses (Poaceae), this scheme is slightly modified. Instead of the flower frontal view, users are requested to take an image of at least one flower from the minimum focusing distance of their device.

Fig. 1 a Dataset overview: each of the 101 species was photographed from five perspectives with 100 repetitions per species. b Examples of a complete observation of a grass species (Poa pratensis, left) and a forb species (Ranunculus acris, right). The perspectives are named entire plant, flower frontal, flower lateral, leaf top and leaf back. Please note that the content definition of the grass perspectives does not match exactly the definition for the forb species as described in the text. Despite the slightly different definitions, the perspective names are the same for grass and forb observations. This figure shows only 100 species for ease of presentation

The flower lateral view always relates to a side view of the whole inflorescence. Images of entire grass leaves would show too little detail and the image would be dominated by the background. Instead, we requested an image of the upper side of the leaf tip (leaf top) and another one taken from the backside in the middle of the leaf (leaf back). These images are also taken at the minimum focusing distance. Despite the slightly different definitions for grass species, we always used the names of the forb perspectives for all species. All images are obtained in situ and the users are instructed not to remove any part of the plant while creating the observation. Especially, photos of the leaf backsides required additional manual effort to arrange the objects appropriately and without damage [25]. In the last step, the observations were uploaded to the Flora Incognita server. The correct species for all observations were determined, validated or corrected by the authors of this paper. The citizen science community of the Flora Incognita project [26] was encouraged to particularly contribute observations of species covered by this experiment. Yet, the majority of observations (especially grasses) were obtained by project members and a number of students with a variety of smartphone models, in different regions and with smartphones interchanged among persons. None of the images was preprocessed in any way. The only qualifying condition for an observation was that five images from the predefined perspectives were taken with a smartphone using the Flora Capture app.

Dataset curation
The 101 species in the dataset have been selected to mainly represent the large plant families and their widely distributed members across Germany (cp. Fig. 2). Nomenclature follows the GermanSL list [27]. Whenever possible we selected two or more species from the same genus in order to evaluate how well the classifiers are able to discriminate between visually very similar species (see Additional file 1: Table S1 for the complete species list). Each individual was flowering during the time of image acquisition.
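Since the only qualifying condition for an observation is that all five predefined perspectives are present, a completeness check is a natural curation step. The sketch below is a hypothetical illustration in Python; the class and field names are assumptions, not code from the Flora Capture app or the Flora Incognita server.

```python
# Minimal, hypothetical sketch of a completeness check for a structured
# observation. Names are illustrative assumptions, not the project's code.
from dataclasses import dataclass, field

REQUIRED_PERSPECTIVES = {
    "entire_plant", "flower_frontal", "flower_lateral", "leaf_top", "leaf_back",
}

@dataclass
class Observation:
    species: str                                  # label assigned after expert review
    images: dict = field(default_factory=dict)    # perspective name -> image path

    def is_complete(self) -> bool:
        """Accept an observation only if every predefined perspective is present."""
        return REQUIRED_PERSPECTIVES.issubset(self.images)

obs = Observation(
    species="Ranunculus acris",
    images={
        "entire_plant": "obs_0001/ep.jpg",
        "flower_frontal": "obs_0001/ff.jpg",
        "flower_lateral": "obs_0001/fl.jpg",
        "leaf_top": "obs_0001/lt.jpg",
        "leaf_back": "obs_0001/lb.jpg",
    },
)
assert obs.is_complete()
```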

Fig. 2 Family membership of the species included in the dataset

Classifier and evaluation
We trained convolutional neural network (CNN) classifiers on the described dataset. CNNs are a network class applicable to deep learning of images that are comprised of one or more convolutional layers followed by one or more fully connected layers (see Fig. 3). CNNs considerably improve visual classification of botanical data compared to previous methods [28]. The main strength of this technology is its ability to learn discriminant visual features directly from the raw pixels of an image. In this study, we used the state-of-the-art Inception-ResNet-v2 architecture [29]. This architecture achieved remarkable results on different image classification and object detection tasks [30]. We used a transfer learning approach, which is a common and beneficial procedure for training classifiers with less than one million available training images [31]. That is, we used a network that was pre-trained on the large-scale ImageNet [32] ILSVRC 2012 dataset before our actual training began. Training used a batch size of 32 with a learning rate of 0.003 and was terminated after 200,000 steps. Because an object should be equally recognizable as its mirror image, images were randomly flipped horizontally. Furthermore, brightness was adjusted by a random factor of up to 0.125 and the saturation of the RGB image was adjusted by a random factor of up to 0.5. As optimizer for our training algorithm we used RMSProp [33] with a weight decay of 0.00004. Each image was cropped to a centered square containing 87.5% of the original image. Eventually, each image was resized to 299 pixels. We used 80 images per species for training and ten each for validation and testing. The splitting was done based on observations rather than on images, i.e., all images belonging to the same observation were used in the same subset (training, validation or testing). Consequently, the images in the three subsets across all five image types belong to the same plants. We explicitly forced the test set to reflect the same observations across all perspectives, combinations and training data reductions in order to enable comparability of results among these variations. Using images from differing observations in the test, validation and training sets for different configurations might have obscured effects and impeded interpretation through the introduction of random fluctuations. In order to investigate the effect of combining different organs and perspectives, we followed two different approaches.
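The training configuration described in this paragraph can be summarised in a short TensorFlow/Keras sketch. The study does not publish code, so the helper `make_dataset`, the classification head and all data-handling details below are illustrative assumptions rather than the authors' implementation.

```python
# Hedged sketch of the per-perspective training setup, assuming TensorFlow/Keras.
import tensorflow as tf

IMG_SIZE = 299
NUM_CLASSES = 101
PERSPECTIVES = ["entire_plant", "flower_frontal", "flower_lateral", "leaf_top", "leaf_back"]

def preprocess(image, label, training=True):
    image = tf.image.convert_image_dtype(image, tf.float32)        # to [0, 1]
    image = tf.image.central_crop(image, central_fraction=0.875)   # centred square, 87.5%
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))           # 299 x 299 pixels
    if training:
        image = tf.image.random_flip_left_right(image)             # mirror-image invariance
        image = tf.image.random_brightness(image, max_delta=0.125) # brightness jitter up to 0.125
        image = tf.image.random_saturation(image, lower=0.5, upper=1.5)  # saturation jitter up to 0.5
    return image * 2.0 - 1.0, label                                 # Inception-style scaling to [-1, 1]

def build_model():
    # Inception-ResNet-v2 pre-trained on ImageNet ILSVRC 2012, i.e. transfer learning.
    base = tf.keras.applications.InceptionResNetV2(
        include_top=False, weights="imagenet", pooling="avg",
        input_shape=(IMG_SIZE, IMG_SIZE, 3))
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
    model = tf.keras.Model(base.input, outputs)
    # RMSProp with learning rate 0.003; the additional weight decay of 0.00004
    # would be applied via the optimizer or kernel regularizers, omitted here.
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.003),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

# Training strategy A: one network per perspective, with the data split by
# observation so that all images of one plant end up in the same subset.
# for view in PERSPECTIVES:
#     train_ds = make_dataset(view, split="train").map(preprocess).batch(32)
#     build_model().fit(train_ds, epochs=...)
```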

Fig. 3 Overview of the approach illustrating the individually trained CNNs and the score fusion of predictions for two perspectives. Each CNN is trained on the subset of images for one perspective; its topology is comprised of 235 convolutional layers followed by two fully connected layers. For each test image the classifier contributes a confidence score for all species. The overall score per species is calculated as the arithmetic mean of the scores for this species across all considered perspectives

On the one hand, we trained one classifier for each of the five perspectives (A) and, on the other hand, we trained a classifier on all images irrespective of their designated perspective (B). All subsequent analyses were subjected to the first training strategy (A), while the second one was conducted to compare the results against the baseline approach, as used in established plant identification systems (e.g. Pl@ntNet [7], iNaturalist [12] or Flora Incognita [26]), where a single network is trained on all images. Finally, we applied a sum-rule based score level fusion for the combination of the different perspectives (cp. Fig. 3). We decided to apply a simple sum rule-based fusion to combine the scores of perspectives, as this represents the most comprehensible method and allows a straightforward interpretation of the results. The overall fused score S is calculated as the sum of the individual scores for the particular combination as

S = \frac{1}{n} \sum_{i=1}^{n} s_i \qquad (1)

where n is the number of perspectives to be fused and s_i is the score contributed by the i-th perspective.

As our dataset is completely balanced we can simply calculate Top-1 and Top-5 accuracy for each species as the average across all images of the test set. Top-1 accuracy is the fraction of test images where the species which achieved the highest score from the classifier is consistent with the ground truth, i.e. the predicted species equals the actual species. The Top-5 accuracy refers to the fraction of test images where the actual species is one of the five species achieving the highest score.
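Eq. (1) and the two accuracy measures translate directly into a few lines of NumPy. The sketch below is purely illustrative; the array names and toy numbers are assumptions, not material from the study.

```python
# Illustrative sketch of the sum-rule score fusion (Eq. 1) and Top-k accuracy.
import numpy as np

def fuse_scores(scores_per_perspective):
    """Fused score per species: arithmetic mean of the n perspective score vectors."""
    return np.mean(np.stack(scores_per_perspective, axis=0), axis=0)

def top_k_accuracy(scores, true_labels, k=1):
    """Fraction of test images whose true species is among the k highest-scoring species."""
    top_k = np.argsort(scores, axis=1)[:, -k:]
    return float(np.mean([label in row for label, row in zip(true_labels, top_k)]))

# Toy example: 3 test images, 4 species, fusing two perspectives.
flower_lateral = np.array([[0.7, 0.1, 0.1, 0.1],
                           [0.2, 0.5, 0.2, 0.1],
                           [0.3, 0.3, 0.2, 0.2]])
leaf_top = np.array([[0.6, 0.2, 0.1, 0.1],
                     [0.1, 0.2, 0.6, 0.1],
                     [0.1, 0.1, 0.1, 0.7]])
fused = fuse_scores([flower_lateral, leaf_top])
print(top_k_accuracy(fused, true_labels=[0, 1, 3], k=1))  # Top-1
print(top_k_accuracy(fused, true_labels=[0, 1, 3], k=5))  # Top-5
```

Because the dataset is completely balanced, averaging these per-image hits per species directly yields the species-wise accuracies reported in Table 1.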
Reducing the number of training images
As the achieved accuracy will depend on the number of available training images, we reduced the original number of 80 training images per species to 60, 40 and 20 images. We then repeated the training of CNNs for each of the reduced sets and used each of the new classifiers to identify the identical set of test images, i.e. images belonging to the same ten observations. A difference in accuracy achieved with fewer training images would indicate whether adding more training images can improve the accuracy of the classifier. On the contrary, if accuracy is unchanged or only slightly lower with the number of training images reduced, this would indicate that adding more training images is unlikely to further improve the results.

Results
Performance of perspectives and combinations
Classification accuracy for the single perspectives ranges between 77.4% (entire plant) and 88.2% (flower lateral). Both flower perspectives achieve a higher value than any of the leaf perspectives (cp. Table 1, Fig. 4). Accuracy increases with the number of perspectives fused, while variability within the same level of fused perspectives decreases. The increase in accuracy diminishes with every added perspective (Fig. 4) and fusing all five perspectives yields the highest overall accuracy of all combinations (97.1%). The figure also shows that certain combinations with more fused perspectives actually perform worse than combinations with fewer fused perspectives. For example, the accuracy of the best two-perspective combination, flower lateral combined with leaf top (FL + LT: 93.7%), is higher than the accuracy for the worst three-perspective combination, entire plant in combination with leaf top and leaf back (EP + LT + LB: 92.1%).

Table 1 Top-1 and Top-5 accuracies (%) achieved for each single perspective and for all combinations of perspectives

Perspective(s)            All Top-1   All Top-5   Forbs Top-1   Forbs Top-5   Grasses Top-1
Entire plant (EP)            77.4        92.7        80.8          94.5           63.3
Flower frontal (FF)          87.1        96.5        92.6          98.7           85.0
Flower lateral (FL)          88.2        97.6        89.1          97.9           85.0
Leaf top side (LT)           81.1        94.8        84.9          95.6           55.8
Leaf back side (LB)          81.8        95.4        83.8          96.1           68.3
EP + FF                      91.6        98.5        94.6          99.1           84.2
EP + FL                      92.4        98.4        85.1          96.2           90.8
EP + LT                      88.3        97.6        91.2          98.1           71.7
EP + LB                      90.3        98.2        92.5          98.4           78.3
FF + FL                      93.3        98.5        94.0          98.9           87.5
FF + LT                      92.5        98.6        96.4          99.4           84.2
FF + LB                      93.3        98.7        96.1          99.3           88.3
FL + LT                      93.7        98.7        87.2          96.1           84.2
FL + LB                      93.5        99.0        86.7          96.7           85.8
LT + LB                      88.4        97.4        91.0          97.8           70.0
EP + FF + FL                 95.3        98.9        94.5          98.9           92.5
EP + FF + LT                 94.0        99.0        96.5          99.3           82.5
EP + FF + LB                 95.1        99.3        96.9          99.6           90.0
EP + FL + LT                 94.4        98.9        92.4          98.3           84.2
EP + FL + LB                 95.5        99.3        93.3          98.7           84.2
EP + LT + LB                 92.1        98.7        94.8          98.8           89.2
FF + FL + LT                 95.8        99.0        96.8          99.3           87.5
FF + FL + LB                 95.8        99.1        96.5          99.3           91.7
FF + LT + LB                 95.3        98.9        97.4          99.6           91.7
FL + LT + LB                 94.8        99.0        92.3          98.0           84.2
EP + FF + FL + LT            96.0        98.9        97.0          99.2           90.0
EP + FF + FL + LB            96.7        99.4        97.4          99.6           92.5
EP + FF + LT + LB            95.8        99.3        97.6          99.4           86.7
EP + FL + LT + LB            96.2        99.1        95.6          98.9           88.3
FF + FL + LT + LB            96.8        99.3        97.6          99.6           90.8
All                          97.1        99.3        98.2          99.4           90.0

"All" refers to the complete dataset comprising 101 species, "Forbs" refers to the 89 forb species and "Grasses" refers to the twelve Poaceae species. Top-5 accuracy is not shown for "Grasses" due to the low number of species. The highest accuracy per combination level is marked in bold face

Fig. 4 a Accuracy as a function of the number of combined perspectives. Each data point represents one combination shown in b. b Mean accuracy for each perspective individually and for all possible combinations. The letters A and B in the legend refer to the different training strategies. The letter A and more saturated colours indicate training with perspective-specific networks, while the letter B and less saturated colours represent the accuracies for the same set of test images when a single network was trained on all images. The grey lines connect the medians for the numbers of considered perspectives for each of the training approaches. Error bars refer to the standard error of the mean

The combination of the two flower perspectives yields similarly high accuracies as the combination of a leaf and a flower perspective, while the combination of both leaf perspectives achieves the second lowest overall accuracy across all two-perspective combinations, with only the combination of entire plant and leaf top slightly worse. The best performing three-perspective combinations are both flower perspectives combined with any of the leaf perspectives. The four-perspective combinations generally show low variability and equal or slightly higher accuracies when compared to the three-perspective combinations (cp. Table 1, Fig. 4). Fusing all five perspectives achieves the highest accuracy, and the complete set of ten test images is correctly classified for 83 out of the 101 studied species, while this is the case for only 38 species if considering only the best performing single perspective, flower lateral (cp. Fig. 5).

Differences among the training approaches
The accuracies gained from the single CNN (approach B) are in the vast majority markedly lower than the accuracies resulting from the perspective-specific CNNs (approach A) (Fig. 4).

Fig. 5 Species-wise accuracy for each single perspective and for all combinations of perspectives. Accuracy of a particular perspective combination is colour coded for each species

On average, accuracies achieved with training approach B are reduced by more than two percent compared to training approach A.

Differences between forbs and grasses
Generally, the accuracies for the twelve grass species are lower for all perspectives than for the 89 forb species (cp. Table 1, Fig. 6). Additionally, all accuracies achieved for the forbs are higher than the average across the entire dataset. Grasses achieve distinctly lower accuracies for the entire plant perspective and for both leaf perspectives. The best single perspective for forbs is flower frontal, achieving 92.6% accuracy alone, while the same perspective for grasses achieves only 85.0% (Table 1).

Species-specific accuracy differences
While for some species all test images across all perspectives are correctly identified (e.g., Oxalis acetosella, Tripleurospermum maritimum), for other species none of the perspectives or combinations thereof allows the accurate identification of all test observations (e.g., Poa pratensis, Poa trivialis, Fragaria vesca). For the majority of species, however, a single or only a few fused perspectives allow a reliable identification. Yet, which kind of perspective achieves the highest accuracy depends on the species (cp. Fig. 5). For 28 species, no single perspective alone allows to identify all test images correctly, while fusing perspectives allows to identify the correct species across all test observations (e.g., Ranunculus bulbosus, Bromus ramosus, Scabiosa columbaria).

Reduction of training images
Reducing the number of training images to 60 or even to 40 images causes no consistent effect on any perspective. Yet, accuracy drops strongly when reducing to 20 training images for the entire plant and leaf back perspectives, while the accuracies for both flower perspectives and the combination of all perspectives are still only slightly affected (Fig. 6).

Discussion
We found that combining multiple image perspectives depicting the same plant increases the reliability of identifying its species. In general, of all single perspectives, entire plant achieved the lowest mean accuracy while the flower lateral perspective achieved the highest accuracies. However, in the specific case the best perspective depends on the particular species. There are several examples where another perspective achieves better results. As a universal best perspective for all species is lacking, generally collecting different views and organs of a plant increases the chance to definitely cover the most important perspective. Especially, images depicting the entire plant inevitably contain lots of background information, which is unrelated to the species itself. In the majority of cases, images of the category entire plant also contain other individuals or parts of other species (Fig. 1a) and it may even be difficult to recognize an individual as a perceivable unit within its habitat. Such background information can be beneficial in some cases, such as trunks in the background of typical forest species or bare limestone in the back of limestone grassland species. In other cases, such as pastures, it is hard to recognize a certain focus grass species among others on the image. This similarity in background represents, to a certain degree, a hidden class, which is only partly related to species identity. This could be the reason for the lower accuracies achieved when a single classifier was trained on all images, where much more confounding background information enters the visual space of the network. Visual inspection of test images for species with comparably low accuracy (e.g. Trifolium campestre and Trifolium pratense) revealed that these contained a relatively higher number of images taken at a large distance that were not properly focused. This was possibly due to their small size and low height, making it hard for the photographer to acquire proper images.

Combining perspectives
Flower side view and flower top view provide quite different sources of information which, when used in combination, considerably improve the classification result (Fig. 4). We found that combining perspectives, e.g. flower lateral and leaf top, yields a mean accuracy of about 93.7%, and adding flower top adds another two percent, summing to an accuracy of about 95.8% for this dataset. Given that the species in this dataset were chosen with an emphasis on containing congeneric and visually similar species, the accuracies achieved here with a standard CNN setting are considerably higher than comparable previous studies that we are aware of. For example, [18] used comparable methods and achieved an accuracy of 74% for the combination of flower and leaf images using species from the PlantCLEF 2014 dataset. [34] report an accuracy of 82% on the perspectives of leaf and flower (fused via sum rule) for the 50 most frequent species of the PlantCLEF 2015 dataset with at least 50 images per organ per plant. It remains to be investigated whether the balancing of image categories, the balancing of the species itself, species misidentifications or the rather vaguely defined perspectives in image collections such as the PlantCLEF datasets are responsible for these substantially lower accuracies.


Fig. 6 Classification accuracies for the entire dataset (All_species), and separately for the subsets grasses and forbs, shown for each single perspective and for the fusion of all perspectives. Numbers next to the dataset in the legend refer to the number of utilized training images

Yet, our results underline that collecting images following a simple but predefined protocol, i.e. structured observations, allows to achieve substantially better results than previous work, for a larger dataset and with presumably more challenging species, evaluated with as few as 20 training observations per species.

Identifying grasses
We are not aware of any study that explicitly addresses the automated identification of grasses (Poaceae). The members of this large family strongly resemble each other and it requires a lot of training and experience for humans to be able to reliably identify these species, especially in the absence of flowers.

While our study demonstrates substantial classification results for most species, the utilized perspectives are not sufficient to reliably identify all species. Poa trivialis and Poa pratensis are recognized with an accuracy of 60% and 70%, respectively, when all perspectives are fused. In vivo, these two species may be distinguished by the shape of the leaf tips and the shape of their ligules. But many of the collected images depict partly desiccated and coiled leaves, which do not reveal those important features. The shape of the ligule, another important character for grass species, is not depicted in any of the perspectives used in this experiment. Therefore, we conclude that the chosen perspectives for grasses are still not sufficient to distinguish all species, especially if the identification would only be based on leaves. More research is necessary to identify suitable perspectives allowing to reliably recognize grass species. We assume that the same applies for the related and equally less studied families, such as Cyperaceae and Juncaceae.

A plea for structured observations
An important obstacle in verifying crowd-sourced image data is that in many cases the exact species cannot unambiguously be determined, as particular discriminating characters are not depicted on the image. According to [19], 77.5% of all observations from the first period after launching Pl@ntNet were single-image observations and another 15.6% were two-image observations, leaving less than 7% of all observations to consist of more than two images. Generating multi-image observations of plants can improve automated plant identification in two ways: (1) facilitating a more confident labelling of the training data, and (2) gaining higher accuracies for the identified species. The more difficult the plant is to identify, i.e. the more help humans need for identification, the more imperative structured observations become. It is important to provide suitable and broadly applicable instructions for users regarding which kind of images are useful to capture in order to gain reliable identification results from an automated approach, also for challenging species. Applications aiming at automated species identification would benefit from the use of multiple and structured image sets to increase accuracy. Our results suggest that a further increase in accuracy would not necessarily be achieved by acquiring more training data, but rather by increasing the quality of the training data. Generating structured observations is an important step in this direction.

However, in contrast to the dataset analyzed in this paper, real-world automatic plant identification systems usually suffer from a variety of shortcomings such as imbalanced distributions, bias between test and training images, and unreliable species labels. These shortcomings can be reduced by a huge variety of technical and structural improvements. Nevertheless, encouraging users to provide structured species observations for both training and identification will enhance future improvement of automatic plant identification systems.

Conclusions
We propose that the recognition rates, especially for inconspicuous species and species which are also difficult to differentiate for humans, would greatly benefit from multi organ identification. Moreover, it is even essential to allow for a proper review of such challenging species by human experts, as we found it often impossible to verify the record of a certain species based on a single image. Furthermore, we suggest to improve the observation scheme for grasses and to modify our previously utilized perspectives to add further crucial characters such as the ligule. In fact, we show that with some constraints on perspective and a thorough review of the images, as few as 40 training observations can be sufficient to achieve recognition rates beyond 95% for a dataset comprising 101 species.

Additional file
Additional file 1. Species list.

Abbreviations
CNN: convolutional neural network; EP: entire plant; FF: flower frontal; FL: flower lateral; LT: leaf top; LB: leaf back.

Acknowledgements
We are grateful to David Boho, Martin Hofman, Marco Seeland, Hans-Christian Wittich and Christian Engelhardt, who contributed in various ways to this study. We thank Jasmin Tannert, Richard Niemann, Michaela Schädler, Tom Wittig and the Flora Incognita community for generating thousands of plant observations enabling this study. Furthermore, we thank Anke Bebber for carefully proofreading and substantially improving the language of our manuscript.

Authors' contributions
Funding acquisition: PM, JW; experiment design: MR, AD, JW and PM; image acquisition: MR, AD, JW; data analysis: MR and MC; data visualisation: MR, AD; writing manuscript: MR, MC, JW, PM, AD. All authors read and approved the final manuscript.

Funding
We are funded by the German Ministry of Education and Research (BMBF) Grants: 01LC1319A and 01LC1319B; the German Federal Ministry for the Environment, Nature Conservation, Building and Nuclear Safety (BMUB) Grant: 3514 685C19; and the Stiftung Naturschutz Thüringen (SNT) Grant: SNT-082-248-03/2014.

Availability of data and materials
The image datasets used and analyzed during the current study are available from the corresponding author on reasonable request and under copyright restrictions.

Compliance with ethical standards

Ethics approval and consent to participate
Not applicable.

Consent for publication
All authors approved the manuscript and provided consent for publication.

Competing interests
The authors declare that they have no competing interests.

Author details
1 Department of Biogeochemical Integration, Max Planck Institute for Biogeochemistry, Hans-Knöll-Str. 10, Jena, Germany. 2 Software Engineering for Safety-Critical Systems Group, Technische Universität Ilmenau, Ehrenbergstr. 20, 98693 Ilmenau, Germany.

Received: 19 November 2018   Accepted: 9 July 2019

References
1. Murphy GEP, Romanuk TN. A meta-analysis of declines in local species richness from human disturbances. Ecol Evol. 2014;4(1):91–103. https://doi.org/10.1002/ece3.909.
2. Naeem S, Duffy JE, Zavaleta E. The functions of biological diversity in an age of extinction. Science. 2012;336(6087):1401–6. https://doi.org/10.1126/science.1215855.
3. Ebeling A, Rzanny M, Lange M, Eisenhauer N, Hertzog LR, Meyer ST, Weisser WW. Plant diversity induces shifts in the functional structure and diversity across trophic levels. Oikos. 2018;127(2):208–19. https://doi.org/10.1111/oik.04210.
4. Rzanny M, Voigt W. Complexity of multitrophic interactions in a grassland ecosystem depends on plant species diversity. J Anim Ecol. 2012;81(3):614–27. https://doi.org/10.1111/j.1365-2656.2012.01951.x.
5. Allan E, Weisser WW, Fischer M, Schulze E-D, Weigelt A, Roscher C, Baade J, Barnard RL, Beßler H, Buchmann N, Ebeling A, Eisenhauer N, Engels C, Fergus AJF, Gleixner G, Gubsch M, Halle S, Klein AM, Kertscher I, Kuu A, Lange M, Le Roux X, Meyer ST, Migunova VD, Milcu A, Niklaus PA, Oelmann Y, Pašalić E, Petermann JS, Poly F, Rottstock T, Sabais ACW, Scherber C, Scherer-Lorenzen M, Scheu S, Steinbeiss S, Schwichtenberg G, Temperton V, Tscharntke T, Voigt W, Wilcke W, Wirth C, Schmid B. A comparison of the strength of biodiversity effects across multiple functions. Oecologia. 2013;173(1):223–37. https://doi.org/10.1007/s00442-012-2589-0.
6. Goëau H, Bonnet P, Joly A. Plant identification in an open-world (LifeCLEF 2016). In: CLEF 2016, conference and labs of the evaluation forum; 2016. pp. 428–439.
7. Joly A, Goëau H, Bonnet P, Bakić V, Barbe J, Selmi S, Yahiaoui I, Carré J, Mouysset E, Molino J-F, Boujemaa N, Barthélémy D. Interactive plant identification based on social image data. Ecol Inf. 2014;23:22–34. https://doi.org/10.1016/j.ecoinf.2013.07.006 (Special Issue on Multimedia in Ecology and Environment).
8. Gaston KJ, O'Neill MA. Automated species identification: why not? Philos Trans R Soc Lond B Biol Sci. 2004;359(1444):655–67. https://doi.org/10.1098/rstb.2003.1442.
9. Wäldchen J, Rzanny M, Seeland M, Mäder P. Automated plant species identification – trends and future directions. PLoS Comput Biol. 2018;14(4):1–19. https://doi.org/10.1371/journal.pcbi.1005993.
10. Pimm SL, Alibhai S, Bergl R, Dehgan A, Giri C, Jewell Z, Joppa L, Kays R, Loarie S. Emerging technologies to conserve biodiversity. Trends Ecol Evol. 2015;30(11):685–96. https://doi.org/10.1016/j.tree.2015.08.008.
11. Goëau H, Joly A, Bonnet P, Lasseck M, Šulc M, Hang ST. Deep learning for plant identification: how the web can compete with human experts. Biodivers Inf Sci Stand. 2018;2:25637. https://doi.org/10.3897/biss.2.25637.
12. iNaturalist. https://www.inaturalist.org/. Accessed 15 July 2019.

13. Champ J, Lorieul T, Servajean M, Joly A. A comparative study of fine-grained classification methods in the context of the LifeCLEF plant identification challenge 2015. In: CEUR-WS, editors. CLEF: conference and labs of the evaluation forum. CLEF2015 working notes, Toulouse, France, vol. 1391; 2015. https://hal.inria.fr/hal-01182788. Accessed 15 July 2019.
14. Seeland M, Rzanny M, Boho D, Wäldchen J, Mäder P. Image-based classification of plant genus and family for trained and untrained plant species. BMC Bioinf. 2019;20(1):4.
15. Lee SH, Chan CS, Remagnino P. Multi-organ plant classification based on convolutional and recurrent neural networks. IEEE Trans Image Process. 2018;27(9):4287–301. https://doi.org/10.1109/TIP.2018.2836321.
16. Lee SH, Chang YL, Chan CS, Alexis J, Bonnet P, Goeau H. Plant classification based on gated recurrent unit. In: Bellot P, Trabelsi C, Mothe J, Murtagh F, Nie JY, Soulier L, SanJuan E, Cappellato L, Ferro N, editors. Experimental IR meets multilinguality, multimodality, and interaction. Cham: Springer; 2018. p. 169–80.
17. Lee SH, Chang YL, Chan CS, Remagnino P. HGO-CNN: hybrid generic-organ convolutional neural network for multi-organ plant classification. In: 2017 IEEE international conference on image processing (ICIP); 2017. pp. 4462–4466. https://doi.org/10.1109/ICIP.2017.8297126.
18. He A, Tian X. Multi-organ plant identification with multi-column deep convolutional neural networks. In: 2016 IEEE international conference on systems, man, and cybernetics (SMC); 2016. pp. 002020–002025.
19. Joly A, Bonnet P, Goëau H, Barbe J, Selmi S, Champ J, Dufour-Kowalski S, Affouard A, Carré J, Molino J-F, Boujemaa N, Barthélémy D. A look inside the Pl@ntNet experience. Multimed Syst. 2016;22(6):751–66. https://doi.org/10.1007/s00530-015-0462-9.
20. Nilsback M-E, Zisserman A. Automated flower classification over a large number of classes. In: Proceedings of the Indian conference on computer vision, graphics and image processing; 2008.
21. Lee SH, Chan CS, Wilkin P, Remagnino P. Deep-plant: plant identification with convolutional neural networks. In: 2015 IEEE international conference on image processing (ICIP); 2015. pp. 452–456. https://doi.org/10.1109/ICIP.2015.7350839.
22. Kumar N, Belhumeur PN, Biswas A, Jacobs DW, Kress WJ, Lopez I, Soares JVB. Leafsnap: a computer vision system for automatic plant species identification. In: 12th European conference on computer vision (ECCV); 2012.
23. Krause J, Sugita G, Baek K, Lim L. WTPlant (What's that plant?): a deep learning system for identifying plants in natural images. In: Proceedings of the 2018 ACM on international conference on multimedia retrieval. ICMR '18. ACM, New York, NY, USA; 2018. pp. 517–520. https://doi.org/10.1145/3206025.3206089.
24. The Flora Incognita project: Flora Capture; 2019. https://floraincognita.com/flora-capture-app/. Accessed 15 July 2019.
25. Rzanny M, Seeland M, Wäldchen J, Mäder P. Acquiring and preprocessing leaf images for automated plant identification: understanding the trade-off between effort and information gain. Plant Methods. 2017;13(1):97.
26. Wäldchen J, Mäder P. Flora Incognita – wie künstliche Intelligenz die Pflanzenbestimmung revolutioniert. Biologie in unserer Zeit. 2019;49(2):99–101.
27. Jansen F, Dengler J. GermanSL – eine universelle taxonomische Referenzliste für Vegetationsdatenbanken. Tuexenia. 2008;28:239–53.
28. Wäldchen J, Mäder P. Machine learning for image based species identification. Methods Ecol Evol. 2018;9(11):2216. https://doi.org/10.1111/2041-210X.13075.
29. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: 31st AAAI conference on artificial intelligence; 2017. arXiv:1602.07261.
30. Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z, Song Y, Guadarrama S, Murphy K. Speed/accuracy trade-offs for modern convolutional object detectors. CoRR; 2016. arXiv:1611.10012.
31. Yosinski J, Clune J, Bengio Y, Lipson H. How transferable are features in deep neural networks? In: Proceedings of the 27th international conference on neural information processing systems, vol. 2. MIT Press, Cambridge, MA, USA; 2014. pp. 3320–3328.
32. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein MS, Berg AC, Li F. ImageNet large scale visual recognition challenge. CoRR; 2014. arXiv:1409.0575.
33. Tieleman T, Hinton G. Lecture 6.5 – RMSProp: divide the gradient by a running average of its recent magnitude. In: COURSERA: neural networks for machine learning; 2012.
34. Do T, Nguyen H, Nguyen T, Vu H, Tran T, Le T. Plant identification using score-based fusion of multi-organ images. In: 2017 9th international conference on knowledge and systems engineering (KSE); 2017. pp. 191–6. https://doi.org/10.1109/KSE.2017.8119457.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
