Rzanny et al. Plant Methods (2017) 13:97. DOI 10.1186/s13007-017-0245-8

RESEARCH — Open Access

Acquiring and preprocessing leaf images for automated plant identification: understanding the tradeoff between effort and information gain

Michael Rzanny1, Marco Seeland2, Jana Wäldchen1* and Patrick Mäder2

Abstract

Background: Automated species identification is a long-term research subject. Contrary to flowers and fruits, leaves are available throughout most of the year. Offering margin and texture to characterize a species, they are the most studied organ for automated identification. Substantially matured machine learning techniques generate the need for more training data (aka leaf images). Researchers as well as enthusiasts miss guidance on how to acquire suitable training images in an efficient way.

Methods: In this paper, we systematically study nine image types and three preprocessing strategies. Image types vary in terms of in-situ image recording conditions: perspective, illumination, and background, while the preprocessing strategies compare non-preprocessed, cropped, and segmented images to each other. Per image type-preprocessing combination, we also quantify the manual effort required for their implementation. We extract image features using a convolutional neural network, classify species using the resulting feature vectors and discuss classification accuracy in relation to the required effort per combination.

Results: The most effective, non-destructive way to record herbaceous leaves is to take an image of the leaf's top side. We yield the highest classification accuracy using destructive back light images, i.e., holding the plucked leaf against the sky for image acquisition. Cropping the image to the leaf's boundary substantially improves accuracy, while precise segmentation yields similar accuracy at a substantially higher effort. The permanent use or disuse of a flash light has negligible effects. Imaging the typically stronger textured backside of a leaf does not result in higher accuracy, but notably increases the acquisition cost.

Conclusions: In conclusion, the way in which leaf images are acquired and preprocessed does have a substantial effect on the accuracy of the classifier trained on them.
For the first time, this study provides a systematic guideline allowing researchers to spend available acquisition resources wisely while yielding the optimal classification accuracy.

Keywords: Leaf image, Image acquisition, Preprocessing, Segmentation, Cropping, Background, Leaf side, Back light, Effort, CNN, Computer vision

Background

Species identification is essential for studying the biodiversity richness of a region, monitoring populations of endangered species, determining the impact of climate change on species distributions, payment of environmental services, and weed control actions [1, 2]. Accurate plant identification represents the basis for all aspects of related research and is an important component of workflows in plant ecological research. Accelerating the identification process and making it executable by non-experts is highly desirable, especially

*Correspondence: jwald@bgc-jena.mpg.de. 1 Department Biogeochemical Integration, Max-Planck-Institute for Biogeochemistry, Hans-Knöll-Str. 10, 07745 Jena, Germany. Full list of author information is available at the end of the article.

© The Author(s) 2017. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

when considering the continuous loss of plant biodiversity [3].

More than 10 years ago, Gaston and O'Neill [4] proposed that developments in artificial intelligence and digital image processing could make automated species identification realistic. The fast development and ubiquity of relevant information technologies, in combination with the availability of portable devices such as digital cameras and smartphones, results in a vast number of digital images, which are accumulated in online databases. So today, their vision is nearly tangible: that mobile devices are used to take pictures of specimens in the field and afterwards to identify their species.

Considerable research in the field of computer vision and machine learning resulted in a number of studies that propose and compare methods for automated plant identification [5–8]. The majority of studies solely utilize leaves for identification, as they are available for examination throughout most of the year and can easily be collected, preserved and photographed, given their planar nature. Previous methods utilize handcrafted features for quantifying geometric properties of the leaf: boundary and shape as well as texture [9–13]. Extracting such features often requires a preprocessing step in order to distinguish the leaf from the background of the image, i.e., a binary segmentation step. For the ease of accurate and simple segmentation, most studies use leaf images with a uniform, plain background, e.g., by utilizing digital scanners or photographing in a controlled environment [14]. Only few studies addressed the problem of segmenting and identifying leaves in front of cluttered natural backgrounds [15, 16].

At the same time, machine learning techniques have matured. Especially, deep learning convolutional neural networks (CNNs) have almost revolutionized computer vision in recent years. Latest studies in object categorization demonstrate that CNNs allow for superior results compared to state-of-the-art traditional methods [17, 18]. Current studies on plant identification utilize CNNs and achieve significant improvements over methods developed in the decade before [19–22]. Furthermore, it was empirically observed that CNNs trained for a task, e.g., object categorization in general, also achieve exceptional results on similar tasks after minor fine-tuning (transfer learning) [18]. Making this approach usable in an experimental setting, researchers demonstrated that using pre-trained CNNs merely for feature extraction from images results in compact and highly discriminative representations. In combination with classifiers like SVM, these CNN-derived features allow for exceptional classification results, especially on smaller datasets as investigated in this study [17].

Despite all improvements in transfer learning, successfully training a classifier for species identification requires a large amount of training data. We argue that the quality of an automated plant identification system crucially depends not only on the amount, but also on the quality of the available training data. While funding organizations are willing to support research in this direction and nature enthusiasts are helpful by contributing images, these resources are limited and should be efficiently utilized. In this paper, we explore different methods of image acquisition and preprocessing to enhance the quality of leaf images used to train classifiers for species identification. We ask: (1) How are different combinations of image acquisition aspects and preprocessing strategies characterized in terms of classification accuracy? (2) How is this classification accuracy related to the manual effort required to capture and preprocess the respective images?

Methods

Our research framework consists of a pipeline of four consecutive steps: image acquisition, preprocessing, feature extraction, and training of a classifier, as shown in Fig. 1. The following subsections discuss each step in detail and especially refer to the variables, image types and preprocessing strategies that we studied in our

Fig. 1 Consecutive steps of our research framework

experiments. We used state-of-the-art feature extraction and classifier training methods and kept them constant for all experiments.

Image acquisition

For each observation of an individual leaf, we systematically varied the following image factors: perspective, illumination, and background. An example of all images collected for a single observation is shown in Fig. 2. We captured two perspectives per leaf in-situ and in a non-destructive way: the top side and the back side, since leaf structure and texture typically differ substantially between these perspectives. If necessary, we used a thin black wire to arrange the leaf accordingly. We recorded each leaf under two illumination conditions using a smartphone: flash off and flash on. Flash off refers to natural illumination, without artificial light sources. In case of bright sunlight, we used an umbrella to shade the leaf against strong reflections and harsh shadows emerging from the device, the plant itself, or the surrounding vegetation. Flash on was used for a second image, taken in the same manner, but with the built-in flashlight activated. We also varied the background by recording an initial image in the leaf's environment composed of other leaves and stones, termed natural background. Additionally, we utilized a white sheet of plastic to record images with plain background. Leaves were not plucked for this procedure but arranged onto the sheet using a hole in the sheet's center. Eventually, the leaf was picked and held up against the sky using a black plastic sheet as background to prevent image overexposure. This additional image type is referred to as back light. In summary, we captured nine different image types per observation.

All images were recorded with an iPhone 6, between April and September 2016, throughout a single vegetation season. Following a strict sampling protocol for each observation, we recorded images for 17 species representing typical, wild-flowering plants that commonly occur on semi-arid grasslands scattered around the city of Jena, located in eastern Germany. At the time of image acquisition, every individual was flowering. The closest focusing distance represented a technical limit for the resolution of smaller leaves while ensuring to capture the entire leaf on the image. The number of observations per species ranged from eleven (Salvia pratensis) to 25 (…). In total, we acquired 2902 images. The full dataset including all image annotations is freely available from [23].

Image preprocessing

Each leaf image was duplicated twice to execute the three preprocessing strategies: non-preprocessed, cropped, and segmented. Non-preprocessed images were kept unaltered. Cropping was performed based on a bounding box enclosing the leaf (see Fig. 2). To facilitate an efficient segmentation, we developed a semi-automated approach based on the GrabCut method [24]. GrabCut is based on iterated graph cuts and was considered accurate and time-effective for interactive image segmentation [25, 26]. The first iteration of GrabCut was initialized by a rectangle placed at the relevant image region, the focus area defined during image acquisition and available in an image's EXIF data. This rectangle should denote the potential foreground, whereas the image corners were used as background seeds. The user was then allowed to iteratively refine the computed mask by adding markers denoting either foreground or background, if necessary. The total amount of markers was logged for every image. To speed up the segmentation process, every image was resized to a maximum of 400 px at the longest side while maintaining the aspect ratio. Finally, the binary mask depicting only the area of the leaf was resized to the original image size. The boundary of the upsized mask was then smoothed using a colored watershed variant after morphological erosion of the foreground and background labels, followed by automated cropping to that mask.

Quantifying manual effort

Image acquisition and preprocessing require substantial manual effort depending on the image type and preprocessing strategy. We aim to quantify the effort for each combination in order to facilitate a systematic evaluation and a discussion of the resulting classification accuracy in relation to the necessary effort.

For a set of ten representative observations, we measured the time in seconds and the number of persons needed for the acquisition of each image. This was done for all combinations of the image factors perspective and background. Whereas a single photographer is sufficient to acquire images in front of natural background, a second person is needed for taking images with plain background and for the back light images, in order to arrange the leaf and the plastic sheet. We then quantified the effort of image acquisition for these combinations by means of average 'person-seconds', multiplying the time in seconds by the number of persons.

In order to quantify the manual effort during preprocessing, we measured the time in seconds an experienced user requires for performing either cropping or segmentation on a set of 50 representative images. For each task, the timer was started the moment the image was presented to the user and was stopped when the user confirmed the result of the task. For cropping, we measured the time needed for drawing a bounding box around the leaf. This required 6.8 s on average, independently of the image conditions. Image segmentation, on the other

Fig. 2 Leaf image set belonging to one observation of Aster amellus depicting all nine image types and the preprocessing strategies explored in this study

hand, involved substantial manual work depending on the leaf type, e.g., compound or pinnate leaves, and the image background. In case of natural background, often multiple markers were required. We measured the average time for setting one marker, amounting to 4.7 s, and multiplied this average time by the number of markers needed for segmenting each image. In case of plain background and simple leaves, the automatically initialized first iteration of the segmentation process often delivered accurate results. In such cases, the manual effort consisted only of confirming the segmentation result, which took about 2 s. For compound and pinnate leaves, e.g., of Centaurea scabiosa, the segmentation task was considerably more difficult and required 135 s on average per image with natural background.

The mean effort measured in 'person-seconds' for all combinations of image types and preprocessing steps is displayed in Fig. 3. We define a baseline scenario for comparing the resulting classification accuracy in relation to the necessary effort for each combination: with an empirically derived average time of 13.4 s, the minimum manual effort lies in acquiring a top side leaf image with natural background and no preprocessing steps.

Feature extraction

Using CNNs for feature extraction results in powerful image representations that, coupled with a Support Vector Machine as classifier, outperform handcrafted features in computer vision tasks [17]. Accordingly, we used the pre-trained ResNet-50 CNN, which ranked among the best performing networks in the ImageNet Large Scale Visual Recognition Challenge in 2015 [27], for extracting compact but highly discriminative image features. Every image was bilinearly resized to fit 256 px at the shortest side and then a center crop of 224 × 224 px was forwarded through the network using the Caffe deep learning framework [28]. The output of the last convolutional layer (fc5) was extracted as a 2048-dimensional image feature vector, followed by L2-normalization.

Image classification

We used the CNN image features discussed in the previous section to train linear Support Vector Machine (SVM) classifiers. Each combination of the nine image types and the three preprocessing strategies resulted in one dataset, creating 27 in total. These datasets were split into training (70% of the images) and test sets (30% of the images). In order to run comparable experiments, we enforced identical observations across all 27 datasets, i.e., for all combinations of image types and preprocessing strategies, the test and train sets were composed of the same individuals. Using the trained SVM, we classified the species for all images of each test dataset and calculated the classification accuracy as the percentage of correctly identified species. All experiments were cross-validated using 100 randomized split configurations. Similarly, we quantified the species-specific accuracy as the percentage of correctly identified individuals per species. We used R version 3.1.1 [29] with the packages e1071 [30] for classifier training along with caret [31] for tuning and evaluation.

Fig. 3 Mean manual effort per image, quantified by means of 'person-seconds' for the five different image types and three preprocessing strategies

Results

Figure 4 displays the mean species classification accuracy separated for the nine image types and aggregated for the three preprocessing strategies. The highest classification accuracy (91 ± 3)% was achieved on cropped back light images, while the lowest accuracy was obtained for non-preprocessed backside images with natural background and without flash (55 ± 4)%. Across all three preprocessing strategies, back light images achieved the highest classification accuracy. Preprocessed images, i.e., cropped and segmented images, yielded higher accuracy than non-preprocessed images. Images of leaves in front of natural background benefit most from cropping and segmentation, as they show the highest relative increase among the three preprocessing strategies. Starting with an already high accuracy, the relative gain between non-preprocessed and preprocessed back light images is smaller than that of natural and plain background images. Taking images with or without flash light does not seem to affect classification accuracy in a consistent manner.

Figure 5 shows classification results not only separated per image type and preprocessing strategy, but additionally per species within the dataset. The results show that classification accuracy depends on the classified species. While cropping and segmentation notably increase classification accuracy for some species, e.g., Fragaria viridis and Centaurea scabiosa, other species remain at a low classification accuracy despite preprocessing, e.g., ≈70% for Prunella grandiflora. Especially for these low-performing species, back light images yield a considerably higher accuracy, e.g., >90% for Prunella grandiflora. Furthermore, Fig. 5 shows that: (1) back light images yield a higher and more homogeneous classification accuracy across the different species; (2) preprocessing by cropping or segmentation increases accuracy; and (3) the images with plain background achieve higher accuracies if cropped, while the images with natural background obtain higher accuracies when segmented prior to classification.

In order to investigate species-dependent misclassifications in more detail, we computed the confusion matrices shown in Fig. 6. This figure illustrates the instances of an observed species in rows versus the instances of the same species being predicted in columns, averaged across all image conditions. We present one matrix per preprocessing strategy, visualizing how a certain species was confused with others if its accuracy was below 100%. This is, for example, the case for Anemone sylvestris versus Fragaria viridis and Prunella grandiflora versus Origanum vulgare. Evidently, cropping and segmentation notably decrease the sparse tendency towards false classification. Prunella grandiflora achieved the lowest classification accuracy across all species (Fig. 6) and was often misclassified as Origanum vulgare, a species with similarly shaped leaves. Some species, such as Aster amellus or Origanum vulgare, are more or less invariant

Fig. 4 Classification results for different image types and preprocessing strategies, averaged across all species. Each bar displays the mean accuracy averaged across 100 randomized split configurations and the error bars display the associated standard deviation
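The accuracies summarized in Fig. 4 come from linear SVMs evaluated over randomized splits. The paper used R with the e1071 and caret packages; the sketch below assumes scikit-learn instead, with placeholder feature vectors standing in for the 2048-dimensional CNN features.

```python
# Evaluation protocol sketch: train a linear SVM on image feature vectors and
# average the test accuracy over randomized 70/30 train/test splits, as done
# over 100 split configurations in the paper (there in R with e1071/caret).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def mean_split_accuracy(features, labels, n_splits=100, test_size=0.3):
    """Mean and standard deviation of accuracy over randomized splits."""
    accuracies = []
    for seed in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            features, labels, test_size=test_size,
            stratify=labels, random_state=seed)
        clf = LinearSVC().fit(X_tr, y_tr)     # linear SVM, as in the paper
        accuracies.append(clf.score(X_te, y_te))
    return float(np.mean(accuracies)), float(np.std(accuracies))
```

Note that the paper additionally enforced identical observations in the train and test sets across all 27 datasets; reproducing that would require sharing split indices between datasets rather than re-splitting each one independently, as this simplified sketch does.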

to subsequent preprocessing. Other species, such as Fragaria viridis, Salvia pratensis, or Ononis repens, show very different results depending on whether or not the images were preprocessed.

These differences seem to be strongly species-specific, even perspective-specific, without any discernible pattern. As an example, the classification accuracy of images of Sanguisorba minor is greatly improved by cropping for the backsides in front of natural background, while segmenting these images does not further increase the classification accuracy (Fig. 5). In contrast, the accuracy of the topsides, also recorded in front of natural background, is only slightly improved by cropping, while subsequent segmentation clearly improves the result. This species is a plant with pinnate compound leaves whose leaflets are often folded inwards along the midrib.

Discussion

We found the classification accuracy to differ substantially among the studied image types, the applied preprocessing strategy, and the studied species. While species-specific effects exist (see Figs. 5, 6), they were not the focus of our study and do not change the general conclusions drawn from these experiments. The overall achieved classification accuracy is rather low when compared to other studies classifying leaf image datasets. In contrast to most other studies, our dataset comprises smartphone images of leaves from herbaceous plants. Here, small and varying types of leaves occur on one and the same species. Many of the other datasets achieving higher accuracies primarily contain images of tree leaves that are comparably larger and can be well separated by shape and contour [6]. In addition, the leaves in our dataset were still attached to the stem upon imaging (except for the back light images) and could not be perfectly arranged as is possible for scanned, high-resolution images in other datasets.

We studied nine image types varying the factors perspective (top side, back side, and back light), illumination (flash on, flash off), and background (natural and plain). Studying results for the different perspectives, we found that back light images consistently allowed for the highest classification accuracy across species and preprocessing strategies, followed by top side images, which in most combinations yield higher accuracy than back side images, especially for the non-preprocessed ones. Taking back side images of a leaf still attached to the stem requires bending it upwards and forcing it into an unnatural position. This results in variation across the images with respect to exact position, angle, focal plane and perspective, each hard to control under field conditions. We studied illumination by enforcing a specific flash setting on purpose and found that it did not affect classification results in a consistent way. Hence, we conclude that automatic flash settings depending on the overall illumination of the scene may be used without negative impact on the classification result. However, in contrast to back light images, the different illumination conditions cause strong and undesired variations in the image quality. For example, a disabled flash results in images with small dynamic

Fig. 5 Mean classification accuracy averaged across 100 dataset splits. The accuracy for each combination of species and image type is color coded and aggregated per preprocessing strategy

Fig. 6 Confusion matrices presenting classification accuracy per preprocessing strategy. Observed species (rows) versus predicted species (columns) are averaged across the different image preprocessing steps
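Row-normalized confusion matrices like those in Fig. 6 can be computed directly from the observed and predicted labels; a sketch with scikit-learn, where the diagonal then holds the per-species accuracy (the species and label counts below are illustrative, not the paper's data):

```python
# Row-normalized confusion matrix: entry (i, j) is the fraction of images of
# observed species i that were predicted as species j, so the diagonal holds
# the per-species classification accuracy.
import numpy as np
from sklearn.metrics import confusion_matrix

def normalized_confusion(y_true, y_pred, labels):
    counts = confusion_matrix(y_true, y_pred, labels=labels).astype(float)
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums == 0, 1, row_sums)  # avoid 0-division

# Illustrative example mirroring a confusion discussed in the text:
# one Prunella grandiflora image misclassified as Origanum vulgare.
species = ["Origanum vulgare", "Prunella grandiflora"]
observed = ["Origanum vulgare"] * 4 + ["Prunella grandiflora"] * 4
predicted = (["Origanum vulgare"] * 4 + ["Prunella grandiflora"] * 3
             + ["Origanum vulgare"])
cm = normalized_confusion(observed, predicted, species)
```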

range, while an enabled flash affects the coloring and creates specularities that mask leaf venation. We also varied the background by imaging leaves with plain as well as natural background, of which the latter allowed for higher classification accuracy compared to plain background images. We found that imaging leaves in front of a plain background strongly affects the dynamic range of the leaves' colors. Leaves in front of a plain background typically appear darkened and with an overall reduced contrast (cp. Figs. 2, 7).

We conclude from the analysis of image types that back light images contain the largest amount of visual information and the least amount of clutter (cp. Fig. 7). This image type facilitates: (1) sharply imaged leaf boundaries, especially in comparison to images with natural background; (2) homogeneously colored and illuminated leaves with high dynamic range, making even slight details in the venation pattern and texture visible; (3) a lighting geometry that suppresses specularities by design; and (4) images that can be automatically cropped and segmented.

We also studied three preprocessing strategies per image (non-preprocessed, cropped, and segmented). We found that preprocessing consistently increased classification accuracy for all species and image types compared to the original non-preprocessed images. For those, the classifier is trained on a lot of potentially misleading background information. Removing vast parts of this background, through either cropping or segmentation, improved classification accuracy in all cases. In general, classification accuracy is substantially increased upon cropping, since the CNN features then encode more information about the leaves, not the background. We found only slight increases in accuracy for images with natural background for segmented compared to cropped images. For images with plain background, the accuracy even decreases upon segmentation. This is induced by parts of compound leaves being accidentally removed during segmentation, especially in cases of delicate leaflets. This is true for species such as Sanguisorba minor, Hippocrepis comosa, and Pimpinella saxifraga, where the segmented images with plain background contained

Fig. 7 Detailed view of the leaf margins of the top side of Aster amellus. The images are the same as in the example shown in Fig. 2

less visual information of the leaves compared to images with natural background. Against our expectation, the segmentation of images with natural background results in a very minor beneficial effect on species recognition (Fig. 4). This conclusion, however, holds only for feature classification based on CNNs, as they are superior in handling background information when compared to handcrafted features. The images with natural background may contain more species-related information, such as the leaf attachment to the stem, and the stem itself. The CNN features possibly encode the relative size of the object by comparing it with its background. Also, the common perspective in which leaf images of certain species are acquired is taken into account from the leaves' natural surroundings [32].

Theoretically, it is possible to combine multiple images from the same observation in the recognition process, which could further improve the accuracy, as shown for a different dataset by [33]. We expect that the combination of different image types would also benefit the classification accuracy within our dataset. The scope of this work, however, is to reveal the most effective way of dataset acquisition.

Our discussion so far compared image types and preprocessing strategies solely based on classification accuracy. However, various combinations of methods differ only marginally from each other and it is reasonable to also consider the effort related to the acquisition process and the preprocessing of an image. Therefore we defined the accuracy-effort gain

$$G_i = \frac{a_i - a_b}{a_b} \cdot \frac{e_b}{e_i - e_b} \qquad (1)$$

relating the obtained accuracy to the manual effort. In Eq. 1, $a_i$ represents the achieved classification accuracy using the $i$-th combination of image type and preprocessing strategy, and $e_i$ is the manual effort necessary to create an image of this combination. $a_b$ and $e_b$ correspond to the accuracy and effort of the baseline scenario, i.e., imaging the leaf top sides in front of a natural background and applying no further preprocessing.

Figure 8 indicates that the solution with optimal accuracy-effort gain is to take a top side image of a leaf with natural background and to crop it with a simple bounding box. Comparing the manual effort during image acquisition with no preprocessing, only back light images yield a positive effect on the accuracy-effort gain. Any other effort during image acquisition, e.g., imaging the back side of a leaf or using a plain background, does not sufficiently improve the classification accuracy over the baseline. Comparing preprocessing strategies, the highest positive impact on the accuracy-effort gain is achieved by cropping. Realized by drawing a bounding box around the object of interest, cropping is a comparably simple task and required only 6.8 s per image on average in our experiments. It notably improved the classification

Fig. 8 Accuracy-effort gain for the five image types and three preprocessing strategies relative to the baseline combination, i.e., non-preprocessed leaf top side images with natural background

accuracy for all image types (cp. Fig. 4). Hence, we con- Received: 2 August 2017 Accepted: 25 October 2017 sider cropping the most efective type of manual efort.
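The accuracy-effort gain of Eq. 1 is straightforward to compute. A minimal sketch; the accuracy and effort numbers below are hypothetical illustrations, not values measured in this study:

```python
def accuracy_effort_gain(a_i, e_i, a_b, e_b):
    """Accuracy-effort gain G_i (Eq. 1): relative accuracy change over the
    baseline, scaled by the inverse of the relative extra effort.
    Undefined for e_i == e_b (no additional effort over the baseline)."""
    return (a_i - a_b) / a_b * (e_b / (e_i - e_b))

# Hypothetical example: baseline combination reaches 60% accuracy at 10 s
# per image; an alternative combination reaches 70% accuracy at 17 s.
g = accuracy_effort_gain(a_i=0.70, e_i=17.0, a_b=0.60, e_b=10.0)
print(round(g, 3))  # -> 0.238
```

A positive G_i means the extra effort pays off in accuracy; a combination that costs more effort while lowering accuracy yields a negative gain.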

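Cropping and segmentation differ only in how much background survives. As a toy illustration on a nested-list image and binary mask (pure Python; not the tooling used in this study, where boxes were drawn manually), a bounding-box crop versus a pixel-precise segmentation:

```python
def bounding_box(mask):
    """Smallest half-open box (x0, y0, x1, y1) enclosing all foreground
    (truthy) pixels of a binary mask given as a list of rows."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1

def crop(image, box):
    """Cropping: cut the image down to the box, background pixels included."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def segment(image, mask, bg=0):
    """Segmentation: keep only foreground pixels, blank everything else."""
    return [[p if m else bg for p, m in zip(prow, mrow)]
            for prow, mrow in zip(image, mask)]

# Toy 4x5 grey image and a "leaf" mask covering its centre.
img = [[9, 8, 7, 6, 5],
       [9, 1, 2, 3, 5],
       [9, 4, 5, 6, 5],
       [9, 8, 7, 6, 5]]
msk = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]]
print(crop(img, bounding_box(msk)))  # -> [[1, 2, 3], [4, 5, 6]]
```

The crop needs only the four box coordinates, whereas the pixel-precise mask assumed given here must in practice be produced interactively, e.g. with GrabCut [24], which is where the large effort difference comes from.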
Conclusion
While high accuracy is clearly the foremost aim in classification approaches, acquiring sufficiently large training datasets is a substantial investment. We argue that researchers should consider carefully how to spend available resources. Summarizing our findings in the light of human effort during image acquisition and further processing, we found that it is very useful to crop images but not to segment them. Image segmentation is a difficult and time-consuming task, in particular for images with natural background. Our results show that, within the used framework, this expensive step can be replaced by the much simpler but similarly effective cropping. Against our expectation, we also found no evidence that imaging leaves on plain background yields higher classification accuracy. We considered a leaf's back side to be more discriminative than its top side. However, our results suggest that back side images do not yield higher accuracy but rather require considerably more human effort due to a much more challenging acquisition process. In conclusion, the most effective, non-destructive way to record herbaceous leaves in the field is taking leaf top side images and cropping them to the leaf boundaries. When destructive acquisition is permissible, the back light perspective after plucking the leaf yields the best overall result in terms of recognition accuracy.

Authors' contributions
Funding acquisition: PM, JW; experiment design: MR, MS, JW and PM; image acquisition: MR; data analysis: MR and MS; data visualisation: MR, MS, JW, PM; writing manuscript: MR, MS, JW, PM. All authors read and approved the final manuscript.

Author details
1 Department Biogeochemical Integration, Max-Planck-Institute for Biogeochemistry, Hans-Knöll-Str. 10, 07745 Jena, Germany. 2 Institute for Computer and Systems Engineering, Technische Universität Ilmenau, Helmholtzplatz 5, 98693 Ilmenau, Germany.

Acknowledgements
We are funded by the German Ministry of Education and Research (BMBF), Grants 01LC1319A and 01LC1319B; the German Federal Ministry for the Environment, Nature Conservation, Building and Nuclear Safety (BMUB), Grant 3514 685C19; and the Stiftung Naturschutz Thüringen (SNT), Grant SNT-082-248-03/2014. We would like to acknowledge the support of NVIDIA Corporation with the donation of a TitanX GPU used for this research. We thank Anke Bebber for careful editing, which substantially improved our manuscript. We further thank Sandra Schau, Lars Lenkardt and Alice Deggelmann for assistance during the field recordings.

Competing interests
The authors declare that they have no competing interests.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References
1. Farnsworth EJ, Chu M, Kress WJ, Neill AK, Best JH, Pickering J, Stevenson RD, Courtney GW, VanDyk JK, Ellison AM. Next-generation field guides. BioScience. 2013;63(11):891–9.
2. Austen GE, Bindemann M, Griffiths RA, Roberts DL. Species identification by experts and non-experts: comparing images from field guides. Sci Rep. 2016;6:33634.
3. Ceballos G, Ehrlich PR, Barnosky AD, García A, Pringle RM, Palmer TM. Accelerated modern human-induced species losses: entering the sixth mass extinction. Sci Adv. 2015;1(5):1400253.
4. Gaston KJ, O'Neill MA. Automated species identification: why not? Philos Trans R Soc Lond B Biol Sci. 2004;359(1444):655–67.
5. Joly A, Bonnet P, Goëau H, Barbe J, Selmi S, Champ J, Dufour-Kowalski S, Affouard A, Carré J, Molino J-F, et al. A look inside the Pl@ntNet experience. Multimed Syst. 2016;22(6):751–66.
6. Wäldchen J, Mäder P. Plant species identification using computer vision techniques: a systematic literature review. Arch Comput Methods Eng. 2017;1–37. https://doi.org/10.1007/s11831-016-9206-z.
7. Cope JS, Corney D, Clark JY, Remagnino P, Wilkin P. Plant species identification using digital morphometrics: a review. Expert Syst Appl. 2012;39(8):7562–73.
8. Seeland M, Rzanny M, Alaqraa N, Wäldchen J, Mäder P. Plant species classification using flower images—a comparative study of local feature representations. PLoS ONE. 2017;12(2):1–29. https://doi.org/10.1371/journal.pone.0170629.
9. Belhumeur PN, Chen D, Feiner SK, Jacobs DW, Kress WJ, Ling H, Lopez I, Ramamoorthi R, Sheorey S, White S, Zhang L. Searching the world's herbaria: a system for visual identification of plant species. In: Forsyth D, Torr P, Zisserman A, editors. Computer vision—ECCV 2008. Lecture notes in computer science, vol 5305. Berlin: Springer; 2008. p. 116–29.
10. Caballero C, Aranda MC. Plant species identification using leaf image retrieval. In: Proceedings of the ACM international conference on image and video retrieval. CIVR '10, ACM, New York, NY, USA; 2010. p. 327–34.
11. Kumar N, Belhumeur P, Biswas A, Jacobs D, Kress W, Lopez I, Soares J. Leafsnap: a computer vision system for automatic plant species identification. In: Fitzgibbon A, Lazebnik S, Perona P, Sato Y, Schmid C, editors. Computer vision—ECCV 2012. Lecture notes in computer science, vol 7573. Berlin: Springer; 2012. p. 502–16.
12. Mouine S, Yahiaoui I, Verroust-Blondet A. A shape-based approach for leaf classification using multiscale triangular representation. In: Proceedings of the ACM international conference on multimedia retrieval. ICMR '13, ACM, New York, NY, USA; 2013. p. 127–34.
13. Mzoughi O, Yahiaoui I, Boujemaa N, Zagrouba E. Automated semantic leaf image categorization by geometric analysis. In: IEEE international conference on multimedia and expo (ICME) 2013; 2013. p. 1–6.
14. Soares JB, Jacobs DW. Efficient segmentation of leaves in semi-controlled conditions. Mach Vis Appl. 2013;24(8):1623–43.
15. Cerutti G, Tougne L, Mille J, Vacavant A, Coquin D. Understanding leaves in natural images—a model-based approach for tree species identification. Comput Vis Image Underst. 2013;117(10):1482–501.
16. Grand-Brochier M, Vacavant A, Cerutti G, Bianchi K, Tougne L. Comparative study of segmentation methods for tree leaves extraction. In: Proceedings of the international workshop on video and image ground truth in computer vision applications. VIGTA '13, ACM, New York, NY, USA; 2013. p. 7–177.
17. Razavian SA, Azizpour H, Sullivan J, Carlsson S. CNN features off-the-shelf: an astounding baseline for recognition. 2014. ArXiv e-prints: arXiv:1403.6382.
18. Chatfield K, Simonyan K, Vedaldi A, Zisserman A. Return of the devil in the details: delving deep into convolutional nets; 2014. ArXiv e-prints: arXiv:1405.3531.
19. Choi S. Plant identification with deep convolutional neural network: SNUMedinfo at LifeCLEF plant identification task 2015. In: CLEF (Working Notes); 2015.
20. Champ J, Lorieul T, Servajean M, Joly A. A comparative study of fine-grained classification methods in the context of the LifeCLEF plant identification challenge 2015. In: CEUR-WS, editor. CLEF: Conference and Labs of the Evaluation Forum. CLEF2015 working notes, vol 1391. Toulouse, France; 2015. https://hal.inria.fr/hal-01182788.
21. Reyes AK, Caicedo JC, Camargo JE. Fine-tuning deep convolutional networks for plant recognition. In: CLEF (Working Notes); 2015.
22. Barré P, Stöver BC, Müller KF, Steinhage V. LeafNet: a computer vision system for automatic plant species identification. Ecol Inform. 2017;40:50–6. https://doi.org/10.1016/j.ecoinf.2017.05.005.
23. Rzanny M, Seeland M, Alaqraa N, Wäldchen J, Mäder P. Jena Leaf Images 17 Dataset. Harvard Dataverse. 2017. https://doi.org/10.7910/DVN/8BP9L2.
24. Rother C, Kolmogorov V, Blake A. "GrabCut": interactive foreground extraction using iterated graph cuts. ACM Trans Graph. 2004;23(3):309–14. https://doi.org/10.1145/1015706.1015720.
25. McGuinness K, O'Connor NE. A comparative evaluation of interactive segmentation algorithms. Pattern Recogn. 2010;43(2):434–44. https://doi.org/10.1016/j.patcog.2009.03.008.
26. Peng B, Zhang L, Zhang D. A survey of graph theoretical approaches to image segmentation. Pattern Recogn. 2013;46(3):1020–38. https://doi.org/10.1016/j.patcog.2012.09.015.
27. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR); 2016. p. 770–8. https://doi.org/10.1109/CVPR.2016.90.
28. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T. Caffe: convolutional architecture for fast feature embedding; 2014. arXiv preprint arXiv:1408.5093.
29. R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2016. https://www.R-project.org/.
30. Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F. e1071: misc functions of the Department of Statistics, Probability Theory Group (formerly: E1071), TU Wien; 2017. R package version 1.6-8. https://CRAN.R-project.org/package=e1071.
31. Kuhn M, with contributions from Wing J, Weston S, Williams A, Keefer C, Engelhardt A, Cooper T, Mayer Z, Kenkel B, the R Core Team, Benesty M, Lescarbeau R, Ziem A, Scrucca L, Tang Y, Candan C, Hunt T. caret: classification and regression training; 2016. R package version 6.0-73. https://CRAN.R-project.org/package=caret.
32. Seeland M, Rzanny M, Alaqraa N, Thuille A, Boho D, Wäldchen J, Mäder P. Description of flower colors for image based plant species classification. In: Proceedings of the 22nd German Color Workshop (FWS). Zentrum für Bild- und Signalverarbeitung e.V., Ilmenau, Germany; 2016. p. 145–154.
33. Goëau H, Joly A, Bonnet P, Selmi S, Molino J-F, Barthélémy D, Boujemaa N. LifeCLEF plant identification task 2014. In: Working notes for CLEF 2014 conference, Sheffield, UK, September 15–18; 2014. CEUR-WS. p. 598–615.
