Human experts vs. machines in taxa recognition

Johanna Ärje∗‡∗∗, Jenni Raitoharju∗, Alexandros Iosifidis‖, Ville Tirronen§, Kristian Meissner†, Moncef Gabbouj∗, Serkan Kiranyaz¶, Salme Kärkkäinen‡
‡Department of Mathematics and Statistics, University of Jyväskylä, Jyväskylä, Finland
∗Department of Computing Sciences, Tampere University, Tampere, Finland
‖Department of Engineering, Aarhus University, Aarhus, Denmark
¶Department of Electrical Engineering, Qatar University, Doha, Qatar
§Faculty of Information Technology, University of Jyväskylä, Jyväskylä, Finland
†Programme for Environmental Information, Finnish Environment Institute, Jyväskylä, Finland
∗∗Corresponding author: johanna.arje@jyu.fi, Department of Mathematics and Statistics, P.O. Box 35 (MaD), FI-40014 University of Jyväskylä

Abstract—The step of expert taxa recognition currently slows down the response time of many bioassessments. Shifting to quicker and cheaper state-of-the-art machine learning approaches is still met with expert scepticism towards the ability and logic of machines. In our study, we investigate both the differences in accuracy and in the identification logic of taxonomic experts and machines. We propose a systematic approach utilizing deep Convolutional Neural Nets with the transfer learning paradigm and extensively evaluate it over a multi-pose taxonomic dataset with hierarchical labels specifically created for this comparison. We also study the prediction accuracy on different ranks of the taxonomic hierarchy in detail. Our results revealed that experts using actual specimens yield the lowest classification error (CE = 6.1%). However, a much faster, automated approach using deep Convolutional Neural Nets comes close to human accuracy (CE = 11.4%). Contrary to previous findings in the literature, we find that for machines a typical flat classification approach commonly used in machine learning performs better than forcing machines to adopt the hierarchical, local per parent node approach used by human taxonomic experts. Finally, we publicly share our unique dataset to serve as a public benchmark dataset in this field.

Keywords: hierarchical classification; convolutional neural networks; taxonomic expert; multi-image data; biomonitoring

Figure 1: A schematic of the biomonitoring process (Sample → Identification → Indices → Ecological assessment).

I. INTRODUCTION

Due to its inherent slowness, traditional manual identification has long been a bottleneck in bioassessments (Fig. 1). The growing demand for biological monitoring and the declining funding and number of taxonomic experts are forcing ecologists to search for alternatives to the cost-intensive and time-consuming manual identification of monitoring samples (Borja and Elliott, 2013; Nygård et al., 2016). Identification of taxonomic groups in biomonitoring of, e.g., aquatic environments often involves a large number of samples, specimens in a sample, and taxonomic groups to identify. For example, even in relatively species-poor regions like Finland, the calculation of Water framework directive related indices often involves hundreds of individual specimens from 118–349 lotic diatom taxa and 44–113 lotic benthic macroinvertebrate taxa (Ärje et al., 2016).

While a growing body of work has used different genetic tools for identification (e.g. Elbrecht et al., 2017; Zimmermann et al., 2015), these methods are not yet standardized or capable of producing reliable abundance data currently required in, e.g., the Water framework directive. While we have also worked on genetic approaches and acknowledge the great promise that genetic taxa identification methods hold (e.g. Hering et al., 2018), we will not explore them here but alternatively examine the suitability of machine learning techniques on image data for routine taxa identification.

Many studies on automatic classification of biological image data have been published during the past decade. Yousef Kalafi et al. (2018) have done an extensive review on automatic species identification and automated imaging systems. Classification methods for aquatic macroinvertebrates have been proposed in several studies (e.g. Culverhouse et al., 2006; Lytle et al., 2010; Kiranyaz et al., 2011; Ärje et al., 2013; Joutsijoki et al., 2014; Raitoharju et al., 2018). The most popular classification methods used for identification of biological image data are deep neural networks and support vector machines (Kho et al., 2017).

Despite the potential of computational, as well as DNA, methods for taxa identification, some taxonomists continue to object to the shift from manual to novel identification methods (Kelly et al., 2015; Leese et al., 2018). Often biologists that take a cursory look at automated identification tend to mistrust computational methods because they observe that a classifier is unable to separate two specimens which to them are clearly different to the human eye. Similarly, experts are baffled when the same classifier is able to discriminate between two specimens from low-resolution images while they as taxonomic experts cannot. This mismatch in the ability of computers to identify taxa, observed for single cases, is often mistakenly extrapolated into an overall unreliability of algorithms. But how different truly are the logic and the overall accuracy of taxonomic experts and algorithms?

Only few studies assess the accuracy of human experts and automatic classifiers, and their consequences for aquatic biomonitoring. In a study on human accuracy, Haase et al. (2010) reported on the audit of macroinvertebrate samples from an EU Water Framework Directive monitoring program. They found a great discrepancy between the experts determining the true taxonomic classes and the audited laboratory workers. Contrastingly, in a study on the effect of mistakes made in automated taxa identification on biological indices, Ärje et al. (2017) found a relatively small impact. Literature on direct human versus machine comparisons in classification tasks in an aquatic biomonitoring context is equally scant and ambiguous. Culverhouse et al. (2003) compared human and machine identification of six phytoplankton species using images and noted a similar average performance for both experts and a computer algorithm. In Lytle et al. (2010), automatic classifiers outperformed 26 humans (a mix of experts and amateurs) when distinguishing between two stonefly taxa. Given these contrasting results, we feel it is necessary to simultaneously examine the effect of taxonomic hierarchy and of using human logical pathways for human and computer-based identification.

Automatic classification of benthic macroinvertebrates, as well as plankton, has received increasing attention in recent years. However, most of the previous studies have focused on single-image data (see e.g. Ärje et al., 2010; Kiranyaz et al., 2011; Ärje et al., 2013; Joutsijoki et al., 2014; Uusitalo et al., 2016; Lee et al., 2016; Ärje et al., 2017) and have not taken the inherent hierarchical structure of the data into account. In single-image data studies, the posture of the specimens can have a substantial impact on the classification. Besides Lytle et al. (2010), an imaging system producing multiple-image data is presented in Raitoharju et al. (2018).

In this paper, we present a comparison of taxonomic experts and automatic classification methods on benthic macroinvertebrate data that incorporates information on the taxonomic resolution. We test flat classifiers, local per level classifiers, and hierarchical top-down classification, i.e., local classification per parent node, and perform the automatic classification using convolutional neural networks (CNNs) and support vector machines (SVMs). The results are compared with the results of a proficiency test organized for human taxonomic experts and with a test where the taxonomic experts used the same images as the automatic classifiers. The comparisons evaluate traditional single level accuracy and additionally use a novel variant of an accuracy measure that accounts for the hierarchical structure of the data.

II. THEORY

A. Hierarchy in classification

There are different ways of accounting for data hierarchy, such as taxonomy, in classification, and hierarchical classification is widely investigated in the current literature. Silla and Freitas (2011) sought to describe and unify the concepts of methods used in hierarchical classification problems from different domains, and in this paper we follow their terminology. Using the existing literature, they categorized the classification approaches into: 1) flat classification, where the classification is performed at the most specific (deepest) rank of the taxonomy, which may not always be the species level, 2) local classification per level, per node, or per parent node, and 3) global classification, where the whole hierarchical structure of the taxonomy is taken into account at once. They found that the existing literature suggested any local or global hierarchical classifier performed better than a flat classifier if the performance measure was specifically designed for a hierarchical structure.

Several subsequent studies have compared flat classifiers to hierarchical classifiers. Rodrigues et al. (2012) did not find a significant difference between flat and hierarchical approaches in classification of points-of-interest for land-use analysis, whereas Levatic et al. (2015) found that the use of hierarchy and multi-label structure improved classification results when compared to single-label cases. Babbar et al. (2016) performed a theoretical study on the difference between flat and hierarchical classification and found that for well-balanced data flat classifiers should be preferred, whereas hierarchical classifiers are a better choice for unbalanced data.

Taxonomic experts identify specimens based on a predefined taxonomic resolution while automatic classifiers operate on the information used in the training data. Human experts base visual identification of, e.g., invertebrate taxa on rules defined in the International commission on zoological nomenclature (1999). Therefore, human experts can be thought of as hierarchical, local per parent node classifiers (see Fig. 2c) that first identify the order of the specimen, then the family, genus, and species. The classification task is not necessarily a single level problem as some taxa need to be identified to different taxonomic levels (see Fig. 2a), either because of predefined rules, such as minimal taxonomic requirements, or as a function of necessity when specimens lack characteristics needed to allow for better resolution. While for some taxa genus or family might be enough, others might require species level identification depending on what the taxa information is later used for.

Usually, automatic classification methods have no information on the possible hierarchical nature of the data. The classifiers simply aspire to identify the specimens to the labels provided in the training data. In the case of benthic macroinvertebrate data, the class labels represent a mix of families, genera, and species. An algorithm working this way is called a flat classifier as it is not aware that species A and

Figure 2: Different types of classifiers for hierarchical data: (a) Flat classification, (b) Local classification per level, (c) Local classification per parent node. The dashed boxes represent a single trained classifier.

B belong to the same genus A, but uses the same approach to distinguish them from each other as when separating species A from genus B. Flat classification produces a single label prediction for each specimen, but the hierarchical level of that label may vary depending on the data (Fig. 2a).

Depending on what the taxa information is later used for, it could be beneficial to build a classifier that identifies a certain taxonomic rank well. For example, a common biological index used in river macroinvertebrate biomonitoring is the number of typical EPT families (Ephemeroptera, Plecoptera, Trichoptera). For the purpose of evaluating this index, it would be reasonable to train a classifier to identify the family level with high accuracy. However, such a classifier trained with the family level labels would have no intrinsic information on certain families descending from the same order. This type of a classification scheme is known as local classification per level (see Fig. 2b). One could build a classification system with local per level classifiers for each level of the hierarchy. While such a system would predict multiple labels for each specimen, there would be no guarantee that the predictions for the different levels are taxonomically coherent.

It is also possible to build a hierarchical classification system that accounts for the hierarchical nature of the data and force it to operate in the same manner as human experts. This requires building a sequence of several classifiers: i) an order level classifier to predict the order of each specimen, ii) multiple family level classifiers, one for each possible order present in the data, iii) multiple genus level classifiers, one for each family present in the data, and finally, iv) multiple species level classifiers to predict the species within each genus. This type of a hierarchical classification scheme is known as local classification per parent node and it predicts the labels for each rank of taxonomic resolution for all the specimens in the data (see Fig. 2c). While a human-like hierarchical classifier is guaranteed to logically follow the taxonomy, all classification errors made on higher levels of the hierarchy will propagate to the lower level predictions.

The focus of this work is on the comparison of identification results obtained by taxonomic expert logic and machine logic. As traditional machine logic uses flat classification and taxonomic expert logic can be thought of as local classification per parent node, we will not consider global hierarchical classifiers.

B. Performance measures

Traditionally, classification methods are compared based on their accuracy, which is the proportion of correct predictions, or classification error (CE),

\[ CE = \frac{1}{n}\sum_{i=1}^{n} L(\hat{y}_i, y_i), \]

where L(·, ·) is a 0–1 loss function and n is the total number of observations. Other measures of performance, such as false positive rate, false negative rate, sensitivity, and specificity, can also be calculated from the confusion matrix and take single label predictions into account. These performance measures can be calculated for flat classification (Fig. 2a) or for each level of local classification (Fig. 2b, 2c).

With hierarchical data, each observation has multiple labels and we need to measure the performance as a whole, accounting for all the labels. Verma et al. (2012) presented a context sensitive loss (CSL) function which takes the top-down success into account. They used this loss function to define the context-sensitive error (CSE),

\[ CSE = \frac{1}{nH}\sum_{i=1}^{n} L(\hat{y}_i, y_i), \]

where

\[ L(\hat{y}_i, y_i) = \begin{cases} h, & \text{where } h \text{ is the height of the deepest common ancestor of the pair } (\hat{y}_i, y_i), \\ 0, & \text{if } \hat{y}_i = y_i, \end{cases} \]

and H is the total number of levels in the hierarchy.

Because the deepest available level of hierarchy can vary in taxonomic data, we propose to modify the measure to a level-aware context-sensitive error (LCSE),

\[ LCSE = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{H_i} L(\hat{y}_i, y_i), \]

where L(ŷᵢ, yᵢ) is as above and Hᵢ is the number of available levels in the hierarchy for observation i.

III. MATERIALS AND METHODS

A. Proficiency test for human experts

In order to compare automatic and manual classification, we needed classification results on the same set of taxa for both. The Finnish Environment Institute (SYKE), an appointed National Reference Laboratory in the environmental sector in Finland, organized a proficiency test on taxonomic identification of boreal freshwater lotic, lentic, profundal, and North-Eastern Baltic benthic macroinvertebrates in 2016. The aim of the test was to assess the reliability of professional and semi-professional identification of macroinvertebrate taxa routinely encountered during North-Eastern Baltic coastal or boreal lake and stream monitoring (Meissner et al., 2017). A part of the proficiency test included 10 participants who all identified a different set of 50 specimens of lotic freshwater macroinvertebrates belonging to a total of 46 taxonomic groups, of which 39 are in common with the multiple-image data introduced in the following Section III-B (see taxa list in Table III). The samples sent out to the participants included 0–4 specimens of each taxon. The class labels of the 39 overlapping taxa consisted of 26 species, 12 genera, and one family. The chosen taxonomic resolution is based on the requirements of the Finnish national freshwater monitoring program for macroinvertebrates (Meissner et al., 2010). The 'true' labels of the specimens were predetermined by an expert panel and the specimens were shipped to the participants. Participants were provided with the list of the almost 300 possible taxa labels (Meissner et al., 2017).

B. Image data

We produced all images with a new imaging system described in Raitoharju et al. (2018) that allows for multiple images per specimen. The system is illustrated in Fig. 3. It consists of two Basler ACA1920-155UC cameras (frame rate of 150 fps) with Megapixel Macro Lenses (f=75mm, F:3.5-CWD<535mm) placed at a 90 degree angle to each other, a high power LED light, and a cuvette (i.e. a rectangular test tube) in a metal container. The device is sealed with a lid to block any extra light. The imaging system has software that builds a model of the background of the cuvette filled with alcohol and sets off the cameras when a significant change in the view of the camera is detected. When a macroinvertebrate specimen is put into the cuvette, it sinks and both cameras take multiple shots of it (Fig. 4). The number of images per specimen depends on the size and weight of each specimen: heavier specimens sink faster, leading to a smaller number of images. Compared to the system and data described in Raitoharju et al. (2018), we have improved the system to handle more than two images per specimen.

Figure 3: Schematic of the imaging system for macroinvertebrates pictured from above.

Figure 4: Example images of a Polycentropus flavomaculatus specimen from two cameras. The top row images are from camera 1 and the bottom row images from camera 2.

Using the described imaging device, the Finnish Environment Institute compiled a new image database of 126 lotic freshwater macroinvertebrate taxa and over 2.6 million images. For the current work, we restricted the number of classes to the 39 taxa present in the human proficiency test described in Section III-A to compare the classification results with those of the taxonomic experts. We also restricted the number of images per specimen to a maximum of 50 images for computational reasons. If a specimen had more images from both cameras combined, we randomly selected 50 of them. The final data comprises 9631 observations and a total of 460004 images belonging to 39 taxa at the deepest available taxonomic rank. In total, considering one taxonomic rank at a time, the data consist of 7 orders, 23 families, 30 genera, and 26 species (see Fig. 5). The number of specimens for each taxon and the taxonomic resolution are shown in Table III. The image resolution for this data varies from 32 × 20 pixels to 468 × 540 pixels. The 'true' labels for the specimens were defined by a group of taxonomic experts. While we acknowledge that there might be some mislabeled specimens, combining the knowledge of multiple taxonomic experts should improve the accuracy (Caley et al., 2014). We provide the data for public use at https://etsin.avointiede.fi/dataset/urn-nbn-fi-csc-kata20181023163254099490.

Figure 5: Taxonomic resolution and distribution of the multilabel image data. The area of the slices represents the relative size of each taxonomic group at the different ranks of the taxonomic hierarchy.

C. Classification set-up

To have classification results comparable to the proficiency test, we compiled a set of data divisions for the image data with the exact same number of test specimens per taxon as in the proficiency test. As the proficiency test had 10 participants identifying lotic freshwater macroinvertebrates, we created 10 data divisions. The test sets comprise randomly selected 45–46 specimens belonging to the 39 taxonomic groups present in both the physical data and the image data. The test sets have an approximately equal number of specimens from each class. We divided the rest of the specimens of each data split for training (80 %) and validation (20 %). Due to the nature of the collected data, the training and validation data are unbalanced. In the following sections, these data sets are referred to as the "comparison data". The number of specimens per test set in the comparison data is lower than in the proficiency test because 4–5 specimens sent to each participant belonged to taxonomic groups not present in the image data.

Since the comparison between professionals and semi-professionals analysing physical data with a laboratory microscope and automatic classifiers using image data is unequal, we asked the proficiency test participants to also try to identify taxa from the test images of the comparison data. Each participant received one of the test sets and a list of the 39 possible taxa labels. To avoid fatigue and to encourage more experts to participate, we restricted the number of images per test specimen to 10. The automatic classifiers used exactly the same test data. In addition, because some of the images are fuzzy, the experts were allowed to classify the taxa to a higher taxonomic rank if they were unsure. The automatic classifiers always predicted the classes of the test specimens to the deepest available rank of taxonomic resolution. Of the ten experts participating in the proficiency test, three volunteered to take part in this image classification study.

As the comparison test sets are very small, we also studied the performance of the automatic classifiers on larger test sets. We split the specimens randomly into training (70 %), validation (10 %), and test (20 %) data 10 times. This time the number of specimens varied in all training, validation, and test sets depending on the size of the taxon in the dataset. We refer to these sets as the "machine learning data" as the splitting is typical for machine learning, but not suitable for comparisons with humans. For the test sets in the machine learning data, we included all images (max. 50) per specimen.

We considered different approaches to take the hierarchical nature of the data into account: A flat classifier is a single classifier with the 39 taxa as output labels. Local per level classifiers are built for each taxonomic rank separately: a classifier for the orders and another classifier for the families. We only trained local per level classifiers for the two highest taxonomic ranks as some of the taxa in the data have information only on these ranks. The top-down, local per parent node classifier is a system comprising 17 classifiers: one classifier at the top to identify the order of a specimen, four classifiers at the family level as there are four families with more than one genus within them, five classifiers at the genus level, and seven classifiers at the species level (see Table III). Some of the specimens get their predictions already at the order level since there are three orders with only one family or genus within them. In the data, there are two genera (Leuctra sp. and Nemoura sp.) for which only some of the specimens have information on species (Leuctra nigra and Nemoura cinerea). To separate these groups with the local per parent node classification approach, we temporarily marked the species for the rest of the Leuctra sp. and Nemoura sp. specimens as '0'. We trained the local species level classifiers and, if they predicted the '0' label, we marked the specimen as predicted only to genus level.

D. Classification methods

We selected CNN (Krizhevsky et al., 2012) and SVM (Cortes and Vapnik, 1995) as our methods for the automatic classification. As our CNN model, we used the MatConvNet (Vedaldi and Lenc, 2015) implementation of the AlexNet CNN architecture (Krizhevsky et al., 2012). The architecture has five convolution layers followed by three fully-connected layers. The last fully-connected layer is followed by a softmaxloss (training) / softmax (testing) layer. In our tests, we also considered the output of the last fully-connected layer instead of the softmax output, because we observed that this produced better results when the final class was decided based on the average of the outputs for each image of a specimen. We trained flat and local per level classifiers from scratch using 60 training epochs. For the 17 classifiers of each local per parent node classifier, we took the flat classifier for the corresponding data split as our starting point and fine-tuned the network for 10 epochs (5 epochs only the last fully-connected layer, 3 epochs all fully-connected layers, and 2 epochs all layers). In all cases, we used a batch size of 256 and trained the network using stochastic gradient descent with a momentum of 0.9. When training from scratch, we used a learning rate varying from 0.01 to 0.0001, and for fine-tuning a learning rate varying from 0.005 to 0.0001. We saved the networks after each epoch and selected the final model based on the classification accuracy on the validation set.

While CNNs use the original images as input, for SVMs we extracted a set of 64 simple geometry and intensity-based features from the images using ImageJ (Rasband, 2010). The geometric features extracted include, e.g., area, perimeter, and the width and height of a bounding rectangle, while the intensity-based features were extracted from the gray, red, green, and blue scale channels of the images. The complete set of features is listed in detail in Ärje et al. (2013). As these features are simple and the classification task of identifying such a large number of classes is a complex one, we found that a principal component transformation on the features improves classification results. Therefore, we performed a principal component transformation, as well as standardization, on the features before using them for classification.

We built our SVM model (Chang and Lin, 2011) using the R (R Core Team, 2016) package e1071 (Meyer et al., 2018) and used a Gaussian kernel. For flat classification and local per level classification, we performed a grid search for the parameters over c = {2^8, 2^9, 2^10, 2^11} and γ = {2^-11, 2^-10, 2^-9, 2^-8}. For the local per parent node hierarchical classification system, we explored a larger grid as the classification problems can be very different from one another at different nodes of the hierarchical system. Due to the amount of data and time consumed by evaluating just a single parameter combination, we did the following: we randomly selected one image per specimen and used this data to perform the grid search for the parameters over c = {2^1, 2^2, ..., 2^15} and γ = {2^-15, 2^-14, ..., 2^-1}. After determining the optimal parameter values with this smaller data, we did a small, 3 × 3, grid search around those values with all the images (max. 50 images per specimen).

For both the comparison and the machine learning data, we did the following: with each data split, we used the training data to train the model and the validation data to either select the best epoch to stop training (CNNs) or select optimal parameter values (SVMs) based on the classification accuracy on the validation specimens. With SVMs, we combined the training and validation data to train the final model after fixing the parameters. At the end, we classified each test image and selected the final class for each specimen using either the average output (CNNs) or a majority vote over all the images of the specimen (CNNs, SVMs).

IV. ANALYSIS AND INFERENCE

A. Comparison data

Classification results for the comparison test sets of the image data, as well as results of the proficiency test on physical data, are presented in Table I.

Contrary to the previous findings in the hierarchical classification literature (Silla and Freitas, 2011), the flat classifiers for both CNN and SVM produce better results than the hierarchical classification approach. Babbar et al. (2016) stated in their study that if the data is highly unbalanced, hierarchical classifiers are better options even though their empirical error (CE) may be higher due to error propagation. While our test data is balanced, the training data used to train the classifiers is not. However, taking the hierarchical nature of the data into account when building the classifier produces not only a higher CE but also a slightly higher LCSE. It is worth noting that the optimization of the classifiers is based on CE, not LCSE. The only improvement the hierarchical classification system offers is a slightly lower CE on the order level for SVM. Note that for the order level, the hierarchical classifier and the local per level classifier are the same. Interestingly, the local per level SVM and CNN classifiers for the family level perform worse than the flat classifiers with the ascending taxa labels. The notably high CE for the local per level CNN at the family level is due to data split three, where the CNN classifies all observations to the family Elmidae. When leaving this data split out, the average classification error is 7 %.
The first row of results shows the average CE on the deepest available rank of taxonomy. These are the results traditionally examined with flat classifiers. Taxonomic experts using physical data and microscopes to identify the taxa still outperform the automatic approaches. This result by taxonomic experts can be considered as a gold standard to compare to. However, taxonomic experts predicting taxa from the images make the most classification errors. This is understandable as the image quality can be sub-par for some specimens and the experts have not studied identification from these types of images. For the automatic classifiers, the CNN using the flat classification approach and the average output for deciding the final class has the lowest CE and is in the range of taxonomic experts with physical data. The average output clearly outperforms the majority vote as a decision rule for the final class even though the number of images per specimen is relatively high.

The bottom part of Table I shows the error structure for each classifier and the taxonomic experts. The numbers of new errors at the different taxonomic ranks sum up to the total amount of misclassifications for the 10 balanced test splits. The difference in taxonomic expert and machine logic is evident in the number of errors on each taxonomic rank. For taxonomic experts using physical data, there are very few misclassifications at the order level and the number of errors increases with the taxonomic resolution. For experts using image data, all the order level errors are due to completely missing predictions for images being too challenging to identify. That is, all the predictions made by the experts were correct at the order level and, as with physical data, the number of errors increases with the taxonomic rank. For the automatic classifiers, most misclassifications are made at either species or family level. There is no such clear hierarchy in the error structure as for the taxonomic experts.
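To make the hierarchical error measure concrete, the following minimal sketch computes the LCSE of Section II-B on a few hypothetical specimens. The label paths are invented for illustration only; the loss is taken to be the height of the deepest common ancestor of the predicted and true paths, normalised by the number of available levels for each specimen.

```python
# Minimal sketch of the level-aware context-sensitive error (LCSE) from
# Section II-B, on hypothetical multi-level label paths (order, family, ...).

def loss_height(pred, true):
    """Height of the deepest common ancestor of the two paths
    (0 for a perfect match; diverging at the top costs the most)."""
    depth = len(true)
    for level, (p, t) in enumerate(zip(pred, true)):
        if p != t:
            return depth - level
    return 0

def lcse(predictions, truths):
    # Each per-specimen loss is normalised by H_i, the number of
    # available levels for that specimen, then averaged over n specimens.
    return sum(loss_height(p, t) / len(t)
               for p, t in zip(predictions, truths)) / len(truths)

# Three hypothetical specimens: correct; wrong only at the deepest rank;
# wrong already at the order level.
truths = [("Ephemeroptera", "Baetidae", "Baetis"),
          ("Ephemeroptera", "Baetidae", "Baetis"),
          ("Trichoptera", "Hydropsychidae")]
preds  = [("Ephemeroptera", "Baetidae", "Baetis"),      # loss 0
          ("Ephemeroptera", "Baetidae", "Centroptilum"),  # loss 1/3
          ("Plecoptera", "Nemouridae")]                  # loss 2/2
print(round(lcse(preds, truths), 3))  # 0.444
```

This reproduces the pattern discussed above: deep-rank mistakes contribute little to LCSE, while order-level mistakes contribute the maximum of 1 per specimen.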
While flat classification gives only a single level and single In biomonitoring and ecosystem assessment, not only a low label predictions, it is still possible to make comparisons on number of classification errors is essential, but also the type different ranks of taxonomic resolution. We simply take the of errors made as some misclassifications can have higher predictions from the deepest rank of taxonomy of the data cost than others. To examine this, we analysed the confusion and add the ascending taxa labels accordingly. Let us call this matrices of the classifiers and taxonomic experts. Concerning a bottom-up examination. Using the bottom-up examination, especially demanding taxa, both the taxonomists and automatic we can calculate LCSE also for flat classifiers. The LCSE classifiers had difficulties identifying Hydropsyche saxonica. values for all classifiers as well as for taxonomic experts are Human experts easily misclassified them as Hydropsyche clearly smaller than the CE values (see Table I). This means angustipennis when using physical data and into a mix of that most of the classification errors occur on deeper ranks other Hydropsyche species when using image data. The image of taxonomic resolution while the order and family might be data has no Hydropsyche angustipennis specimens and the predicted correctly. If all the classification errors were done automatic classifiers predicted many of the Hydropsyche sax- already on the order level, CE and LCSE would be the same. onica to be Hydropsyche pellucidula (see Fig. 6). Hydropsyche For taxonomic experts using physical data, LCSE is close saxonica is also one of the least represented taxa in the to zero as expected since taxonomic experts use a top-down image data with only 17 specimens (see Table III) which is hierarchical logic for the classification task, and identifying the likely to be the reason the automatic classifiers have trouble higher ranks of taxonomy should be an easy task for an expert. 
classifying them. Besides this taxa, the human experts had an- Also in terms of LCSE, CNNs get close to the taxonomic other challenging taxa in the physical data. Some Rhyacophila expert level. nubila were misclassified as Rhyacophila fasciata. With the 8

more difficult image data, the taxonomic experts classified these individuals to genus level only or left them unidentified, while SVMs mixed them with other taxa as there were no Rhyacophila fasciata in the image data. In addition, with the image data, the human experts had trouble identifying Elmis aenea, with some of them unidentified completely and some of them misclassified as Oulimnius tuberculatus. The automatic classifiers identified this taxon more easily.

                      CNN     CNN     CNN     CNN     SVM     SVM     SVM     Experts  Experts
                      flat,   flat,   local/  hier.   flat    local/  hier.   images   physical
                      aver.   vote    level                   level                    data
Deepest level
  CE                  0.114   0.131   -       0.138   0.243   -       0.280   0.553    0.061
  sd(CE)              0.036   0.054   -       0.055   0.081   -       0.074   0.153    0.053
  LCSE                0.052   0.070   -       0.070   0.173   -       0.191   0.353    0.028
  sd(LCSE)            0.023   0.034   -       0.036   0.061   -       0.053   0.162    0.024
Order
  CE                  0.004   0.018   0.011   0.011   0.085   0.075   0.075   0.210    0.007
  sd(CE)              0.009   0.020   0.012   0.012   0.041   0.026   0.026   0.190    0.015
Family
  CE                  0.039   0.059   0.150   0.059   0.173   0.181   0.193   0.291    0.020
  sd(CE)              0.029   0.037   0.259   0.044   0.070   0.062   0.069   0.151    0.020
Error structure
  #ERR(order)         2       8       -       5       39      -       34      29       3
  #ERR(family)        16      19      -       22      40      -       54      11       6
  #ERR(genus)         12      12      -       12      16      -       22      15       6
  #ERR(species)       22      21      -       24      16      -       18      21       13

Table I: Classification results for the comparison test data. CE and LCSE are averaged over all 10 experts/data splits (for experts with images, 3 data splits). The number of new classification errors at each taxonomic rank is summed over all 10 data splits, where ntotal = 457 (for experts with images, 3 data splits, ntotal = 137).
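The bottom-up examination described above can be sketched in a few lines: a flat, deepest-rank prediction is expanded into its full taxonomic path, which is then scored rank by rank. The lookup table, function names, and the simple per-rank averaging rule below are illustrative assumptions for this sketch, not the paper's exact LCSE definition.

```python
# Sketch of the bottom-up examination: a flat, deepest-rank prediction is
# expanded into its full taxonomic path before computing a level-aware error.
# The taxonomy lookup and the per-rank averaging are illustrative assumptions.

# Hypothetical lookup: deepest-rank label -> (order, family, genus, species)
TAXONOMY = {
    "Hydropsyche saxonica":
        ("Trichoptera", "Hydropsychidae", "Hydropsyche", "Hydropsyche saxonica"),
    "Hydropsyche pellucidula":
        ("Trichoptera", "Hydropsychidae", "Hydropsyche", "Hydropsyche pellucidula"),
    "Baetis rhodani":
        ("Ephemeroptera", "Baetidae", "Baetis", "Baetis rhodani"),
}

def full_path(label):
    """Bottom-up: attach the ascending taxa labels to a flat prediction."""
    return TAXONOMY[label]

def level_aware_error(y_true, y_pred):
    """Average the 0/1 error over the ranks of the path, then over specimens."""
    errors = []
    for t, p in zip(y_true, y_pred):
        tp, pp = full_path(t), full_path(p)
        errors.append(sum(a != b for a, b in zip(tp, pp)) / len(tp))
    return sum(errors) / len(errors)
```

Under this reading, confusing the two Hydropsyche species is penalised only at the species rank (0.25 for that specimen), while confusing H. saxonica with Baetis rhodani is wrong at every rank (1.0); the deepest-level CE would score both mistakes identically.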

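The two decision rules compared above for combining the multiple images of one specimen can be illustrated as follows. The toy probability matrix is invented for illustration, and the vote's tie-breaking behaviour is an implementation detail of this sketch, not taken from the paper.

```python
import numpy as np

def predict_by_average(image_probs):
    """Average the per-image class-probability outputs, then take the argmax."""
    return int(np.mean(image_probs, axis=0).argmax())

def predict_by_vote(image_probs):
    """Majority vote over per-image hard predictions (ties break to the
    lowest class index -- an implementation detail, not from the paper)."""
    votes = np.argmax(image_probs, axis=1)
    return int(np.bincount(votes).argmax())

# Invented example: three images of one specimen, three candidate taxa.
probs = np.array([
    [0.6, 0.3, 0.1],   # image 1 confidently favours taxon 0
    [0.1, 0.5, 0.4],   # image 2 weakly favours taxon 1
    [0.2, 0.1, 0.7],   # image 3 confidently favours taxon 2
])

print(predict_by_average(probs))   # 2: averaging keeps the confidence information
print(predict_by_vote(probs))      # 0: a three-way tie breaks arbitrarily
```

The example shows one plausible reason the average output can beat the vote: averaging retains how confident each image's prediction is, while hard voting discards it.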
B. Machine learning data

The results on the machine learning data with larger test sets are shown in Table II. Both CE and LCSE for all the classifiers are clearly lower with these data splits. That is due to two factors: these results are more stable, meaning they are not affected by individual difficult specimens, and here the size of each taxon in the test set reflects the size of the taxon in the training/validation sets. The comparison test sets of Section IV-A had only 0–4 specimens of each taxon, and therefore the taxa with only a few training specimens had the same weight as the taxa with hundreds of training specimens. For the machine learning data, taxa with little training data will also have only a few test specimens and a small weight on the classification error of the entire test set.

The results are similar to those in Table I. CNNs produce the best classification results. Again, the flat classification versions of CNN and SVM outperform the hierarchical classifiers, contradicting previous findings of hierarchical classification studies (Silla and Freitas, 2011). With the machine learning data splits, the local per level classification approach gives slightly lower CE than the flat classifier on both the order level (SVM) and the family level (SVM and CNN).

When considering individual challenging taxa, the best classifier, CNN, mostly has trouble with the least represented taxa in the data due to a lack of adequate training data. The smallest taxa are Hydropsyche saxonica, Nemoura cinerea, Capnopsis schilleri, Sialis sp., Leuctra nigra and Sphaerium sp., with average numbers of specimens in the training data N = {13, 11, 15, 19, 19, 107} and #images = {349, 540, 730, 856, 960, 1239}, respectively. With the exceptions of Sialis sp. and Sphaerium sp., the average CE for these taxa ranged from 62% to 98% for CNNs and from 61% to 100% for SVMs. On the contrary, all the classifiers performed well on classifying Sphaerium sp. (CE ∈ [0, 8%]), and CNNs also relatively well on classifying Sialis sp. (CE ∈ [15, 18%]).

One reason why the hierarchical, local per parent node approach performs worse than flat classification could be that the hierarchy in the data is not based on visual aspects. The taxonomic resolution is based on affinity, which can be independent of the appearance of the taxa. However, the automatic classifiers base all classification decisions on visual features, hence the man-made hierarchy of the data could confuse the classifiers. Fig. 6 gives examples of taxa that belong to the same family or genus but have clear differences in their appearance, e.g., size.

Figure 6: Examples of visual differences among taxa belonging to the same family or genus. Top row: Hydropsyche pellucidula, Hydropsyche saxonica, and Hydropsyche siltalai all belong to the genus Hydropsyche sp. Bottom row: Neureclipsis bimaculata, Plectrocnemia sp., Polycentropus flavomaculatus, and Polycentropus irroratus all belong to the family Polycentropodidae. In both cases, the taxa are of different sizes and colors.
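The local per parent node logic, which mirrors the experts' top-down identification, can be sketched roughly as below. The tree, the stand-in node classifiers, and all names are hypothetical; in the paper the node decisions would be made by CNNs or SVMs.

```python
# Rough sketch of a local per parent node classifier: one decision per
# internal node of the taxonomy, applied top-down. Tree and classifiers
# are hypothetical stand-ins.

TREE = {
    "root":          ["Trichoptera", "Ephemeroptera"],
    "Trichoptera":   ["Hydropsyche pellucidula", "Hydropsyche siltalai"],
    "Ephemeroptera": ["Baetis rhodani"],
}

def predict_top_down(x, node_classifiers, node="root"):
    """Route a specimen down the taxonomy via the local classifier at each node."""
    children = TREE.get(node, [])
    if not children:                 # reached a leaf taxon
        return node
    if len(children) == 1:           # nothing to decide at this node
        return predict_top_down(x, node_classifiers, children[0])
    chosen = node_classifiers[node](x, children)
    return predict_top_down(x, node_classifiers, chosen)

# Stand-in node classifiers for illustration:
clfs = {
    "root":        lambda x, children: children[0],   # always picks Trichoptera
    "Trichoptera": lambda x, children: children[1],
}
print(predict_top_down("specimen", clfs))  # Hydropsyche siltalai
```

The structure makes the failure mode visible: a mistake at the root can never be repaired further down, which is one way a top-down scheme can lose to flat classification when the hierarchy does not follow visual similarity.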

                      CNN     CNN     CNN     CNN     SVM     SVM     SVM
                      flat,   flat,   local/  hier.   flat    local/  hier.
                      aver.   vote    level                   level
Deepest level
  CE                  0.078   0.087   -       0.087   0.170   -       0.181
  sd(CE)              0.009   0.009   -       0.013   0.008   -       0.009
  LCSE                0.044   0.052   -       0.048   0.124   -       0.129
  sd(LCSE)            0.006   0.006   -       0.005   0.006   -       0.008
Order
  CE                  0.010   0.015   0.011   0.011   0.055   0.053   0.053
  sd(CE)              0.002   0.003   0.002   0.002   0.006   0.005   0.005
Family
  CE                  0.041   0.050   0.033   0.044   0.129   0.126   0.135
  sd(CE)              0.006   0.007   0.003   0.004   0.006   0.008   0.011
Error structure
  #ERR(order)         194     287     -       216     1071    -       1017
  #ERR(family)        605     685     -       638     1428    -       1589
  #ERR(genus)         304     307     -       319     455     -       505
  #ERR(species)       410     412     -       510     344     -       393

Table II: Classification results for the machine learning test data. CE and LCSE are averaged over all 10 data splits, where each test split has n = 1937. The number of new classification errors at each taxonomic rank is summed over all 10 data splits, where ntotal = 19370.
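The #ERR rows count, for each misclassified specimen, only the first (highest) taxonomic rank at which the predicted path departs from the true one. A minimal sketch, with illustrative paths and names:

```python
# Sketch of how the #ERR rows can be computed: each misclassified specimen is
# counted once, at the first rank where its predicted path diverges from the
# true path. Paths and names below are illustrative.
from collections import Counter

RANKS = ("order", "family", "genus", "species")

def error_structure(true_paths, pred_paths):
    """Count new classification errors per taxonomic rank."""
    counts = Counter({r: 0 for r in RANKS})
    for tp, pp in zip(true_paths, pred_paths):
        for rank, t, p in zip(RANKS, tp, pp):
            if t != p:
                counts[rank] += 1    # a "new" error at this rank
                break                # deeper ranks not counted again
    return counts

truth = ("Trichoptera", "Hydropsychidae", "Hydropsyche", "Hydropsyche saxonica")
preds = [
    ("Trichoptera", "Hydropsychidae", "Hydropsyche", "Hydropsyche pellucidula"),
    ("Ephemeroptera", "Baetidae", "Baetis", "Baetis rhodani"),
    truth,                                        # one correct prediction
]
counts = error_structure([truth] * 3, preds)
print(counts)  # species: 1, order: 1 -- two errors, each counted once
```

Because each wrong specimen contributes exactly one count, the four #ERR values of a column sum to that classifier's total number of misclassifications.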

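As a quick arithmetic check on the error-structure counts: since every misclassified specimen is counted exactly once, the #ERR values of a column should sum to roughly the deepest-level CE times ntotal (up to the rounding of the reported CE). Using the CNN (flat, aver.) column of Table II:

```python
# Sanity check: the per-rank "new error" counts of one classifier column of
# Table II (CNN flat, aver.) should sum to roughly CE * ntotal.
n_total = 19370                     # 10 splits x 1937 test specimens
err = {"order": 194, "family": 605, "genus": 304, "species": 410}

total_errors = sum(err.values())    # each misclassified specimen counted once
ce_reported = 0.078                 # deepest-level CE reported in Table II

assert total_errors == 1513
assert abs(total_errors / n_total - ce_reported) < 0.001  # matches up to rounding
```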
V. DISCUSSION

The status assessment of ecosystems is often based on the use of biological indicators that are manually identified by human experts. The manual collection and identification of the data by ecological experts is, however, known to be costly and time-consuming. While recently a growing number of studies explore the enormous potential of genetic identification methods, these are currently not standardized and thus cannot yet be used to their full potential for legislative biomonitoring purposes (e.g. Hering et al., 2018). An interim solution could lie in the use of a computer-based identification system that could simply replace the step of human identification in current biomonitoring while preserving all other steps of the existing process chain. To switch to this novel approach, ecologists must start to trust in the machine logic.

In this work, we compared human expert predictions for physical and image data to those of machine learning methods on image data.

To automate the identification process, we have developed a generic imaging system producing multiple images for each specimen. With our imaging system, we collected a large dataset of benthic freshwater macroinvertebrate images and assigned labels consisting of multiple taxonomic ranks. The classical approach in computer-based identification has been flat classification, where the classification is performed at the most specific rank of the taxonomic resolution. In addition to the classical flat approach, we also considered local hierarchical classifiers, namely local per level classifiers and local per parent node classifiers. We selected convolutional neural networks (CNNs) and support vector machines (SVMs) as classification methods. We are not aware of any earlier works applying the local hierarchical classifiers based on the taxonomic resolution of the data.

We evaluated both automatic classifiers and taxonomic experts using the classification error (CE) at the most specific level and a novel variant of the context sensitivity error (CSE) taking the top-down success into account. We call this variant level-aware context-sensitive error (LCSE).

We split the image data to produce test sets similar to the ones used in the proficiency test with physical data for taxonomic experts to be able to directly compare machines and human experts. We found that the taxonomic experts obtained the best classification performance when analysing the physical data using a microscope (CE=6.1% and LCSE=2.8%) and the worst when using the image data (CE=55.3% and LCSE=35.3%). The best automatic classifier was the CNN using the flat classification approach and the average output of all the images for a specimen as the decision rule to decide the final label (CE=11.4% and LCSE=5.3%). This result is well within the range of human experts taking part in the proficiency test. We also observed that, contrary to earlier observations in the literature, the flat classifiers with both CNN and SVM performed better than the local per parent node hierarchical classifiers. We assume this is because the hierarchy based on the taxonomic resolution does not necessarily correlate with the visual similarity of the taxa. The hierarchical classifiers would likely be more successful if they could first separate the easiest superclasses and then concentrate on more subtle differences within those superclasses. Besides the CE and LCSE measures, we also investigated the main differences in confusion matrices. The most difficult classes were partially overlapping for machines and experts, but there were some differences as well. Human experts using images preferred to stay at higher ranks of the taxonomic hierarchy for difficult taxa, while machines were forced to predict the deepest possible level and thus ended up predicting wrong species. Unsurprisingly, we observed that CNNs had trouble identifying the classes with a low amount of training samples.

The test sets in our comparison data were kept very small so as not to burden the human participants too much. This naturally makes the results unstable in the sense that a few difficult specimens or bad images may affect the results a lot. Therefore, we also evaluated the automatic classifiers on different data splits, where the test sets were considerably larger and also represented the overall taxa distribution. The ranking of the automatic classifiers with respect to the CE and LCSE measures was similar, while the absolute CE and LCSE values were much smaller for these larger test sets. Again, forcing automatic classifiers to operate with the logic of human experts, i.e., the local per parent node approach, did not improve classification results.

The main purpose of this paper was to investigate differences in the identification logic of humans and machines. Taxonomic experts still outperformed the selected automatic methods, but CNNs' performance was close and fell within the range of typical human experts. In the future, we will apply more advanced machine learning techniques, boost the performance on the rarest classes using, e.g., transfer learning and data augmentation, and consider global hierarchical classifiers. We expect that automatic methods can replace human experts in the routine-like identification of the easiest taxa already in the near future, while the human experts or genetic methods can concentrate on the harder cases. Therefore, it is important that ecologists start having confidence in the machines' ability to perform this task and better understand the main challenges that are associated with automatic identification.

ACKNOWLEDGEMENTS

We thank the Academy of Finland for the grants of Ärje (284513, 289076), Tirronen (289076, 289104), Kärkkäinen (289076), Meissner (289104), and Raitoharju (288584). We would like to thank CSC for computational resources.

REFERENCES

Ärje, J., Choi, K.-P., Divino, F., Meissner, K., and Kärkkäinen, S. (2016). Understanding the statistical properties of the percent model affinity index can improve biomonitoring related decision making. Stochastic Environmental Research and Risk Assessment, 30(7):1981–2008.

Ärje, J., Kärkkäinen, S., Meissner, K., Iosifidis, A., Ince, T., Gabbouj, M., and Kiranyaz, S. (2017). The effect of automated taxa identification errors on biological indices. Expert Systems with Applications, 72:108–120.

Ärje, J., Kärkkäinen, S., Meissner, K., and Turpeinen, T. (2010). Statistical classification methods and proportion estimation – an application to a macroinvertebrate image database. In Proceedings of the 2010 IEEE Workshop on Machine Learning for Signal Processing (MLSP).

Ärje, J., Kärkkäinen, S., Turpeinen, T., and Meissner, K. (2013). Breaking the curse of dimensionality in quadratic discriminant analysis models with a novel variant of a Bayes classifier enhances automated taxa identification of freshwater macroinvertebrates. Environmetrics, 24(4):248–259.

Babbar, R., Partalas, I., Gaussier, E., Amini, M.-R., and Amblard, C. (2016). Learning taxonomy adaptation in large scale classification. Journal of Machine Learning Research, 17:1–37.

Borja, A. and Elliott, M. (2013). Marine monitoring during an economic crisis: the cure is worse than the disease. Marine Pollution Bulletin, 68:1–3.

Caley, M. J., O'Leary, R. A., Fisher, R., Low-Choy, S., Johnson, S., and Mengersen, K. (2014). What is an expert? A systems perspective on expertise. Ecology and Evolution, 4(3):231–242.

Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27:1–27:27.

Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20:273–297.

Culverhouse, P., Williams, R., Reguera, B., Herry, V., and González-Gil, S. (2003). Do experts make mistakes? A comparison of human and machine identification of dinoflagellates. Marine Ecology Progress Series, 247:17–25.

Culverhouse, P., Williams, R., Reguera, B., Herry, V., and González-Gil, S. (2006). Automatic image analysis of plankton: future perspectives. Marine Ecology Progress Series, 312.

Elbrecht, V., Vamos, E. E., Meissner, K., Aroviita, J., and Leese, F. (2017). Assessing strengths and weaknesses of DNA metabarcoding-based macroinvertebrate identification for routine stream monitoring. Methods in Ecology and Evolution, 8(10):1265–1275.

Haase, P., Pauls, S. U., Schindehütte, K., and Sundermann, A. (2010). First audit of macroinvertebrate samples from an EU Water Framework Directive monitoring program: human error greatly lowers precision of assessment results. Journal of the North American Benthological Society, 29(4):1279–1291.

Hering, D., Borja, A., Jones, J. I., Pont, D., Boets, P., Bouchez, A., Bruce, K., Drakare, S., Hänfling, B., Kahlert, M., Leese, F., Meissner, K., Mergen, P., Reyjol, Y., Segurado, P., Vogler, A., and Kelly, M. (2018). Implementation options for DNA-based identification into ecological status assessment under the European Water Framework Directive. Water Research, 138:192–205.

International Commission on Zoological Nomenclature (1999). International Code of Zoological Nomenclature. International Trust for Zoological Nomenclature, fourth edition.

Joutsijoki, H., Meissner, K., Gabbouj, M., Kiranyaz, S., Raitoharju, J., Ärje, J., Kärkkäinen, S., Tirronen, V., Turpeinen, T., and Juhola, M. (2014). Evaluating the performance of artificial neural networks for the classification of freshwater benthic macroinvertebrates. Ecological Informatics, 20:1–12.

Kelly, M., Schneider, S., and King, L. (2015). Customs, habits, and traditions: the role of nonscientific factors in the development of ecological assessment methods. WIREs Water, 2:159–165.

Kho, S. J., Manickam, S., Malek, S., Mosleh, M., and Dhillon, S. K. (2017). Automated plant identification using artificial neural network and support vector machine. Frontiers in Life Science, 10(1):98–107.

Kiranyaz, S., Ince, T., Pulkkinen, J., Gabbouj, M., Ärje, J., Kärkkäinen, S., Tirronen, V., Juhola, M., Turpeinen, T., and Meissner, K. (2011). Classification and retrieval on macroinvertebrate image databases. Computers in Biology and Medicine, 41(7):463–472.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 1097–1105.

Lee, H., Park, M., and Kim, J. (2016). Plankton classification on imbalanced large scale database via convolutional neural networks with transfer learning. In 2016 IEEE International Conference on Image Processing (ICIP), pages 3713–3717.

Leese, F., Bouchez, A., Abarenkov, K., Altermatt, F., Borja, Á., Bruce, K., Ekrem, T., Čiamporová-Zaťovičová, F., Costa, F. O., Duarte, S., Elbrecht, V., Fontaneto, D., Franc, A., Geiger, M. F., Hering, D., Kahlert, M., Kalamujić Stroil, B., Kelly, M., Keskin, E., Liska, I., Mergen, P., Meissner, K., Pawlowski, J., Penev, L., Reyjol, Y., Rotter, A., Steinke, D., van der Wal, B., Vitecek, S., Zimmermann, J., and Weigand, A. M. (2018). Why we need sustainable networks bridging countries, disciplines, cultures and generations for aquatic biomonitoring 2.0: a perspective derived from the DNAqua-Net COST Action. Advances in Ecological Research, 58:63–99.

Levatič, J., Kocev, D., and Džeroski, S. (2015). The importance of the label hierarchy in hierarchical multi-label classification. Journal of Intelligent Information Systems, 45(2):247–271.

Lytle, D. A., Martínez-Muñoz, G., Zhang, W., Larios, N., Shapiro, L., Paasch, R., Moldenke, A., Mortensen, E. N., Todorovic, S., and Dietterich, T. G. (2010). Automated processing and identification of benthic invertebrate samples. Journal of the North American Benthological Society, 29(3):867–874.

Meissner, K., Aroviita, J., Hellsten, S., Järvinen, M., Karjalainen, S. M., Kuoppala, M., Mykrä, H., and Vuori, K.-M. (2010). Jokien ja järvien biologinen seuranta – näytteenotosta tiedon tallentamiseen. Online guidance.

Meissner, K., Nygård, H., Björklöf, K., Jaale, M., Hasari, M., Laitila, L., Rissanen, J., and Leivuori, M. (2017). Proficiency test 04/2016: Taxonomic identification of boreal freshwater lotic, lentic, profundal and North-Eastern Baltic benthic macroinvertebrates. Reports of the Finnish Environment Institute, 2.

Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., and Leisch, F. (2018). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-1.

Nygård, H., Oinonen, S., Lehtiniemi, M., Hällfors, H., Rantajärvi, E., and Uusitalo, L. (2016). Price versus value of marine monitoring. Frontiers in Marine Science, 3:205.

R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Raitoharju, J., Riabchenko, E., Ahmad, I., Iosifidis, A., Gabbouj, M., Kiranyaz, S., Tirronen, V., Ärje, J., Kärkkäinen, S., and Meissner, K. (2018). Benchmark database for fine-grained image classification of benthic macroinvertebrates. Image and Vision Computing, 78:73–83.

Rasband, W. S. (1997–2010). ImageJ. U.S. National Institutes of Health, Bethesda, Maryland, USA.

Rodrigues, F., Pereira, F. C., Alves, A., Jiang, S., and Ferreira, J. (2012). Automatic classification of points-of-interest for land-use analysis. In Proceedings of GEOProcessing 2012: The Fourth International Conference on Advanced Geographic Information Systems, Applications, and Services, pages 41–49.

Silla, C. N. J. and Freitas, A. A. (2011). A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery, 22(1–2):31–72.

Uusitalo, L., Fernandes, J. A., Bachiller, E., Tasala, S., and Lehtiniemi, M. (2016). Semi-automated classification method addressing marine strategy framework directive (MSFD) zooplankton indicators. Ecological Indicators, 71:398–405.

Vedaldi, A. and Lenc, K. (2015). MatConvNet: Convolutional neural networks for MATLAB. In Proceedings of the International Conference on Multimedia, pages 689–692.

Verma, N., Mahajan, D., Sellamanickam, S., and Nair, V. (2012). Learning hierarchical similarity metrics. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2280–2287. Providence, RI, USA.

Yousef Kalafi, E., Town, C., and Kaur Dhillon, S. (2018). How automated image analysis techniques help scientists in species identification and classification. Folia Morphologica, 77(2):179–193.

Zimmermann, J., Glöckner, G., Jahn, R., Enke, N., and Gemeinholzer, B. (2015). Metabarcoding vs. morphological identification to assess diatom diversity in environmental studies. Molecular Ecology Resources, 15:526–542.

APPENDIX

Taxa                          Species                       Genus          Family             Order           #specimens  #images
Elmis aenea                   Elmis aenea                   Elmis          Elmidae            Coleoptera      648         32398
Limnius volckmari             Limnius volckmari             Limnius        Elmidae            Coleoptera      314         15621
Oulimnius tuberculatus        Oulimnius tuberculatus        Oulimnius      Elmidae            Coleoptera      335         16674
Hydraena sp.                  -                             Hydraena       Hydraenidae        Coleoptera      198         9900
Simuliidae                    -                             -              Simuliidae         Diptera         887         44240
Ameletus inopinatus           Ameletus inopinatus           Ameletus       Ameletidae         Ephemeroptera   127         6346
Baetis rhodani                Baetis rhodani                Baetis         Baetidae           Ephemeroptera   404         19829
Baetis vernus group           Baetis vernus                 Baetis         Baetidae           Ephemeroptera   176         8588
Ephemerella aurivillii        Ephemerella aurivillii        Ephemerella    Ephemerellidae     Ephemeroptera   356         16458
Ephemerella mucronata         Ephemerella mucronata         Ephemerella    Ephemerellidae     Ephemeroptera   304         15175
Heptagenia sulphurea          Heptagenia sulphurea          Heptagenia     Heptageniidae      Ephemeroptera   438         21502
Kageronia fuscogrisea         Kageronia fuscogrisea         Kageronia      Heptageniidae      Ephemeroptera   222         10826
Leptophlebia sp.              -                             Leptophlebia   Leptophlebiidae    Ephemeroptera   412         20366
Sialis sp.                    -                             Sialis         Sialidae           Megaloptera     26          1162
Capnopsis schilleri           Capnopsis schilleri           Capnopsis      Capniidae          Plecoptera      21          1050
Leuctra nigra                 Leuctra nigra                 Leuctra        Leuctridae         Plecoptera      27          1350
Leuctra sp.                   -                             Leuctra        Leuctridae         Plecoptera      298         14899
Amphinemura borealis          Amphinemura borealis          Amphinemura    Nemouridae         Plecoptera      322         16100
Nemoura cinerea               Nemoura cinerea               Nemoura        Nemouridae         Plecoptera      16          800
Nemoura sp.                   -                             Nemoura        Nemouridae         Plecoptera      187         9314
Protonemura sp.               -                             Protonemura    Nemouridae         Plecoptera      100         4908
Diura sp.                     -                             Diura          Perlodidae         Plecoptera      98          4427
Isoperla sp.                  -                             Isoperla       Perlodidae         Plecoptera      243         12148
Taeniopteryx nebulosa         Taeniopteryx nebulosa         Taeniopteryx   Taeniopterygidae   Plecoptera      331         16325
Micrasema gelidum             Micrasema gelidum             Micrasema      Brachycentridae    Trichoptera     233         11528
Micrasema setiferum           Micrasema setiferum           Micrasema      Brachycentridae    Trichoptera     323         13819
Agapetus sp.                  -                             Agapetus       Glossosomatidae    Trichoptera     290         14387
Silo pallipes                 Silo pallipes                 Silo           Goeridae           Trichoptera     56          2658
Hydropsyche pellucidula       Hydropsyche pellucidula       Hydropsyche    Hydropsychidae     Trichoptera     192         6513
Hydropsyche saxonica          Hydropsyche saxonica          Hydropsyche    Hydropsychidae     Trichoptera     17          490
Hydropsyche siltalai          Hydropsyche siltalai          Hydropsyche    Hydropsychidae     Trichoptera     395         19456
Oxyethira sp.                 -                             Oxyethira      Hydroptilidae      Trichoptera     218         10381
Lepidostoma hirtum            Lepidostoma hirtum            Lepidostoma    Lepidostomatidae   Trichoptera     267         10982
Neureclipsis bimaculata       Neureclipsis bimaculata       Neureclipsis   Polycentropodidae  Trichoptera     477         23721
Plectrocnemia sp.             -                             Plectrocnemia  Polycentropodidae  Trichoptera     63          3015
Polycentropus flavomaculatus  Polycentropus flavomaculatus  Polycentropus  Polycentropodidae  Trichoptera     224         11005
Polycentropus irroratus       Polycentropus irroratus       Polycentropus  Polycentropodidae  Trichoptera     59          2917
Rhyacophila nubila            Rhyacophila nubila            Rhyacophila    Rhyacophilidae     Trichoptera     177         6993
Sphaerium sp.                 -                             Sphaerium      Sphaeriidae        Veneroida       150         1733

Table III: Taxonomic resolution of the multiple image data and the numbers of specimens and images per taxon. Taxa included in the proficiency test for human experts but not included in the image data were Brachyptera risi, Cloeon sp., Cloeon diptera group, Cloeon inscriptum, Cloeon simile, Helobdella stagnalis, and Tinodes waeneri.