Article

Revealing the Unknown: Real-Time Recognition of Galápagos Snake Species Using Deep Learning

Anika Patel 1, Lisa Cheung 1, Nandini Khatod 1, Irina Matijosaitiene 1,2,3,*, Alejandro Arteaga 4 and Joseph W. Gilkey Jr. 1

1 Data Science Institute, Saint Peter’s University, Jersey City, NJ 07306, USA; [email protected] (A.P.); [email protected] (L.C.); [email protected] (N.K.); [email protected] (J.W.G.J.)
2 Institute of Environmental Engineering, Kaunas University of Technology, 44249 Kaunas, Lithuania
3 Department of Information Technologies, Vilnius Gediminas Technical University, 10223 Vilnius, Lithuania
4 Tropical Herping, Quito 170150, Ecuador; [email protected]
* Correspondence: [email protected]

 Received: 2 March 2020; Accepted: 26 April 2020; Published: 6 May 2020 

Simple Summary: The snakes in Galápagos are the least studied group of vertebrates in the archipelago. The conservation status of only four out of nine recognized species has been formally evaluated, and preliminary evidence suggests that some of the species may be entirely extinct on some islands. Moreover, nearly all park ranger reports and citizen-science photographic identifications of Galápagos snakes are spurious, given that the systematics of the snakes in the archipelago have just recently been clarified. Our solution is to provide park rangers and tourists with easily accessible applications for species identification in real time through automatic object recognition. We used deep learning algorithms on collected images of the snake species to develop the artificial intelligence platform, an application software, that is able to recognize a species of a snake using a user’s uploaded image. The application software works in the following way: once a user uploads an image of a snake into the application, the algorithm processes it, classifies it into one of the nine snake species, gives the class of the predicted species, and educates users by providing them with information about the distribution, natural history, conservation, and etymology of the snake.

Abstract: Real-time identification of wildlife is an upcoming and promising tool for the preservation of wildlife. In this research project, we aimed to use object detection and image classification for the racer snakes of the Galápagos Islands, Ecuador. The final target of this project was to build an artificial intelligence (AI) platform, in terms of a web or mobile application, which would serve as a real-time decision making and supporting mechanism for the visitors and park rangers of the Galápagos Islands, to correctly identify a snake species from the user’s uploaded image. Using deep learning and machine learning algorithms and libraries, we modified and successfully implemented four region-based convolutional neural network (R-CNN) architectures (models for image classification): Inception V2, ResNet, MobileNet, and VGG16. Inception V2, ResNet, and VGG16 reached an overall accuracy of 75%.

Keywords: artificial intelligence (AI) platform; deep learning; Galápagos Islands; image classification; machine learning; Pseudalsophis; racer snake; region-based convolutional neural network (R-CNN); snake species

1. Introduction

Despite being noteworthy inhabitants of the world’s most widely studied living laboratory, the snakes in Galápagos are the least studied group of vertebrates in the archipelago. The conservation

status of only four out of the nine recognized species has been formally evaluated [1–4], and preliminary evidence [5] suggests that some of the species may be extinct on some islands despite being evaluated as “Least Concern” by IUCN (International Union for Conservation of Nature) assessments. Perhaps the most important reason why some of these local extinctions have gone unnoticed in Galápagos, one of the world’s capitals of wildlife-oriented tourism, is because nearly all park ranger reports and citizen-science photographic identifications of Galápagos snakes are erroneous. This is not surprising, given that the systematics of the snakes in the archipelago has just recently been clarified [6], and even now, it remains obscure and out-of-reach for the general public. One way to cope with this challenge is to provide park rangers and tourists with easily accessible applications for species identification through automatic object recognition.

Real-time image identification of snake species is an upcoming and promising solution to bridging the taxonomic gap, and it receives considerable attention in both the data science and herpetology communities. Wildlife experts explore different regions of the world to find these legless creatures and observe them in their natural habitat. Taxonomists have been searching for more efficient methods to meet species identification requirements, such as developing digital image processing and pattern recognition techniques [7] and using object detection by camera traps that will precisely and concisely identify species [8]. Object recognition in images is a relatively new area; the first (deep) convolutional neural network architecture and early achievements in text document recognition were described by LeCun in 1998 [8]. More recent deep neural networks perform well in face recognition and object detection in streets, airports, and other buildings due in large part to the high volume of images that are available to train the models (hundreds of thousands of images). Therefore, having a small number of available images forces researchers to apply creative approaches by modifying the architecture of existing neural networks or developing their own neural network architecture. The novelty of our research is highlighted by the creative modification and application of novel state-of-the-art deep convolutional neural network architectures that have been trained by our team on a small number of images and achieved a high model performance. Moreover, due to a considerably short image testing time, our proposed artificial intelligence (AI) platform for the real-time recognition of the Galápagos snake species would be the first of this type and may be a successful mobile/web application for snake species recognition in one of the most protected and diverse national parks on Earth.

Szabolcs Sergyán, in 2007 [9], implemented a color content-based image classification where color is stored in the intensity vectors of image pixels and the information can be retrieved easily. Color can be represented in different color spaces or features and can be stored in several ways. RGB space is a widely used color space for image display. It is composed of three color components: red, green, and blue (Table 1). YCbCr is a family of color spaces used in video systems. Y is the luma component and Cb and Cr are the blue and red chroma components, respectively. HSV (hue, saturation and value) space is frequently used in computer graphics. The three color components are hue, saturation (intensity), and value (brightness). Similarly, for the color-based features, the most commonly used descriptors are color moments, color histograms, and the color coherence vector (CCV). Szabolcs Sergyán implemented k-means clustering, where he classified images into four clusters (k = 4) using the obtained six features (4 from moments, 1 from histogram, and another 1 from color coherence vectors) per color space.
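For readers who want to experiment with this kind of color-based clustering, the short sketch below is our own illustration (not code from [9]): it computes simple per-channel color moments in the HSV space and clusters images with k-means, with k = 4 as described above. The image folder path is a hypothetical placeholder.

```python
# Illustrative sketch of color-moment features + k-means clustering (after [9]).
# Requires OpenCV, NumPy, and scikit-learn; the image folder is a placeholder.
import glob
import cv2
import numpy as np
from sklearn.cluster import KMeans

def color_moments(image_bgr):
    """Mean, standard deviation, and skewness of each HSV channel (9 values)."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV).astype(np.float64)
    feats = []
    for ch in cv2.split(hsv):
        mean = ch.mean()
        std = ch.std()
        skew = np.cbrt(((ch - mean) ** 3).mean())
        feats.extend([mean, std, skew])
    return np.array(feats)

paths = sorted(glob.glob("images/*.jpg"))                      # hypothetical folder
X = np.stack([color_moments(cv2.imread(p)) for p in paths])
labels = KMeans(n_clusters=4, random_state=0).fit_predict(X)   # k = 4 as in [9]
for path, cluster in zip(paths, labels):
    print(cluster, path)
```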

Table 1. Different color spaces with color descriptions and their classification basis.

Color Space | Color Description | Classification Based On
RGB | Moments of R channel | Lightness
XYZ | Moments of Z channel | Lightness, blue color
xyz | Moments of x channel | Red color
xyz | Moments of z channel | Blue color
YCbCr | Moments of Y channel | Lightness
YCbCr | Moments of Cb channel | Blue color
YCbCr | Moments of Cr channel | Red color
HSV | Moments of H channel | Number of colors
HSV | Moments of S channel | Sharp and blurred colors
HSV | Moments of V channel | Lightness
Opponent space | Moments of 1 channel | Blue and red colors
Opponent space | Moments of 2 channel | Blue color, sharp and blurred colors
RGB | Moments | Lightness, blue color
YCbCr | Moments | Lightness, blue color
HSV | Moments | Darkness, blue color
rgb | Histogram | Blue and green colors
rgb | CCV | Lightness
xyz | CCV | Lightness, blue and green colors
YCbCr | CCV | Blue color
Opponent space | CCV | Blue color

In 2014, Andres Hernandez-Serna and Luz Fernanda Jimenez-Segura [10] used automatic object detection with the use of photographic images to identify different species of fish, butterflies, and plants. This included 697 fish species, 32 plant species, and 11 species of butterflies that were taken from the Natural History Museum records and online. They implemented feature extraction and pattern recognition to build their detection models. The images were preprocessed using the GrabCut algorithm to remove noisy backgrounds, so features were extracted easily. Fifteen features were extracted and fed into an artificial neural network. A system architecture (Figure 1) [10] was created to define the process of when an image is fed into the model and how that model will recognize it. A sample image is given to the model, the model preprocesses the image and then extracts its features. If the model recognizes the image, it will provide a result. However, if it is unrecognized, the image will be trained and stored into its memory before giving the result. Their work shows promising results, with up to 90–91.65% of true positive fish identifications, 92.87% of plants, and 93.25% of butterflies. However, a challenge they face is phenotypes that are similar and harder to distinguish.
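As an illustration of the GrabCut preprocessing step mentioned above (a sketch under our own assumptions, not the authors’ code), OpenCV’s built-in cv2.grabCut can be used to suppress a noisy background before feature extraction; the file names and the rough bounding rectangle are assumptions.

```python
# Minimal GrabCut background-removal sketch (cf. [10]); file names are hypothetical.
import cv2
import numpy as np

img = cv2.imread("fish.jpg")
mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)   # internal buffers required by grabCut
fgd_model = np.zeros((1, 65), np.float64)
h, w = img.shape[:2]
rect = (10, 10, w - 20, h - 20)             # rough box assumed to contain the object

cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)
fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype("uint8")
result = img * fg[:, :, None]               # background pixels set to zero
cv2.imwrite("fish_no_background.jpg", result)
```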

Figure 1. System architecture for feature extraction and pattern recognition used by Hernandez-Serna and Jimenez-Segura to identify different species of fish, butterflies, and plants [10].

Relevant research has also been performed by the University of Malaysia Perlis on snake species identification using machine learning techniques [11]. The team of researchers gathered 349 samples of 22 species of snakes from Perlis snake park in Malaysia and created a database called Snakes of Perlis Corpus. They used textural feature extraction as their key method in identifying the snake species. To extract the textural features, CEDD (color and edge directivity descriptor) was used, which

incorporates color and texture information in histograms. Features are created using two steps. First, the histogram is divided into six regions based on texture. Second, 24 regions are calculated from each of these regions by using color characteristics. After extracting the textural information using CEDD, they implemented five machine learning algorithms to classify the snake images: naïve Bayes, nearest neighbors, K-nearest neighbors, decision tree J48, and backpropagation neural network. The nearest neighbors algorithm uses unsupervised learning and starts with k = 1. It is unsupervised because the points have no external classification. On the other hand, K-nearest neighbors (K-NN) is a supervised learning classification. It classifies a point based on the known classification of other points. Backpropagation neural network and K-nearest neighbors gave the best accuracy and precision (99% and 96%, respectively) (Table 2).
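A comparison of this kind can be sketched with scikit-learn as shown below; this is our own illustration of the classifier comparison, not the Perlis study’s code, and the feature vectors and labels are random placeholders standing in for CEDD descriptors (CEDD produces a 144-bin histogram) and species labels.

```python
# Sketch comparing a K-NN classifier and a backpropagation neural network
# (scikit-learn MLPClassifier) on precomputed texture features, in the spirit of [11].
# X and y are dummy placeholders for CEDD-style feature vectors and species labels.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(349, 144))          # 349 samples, 144 CEDD histogram bins
y = rng.integers(0, 22, size=349)        # 22 snake species (dummy labels here)

knn = KNeighborsClassifier(n_neighbors=7)
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000)

print("K-NN accuracy:", cross_val_score(knn, X, y, cv=5).mean())
print("MLP  accuracy:", cross_val_score(mlp, X, y, cv=5).mean())
```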

Table 2. Performance of algorithms used for snake species classification in Perlis park, Malaysia [11].

Classifier | Correct Prediction | Recall | Precision | F-Measure
Naïve Bayes | 75.64 | 0.92 | 0.94 | 0.93
Backpropagation neural network | 87.93 | 1 | 0.99 | 0.99
Nearest neighbors | 89.22 | 1 | 0.96 | 0.97
K-NN (k = 7) | 80.34 | 1 | 0.96 | 0.97
Decision tree J48 | 71.29 | 0.79 | 0.71 | 0.72

Kody G. Dantongdee and Franz Kurfess from California Polytechnic State University [12], San Luis Obispo, built an image classifier to identify plants. The dataset for this research was produced by searching Google Images using a Python script adapted from a web-crawler created by Hardik Vasa [13]. Convolutional neural networks (CNN), commonly used for image classification, were applied here to classify the plants. The model was trained using the dataset at the genus-species level, which had approximately 100 images per class. The model used for classification was a very basic inception model of CNN. The design of these networks is made up of a series of convolutional, pooling, and fully connected layers. Although they failed to achieve good accuracy on the plant classifier due to an incomplete dataset and basic CNN design, the research gave good scope for future researchers to customize designs of neural networks that can be implemented and used by scientists or potential users in the botanical field through mobile applications.

Snake species identification may be useful for wildlife monitoring purposes. Images captured by camera traps in national parks and forests can be used to assess the movement patterns of the corresponding reptiles. In this paper we present the different deep learning convolutional neural networks used to identify a set of snake species found in the Galápagos Islands.
The Ecuadorian institution Tropical Herping provided the image dataset of 9 Pseudalsophis snake species (Figure 2).

Figure 2. Pseudalsophis snake species found in the Galápagos Islands, Ecuador, and used in this research [14].


2. Materials and Methods

The final goal of this research was to develop an artificial intelligence platform for the real-time object recognition of snake species in the Galápagos Islands by applying machine learning and deep learning techniques on snake images. This requires two main tasks: object detection and classification of images of the Pseudalsophis snake species. R-CNN (region-based convolutional neural network) is widely used for object detection, whereas CNN (convolutional neural network) is used for image classification.

2.1. R-CNN Structure

R-CNN works with the feature extraction process. It has three main layers of operations: feature network, region proposal network (RPN), and detection network [15–17]. The function of the feature network is to extract the features from the image and output the original shape and structure of the mapped image to the pixel coordinates of the image. The region proposal network has three convolutional layers, of which one is used for classification and the other two are used for finding the regions that contain the feature pixels that make up the image. The purpose of the RPN is to generate the bounding boxes called regions of interest (ROI) that have a higher probability of containing an object [15,16]. The detection network takes the input from the feature network and RPN and generates the final class and bounding box through fully connected layers. Training is performed at the RPN and detection layers.

2.1.1. Training the RPN

For training the RPN, bounding boxes are generated by mechanisms known as anchor boxes (Figure 3). Every pixel of an image is considered as an anchor. Nine rectangular boxes of different shapes and sizes are generated around each anchor (due to 9 proposals generated for each pixel in the image: 3 aspect ratios and 3 scales), hence there are tens of thousands of boxes in the image [15,17–19]. To remove the multiple boxes around each anchor, the algorithm applies a reduction operation called non-maximum suppression. It removes the overlapping boxes on each object by considering a box with the highest score. Thus, during the training phase of the RPN the number of boxes is reduced to 2000 [15]. During the testing phase, the score is fed to the detection network to indicate which boxes to keep.


Figure 3. (a) Anchor boxes generated around each of nine anchors [15]. (b) Inception V2 module in which the 5 × 5 convolutions are replaced by two successive 3 × 3 convolutions and pooling is applied [20].

Then, the RPN predicts if the anchor is background or foreground and refines the anchor. After determining the bounding boxes from all the above processes, there is a need to label the anchor, for which the ground truth boxes are used.
The anchor that largely overlaps with the ground truth is labeled as foreground and the one that overlaps less as background [15,16]; hence, the classification task is implemented at this step. In the above described method, loss occurs with the background anchor that is not compared with the ground truth boxes. Therefore, to define these background anchors and to minimize the error, a smooth-L1 loss on the position (x, y) of the top-left box is used, as well as the logarithm of the heights and widths [16]. The overall loss of the RPN is a combination of the classification loss and the regression loss, since the RPN has a classifier to determine the probability of a proposal having the target object, and a regressor to regress the coordinates of the proposals.

L_{loc}(t^{u}, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L1}(t_{i}^{u} - v_{i}),

where

\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5 x^{2} & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise.} \end{cases}

2.1.2. Training the Detection Layer

Training the detection layer is similar to training the RPN. First, the intersections over union (IOUs) of the 2000 proposals generated by the RPN against the ground truth are calculated. Then, ROIs are labeled as foreground and background depending on the threshold values. Then, a fixed number of ROIs are selected from the foreground and background. Features are cropped and scaled to 14 × 14 [17–19]. The set of cropped features for each image is passed through the detection network as a batch. The final dense layers are created for each cropped feature, with the score and bounding box for each class. To generate the label for the detection network classification task, IOUs of all the ROIs and the ground truth boxes are calculated depending on the IOU thresholds (e.g., foreground above 0.5, and background between 0.5 and 0.1); thus, labels are generated for a subset of ROIs. The difference with the RPN is that it contains more classes.
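To make the anchor-generation and smooth-L1 ideas above concrete, the short sketch below builds the nine boxes for one anchor point and evaluates the smooth-L1 loss. It is our own illustration; the scales and aspect ratios are assumed values, not the exact configuration used in our training runs.

```python
# Nine anchor boxes (3 scales x 3 aspect ratios) around one pixel, plus the smooth-L1 loss.
# Scales and ratios are illustrative assumptions.
import numpy as np

def anchors_for_pixel(cx, cy, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    boxes = []
    for s in scales:
        for r in ratios:
            w, h = s * np.sqrt(r), s / np.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)            # shape (9, 4): x1, y1, x2, y2

def smooth_l1(x):
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)

print(anchors_for_pixel(100, 100).shape)                     # (9, 4)
print(smooth_l1(np.array([-2.0, -0.3, 0.0, 0.7, 3.0])))
```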

2.2. R-CNN Implementation

Implementation of different modified architectures of R-CNN let us test and determine the best fit method to correctly identify species of snakes: Inception V2, ResNet, MobileNet, and VGG16.

2.2.1. Inception V2

While processing the image in the neural network, the inception module prevents the model from overfitting. Images of different sizes are filtered at the same level, thus making the network wider rather than deeper. The model “uses convolutions of different sizes to capture details at varied scales (5 × 5, 3 × 3, 1 × 1)” [20,21]. We applied the Inception V2 module in which 5 × 5 convolutions are replaced by two successive 3 × 3 convolutions and applied pooling as shown in Figure 3b. The inception model also replaced the fully connected layers at the end with the pooling layer, where max pooling is performed and the 2D feature map that reduces the number of parameters is obtained. Inception V2 has 22 layers, where going deeper with convolutions led to a smarter approach: instead of stacking convolutional layers, the authors of Inception V2 stack modules, within which the convolutional layers are located [22,23].
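A minimal Keras sketch of such a module is shown below. It is our own simplified illustration of the idea in Figure 3b, not the exact Inception V2 block definition, and the filter counts are assumptions.

```python
# Simplified Inception-style module: the 5x5 branch is factorized into two 3x3 convolutions.
# Filter counts are illustrative; this is not the exact Inception V2 definition.
from tensorflow.keras import layers, Input, Model

def inception_like_block(x, filters=64):
    b1 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters, 3, padding="same", activation="relu")(b3)
    # Two successive 3x3 convolutions replace a single 5x5 convolution.
    b5 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(filters, 3, padding="same", activation="relu")(b5)
    b5 = layers.Conv2D(filters, 3, padding="same", activation="relu")(b5)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(filters, 1, padding="same", activation="relu")(bp)
    return layers.Concatenate()([b1, b3, b5, bp])

inp = Input(shape=(224, 224, 3))
out = inception_like_block(inp)
Model(inp, out).summary()
```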

2.2.2. ResNet

ResNet was designed by the Microsoft Research team, where they address the problem of decreasing accuracy with the increase of depth of neural networks by using skip connections (such as residuals), while actually making the models deeper [24]. The key insight into the stability of ResNet is that the architecture tries to model the residual of the intermediate output, instead of the traditional method of modeling the intermediate output itself [25]. This allows researchers to train ResNet models that are comprised of hundreds of layers. We propose a modification in the ResNet architecture as described in Figure 4. Our proposed ResNet model comprises basic blocks that are, in turn, derived from a pair of convolutional layers that are pre-processed using batch normalization and are passed through a non-linear rectifier function. The aim is to compute an exponentially increasing number of independent representations at each basic block. In contrast, the traditional ResNet only increases the number of intermediate representations from 16 up to 64 across the entire network.
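The basic block described above can be sketched in Keras roughly as follows; this is an illustration of a residual block with batch normalization and ReLU, not our trained architecture, and the filter count and stride are assumptions.

```python
# Sketch of a residual "basic block": two convolutional layers with batch normalization
# and ReLU, plus a skip connection. Filter count and stride are illustrative assumptions.
from tensorflow.keras import layers

def basic_block(x, filters=64, stride=1):
    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    if stride != 1 or x.shape[-1] != filters:
        # Project the shortcut when the spatial size or channel count changes.
        shortcut = layers.Conv2D(filters, 1, strides=stride, padding="same")(x)
    return layers.ReLU()(layers.Add()([y, shortcut]))

inputs = layers.Input(shape=(224, 224, 3))
outputs = basic_block(inputs, filters=64)
```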


Figure 4. (a) ResNet architecture [21]. ReLU is a linear rectifier function, which is an activation function for the neural network. Batchnorm, or batch normalization, is a technique used to standardize the input either by an activation function or by a prior node. Stride is the number of pixel shifts over the input matrix. For instance, when the stride is 1, then we move the filters 1 pixel at a time, and when the stride is 2, then we move the filters 2 pixels at a time. (b) MobileNetV2 architecture [26].

2.2.3. MobileNet

The MobileNetV2 architecture is based on an inverted residual structure where the input and output of the residual block are within bottleneck layers, in contrast to traditional residual models that use expanded representations in the input. MobileNetV2 uses lightweight depth-wise separable convolutions to filter features in the intermediate expansion layer (Figure 4b) [27].

2.2.4. VGG16

VGGNet is another convolutional neural network model that was created by Karen Simonyan and Andrew Zisserman at the Oxford Visual Geometry Group. The network is simple compared to most convolutional neural networks and works with simple 3 × 3 stacked layers [28]. The application of this model is available due to the large image dataset (ImageNet) it was pretrained on, which has over 14 million images and 1000 different classes of images. There are two different VGG (Visual Geometry Group) models that are available: VGG16 and VGG19. The difference between the two is the number of weight layers. VGG-16 has 13 convolutional and 3 fully-connected layers, with the ReLU function applied to all 16 layers (Figure 5). For our research, VGG16 was implemented.




Figure 5. VGG16 architecture [26]. ReLU is a linear rectifier function, which is an activation function for the neural network. Softmax is the final layer of the neural network that has a value either 0 or 1.

The model requires a 224 × 224 RGB image to be inputted as the training data. The architecture of the network has three different parameters for a convolutional layer: receptive field size, stride, and padding. The stride refers to how pixels are controlled around convolutional filters for a given image. VGGNet uses a stride of 1, which means the network filters one pixel at a time [29]. Padding describes adding extra pixels outside of the image to prevent images from decreasing in size each time a filter is used on an image [29]. A padding of 1 was used to maintain the same size spatially, since our receptive field was 3. Along with the 3 × 3 convolutional layers, it also has a pooling layer of 2 × 2 with a stride of 2 and zero padding [28]. The VGGNet training procedure is similar to the procedure presented by Alex Krizhevsky’s team for a deep convolutional neural network in the sense that it is “optimizing the multinomial logistic regression objective using mini-batch gradient descent (based on back-propagation with momentum)” [28].

2.3. Workflow towards Training the Image Classifier

Before implementing the R-CNNs we developed the workflow for training the image classifier to correctly identify Pseudalsophis snake species (Figure 6). Each step is discussed in the subsections below.

Figure 6. Process of training the image classifier: installation, data collection, data preprocessing, modelling, and result.



2.3.1. Installation of Libraries

For a machine to learn, a larger sample is preferred. The more images the machine learns from, the easier and quicker it will learn to recognize the species. Therefore, for this research, the machine needs larger processing and computational power. TensorFlow-GPU allows the computer to use a video card to provide extra processing power while training the model. Thus, a TensorFlow-GPU environment was set up on our local personal computer (PC) that had an NVIDIA GEFORCE GTX 1050 graphics card supporting TensorFlow-GPU version 1.12 along with CUDA version 10 required for image processing in the Windows 10 operating system.
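A quick way to confirm that such an environment is working is to check that TensorFlow can see the GPU; the snippet below is a generic check under the TensorFlow 1.x API, not part of our pipeline.

```python
# Quick check that the TensorFlow-GPU installation can actually see the graphics card
# (the CUDA toolkit and GPU driver must already be installed).
import tensorflow as tf

print(tf.__version__)                      # expected: 1.12.x in our setup
print(tf.test.is_gpu_available())          # True if the GPU is visible to TensorFlow
print(tf.test.gpu_device_name())           # e.g., "/device:GPU:0"
```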

2.3.2. Data Collection

After setting up the environment, we collected our dataset from three different sources. First, from the Tropical Herping collection of images [5]. Second, images were collected by searching Google Images using a Python script adapted from a web-crawler created by Hardik Vasa [13], in a similar process as the image collection completed for the project conducted by Kody G. Dantongdee from California Polytechnic State University, San Luis Obispo [12]. However, some of the images returned in the searches were not correctly identified. Third, we leveraged Flickr [30] for images using the snakes’ genus and species names.
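For readers who want to assemble a similar dataset, the snippet below shows one possible crawler-based approach using the third-party icrawler package; this is an assumption on our part, since the original collection used an adapted version of Hardik Vasa’s script [13] and Flickr searches, and every downloaded image still has to be verified manually.

```python
# Sketch of downloading candidate images by keyword with the icrawler package.
# Search results for Galápagos snakes are frequently misidentified, so manual
# verification of every image remains necessary.
from icrawler.builtin import GoogleImageCrawler

for species in ["Pseudalsophis biserialis", "Pseudalsophis occidentalis"]:
    crawler = GoogleImageCrawler(storage={"root_dir": f"raw_images/{species}"})
    crawler.crawl(keyword=species, max_num=100)
```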

2.3.3. Data Preprocessing

For this project, we had in total 247 images of snakes covering 9 species. These images were manually checked for accuracy in terms of representing the correct snake species. Because raw images cannot be fed to the neural network for training the model, they needed preprocessing; therefore, we performed the following steps: (1) Applied the labeling software to label each of the images. After an image was labeled, the xml file for each image was generated and saved in the folder. (2) Converted all xml files to csv files using Python code. Thus, for training the model, we used the csv files as external parameters (input) to the network (Table 3). As internal parameters for training the models, we used the default number of layers in the neural network and the weights for the neural network that come with the model.

Table 3. External parameters (input) for training the models.

File Name | Width | Height | Class | xmin | ymin | xmax | ymax
14924394811_c505bc3d_o.jpg | 3497 | 2332 | P. biserialis | 1002 | 625 | 3497 | 2015
151761_5-Galapagos.jpg | 600 | 500 | P. occidentalis | 41 | 65 | 554 | 442
16081314279_c833e990_b.jpg | 1024 | 683 | P. slevini | 164 | 269 | 698 | 412
182611421_cba87acd82_o.jpg | 3648 | 2736 | P. biserialis | 449 | 673 | 3166 | 2166
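The XML-to-CSV conversion in step (2) above can be done with a short script along the following lines; this is a sketch assuming Pascal VOC-style label files and hypothetical folder and file names, not necessarily the exact script we used.

```python
# Convert Pascal VOC-style XML annotations (one per labeled image) into a single CSV
# with the columns shown in Table 3. Folder and file names are assumptions.
import csv
import glob
import xml.etree.ElementTree as ET

with open("train_labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "width", "height", "class",
                     "xmin", "ymin", "xmax", "ymax"])
    for xml_path in glob.glob("annotations/*.xml"):
        root = ET.parse(xml_path).getroot()
        size = root.find("size")
        for obj in root.findall("object"):
            box = obj.find("bndbox")
            writer.writerow([
                root.findtext("filename"),
                size.findtext("width"),
                size.findtext("height"),
                obj.findtext("name"),
                box.findtext("xmin"), box.findtext("ymin"),
                box.findtext("xmax"), box.findtext("ymax"),
            ])
```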

For the different deep learning models we used different sets of training and testing images. For the Inception and ResNet models, the training set contained 211 images and the testing set 36 images (Table 4). For the MobileNet model, the first run using the initial split into training and testing sets (Table 4) did not yield a good accuracy. Therefore, we increased the size of the training set to improve the accuracy, because the model needed more images for training: 237 images were used in the training set and 10 images in the testing set (at least 1 for each snake species). For the VGG16 model, we trained on 230 images and tested 50 images. This model gave a prediction answer based on the Boolean true or false parameters, meaning whether an image is classified correctly or incorrectly. We used a few of the training images for testing to verify if the model was correctly predicting the trained images to get false positives or false negatives.

Table 4. Number of images used for training and testing sets for Inception and ResNet models.

Pseudalsophis Snake Species | Training Set Images | Test Set Images | Total Images
P. biserialis | 27 | 5 | 32
P. darwini | 10 | 2 | 12
P. dorsalis | 23 | 4 | 27
P. hephaestus | 8 | 1 | 9
P. hoodensis | 23 | 3 | 26
P. occidentalis | 58 | 10 | 68
P. slevini | 6 | 1 | 7
P. steindachneri | 16 | 3 | 19
P. thomasi | 40 | 7 | 47
Total: | 211 | 36 | 247

2.3.4. Modeling

All images varied in size and features. In some instances, the snakes were the object in focus and the background was blurry. Thus, the area occupied by the object was different in each image. For instance, the images of Pseudalsophis biserialis show the object in three different variations (Figure 7). Therefore, the question is how to process the image in neural networks and obtain the result with high accuracy? One solution is to make the network deeper by adding a greater number of layers in the network, but sometimes very deep networks are prone to overfitting. Choosing the right kernel size for the neural network becomes a challenging task. Usually “a larger kernel is preferred for information that is distributed more globally, and a smaller kernel is preferred for information that is distributed more locally” [20].

Figure 7. Three different variations of P. biserialis [5,31,32]. Images demonstrate a variety in background and percentage of portion covered by a snake in the entire frame.

3. Results

Google has released a few object-detection APIs (application programming interfaces) for TensorFlow, an open source platform for machine learning that has prebuilt architectures and weights for specific models of neural networks. From TensorFlow, we used the Faster R-CNN Inception and R-CNN ResNet models for the object detection and image classification in the snake images.

The VGG16 network is loaded in the Keras deep learning library in TensorFlow. Keras has the advantage of containing pre-trained models (such as VGG16) which we used and modified for our classification task. The Oxford Visual Geometry Group provides weights, but due to their size, the model requires a lot of time and memory to run.

While using four different modified R-CNN architectures, Inception V2, ResNet, MobileNet, and VGG16, to implement object detection and image recognition of nine snake species of Pseudalsophis, the models demonstrated the following performance results. MobileNet gave the lowest accuracy of 10%, the Inception V2 model demonstrated performance around 70%, whereas both VGG16 and ResNet tied for 75% accuracy (Table 5).
Pretrained models were used to test the benefits of each individual algorithm; however, for future research, to achieve better accuracy, the models should be trained with a much bigger data set of images. One of the complications of our work was that we did not have enough

images for some species to train these species correctly. To further improve the accuracy, we will need to obtain more images of each species to train the model. In the subsections below, we discuss the performance of each model in more detail.

Table 5. Summary of performance results for all region-based convolutional neural network (R-CNN) models implemented in this research.

R-CNN Model | Accuracy
ResNet | Around 75%
Inception V2 | Around 70%
VGG16 | Around 70%
MobileNet | Around 10%

3.1. Inception V2

We used Faster R-CNN Inception V2 to train the model. The model was trained on all nine species of snakes. Loss started at around loss = 3 at step 0, and we continued to train our model until we obtained a loss around 0.08 and below. It took 7000 steps to obtain that loss and 14 to 15 h to train the model. Once this loss stayed below 0.08 constantly, we stopped training the model and generated the inference graph for the model that we used in our code to test the images. The inference graph was generated from the highest checkpoint that was saved while running the model.

While using the Inception V2 model, we identified 70% of the test images correctly with an 80–90% confidence interval for each snake species (Figure 8).
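For illustration, running such an exported frozen inference graph on a test image typically looks like the sketch below (TensorFlow 1.x Object Detection API conventions); the file paths are assumptions, and this is a simplified stand-in for our testing code.

```python
# Sketch of running an exported frozen inference graph on one test image
# (TensorFlow 1.x Object Detection API conventions). Paths are assumptions.
import tensorflow as tf
import cv2

graph_def = tf.GraphDef()
with tf.gfile.GFile("inference_graph/frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

image = cv2.cvtColor(cv2.imread("test_snake.jpg"), cv2.COLOR_BGR2RGB)

with tf.Session(graph=graph) as sess:
    boxes, scores, classes = sess.run(
        ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
        feed_dict={"image_tensor:0": image[None, ...]})

print("Top prediction: class", int(classes[0, 0]), "score", float(scores[0, 0]))
```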



Figure 8. Pseudalsophis snake species recognition results obtained while using the Inception V2 model; P. slevini correctly classified with 98% probability (a,b), P. darwini correctly classified with 95% probability (c,d). (a,c): actual images. (b,d): prediction output.



3.2. ResNet

We used the Faster R-CNN ResNet V1 model (faster_rcnn_resnet_v1_feature_extractor) to train the model. All nine snake species were used to train the model. The step loss started at around loss = 4 at step 0, and we continued to train the model until we obtained a step loss of less than 0.05. It took 3500 steps to obtain that loss and 17 h to train the model. While using ResNet, 75% of the images were identified correctly with a 95–99% confidence interval for each snake species. Individuals of P. occidentalis in the images below were identified correctly with a 98% confidence interval (Figure 9).



Figure 9. Pseudalsophis snake species recognition results obtained while using the ResNet model: 75% of images were correctly recognized. P. occidentalis [33] correctly classified with 98% probability (a,b). P. darwini in the same image (c,d) was classified correctly with 83% probability, and incorrectly classified as P. occidentalis with 59% probability. (a,c): actual images. (b,d): prediction output.

Besides correctly identifying most of the images, it is worth mentioning the failure example of our implemented ResNet model. The ResNet model incorrectly identified individuals of P. darwini as P. occidentalis (Figure 9) by predicting both species in one image.

3.3. MobileNet

The MobileNet model was trained on all nine species of snakes. The model training time was very long, taking almost 20 h to bring the step loss to less than 0.5, which started at 14.0 for MobileNet. Unfortunately, with 10 images used for testing, all the images were classified as P. thomasi, giving us an accuracy of 10%, considering only P. thomasi was correctly classified. The neural network failed for this particular model. However, given more time and resources this model can learn deeply and can be improved.

3.4. VGG16

The VGG16 model was trained and tested on all nine species of snakes. The process was very fast, taking only 20 min to run the code because the images were trained on the last layer only. With 50 images used for testing, 13 out of the 50 images were predicted incorrectly. P. occidentalis and P. biserialis were the two species that were misclassified in the 13 images; the model predicted both species in one image. Therefore, this model gave us an accuracy of 75%. Some complications that appeared while training and testing this model were that it had difficulties in recognizing certain species of snakes in images with a cluttered background. Accordingly, it caused difficulties when distinguishing the differences in snake species.



4. Discussion

This research project succeeded in implementing different models in recognizing snake species based on image data, though there is room for improvement. The following are some of the challenges faced by this project:

1. The image dataset was limited. There were not enough images available for each species to train the model. Two hundred thirty images to train a deep learning model is not enough to achieve good accuracy. Image data augmentation could be a remedy in this case (see the sketch after this list), though it requires additional deep-domain skills in deep learning, image processing, and computer vision. Image data augmentation would allow us to significantly increase the data set size and diversity of images, providing the options of image modification by turning, flipping, padding, cropping, brightening, or darkening the object and background in the image.

2. There were a few images with complex backgrounds in which separating the snake from its surroundings was difficult. To resolve this issue, removing the backgrounds in images would significantly improve the results; however, the solution to automate the background removal for hundreds of thousands of images would have to be provided. Recent scientists’ achievements reveal that there is a promising background removal method called DOG, created by Fang et al. [34,35], used alongside VGG-DCGAN to increase the effectiveness of image classification. The method is based on CNN and created from the original model Tiramisu (One Hundred Layers Tiramisu) [36], however, with fewer layers leading to a better performance. Lastly, transfer learning can be utilized to deal with complex backgrounds in images, which allows for incorporation of knowledge previously gained from other models into current models. This would improve the accuracy and performance of the model, since it is leveraging knowledge gained from previously trained models.
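The augmentation mentioned in point 1 can be sketched with the Keras ImageDataGenerator as shown below; the directory path, target size, and parameter values are illustrative assumptions rather than settings we have evaluated.

```python
# Sketch of image data augmentation with Keras, covering the modifications mentioned
# above (rotation, flipping, shifting/cropping, brightness changes). Paths are assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=30,            # turning
    horizontal_flip=True,         # flipping
    width_shift_range=0.2,        # shifting, similar in effect to padding/cropping
    height_shift_range=0.2,
    zoom_range=0.2,
    brightness_range=(0.7, 1.3),  # brightening or darkening
)

train_flow = augmenter.flow_from_directory(
    "snake_images/train", target_size=(224, 224),
    batch_size=16, class_mode="categorical")
```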

4.1. Future Scope: From a Data Science Perspective

There are further pretrained models and CNN architectures, trained on datasets such as CIFAR-10 or CIFAR-100, that can be used to train on the Galápagos snake species images. YOLO is one of the promising models. The YOLO (you only look once) algorithm differs from other object detection models because it uses a single convolutional neural network for both classifying and localizing objects using bounding boxes [37]. Previous algorithms used region-based techniques, while YOLO uses the whole image for detection [37]. YOLO divides the image into an S × S grid; if the center of an object of interest falls into one of the grid cells, that cell is responsible for detecting the object. Each grid cell predicts a confidence score for its object, and if no object falls in that cell, the confidence score is zero (a small sketch of this grid assignment follows below). YOLO is based on Darknet, an open-source neural network framework written in C. However, it was ported from Darknet to Darkflow to allow its use with TensorFlow in Python for deep learning. The main benefit of this algorithm is its rapid processing and detection speed, which is excellent for real-time applications [37]. The research project presented in this paper was developed on pre-built structures of the TensorFlow and Keras models, such as R-CNN ResNet, R-CNN Inception, and MobileNet. Our future step will be to build our own architecture and convolutional layers to achieve better accuracy. There are many deep learning studios and APIs, such as Deep Learning Studio launched by Deep Cognition, that can help build each layer in the neural network as we aim to improve the outcome.
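To make the S × S grid idea above concrete, the following is a small, framework-free sketch of how a YOLO-style grid assigns an object to a responsible cell and why empty cells keep a confidence score of zero. It is a conceptual illustration only, not part of any YOLO implementation, and the grid size and coordinates are arbitrary examples.

```python
# Conceptual sketch of YOLO-style grid assignment (not an actual YOLO model).
# An object is assigned to the grid cell containing its bounding-box center;
# cells without an object keep a confidence score of zero.
S = 7  # grid size (S x S), a typical illustrative value

def responsible_cell(box_center_x, box_center_y, image_w, image_h, s=S):
    """Return the (row, col) of the grid cell responsible for this object."""
    col = min(int(box_center_x / image_w * s), s - 1)
    row = min(int(box_center_y / image_h * s), s - 1)
    return row, col

# Example: a snake whose bounding-box center is at (310, 150) in a 640x480 image.
confidence = [[0.0] * S for _ in range(S)]   # all cells start at zero confidence
row, col = responsible_cell(310, 150, 640, 480)
confidence[row][col] = 0.9                   # only the responsible cell predicts the object
print(row, col)                              # prints: 2 3
```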

Another library, PyTorch, built by Facebook, provides tensor computation with strong GPU acceleration and deep neural networks built on a tape-based autograd system. It could also be used to identify the snake species in our dataset (a brief sketch follows below).
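As an illustration of how PyTorch could be applied to the same task, here is a minimal sketch that adapts a pretrained torchvision ResNet to the nine species. The backbone choice, learning rate, and data pipeline are assumptions made for illustration and were not part of this project.

```python
# Minimal sketch: adapting a pretrained torchvision ResNet-50 to classify the
# nine Pseudalsophis species. Backbone, learning rate, and data loader are
# illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 9

model = models.resnet50(pretrained=True)
for param in model.parameters():          # freeze the pretrained backbone
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new trainable head

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# train_loader would be a torch.utils.data.DataLoader yielding (images, labels).
# for images, labels in train_loader:
#     optimizer.zero_grad()
#     loss = criterion(model(images), labels)
#     loss.backward()
#     optimizer.step()
```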

4.2. Future Scope: From a Software Development Perspective

The final target of this research project was to build an artificial intelligence (AI) platform, in terms of a web or mobile application, that would serve as a real-time decision-making mechanism for the visitors and park rangers of the Galápagos Islands. A user would take a picture and upload it to the AI platform to check for the correct snake species name. Our deep learning model, built using TensorFlow, is connected via an API with the Python backend and frontend technologies. Flask is the Python library that allowed us to build the web application connecting the TensorFlow model to the user-facing interface (a minimal sketch of such an endpoint is given below). The demo version of the AI application software is presented in Figure 10. Image testing took less than 15 s per image. Even though this testing time is short for an image detection/recognition task, we believe it can be reduced by using the new technologies described here, improving the real-time product–user interaction. The final step of the AI application software not only gives the Galápagos National Park rangers and tourists the class of the predicted snake species, but also educates users by providing them with information about the distribution, natural history, conservation, and etymology of the snake.
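To illustrate how such an application can connect a trained TensorFlow/Keras model to a web frontend through Flask, here is a minimal sketch of a prediction endpoint. The model file name, route, preprocessing choices, and placeholder class labels are assumptions made for illustration; this is not the code of the deployed application.

```python
# Minimal sketch of a Flask endpoint serving snake-species predictions from a
# saved Keras model. "snake_model.h5", the route, and the label list are
# hypothetical examples.
import numpy as np
from flask import Flask, request, jsonify
from PIL import Image
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model("snake_model.h5")  # previously trained classifier

# Class labels in the order used during training. Only six of the nine species
# are explicitly named in the paper; the remaining labels are placeholders.
SPECIES = ["P. biserialis", "P. darwini", "P. dorsalis", "P. hoodensis",
           "P. occidentalis", "P. thomasi",
           "species_7", "species_8", "species_9"]

@app.route("/predict", methods=["POST"])
def predict():
    # The client uploads a photo as multipart form data under the key "file".
    img_file = request.files["file"]
    img = Image.open(img_file).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype="float32")[np.newaxis] / 255.0
    probs = model.predict(x)[0]
    return jsonify({"species": SPECIES[int(np.argmax(probs))],
                    "confidence": float(np.max(probs))})

if __name__ == "__main__":
    app.run(debug=True)
```

A user (or the mobile frontend) would POST a photo to /predict and receive the predicted species name together with a confidence value, which the application can then pair with the natural-history information shown to the user.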


Figure 10. Artificial intelligence software application demo for automated real-time decision making software, helping visitors and park rangers in the Galápagos Islands to correctly identify the snake species that they see in nature. (a): architecture and network connectivity of the artificial intelligence software application. (b): predicted output "Pseudalsophis Biserialis" after a user uploads a snake photo into the artificial intelligence software application.

Author Contributions: Conceptualization, A.P., L.C., N.K., I.M., A.A., J.W.G.J.; methodology, A.P., L.C., N.K., I.M., A.A.; software, A.P., L.C., N.K.; validation, A.P., L.C., N.K.; formal analysis, A.P., L.C., N.K., I.M., A.A.; investigation, A.P., L.C., N.K., I.M., A.A.; resources, A.P., L.C., N.K., I.M., A.A., J.W.G.J.; data curation, A.P., L.C., N.K., I.M., A.A.; writing—original draft preparation, A.P., L.C., N.K., I.M., A.A.; writing—review and editing, A.P., L.C., N.K., I.M., A.A., J.W.G.J.; visualization, A.P., L.C., N.K.; supervision, I.M., A.A.; funding acquisition, J.W.G.J. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Acknowledgments: Special thanks to Jose Vieira for providing images of Pseudalsophis snake species, and to Luis Ortiz-Catedral for advice on the identification of Galápagos racer snakes. Research and photography permits (PC-31-17, PC-54-18, and DNB-CM-2016-0041-M-0001) were issued by the Ministerio de Ambiente and the Galápagos National Park Directorate.

Conflicts of Interest: The authors declare no conflict of interest.


References

1. Márquez, C.; Cisneros-Heredia, D.F.; Yánez-Muñoz, M. Pseudalsophis Biserialis. The IUCN Red List of Threatened Species, 2017. Available online: http://dx.doi.org/10.2305/IUCN.UK.2017-2.RLTS.T190541A56253872.en (accessed on 1 March 2020).
2. Márquez, C.; Cisneros-Heredia, D.F.; Yánez-Muñoz, M. Pseudalsophis Dorsalis. The IUCN Red List of Threatened Species, 2016. Available online: http://dx.doi.org/10.2305/IUCN.UK.2016-1.RLTS.T190538A56261413.en (accessed on 1 March 2020).
3. Cisneros-Heredia, D.F.; Márquez, C. Pseudalsophis Hoodensis. The IUCN Red List of Threatened Species, 2017. Available online: http://dx.doi.org/10.2305/IUCN.UK.2017-2.RLTS.T190539A54447664.en (accessed on 1 March 2020).
4. Márquez, C.; Yánez-Muñoz, M. Pseudalsophis Occidentalis. The IUCN Red List of Threatened Species, 2016. Available online: http://dx.doi.org/10.2305/IUCN.UK.2016-1.RLTS.T190540A56267613.en (accessed on 1 March 2020).
5. Arteaga, A.; Bustamante, L.; Vieira, J.; Tapia, W.; Guayasamin, J.M. Reptiles of the Galápagos; Tropical Herping: Quito, Ecuador, 2019; p. 208.
6. Zaher, H.; Yánez-Muñoz, M.H.; Rodrigues, M.T.; Graboski, R.; Machado, F.A.; Altamirano-Benavides, M.; Bonatto, S.L.; Grazziotin, F. Origin and hidden diversity within the poorly known Galápagos snake radiation (Serpentes: Dipsadidae). Syst. Biodivers. 2018, 16, 614–642. [CrossRef]
7. Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2012, 25, 84–90. [CrossRef]
8. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551. [CrossRef]
9. Sergyan, S. Color content-based image classification. In Proceedings of the 5th Slovakian-Hungarian Joint Symposium on Applied Machine Intelligence and Informatics, Poprad, Slovakia, 25–26 January 2007; pp. 427–434.
10. Hernandez-Serna, A.; Jimenez, L. Automatic identification of species with neural networks. PeerJ 2014, 2. Available online: https://peerj.com/articles/563/ (accessed on 1 March 2020).
11. Amir, A.; Zahri, N.A.H.; Yaakob, N.; Ahmad, R.B. Image Classification for Snake Species Using Machine Learning Techniques. In Computational Intelligence in Information Systems; Phon-Amnuaisuk, S., Au, T.W., Omar, S., Eds.; CIIS 2016. Advances in Intelligent Systems and Computing; Springer: Cham, Switzerland, 2017; Volume 532, pp. 52–59.
12. Dangtongdee, K.D. Plant Identification Using Tensorflow; California Polytechnic State University: San Luis Obispo, CA, USA, 2018. Available online: https://digitalcommons.calpoly.edu/cpesp/264 (accessed on 1 March 2020).
13. Python: An All-in-One Web Crawler, Web Parser and Web Scrapping Library! Available online: https://github.com/hardikvasa/webb (accessed on 1 March 2020).
14. Racer. Tropical Herping. Available online: https://www.tropicalherping.com/science/books/reptiles/reptiles_galapagos/.html (accessed on 1 March 2020).
15. Goswami, S. A Deeper Look at How Faster R-CNN Works, 2018. Available online: https://medium.com/@whatdhack/a-deeper-look-at-how-faster-rcnn-works-84081284e1cd (accessed on 1 March 2020).
16. Gao, H. Faster R-CNN Explained, 2017. Available online: https://medium.com/@smallfishbigsea/faster-r-cnn-explained-864d4fb7e3f8 (accessed on 1 March 2020).
17. Girshick, R. Fast R-CNN, 2015. Available online: https://arxiv.org/pdf/1504.08083.pdf (accessed on 1 March 2020).
18. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2016. Available online: https://arxiv.org/pdf/1506.01497.pdf (accessed on 1 March 2020).
19. Chen, X.; Gupta, A. An Implementation of Faster RCNN with Study for Region Sampling, 2017. Available online: https://arxiv.org/pdf/1702.02138.pdf (accessed on 31 January 2020).
20. Raj, B. A Simple Guide to the Versions of the Inception Network, 2018. Available online: https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202 (accessed on 1 March 2020).
21. ResNet, AlexNet, VGGNet, Inception: Understanding Various Architectures of Convolutional Networks, 2018. Available online: https://cv-tricks.com/cnn/understand-resnet-alexnet-vgg-inception/ (accessed on 1 March 2020).
22. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions, 2014. Available online: https://arxiv.org/abs/1409.4842v1 (accessed on 1 March 2020).
23. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision, 2015. Available online: https://arxiv.org/abs/1512.00567 (accessed on 1 March 2020).
24. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition, 2015. Available online: https://arxiv.org/abs/1512.03385 (accessed on 1 March 2020).
25. Nigam, I. Exploring the Structure of Residual Neural Networks, in Mathematical Fundamentals for Robotics, 2016. Available online: http://www.cs.cmu.edu/~jia/16-811/16-811.html (accessed on 17 January 2019).
26. Jordan, J. An Overview of Object Detection: One-Stage Methods, 2018. Available online: https://www.jeremyjordan.me/connet-architectures/ (accessed on 17 January 2019).
27. Hollemans, M. MobileNet Version 2, 2018. Available online: https://machinethink.net/blog/mobilenet-v2/ (accessed on 1 March 2020).
28. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2015. Available online: https://arxiv.org/pdf/1409.1556.pdf (accessed on 1 March 2020).
29. Murphy, J. An Overview of Convolutional Neural Network Architectures for Deep Learning, 2016. Available online: https://pdfs.semanticscholar.org/64db/333bb1b830f937b47d786921af4a6c2b3233.pdf (accessed on 1 March 2020).
30. Flickr. Available online: https://www.flickr.com/ (accessed on 17 January 2019).
31. Schramer, T. Galapagos Racer. HerpMapper, 2016. Available online: https://www.herpmapper.org/record/165402 (accessed on 17 January 2019).
32. Mejia, D. Galápagos: la culebra de Floreana y el centenario misterio de su ausencia en su isla. Mongabay, 2017. Available online: https://es.mongabay.com/2017/05/ecuador-galapagos-culebra-de-floreana-misterio/ (accessed on 17 January 2019).
33. Mangersnes, R. Galapagos Racer Snake. Nature Picture Library. Available online: https://www.naturepl.com/search?s=galapagos+snake (accessed on 10 April 2020).
34. Fang, W.; Ding, Y.; Zhang, F.; Sheng, V. DOG: A New Background Removal for Object Recognition from Images. Neurocomputing 2019, 361, 85–91. [CrossRef]
35. Fang, W.; Ding, Y.; Zhang, F.; Sheng, J. Gesture recognition based on CNN & DCGAN for calculation and text output. IEEE Access 2019, 7. Available online: https://ieeexplore.ieee.org/document/8653825 (accessed on 10 April 2020).
36. Jégou, S.; Drozdzal, M.; Vazquez, D.; Romero, A.; Bengio, Y. The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation, 2017. Available online: https://arxiv.org/abs/1611.09326 (accessed on 10 April 2020).
37. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection, 2016. Available online: https://arxiv.org/pdf/1506.02640.pdf (accessed on 1 March 2020).

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).