Identification of Saimaa individuals using transfer learning

Ekaterina Nepovinnykh1,2, Tuomas Eerola1, Heikki Kälviäinen2, and Gleb Radchenko2

1 Machine Vision and Pattern Recognition Laboratory, Department of Computational and Process Engineering, School of Engineering Science, Lappeenranta University of Technology, Lappeenranta, Finland, [email protected]
2 School of Electrical Engineering and Computer Science, South Ural State University, Chelyabinsk, Russian Federation

Abstract. The conservation efforts of the endangered Saimaa ringed seal depend on the ability to reliably estimate the population size and to track individuals. Wildlife photo-identification has been successfully utilized in monitoring various species. Traditionally, the collected images have been analyzed by biologists. However, due to the rapid increase in the amount of image data, there is a demand for automated methods. Ringed seals have pelage patterns that are unique to each seal, enabling individual identification. In this work, two methods of Saimaa ringed seal identification based on transfer learning are proposed. The first method involves retraining of an existing convolutional neural network (CNN). The second method uses the CNN trained for image classification to extract features which are then used to train a Support Vector Machine (SVM) classifier. Both approaches show over 90% identification accuracy on challenging image data, the SVM-based method being slightly better.

Keywords: biometrics, Saimaa ringed seals, convolutional neural networks, transfer learning, identification, image segmentation

1 Introduction

The Saimaa ringed seal (Pusa hispida saimensis) is a subspecies of ringed seal (Pusa hispida) living in Lake Saimaa in Finland (Fig. 1). At present, around 360 seals inhabit the lake, and on average 60 to 85 pups are born annually. This small and fragmented population is threatened by various anthropogenic factors, especially by-catch and climate change [13]. Therefore, a long-term and accurate assessment of the population is needed for conservation purposes. Successful conservation requires constant population monitoring, which is difficult to achieve without invasive methods. Traditional population monitoring methods include tagging, which requires catching the animal and may cause stress to it, as well as change its behavior or increase mortality. This makes non-invasive methods preferable for population monitoring. Wildlife Photo Identification (WPI) is a technology that makes it possible to recognize individuals and to track the movement of animal populations over time. It is based on acquiring images of animals and subsequently identifying individuals. Recently, camera trapping has been launched as a monitoring tool also for the Saimaa ringed seal [5, 12].

Fig. 1. Saimaa ringed seal.

The Saimaa ringed seals have a distinctive fur pattern that is never repeated in different individuals and does not change significantly over the course of a seal's life [12]. This makes photo identification based on the fur pattern suitable for non-invasive monitoring.

In this work, automatic photo identification of the Saimaa ringed seals is considered. The proposed method first segments the seal from the background and then uses the fur pattern to identify the individual. The work continues the studies presented in [20] and [7] where the first steps towards automatic individual identification of the Saimaa ringed seal were taken. In this paper, new methods for both the segmentation phase and the identification phase are proposed by utilizing convolutional neural networks (CNNs) and transfer learning.

2 Related work

A computational approach to wildlife photo identification is an emerging field that aims to apply formal methods to automate the process of animal biometric identification. It has many advantages over manual identification: traditional methods are time-consuming, highly dependent on the skills of the person who performs the identification, and prone to various errors such as observer errors and biases [15]. Moreover, human observers often ignore classification uncertainty, and as such misclassification is often underestimated [9]. Computer methods avoid this problem by utilizing probabilistic methods and often report classification certainty along with other possible classification results. The main advantage of an animal biometrics system is that it allows researchers to rapidly collect and robustly analyze extensive amounts of data, which ultimately improves both research on the seals and their monitoring.

Several approaches for automatic image-based animal identification can be found in the literature. Methods have been developed, for example, for polar bears [3], newts [11], giraffes [10], salamanders [6], and snakes [1]. All of these methods use image processing and pattern recognition techniques to identify individuals. Most of the studies limit the individual identification to a certain animal species or species group. All the above methods were developed for one species only and as such are not generalizable to the Saimaa ringed seals. In [20], the first steps towards the automatic individual identification of the Saimaa ringed seals were taken. The paper proposes a segmentation method for the Saimaa ringed seals using unsupervised segmentation and texture-based superpixel classification. Furthermore, a simple texture-based approach for the ringed seal identification was evaluated. In [7], the segmentation method was further developed to decrease its computation time without sacrificing the performance. Moreover, a set of post-processing operations for segmented images was proposed to make the seals easier to identify. Two existing species-independent individual identification methods were evaluated to demonstrate the importance of the segmentation and post-processing operations. However, the identification performance of neither of the methods is good enough for most practical applications.

There have also been research efforts towards creating a unified approach applicable to the identification of several animal species. For example, in [8], the HotSpotter method to identify individual animals in a labeled database was presented. This algorithm is not species specific and has been applied to Grevy's and plains zebras, giraffes, leopards, and lionfish. HotSpotter uses viewpoint invariant descriptors and a scoring mechanism that emphasizes the most distinctive keypoints and descriptors. In [19], a species recognition algorithm based on sparse coding spatial pyramid matching (ScSPM) was proposed. It was shown that the proposed object recognition techniques can be successfully used to identify animals in sequences of images captured using camera traps in nature. One of the problems with the species-independent individual identification methods is that they do not provide an automatic method to detect the animals in images. Therefore, either manual detection or development of a detection method for the studied animal is needed. Furthermore, a higher identification performance can typically be obtained by tuning the identification method for one species only.

3 Proposed method

In this work, two Saimaa ringed seal identification methods based on transfer learning are proposed. The goal of both methods is, given an image of a Saimaa ringed seal, to output the identifier of the individual shown in the image. Both proposed identification algorithms consist of two steps.

In the first step, the image is segmented. The segmentation result is an image of a seal without the background or overlapping objects. This is important since most of the image material is obtained using static camera traps. Therefore, the same seal is often captured with the same background, increasing the risk that a supervised identification algorithm learns to "identify" the background instead of the actual seal if the full image or the bounding box around the seal is used. This may further lead to a system that is not able to identify the seal in a new environment.

The second step is the identification using transfer learning. The first proposed method for the identification is a CNN-based method. It involves retraining a classification CNN whose feature extraction layers are taken from another, pre-existing convolutional neural network. After the initial experiments, it was concluded that training a CNN from the ground up is too computationally intensive given the available resources. Therefore, it was decided to use a pretrained general-purpose CNN from [14] as the source of the feature extraction layers. The second method for the identification is a Support Vector Machine (SVM) based method. For this method, transfer learning is performed by using the above pretrained CNN for the feature extraction and an SVM for the classification. The identification process is visualized in Fig. 2.

Fig. 2. General seal identification algorithm.

3.1 Segmentation

Automatic segmentation of animals is often difficult due to the camouflage colors of animals, i.e., the coloration and patterns are similar to the visual background of the animal. Segmentation results, however, can have a significant impact on identification performance. Segmentation helps to reduce overfitting by removing the irrelevant background from an image, allows a standardized object rotation in different images, reduces dataset bias by presenting only the objects of interest to the training algorithm, and allows improved color correction by zeroing out all background colors and focusing only on object colors.

In this work, the segmentation framework proposed in [20] is used. The segmentation pipeline contains the following steps: 1) unsupervised segmentation of an image to produce a set of superpixels, 2) feature extraction from each superpixel, 3) classification of the superpixels into the seal and background classes, 4) composing of the seal segments into one image, and 5) cropping the resulting image to contain only the seal. Fig. 3 shows the segmentation process. For the unsupervised segmentation, Multiscale Combinatorial Grouping (MCG) [4] is used. To classify the superpixels, an SVM with the feature extraction layers of AlexNet [14] is used, instead of the Local Phase Quantization (LPQ) [17] features utilized in [20].

Fig. 3. Segmentation algorithm: 1) unsupervised segmentation, 2) training, 3) classifi- cation.
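To make the superpixel classification step (steps 2–4 above) more concrete, a minimal MATLAB sketch under stated assumptions is given below. It is not the authors' implementation: it assumes that the MCG superpixel label matrix L (values 1..numSp) and the RGB image img are already available, and that an SVM has been trained beforehand on manually labeled superpixel features (spFeats, with spLabels equal to 1 for seal and 0 for background). All variable names are illustrative.

```matlab
% Hedged sketch of the superpixel classification step, assuming a superpixel
% label matrix L (e.g. from MCG) and an RGB image img are available.
net  = alexnet;                        % pretrained AlexNet (Deep Learning Toolbox model)
inSz = net.Layers(1).InputSize(1:2);   % 227 x 227 input size

numSp = max(L(:));
feats = zeros(numSp, 4096);
for k = 1:numSp
    mask  = (L == k);
    % Crop the bounding box of the superpixel and mask out everything else
    stats = regionprops(mask, 'BoundingBox');
    patch = imcrop(img .* uint8(mask), stats(1).BoundingBox);
    patch = imresize(patch, inSz);
    % 4096-dimensional descriptor from fully connected layer 7
    feats(k, :) = activations(net, patch, 'fc7', 'OutputAs', 'rows');
end

% Binary SVM separating seal superpixels from background superpixels,
% trained beforehand on manually labeled superpixels (spFeats, spLabels)
svmSeal  = fitcsvm(spFeats, spLabels);
spClass  = predict(svmSeal, feats);
sealMask = ismember(L, find(spClass == 1));   % compose the seal segments into one mask
```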

3.2 Identification

This work compares two different ways of building an identification method with transfer learning. The result of the identification is an individual seal ID.

The CNN-based identification algorithm is shown in Fig. 4. The method utilizes the well-known AlexNet architecture [14]. The classification layer of the original network is removed and replaced with a new classification layer. Instead of the 1000-way classification of the original network, the new classification layer has a number of classes equal to the number of seal individuals. The whole reconstructed neural network is then retrained with cropped and resized input images. The learning rate factors for layers 1–7 are intentionally set low and the factor for layer 8 high. This is done to reduce the impact of retraining on the feature extraction layers and to focus on the classification layer. The retrained CNN is then used to identify seals.

Fig. 4. CNN-based identification method.

The SVM-based identification algorithm is shown in Fig. 5. Similarly to the previous method, the AlexNet architecture is utilized. The classification layer is removed, after which the output of the neural network is not class probabilities but a 4096-dimensional vector from fully connected layer 7. This vector is used as a feature vector for the SVM classifier. The 'one against all' strategy [2], with one binary learner for each class, is used to generalize the SVM to several classes.
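As an illustration of the SVM-based branch, a minimal MATLAB sketch is given below. It is a hedged sketch rather than the authors' implementation: the folder layout ('seals/train', 'seals/test') and the linear kernel are assumptions, while the fc7 features and the one-against-all multiclass strategy follow the description above.

```matlab
% Hedged sketch of the SVM-based identification method, assuming segmented
% and cropped images are stored in per-individual subfolders.
net  = alexnet;
inSz = net.Layers(1).InputSize(1:2);

imdsTrain = imageDatastore('seals/train', 'IncludeSubfolders', true, ...
                           'LabelSource', 'foldernames');
imdsTest  = imageDatastore('seals/test',  'IncludeSubfolders', true, ...
                           'LabelSource', 'foldernames');
augTrain  = augmentedImageDatastore(inSz, imdsTrain);
augTest   = augmentedImageDatastore(inSz, imdsTest);

% 4096-dimensional descriptors from fully connected layer 7
featTrain = activations(net, augTrain, 'fc7', 'OutputAs', 'rows');
featTest  = activations(net, augTest,  'fc7', 'OutputAs', 'rows');

% One-against-all multiclass SVM over the individual seal IDs
svmModel  = fitcecoc(featTrain, imdsTrain.Labels, 'Coding', 'onevsall', ...
                     'Learners', templateSVM('KernelFunction', 'linear'));
predicted = predict(svmModel, featTest);
accuracy  = mean(predicted == imdsTest.Labels);   % overall identification accuracy
```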

4 Experiments

4.1 Datasets

The experiments were performed using two datasets of known seals. The first dataset consisted of four individuals with a large number of images for each individual. The total number of images was 976 (244 images for each seal). The data was randomly divided into a training set and a test set so that the test set contained 171 and the training set 73 images per seal. Fig. 6 shows example images from the dataset. The second dataset consisted of 5585 images of 29 seals. The number of images per individual seal varied significantly, from 20 to 860. The dataset was divided into the training and test sets with three different proportions: 30%, 50%, or 70% of the images for training and the rest of the images for testing.
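A possible way to produce such per-individual splits in MATLAB is sketched below; the folder name and the use of splitEachLabel are illustrative assumptions, not details from the paper.

```matlab
% Hedged sketch: random per-individual split of the second dataset into
% training and test sets, here with the 70%/30% proportion.
imds = imageDatastore('seals/dataset2', 'IncludeSubfolders', true, ...
                      'LabelSource', 'foldernames');
[imdsTrain, imdsTest] = splitEachLabel(imds, 0.70, 'randomized');  % or 0.30 / 0.50
```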

4.2 Method implementation

In order to perform the transfer learning with the neural network, the layers that have been trained to extract features from the image have to be selected, but not the layers that use these features for classification. The classification layers of AlexNet start with layer 23 (fully connected layer 8), which is the last fully connected layer with 1000 outputs. While it is possible to extract layers 1–22 and use them as the basis for transfer learning, it is not practical to do so because layers 21 and 22 are a rectifier and a dropout layer, which do not hold any learned weights themselves. As such, the last feature extraction layer is layer 20 (fully connected layer 7). It is a fully connected layer with 4096 outputs, meaning that the neural network up to this layer performs dimensionality reduction, mapping a 227×227×3 input image to a 4096-dimensional vector.

Fig. 5. SVM-based identification method.

The segmentation procedure was implemented using MATLAB. A threshold value of 0.25 was used to turn the ultrametric contour map obtained using the MCG method [4] into superpixels. The same feature extraction and classifier training procedures as in the SVM-based seal identification were used for the superpixel classification: the feature extraction layers of AlexNet produce the features on which an SVM classifier is trained. In order to collect a training set, superpixels were extracted from 100 images of different seals and manually labeled as either belonging to the seal or to the background.

Fig. 6. Examples of seal images.

The CNN for the identification was implemented using the MATLAB Neural Network Toolbox. A softmax layer and a classification layer were added: softmax normalizes the fully connected outputs to sum to 1, while the classification layer selects the class with the maximum probability and assigns a class label. The following parameters were used during the neural network retraining. The learning rate factors were set low for the early layers of the network and high for the new fully connected layer in order to focus training on the new classification layer instead of the transferred weights. Stochastic gradient descent with momentum was used as the solver. The maximum number of learning epochs was 20. The initial learning rate of the entire network was set to 0.0001 to further reduce the learning rate of the layers transferred from AlexNet.
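The retraining setup described above can be sketched in MATLAB as follows. The solver, the 20 epochs, and the 0.0001 initial learning rate come from the text; the learning rate factor of 20 for the new layer and the datastore variables (imdsTrain, imdsTest, as in the earlier sketches) are illustrative assumptions.

```matlab
% Hedged sketch of the CNN retraining: the 1000-way classification head of
% AlexNet is replaced with a new head sized to the number of seal individuals.
% imdsTrain / imdsTest are imageDatastores of segmented, cropped images.
imdsTrain = imageDatastore('seals/train', 'IncludeSubfolders', true, ...
                           'LabelSource', 'foldernames');
imdsTest  = imageDatastore('seals/test',  'IncludeSubfolders', true, ...
                           'LabelSource', 'foldernames');

net      = alexnet;
numSeals = numel(categories(imdsTrain.Labels));      % e.g. 4 or 29 individuals

layers = [ ...
    net.Layers(1:end-3)                              % transferred layers, up to fully connected layer 7
    fullyConnectedLayer(numSeals, ...                % new classification layer
        'WeightLearnRateFactor', 20, ...             % high factor: focus training on the new layer
        'BiasLearnRateFactor',   20)
    softmaxLayer                                     % normalizes outputs to sum to 1
    classificationLayer];                            % selects the class with the maximum probability

options = trainingOptions('sgdm', ...                % stochastic gradient descent with momentum
    'InitialLearnRate', 1e-4, ...                    % keeps the transferred layers nearly frozen
    'MaxEpochs',        20, ...
    'Shuffle',          'every-epoch', ...
    'Verbose',          false);

augTrain = augmentedImageDatastore([227 227], imdsTrain);
sealNet  = trainNetwork(augTrain, layers, options);

% Identification of segmented and cropped test images
augTest   = augmentedImageDatastore([227 227], imdsTest);
predicted = classify(sealNet, augTest);
```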

4.3 Segmentation

The proposed segmentation method was used as a preprocessing step before the actual identification. Since the main focus of this work is on identification and not on segmentation, only qualitative results are presented. Examples of bad and good segmentation results with cropping are shown in Fig. 7.

Fig. 7. Examples of bad and good segmentation results with cropping.

4.4 Identification

The experiments were performed with both transfer learning methods. Both methods were tested in order to determine their ability to identify the Saimaa ringed seals and to measure the identification performance. The SVM classifier was significantly faster to train.

The overall identification performance for the first dataset with the CNN-based method was 82.9%, i.e., 82.9% of all testing images were correctly identified. For the SVM-based method the overall identification performance was 97.5%, and there was no significant confusion between different classes. The confusion matrices are shown in Table 1. More detailed results can be found in [16].

Table 1. Confusion matrices for the CNN-based and SVM-based methods.

                              Target class
                     1             2             3             4
Output class (CNN)
  1             118 (17.3%)    2 (0.3%)      3 (0.4%)      2 (0.3%)     94.4%
  2               0 (0.0%)   120 (17.5%)     0 (0.0%)      4 (0.6%)     96.8%
  3              12 (1.8%)    29 (4.2%)    167 (24.4%)     3 (0.4%)     79.1%
  4              41 (6.0%)    20 (2.9%)      1 (0.1%)    162 (23.7%)    72.3%
                 69.0%         70.2%         97.7%         94.7%        82.9%

Output class (SVM)
  1             163 (23.8%)    2 (0.3%)      1 (0.1%)      0 (0.0%)     98.2%
  2               2 (0.3%)   168 (24.6%)     0 (0.0%)      4 (0.6%)     96.6%
  3               5 (0.7%)     0 (0.0%)    169 (24.7%)     0 (0.0%)     97.1%
  4               1 (0.1%)     1 (0.1%)      1 (0.1%)    167 (24.4%)    98.2%
                 95.3%         98.2%         98.8%         97.7%        97.5%
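The summary percentages in Table 1 can be reproduced from the predicted and true labels; a hedged MATLAB sketch is shown below. The variable names follow the earlier SVM sketch, and note that confusionmat places true classes on rows, i.e., the transpose of the layout in Table 1.

```matlab
% Hedged sketch: confusion matrix and summary statistics from true and
% predicted labels (e.g. the outputs of the SVM-based sketch above).
C = confusionmat(imdsTest.Labels, predicted);     % rows: true class, columns: predicted class

totalAccuracy     = sum(diag(C)) / sum(C(:));     % overall identification accuracy (e.g. 0.975)
recallPerClass    = diag(C) ./ sum(C, 2);         % per target class, bottom row of Table 1
precisionPerClass = diag(C) ./ sum(C, 1)';        % per output class, rightmost column of Table 1
sharePercent      = 100 * C / sum(C(:));          % cell percentages of all test images
```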

With the second dataset, the CNN-based method obtained an identification accuracy of 90.5% and the SVM-based method an accuracy of 91.2%. These numbers were obtained by using 70% of the dataset for training and the remaining 30% for testing. Table 2 shows the results for the different ratios of the training and test set sizes.

Table 2. Identification accuracy for the different training/test set proportions in the second dataset.

Training/testing set ratio    CNN      SVM
30%/70%                       83.8%    87.3%
50%/50%                       88.8%    89.5%
70%/30%                       90.5%    91.2%

While the total accuracy is an important metric, it is not the only way to assess the performance of the identification system. A useful animal identification system should not just present a single identification candidate, but also give researchers a potential choice in ambiguous cases. This is achieved by presenting not just the single top pick of the specimen identifier for each image, but several individuals with the largest posterior probabilities. As such, the rank-based Cumulative Match Score (CMS) was used to assess the identification performance. CMS is commonly used in face recognition research [18]. It measures how well the identification system ranks the identities in the database with respect to the input image. The Nth bin in a CMS histogram gives the percentage of test images for which the correct individual seal was in the set of the N best matches proposed by the identification algorithm.

Figure 8 presents CMS histograms for both proposed identification methods. CMS for rank 1 is the same as the total accuracy mentioned earlier. It can be seen that the SVM-based method gives better results for all ranks. For example, with the SVM-based method the correct seal was within the 5 best matches in 99% of the cases, whereas with the CNN-based method this was the case in 98% of the cases.

Fig. 8. CMS histograms for CNN-based and SVM-based identification methods on the second dataset.
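For reference, a hedged MATLAB sketch of the CMS computation is given below. It assumes an N-by-K matrix of per-class scores for N test images and K individuals (for example, the second output of classify or of the ECOC predict in the earlier sketches) and a vector trueIdx of correct class indices; both names are illustrative.

```matlab
% Hedged sketch of the rank-based Cumulative Match Score (CMS) computation,
% given per-class scores for each test image (higher score = better match).
[~, order] = sort(scores, 2, 'descend');          % ranked candidate identities per image
ranks = zeros(size(scores, 1), 1);
for i = 1:size(scores, 1)
    ranks(i) = find(order(i, :) == trueIdx(i));   % rank of the correct individual
end
cms = cumsum(accumarray(ranks, 1, [size(scores, 2) 1])) / numel(ranks);
% cms(n) is the fraction of test images whose correct identity is within the
% n best matches; cms(1) equals the overall identification accuracy.
```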

It can be inferred that the classes with more samples generally have higher accuracy and a lower relative number of errors. There are 6 seals in the second dataset with more than 300 images. For each of these individuals, the identification accuracy was 88.6% or higher with the CNN-based method and 92.1% or higher with the SVM-based method. For certain individuals with a low number of training images the accuracy was considerably lower. With the CNN-based method the identification accuracy varied from 27.6% to 100.0%, with 48% of individuals having an identification accuracy of 90% or higher. With the SVM-based method the identification accuracy varied from 51.7% to 100.0%, with 41% of individuals having an identification accuracy of 90% or higher. More detailed results can be found in [16].

The obtained results are clearly better than the results reported in [20], where only 10% of the seals were correctly identified, and in [7], where 44% of the seals were correctly identified. However, it should be noted that different datasets were used in the earlier studies, and therefore the results are not directly comparable. The datasets used in the preliminary studies were not suitable for this study as they contained a small number of images for each individual seal, making it impossible to train deep CNNs.

Both methods produce very similar results, sharing similar characteristics. In both cases the least accurately identified individual was the same, and the percentage of errors for the individuals with a large number of images followed a similar distribution, with the exception that the SVM-based identification method was generally more accurate. Both methods essentially use the same CNN-based feature extraction and the same dataset, which leads to the conclusion that these patterns depend on the feature extraction steps, not on the choice of a classification method.

5 Conclusions

In this paper, two methods to identify Saimaa ringed seal individuals based on their pelage patterns were proposed. Both methods utilize transfer learning. They start with a segmentation step where the seal is extracted from the background. This is done by dividing the image into superpixels and classifying the superpixels using CNN-based features. The identification step utilizes the well-known AlexNet CNN architecture. The first method is based on retraining the original AlexNet with a new classification layer. The second method uses AlexNet only for the feature extraction and an SVM for the classification. The experiments show that both methods provide good identification performance on challenging image data and that transfer learning has great potential in animal biometrics. The SVM-based method produced slightly higher identification accuracy, with 91.2% of the test images correctly identified.

Acknowledgements

The authors would like to thank Meeri Koivuniemi, Miina Auttila, Riikka Levänen, Marja Niemi, and Mervi Kunnasranta from the Department of Environmental and Biological Sciences at the University of Eastern Finland for providing the database for the experiments and for their expert knowledge in identifying the individuals.

References

1. Albu, A.B., Wiebe, G., Govindarajulu, P., Engelstoft, C., Ovatska, K.: Towards automatic model-based identification of individual sharp-tailed snakes from natural body markings. In: Proceedings of ICPR Workshop on Animal and Insect Behaviour, Tampa, FL, USA (2008)
2. Allwein, E.L., Schapire, R.E., Singer, Y.: Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research 1(Dec), 113–141 (2000)
3. Anderson, C.: Individual identification of polar bears by whisker spot patterns. Ph.D. thesis, University of Central Florida, Orlando, Florida (2007)
4. Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., Malik, J.: Multiscale combinatorial grouping. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 328–335 (2014)
5. Auttila, M., Niemi, M., Skrzypczak, T., Viljanen, M., Kunnasranta, M.: Estimating and mitigating perinatal mortality in the endangered Saimaa ringed seal (Phoca hispida saimensis) in a changing climate. Annales Zoologici Fennici 51(6), 526–534 (2014)
6. Bendik, N.F., Morrison, T.A., Gluesenkamp, A.G., Sanders, M.S., O'Donnell, L.J.: Computer-assisted photo identification outperforms visible implant elastomers in an endangered salamander, Eurycea tonkawae. PLoS One 8(3), e59424 (2013)
7. Chehrsimin, T., Eerola, T., Koivuniemi, M., Auttila, M., Levänen, R., Niemi, M., Kunnasranta, M., Kälviäinen, H.: Automatic individual identification of Saimaa ringed seals. IET Computer Vision 12(2), 146–152 (2018)
8. Crall, J., Stewart, C., Berger-Wolf, T., Rubenstein, D., Sundaresan, S.: HotSpotter – patterned species instance recognition. IEEE Workshop on Applications of Computer Vision (WACV) pp. 230–237 (2013)
9. Guschanski, K., Vigilant, L., McNeilage, A., Gray, M., Kagoda, E., Robbins, M.M.: Counting elusive animals: comparing field and genetic census of the entire mountain gorilla population of Bwindi Impenetrable National Park, Uganda. Biological Conservation 142(2), 290–300 (2009)
10. Halloran, K.M., Murdoch, J.D., Becker, M.S.: Applying computer-aided photo-identification to messy datasets: a case study of Thornicroft's giraffe (Giraffa camelopardalis thornicrofti). African Journal of Ecology 53(2), 147–155 (2014)
11. Hoque, S., Azhar, M., Deravi, F.: Zoometrics – biometric identification of wildlife using natural body marks. International Journal of Bio-Science and Bio-Technology 3(3), 45–53 (2011)
12. Koivuniemi, M., Auttila, M., Niemi, M., Levänen, R., Kunnasranta, M.: Photo-ID as a tool for studying and monitoring the endangered Saimaa ringed seal. Endangered Species Research 30, 29–36 (2016)
13. Kovacs, K.M., Aguilar, A., Aurioles, D., Burkanov, V., Campagna, C., Gales, N., Gelatt, T., Goldsworthy, S.D., Goodman, S.J., Hofmeyr, G.J.G., Härkönen, T., Lowry, L., Lydersen, C., Schipper, J., Sipilä, T., Southwell, C., Stuart, S., Thompson, D., Trillmich, F.: Global threats to pinnipeds. Marine Mammal Science 28(2), 414–436 (2012)
14. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of Advances in Neural Information Processing Systems (NIPS). pp. 1097–1105 (2012)
15. Kühl, H.S., Burghardt, T.: Animal biometrics: quantifying and detecting phenotypic appearance. Trends in Ecology & Evolution 28(7), 432–441 (2013)
16. Nepovinnykh, E.: Saimaa ringed seal fur pattern extraction for identification purposes. Master's thesis, Lappeenranta University of Technology, Finland (2017)
17. Ojansivu, V., Heikkilä, J.: Blur insensitive texture classification using local phase quantization. In: 3rd International Conference on Image and Signal Processing. pp. 236–243 (2008)
18. Phillips, P.J., Moon, H., Rizvi, S.A., Rauss, P.J.: The FERET evaluation methodology for face recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(10), 1090–1104 (2000)
19. Yu, X., Wang, J., Kays, R., Jansen, P., Wang, T., Huang, T.: Automated identification of animal species in camera trap images. EURASIP Journal on Image and Video Processing 2013(1), 52 (2013)
20. Zhelezniakov, A., Eerola, T., Koivuniemi, M., Auttila, M., Levänen, R., Niemi, M., Kunnasranta, M., Kälviäinen, H.: Segmentation of Saimaa ringed seals for identification purposes. In: Proceedings of International Symposium on Visual Computing. pp. 227–236. Las Vegas, USA (2015)