Environ Monit Assess (2019) 191:406 https://doi.org/10.1007/s10661-019-7518-9

EventFinder: a program for screening remotely captured camera trap images

Michael Janzen · Ashley Ritter · Philip D. Walker · Darcy R. Visscher

Received: 22 August 2018 / Accepted: 3 May 2019 © Springer Nature Switzerland AG 2019

Abstract Camera traps are becoming ubiquitous tools for ecologists. While easily deployed, they require human time to organize, review, and classify images, including sequences of images of the same individual and non-target images triggered by environmental conditions. For such cases, we developed an automated computer program, named EventFinder, to reduce operator time by pre-processing and classifying images using background subtraction techniques and histogram comparisons. We tested the accuracy of the program against images previously classified by a human operator. The automated classification, on average, reduced the data requiring human input by 90.8% with an accuracy of 96.1%, and produced a false positive rate of only 3.4%. Thus, EventFinder provides an efficient method for reducing the time for human operators to review and classify images, making camera trap projects, which compile a large number of images, less costly to process. Our testing process used medium to large animals, but EventFinder will also work with smaller animals, provided their images occupy a sufficient area of the frame. While our discussion focuses on camera trap image reduction, we also discuss how EventFinder might be used in conjunction with other software developments for managing data.

Keywords Background subtraction · Camera traps · Computer classification · Data reduction · Image processing

M. Janzen (✉) · A. Ritter · P. D. Walker · D. R. Visscher
The King's University, Edmonton, Alberta, Canada
e-mail: [email protected]

D. R. Visscher
e-mail: [email protected]

Introduction

Remote cameras are increasingly important tools for ecologists, from site-specific studies to global monitoring initiatives (Meek et al. 2014; Burton et al. 2015; Steenweg et al. 2017). While they collect field data with minimal human attention, the resulting images require organization, post-processing, and characterization before analysis, requiring input of time and money. Harris et al. (2010) highlighted two general issues facing camera-aided studies, namely the lack of systematic organization and problems involving the volume of imagery produced.

While a number of applications exist to organize, manage, and facilitate characterization of camera trap images (Fegraus et al. 2011; Krishnappa and Turner 2014; Bubnicki et al. 2016; Niedballa et al. 2016), fewer options exist for pre-processing images to reduce the need for human post-processing. The need for pre-processing is due, in part, to a high incidence of non-target images produced by the camera or long sequences of the same animal. These images create an "analytical bottleneck" during post-processing (Harris et al. 2010; Spampinato et al. 2015), and less human involvement in pre-processing images may help alleviate it. In particular, the ability to remove "noisy" images resulting from background movement and the ability to identify a single image from a sequence for classification will greatly reduce processing time.
A primary technique to discriminate a "noisy" image from an event (animal) is background subtraction (Piccardi 2004). This technique helps differentiate images containing background noise from images containing events of interest, and is implemented in video imaging studies (Desell et al. 2013; Goehner et al. 2015; Swinnen et al. 2014; Weinstein 2015). Camera trap images, as opposed to video, are separated by longer periods of time, making background subtraction more challenging. We developed a stand-alone computer program to aid in image processing, categorization, and data reduction, using background subtraction to identify candidate foreground areas and color histogram comparisons to distinguish foreground areas from background. The program pre-processes remotely captured images into relevant and irrelevant sets for human consideration, identifying unique events. In this paper, we describe the development and procedure of EventFinder, and provide an assessment of EventFinder based on a known dataset characterized completely by a human.

Methods

Study area

From June 2015 through June 2016, we maintained nine cameras (Reconyx PC500) throughout the Cooking Lake—Blackfoot Provincial Recreation Area, located 45 km east of Edmonton, Alberta. Cameras were located along trails frequented by animals, at approximately 1.25 m from the ground, and set to collect a burst of three images when triggered. During the time they were active, the individual cameras collected, on average, 3514 (SD = 3214) images each.

Program description

EventFinder pre-processed camera trap images and, for each sequence of images, determined if the sequence contained a target animal or noise. For sequences with a target animal, EventFinder marked the image it determined most relevant for human processing. EventFinder functions as a collection of smaller programs and modules, each with a focused task. The program overview is shown in Fig. 1, with programs and the information passed to the next program depicted. Segmenting tasks helps to focus each program's inputs and outputs, but the process is automatic, so the user needs only to start the program without concern for which module to start next. In this respect, the user selects the input images to process, optionally specifies parameters, and EventFinder copies the relevant processed images to a directory for human classification and identification. We describe our usage of EventFinder with default settings, but users may set their own parameters and modify XML instruction files to customize EventFinder if they consider that a different sequence of operations provides better image classification.

As labeled in Fig. 1, in our study, the Picture Processor program produced sets of images based on temporal proximity, separating sequences when more than 60 s had elapsed between sequential images. This part of the program required, as input, the image file names (to link to the images) and the time difference used to separate image sets. EventFinder extracts the time the picture was taken from the file metadata. Users select files using a GUI (shown in Appendix 1). The output is a batch file that repeatedly launches Automate Finder, once for each image set. Automate Finder loads a default XML file specifying a series of image operations, which are passed to Find Animal. If Find Animal is run in user mode, rather than the regular automated mode, the user can press buttons to initiate image-processing operations (shown in Appendix 1). User mode is intended for testing different image-processing sequences to include in the automated mode rather than for actual batch image processing, since the user must be present to initiate each operation in this mode. User mode lets a user graphically test sequences of operations by clicking buttons rather than requiring users to enter instructions in an XML file.
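To make the grouping step concrete, the following is a minimal sketch of splitting images into event sets by the gap between capture times. It is an illustration only, not the released EventFinder code; the use of Pillow for EXIF reading, the file pattern, and the function names are our assumptions.

from datetime import datetime, timedelta
from pathlib import Path
from PIL import Image  # assumed dependency for reading EXIF metadata

def capture_time(path: Path) -> datetime:
    """Read the DateTimeOriginal EXIF tag (0x9003) written by the camera."""
    exif = Image.open(path)._getexif() or {}
    stamp = exif.get(0x9003) or exif.get(0x0132)  # fall back to the DateTime tag
    return datetime.strptime(stamp, "%Y:%m:%d %H:%M:%S")

def group_into_sequences(image_dir, max_gap_s=60):
    """Split images into event sets wherever the gap between consecutive
    capture times exceeds max_gap_s (60 s is the default described in the text)."""
    images = sorted(Path(image_dir).glob("*.JPG"), key=capture_time)
    sequences, current, previous = [], [], None
    for img in images:
        t = capture_time(img)
        if previous is not None and (t - previous) > timedelta(seconds=max_gap_s):
            sequences.append(current)
            current = []
        current.append(img)
        previous = t
    if current:
        sequences.append(current)
    return sequences

In EventFinder itself this step emits a batch file that launches Automate Finder once per image set; the sketch only shows the grouping logic.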
Our default image-processing operations for Find Animal were as follows. To reduce noise and improve processing speed, Find Animal down-sampled each image by averaging blocks of 16 pixels. The result was a mapping from 4 × 4 blocks of pixels to one pixel, with less noise. From the images in a sequence, Find Animal generated a background image and identified foreground regions.
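A minimal sketch of this kind of 4 × 4 block-average down-sampling (illustrative only; NumPy, the RGB array layout, and image dimensions divisible by four are assumptions):

import numpy as np

def downsample_4x4(image: np.ndarray) -> np.ndarray:
    """Average each 4 x 4 block of pixels down to one pixel.

    image: H x W x 3 RGB array; H and W are assumed divisible by 4.
    Returns an (H/4) x (W/4) x 3 array with reduced noise.
    """
    h, w, c = image.shape
    blocks = image.reshape(h // 4, 4, w // 4, 4, c).astype(np.float64)
    return blocks.mean(axis=(1, 3))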

Fig. 1 The flow, inputs, and relationships between components of EventFinder. Input parameters and operations are shown in ellipses, while program modules are shown in rectangles. The user specifies inputs to Picture Processor, and the program calls Automate Finder and Find Animal to determine which image humans should examine. Providing the generated CSV file to Picture Mover moves retained images to a separate directory for human processing

With video frames, a background image can be generated with temporal averaging or other techniques for moving backgrounds (Power and Schoonees 2002; Van Droogenbroeck and Barnich 2014); however, camera images of an event consisted of as few as three images, causing ghosting, where foreground objects are included in the background as faint regions. Thus, when using camera traps instead of video, only pixels that are actually background should be included. EventFinder determines which pixels to include by comparing the pixel in question with the average value from all pixels at that location, averaging a location across images in the sequence.

To create a suitable background image, the Find Animal program computed the standard deviation in color for each input pixel location across the input images (an overview of background creation is shown in pseudocode in Fig. 2). The average standard deviation was initially used as the threshold to classify pixels, giving an initial approximation of foreground and background. At each pixel location, two sets of pixels were created: the set of foreground pixels and the set of background pixels. Considering each input image, the pixel at the location was either included or excluded, forming a binary label. Connected foreground pixels with the same binary labels were grouped into regions, similar to the approach mentioned in McIvor (2000), where each region corresponds to a moving object or noise. The largest region should most frequently correspond to the largest moving animal in the image, while other regions correspond to additional animals or noise. EventFinder increased the threshold setting until the largest region decreased in pixel count by 10%.
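A simplified sketch of the initial labeling and adaptive threshold described above (an illustration under assumptions, not the published Find Animal code; NumPy and SciPy are assumed, and per-pixel deviation is measured against the per-location mean color):

import numpy as np
from scipy import ndimage  # assumed dependency for connected-component labeling

def label_foreground(stack: np.ndarray, threshold: float) -> np.ndarray:
    """stack: N x H x W x 3 down-sampled images from one sequence.
    A pixel is labeled foreground in an image when its color deviates from
    the per-location mean by more than the threshold."""
    mean = stack.mean(axis=0)
    deviation = np.linalg.norm(stack - mean, axis=-1)  # N x H x W
    return deviation > threshold

def largest_region_size(foreground: np.ndarray) -> int:
    """Size (in pixels) of the largest connected foreground region in any image."""
    sizes = []
    for mask in foreground:  # one mask per input image
        labels, n = ndimage.label(mask)
        sizes.append(ndimage.sum(mask, labels, range(1, n + 1)).max() if n else 0)
    return int(max(sizes))

def adaptive_threshold(stack: np.ndarray, step: float = 1.05, shrink: float = 0.10):
    """Start from the average per-location standard deviation and raise the
    threshold until the largest connected region shrinks by ~10% in pixel count."""
    threshold = np.linalg.norm(stack.std(axis=0), axis=-1).mean()
    initial = largest_region_size(label_foreground(stack, threshold))
    while largest_region_size(label_foreground(stack, threshold)) > (1 - shrink) * initial:
        threshold *= step
    return threshold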

Fig. 2 Pseudocode highlighting the steps used to create a background image

This value enabled removal of smaller regions corresponding to noise, but left the majority of pixels for regions corresponding to animals, since they exceeded the threshold by a larger amount than the noise.

In some situations, as happens when animals overlap in multiple images, the majority of pixels at a location correspond to animal rather than background. This happens when an animal lingers at a location and results in a background image with a partial animal, as shown in Fig. 3. The input images are shown in Appendix 2. Find Animal corrects these region misidentifications by checking each potential foreground region against known background locations. The logic is as follows: for some pixel locations, all pixels are within the threshold regardless of which image in the set they originate from. These pixels are considered definite background and correspond to areas with no animal at that location in any input image. Other pixel locations have some pixels within the threshold and other pixels exceeding the threshold, representing background and foreground respectively. When an animal lingers in a location, the corresponding pixels containing the animal are incorrectly considered background. Thus, for each potential foreground region, Find Animal considered the two possible background regions: one constructed from pixels with deviations within the threshold, and the fragment constructed from the remaining pixels with deviations greater than the threshold. Find Animal compared both regions' color histograms (histograms of RGB values) to nearby areas of definitive background and retained the region with less difference. Consequently, Find Animal kept the region with fewer pixels as background when its color closely matched nearby known background.

For example, in a set of five images containing a slow-moving ungulate, the ungulate may overlap in three images. This area of overlap constitutes the majority of pixels, coming from three of the five images. Nearby regions contain true background consisting of pixels coming from all five images, where the animal did not pass. The color histogram of the pixels coming from the remaining two images more closely matches the true background regions, and consequently the background region where the animal slowly crossed is made from pixels coming from the two images containing the background, rather than the three images containing the ungulate.

After constructing a proper background, Find Animal performed background subtraction to identify foreground fragments and thresholded the result to a binary image; an example is shown in Appendix 2. In the binary image, each white area of pixels constitutes a mask corresponding to pixels containing animal in the original image. Small foreground fragments, 30 pixels or less, were considered noise and ignored. In daytime images, fragments consisting mostly of green were removed, since they likely corresponded to moving vegetation.

Fig. 3 Two methods of background creation made from an input sequence of three images with temporally overlapping regions of animal. The first method (a) uses standard deviation to identify the background but retains parts of the animal due to overlap in the majority of the input images. In the second method (b), EventFinder considers nearby known background areas to select the minority of pixels at locations representing the actual background, resulting in a truer background image. The remaining animal fragment seen in the right of the image was present in all input images. Input images and masks are shown in Appendix 2
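The comparison behind method (b) of Fig. 3 can be sketched as follows (a hedged illustration only; the bin count, the L1 histogram distance, and the NumPy usage are our assumptions rather than details taken from the paper):

import numpy as np

def rgb_histogram(pixels: np.ndarray, bins: int = 8) -> np.ndarray:
    """Normalized joint RGB histogram of an M x 3 array of pixel values."""
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins), range=[(0, 256)] * 3)
    return hist.ravel() / max(pixels.shape[0], 1)

def choose_background_pixels(within, exceeding, nearby_background, bins=8):
    """For one contested region, keep as background whichever pixel set
    (within-threshold or exceeding-threshold) better matches nearby known
    background, compared by L1 distance between normalized RGB histograms."""
    target = rgb_histogram(nearby_background, bins)
    d_within = np.abs(rgb_histogram(within, bins) - target).sum()
    d_exceed = np.abs(rgb_histogram(exceeding, bins) - target).sum()
    return within if d_within <= d_exceed else exceeding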
Images where the majority of the pixels were deemed to be the result of background movement were also removed, since these images typically resulted from wind. Image-processing operations of dilations and corresponding erosions filled in small holes in fragments and merged nearby fragments, similar to McIvor (2000) (see Gonzalez and Woods 2007 for image-processing operations). Some fragments corresponding to a single animal remained separated, such as when a foreground element in the input visually divided the animal, for example when an animal passed behind a tree. In such cases, EventFinder ranked fragments and logically merged nearby fragments if their color histograms matched. Image sets with no large remaining fragments were removed as noise. In other sets, the image with the largest foreground fragment was marked for human analysis, provided the fragment did not touch an edge. Thus, for each input image set, at most one image was retained for human classification. Additionally, EventFinder computed the average direction of the center of mass of the largest fragment, indicating whether the animal is moving left or right, and the number of large fragments per image, giving an estimate of the number of animals.
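A compressed sketch of this fragment clean-up and image selection (illustrative only; the OpenCV calls, the 3 x 3 kernel, and exposing the 30-pixel cutoff as a parameter are assumptions, not the authors' implementation):

import cv2
import numpy as np

def clean_mask(mask: np.ndarray, min_pixels: int = 30) -> np.ndarray:
    """Dilate then erode (morphological closing) to fill small holes and join
    nearby fragments, then drop fragments of min_pixels or fewer."""
    kernel = np.ones((3, 3), np.uint8)
    closed = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(closed)
    keep = np.zeros_like(closed)
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] > min_pixels:
            keep[labels == i] = 255
    return keep

def pick_image_for_review(masks):
    """Return the index of the image whose largest fragment is biggest,
    skipping fragments that touch the image edge; None means 'all noise'."""
    best, best_size = None, 0
    for idx, mask in enumerate(masks):
        n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
        for i in range(1, n):
            x, y, w, h = stats[i, :4]
            touches_edge = x == 0 or y == 0 or x + w == mask.shape[1] or y + h == mask.shape[0]
            if not touches_edge and stats[i, cv2.CC_STAT_AREA] > best_size:
                best, best_size = idx, stats[i, cv2.CC_STAT_AREA]
    return best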
Program testing

We tested program efficacy against human classification by comparing program-retained images to a dataset in which humans inspected each image, where a positive event is an animal in the image. While the characterization of images by humans can result in errors (Harris et al. 2010), we consider the human characterization as "truth." We considered the program to have identified the same event if it retained an image from the same set as the human. Following pre-processing, the events identified by the program were compared to truth using a confusion matrix of binary outcomes (Landis and Koch 1977). We consider three chief metrics for our program. Firstly, the removal rate is the proportion of images the program pre-processed but that did not require human analysis, representing reduced time and cost to researchers. Secondly, we defined the accuracy of the program to be the sum of true positive events and true negative events divided by the total number of images. Thirdly, the false positive rate indicates the images that the program assesses as needing human attention but that do not contain a true event. We investigated the sensitivity of our efficacy measures to the elapsed time between images used to determine unique event sets based on a sequence of images containing the same individual.

Results

We found that EventFinder removed on average 90.8% (SD = 4.5%) of images with an accuracy of 96.1% (SD = 2.8%) on true events (Table 1). The rate of images retained by the program as events of interest, but not classified as such by the human operator (false positive rate), was 3.4% (SD = 3.0%), showing that EventFinder reduces the human time spent on false events. Indeed, approximately two-thirds of the retained images were unique and independent events.

The sensitivity analysis on the trade-off between accuracy and the proportion of images removed suggests that EventFinder is robust with respect to the user-specified length of time that passes between subsequent images before an image is considered part of a new set (Fig. 4). The simultaneous high removal rate and retention of true event images ensures EventFinder correctly identifies the majority of events while greatly reducing the post-processing time required by humans.

Table 1 Classification performance of EventFinder with respect to the number of events characterized by a human for nine camera locations

Site  Total images  Human events  Computer events  True positive  False positive  False negative  True negative  Removal rate  Accuracy  False positive rate

1                   1213   79   96    70   17   9    1117   0.921  0.979  0.015
2                   10118  658  982   560  324  98   9136   0.903  0.958  0.034
3                   3595   17   389   17   372  0    3206   0.892  0.897  0.104
4                   2867   325  419   295  94   30   2448   0.854  0.957  0.037
5                   7170   979  1201  879  222  100  5969   0.832  0.955  0.036
6                   1491   65   93    48   28   17   1398   0.938  0.970  0.020
7                   1174   25   28    18   3    7    1146   0.976  0.991  0.003
8                   298    13   16    13   3    0    282    0.946  0.990  0.011
9                   3699   167  320   150  153  17   3379   0.913  0.954  0.043
Average                                                      0.908  0.961  0.034
Standard deviation                                           0.045  0.028  0.030

Results are for when the time delta between temporally adjacent sequences of images is set to 60 s. Accuracy is calculated as (true positive + true negative)/total images
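To make the column relationships explicit: the accuracy formula is stated in the table note, while reading the removal rate as the fraction of images not retained for review and the false positive rate as the conventional FP/(FP + TN) are our assumptions, although both reproduce the site 1 row.

# Site 1 values from Table 1
total, retained = 1213, 96        # total images; computer events (images kept for review)
tp, fp, fn, tn = 70, 17, 9, 1117

removal_rate = (total - retained) / total   # fraction of images needing no human analysis
accuracy = (tp + tn) / total                # formula from the Table 1 note
false_positive_rate = fp / (fp + tn)        # assumed conventional definition

print(round(removal_rate, 3), round(accuracy, 3), round(false_positive_rate, 3))
# -> 0.921 0.979 0.015, matching the first row of Table 1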


Fig. 4 The percentage of images removed (gray) and accuracy (white) of EventFinder across a range of values used as the time delta beyond which temporally adjacent images are considered a new sequence containing a new event

Discussion

Numerous recent ecological studies employing camera traps as non-invasive methods for collecting data emphasize their utility in answering ecological questions (reviewed in Burton et al. 2015), yet camera trap usage has issues. Harris et al. (2010) highlighted four problems with the proliferation of images in camera trap studies, namely (1) slow image classification, (2) tedious data entry leading to error, (3) struggling to keep pace with new data acquisition, and (4) inconsistent filing and naming conventions. With EventFinder and image pre-processing, we reduce the concerns surrounding the first three problems. The approximately 90% reduction in human workload enables researchers to classify only the images needed, reduces the monotony of data entry by providing humans with manageable collections of "target" (event) rich images, and keeps pace with incoming data. This is particularly true in situations where cameras often capture non-target "noisy" images due to background movement, or when images form large sequences of the same individual "loitering." We suspect these situations are common in ecological research.

Our work complements existing research in camera trap processing. Other computer programs to pre-process camera trap images identify individuals or species, but may require human supervision for training or segmentation (Yu et al. 2013; Agnieszka et al. 2017; Yousif et al. 2017). Some camera trap programs were developed for video, increasing the potential for a good background image for the identification of foreground objects of interest; however, the benefits of using video come at a cost in the field, including monitoring and frequent battery or memory card replacement (Swinnen et al. 2014; Weinstein 2015). Multiple papers consider background subtraction using video as input, but our system differs in that it is designed to consider input sequences with as few as three images (Van Droogenbroeck and Barnich 2014; Hofmann et al. 2012; McIvor 2000; Power and Schoonees 2002). Consequently, the approaches designed for video input are not directly suitable for camera trap images, for example because of the need to determine a changing background over time (Van Droogenbroeck and Barnich 2014; Hofmann et al. 2012). Power and Schoonees (2002) mention that with fewer than two background modes it is easier to use simple subtraction of an averaged background image. Since this is a near ubiquitous occurrence in our input, our approach builds on simple subtraction but addresses the issues of few images and of the majority of pixels in a region potentially coming from a foreground animal.

Other existing methods for classification use a trained neural network to identify individuals (Yu et al. 2013; Yousif et al. 2017). The system from Yousif et al. (2017) constructs a background and trained a deep convolutional neural network on 30,000 image patches of human, animal, and background to achieve a classification accuracy as high as 95.6%. Agnieszka et al. (2016), similar to our work, process camera trap images into sets based on time, create a background image, and segment areas containing motion into regions; however, their paper does not compare their system to results from human processing, and as such it is difficult to assess the accuracy of their system. An alternative system from Agnieszka et al. (2017) uses robust principal component analysis, a cascade object detector, and a support vector machine to detect snow leopards based on the spots on their coats with an accuracy of 93.74%, but requires training. Compared with previous work, EventFinder's advantages include being more general, since it is not species specific; having its performance tested against a human's classification, with an accuracy of ∼96%; and not requiring training for pre-processing camera trap images. Consequently, it is well suited as a general tool for use in camera trap studies where custom tools have not been created or trained. EventFinder is designed to work with camera trap images as opposed to the video for which other systems are designed (Desell et al. 2013; Goehner et al. 2015; Weinstein 2015). For more discussion comparing our approach to some of these video processing systems, see our previous work (Janzen et al. 2017).

The trade-off between false positives and true positive events is a concern whenever dichotomous classification occurs (Landis and Koch 1977). We have shown EventFinder to be robust to this trade-off. Prior work shows roughly equal performance in day or night conditions (Janzen et al. 2017). Depending on the study in question, researchers are encouraged to determine if this trade-off is suitable. We envision situations, involving rare species, for instance, where researchers may be more willing to accommodate false positives to avoid missing rarely occurring true events.
While EventFinder does not completely remove the human operator, it reduces the time required to classify image data for further analysis. If we assume a trained operator classifies 500 images per hour (approximately the rate of our human operator), then the ∼32,000 images used in this study required 63 h of work, while after using EventFinder only ∼7 h are required for ∼96% accuracy, and the "wasted" time spent on false positives is ∼2.5 h.
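These time figures follow directly from the Table 1 columns; the 500 images per hour rate is the paper's stated assumption, while summing the table columns to reproduce the totals is ours.

images_per_hour = 500  # assumed operator rate from the text
total_images = 1213 + 10118 + 3595 + 2867 + 7170 + 1491 + 1174 + 298 + 3699  # 31,625
retained_images = 96 + 982 + 389 + 419 + 1201 + 93 + 28 + 16 + 320           # 3,544
false_positives = 17 + 324 + 372 + 94 + 222 + 28 + 3 + 3 + 153               # 1,216

print(total_images / images_per_hour)     # ~63 h to review every image by hand
print(retained_images / images_per_hour)  # ~7 h after EventFinder pre-processing
print(false_positives / images_per_hour)  # ~2.4 h spent on false positives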
While a camera setup differs between studies, best practices can help researchers maximize data collection and minimize post-collection processing (Meek et al. 2014). Researchers should be aware of the background, which may form part of the camera's detection zone, causing "noisy" images containing moving vegetation rather than animals. While EventFinder attempts to identify and remove noise from such images, it cannot completely compensate for poor camera placement. The poor performance (high false positive rate) for one of our sites (site 3; Table 1) was primarily due to sub-optimal camera placement, where the camera was located in a shallow ditch with abundant long grass.

Camera settings allowing for bursts of images taken with each trigger provide a useful opportunity to capture multiple images over which a background image is created. They can, however, present the researcher with long sequences of temporally adjacent images that do not represent unique events. This imposes a time cost on humans who inspect each image to determine if an event occurs, when only a single image from within the sequence need be classified. EventFinder uses the burst of images to help delineate background noise from foreground animals, and by retaining a single image the program reduces human classification of redundant images.

EventFinder was developed as a stand-alone program to pre-process images, but could be incorporated into a workflow involving existing database, naming, and file management programs (Fegraus et al. 2011; Krishnappa and Turner 2014; Bubnicki et al. 2016; Niedballa et al. 2016). Currently, the classification of output depends on additional software to tag the image with pertinent metadata and summarize data into forms for statistical analysis; this software is camera-specific (e.g., Reconyx MapView Professional, Reconyx, Holmen, WI, USA) or based on existing open source options (e.g., TRAPPER, Bubnicki et al. 2016, or camtrapR, Niedballa et al. 2016). An additional future development for EventFinder is the automated identification of animal species found in the image and the ability to differentiate the number of individuals contained within an image. Currently, EventFinder correctly identifies whether an event contains a single individual versus multiple individuals 74.4% (SD = 11.4%) of the time (M. Janzen, unpublished data).

In conclusion, we have shown that EventFinder, an automated program for pre-processing camera trap images, aids in reducing the analytical bottleneck identified by Harris et al. (2010) and Spampinato et al. (2015) that results from the volumes of images recorded by camera trap studies. The ∼90% reduction in classification time by humans following pre-processing, with high accuracy (∼96%) and low loss of true events (∼4%), allows researchers to unlock the potential of camera traps as an ecological tool and to more efficiently answer crucial ecological questions.

Acknowledgements We thank the Alberta Conservation Association, Alberta Parks, and The King's University for funding, and The King's Centre for Visualization in Science for computational support. Special thanks to K. Visser, who worked on early versions of this program for an undergraduate project, and to D. Vujnovic for useful discussion on remote camera data. Alberta Parks staff provided logistic support in the Cooking Lake—Blackfoot Provincial Recreation Area, and we thank anonymous reviewers for feedback used to improve this paper.

Appendix 1

Software availability:
http://cs.kingsu.ca/~mjanzen/CameraTrapSoftware.html

Example instructions for running the program on a Windows computer are also available from this website. Screen captures for starting the program and user mode control are shown in Figs. 5 and 6, respectively.

Fig. 5 Input for an operator starting the program. The operator selects the images to process and can start the program with default settings or modify the parameters

Fig. 6 Control panel for experimenting with user mode if the operator wants to deviate from the default options

Appendix 2

Input image examples and resulting masks are shown in Fig. 7.

Fig. 7 Three input images with an overlapping animal (a, b, c). The resulting masks from image (b) without color histogram comparison (d) and using color histogram comparison (e). The generated backgrounds are shown in Fig. 3

References

Agnieszka, M., Beery, S., Flores, E., Klemesrud, L., Bayrakcismith, R. (2016). Finding areas of motion in camera trap images. In 2016 IEEE international conference on image processing (ICIP) (pp. 1334–1338).
Agnieszka, M., Beard, J.S., Bales-Heisterkamp, C., Bayrakcismith, R. (2017). Sorting camera trap images. In 2017 IEEE global conference on signal and information processing (GlobalSIP) (pp. 249–253): IEEE.
Bubnicki, J.W., Churski, M., Kuijper, D.P. (2016). TRAPPER: an open source web-based application to manage camera trapping projects. Methods in Ecology and Evolution, 7(10), 1209–1216.
Burton, A.C., Neilson, E., Moreira, D., Ladle, A., Steenweg, R., Fisher, J.T., Bayne, E., Boutin, S. (2015). REVIEW: Wildlife camera trapping: a review and recommendations for linking surveys to ecological processes. Journal of Applied Ecology, 52, 675–685.
Desell, T., Bergman, R., Goehner, K., Marsh, R., VanderClute, R., Ellis-Felege, S. (2013). Wildlife@home: combining crowd sourcing and volunteer computing to analyze avian nesting video. In Proceedings – IEEE 9th international conference on e-Science, e-Science 2013 (pp. 107–115).
Fegraus, E.H., Lin, K., Ahumada, J.A., Baru, C., Chandra, S., Youn, C. (2011). Data acquisition and management software for camera trap data: a case study from the TEAM network. Ecological Informatics, 6, 345–353.
Goehner, K., Desell, T., Eckroad, R., Mohsenian, L., Burr, P., Caswell, N., Andes, A., Ellis-Felege, S. (2015). A comparison of background subtraction algorithms for detecting avian nesting events in uncontrolled outdoor video. In 2015 IEEE 11th international conference on e-Science (e-Science) (pp. 187–195): IEEE.
Gonzalez, R.C., & Woods, R.E. (2007). Digital image processing (3rd edition). New Jersey: Pearson Prentice Hall, Pearson Education, Inc.
Harris, G., Thompson, R., Childs, J.L., Sanderson, J.G. (2010). Automatic storage and analysis of camera trap data. The Bulletin of the Ecological Society of America, 91, 352–360.
Hofmann, M., Tiefenbacher, P., Rigoll, G. (2012). Background segmentation with feedback: the pixel-based adaptive segmenter. In 2012 IEEE computer society conference on computer vision and pattern recognition workshops (pp. 38–43).
Janzen, M., Visser, K., Visscher, D.R., MacLeod, I., Vujnovic, D., Vujnovic, K. (2017). Semi-automated camera trap image processing for the detection of ungulate fence crossing events. Environmental Monitoring and Assessment, 189(10), 527.
Krishnappa, Y.S., & Turner, W.C. (2014). Software for minimalistic data management in large camera trap studies. Ecological Informatics, 24, 11–16.
Landis, J.R., & Koch, G.G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
McIvor, A.M. (2000). Background subtraction techniques. In Proceedings of image and vision computing conference New Zealand.
Meek, P., Ballard, G., Claridge, A., Kays, R., Moseby, K., O'Brien, T., O'Connell, A., Sanderson, J., Swann, D., Tobler, M., et al. (2014). Recommended guiding principles for reporting on camera trapping research. Biodiversity and Conservation, 23, 2321–2343.
Niedballa, J., Sollmann, R., Courtiol, A., Wilting, A. (2016). camtrapR: an R package for efficient camera trap data management. Methods in Ecology and Evolution, 7, 1457–1462.
Piccardi, M. (2004). Background subtraction techniques: a review. In 2004 IEEE international conference on systems, man and cybernetics (Vol. 4, pp. 3099–3104): IEEE.
Power, P.W., & Schoonees, J.A. (2002). Understanding background mixture models for foreground segmentation. In Proceedings of image and vision computing conference New Zealand (pp. 267–271).
Spampinato, C., Farinella, G.M., Boom, B., Mezaris, V., Betke, M., Fisher, R.B. (2015). Special issue on animal and insect behaviour understanding in image sequences. EURASIP Journal on Image and Video Processing, 2015, 1.
Steenweg, R., Hebblewhite, M., Kays, R., Ahumada, J., Fisher, J.T., Burton, C., Townsend, S.E., Carbone, C., Rowcliffe, M.J., Whittington, J., Brodie, J., Royle, J.A., Switalski, A., Clevenger, A.P., Heim, N., Rich, L.N. (2017). Scaling-up camera traps: monitoring the planet's biodiversity with networks of remote sensors. Frontiers in Ecology and the Environment, 15(1), 26–34.
Swinnen, K.R., Reijniers, J., Breno, M., Leirs, H. (2014). A novel method to reduce time investment when processing videos from camera trap studies. PLoS One, 9, e98881.
Van Droogenbroeck, M., & Barnich, O. (2014). ViBe: a disruptive method for background subtraction. In T. Bouwmans, F. Porikli, B. Höferlin, A. Vacavant (Eds.), Background modeling and foreground detection for video surveillance (pp. 7.1–7.23): Chapman and Hall/CRC.
Weinstein, B.G. (2015). MotionMeerkat: integrating motion video detection and ecological monitoring. Methods in Ecology and Evolution, 6, 357–362.
Yousif, H., Yuan, J., Kays, R., He, Z. (2017). Fast human-animal detection from highly cluttered camera-trap images using joint background modeling and deep learning classification. In 2017 IEEE international symposium on circuits and systems (ISCAS) (pp. 1–4).
Yu, X., Wang, J., Kays, R., Jansen, P.A., Wang, T., Huang, T. (2013). Automated identification of animal species in camera trap images. EURASIP Journal on Image and Video Processing, 2013, 52–61.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.