Biological Journal of the Linnean Society, 2015, 115, 731–749. With 3 figures.

Emerging technologies for biological recording

TOM AUGUST1*, MARTIN HARVEY2, PAULA LIGHTFOOT3, DAVID KILBEY4, TIMOS PAPADOPOULOS5 and PAUL JEPSON6

1Centre for Ecology and Hydrology, Natural Environment Research Council, Wallingford OX10 8BB, UK
2Faculty of Science, The Open University, Milton Keynes, UK
3National Forum for Biological Recording, UK
4IT Services, University of Bristol, Bristol, UK
5Biodiversity Institute, University of Oxford, Oxford, UK
6School of Geography and the Environment, University of Oxford, Oxford, UK

Received 5 December 2014; revised 11 February 2015; accepted for publication 13 February 2015

Technology has played an important role in biological recording for hundreds of years, from the invention of the microscope to the microprocessor. We review current and emerging technologies that are changing the way we study and record the natural world. From websites to smartphones, data capture is becoming easier, faster and more accessible. Increases in data volume and wider participation raise concerns over data quality, which are being addressed with accurate sensors, automated validation tools, and verification platforms that use expert taxonomists and collective intelligence to ensure the highest level of quality possible. Data curation and interoperability have been transformed in the information age. The need to collate data at continental and global scales and across institutions continues to drive the formation of standardized data formats and taxonomies. Once collated, these data can be analysed using high-performance computing, and used to provide valuable feedback in the form of interactive visualizations, computer-generated text or even game-like rewards. We also address issues arising from these technological developments. For example: how will the role of the expert naturalist change? Is biological recording undergoing a revolutionary or evolutionary process? How is technology leading to the empowerment of the public? © 2015 The Linnean Society of London, Biological Journal of the Linnean Society, 2015, 115, 731–749.

ADDITIONAL KEYWORDS: biological recording – citizen science – data capture – technology.

*Corresponding author. E-mail: [email protected]

INTRODUCTION

Widespread access to computational and communication technology is one of the defining characteristics of our generation, prompting talk of humanity embarking on an information revolution (e.g. Sliwa & Benoist, 2011; Saylor, 2012). Social media, apps, low-cost sensors, search engines and predictive analytics associated with ‘big data’ have profound implications for many areas of society, including biological recording. We can expect a step-change in biological recording as developments in sensor technologies (LiDAR, UAVs, acoustic arrays, eDNA, etc.) and biodiversity informatics infrastructures generate ‘big ecological data’ (Snaddon et al., 2013). Big data is changing the way scientists view, study and analyse the world (e.g. National Ecological Observatory Network; http://www.neoninc.org) and it will probably change the nature of biological recording in terms of techniques, practice and the relationship between expert and non-expert.

The processes driving the uptake of technologies in biological recording are two-fold. First, the passion and curiosity for the natural world that has motivated naturalists for centuries is inspiring present-day naturalists to develop and use technologies to enhance their recording activities. Secondly, for many of the pressing questions in conservation, such as identifying species at risk of extinction, structured survey data are available for only a small group of charismatic taxa (e.g. UK Butterfly Monitoring Scheme; http://www.ukbms.org). The next best alternative is unstructured data, which are based on ad-hoc observations. Methods have recently been developed that compensate for the sampling biases associated with unstructured data, allowing trends to be identified and species responses to environmental change to be predicted (van Strien, van Swaay & Termaat, 2013; Isaac et al., 2014). When collated nationally in a standard format, biological records data can be used for assessments of extinction risk (Maclean & Wilson, 2011), impacts of climate change (Hickling et al., 2006), informing emergency responses to environmental threats (Department for Environment Food and Rural Affairs, 2012) and reporting on the Convention on Biological Diversity (CBD) targets (Department for Environment Food and Rural Affairs, 2013). This has led to an increased interest from data users in the ways in which technology might be able to enhance biological recording. For example, the ecosystem approach of the revised CBD (http://www.cbd.int/ecosystem) requires data on multi-taxa assemblages of species as indicators of habitat condition or proxies for ecosystem services, such as pollination or soil biodiversity. Examples of projects that take on this challenge in the UK include PondNet (http://www.freshwaterhabitats.org.uk/projects/pondnet), the National Plant Monitoring Scheme (http://www.brc.ac.uk/npms/content/welcome) and SPLASH (Survey of Plants and Lichens associated with Ash; http://www.brc.ac.uk/splash/home). Emerging technology plays a key role in such projects, both in engaging participants and in improving standards of data quality and data flow.

The scope of this paper is current, cutting-edge and future applications of technologies in volunteer biological recording. Technological innovations have influenced the practice of biological recording since the outset; for example, field guides and distribution maps were an outcome of the printing press and new cartographic techniques (Demeritt, 2001), respectively. The technological innovations we examine here are computational – hardware and software – and we include ‘emerging’ to flag our focus on technologies that are emerging from the interplay of cloud and mobile computing, social media and networks, sensors, informatics and big data within institutions of biological recording. While we will include international examples, our focus will be on examples from the UK that exemplify these developments.

The explosion of technologies of the late 20th century and early 21st century has provided both opportunities and challenges for biological recording. A brief history of these developments is presented in Figure 1. We will explore these as we review the role of technology at each stage of the recording process. Specifically we will consider the quantity, quality and diversity of data afforded to us by new data capture technologies; the importance of record verification and dataset interoperability; and new ways in which technology is being used to share, analyse and engage people with data.

DATA CAPTURE

The mechanisms for recording data have changed dramatically over the past 50 years, from recording based solely on paper to schemes that are now based solely on electronic systems. This change has been possible due to developments in hardware, such as mobile computing and the internet, and software such as databasing and HTML.

WEBSITES

Websites allow data to be entered and shared quickly and reliably and have become an integral part of modern wildlife recording (Table 1). The ubiquity of computers in contemporary life has led paper forms to be replaced by digital forms presented via a PC or mobile web-app. These technologies afford standardized forms with restricted data entry, drop-down lists and mapping interfaces that help ensure data quality. In addition they allow the capture of any data that can be stored digitally, notably photographs. Because the need for distributing and collecting physical forms has been removed, websites have radically extended the reach, accessibility and interactivity of biological recording schemes, which can now be promoted via a simple link embedded in an email, news report or tweet. Websites may be considered by many as old technology as the first website was created over 20 years ago, although website design is constantly evolving. The design of websites, and any other human–computer interaction, is key for the success of biological recording systems. Good design engages users, maintains their motivation and makes the process of recording and data exploration efficient and enjoyable.

Xeno-canto (http://www.xeno-canto.org) is a website that successfully uses the engagement of the general public from across the globe to record and archive bird vocalizations with minimal needs in funding and centrally invested human effort. Launched less than 10 years ago and maintained by only four administrators, with the assistance of the xeno-canto community, this website now contains recordings from more than 9000 avian species collected by more than 2000 registered users (covering


Figure 1. Milestones in biological recording (above) and technologies (below) in the context of the five waves of computing (bottom).

Table 1. A summary of exemplar platforms and websites

Name | Description | URL
Atlas of Living Australia | An online platform for exploring and analysing data on the flora and fauna of Australia | http://www.ala.org.au
BeeWatch | A website that crowd sources records of bees, integrating natural language generation | http://www.bumblebeeconservation.org/get-involved/surveys/beewatch
BirdTrack | A website and smartphone app for recording bird observations in the UK | http://blx1.bto.org/birdtrack/main/data-home.jsp
BRC | The Biological Records Centre is a focus of recording for freshwater and terrestrial species in the UK | http://www.brc.ac.uk
eBird | An online system for recording observations of birds and for exploring the observations of others | http://www.ebird.org
GBIF | The Global Biodiversity Information Facility is a repository of free-to-access global biodiversity data | http://www.gbif.org
Indicia | A free and open source toolkit for building biological recording websites | http://www.indicia.org.uk
iRecord | A website, built using Indicia, for collating and verifying biological records | http://www.brc.ac.uk/irecord
iSpot | A website designed to allow users to share images and learn about identification through crowd sourcing | http://www.ispotnature.org
NBN | The National Biodiversity Network is a partnership of organizations involved in biological recording in the UK. It is the UK node of GBIF | http://www.nbn.org.uk
The New Forest Cicada Hunt | A smartphone app for recording and identifying calls by the New Forest cicada | http://www.newforestcicada.info/
Zooniverse | A collection of online projects that crowd source the interpretation of images and sounds | http://www.zooniverse.org

roughly 80% of all vocalizing avian species). Species labelling as well as audio quality characterization of each individual recording is done based on a peer assessment scheme. Collecting digital data on a global scale, as in this example, presents exciting possibilities, and would not have been possible without modern technologies.

Most websites invite users to register and provide basic information about themselves. This generates secondary data relating to the recorders’ use of the website. The Zooniverse project (http://www.zooniverse.org), which deals with data classification rather than capture, is using such data to generate algorithms that profile users and automatically present tasks suited to their skill and commitment levels, in much the same way Google targets adverts. Such approaches could be applied to large-scale biological recording schemes in the future.

Websites, such as Zooniverse, can also themselves become a place for data generation. An ever increasing number of projects take advantage of ‘cognitive surplus’ (Shirky, 2010), the idea that for much of our spare time we do not make use of our cognitive capacity. These projects use this surplus to undertake useful tasks. One task that humans are particularly good at, which computers are not, is pattern recognition. A number of the most successful citizen science projects have developed websites to harness this skill. Zooniverse projects (e.g. Snapshot Serengeti and Seafloor explorer) have attracted over 1.2 million users, generating data whose analysis has led to numerous scientific publications (e.g. Lintott et al., 2008). Other websites seek to encourage volunteers to digitize physical records; for example, Herbaria@Home (http://www.herbariaunited.org/atHome) has led to the digitization of over 142 000 herbarium specimens by volunteers, while the Atlas of Living Australia’s DigiVol project (http://volunteer.ala.org.au/) has expanded this idea to other taxonomic groups and successfully engaged over 700 volunteers.

Typically seen as a portal through which data are submitted or viewed, biological recording websites can also provide tools to aid recorders. Multi-access keys are a notable example with great future potential. In contrast to traditional dichotomous keys, users of multi-access keys can consider the morphological characters of a specimen in any order. One of the benefits of this approach is that users can avoid characters that are absent on their specimen or which they are not confident identifying, but multi-access keys are very hard to produce in paper form. Computer-based multi-access keys take advantage of the flexibility of a graphical user interface and can incorporate elements absent from paper versions. For example, probabilistic species matches can be provided at any point in the key, making use of information such as the location or date of the identification (Burkmar, 2014). Despite the benefits of multi-access keys, the uptake of this approach has been slow (Morrison, 2011), in part because the technology required (i.e. well-designed user interfaces and portable hardware that can be taken into the field) has been absent, but also because of loyalty to existing paper-based guides and single-access keys (Burkmar, 2014). These technological challenges have now been overcome and the field is advancing rapidly (Nimis & Lebbe, 2010), with standards in place – the Structured Descriptive Data (SDD; http://wiki.tdwg.org/SDD) – to ensure interoperability between systems. The potential to take tablet computers into the field, loaded with tens or hundreds of guides, user-friendly multi-access keys, and with access to online libraries of photos, sounds and videos, is exciting and will result in increased quality of data gathered and widened participation in biological recording.

SMARTPHONES

In the fifth wave of computing (Fig. 1) users became liberated from their work stations, being able to carry around computers in their pockets (Saylor, 2012). The first mobile computers were the PalmPilots (PDAs) launched by Palm Computing, which were applied to biological recording in a limited way, notably Cybertracker (http://www.cybertracker.org/background/our-story). However, the introduction of smartphones represented a step change in mobile computing. Since their introduction 20 years ago smartphones, and more recently tablets, have undergone a meteoric rise in popularity and have become almost ubiquitous in the UK: an estimated 72% of UK consumers aged 16–64 owned a smartphone device in 2013, a rise of 14% since the previous year (Deloitte, 2013). The latest generation of smartphones integrates enhanced sensors and computational power with the capacities of cloud computing, big data, social networking, crowdsourcing and the human user. ‘Apps’, small task-orientated programs, create this integration and have proliferated exponentially since the launch of Apple’s iTunes App store (2008) and Google’s Play Store (2009).

The wide adoption of smartphones in the developed world has presented some unique opportunities for biological recording by facilitating data entry in the field and providing built-in sensors. The sensors used most in biological recording are GPS chipsets for deriving accurate location, the camera and the microphone. Some newer phones also feature sensors to measure temperature, humidity, air pressure and movement (gyroscopes). These provide opportunities to add valuable metadata to recordings, although it is important to capture uncertainty from these sensors as they may be imprecise or inaccurate.

Scientists are harnessing the ubiquity and functionality of smartphones by designing apps to facilitate biological recording by experienced biological recorders but also by members of the public with only a limited or casual interest in recording. Examples include PlantTracker (http://planttracker.naturelocator.org/) for crowd sourcing location data on invasive plant species. Users photograph and geolocate the specimen and then submit the record to a database to be verified by experts. The New Forest Cicada Hunt app (http://www.newforestcicada.info/) uses smartphone microphones to detect the near ultrasonic call of the UK’s only native cicada, and submits an audio recording and sonogram should something be detected, placing itself at the cutting edge of biological recording apps.

The way in which apps are produced is changing too. With the continuous evolution of web standards there has been an increase in hybrid and pure web apps, which facilitate a ‘write once, deploy anywhere’ approach. This is in contrast to natively coded apps, which are specific to a particular phone platform. This is making apps both more economical to produce and easier to maintain, although currently with some compromise in functionality and overall user experience. Using this approach a number of apps have been developed for biological recording which work across multiple platforms such as the Nature Locator family of apps (http://www.naturelocator.org).

Pure web apps have become more feasible options owing to the increased capabilities of the HTML5 standard. HTML5 is the latest iteration of the language used to create websites and offers new features such as support for touch-based interaction, as is common on touch screen devices. These apps are run from the host’s server and have both advantages and disadvantages compared with native and hybrid apps. The advantages include the ability to update the app website and for these updates to be instantly visible to all users. Additionally, because pure web apps run from a server, they can use the increased computing power this affords to undertake complex tasks such as sophisticated sound and image recognition. However, these types of app have no App Store presence nor do they yet have access to some smartphone features such as touch gestures or access to certain hardware. These apps also suffer from a limitation that affects many smartphone apps, the need for an internet connection.

Many apps navigate the issue of unreliable internet connectivity by storing content on the device rather than loading it from a server, although the storage space on smartphones is limited. Additionally many apps allow users attempting to submit records without an internet connection to store them on their phones for submission later. While these two features address the main issues of reduced connectivity in rural areas, improved connectivity would allow for a greater diversity of features in these apps.

REPURPOSED TECHNOLOGY

The data capture technologies we have considered so far are designed with a top-down approach. They are built by organizations or individuals who desire data, with the intent that volunteers will use the technology to collect data for them. Communities are now emerging across the globe that seek to develop technologies that address local concerns, a bottom-up development process.

Through the use of repurposed technologies – technology that has been adapted for a purpose other than that for which it was originally designed – it is possible for citizen science projects that are technology-dependent to become independent of institutions who would previously have provided the technology needed. This provides the opportunity for technology-dependent citizen science independent of industry or academia, although co-development of projects is often advantageous to both parties. This movement is aided by the relatively low cost of electronics and computers, and online communities that are dedicated to supporting and sharing information on repurposing technologies (e.g. Instructables; http://www.instructables.com and The Public Laboratory for Open Technology and Science; http://www.publiclab.org).

Common technologies that are repurposed include digital cameras, UAVs (unmanned aerial vehicles) and computers. Projects in this area have included documenting the Deepwater Horizon oil spill using modified digital cameras attached to kites and balloons (The Public Laboratory for Open Technology and Science; http://www.publiclab.org/wiki/gulf-coast), monitoring pollution using cheap open source sensors and electronics platforms (FLOAT; http://www.f-l-o-a-t.com) and monitoring water quality (The Public Laboratory for Open Technology and Science; http://www.publiclab.org/wiki/open-water).

There are few repurposed technologies that have been constructed for biological recording, although a number of opportunities exist. There are projects which have developed microscopes from webcams which could help identification of small organisms, cheap camera traps can be constructed from digital cameras and infrared sensors, and advances in UAV technology provide opportunities for low-cost, high-resolution imagery for mapping species occurrence (Martin et al., 2012).
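As an illustration of two data capture points made under SMARTPHONES above (storing records on the device when there is no connection, and keeping the sensor's uncertainty alongside the location), the following Python sketch may be helpful. It is a minimal sketch under stated assumptions and does not describe how any of the apps named in this paper are built: the `OfflineQueue` class, the `make_record` helper, the record field names (including `accuracy_m`) and the `send` callback are all hypothetical, and the coordinates are invented for illustration.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Callable, Dict, List


def make_record(taxon: str, lat: float, lon: float, accuracy_m: float) -> Dict:
    """Build a record that keeps the GPS accuracy next to the coordinates.

    All field names are hypothetical, chosen only for this sketch.
    """
    return {
        "taxon": taxon,
        "lat": lat,
        "lon": lon,
        "accuracy_m": accuracy_m,  # reported uncertainty of the fix, in metres
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


class OfflineQueue:
    """Hypothetical on-device store for records awaiting submission."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _load(self) -> List[Dict]:
        if not os.path.exists(self.path):
            return []
        with open(self.path, "r", encoding="utf-8") as fh:
            return json.load(fh)

    def _save(self, records: List[Dict]) -> None:
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(records, fh)

    def add(self, record: Dict) -> None:
        """Append a record to the local file, whether or not we are online."""
        records = self._load()
        records.append(record)
        self._save(records)

    def pending(self) -> int:
        return len(self._load())

    def flush(self, send: Callable[[Dict], bool]) -> int:
        """Try to submit every stored record; keep the ones that fail."""
        remaining, sent = [], 0
        for record in self._load():
            try:
                ok = send(record)
            except OSError:  # e.g. the network dropped during the upload
                ok = False
            if ok:
                sent += 1
            else:
                remaining.append(record)
        self._save(remaining)
        return sent


def demo():
    """Queue two made-up records, then 'upload' with a stand-in sender."""
    with tempfile.TemporaryDirectory() as tmp:
        queue = OfflineQueue(os.path.join(tmp, "pending.json"))
        queue.add(make_record("Cicadetta montana", 50.87, -1.57, accuracy_m=12.0))
        queue.add(make_record("Bombus terrestris", 51.60, -1.11, accuracy_m=5.0))
        before = queue.pending()
        # Stand-in for a real upload: pretend only the second record is accepted.
        sent = queue.flush(lambda rec: rec["taxon"] == "Bombus terrestris")
        return before, sent, queue.pending()
```

Calling `demo()` returns the number of records queued, the number accepted by the stand-in sender and the number still waiting on the device. A real app would replace the `send` callback with its own upload routine and would probably use the platform's local database rather than a JSON file.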


COMMUNITY CREATION AND SUPPORT www.inaturalist.org) allow users to submit photos of The social aspect of biological recording is a key sightings, comment on each others’ photos as well as motivation for many volunteers and has resulted in give their own taxonomic identifications to other peo- vibrant recording communities in many parts of the ple’s observations. world. However, communities of recorders can often be sparsely distributed and often form around geo- DATA MANAGEMENT AND graphical regions such as towns/cities or counties/ QUALITY ASSURANCE states. This can result in high spatial variability in recording activities. The biological recording community has always been Many recording communities now have websites quick to adopt new technologies to help manage data: which are a centre point for organizing events, pub- spreadsheets and databases have long formed the lishing news and promoting recording. These can help backbone of data management for recording schemes to unify a community in a formal way by promoting (Foster, 2015). Recent and emerging technologies common projects such as creating regional atlases, provide further possibilities for increasing efficiency and activities such as trips or conferences. in data management through shared online systems While websites can help to unify a community, they and automated approaches. However, while some of are often used as a notice board rather than a place these new approaches are making it easier to capture for social interactions. Instead, social networking greater amounts of data more quickly and from a sites have become a useful tool for enabling easy wider range of recorders, the need remains for such communication between like-minded members of the data to be checked to ensure that its quality is known recording community. In the 1990s biological record- and documented. Not all data uses require high- ers were quick to adopt the Yahoo! 
system of email quality data, especially if data volume is large (see discussion groups, with a plethora of groups being set section on analysis). However, for many questions up for different taxonomic groups and interests, data of a known quality is required and as such sharing information and allowing sightings of inter- metadata is important. Data quality can be quanti- esting species to be rapidly shared. Many of these fied, and in some cases improved through quality groups are still active today. More recently natural- assurance steps. ists have been quick to adopt new social media includ- Quality assurance for biological records has largely ing Facebook and Twitter. Social media offers users been carried out by people working or volunteering the ability to form their own communities and com- for local records centres and for national and local municate with individuals or groups easily and recording schemes. The terms ‘validation’ and ‘verifi- instantaneously (Box 1). It also offers established cation’ are often used to refer to, respectively, carrying organizations the ability to communicate easily with out standardized checks on the completeness and their members in a more interactive and less formal legitimacy of a record’s contents, and ensuring the way. For example Butterfly Conservation has an esti- accuracy of the recorded location and taxon name mated 24 100 followers on Twitter, and organizations (James, 2011). such as local bat groups use Facebook groups to Some aspects of quality assurance can be auto- organize events and have discussions. mated, while others require significant input from Some biological recording websites also include experts. The extent to which emerging technologies social elements to gain the best of both worlds. will be able to augment and assist in quality assur- For example both iSpot and iNaturalist (http:// ance is not yet clear, but there is scope for them to

Box 1. Garden Bioblitz is an example of the impact of social media on biological recording. In a blog posted in 2011 a UK-based naturalist described how she had carried out a ‘bioblitz’ survey of her garden, recording as many species as possible. Her experience was shared via Twitter, which led a number of people with similar interests to set up a trial project in 2012 using Twitter as the focus for a larger group of people to take part (Comont, 2013). The success of this trial led to the launch of a nationwide Garden Bioblitz in 2013, which has continued to use Twitter and Facebook to engage more people to take part, with online resources including iRecord (http://www.brc.ac.uk/irecord) and iSpot (http://www.ispotnature.org) used to store biologi- cal records and provide help with identifications. The swift development of such a project, bringing together novice and experienced wildlife recorders from many widely dispersed locations, was only possible by embracing the possibilities offered by social media.

© 2015 The Linnean Society of London, Biological Journal of the Linnean Society, 2015, 115, 731–749 EMERGING TECHNOLOGIES FOR BIOLOGICAL RECORDING 737 play a greater role in the future. However, it should based on the current set of verified records, updating also be recognized that expert verifiers also use the them in real-time as further records are verified. verification process as a means of communicating with, educating and enthusing the recorders who Automated observation and identification provide data. Automated procedures can provide help Automated collection of images and sound recordings to human verifiers, but are unlikely to replace them. for biological recording is becoming increasingly familiar. Work using camera traps has traditionally focused on mammals (Trolliet et al., 2014), but is AUTOMATION also being extended to other taxa, including Automated location data , for example via the Rana system (http:// In field location data have previously relied on the www.tumblingdice.co.uk/rana/detecting-pollinators). observer’s map reading skills and can be a source of Different types of microphones have been used to error. Many websites for biological records offer the record bats (Walters et al., 2012), marine mammals recorder an interactive map so that they can provide (Mellinger et al., 2007), birds (e.g. Zwart et al., 2014) the location by simply clicking on a map, thus avoid- and insects (e.g. Chesmore & Ohya, 2004). These ing the need for a detailed understanding of coordi- approaches can produce images and sounds alongside nate systems. In addition fully automated location metadata such as date, time and often location. recording is available from many mobile devices con- Human input is usually required to identify the taining GPS. This is a welcomed advance, although it species that have been recorded, but this aspect is is important that the GPS accuracy is also captured also becoming amenable to automation. 
as this can vary greatly depending on habitat struc- Arguably, one of the most pertinent examples of ture and device. automated detection and identification of biological sounds is the case of birdsong recognition. Recordings Automated validation and verification tools of birdsong have become increasingly available due to Using our understanding of organisms’ traits (distri- the widespread interest of the general public, the bution, phenology, etc.), tools are now available to existing networks of bird enthusiasts and the fact automatically detect unusual records, helping to that bird vocalizations reside largely in the same direct the effort of experts whose time is often in frequency region as human speech (and hence can be short supply. For example, NBN Record Cleaner recorded with cheap voice recorders and mobile (http://www.nbn.org.uk/Tools-Resources/Recording- devices). This has resulted in increased research Resources/NBN-Record-Cleaner.aspx) is an auto- interest in the application of machine learning mated validation and verification decision-support methods to the problem of automatic bird sound rec- tool for recorders and biodiversity data managers. It ognition. Early research works on the subject in the is designed to improve the efficiency of data flow and 1990s made use of simple pattern matching tech- ensure the quality of biodiversity datasets by ena- niques from the early days of speech processing (e.g. bling automated checking of large datasets in a Anderson, Dave & Margoliash, 1996). The similarities variety of formats against validation and verification of birdsong to human speech (e.g. Doupe & Kuhl, ‘rule sets’ developed by national taxonomic experts. 1999) led the research area to grow based on methods Verification rule sets flag up records that fall outside and findings from the speech processing literature the known temporal or spatial distribution of that through the 2000s. 
This included the use of audio species, as well as highlight records of species that features originally devised for human speech recogni- are inherently difficult to identify. The human verifier tion, the use of Hidden Markov Models and in most can then choose whether to accept these ‘outlier’ cases the need of a manual preprocessing syllable records, or to go back to the original recorder to seek segmentation stage for training of the proposed learn- further evidence for the record. A similar system has ing algorithms (for a review see Stowell & Plumbley, been developed by eBird (http://www.ebird.org), which 2011). More recently, the severe practical limitations uses similar rules to highlight unusual records, and associated with manual syllable segmentation as well many others are documented by the Global Biodiver- as the need for classification algorithms that can sity Information Facility (GBIF; http://www.gbif.es/ discriminate between hundreds of different bird BDQ.php). species, labels for which no dictionaries of phonemes/ Published rulesets are defined at a point in time, syllables exist, has resulted in the gradual departure but in the future it would be possible to facilitate live from speech recognition methods in favour of a wider updating of some of the rules as new data arrive. For toolset of probabilistic machine learning methods (e.g. instance, fixed rules based on the known range of the Briggs et al., 2012; Stowell & Plumbley, 2014). A species can become out of date quickly as ranges significant advantage of structured probabilistic change, but in principle geographical rules could be models is the ability to provide human users with an

ordered list of most likely results, thus allowing the crowd-sourcing of human corrective input to the automatic identification scheme.

An increasing number of online sound recognition competitions are organized as part of international conferences and other initiatives. This is as a result of increasing interest in designing biological sound recognition algorithms, using machine learning, to analyse audio obtained in realistic conditions. Such competitions attract worldwide entries from several tens of competing design teams at a time. They facilitate agile forms of knowledge and expertise sharing and provide a constant update on the state of the art in the field. The most prominent examples of recent competitions include the UCR Classification Contest in 2012 (http://www.cs.ucr.edu/~eamonn/CE/contest.htm), the Multi-label Bird Species Classification – NIPS 2013 (http://www.kaggle.com/c/multilabel-bird-species-classification-nips2013), the Machine Learning for Signal Processing 2013 Bird Classification Challenge (http://www.kaggle.com/c/mlsp-2013-birds) and the two bird sound and one whale sound competitions organized as part of the International Conference on Machine Learning 2014 (http://sabiod.univ-tln.fr/ulearnbio/challenges.html).

As a result of this research activity, various computer programs and smartphone apps have emerged that attempt to identify species from images or sounds, thus opening the way for complete automation of the biological recording process – species, location and date/time. The Leafsnap app (http://www.leafsnap.com) utilizes algorithms based principally around leaf shape to aid identification by refining the number of possible matches a user needs to consider (Kumar et al., 2012), while the Echometer touch app (http://www.wildlifeacoustics.com/products/echo-meter-touch) and the iBatsID web application (http://www.sites.google.com/site/ibatsresources/iBatsID) aim at identifying bat species from their calls. Wildlife sound identification computer software and mobile apps that are based on proprietary algorithms are offered by private companies such as SoundID (http://www.soundid.net) and Isoperla (http://www.isoperla.co.uk). Other examples of recent and ongoing projects include the crowd-funded project Warblr (https://www.kickstarter.com/projects/1190241008/warblr-an-app-that-recognises-birds-from-their-son) and the BioSound project (http://oxlel.zoo.ox.ac.uk/research/projects/biosound/) both on bird song identification, as well as the Google Impact Challenge-funded collaboration between the Royal Botanic Gardens Kew and the University of Oxford (https://impactchallenge.withgoogle.com/uk2014/charity/kew), which aims at crowd-sourcing data to help prevent mosquito-borne diseases. The true impact of this proliferation of automated identification applications is yet to be evaluated.

CROWD-SOURCING
Biological recording employed ‘crowd-sourcing’ long before the term had been coined. Online technologies have enabled this collaborative approach to evolve further, allowing people from all over the world to communicate more quickly (Box 2).

There is no doubt that online interactions enable people to learn from ‘the wisdom of the crowd’ (Galton, 1907) when seeking help with species identifications. However, there is also plenty of evidence that not all such wisdom is accurate. For the less familiar taxon groups expert knowledge may be confined to few people within the crowd, and it is not easy to assess whether identifications suggested online are correct. For this, input from experienced naturalists is invaluable. The iSpot project addresses this issue by using social media to put novices in touch with experts, enabling them to learn about species identification, gain experience themselves and go on to help others within the crowd. Additionally, iSpot uses a ‘reputation score’ to indicate how much experience each participant has in identifying species within broad taxonomic groups (Silvertown et al., 2015). Participants can suggest identifications, and if others agree with their identification they receive an addition to their reputation score – the amount added to the score depends on the amount of reputation that has already been assigned to the person agreeing with the identification. The reputation scores are indicated by icons on the website, and thus it is

Box 2. In October 2009, a 6-year-old girl found an unusual-looking moth in her house in Berkshire, UK. To seek help with identification her father posted a photograph on the iSpot website, developed by The Open University (http://www.ispotnature.org/node/7407). Within an hour a provisional identification had been suggested: sinica Moore (: ), a species not previously recorded in Europe. Within 24 h this had been confirmed by experts from the Natural History Museum in London, and their colleagues in Taiwan, within the moth’s native range (Ansine, 2013). It has been suggested that the larvae of the moth may have been imported on the moth’s food-plant ( spp.). This chain of events demonstrates the potential of crowd-sourcing biological recording.
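The reputation mechanism described for iSpot in the main text can be illustrated with a minimal sketch. The text states only that the amount added to a participant's score depends on the reputation already held by the person agreeing; the linear scaling, the starting score and the names `Participant`, `record_agreement` and `AGREEMENT_WEIGHT` below are assumptions made for illustration and are not taken from iSpot itself.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """A recorder with a reputation score within one taxonomic group."""
    name: str
    reputation: float = 1.0  # assumed starting score, not an iSpot value


# Hypothetical scaling factor: share of the agreer's reputation that is
# credited to the person whose identification was agreed with.
AGREEMENT_WEIGHT = 0.1


def record_agreement(identifier: Participant, agreer: Participant,
                     weight: float = AGREEMENT_WEIGHT) -> float:
    """Credit `identifier` because `agreer` agreed with their identification.

    The gain grows with the agreer's existing reputation, so agreement from
    an experienced participant counts for more than agreement from a novice.
    Returns the amount added to the identifier's score.
    """
    gain = weight * agreer.reputation
    identifier.reputation += gain
    return gain


# Usage: the same identification is agreed with by a peer and by an expert.
novice = Participant("novice")
peer = Participant("peer", reputation=1.0)
expert = Participant("expert", reputation=50.0)
gain_from_peer = record_agreement(novice, peer)
gain_from_expert = record_agreement(novice, expert)
```

Under these assumptions an agreement from the expert adds far more to the novice's score than an agreement from a peer, which is the property the text attributes to the scheme; any real weighting may be non-linear or capped.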

possible to make a judgement about how much weight of experience may be behind any particular suggested identification.

The combination of web technology and mobile apps allows a novice naturalist observing wildlife in the field to have almost immediate access to expertise from a community of users who may in principle be based anywhere in the world (Scanlon, Woods & Clow, 2014), providing new opportunities for learning and sharing information. A challenge associated with crowd-sourcing verification is false-positives: instances where a species identification is verified when it is in fact incorrect. Were this type of error to be prevalent we would expect a weak relationship between the number of identifications per species and their difficulty of identification. When results from iSpot were tested there was a strong relationship between the number of observations of a species and its difficulty of identification (Fig. 2), suggesting that these false-positives are not common. However, it also raises a concern that increasing use of crowd-sourced data based on photographs of wildlife will bias species recording datasets towards those species that are easy to identify, along with other potential biases towards more conspicuous and colourful species and those that are easier to photograph.

Crowd-sourced identification such as that used by iSpot and Project Noah (http://www.projectnoah.org) encourages people to learn how to identify species, and as a result biological records are accumulated. More direct online systems to collate records are provided by sites such as iRecord and eBird whose focus is directed primarily at data collection rather than education. These systems allow records for any taxon at any location to be entered online and on smartphone apps, often with accompanying photos. Verification is typically provided by expert verifiers with local expertise in species’ ecology and identification.

Figure 2. The number of observations per species for each species that has been observed on iSpot and that has been given an ‘ID difficulty’ category in the NBN Record Cleaner rulesets. iSpot data are based on all observations made in Britain up to 8 September 2014 that have received a ‘Likely ID’ (Silvertown et al., 2015). The number of species included in the analysis is given in parentheses after the taxon group name. Birds and Butterflies & , only use categories 1–4. Information on the NBN rulesets is available at http://www.nbn.org.uk/Tools-Resources/Recording-Resources/NBN-Record-Cleaner.aspx

DATA CENTRES
Increased participation in biological recording as a result of technological innovation is generating large volumes of digital data. While this does not currently pose a problem in terms of physical memory space, it is more important than ever to curate digital data to ensure it is as accessible as possible to those who might want to use it, including those who helped to collect the data. Interoperability with other datasets and future-proofing against technological changes and increasing volumes of data are also key.

New data types such as digital photographs and sound recordings (e.g. bat calls and bird song) have required modifications to databases and in the near future we should expect genetic sequences to become a common addition to records. Environmental DNA (eDNA) analysis is likely to increase the volume of biological records data and have an impact on biological recording more generally, which is discussed in another paper in this special issue (Lawson Handley, 2015).

Despite the increasing number of biological records being submitted digitally, and the increase in data that are submitted with these, such as images and sound files, data storage is not currently thought to be a major concern. For example, iSpot currently holds ∼300 000 observations and ∼470 000 associated images, which uses ∼500 GB of storage space. With storage costing under £50 per terabyte, this is not currently problematic, although mobile internet connectivity and data limits currently restrict the ability to readily upload and download large files such as images and videos from mobile devices.

Standardized, quality-assured, permanently archived data centres are needed to ensure longevity and security of data. Examples of permanent data archives include the Natural Environment Research Council’s network of data centres (http://www.nerc.ac.uk/research/sites/data), GBIF, the BioFresh platform (http://www.freshwaterbiodiversity.eu) and GenBank (http://www.ncbi.nlm.nih.gov/GenBank), the last being a global repository of genetic information. While some data centres are static, others are dynamic, interfaced with a growing suite of tools for editing and managing data and metadata.

With the number of data centres increasing it is a challenge to ensure that data can be aggregated

across centres if needed. One solution is to create a centralized system, all data flowing into one data warehouse, which is possible using the Indicia toolkit (http://www.indicia.org.uk). Indicia can be used to create a centralized data warehouse which can receive data from a number of entry points. These include web forms that can be implemented on the websites of many different recording schemes and other organizations, as well as a variety of apps. For instance, in the UK the National Garden BioBlitz (http://www.gardenbioblitz.org), the PlantTracker app (http://planttracker.naturelocator.org/) and the iRecord Butterflies app (http://www.brc.ac.uk/article/irecord-butterflies-mobile-app) all feed data into the same data warehouse. These projects all look very different, and most people who participate in them will be unaware that they are linked in any way. However, behind the scenes, rather than having separate data storage solutions for each of these projects, a single central data warehouse, maintained by the Biological Records Centre (http://www.brc.ac.uk), takes in the data from each project (Fig. 3). Experts acting as verifiers can see all the data that are relevant to their taxonomic group or geographical area in one place, rather than having to deal with multiple datasets from each project. The data become available to recording schemes and records centres for analysis purposes, with the level of verification documented as part of the record.

Globally unique identifiers (GUIDs) enable data objects, such as taxon names, field observations or voucher specimens, to be identified and accessed via the Internet. They are an effective way of ensuring biodiversity data are easily citable and that credit is given to those responsible for the data collection. Data centres can assign these identifiers to data they hold and some, such as GBIF, already do so. Identifiers can also be used to capture the state of a dataset at a set point in time, ensuring that results of analyses are reproducible in the future. GUIDs must be unique, persistent and provide a route, such as a web address, for accessing the data object. Examples that are suitable for use with biodiversity data include Life Science IDs (LSIDs) and Digital Object Identifiers (DOIs) (Taxonomic Databases Working Group, 2011).

Data centres should not be thought of as isolated entities but considered as nodes in a network of data centres. By connecting data across data centres we can bring together complementary data and address questions that cannot be answered by any one data centre in isolation.

INTEROPERABILITY
Interoperability allows data from across regions, nations or institutions to be combined without loss in quality. Combining data in this way is necessary to monitor change in biodiversity at the scale at which policy decisions are being made, such as the European Union (EU). For example, the EU water framework directive intercalibration exercise (Water Information System for Europe, 2008) sought to combine data on the ecological status of water bodies across 27 member states by ensuring comparability and consistency in classification results between members. In this case, combining data at the EU level is important

Figure 3. Schematic representation of data flow using the Indicia toolkit. Data from many sources feed into the same data warehouse, making verification and data sharing more straightforward. Solid arrows show data capture processes whereas dashed arrows show the flow of data post-collection.
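The data flow in Figure 3, together with the requirement that identifiers be assigned once and kept with the record, can be sketched in a few lines. This is a toy model under stated assumptions and does not represent the Indicia data model or API: the class names `Record` and `Warehouse`, the field names and the example values are all hypothetical, and a random UUID stands in for a GUID even though, unlike an LSID or DOI, it offers no route for accessing the data object.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """One observation, as captured by any entry point (web form or app)."""
    taxon: str
    date: str          # ISO 8601 date of the observation
    grid_ref: str      # location, e.g. a grid reference
    source: str        # entry point that captured the record
    # Identifier allocated at the point of first submission (assumption:
    # a random UUID is used here purely as a stand-in for a GUID).
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))
    verification: str = "unverified"


class Warehouse:
    """A single store receiving records from many entry points."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def submit(self, record: Record) -> bool:
        """Store a record; a re-submission with a known GUID is ignored."""
        if record.guid in self._records:
            return False
        self._records[record.guid] = record
        return True

    def verify(self, guid: str, status: str) -> None:
        """Document the verification decision as part of the record."""
        self._records[guid].verification = status

    def for_taxon(self, taxon: str) -> List[Record]:
        """All records for one taxon, whichever project they came from."""
        return [r for r in self._records.values() if r.taxon == taxon]

    def __len__(self) -> int:
        return len(self._records)


# Usage with made-up example values: two projects feed one warehouse.
warehouse = Warehouse()
butterfly = Record("Vanessa atalanta", "2014-07-01", "SU6189", "butterfly app")
plant = Record("Fallopia japonica", "2014-07-02", "ST5872", "plant app")
warehouse.submit(butterfly)
warehouse.submit(plant)
warehouse.submit(butterfly)            # same GUID, so not stored twice
warehouse.verify(butterfly.guid, "accepted")
```

Keying the store on the identifier is what lets a verifier see every record for a taxon in one place and prevents a re-submitted record from being duplicated; a production system would also need persistent, resolvable identifiers and a history of amendments.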

for reporting against targets and informing future policy. In the EU the INSPIRE directive aims to make spatial environmental data better connected, saleable and accessible. The INSPIRE directive was transposed into UK law in 2009 as the INSPIRE regulations. Not only does interoperability afford the analysis of data over a large spatial extent, it also allows the combination of different data types, for example distribution and climate data for species distribution modelling.

To allow data to be shared and combined efficiently, standardized data exchange formats are used. These formats specify the structure and content of data and metadata, enabling data stored in a variety of formats and software to be analysed in combination. Traditionally, these exchange formats have been applied to datasets post-collection, but standardization can now be built into the collection phase via online recording websites and apps. The most well-developed formats for biological recording purposes are the Darwin Core standards (Wieczorek et al., 2012), which are ‘intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries’ and are used to facilitate the sharing of biological data worldwide by GBIF. Darwin Core Format (DwC; http://rs.tdwg.org/dwc/index.htm) uses a suite of text files, each with its own attributes, allowing efficient representation of the various aspects of a record without duplication of information while retaining the flexibility to add new fields. DwC has been adopted by GBIF, who have started to develop tools based on the standard, notably the Integrated Publishing Toolkit (IPT; http://ipt.gbif.org/) which provides an efficient mechanism to publish and share existing biodiversity databases in Darwin Core Archive (DwC-A) format.

The development and use of standards for biological recording is not new (e.g. Burnett, Copp & Harding, 1995), but the rapid advances in technology and changes in the focus of policy mean that existing standards need to be periodically reviewed to ensure they remain fit for purpose, and continue to facilitate the combination of data for analyses. New technologies are likely to lead to further developments in standards to cover things such as the addition of digital photographs or genetic data (Lawson Handley, 2015) as part of the biological record, the explicit recording of verification decisions and sharing of unique record identifiers that allow records to be tracked across multiple platforms. While standards clearly have advantages it is also important to note that many biological recording schemes are small enterprises with limited resources and technological expertise. For these groups implementing international standards is not a priority, and the effort required to fully implement such standards may be greater than the effort needed to reformat the data when passed on to global systems.

Protocols for sharing information such as HTTP and XML, which are widely used in other disciplines, have been used to collate and deliver biodiversity data over the internet. The Distributed Generic Information Retrieval (DiGIR) and BioCASe protocols were developed to retrieve structured data over the internet from independent, heterogeneous databases using Darwin Core and ABCD (Access to Biological Collections Data) schemas, respectively. These protocols allow website and application developers to access biological data programmatically, opening up new possibilities for data re-use.

One of the challenges of combining biological records data is matching up taxonomies between datasets. This process has been aided by standardized taxon dictionaries. The task of coordinating and updating taxon dictionaries is extremely complex and would not be possible without modern information technology. The Catalogue of Life (CoL; http://www.catalogueoflife.org) holds data on over 1.5 million species, an estimated 70% of the world’s biodiversity, and provides the taxonomic backbone for global partnership projects including GBIF, the IUCN Red List and the Encyclopaedia of Life. This database and others like it are vital for making datasets comparable. For example, the UK Species Inventory (UKSI), based on over 230 separate taxonomic checklists in the UK, allows data to be passed between numerous UK systems with relative ease. Additionally, dictionaries allow searches based on taxonomy, such as species within a family, as well as incorporation of synonyms (including old names and common misspellings).

Taxon dictionaries are constantly being updated by taxonomic experts and this is likely to increase as DNA sequencing becomes more pervasive in systematics and existing taxa are re-assessed. However, updates to taxon dictionaries are currently not passed on automatically to systems that use them. For example, custodians and managers of taxonomic websites could be alerted automatically when a taxonomic split or dispute occurs to help them to respond.

Both data standards and taxon dictionaries are well developed tools that can significantly increase the value of biodiversity data. However, there is a lack of awareness of standards, dictionaries and the freely available tools to aid the collection, collation and dissemination of biodiversity data, which leads to wasted effort and loss of data quality. Several biological recording websites and apps have been developed that are not underpinned by a standard taxonomic dictionary, do not include georeferencing aids such as an interactive map, do not integrate automated checks for validation and verification support, and do

not export data in a standard exchange format. Valid biological records can still be produced and disseminated by such websites and apps, but greater effort is required to achieve this.

Combining datasets can increase their value above the sum of their parts. This can be achieved by aggregating data from different sources and making them available via a single portal, such as the Atlas of Living Australia (http://www.ala.org.au) (Isaac & Pocock, 2015; Powney & Isaac, 2015). The Atlas of Living Australia combines data on regional boundaries, marine regions, species occurrences data and habitat data, and can additionally call in layers from various web mapping services (WMS) or use user-uploaded data. Making data available via web services, web feature services or Open Geospatial Consortium WMS enables developers to create their own combinations of data from different sources for analysis or dissemination (e.g. Plantwise; http://www.plantwise.org).

Increased interoperability and data exchanges between data centres pose a challenge for tracking changes to data and avoiding duplication. Currently it is possible for a single record to be submitted to a number of different websites, apps or recording schemes, for each platform to perform bespoke data validation and verification and pass the record on to a centralized data centre such as GBIF. These actions result in duplication of records, which is particularly problematic when the record has been amended in one version but not another. Creating an audit trail for biological records would address these issues and could be achieved by creating an interoperable standard for the allocation of GUIDs to observations at the point of first submission.

DATA USE
Biological recording around the world has generated hundreds of millions of observations of thousands of species, and the rate at which observations are being made is ever increasing. This wealth of data has great potential for research, education and engagement, which technology is helping to realize.

ANALYSES
The increasing demand for policy-relevant information from data collected through citizen science projects has driven the development of new methods to extract meaningful information from these datasets (Isaac & Pocock, 2015; Powney & Isaac, 2015).

The unstructured nature of much biological recording can add noise to data, hiding true changes in a species’ distribution and abundance. This has been a major criticism to date of using these data for analyses. Biases include variation in recorder effort per site visit, species detectability and recorder effort, both spatially and temporally (Powney & Isaac, 2015). Despite these challenges, opportunistic species occurrence data have been used to provide insights into the impact of climate change on species range (Hickling et al., 2006; Mason et al., 2015), the spread of invasive species (Roy et al., 2012) and assessing species extinction risk (Maes et al., 2015), amongst others.

Methods developed for analysing opportunistic species occurrence data take on two main forms. The first attempts to model the bias that is present in the data and the second tries to subsample the data, leaving only unbiased records. Recently these methods have been reviewed (Isaac et al., 2014) showing considerable variation in the ability of these methods to detect a change in species occurrence in the presence of realistic recording behaviour (Isaac & Pocock, 2015). In support of concerns over the use of citizen science data, in this special issue Isaac et al. show that simple methods fail under almost any deviation from even recording. However, a number of methods show great potential, being able to detect realistic changes in species occurrence in the face of significant changes in recorder behaviour across time. These methods provide opportunities to produce valuable information for the assessment of extinction risk (Maes et al., 2015), to measure progress towards conservation targets (Pocock et al., 2015b), and to assess drivers of change (Mason et al., 2015). Many of these methods are being made freely available in the statistical programming language R, allowing anyone to perform these analyses on their own datasets (https://github.com/BiologicalRecordsCentre/sparta). Some of the most promising of these methods are extremely computationally intensive and have only become viable thanks to advances in high-performance computing. This does, however, mean that some methods become impractical for those who do not have access to such computer systems. This may be alleviated in the future by the continued improvement of computing power and by the move to cloud-based computing.

FEEDBACK
While the motivations of recorders are varied, it is generally accepted that feedback is motivational, and the sooner it is given after a record is submitted, the better. Feedback can come in many forms, traditionally being face-to-face conversations with fellow volunteers or written feedback; new technologies have opened up the possibilities for new mechanisms for providing feedback which utilize advances in web design and computer-generated text.

Online recording and identification systems, as well as the use of social media sites, have made it possible for experts and record verifiers to communicate

instantaneously with a wider range of people than was possible through direct contact alone. This places demands on those people who do have the skills and experience to answer the many questions posed by beginners. This can be especially problematic for taxonomic groups that require specialist input to identify species correctly. In these cases there may be relatively few ‘experts’ compared with ‘beginners’, and the experts who choose to engage may feel that they are unable to meet demand.

Community feedback is an important feature of recording websites that rely on identifications that are achieved by consensus. On these websites users can aid each other’s learning by providing feedback on identifications, adding a social element to the online recording experience. Some sites such as iSpot encourage this behaviour by offering incentives (in the form of points and online ‘badges’) to users who provide feedback.

Automated, computer-generated feedback has been an area of research in computer sciences and web design for some time. Applications that use this technology range from targeted advertising and film suggestions, to computer-generated news articles and stock reports. Automation of this kind allows computers to turn data into information that is engaging to users but monotonous for a human to create. Graphical feedback is perhaps the most common form and can be seen on many biological recording websites. This usually takes the form of a map with the submitted observation in the context of all other recordings of that species (e.g. Sealife tracker; http://www.brc.ac.uk/sealife_tracker/home), but other examples include up-to-date phenologies (e.g. BirdTrack; http://blx1.bto.org/birdtrack/main/data-home.jsp) and current recording activity (e.g. eBird).

More complicated than generating graphical feedback is creating human-readable text, termed natural language generation (NLG). This technique takes underlying data and transforms it into text that reads as though it has been written by a human. This method has already been used by some biological recording schemes. BeeWatch (http://www.bumblebeeconservation.org/get-involved/surveys/beewatch) uses NLG to provide feedback to users who submit photos. When the user gets the identification of a photograph wrong, as deemed by an expert verifier, the system automatically creates a response that thanks the user for their record and highlights the characteristics of the bees that were misidentified. Additionally, the system is able to provide contextual information about the record. In a study comparing how this feedback affected the behaviour of recorders there was evidence that NLG feedback improved identification skills over time as well as increasing the number of records submitted per user (Blake et al., 2012). In the future this concept could be developed to provide information relevant to the experience of the recorder or to their geographical location, or even suggest other locations or species that they might want to record.

GAMIFICATION
Games can be seen as rapid feedback environments in which the users’ actions affect the feedback they get over short timescales. Some biological recording platforms have seen potential benefits in applying design elements from games in an effort to make the user experience more enjoyable and engaging. The most common elements deployed are league tables, which show the ‘top users’ (e.g. BirdTrack), typically defined by the number of observations contributed, and badges which serve to identify that a user has accomplished a task or goal (e.g. Project Noah). Careful consideration should be given to the effect that gamification has on users’ behaviour; for example, game elements such as reputation badges in iSpot encourage users to help each other assign species identities to photographs, whereas league tables may prompt users to try and increase the volume of data contributed at the cost of quality.

Successful gamification requires a detailed understanding of the motivation of users. It can work well when used to amplify intrinsic motivations, such as concern for the environment, or provide rewards that are desired by users, but cannot be expected to get users to undertake tasks that they do not want to do (Deterding et al., 2011). Indeed, game elements may dissuade participation, particularly by experts, as the platform becomes more game-like and less tool-like (Prestopnik & Crowston, 2012).

DISSEMINATION OF INFORMATION
Visualization
Free access to non-sensitive records, particularly via real-time results maps, is identified as an important motivation for participants in biological recording projects (Tweddle et al., 2012). Enabling volunteers to visualize and explore submitted records in the context of other data, such as previous biological records, environmental data, conservation action or interpreted data such as modelled distribution maps, may be even more motivational but is not yet widely practised. Map of Life (http://www.mol.org) provides such visualizations, displaying species occurrence records alongside processed data such as models of species ranges and checklists for ecoregions and protected areas. The portal includes the ability to discover the data behind the interpreted maps, including links to the original sources of the data. This provides an easy-to-use interface for members of the public to

explore the available data while also having sufficient data for non-governmental organizations and researchers.

A challenge for websites wishing to allow users to explore data is to make this process as unrestricted as possible. The NBN Gateway’s interactive mapping tool gives users control over what is presented and how. Users can select and display multiple layers of data about species, sites and habitats, and change the colour, transparency and resolution of these layers. However, to analyse data the user needs to download the records or access them via an R package (http://cran.r-project.org/web/packages/rnbn/index.html).

The rapid flow of data through recording systems as a result of online data recording allows real-time analysis and visualizations. The movements of migrant birds (e.g. British Trust for Ornithology; http://www.bto.org/volunteer-surveys/birdtrack) and butterflies (e.g. Butterfly Conservation; http://www.butterfly-conservation.org/612/Migrant-watch.html) are tracked using live interactive mapping systems that can display records as soon as they are contributed. Other phenological events are tracked to assess how species respond to climatic variables, and climate change more broadly (e.g. Nature’s Calendar; http://www.naturescalendar.org.uk).

Open access

Data that are ‘open access’ are freely accessible to all at minimal cost, usually via the internet and with no restrictions on use. Calls for biological records data to be open access are not new, but open access is becoming increasingly accepted in the broader scientific community and technologies are being developed which make open access more easily achievable.

In 1995 the Coordinating Commission for Biological Recording (CCBR) estimated that there were over 2000 organizations in the UK involved in collecting or curating biological records, but three-quarters of all data were used only within the original collecting/collating organization (Burnett et al., 1995). The majority of records were paper-based, and those that were digitized were held in a range of different software and formats. Clearly the technology available today means that there is no longer any need for data to be physically isolated in ‘information silos’ as they were 20 years ago, yet there is still a risk that technology can be used to create barriers to data access and use, resulting in ‘virtual silos’. These silos can exist when data are shared but excessive limitations are placed on their use. GBIF announced in 2014 that all data they hold will be under one of three Creative Commons licences. This allows much more freedom for users of the data while still allowing data providers to prevent commercial use of data if needed. However, there is some debate over the meaning of ‘commercial’ and some uses of data under a non-commercial use licence by charities may not be allowed (Hagedorn et al., 2011).

Technology also has a crucial role to play in enabling data collators and publishers to gather statistics on how their data are being used and in ensuring that users can cite data correctly. Practical data citation protocols and mechanisms, incorporating persistent identifiers such as DOIs, are vital for encouraging more individuals and organizations to make their data openly available, as they provide a means to secure recognition and reward for their efforts (Costello, 2009; CODATA-ICSTI Task Group on Data Citation Standards and Practices, 2013). These developments indicate a move from data ‘sharing’ via the internet to data ‘publication’ in perpetuity with a citation mechanism, and ideally peer review. This is identified as an important step in providing the motivation to make biodiversity data publicly available (Costello et al., 2012, 2013). Data journals, such as the Biodiversity Data Journal, already provide a peer-reviewed publication mechanism for biodiversity data, facilitating access to the data and offering a means of tracking their re-use via a DOI-based citation system. Additionally, Scratchpads (http://www.scratchpads.eu) offers anyone the ability to set up an online system for creating, collating and publishing data using standard taxonomic dictionaries and data formats (DwC-A) to facilitate interoperability and open access to data (Smith et al., 2009, 2011).

Open access should not only apply to data but also to tools and applications for biological recording. For example, incorporating analytical methods within portals to biological records data greatly increases the value of the underlying data. This allows users to analyse the data in a way that answers their specific questions, which could prove valuable for local conservation practitioners, as well as national and international conservation managers. Currently there are few systems that integrate biological records data and tools for their analysis. openModeller (http://openmodeller.sourceforge.net/) is one such system; designed as a platform for environmental niche modelling, it offers a range of analytical tools, as well as the ability to read in data from GBIF (de Souza Muñoz et al., 2009). Building on openModeller and many other web services including GBIF and the Catalogue of Life, BioVeL (http://www.biovel.eu) offers a suite of free tools for those studying biodiversity. While BioVeL and openModeller are targeted at a technical audience, the Atlas of Living Australia has the ability to model species’ distributions using occurrence data held about Australian species in a user-friendly web interface, more suited for members of the public. These systems required significant investments of time and funding but should be seen as the

current standard for open data and analysis in biological recording.

DISCUSSION

Technologies have always played an important part in documenting the world around us, from the invention of the microscope to the microprocessor. However, it is only in recent times, on the back of the explosion of digital technologies of all kinds at the end of the 20th century (Fig. 1), that technology has changed the landscape of biological recording at such a rate. Technologies are now challenging all elements of biological recording, from how data are collected and verified, through to data analysis and dissemination.

REVOLUTION VERSUS EVOLUTION

Technologies that are applied to biological recording fall on a continuum between revolutionary, creating an entirely new process, and evolutionary, using technology to improve an existing process. Most technologies that we have encountered are the latter, which raises the question: ‘Are there more opportunities to apply technologies to existing rather than new processes; or are we missing opportunities for revolutionary approaches?’

Websites and apps that allow data entry are an evolution of the traditional paper and pencil recording form, and many online recording forms still retain many of the characteristics of these paper precursors. Such developments have undoubtedly increased the accuracy and speed of data capture. Revolutionary data capture methods, for example the automated detection of cicadas using a smartphone app (The New Forest Cicada Hunt), can be harder to place into the standard biological recording pathway, and the data generated may be difficult to analyse in conjunction with data collected by more traditional methods. However, using technologies in novel ways is likely to open up opportunities to answer new questions (Lawson Handley, 2015). We suggest that revolutionary ideas are likely to emerge from the interface of biodiversity science and other disciplines such as computer science, engineering and social sciences, and that collaborations of this nature should be encouraged.

REMOVING BARRIERS TO PUBLIC ENGAGEMENT WITH SCIENCE

Amateur naturalists have been undertaking scientific research for hundreds of years (Pocock et al., 2015b) and it is perhaps only in modern times that ‘science’ has become the preserve of professional researchers. Technologies can provide access to literature (e.g. online keys and guides) as well as individuals and communities (e.g. scheme websites and discussion forums) that allow anyone to gain the skills and knowledge required to record the natural environment. Furthermore, technologies have recently opened the doors to those without any skills or knowledge of biological recording. For example, smartphone apps such as iRecord ladybirds (http://www.ladybird-survey.org/recording.aspx) do not require any existing knowledge of the taxa to participate. These apps have great potential to engage, educate and enthuse the public with the natural world and to raise awareness of environmental issues (Roy et al., 2015) and active areas of scientific research. The low cost of electronics and software now also allows non-professionals to build their own equipment, or develop their own platforms, to explore areas of science that are of personal interest. While data collected by people with little background in biological recording may be more ad hoc and of lower quality than those collected by experienced taxonomists, more widespread engagement of the public is key to influencing the public’s attitude to science and nature, and to inspiring the next generation of naturalists.

DATA QUALITY AND FLOW

The widening of participation in biological recording and the increased ease with which records can be generated have led to an increase in the number of records submitted in recent years (Tulloch et al., 2013). In tandem, tools have been developed to ensure that data quality is as high as possible. This includes developments during data collection, such as GPS and image capture, and post submission, such as automated validation and expert review. Combining this flow of data with centralized data centres across the world, we now have more access to high-quality, high-resolution data (both temporal and spatial) than ever before. These data are invaluable for addressing many of the pressing questions that face humanity in the 21st century, such as the effects of climate change, the spread of invasive species, and the effect of anthropogenic activities on species, communities and ecosystem services. The immediacy with which biological data are shared is of particular importance to the study of invasive species (Roy et al., 2015). Emerging EU regulations on Invasive Alien Species will require member states to set up surveillance and rapid alert systems for agreed lists of invasive species by December 2015 (European Parliament, 2014). Online reporting of potentially invasive non-native species can help prompt swift action to prevent them becoming established (e.g. GB Species alerts, http://www.nonnativespecies.org/alerts/index.cfm).
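The automated validation step mentioned in this section can be illustrated with a minimal Python sketch. It does not reproduce the rules of any particular recording scheme: the `Record` fields, the `REGION` bounding box, the example checklist and the `validate` function are all hypothetical names and values chosen for illustration. The idea is simply that a computer flags records with implausible coordinates, dates or names, so that expert verifiers can concentrate on the remainder.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical survey region: a rough latitude/longitude box (assumed values,
# for illustration only; a real scheme would use proper boundary data).
REGION = {"min_lat": 49.0, "max_lat": 61.0, "min_lon": -11.0, "max_lon": 2.0}


@dataclass
class Record:
    """A single biological record; the field names are illustrative."""
    species: str
    latitude: float
    longitude: float
    recorded_on: date


def validate(record, checklist, today):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not (-90 <= record.latitude <= 90 and -180 <= record.longitude <= 180):
        problems.append("coordinates are not valid latitude/longitude")
    elif not (REGION["min_lat"] <= record.latitude <= REGION["max_lat"]
              and REGION["min_lon"] <= record.longitude <= REGION["max_lon"]):
        problems.append("location is outside the survey region")
    if record.recorded_on > today:
        problems.append("date is in the future")
    if record.species not in checklist:
        problems.append("species name is not on the checklist")
    return problems


if __name__ == "__main__":
    checklist = {"Harmonia axyridis", "Coccinella septempunctata"}
    today = date(2015, 2, 13)
    good = Record("Harmonia axyridis", 51.6, -1.1, date(2014, 7, 1))
    # Misspelled name, location south of the box, and a date after `today`.
    bad = Record("Harmonia axyridus", 40.0, -1.1, date(2016, 1, 1))
    print(validate(good, checklist, today))
    print(validate(bad, checklist, today))
```

In a sketch like this, a flagged record would be passed to expert review rather than rejected outright, since an unexpected location or name may be a genuine and important observation.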


ENHANCEMENT OR REPLACEMENT

The affordability of new technologies along with an increased desire to include the wider community in research poses a challenge to the traditional roles of expert data providers (Wilson & Graham, 2013a). Biological information is a significant component in the ‘democratization’ of geographical knowledge (Warf & Sui, 2010), which creates new possibilities for biological recording and ecological politics as data can be provided by anyone rather than only professionals and experts. Whilst these processes seem set to alter the relationship between expert and amateur (Freenberg, 1999), they are unlikely to mean the end of the expert. It is suggested that the complexities of technological systems are such that power and expertise will remain within small groups (Haklay, 2013; Wilson & Graham, 2013b). Indeed, the same is likely to be true of taxonomic expertise, which is a crucial element of biological recording that cannot be replaced and will instead grow in demand as the number of records requiring verification increases.

Not everything can be done online of course – many species identifications still depend on physical specimens being passed to experts to check, and there are many benefits to learning about fieldwork and recording wildlife from face-to-face meetings with people in the field. However, the range of tools and resources being developed is undoubtedly extending the reach and speed of the sorts of human interaction and collective sharing of information that have always been such a feature of biological recording.

There are also areas where technologies can replace humans. Checks of record validity, such as ensuring records of terrestrial species are not in the sea, are monotonous for humans but can be quickly and effectively carried out by computers. Even more complex tasks such as identifying species from sounds, images or eDNA, or providing written feedback, can be completed by computers. While these methods are currently in their infancy they show the potential to work at least with more straightforward scenarios. This will allow experts to focus on the more challenging aspects of recording.

USER MOTIVATION VERSUS ACADEMIC AMBITION

No matter the implementation of technology, it is important to recognize that the motivations of the end-users and the creator are unlikely to be the same. A study of expert opinion recently found that the properties of a citizen science project identified as important varied greatly depending on whether the experts questioned were end-users or participants (Pocock et al., 2015a). The motivation of participants may vary greatly, from shared beliefs with the project to a desire to increase one’s reputation in the community (Nov, Arazy & Anderson, 2014). These motivations must be considered and understood when developing a project. Once the key motivations are understood, technologies can be used to target these. For example, league tables and badges can be implemented for users who are motivated by reputation and competition, while apps can be loaded with identification guides and quizzes for users who are motivated by increasing their skills.

CONCLUSIONS

There are few aspects of society that have not been influenced by technologies developed in the post-industrial information revolution. These technologies shape the way we communicate, work and socialize, fundamentally changing the way we live our lives in comparison with previous generations.

Technological advances are also changing the landscape of biological recording: websites and mobile technologies are streamlining data gathering, ensuring data quality and engaging a wider audience with nature; automation and crowd-sourcing are improving verification and validation systems; data are becoming better connected, more open and re-usable, allowing meaningful analyses at policy-relevant scales; and data contributors are being rewarded with data visualization tools, feedback and game-like elements. While there are undoubtedly challenges associated with the adoption of emerging technologies, they are set to significantly enhance biological recording, granting us a greater understanding and appreciation of the natural world.

ACKNOWLEDGEMENTS

We thank two anonymous reviewers for their helpful comments. The Biological Records Centre receives support from the Joint Nature Conservation Committee and the Natural Environment Research Council (via National Capability funding to the Centre for Ecology & Hydrology, project NEC04932).

GLOSSARY

Apps – An abbreviation of application, ‘app’ is typically used to describe a piece of software that can be downloaded and used on a mobile device such as a smartphone or tablet.
Big Data – Extremely large datasets, often a result of increased data gathering or new technology, which require modern, powerful computers and data processing to analyse.
Cloud Computing – The use of servers on the internet to store data and run software instead of a personal computer.


Crowd Sourcing – Using a large number of people, often online, to perform a service often without pay. The term is a blend of ‘crowd’ and ‘outsourcing’.
Data Warehouse – A repository of data, often coming from a number of different sources.
HTML5 – Hyper Text Markup Language is the language used to create webpages, defining how they look and function. HTML5 is the latest version of this language released in October 2014.
Machine Learning – The use of computational probabilistic methods for the detection (‘learning’) of patterns and structures in available (‘training’) data and the construction of algorithms that can be used to make predictions about previously unseen data.
Open Access – Freely accessible to all at minimal cost, usually via the internet and with no restrictions on use. The term is usually applied to data.
Open Source – Applied to software this term indicates that the source code is freely accessible.
Smartphone – A mobile phone that has the functionality of a computer. Smartphones usually have access to the internet, GPS, cameras, a touch screen interface, and the ability to run third-party applications.
Social Networks/Media – Websites and applications that allow users to interact socially, typically online. These platforms often allow users to share messages, images and videos with one another.

REFERENCES

Anderson SE, Dave AS, Margoliash D. 1996. Template-based automatic recognition of birdsong syllables from continuous recordings. Journal of the Acoustical Society of America 100: 1209–1219.
Ansine J. 2013. Reaching the public through iSpot: your place to share nature. Science Communication – a practical guide for scientists. Oxford: Wiley Blackwell, 257–259.
Blake S, Siddharthan A, Nguyen H, Sharma N, Robinson AM, O’Mahony E, Darvil B, Mellish C, van der Wal R. 2012. Natural language generation for nature conservation: automating feedback to help volunteers identify bumblebee species. Proceedings of COLING 2012, 311–324.
Briggs F, Lakshminarayanan B, Neal L, Fern XZ, Raich R, Hadley SJK, Hadley AS, Betts MG. 2012. Acoustic classification of multiple simultaneous bird species: a multi-instance multi-label approach. Journal of the Acoustical Society of America 131: 4640–4650.
Burkmar R. 2014. The shifting paradigm of biological identification. Shrewsbury: Fields Studies Council.
Burnett J, Copp C, Harding P. 1995. Biological recording in the United Kingdom – present practice and future development. London: Department for the Environment.
Chesmore ED, Ohya E. 2004. Automated identification of field-recorded songs of four British grasshoppers using bioacoustic signal recognition. Bulletin of Entomological Research 94: 319–330.
CODATA-ICSTI Task Group on Data Citation Standards and Practices. 2013. Out of cite, out of mind: the current state of practice, policy, and technology for the citation of data. Data Science Journal 12: CIDCR1–CIDCR75.
Comont R. 2013. The Garden Bioblitz: citizen science meets social media. National Forum for Biological Recording Newsletter 46: 19–20.
Costello MJ. 2009. Motivating online publication of data. BioScience 59: 418–427.
Costello MJ, Michener WK, Gahegan M, Zhang Z, Bourne P, Chavan V. 2012. Quality assurance and Intellectual Property Rights in advancing biodiversity data publication October 2012. Copenhagen: Global Biodiversity Information Facility.
Costello MJ, Michener WK, Gahegan M, Zhang ZQ, Bourne PE. 2013. Biodiversity data should be published, cited, and peer reviewed. Trends in Ecology & Evolution 28: 454–461.
De Souza Muñoz ME, De Giovanni R, de Siqueira MF, Sutton T, Brewer P, Pereira RS, Canhos DAL, Canhos VP. 2009. openModeller: a generic approach to species’ potential distribution modelling. GeoInformatica 15: 111–135.
Deloitte. 2013. The Deloitte Consumer Review Beyond the hype: the true potential of mobile.
Demeritt D. 2001. Scientific forest conservation and the statistical picturing of nature’s limits in the Progressive-era United States. Environment and Planning D: Society and Space 19: 431–459.
Department for Environment Food and Rural Affairs. 2012. Interim Chalara Control Plan.
Department for Environment Food and Rural Affairs. 2013. UK Biodiversity Indicators in Your Pocket 2013.
Deterding S, Sicart M, Nacke L, O’Hara K, Dixon D. 2011. Gamification: using game-design elements in non-gaming contexts. Proceedings of the 2011 annual conference extended abstracts on Human factors in computing systems – CHI EA ’11: 2425.
Doupe AJ, Kuhl PK. 1999. Birdsong and human speech: common themes and mechanisms. Annual Review of Neuroscience 22: 567–631.
European Parliament. 2014. Legislative resolution of 16 April 2014 on the proposal for a regulation of the European Parliament and of the Council on the prevention and management of the introduction and spread of invasive alien species (COM(2013)0620 – C7-0264/2013 – 2013/0307(COD)). European Parliament.
Foster GN. 2015. Taking the oldest insect recording scheme into the 21st Century. Biological Journal of the Linnean Society 115: 594–504.
Freenberg A. 1999. Questioning technology. New York: Routledge.
Galton F. 1907. Vox populi. Nature 75: 450–451.
Hagedorn G, Mietchen D, Morris R, Agosti D, Penev L, Berendsohn WG, Hobern D. 2011. Creative Commons licenses and the non-commercial condition: implications for


the re-use of biodiversity information. ZooKeys 149: 127–149.
Haklay M. 2013. Citizen science and volunteered geographic information – overview and typology of participation. In: Sui D, Elwood S, Goodchild M, eds. Crowdsourcing geographic knowledge: volunteered geographic information (VGI) in theory and practice. Dordrecht: Springer, 105–122.
Hickling R, Roy DB, Hill JK, Fox R, Thomas CD. 2006. The distributions of a wide range of taxonomic groups are expanding polewards. Global Change Biology 12: 450–455.
Isaac NJB, Pocock MJO. 2015. Bias and information in biological records. Biological Journal of the Linnean Society 115: 522–531.
Isaac NJB, van Strien AJ, August TA, de Zeeuw MP, Roy DB. 2014. Statistics for citizen science: extracting signals of change from noisy ecological data. Methods in Ecology and Evolution 5: 1052–1060.
James T. 2011. Improving wildlife data quality. Nottingham: National Biodiversity Network.
Kumar N, Belhumeur PN, Biswas A, Jacobs DW, Kress WJ, Lopez IC, Soares JVB. 2012. Leafsnap: a computer vision system for automatic plant species identification. Computer Vision – ECCV 2012, Lecture Notes in Computer Science, 502–516.
Lawson Handley L. 2015. How will the ‘molecular revolution’ contribute to biological recording? Biological Journal of the Linnean Society 115: 750–766.
Lintott CJ, Schawinski K, Slosar A, Land K, Bamford S, Thomas D, Raddick MJ, Nichol RC, Szalay A, Andreescu D, Murray P, Vandenberg J. 2008. Galaxy Zoo: morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey. Monthly Notices of the Royal Astronomical Society 389: 1179–1189.
Maclean IMD, Wilson RJ. 2011. Recent ecological responses to climate change support predictions of high extinction risk. Proceedings of the National Academy of Sciences of the United States of America 108: 12337–12342.
Maes D, Isaac NJB, Harrower CA, Collen B, van Strien AJ, Roy DB. 2015. The use of opportunistic data for IUCN Red List assessments. Biological Journal of the Linnean Society 115: 690–706.
Martin J, Edwards HH, Burgess MA, Percival HF, Fagan DE, Gardner BE, Ortega-Ortiz JG, Ifju PG, Evers BS, Rambo TJ. 2012. Estimating distribution of hidden objects with drones: from tennis balls to manatees. PLoS ONE 7: e38882.
Mason SC, Palmer P, Fox R, Gillings S, Hill JK, Thomas CD, Oliver TH. 2015. Geographical range margins of many taxonomic groups continue to shift polewards. Biological Journal of the Linnean Society 115: 586–597.
Mellinger D, Stafford K, Moore S, Dziak R, Matsumoto H. 2007. An overview of fixed passive acoustic observation methods for cetaceans. Oceanography 20: 36–45.
Morrison D. 2011. Why is taxonomy still presented to the world as books? Australasian Systematic Botany Society Newsletter 149: 17–22.
Nimis PL, Lebbe RV. 2010. Tools for identifying biodiversity: progress and problems. Proceedings of the International Congress, Paris, September 20–22, 2010.
Nov O, Arazy O, Anderson D. 2014. Scientists@Home: what drives the quantity and quality of online citizen science participation? PLoS ONE 9: e90375.
Pocock MJO, Newson SE, Henderson IG, Peyton J, Sutherland WJ, Noble DG, Ball SG, Beckmann BC, Biggs J, Brereton T, Bullock DJ, Buckland ST, Edwards M, Eaton MA, Harvey MC, Hill MO, Horlock M, Hubble DS, Julian AM, Mackey EC, Mann DJ, Marshall MJ, Medlock JM, O’Mahony EM, Pacheco M, Porter K, Prentice S, Procter DA, Roy HE, Southway SE, Shortall CR, Stewart AJA, Wembridge DE, Wright MA, Roy DB. 2015a. Developing and enhancing biodiversity monitoring programmes: a collaborative assessment of priorities. Journal of Applied Ecology. doi: 10.1111/1365-2664.12423
Pocock MJO, Roy HE, Preston CD, Roy DB. 2015b. The Biological Records Centre: a pioneer of citizen science. Biological Journal of the Linnean Society 115: 475–493.
Powney GD, Isaac NJB. 2015. Beyond maps: a review of the applications of biological records. Biological Journal of the Linnean Society 115: 532–542.
Prestopnik N, Crowston K. 2012. Purposeful gaming & socio-computational systems: a citizen science design case. Proceedings of the 17th ACM international conference on Supporting group work, 75–84.
Roy HE, Adriaens T, Isaac NJB, Kenis M, Onkelinx T, Martin GS, Brown PMJ, Hautier L, Poland R, Roy DB et al. 2012. Invasive alien predator causes rapid declines of native European ladybirds. Diversity and Distributions 18: 717–725.
Roy HE, Rorke SL, Beckmann B, Booy O, Botham MS, Brown PMJ, Harrower C, Noble D, Sewell J, Walker K. 2015. The contribution of volunteer recorders to our understanding of biological invasions. Biological Journal of the Linnean Society 115: 678–689.
Saylor M. 2012. The mobile wave: how mobile intelligence will change everything. New York: Perseus Books/Vanguard Press.
Scanlon E, Woods W, Clow D. 2014. Informal participation in science in the UK: identification, location and mobility with iSpot. Educational Technology & Society 17: 58–71.
Shirky C. 2010. Cognitive surplus: creativity and generosity in a connected age. London: Penguin Group.
Silvertown J, Harvey M, Greenwood R, Dodd M, Rosewell J, Rebelo T, Ansine J, McConway K. 2015. Crowdsourcing the identification of organisms: a case-study of iSpot. ZooKeys 480: 125–146.
Sliwa J, Benoist E. 2011. Pervasive computing – the next technical revolution. 2011 Developments in E-systems Engineering. IEEE, 621–626.
Smith VS, Rycroft SD, Brake I, Scott B, Baker E, Livermore L, Blagoderov V, Roberts D. 2011. Scratchpads 2.0: a Virtual Research Environment supporting scholarly collaboration, communication and data publication in biodiversity science. ZooKeys 70: 53–70.


Smith VS, Rycroft SD, Harman KT, Scott B, Roberts D. 2009. Scratchpads: a data-publishing framework to build, share and manage information on the diversity of life. BMC Bioinformatics 10: Suppl 1: S6.
Snaddon J, Petrokofsky G, Jepson P, Willis KJ. 2013. Biodiversity technologies: tools as change agents. Biology Letters 9: 20121029.
Stowell D, Plumbley MD. 2011. Birdsong and C4DM: a survey of UK birdsong and machine recognition for music researchers. London: Centre for Digital Music, Queen Mary, University of London.
Stowell D, Plumbley MD. 2014. Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning. PeerJ 2: e488.
Taxonomic Databases Working Group. 2011. GUID and Life Sciences Identifiers Applicability Statements.
Trolliet F, Huynen M-C, Vermeulen C, Hambuckers A. 2014. Use of camera traps for wildlife studies. A review. Biotechnology, Agronomy, Society and Environment 18: 446–454.
Tulloch AIT, Possingham HP, Joseph LN, Szabo J, Martin TG. 2013. Realising the full potential of citizen science monitoring programs. Biological Conservation 165: 128–138.
Tweddle JC, Robinson LD, Pocock MJO, Roy HE. 2012. Guide to citizen science: developing, implementing and evaluating citizen science to study biodiversity and the environment in the UK. London: Natural History Museum and NERC Centre for Ecology & Hydrology for UK-EOF.
Van Strien AJ, van Swaay CAM, Termaat T. 2013. Opportunistic citizen science data of species produce reliable estimates of distribution trends if analysed with occupancy models. Journal of Applied Ecology 50: 1450–1458.
Walters CL, Freeman R, Collen A, Dietz C, Fenton MB, Jones G, Obrist MK, Puechmaille SJ, Sattler T, Siemers BM et al. 2012. A continental-scale tool for acoustic identification of European bats. Journal of Applied Ecology 49: 1064–1074.
Warf B, Sui D. 2010. From GIS to neogeography: ontological implications and theories of truth. Annals of GIS 16: 197–209.
Water Information System for Europe. 2008. Water Note 7 – intercalibration: a common scale for Europe’s waters. Brussels: European Commission.
Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, Giovanni R, Robertson T, Vieglais D. 2012. Darwin Core: an evolving community-developed biodiversity data standard. PLoS ONE 7: e29715.
Wilson MW, Graham M. 2013a. Situating neogeography. Environment and Planning A 45: 3–9.
Wilson MW, Graham M. 2013b. Neogeography and volunteered geographic information: a conversation with Michael Goodchild and Andrew Turner. Environment and Planning A 45: 10–18.
Zwart MC, Baker A, McGowan PJK, Whittingham MJ. 2014. The use of automated bioacoustic recorders to replace human wildlife surveys: an example using nightjars. PLoS ONE 9: e102770.
