Revisionary taxonomy in a changing e-landscape

Malcolm J. Scoble, Ben R. Clark, H. Charles J. Godfray, Ian J. Kitching & Simon J. Mayo

Fixed publication (whether paper, CD-ROM or PDF files) as the medium for descriptive taxonomy is being challenged almost by default and is unlikely to survive in the long term – at least as the sole means of publication. The future is already apparent in a number of online approaches to revisionary (descriptive, monographic) taxonomy that involve continuous addition and correction of data. Such information comes from sources ranging from single-source web pages to databases linked in distributed networks. The ubiquity of the Internet for promoting particular views is likely to be too strong to resist, however persuasive the arguments for control by the codes of nomenclature. This prediction is strengthened by the diminishing workforce of professional taxonomists (who are largely responsible for shaping the Codes) involved in describing life on the planet. While taxonomists may be unable to control taxonomic content in the new and more anarchic medium (other than within their own diminishing community), they do have the opportunity to shape the field to the benefit of their own close community and the much wider (as yet undefined) user base. An Internet-based approach to revisionary taxonomy is described in the CATE project (www.cate-project.org). This allows new taxonomic proposals to be made against an integral, community-style peer-review process forming part of the workflow. It also provides the opportunity for high quality products to be maintained through the incorporation of well-tested taxonomic standards. We also consider that there is much merit in treating taxonomy overtly as an information science, while still regarding it as an intellectual endeavour in its own right. This shift in emphasis is explored by examining the growing influence of e-projects in the changing taxonomy landscape.

Malcolm J. Scoble* & Ian J. Kitching, Department of Entomology, Natural History Museum, Cromwell Road, London SW7 5BD, UK. [email protected], [email protected]

Ben R. Clark & H. Charles J. Godfray, Department of Zoology, Oxford University, South Parks Road, Oxford OX1 3PS, UK. [email protected], [email protected]

Simon J. Mayo, Royal Botanic Gardens, Kew, UK. [email protected]

Introduction

This paper examines some issues about e-taxonomy, so the implicit assumption is made that taxonomy is desirable – it is, indeed, an integral part of human communication about species. There have been a number of justifications for the discipline, both academic and practical (e.g., papers in Godfray & Knapp 2004). Prominent among the practical reasons is the value of producing an inventory of life on the planet (e.g., and notably, the United Nations Conference on Environment and Development (UNCED, also known as the Earth Summit), held in Rio de Janeiro in 1992 http://www.un.org/geninfo/bp/enviro.html and, more recently, the G8 Potsdam Initiative

Tijdschrift voor Entomologie 150: 305–317, Figs 1–3. [ISSN 0040–7496]. http://www.nev.nl/tve © 2007 Nederlandse Entomologische Vereniging. Published 1 December 2007.

* Corresponding author

Downloaded from Brill.com 09/28/2021 04:57:49PM via free access

on Biodiversity 2007, http://www.g7.utoronto.ca/environment/env070317-gabriel.htm).

What should be meant by inventory is not just a comprehensive list of taxa (although that goal is difficult enough to achieve in itself), but the provision of the kind of rich information that is traditionally made available in good quality monographic treatments, whether regional or global. Considerable success has been achieved in the provision of underpinning taxonomic infrastructure for revisionary taxonomy (specimens, collections, taxon names), although there is far to go before the metadata associated with the specimens housed in the plethora of collections around the world are computerised and made available online, let alone digital images of the specimens. The BioCASE project (www.biocase.org) provides a mechanism to link specimen (unit)- and collection-level databases and has a web user-interface, enabling taxonomists to gain access to details of specimens such as their depositories, type-status and geographical location. The Species 2000 and ITIS Catalogue of Life (http://www.catalogueoflife.org/info_about_col.php) provides access to species names and associated data through its federated system of 47 distributed databases compiled (and owned) by a variety of authors, and the Global Biodiversity Information Facility (www.gbif.org) is an international body set up specifically to provide access to biodiversity information, currently mainly taxonomic names and specimens.

There is also a great deal of information on the Internet about species and higher taxa beyond this basic level. Yet few comprehensive treatments exist online that are equivalent to what we expect to see in monographic treatments. Most taxonomy on the Internet (leaving aside electronic journals) takes the form of HTML pages posted by individuals. Encouragingly, there is a growing number of websites developed by special interest groups for the exchange of information on particular taxa (e.g., www.tortricid.net; www.antweb.org and the ‘scratchpads’ of the EDIT project, see www.editwebrevisions.info/scratchpadSiteList). Notable among those websites that are more comprehensive in their coverage are the extensive knowledge bases on, among others, fishes (www.fishbase.org), bumblebees (http://www.nhm.ac.uk/research-curation/projects/bombus/index.html), and echinoids (http://www.nhm.ac.uk/research-curation/projects/echinoid-directory/).

We believe (e.g., Godfray 2002; Scoble 2004; Godfray et al. in press) that taxonomic revisions for each major taxon should be both accessible on the Internet and frequently updated. Ideally, each taxon would have a consensus taxonomy, including what is expected from a printed monograph, with an analytical treatment of all the included taxa. Unlike most printed monographs, online versions have the capacity to be more extensively illustrated and arranged so that the information is more appropriately presented than that typically offered in the succinct style of traditional monographs. Even more important, such online treatments have two other key benefits. One is that they are updatable within a short space of time, which is decidedly not the case with printed monographs. Typically decades pass between the appearance of successive comprehensive revisions, with additions being published in the interim as short, disjunct papers. The other benefit is that online versions have the capacity to be improved and expanded by the taxonomic community at large rather than just the author(s) of the treatment.

It has been suggested, inter alia, (Dayrat 2005) that new species names should be created for a taxon only if a recent taxonomic revision deals with all the names of the group. Although this and other proposals made in the same paper are probably impractical (Esselstyn 2007), Dayrat’s point is one that will resonate with many taxonomists. All too often the lack of access to up-to-date critical treatments leads to the description of synonyms, and for very many taxa no recent revision exists. Even in the case of such a conspicuous and relatively popular taxon as hawkmoths (Lepidoptera, Sphingidae), the previous global revision was published over 100 years ago (Rothschild & Jordan 1903). (The annotated catalogue to this group by Kitching & Cadiou (2000), which corrects and updates the names, was never intended to be a revision enabling specimens to be identified to species.) Dayrat’s understandable concern could be met if taxonomists were to focus their efforts on critically synthesising current knowledge and making it available at one place online, thus providing a platform for new species (and associated information) to be added.

Both cultural and technical factors are responsible for impeding the vision of user-friendly, online and rapidly updatable taxonomic databases. With regard to culture, taxonomists have sometimes shared their knowledge, but typically the production of monographic content has been done in a solitary fashion or perhaps with one or a few co-authors. This approach has been effective in many ways, given the demonstrably large corpus of paper-published information. Where it has fallen short is in the scattered nature of the literature on most taxa that it gave rise to, in effect rendering most of the information relatively inaccessible to those unable to benefit from the few large specialist taxonomic libraries. Dedicated users may buy monographs for their taxon or taxa

of interest, should they exist, and may also make a small personal collection of other key papers. But in a world where expeditious access to information through the Internet is becoming an expectation, the situation in taxonomy seems untenable. This has profound implications for taxonomists. We believe that taxonomy is a necessary part of human knowledge, whether for ‘academic’ or practical purposes. So should taxonomists fail to meet wider needs, users are likely to derive information wherever they can find it on the Internet, with everything that this implies for quality assurance.

Examples where attempts are being made to address the relative lack of comprehensive taxonomic content online are the Planetary Biodiversity Inventories (an initiative supported by the US National Science Foundation, which funds projects on several taxa), the recently launched Encyclopedia of Life project (www.eol.org) and the CATE project (www.cate-project.org), which is discussed in more detail below.

If the vision of online taxonomy in the form of web-based revisions is to be implemented, several impediments need to be addressed.

Impediments to an Internet-based taxonomy

Achieving effective collaboration

Getting the taxonomic community to collaborate on a scale as yet not achieved is a significant, even the major, impediment to effective Internet-based taxonomy. Shifting taxonomists’ collective mindset from an individualistic style of working to a collaborative one is problematic. What gains have been made suggest that technical developments, which enable data exchange across the Internet, are largely responsible for driving the change. Collaboration as it exists now often takes the form of online discussion (e.g., the Taxacom mailing list for biological systematics http://mailman.nhm.ku.edu/mailman/listinfo/taxacom) and of the linkage of databases of names and specimens (e.g., as in Species 2000 and BioCASE). Comprehensive and synthesized information on the Internet is limited, FishBase being a notable exception. Integration of taxonomic effort across Europe is the aim of the EDIT network (http://www.e-taxonomy.eu/). While developments are at an early stage, online content management systems, customized for taxonomists, are proving popular for posting data and exchanging information about selected taxa. The advantages of such ‘scratchpads’ are that they are cheap to set up and require little technical knowledge on the part of taxonomists building the websites. They are designed to allow and encourage wider engagement and collaboration by those interested in the taxon in question.

A possible reason for the difficulty in getting taxonomists to work as one is not that they are necessarily more intransigent than scientists in other fields, but rather that their ‘raw’ data (specimens, species identifications) are usually highly interpreted (to what species does a specimen belong? and how is a species delimited?). Taxonomists are more likely to debate the underlying interpretation of the fundamental data with which they populate databases than is the case in typical e-science projects in other fields, where the discussion is likely to be held once the analysis has been completed. Typical e-science projects usually involve the manipulation of massive quantities of numerical data for statistical analysis, modelling and simulation studies (e.g., as in climate change) and in this they often differ from taxonomy. This is not to suggest that taxonomy is alone in having an information base that is complex and heterogeneous. Much ecological work produces complex and context-specific output, and projects such as EML (Ecological Metadata Language) seek to develop the semantic structure for organising the field’s metadata. Yet while modelling can be undertaken using the results of taxonomic data (as, for example, in the BiodiversityWorld project, http://www.bdworld.org/, White et al. 2003), dealing with taxonomic databases is an exercise in organizing less extensive datasets but often with more complex data. This situation points to more debate and dissent at the data-gathering stage, thus impeding the building of agreed collaborative taxonomies.

Moving from the comfort-zone of a fixed medium

Print on paper, the archetypal fixed medium, has been very effective in providing taxonomy with dated reference points for establishing names and other taxonomic acts. Indeed, in general people are wary of an entirely electronic medium because it lacks the comfort of a physical object. Books, it can be argued, have lasted in libraries despite wars, plagues and famines, and a paper written in 1700 can still be read today. Works that consist entirely of 0s and 1s are distrusted because people are uncertain about archiving digital information. Everyone has lost data on a computer at some time or another and it is understandable that this concern is generalized to the electronic medium as a whole. So a further impediment to Internet taxonomy is the perceived loss of the stabilising role of paper. Having a precise date of publication for additions and nomenclatural changes (e.g., describing new species and


synonymising names) has always been important. Protocols, as standards, are essential for nomenclatural stability (e.g., an older name takes priority over a more recent name when the names refer to the same concept), which is why taxonomists (usually) follow the rules and recommendations of the Codes, despite the intricacies of these documents. Furthermore, a fixed medium (particularly paper) is capable of being easily archived, which is an important matter for fixing dates of publication and authorship of all the numerous additions and changes that have been made to taxonomy since the baseline works of Linnaeus (1753, 1758). For both the zoological and botanical Codes, a fixed medium is a requirement for valid publication of a new taxonomic name. Publication on “read-only laser disks” is allowed in zoological nomenclature (ICZN 1999), although not the botanical equivalent, because the medium allows “numerous identical, durable and unalterable copies” to be made.

Printed articles also have the advantage of providing an author or authors with unequivocal accreditation. This means more than just recognition. A significant means of assessing the quality of professional taxonomists is by the number and quality of their publications, with most funding bodies using the peer-reviewed output of applicants in their decision-making process when awarding grants.

Providing quality control

A third impediment to changing taxonomy into an Internet-based discipline is the need to reassure users of the veracity of the data posted, or at least to provide them with some means of assessing the quality of a given taxonomic website. Taxonomic information is associated with species names (and the names of higher taxa), and erroneous mapping of names and data can have serious consequences (e.g., in selecting organisms for biological control). Such a problem is present in any medium, fixed or otherwise, but the usual peer-review system adopted for many taxonomy papers helps reduce the number of errors. We consider how a peer-review system can be incorporated into a web-based revision in our discussion of the CATE project (below). High quality data content is essential in taxonomy given the role the discipline plays as scientific infrastructure and nowhere is this truer than in the need for a stable nomenclature.

Complexity

A fourth impediment lies in the complexity of delivering this new world. Hardware and software development and usage impose greater demands on taxonomists than preparing conventional manuscripts using word-processing software. Furthermore, writing software for demanding taxonomic applications with the typically limited resources available to developers on taxonomic projects is also a significant impediment to progress. Printed publications allow for subtleties to be included that may require highly complex software to be written to match the flexibility.

Sustainability

Finally, there is a real concern over the sustainability of databases (and thus websites that stem from them), whether for taxonomy or other subjects (e.g., Merali & Giles 2005). Whatever may be the shortcomings of our paper-based system of taxonomy, sustainability (fragmented and inaccessible as the literature on a taxon may be) is a strength, particularly given that the production of multiple identical copies of a publication is required by the codes of nomenclature. The requirement for new names to be published on paper protects the nomenclature system at present, but the view that the overall taxonomic system will be capable of being sustained in this way seems naive, or even complacent, given that so much information is being posted on the Internet. Moreover, there is privately published taxonomy or taxonomy published in the grey literature lying at the boundary of Code-compliance, which causes some confusion in the literature.

Why move taxonomy to the Internet?

There are socio-political and technical reasons for embracing the Internet as the primary medium for taxonomy. Anyone with an Internet connection and the ability to use simple protocols can build web-pages and thus post their taxonomic data and opinions based on those data. Currently the codes of nomenclature do not permit publication of new names in this way; fixed means of publication remain a requirement. But given that breaching nomenclatural regulations is not a criminal offence, the only reason why the rules of nomenclature are followed is that the practitioners choose to support them. Given the ever increasing use of the Internet as an information provider in so many areas of human endeavour (e.g., Benkler 2006), compliance is much less likely in the medium to long term. While professional taxonomists are likely to follow the rules, they are a diminishing band. Indeed, much of the expertise for many taxa lies outside the professional community already, and the extent of that may very well increase in the future. Moreover, if users of taxonomic information do not have their needs satisfied, they may build utilitarian systems using

molecular barcodes or ad hoc numbering systems, thus losing the enormous value and associated knowledge inherent in the Linnaean system.

Fortunately, many taxonomists are keen to engage with what the Internet has to offer. But it is also important that they help shape the way in which the technology is used, for otherwise there is a distinct danger that an anarchic system will prevail with a multitude of taxonomies on offer to an increasingly confused user community. The Internet is an inherently uncontrolled platform, so what is meant here is that taxonomists should work together to achieve taxon websites that are rendered authoritative by virtue of their quality, as places on the Internet to which users from different domains will gravitate as their first port of call. The alternative to a single place on the Internet, whether it is a consolidated, organised source of information or a ‘mashup’ (e.g., Butler 2006) derived from aggregation technology, is one in which users will increasingly find what they can from the results of web searches, with all that this means in terms of data quality.

The taxonomic landscape

The taxonomic landscape is complex and, with the growth in collaboration and developing technology, changing rapidly. One way to comprehend the change is to see how much it has diverged from the static system.

Static systems

Zootaxa is a highly successful journal for publishing descriptive taxonomy. The papers are published and disseminated primarily as PDF files – relatively few paper copies being printed. The success of Zootaxa (see http://www.mapress.com/zootaxa/index.html) lies partly in its flexibility. Papers of all lengths are published, without page charges, and authors receive a free high resolution and printable PDF file of their article for personal use and for sharing with individual scientists. There are no charges for online publication of colour illustrations. Authors are able to provide open access to their paper at a cost of US$20 per page. Where open access has not been enabled, readers can buy individual papers, instead of whole issues, at the rate of 1 cent per page for a PDF version. Zootaxa articles are all peer reviewed, and publication is rapid compared with most high quality taxonomic journals, the aim being to publish manuscripts within a month of acceptance of the final version. Rapidity of publication is another factor in the success of the journal.

But despite the impressive success of Zootaxa, the limitations of a static medium are apparent. First, even with rapid publication, static systems do not allow constant updating of a classification and knowledge base in the integrated way possible in online productions. Second, static outputs obviously fail to expose the underlying data in such a way that they can be used for computational analysis in answering biodiversity questions or for visualisation of data.

Intermediate systems

Individual websites that post information as HTML pages allow individuals to make data available on their taxon of interest and update it as often as the owner wishes. Custom built software (e.g., Fact Sheet Fusion http://www.lucidcentral.org/fusion/) has been developed specifically for taxonomists to create species web pages. The great advantage of such websites is that they can be changed to incorporate new knowledge and that they are accessible freely on the Internet. But such systems exhibit limitations apparent in static systems. For example, such ‘flat’ pages are typically the product of a single individual and are not reviewed, rendering quality variable. Depending on the inclination of the owner of a particular website, content can be updated and corrected by feedback from users, but the sites do not offer a sophisticated collaborative environment. Here again, the absence of underlying databases limits the computational use to which the data can be put.

Editable and dynamic systems

Updatable taxonomic systems vary in the degree to which they can be termed dynamic (in the sense of being immediately updatable), and editable seems a more appropriate word for most. Such sites include collaborative taxonomic websites (e.g., ‘scratchpads’); websites that encourage incremental additions (such as new species) and taxonomic changes with a peer-review system incorporated into the workflow (e.g., the CATE project); and species pages that are formed ‘on the fly’ (‘mashups’) by means of aggregation technology (e.g., iSpecies, www.ispecies.org).

‘Scratchpads’ (V. Smith, pers. comm.) are produced using the open source content management platform, Drupal (http://drupal.org/), which has been customised for taxonomy and provides a space on the web for encouraging collaboration between taxonomists and the posting of content (see http://www.editwebrevisions.info/scratchpads). All kinds of content, whether of text or illustrations, can be imported into a database by users completing forms via a web browser. Significantly, the content can be changed or additions made to it by those who are prepared to sign up to a given site.

The CATE project, which is described in more detail in the next section, and some of the Planetary


Fig. 1 (panels a and b). CATE species page for Xylophanes chiron (Drury, 1773) (Lepidoptera: Sphingidae), as implemented on the site in August 2007.

Biodiversity Inventory (PBI) projects are aimed broadly at forming taxonomic web revisions with images, descriptions and keys on targeted taxa. The PBIs exhibit considerable variation in their presentation and structuring, while the CATE project is attempting a structured but flexible approach to web-based taxonomy by building an application that can be widely adopted by taxonomists whole or in part. PBIs (e.g., http://slimemold.uark.edu/databaseframe.htm; http://silurus.acnatsci.org/; http://www.nhm.ac.uk/research-curation/projects/solanaceaesource/; http://research.amnh.org/pbi/) are funded by the United States National Science Foundation and the CATE project by the UK Natural Environment Research Council. These projects depend significantly on synthesising existing knowledge (a major need for users of taxonomy), although adding new knowledge is encouraged.

A truly dynamic approach to web-based taxonomy is taken by iSpecies, developed by R. Page (see www.ispecies.org), which builds species pages dynamically on demand from a few sources – images from Yahoo Image Search, literature from Google Scholar and genomic data from the National Center for Biotechnology Information. This is a demonstrator website, although not without practical value. With its incorporation of semantic web elements, it is a pointer to the future of taxonomy. iSpecies uses what already exists and the species pages it presents provide data aggregated from each source or links to such data, there being no attempt to integrate the information synthetically. But iSpecies underlines the point that if the taxonomic community does not provide the syntheses, users will rely increasingly on aggregation technology to provide information ‘mashups’.

Aggregations are of special relevance in entomology: given the large numbers of species for which information needs to be gathered, there is a considerable attraction in weaving together data from across the web. While detailed treatments are highly desirable for the quality of their output, aggregations give immediacy and provide a data platform on which more detailed and edited treatments can be built.

Global initiatives

The global information system on fishes (FishBase, www.fishbase.org) is a particularly impressive existing Internet-based synthesis for a major taxon, covering about 30,000 species. Two features of FishBase are



notable in the context of this paper. The first of these is that FishBase is not primarily a taxonomic site, but one that includes extensive data of a wider biological kind. It is described on its website as a global species information site. All good quality taxonomic revisions include information that is often described under the heading ‘biology’, but FishBase has been developed as a biological knowledge base. As such it uses taxonomy as a means to deliver the included knowledge rather than treating it as an end in itself. The second is that FishBase is built on information that is already published, and this is incorporated to form a product with considerable added value from the resulting synthesis – a synthesis that is constantly updated with information from new publications. This approach solves the problem of accreditation,


for all additions to FishBase are citable. It also means that the original information is sustained by virtue of its publication on paper, although without FishBase the synthesis, and the capacity for it constantly to be updated, would be lost.

FishBase is being used as the model for a new project, SpeciesBase (www.speciesbase.org), currently (July 2007) still at the concept stage, which aims to provide ‘free access to all knowledge about life on Earth’. If this aim sounds bold, the fact that it is modelled on FishBase and proposed by those responsible for that project means that the initiative should by no means be treated as mere hyperbole. Species pages will be produced in a common format providing information on, for example, higher classification, size, habitat, climatic zone, distribution and biology. In addition, links to other sources of information are provided. Distributional point data will be used to create maps. The SpeciesBase portal is intended to make use of existing databases on organisms by building a cache populated from them. These databases will continue to use their own portals and so serve additional content to that available from the SpeciesBase pages.

A key difference between SpeciesBase and search engines and text-based encyclopaedias is that the underlying information will be structured to render it suitable for scientific analyses across all aspects of biodiversity.

One approach to consolidating information is to accept that the process of posting information on the Internet will inevitably be ad hoc rather than synthetic, and then to use aggregation technology to amalgamate data into raw species pages. These pages are then available for editing by taxonomists as a wiki. This approach, broadly, is that being adopted by the Encyclopedia of Life project (www.eol.org). The goal of EoL “is to create a constantly evolving encyclopedia that lives on the Internet, with contributions from professional and non-professional scientists alike; to transform the science of biology, and inspire a new generation of scientists, by aggregating all known data about every living species; and ultimately, to increase our collective understanding of life on Earth, and safeguard the richest possible spectrum of biodiversity.” A “ballpark estimate” was given for the full encyclopaedia to be produced in about ten years from the start of the project (2007).

Delivering this project will be demanding. The first impediment is the very limited amount of information on most of the 1.8 million species presently existing on the Internet (or on paper for that matter), and the quality of what is available is variable. Some information will come from the Biodiversity Heritage Library initiative within EoL, which has already started to digitise the taxonomic literature (although significant copyright issues will have to be addressed if access is to be provided to recent publications). The second, and potentially larger, challenge will be to encourage the taxonomic community to edit and enhance the crude species pages and then keep them updated as new knowledge becomes available.

Creating a Taxonomic E-science (CATE)

The CATE project was developed to provide Internet-access to the kind of taxonomic information one would expect to gain from conventionally published revisionary taxonomy. However, it goes further than this in its proponents’ wish to make the content user-friendly (by incorporating numerous illustrations for example) and, particularly, in providing a means for the content to be frequently updated and peer-reviewed. A structured database underlies CATE with the potential for serving data directly to the user and to global species information systems. Although CATE will also allow unmoderated contributions (“wikis”) and potentially information gathered using aggregation software (“mash-ups”), its emphasis is on using the skills of taxonomists to create the content. The success of CATE is, therefore, rightly judged not in scaling up to include all described species in a short period of time, but in the degree of success achieved in utilizing taxonomic expertise and providing a usable, but customizable, system to make content development for any given taxon Web-accessible.

The wider taxonomic community is invited to add to, edit and emend these data by submitting corrections or new information for potential incorporation into the Web-revision. Specifically any author will be able to submit online proposals (such as new species or acts of synonymy) for peer-review, or make simple contributions such as a new locality or observation for a particular species that might subsequently be included in a more major revision.

The CATE project is described in more detail at www.cate-project.org and in Godfray et al. (in press). Here, we focus on some specific issues that we wish to highlight. The first concerns the suggestion (Godfray 2002; Godfray et al. in press; Scoble 2004) that posting an agreed, but updatable, taxonomy for any particular group of organisms is desirable. Such a ‘consensus’ taxonomy will help users find a name and the information linked to that name. The second consideration concerns the embedding of a peer review system into the application.

Two taxa were selected as demonstrators for the project, hawkmoths (Sphingidae) and Araceae (aroids – the arum lily family). Both are conspicuous

Downloaded from Brill.com09/28/2021 04:57:49PM via free access Scoble et al.: Revisionary taxonomy in a changing e-landscape 313

and have a sufficient number of species to develop and test the system and to offer worthwhile content to users. Sphingidae also represent , for which there are special challenges in taxonomy given the great number of described species and the enormous number that surely remain undescribed. The CATE species page for the sphingid, Xylophanes chiron (as implemented in August 2007), is shown in Figure 1.

Consensus taxonomy
The word ‘consensus’ is taken to mean a general acceptance that the taxonomy presented is, overall, a reasonable representation of the state of knowledge at a given time. It does not mean that all taxonomists necessarily agree about the details. Our suggestion that we should aim for a consensus taxonomy has caused some discomfort both among taxonomists and those from the computing community working on biodiversity informatics. The argument against a consensus taxonomy from taxonomists is that specialists simply do not agree about all aspects of the classification of a given group of organisms and that to suggest otherwise means that scientific discourse is being suppressed for expediency. Those from the Information and Communication Technology (ICT) community are concerned because the Internet is embracing a freely interactive environment, particularly with the Web 2.0 philosophy of encouraging fluidity of peer to peer interaction. A consensus taxonomy sounds counter to the desired fluidity.

Our position, in fact, is not one of suppression, nor, indeed, do we suggest that the taxonomy of any group should remain unchanged. Rather, we believe that the needs of the wider user community should be respected and that a best estimate of the current position of the taxonomy and nomenclature of the group in question should be provided. Taxonomists, after all, are fond of pointing out that their work underpins other disciplines by providing a stable nomenclature. While any decent (printed) monograph or revision will consider alternative concepts of species or other taxa, it will invariably propose a single classification with reasoned arguments for the taxonomic concepts adopted. The CATE site is sufficiently flexible to include discussion of alternatives in its output and, as with printed revisions, it provides a recommended classification based on the information available at any one time. Furthermore, although we intend to present a consensus taxonomy, alternative hypotheses will be posted on a separate part of the website.

CATE has two advantages over printed revisions in this respect. First, as an online system it can be updated far more quickly than awaiting the next comprehensive treatment. Second, far from restricting the input of the wider community, it actively encourages such engagement. Most users welcome comprehensive treatments of taxa, especially for the purpose of identification. We suggest that they will value a single, authoritative source of information rather than one where a choice of names and concepts is presented. Yet the CATE system encourages proposal and critical analysis of differing taxonomic concepts. While species concepts in traditional monographs have always been challenged, we have never heard of a monograph being criticized for aiming to present a single ‘best’ classification on the evidence available at the time of publication.

CATE sites are expected to gain authority by peer review and not by top down imposition. Editorial boards for the two CATE demonstrator taxa were organised in much the same way as they are for scientific journals: editors and board members were invited because of their expertise and judgement, a process no more or less subjective than is typical in shaping scientific publication. The main difference from most journals is that open peer review permits a wider set of opinions to be gained from the community at large instead of the more usual two or three referees. The open system of the internet allows competing websites to be constructed. We hope that at least most taxonomists (as well as the wider user community) will benefit from engaging with a single branded site as their first port of call, but we are entirely comfortable with the fact that user selection will determine the authority of any CATE website.

Peer review
Online peer review has been incorporated into the CATE workflow as the means of quality control. This is expressed graphically in Figs 2 and 3. Fig. 2 illustrates the steps in the process, which starts with the existing taxonomy in place in the CATE system. Potential additions can be made to the existing taxonomy by an author submitting a proposal, such as a new species or a synonymy, in the form of a web page. Once a proposal is submitted it will be posted on the CATE website and made open for peer-review by anyone who wishes to comment. Following the review process and the incorporation of any changes to the proposal, an editor (or editorial committee) decides, perhaps with advice from a moderator, whether to incorporate the proposal into the consensus taxonomy, or to post it on an un-moderated part of the website. In addition a contributor may wish to submit a less substantive addition, such as a new specimen record or observation that enters the unmoderated part of a taxon page. It is important to appreciate that labels such as author, contributor and editor refer

Downloaded from Brill.com09/28/2021 04:57:49PM via free access 314 Tijdschrift voor Entomologie, volume 150, 2007

Fig. 2. Peer-review process for the CATE project. Changes, in the form of web pages, are proposed to the current consensus classification (Version N). Online review by the peer-community leads to revisions by the proposer of the change or new information. A process of moderation by means of an editorial board results either in acceptance of the new proposal (Version N+1 becomes the new consensus) or the posting of the proposal as an alternative view. [Figure panel labels: Version N; Proposed Changes; Review; Moderation; Version N+1; Alternative Theories.]
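The review-and-moderation cycle in Fig. 2 can be read as a small state machine. The Python sketch below is purely illustrative and is not the CATE software: the class names `Proposal` and `Revision`, the `Status` values and the method names are assumptions made for this example, and the taxon names are the placeholders used in the figure.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical states, named after the steps described for Fig. 2.
    SUBMITTED = "submitted"      # posted on the site, open for review
    ACCEPTED = "accepted"        # merged into the consensus taxonomy
    ALTERNATIVE = "alternative"  # posted as an alternative view


@dataclass
class Proposal:
    author: str
    summary: str                 # e.g. a new species or a synonymy
    comments: list = field(default_factory=list)
    revisions: int = 0
    status: Status = Status.SUBMITTED


@dataclass
class Revision:
    version: int = 1             # "Version N" of the consensus classification
    consensus: list = field(default_factory=list)
    alternatives: list = field(default_factory=list)

    def review(self, proposal, reviewer, comment):
        # Open peer review: anyone may comment on a submitted proposal.
        proposal.comments.append((reviewer, comment))

    def revise(self, proposal, new_summary):
        # The proposer may amend the proposal in response to review.
        proposal.summary = new_summary
        proposal.revisions += 1

    def moderate(self, proposal, accept):
        # Editorial decision: accept into the consensus (Version N+1)
        # or post the proposal as an alternative view.
        if accept:
            self.consensus.append(proposal.summary)
            self.version += 1
            proposal.status = Status.ACCEPTED
        else:
            self.alternatives.append(proposal.summary)
            proposal.status = Status.ALTERNATIVE
        return proposal.status


# Example run with the placeholder names from the figure.
site = Revision(consensus=["Aus aus L."])
p = Proposal(author="A. Author", summary="Bus aus is a synonym of Aus aus L.")
site.review(p, "R. Reviewer", "Please cite the type material examined.")
site.revise(p, "Bus aus L. is a junior synonym of Aus aus L.")
site.moderate(p, accept=True)
```

In this sketch a rejected proposal is not discarded: it is kept in `alternatives`, mirroring the statement that alternative hypotheses are posted on a separate part of the website, while the consensus version number only advances on acceptance.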

to functions in the system rather than to individuals. In CATE we have actually appointed an editorial board of taxonomists for each of the demonstrator taxa. But the editorial function will need to be trialled to find a system that works best.

Embedding such a peer-review process into the workflow is demanding, both to write the software and to implement. Implementation involves gaining agreement from taxonomists to participate in the process, to edit the submissions and decide on the outcomes.

Constructing a consensus classification in the first place is likely to be driven by individual taxonomists, or networks of taxonomists, who are key figures in the taxonomy of their group. While it is possible that radically different views exist within taxonomic


Fig. 3. Workflow for the CATE project. [Figure labels: author, reviewer, consumer; submission, quality control, import; data; CATE website.]

communities, we believe that networking between taxonomists is growing. Evidence for that view is the existence of many taxon-centred meetings and congresses. The selection of members for editorial boards is unlikely to differ from the way these bodies emerge for scientific journals.

It is preferable to look at peer-review in the broader context of quality control, which is a challenge for publishing in any medium. Given that users of scientific information, whether specialists in the field or not, will use the Web with increasing frequency, it is likely that peer-review as we know it will come under increasing competition from alternative means of quality control (e.g., Arms 2002).

Requirements capture
In the CATE project, a considerable amount of time was devoted to an iterative requirements-capture process between the taxonomists and the computer scientist. Turning higher level goals into the reality of a computerised application is a salutary procedure requiring precise articulation of what is needed at a fine level. We consider that undertaking this process is an essential (and educational) step for all e-taxonomy projects.

Conclusion
Revisionary taxonomy is undergoing both change and revival (see contributions in Godfray & Knapp 2004). The opportunity for change has arisen mainly because of what Information and Communications Technologies have offered in terms of data storage, manipulation, exchange and accessibility. In short, the Internet is vastly improving knowledge transfer. Prior to these technical possibilities, taxonomy was destined to grow as a cumulative series of publications. This mass of literature is punctuated by monographic treatments or revisions, which act as beacons in a sea of fragmented information for users because of the syntheses they provide. These syntheses, however, become out of date with new discoveries, but new monographs typically take many years to complete. Therefore, what might be termed the taxonomic infrastructure remains poorly accessible, which is not to discount the many worthwhile keys, guides and inventories to various groups that are more widely available.

Almost seventy years ago, taxonomy was already falling out of favour. Julian Huxley suggested (Huxley 1940) that taxonomists should escape the “burden” of description and naming and engage with other areas of biology (Scoble in press). This suggestion probably arose because much of the value of good quality descriptive taxonomy was effectively hidden as there was no means of synthesising it into a comprehensive infrastructure. Under such circumstances, it is understandable that taxonomy declined in status. The message is clear enough: taxonomists need to offer a seamless infrastructure to those users requiring


information about species (such as identification), and to those who wish to use the data to explore questions in biodiversity, such as modelling the shifting distribution of organisms in response to climate change.

Such an infrastructure has a real chance of delivery thanks to the opportunities offered by the capacity to structure information into databases and provide access to this structured information across the Internet. The situation has been well described by Bowker (2000) who noted that the development of a database can now be seen as an end in itself. Whereas in the traditional model of science, data were embedded in the same paper as the hypotheses being tested, modern database development has led to what Bowker calls the “partial disarticulation” between data gathering and analyses based on those data. The importance of taxonomic data for examining biodiversity ‘issues’ means that there are strong socio-political reasons as well as scientific ones for taxonomists to collaborate to produce a much needed infrastructure. These reasons, indeed, are stronger than they have ever been.

An implicit threat to the cohesiveness of taxonomy is that should the discipline be treated essentially as an information science, its boundaries might dissolve as databases converge from other domains to form wider biodiversity datasets. Biodiversity data is not just taxonomic (essentially temporal), but also ecological (essentially spatial) (e.g., Bowker 2000). Such dissolution is surely a price worth paying given the potential impact of such an emergent, data-driven discipline.

Might this mean taxonomy is dead – long live taxonomy? The existence of the World Wide Web and the possibilities it offers for data exchange suggest that revisionary taxonomy in its current format is ripe for transformation into something more akin to knowledge bases or species information systems. This does not mean that the functions, purpose and protocols of taxonomy should or will be lost. It suggests, however, that all taxonomists should be conscious that the effort they put towards studying and revising their taxa is a contribution towards the greater vision of providing authoritative, Internet-based information on all species. As such, all work should be carried out in the context of the new taxonomic landscape currently being defined by global mega-projects such as Encyclopedia of Life, which aims to provide the means of producing web-pages for all species. In particular, taxonomic information should also be provided in a way that is sufficiently structured and atomised for it to be rendered capable of being used in analysis of biodiversity questions. Taxonomists should embrace this new world with enthusiasm and awareness, even if translating the current world into the new digital one will be painful and frustrating. It requires that we come to terms with new technologies and new ways of working – particularly in the form of distributed collaboration. But if we make the changes, we shall have the dual benefit of seeing our rich sources of taxonomic information given the value that they deserve by users and a greater say in engaging iteratively with these users to the benefit both of producer and receiver.

Acknowledgements
We acknowledge all our colleagues on the CATE team for discussion. We thank Thomas Pape (Natural History Museum of Denmark, Copenhagen) and Rienk de Jong (Naturalis, Leiden) for helpful comments on the manuscript and Erik van Nieukerken (Naturalis, Leiden) for inviting this paper. We are very grateful for the support of the UK Natural Environment Research Council, grant numbers NE/C001532/1; NE/C51588X/2; NE/C515871/1.

References
Arms, W.Y., 2002. What are the alternatives to peer review? – The Journal of Electronic Publishing 8(1). http://www.press.umich.edu/jep/08-01/arms.html
Bowker, G.C., 2000. Biodiversity Datadiversity. – Social Studies of Science 30: 643–683. DOI: 10.1177/030631200030005001.
Butler, D., 2006. Mashups mix data into global service. – Nature 439: 6–7.
Dayrat, B., 2005. Towards integrative taxonomy. – Biological Journal of the Linnean Society 85: 407–415.
Esselstyn, J.A., 2007. Should universal guidelines be applied to taxonomic research? – Biological Journal of the Linnean Society 90: 761–764.
Godfray, H.C.J., 2002. Challenges for taxonomy. – Nature 417: 17–19.
Godfray, H.C.J. & S. Knapp (Eds), 2004. Taxonomy for the twenty-first century. – Philosophical Transactions of the Royal Society 359.
Godfray, H.C.J., B.R. Clark, I.J. Kitching, S.J. Mayo & M.J. Scoble, In press. The web and the structure of taxonomy. – Systematic Biology.
Hine, C., 2003. Systematics as cyberscience: the role of ICTs in the working practices of taxonomy. – Paper presented at the Oxford Internet Institute “Information, Communication and Society” symposium, 17–20 September 2003, University of Oxford, UK, http://www.soc.surrey.ac.uk/pdfs/hine_oii.pdf
ICZN [International Commission on Zoological Nomenclature], 1999. International Code of Zoological Nomenclature (Fourth Edition). – [http://www.iczn.org/iczn/index.jsp]


Kitching, I.J. & J.-M. Cadiou, 2000. Hawkmoths of the world: an annotated and illustrated revisionary checklist. – The Natural History Museum, London & Cornell University Press, New York.
Linnaeus, C., 1753. Species Plantarum. – Stockholm.
Linnaeus, C., 1758. Systema Naturae, 10th Edn. – Holmiae.
Merali, Z. & J. Giles, 2005. Databases in peril. – Nature 435: 1010–1011.
Microsoft, 2006. Towards 2020 Science. – Microsoft Corporation, 82 pp. (http://research.microsoft.com/towards2020science/background_overview.htm)
Rothschild, L. W. & K. Jordan, 1903. A revision of the lepidopterous family Sphingidae. – Novitates Zoologicae 9: Suppl. 1–972.
Scoble, M.J., 2004. Unitary or unified taxonomy? In H.C.J. Godfray & S. Knapp (Eds.), Taxonomy for the twenty-first century. – Philosophical Transactions of the Royal Society 359: 699–710.
Scoble, M.J., In press. Networks and their role in e-taxonomy. – In Q.D. Wheeler (Ed.), The New Taxonomy. Systematics Association Special Volume Series
White, R., F. Bisby, N. Caithness, T. Sutton, P. Brewer, P. Williams, A. Culham, M. Scoble, A. Jones, W. Gray, N. Fiddian, N. Pittas, X. Xu, O. Bromley, P. Valdez, 2003. The BiodiversityWorld environment as an extensible virtual laboratory for analysing biodiversity patterns. – In: S.J. Cox (Ed.), Proceedings of the UK e-Science All Hands Meeting, Nottingham, UK, EPSRC: 341–344.

Received: 24 July 2007
Accepted: 1 September 2007
