
AquaRES tools for taxonomic editors & external users

Leen Vandepitte

On behalf of the WoRMS DMT

• AquaCache – tool for comparing taxonomic checklists

• Improved data services in the framework of AquaRES
  – Taxon match services
  – Occurrence checking services
  – General quality control and data format checking services

• Hands-on demo of data services

• Data exchange and tools targeting international initiatives

AquaCache

• From the project description:
  – We will design and build a central data cache linking the three databases FADA, WoRMS and RAMS.
  – This data cache will be hosted at VLIZ and is primarily meant to act as an internal system for running common web services in terms of taxon matching and data cleaning & refinement.
  – Each of the databases will retain its import, update mechanism and quality control.

• In reality:
  – The goal of the AquaCache is to serve as an internal data management tool that helps the involved editors search through and compare lists, and identify possible overlaps and discrepancies between lists available in more than one register, e.g. at the level of higher classification or the status of a name.
  – Lists can be uploaded into the AquaCache as Darwin Core Taxon files and need to include the Species Profile extension, indicating whether a taxon is marine, fresh, brackish or terrestrial.
  – The functionalities of the AquaCache will be broadened in the future, depending on the needs of the involved editors.

  – After the AquaRES project (2013-2016), the AquaCache will continue under the LifeWatch project, where its functionalities and applications can be further developed.

Work in progress…

• AquaCache = management tool
• Search & compare the involved systems (“search”)
• Colour codes:
  – Green: exact match (taxon name + higher classification)
  – Yellow: taxon match identifies inconsistencies between the names (e.g. spelling variation)
  – Red: higher classification is different
• Environment:
  – !: flag missing in WoRMS
  – X: flags differ between WoRMS & FADA
  – V: flags correspond between WoRMS & FADA

Demonstration of AquaCache

http://aquacache.lifewatch.be/

Improved data services in the framework of AquaRES:

An overview, including some demonstrations

A. Taxon match services

B. Occurrence checking services

C. General quality control and data format checking services

Taxon match services

• This service allows users to match their taxonomic list to available online standards, including:

  Available:
  – WoRMS
  – Catalogue of Life (CoL)
  – Pan-European Species Infrastructure (PESI)
  – Integrated Taxonomic Information System (ITIS)
  – Index Fungorum (IF)
  – Paleo-DB
  – Global Names Index (GNI)
  – International Plant Name Index (IPNI)

  In development:
  – FADA
  – Interim Register of Marine and Non-Marine Genera (IRMNG)
  – RAMS
  – FishBase

• Option to search all of the listed taxonomic standards or just a selection
• Taxon match tool => available through LifeWatch: www.lifewatch.be/data-services

Occurrence checking services

Plotting sampling locations on a map:
• Enables a quick visual quality check of the data
• Users are able to detect possible errors in the coordinates

Common mistakes:
– Switching of latitude and longitude
– Lack of a minus sign to indicate West or South

=> These kinds of flaws can easily be fixed by the user, improving the overall quality of the data.
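The two common mistakes can also be flagged automatically before plotting. The sketch below is only an illustration and not the LifeWatch implementation: the function names and the optional bounding box of the expected sampling area (`bbox`) are assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical helper type: (min_lat, max_lat, min_lon, max_lon) of the area
# in which the samples are expected to fall.
BBox = Tuple[float, float, float, float]


def in_bbox(lat: float, lon: float, bbox: BBox) -> bool:
    min_lat, max_lat, min_lon, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def diagnose(lat: float, lon: float, bbox: Optional[BBox] = None) -> List[str]:
    """Return hints about likely coordinate mistakes (empty list = no hint)."""
    hints: List[str] = []
    # A latitude outside [-90, 90] paired with a longitude that would be a
    # valid latitude suggests that the two columns were switched.
    if abs(lat) > 90 and abs(lon) <= 90:
        hints.append("latitude and longitude look switched")
    if bbox is not None and not in_bbox(lat, lon, bbox):
        # Try the sign flips first: a hit means a minus sign is probably missing.
        for new_lat, new_lon, label in (
            (lat, -lon, "longitude sign (East/West)"),
            (-lat, lon, "latitude sign (North/South)"),
            (-lat, -lon, "both signs"),
        ):
            if in_bbox(new_lat, new_lon, bbox):
                hints.append(f"inside the expected area after flipping {label}")
                break
        else:
            # No sign flip helped; test whether switching the columns does.
            if in_bbox(lon, lat, bbox):
                hints.append("inside the expected area after switching latitude and longitude")
    return hints
```

For example, `diagnose(50.0, 20.0, (40, 60, -30, -5))` reports a probable missing minus sign on the longitude. Without a bounding box only the out-of-range case can be detected, which is why a visual check on a map remains useful.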

Comparing your own occurrences with documented distributions
• Taking this further, users can also compare their occurrences with the documented distributions in the taxonomic databases (WoRMS, RAMS, FADA).
• Detection of possible errors or gaps can go both ways:
  – Gaps in your own data <=> gaps in the taxonomic database
  – Errors in your own data <=> errors in the taxonomic database
• DEMO: Show on map tool
  => available through LifeWatch: www.lifewatch.be/data-services
  => Usable for marine & non-marine locations

• DEMO: Compare own occurrences to documented distributions
  => soon available through LifeWatch: www.lifewatch.be/data-services
  => “only as good as the available data”
  => Still under development!

General quality control & data format checking services

From the project description:
• These services include e.g. mapping of the uploaded field names to a standard set of fields, highlighting non-matches or missing required fields, and checking the data format of e.g. the date-related fields.

• These quality control steps primarily target data providers, allowing them to easily check the format and content of their data before submission.

• Such quality control services were specifically being developed for data that will contribute to (Eur)OBIS (cfr. now largely replaced by IPT).

• Within this project, the different data formats used for WoRMS, FADA, SCAR-MarBIN, AntaBIF and BioFresh will be compared and where possible mapped to a common standard (e.g. Darwin Core) in order to build more generic web services for checking the quality and format of these data.

What has been done? => what is already out there?

• IPT – Integrated Publishing Toolkit (by GBIF): inherent checks when uploading your file(s)
  • Check whether occurrenceIDs are provided & unique (core-extension)
  • Check whether ‘basis of record’ is provided
  • Check data format:
    – EventDate as ISO-standard
    – Lat-lon as decimal degrees, with correct separator
    – Character encoding of file can be indicated (preferred: UTF-8)
    – IndividualCount field as ‘integer’

• Darwin Core Archive Validator
  – Checks the Darwin Core Archives: inspects files & compares the mapped concepts to GBIF extensions
  – Specific focus on unique identifiers that link the different extensions to the core table
  – http://tools.gbif.org/dwca-reports/148-7656490821008157004.html

• LifeWatch web services
  – Data format validation:
    • Latitude & longitude <> 0,0
    • Latitude & longitude within acceptable boundaries (latitude -90/90, longitude -180/180)
    • EventDate in correct format

=> DEMO
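The kind of checks listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code behind the IPT or the LifeWatch web services: records are assumed to be dictionaries of strings keyed by Darwin Core terms, the function names are invented for this example, and only full ISO calendar dates are accepted for eventDate (date ranges and date-times are not handled).

```python
from datetime import date
from typing import Dict, List


def check_record(rec: Dict[str, str]) -> List[str]:
    """Return the problems found in one occurrence record."""
    problems: List[str] = []
    if not rec.get("basisOfRecord", "").strip():
        problems.append("basisOfRecord missing")
    try:
        # float() rejects a decimal comma such as "51,2".
        lat = float(rec.get("decimalLatitude", ""))
        lon = float(rec.get("decimalLongitude", ""))
    except ValueError:
        problems.append("coordinates are not decimal degrees")
    else:
        if lat == 0 and lon == 0:
            problems.append("coordinates are 0,0")
        if not -90 <= lat <= 90:
            problems.append("latitude out of range")
        if not -180 <= lon <= 180:
            problems.append("longitude out of range")
    try:
        date.fromisoformat(rec.get("eventDate", ""))
    except ValueError:
        problems.append("eventDate is not an ISO date")
    count = rec.get("individualCount", "")
    if count:  # the field is optional, so only check it when filled in
        try:
            int(count)
        except ValueError:
            problems.append("individualCount is not an integer")
    return problems


def check_records(records: List[Dict[str, str]]) -> Dict[int, List[str]]:
    """Check all records; keys are 0-based row numbers that have problems."""
    seen = set()
    report: Dict[int, List[str]] = {}
    for i, rec in enumerate(records):
        problems = check_record(rec)
        occ_id = rec.get("occurrenceID", "").strip()
        if not occ_id:
            problems.append("occurrenceID missing")
        elif occ_id in seen:
            problems.append("occurrenceID duplicated")
        seen.add(occ_id)
        if problems:
            report[i] = problems
    return report
```

Returning a list of problems per row, rather than stopping at the first error, lets a data provider fix everything in one pass before submission.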

• A lot of tools already exist… no use re-inventing the wheel…

Data exchange with international initiatives

A. General overview

B. Catalogue of Life

C. Encyclopedia of Life

D. GBIF

Data exchange

• Largely automated

[Diagram; recoverable labels: GENETICS; Taxonomic Backbone; LifeWatch]

WoRMS – provider to Catalogue of Life

• Memorandum of Understanding - 2009
  – WoRMS as contributor to CoL, through its editorial network & global species databases
  – Data will be displayed in original form, without editing
  – Data are shared freely, but IPR remains with original custodians (=editors)

• Yearly updates of # Global Species Databases (2009-2012)
• Monthly automated updates to CoL, in defined exchange format (since 2013)
• 46 Global Species Databases delivered to CoL, with monthly updates


[Figure: the 46 Global Species Databases; 113,764 species names across 23 (sub)phyla; callouts: 1 species, 35,030 species]

Tantulocarida, Acanthocephala, Bochusacea, Brachiopoda, Ophiuroidea, Phoronida, Acoelomorpha, Brachypoda, Cephalochordata, Oligochaeta, Remipedia, Brachyura, Orthonectida, Asteroidea, Myxozoa, Bryozoa, Rhombozoa, Tanaidacea, Gnathostomulida, Gastrotricha, Cestoda, Octocorallia, Scaphopoda, Nemertea, Foraminifera, Trematoda, Chaetognatha, Placozoa, Merostomata, Thermosbaenacea, Polycystina, Hydrozoa, Kinorhyncha, Isopoda, Polychaeta, Leptostraca, Mollusca, Echinoidea, Porifera, Proseriata - Kalyptorhyncha, Xenoturbellida, Holothuroidea, Cumacea, Monogenea, Mystacocarida, Priapulida

• Encyclopedia of Life (EoL)

  – EoL gets access to all the WoRMS content (<=> Catalogue of Life)
  – MoU between WoRMS & EoL
  – Selected information:
    • Accepted taxon names
    • Higher classification
    • Distributions
    • Selection of notes
  – Data transfer based on monthly exports from WoRMS

http://eol.org/