Using Biodiversity Data from the NBN Database for Research

Paula Lightfoot, NBN Trust Data Access Officer

Introduction to the NBN Database

1. Overview of available data

2. Finding and accessing data

3. Evaluating data quality

4. Using and referencing data

http://data.nbn.org.uk

Summary of Available Data

• 91 million georeferenced taxon occurrence records.

• 27 habitat datasets and 44 site boundary datasets to provide context and act as filters.

• 856 datasets from 150 data providers.

• Standard data format.

• Standard taxonomy from the UK Species Inventory.

Data Providers

Records in the NBN Database by data provider type. January 2014 (n = 91,206,588)

• A large proportion of data comes from skilled amateur naturalists.

• Data are collated taxonomically and/or geographically.

• Some structured surveys, much ad hoc recording.

Geographic Coverage and Sampling Effort:

• Recorder effort and data mobilisation are not evenly spread across the British Isles.

• New NBN Gateway (v.5) extends coverage to include the Channel Islands and offshore data.

• The National Biodiversity Data Centre is the repository for Republic of Ireland (ROI) data.

Sampling Effort

• Collembola Recording Scheme: 10,633 records of 336 species over 200 years.

• BTO Second Atlas of Breeding Birds in Britain and Ireland: 1988-1991: 1,465,400 records of 272 species over 4 years.
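The contrast in sampling effort between these two datasets is easier to see as a rate. The short sketch below only divides the figures quoted on the slide; the dictionary keys and function names are illustrative, and the result says nothing about how evenly records are spread across years or species.

```python
# Figures as quoted on the slide; the structure and names are illustrative.
schemes = {
    "Collembola Recording Scheme": {"records": 10_633, "species": 336, "years": 200},
    "BTO Second Atlas of Breeding Birds": {"records": 1_465_400, "species": 272, "years": 4},
}


def records_per_year(scheme: dict) -> float:
    """Average number of records per year of the recording period."""
    return scheme["records"] / scheme["years"]


def records_per_species(scheme: dict) -> float:
    """Average number of records per species."""
    return scheme["records"] / scheme["species"]


for name, scheme in schemes.items():
    print(
        f"{name}: {records_per_year(scheme):,.1f} records/year, "
        f"{records_per_species(scheme):,.1f} records/species"
    )
```

On these figures the bird atlas averages several thousand times more records per year than the springtail scheme, so raw record counts should not be compared between taxa without allowing for effort.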

Orchesella villosa (a springtail) http://tombio.myspecies.info/

Taxonomic Coverage

Taxonomic breakdown of records in the NBN Database at January 2014 (n = 91,269,685)

Currency of Data

Number of records in the NBN Database by year of record (January 2014) n = 89,091,428 (98% of total)

Data Attributes

Standard attributes in NBN Exchange Format:

• Required (what? where? when?): unique record key, taxon, date, date type, coordinates/grid reference/polygon, projection, precision.

• Optional: survey key, sample key, absent, sensitive, site key, site name, recorder, determiner.
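As a rough illustration of the "what? where? when?" requirement, the sketch below reports which required attributes are missing from a single record. The field names are hypothetical stand-ins, not the actual NBN Exchange Format column headers, so check them against the format documentation before reuse.

```python
# Hypothetical field names standing in for the real column headers.
ALWAYS_REQUIRED = ("record_key", "taxon", "date", "date_type", "projection", "precision")
# The location may be given in any one of these forms.
LOCATION_ALTERNATIVES = ("grid_reference", "coordinates", "polygon")


def _present(record: dict, field: str) -> bool:
    """True if the field exists and is not blank."""
    value = record.get(field)
    return value is not None and str(value).strip() != ""


def missing_required(record: dict) -> list:
    """List the required attributes that are absent or blank in one record."""
    missing = [field for field in ALWAYS_REQUIRED if not _present(record, field)]
    if not any(_present(record, field) for field in LOCATION_ALTERNATIVES):
        missing.append("location (" + " / ".join(LOCATION_ALTERNATIVES) + ")")
    return missing


# Made-up example record, for illustration only.
record = {
    "record_key": "EXAMPLE-0001",
    "taxon": "Sargassum muticum",
    "date": "2013-06-01",
    "date_type": "D",
    "grid_reference": "SX4852",
    "projection": "OSGB36",
    "precision": "1000",
}
print(missing_required(record))  # → []
```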

Other attributes are not (yet) standardised across datasets, e.g. abundance, life stage, sex, verification status, record type, depth. They are not standard fields and do not use standard units or vocabularies.

Absence Data

10km Interactive Map of Sargassum muticum

Zero abundance (T/F) is a standard attribute.

Absence records are displayed on the NBN Gateway Interactive Map
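Because absence records sit alongside presence records, it is worth separating them before mapping or counting occurrences. This is a minimal sketch that assumes the zero-abundance flag arrives as "T"/"F" text in a column here called `zero_abundance`; that column name is hypothetical.

```python
def is_absence(record: dict, flag_field: str = "zero_abundance") -> bool:
    """True if the record's zero-abundance flag is set (T or TRUE, any case)."""
    return str(record.get(flag_field, "")).strip().upper() in {"T", "TRUE"}


def split_presence_absence(records, flag_field: str = "zero_abundance"):
    """Return (presences, absences) as two lists, preserving input order."""
    presences, absences = [], []
    for record in records:
        (absences if is_absence(record, flag_field) else presences).append(record)
    return presences, absences


# Made-up records, for illustration only.
records = [
    {"taxon": "Sargassum muticum", "zero_abundance": "F"},
    {"taxon": "Sargassum muticum", "zero_abundance": "T"},
    {"taxon": "Sargassum muticum"},  # flag missing: treated as a presence
]
presences, absences = split_presence_absence(records)
print(len(presences), len(absences))  # → 2 1
```

Treating a missing flag as a presence is a design choice made for this sketch; a stricter workflow might set such records aside for checking.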

The NBN Database currently holds 30,625 absence records across 26 datasets (Jan 2014).

Effort-based Data

• The NBN Database holds some effort-based datasets (e.g. BTO Breeding Bird Survey, Shorewatch, Shore Thing, UK Butterfly Monitoring Scheme).

• The effort-based methodology should be described in the metadata.

• Effort data may be stored as attributes of the species observation, e.g. number of observers, timespan of observation period.

• Effort data is not stored in a way that enables ‘per unit effort’ analysis. NBN Exchange Format is a flat file, not relational tables.

Data Resolution

• The finest resolution currently available is 100m squares.

• Data providers can blur resolution of the ‘public’ version of the records to 1km, 2km or 10km, while granting full access to select users.

• ‘Full access’ includes recorder and determiner names and attributes (where available).
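To make the resolution levels concrete, the sketch below coarsens a British National Grid reference to 1km, 2km (tetrad) or 10km by truncating digits. It illustrates the general idea only: how the NBN Gateway blurs records internally is not described here, the function name is made up, and single-letter Irish Grid references are not handled.

```python
# Tetrad letters follow the DINTY convention: 25 letters (no "O"), running
# south to north within each 2km column, with columns running west to east.
TETRAD_LETTERS = "ABCDEFGHIJKLMNPQRSTUVWXYZ"

# Digits kept per axis for each square size in metres.
DIGITS_PER_AXIS = {10_000: 1, 1_000: 2, 100: 3}


def blur_grid_ref(grid_ref: str, resolution_m: int) -> str:
    """Coarsen a two-letter British National Grid reference to a larger square."""
    ref = grid_ref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    if len(letters) != 2 or not letters.isalpha() or not digits.isdigit() or len(digits) % 2:
        raise ValueError(f"not a recognised grid reference: {grid_ref!r}")
    half = len(digits) // 2
    east, north = digits[:half], digits[half:]

    if resolution_m == 2_000:
        if half < 2:
            raise ValueError("reference is already coarser than 2km")
        # The second digit on each axis is the km within the 10km square.
        column, row = int(east[1]) // 2, int(north[1]) // 2
        return letters + east[0] + north[0] + TETRAD_LETTERS[column * 5 + row]

    if resolution_m not in DIGITS_PER_AXIS:
        raise ValueError(f"unsupported resolution: {resolution_m}")
    keep = DIGITS_PER_AXIS[resolution_m]
    if half < keep:
        raise ValueError("cannot blur to a finer resolution than the input")
    return letters + east[:keep] + north[:keep]


print(blur_grid_ref("SE123456", 1_000))   # → SE1245
print(blur_grid_ref("SE123456", 2_000))   # → SE14H
print(blur_grid_ref("SE123456", 10_000))  # → SE14
```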

Access resolution of all records in the NBN Database (n = 91,206,588)

Access resolution of records of designated taxa in the NBN Database (n = 20,548,842)

Access resolution of vascular plant records in the NBN Database (n = 25,998,531)

Access resolution of dragonfly and damselfly records in the NBN Database (n = 1,486,554)

Exploring Data

NBN Gateway Interactive Map – create and query layers of species, habitats and site boundaries

Evaluating Data Quality and Accessibility

• Publicly accessible records have gone through quality control processes, e.g. checks by local and national experts.

• Some have also been checked using NBN Record Cleaner, based on:
  - Spatial distribution rules
  - Temporal rules: flight period or first/last year recorded
  - Identification difficulty / rarity / taxonomic uncertainty

• NBN Record Cleaner rules have been created by experts at national recording schemes for over 18,000 species, including 77% of conservation priority species (NERC Act 2006).

• Nevertheless, erroneous records do occur. Always read the metadata.

http://www.nbn.org.uk/record-cleaner.aspx
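The temporal rules mentioned above can be pictured as simple range checks on a record's date. The sketch below is not NBN Record Cleaner's implementation or rule format: the `TemporalRule` class, its fields and the example values are all invented for illustration, and it assumes a flight period that does not span the new year.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TemporalRule:
    """Hypothetical rule: plausible years and flight period for one species."""
    first_year: int
    last_year: int
    flight_start: tuple  # (month, day)
    flight_end: tuple    # (month, day), assumed later in the same year


def check_temporal(record_date: date, rule: TemporalRule) -> list:
    """Return warnings for a record; an empty list means no rule was broken."""
    warnings = []
    if not rule.first_year <= record_date.year <= rule.last_year:
        warnings.append("year outside first/last year recorded")
    month_day = (record_date.month, record_date.day)
    if not rule.flight_start <= month_day <= rule.flight_end:
        warnings.append("date outside flight period")
    return warnings


# Invented rule values, not taken from any real recording scheme.
example_rule = TemporalRule(first_year=1900, last_year=2014,
                            flight_start=(5, 1), flight_end=(9, 30))

print(check_temporal(date(2013, 7, 15), example_rule))  # → []
print(check_temporal(date(2013, 12, 1), example_rule))  # → ['date outside flight period']
```

A warning flags a record for expert review; it does not prove the record is wrong.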

Read the dataset’s metadata.

Requesting better access to data

For one-off use:

• Request access as an individual.

• Apply taxonomic / geographic / date and dataset filters to request access to the records you need across multiple datasets.

For repeated use (strongly recommended!):

• Register your organisation on the NBN Gateway (quick and free!).

• Apply as an organisation for access to all datasets and permission to use data for research purposes.

• Make colleagues and students members of the organisation.

Over 200 organisations have user accounts on the NBN Gateway, around 80% of which also share their own data.

Accessing and Using Data

Downloading data from the NBN Gateway

You are asked to state:

• Who you are (individual / organisation)

• Why you are downloading the data (dropdown list and free text description)

Include sensitive records: you will need to have been granted access to these records before downloading data.

Geographic filter:

• 10km square

• Site boundary

• ‘Within’ or ‘overlapping and within’
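The difference between ‘within’ and ‘overlapping and within’ matters most for coarse records: a grid square can only be ‘within’ a site if the whole square fits inside it. The sketch below shows the distinction using rectangles as a stand-in for real site polygons; it is an illustration of the geometry, not the NBN Gateway's own spatial query, and all names are made up.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in grid metres (a simplification of a polygon)."""
    min_e: float
    min_n: float
    max_e: float
    max_n: float


def is_within(square: Box, site: Box) -> bool:
    """True if the whole square lies inside the site."""
    return (square.min_e >= site.min_e and square.max_e <= site.max_e
            and square.min_n >= site.min_n and square.max_n <= site.max_n)


def overlaps(square: Box, site: Box) -> bool:
    """True if the square and the site share any area."""
    return (square.min_e < site.max_e and square.max_e > site.min_e
            and square.min_n < site.max_n and square.max_n > site.min_n)


def select(squares, site: Box, mode: str = "within") -> list:
    """Keep the squares that match the site under the chosen mode."""
    if mode == "within":
        test = is_within
    elif mode == "overlapping and within":
        test = overlaps  # a square inside the site also overlaps it
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [square for square in squares if test(square, site)]


site = Box(0, 0, 1000, 1000)
inside = Box(100, 100, 200, 200)        # 100m square inside the site
straddling = Box(900, 900, 1100, 1100)  # crosses the site's corner
outside = Box(2000, 2000, 2100, 2100)

print(len(select([inside, straddling, outside], site, "within")))                  # → 1
print(len(select([inside, straddling, outside], site, "overlapping and within")))  # → 2
```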

Taxonomic filter:

• Taxon (up to Order)

• Taxon reporting category (e.g. terrestrial mammals)

• Designation

• User-defined list

User-defined lists, e.g. species as proxy indicators of climate change, habitat condition, ecosystem services etc.:

• Must be supplied and maintained (with metadata) by a named organisation.

• Must be relevant for repeated use, not just one-off use.

Year range: e.g. restrict to recent records only.

Dataset filter: you may wish to exclude some datasets, e.g.

• If they have not granted permission

• If the metadata shows they are not suitable for your purpose

Download: a Zip file containing:

• Observations (CSV file)

• Metadata (TXT file)

• Download date, time and filters used (TXT file)
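A download with this layout can be unpacked in a few lines. The sketch below reads every CSV in the archive as observations and keeps every TXT file as text; the file names, column names and UTF-8 encoding are assumptions for illustration, so inspect a real download before relying on them.

```python
import csv
import io
import zipfile


def read_download(zip_bytes: bytes):
    """Return (observations, notes) from a downloaded zip held in memory.

    observations: list of dicts, one per CSV row.
    notes: dict mapping each TXT file name to its text content.
    """
    observations, notes = [], {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            text = archive.read(name).decode("utf-8-sig")  # encoding assumed
            if name.lower().endswith(".csv"):
                observations.extend(csv.DictReader(io.StringIO(text)))
            elif name.lower().endswith(".txt"):
                notes[name] = text
    return observations, notes


# Build a tiny stand-in archive in memory; names and contents are made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("observations.csv", "taxon,year\nSargassum muticum,2013\n")
    demo.writestr("metadata.txt", "Dataset metadata would appear here.")
    demo.writestr("download_info.txt", "Download date, time and filters used.")

observations, notes = read_download(buffer.getvalue())
print(len(observations), sorted(notes))
```

Keeping the metadata and download-information files with the observations makes it easier to acknowledge data providers and to record exactly which filters produced the data.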

Limitations:

• Filters are not ‘multi-select’. For data on 2 species at 5 sites, you have to do 10 downloads.

• You have to use a taxonomic, geographic or dataset filter – you can’t download everything!
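Since each download takes one value per filter, the number of downloads is the product of the list lengths. A small sketch of planning the species-by-site combinations, using placeholder names:

```python
from itertools import product


def plan_downloads(species, sites) -> list:
    """One download per (species, site) pair, as single-select filters require."""
    return [{"taxon": taxon, "site": site} for taxon, site in product(species, sites)]


# Placeholder names, for illustration only.
species = ["Species A", "Species B"]
sites = [f"Site {number}" for number in range(1, 6)]

plan = plan_downloads(species, sites)
print(len(plan))  # → 10
```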

Custom downloads from the NBN Database

• Filter by user-defined species list (one-off use)

• Filter by user-defined polygon

• ESRI shapefile download format

Accessing data via the NBN REST API

• REST API available to view and download data

• Full documentation available by end of March

• rNBN tool for release this year

Custom downloads and REST API downloads are logged and reported to data providers, the same as downloads from the NBN Gateway.

https://data.nbn.org.uk/Documentation/Web_Services/Web_Services-REST/

NBN Gateway Terms and Conditions

• Require written permission for research use

• Require the data provider(s) to be acknowledged

• Require the recorder to be acknowledged if appropriate and possible

• Require a waiver statement to be included

• Require OS Map images to be acknowledged

https://data.nbn.org.uk/Terms

Referencing Data

Guidance on referencing data is available on the NBN Website

• DOIs are not currently generated from the NBN Database.

• This is being considered, but the data access controls and the fact that data may be withdrawn by data providers pose a challenge.

Links and References

• National Biodiversity Network: www.nbn.org.uk

• NBN Gateway: http://data.nbn.org.uk

• NBN Record Cleaner: http://www.nbn.org.uk/Tools-Resources/Recording-Resources/NBN-Record-Cleaner.aspx

• Guidance on referencing data from the NBN Database: http://www.nbn.org.uk/Use-Data/Using-Maps-or-Data/Using-and-referencing-data-from-the-Gateway.aspx

• GBIF: www.gbif.org

• NERC guidance on DOIs: http://www.nerc.ac.uk/research/sites/data/doi.asp

• Guide to the NBN Exchange Format on YouTube: http://www.youtube.com/watch?v=2WfOjQOaVFI#t=24

Data providers who contributed to maps used in this presentation:

• 10km interactive map of Sargassum muticum: https://data.nbn.org.uk/Taxa/NBNSYS0000188809

• Collembola Recording Scheme dataset: https://data.nbn.org.uk/Datasets/GA000566

• BTO Breeding Bird Atlas 1988-1991: https://data.nbn.org.uk/Datasets/GA000147