
An analysis of how Mendeley Data supports each of the FAIR Data Principles

In 2016, the FAIR data principles were published to offer guidelines to support communities’ needs on data sharing and improve data “Findability, Accessibility, Interoperability and Reuse”. This document outlines how the Mendeley Data platform supports each of these principles.

Findable

F1. (Meta)data are assigned a global, unique and persistent identifier
• All Mendeley Data datasets are assigned a Persistent IDentifier (PID) in the form of a Digital Object Identifier (DOI)
• Folders and files in a dataset have a PID as well, derived from the DOI
• Different versions of a dataset get different DOIs, but the main DOI always resolves to the latest version
• Dataset authors and their affiliations are also enriched with PIDs (Mendeley Profile ID, ORCID, Author ID, Scopus Affiliation ID, Mendeley Institution ID, SciVal ID).

F2. Data are described with rich metadata (defined by R1 below)
• The Mendeley Data default metadata schema is rich: for example, it includes steps to reproduce, description fields at the level of each folder and file, semantic links with other research outputs, and categories from a controlled vocabulary (Omniscience)
• Furthermore, Mendeley Data enables institutions to enrich the default metadata schema with additional metadata templates, including a flexible choice of field types, with guided data entry (e.g. date fields, drop-down, auto-complete, check boxes) and validation rules. Fields can be chosen to be shared upon publication or to remain available only to selected users.

F3. Metadata clearly and explicitly include the identifier of the data they describe
The DOI is a mandatory field for the dataset and it is automatically generated. It can also be personalized with the institutional prefix, if the institution is a DataCite member.

F4. (Meta)data are registered or indexed in a searchable resource
Mendeley Data dataset metadata is available via several searchable resources:
• Mendeley Data Search, which indexes over 1700 repositories. Mendeley Data datasets are among those where deep indexing is provided, meaning that not only the metadata but also the files themselves are indexed
• Google Dataset Search: Mendeley Data exposes metadata via schema.org
• DataCite Search: Mendeley Data uses DataCite to mint DOIs and sends metadata to DataCite for indexing
• OpenAIRE: Mendeley Data can be harvested via OAI-PMH
• Google: Mendeley Data exposes dataset metadata in the web page using the Dublin Core schema, and each dataset page is added to Google's sitemap
• Share from Framework.

Accessible


A1. (Meta)data are retrievable by their identifier using a standardized communications protocol
• Published datasets can be accessed via the HTTPS protocol with the most popular web browsers. Dataset URLs are constructed easily from the DOI: if the standard DOI is 10./., the URL is https://data.mendeley.com//
• Access is available through a simple REST API at the URL: https://api.mendeley.com/datasets/
Other access options are available through: https://dev.mendeley.com/code/datasets_quick_start_guides.html
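
Retrieval by identifier over HTTPS can be illustrated with a short Python sketch. It does not use the Mendeley Data URL pattern or API described above; instead it assumes the public DOI resolver at https://doi.org/, which redirects any registered DOI to its landing page. The function names and the DOI "10.12345/example.1" are hypothetical placeholders, not a real dataset.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Public DOI resolver; this is an assumption of the sketch, not a Mendeley Data endpoint.
DOI_RESOLVER = "https://doi.org/"


def doi_to_resolver_url(doi: str) -> str:
    """Turn a DOI (bare, 'doi:'-prefixed, or already a doi.org URL) into a resolver URL."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the directory indicator "10." and has a prefix/suffix slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return DOI_RESOLVER + quote(doi, safe="/")


def resolve_landing_page(doi: str, timeout: float = 10.0) -> str:
    """Follow the resolver's redirects and return the final URL (needs network access)."""
    request = Request(doi_to_resolver_url(doi), method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.geturl()


# Placeholder DOI used only to show the URL construction.
example_url = doi_to_resolver_url("doi:10.12345/example.1")
```

Only `doi_to_resolver_url` runs offline; `resolve_landing_page` performs a real HTTPS request and should be called with an existing DOI.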

A1.1 The protocol is open, free, and universally implementable
Mendeley Data uses HTTPS as its main protocol. All product features are available via HTTPS, whether as a User Interface or an API.

A1.2 The protocol allows for an authentication and authorization procedure, where necessary
Mendeley Data does not require authentication to access openly shared datasets, but it does require authentication and authorization to access any other resource. To do so, it provides a broad range of authentication options:
• The user can register for free and obtain login credentials to access their own resources
• Users from an institution that subscribes to the commercial version can authenticate using their institutional credentials, including multi-factor authentication, if supported
• Any institution that wishes to subscribe to Mendeley Data can integrate its institutional authentication, provided that it is compatible with the industry-standard SAML 2.0 protocol (for example, using Shibboleth)
• Authorization is managed by Mendeley Data. Users of the free version of Mendeley Data are assigned the default role, which allows them to create and publish datasets. Users of the commercial version can be assigned multiple roles: Administrator, Moderator, Project Owner or Default user.

A2. Metadata are accessible, even when the data are no longer available
• Mendeley Data supports "tombstoned" DOIs, meaning that, if a dataset is removed, the DOI will still resolve to the dataset page, where a message that the dataset is no longer available will be displayed.
• Mendeley Data supports long-term archiving with DANS, the Institute for permanent access to digital research resources. Every dataset published in Mendeley Data is sent to DANS for dark archiving. Should Mendeley Data cease its operations for any reason, the dataset DOIs will resolve to the copy of the dataset (including metadata) stored at DANS, where the dataset and metadata will remain available in perpetuity.

Interoperable

I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
• Mendeley Data uses the JSON format to represent metadata and applies controlled vocabularies and identifiers on fields such as Authors and their affiliations, Categories and Licenses
• Additionally, custom metadata fields can be added with values taken from controlled lists, and relationships to other research objects are semantic, enabling datasets to be linked to other datasets, research articles and software in a way that fully describes the existing relation
• Besides the JSON format, metadata is exposed via standard, interoperable formats such as Schema.org and Dublin Core, using interoperable protocols (HTTPS/REST)

I2. (Meta)data use vocabularies that follow FAIR principles
The data vocabularies used in custom metadata fields for controlled values can be inspected via the REST API. Metadata fields and allowed values can be accurately described both individually and as a group (template) for documentation purposes.
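
The idea of custom fields restricted to controlled lists and grouped into a template can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Mendeley Data's implementation or API: the `ControlledField` class, the `validate_record` function, and the "instrument" and "sample_type" fields with their values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ControlledField:
    """One metadata field whose value must come from a controlled list (hypothetical model)."""
    name: str
    allowed_values: FrozenSet[str]
    required: bool = False


def validate_record(record: Dict[str, str], template: List[ControlledField]) -> List[str]:
    """Check a metadata record against a template; return human-readable errors."""
    errors = []
    for spec in template:
        value = record.get(spec.name)
        if value is None:
            if spec.required:
                errors.append(f"{spec.name}: missing required field")
        elif value not in spec.allowed_values:
            errors.append(f"{spec.name}: {value!r} is not an allowed value")
    return errors


# Hypothetical template with two controlled fields.
template = [
    ControlledField("instrument", frozenset({"SEM", "TEM", "AFM"}), required=True),
    ControlledField("sample_type", frozenset({"powder", "thin film"})),
]

ok_errors = validate_record({"instrument": "SEM"}, template)
bad_errors = validate_record({"sample_type": "liquid"}, template)
```

Keeping the allowed values in the template itself means the same structure can both validate input and document the vocabulary for each field.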


I3. (Meta)data include qualified references to other (meta)data
In Mendeley Data, references to other research objects are fully qualified. We support references to articles, datasets and software with the following relationships:
• is related to this dataset
• cites this dataset
• cited by this dataset
• compiles this dataset
• compiled by this dataset
• data derived from this dataset

Reusable

R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
Datasets in Mendeley Data are annotated in ways that enhance reuse and not just discovery. For example:
• the steps needed to reproduce the data can be described in the "Steps to reproduce" field
• support for folders in datasets makes grouping and classification of the data immediate
• the ability to annotate individual files and folders with text provides maximum accuracy and completeness in describing the data
• custom metadata templates can be applied to further enrich the metadata.

R1.1. (Meta)data are released with a clear and accessible data usage license
Mendeley Data supports 16 different licenses out of the box, including the most common and relevant CC variants, software licenses like GPL, MIT, BSD and Apache, as well as hardware licenses. Metadata sent to DataCite are licensed with the most liberal CC0 license.

R1.2. (Meta)data are associated with detailed provenance
• Mendeley Data supports detailed provenance by providing standard fields such as contributors, links to source or derived data or software, and steps to reproduce. Custom metadata fields can also be added to document other aspects that support provenance, based on the specifics of the data. In addition, versioning capabilities make it possible to track data changes over time
• Mendeley Data Search indexes the geographical location and temporal coverage of the data, when available. All data in the Mendeley Data platform is accessible in a machine-readable format via the APIs.

R1.3. (Meta)data meet domain-relevant community standards
Mendeley Data supports standard metadata schemas such as Dublin Core and schema.org. It also supports the use of controlled vocabularies, both in standard fields (e.g. Omniscience taxonomy for categories) and in custom metadata fields, which can be configured to use values from existing taxonomies.
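
To show what exposing metadata in a standard schema can look like, the Python sketch below maps a simple record to a schema.org Dataset description in JSON-LD. It is an illustration only and does not reproduce the markup Mendeley Data actually emits: the input field names, the `to_schema_org_dataset` function and all example values (including the DOI) are hypothetical, while the output properties (`name`, `description`, `identifier`, `version`, `license`, `creator`) are standard schema.org terms.

```python
import json
from typing import Any, Dict


def to_schema_org_dataset(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map a hypothetical internal record to a schema.org Dataset (JSON-LD as a dict)."""
    creators = []
    for author in record.get("authors", []):
        person = {"@type": "Person", "name": author["name"]}
        if author.get("orcid"):
            # ORCID iDs are conventionally expressed as https://orcid.org/ URLs.
            person["identifier"] = "https://orcid.org/" + author["orcid"]
        creators.append(person)
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": record["title"],
        "description": record["description"],
        "identifier": "https://doi.org/" + record["doi"],
        "version": str(record["version"]),
        "license": record["license_url"],
        "creator": creators,
    }


# Placeholder record; none of these values refer to a real dataset.
example_record = {
    "title": "Example measurements",
    "description": "Placeholder description of the example dataset.",
    "doi": "10.12345/example.1",
    "version": 1,
    "license_url": "https://creativecommons.org/publicdomain/zero/1.0/",
    "authors": [{"name": "A. Researcher", "orcid": "0000-0002-1825-0097"}],
}

dataset_jsonld = to_schema_org_dataset(example_record)
jsonld_text = json.dumps(dataset_jsonld, indent=2)
```

Embedding the resulting JSON in a dataset page inside a script element of type application/ld+json is the usual way such a description is made available to crawlers.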
