Implementation of Open-World, Integrative, Transparent, Collaborative Research Data Platforms: the University of Things (UoT)

Prof. Peter Fox ([email protected], @taswegian, #twcrpi, ORCID: 0000-0002-1009-7163)
Tetherless World Constellation Chair, Earth and Environmental Science/ / / IT and Web Science
Rensselaer Polytechnic Institute, Troy, NY USA
And the Deep Carbon Observatory Data Science Team
CGA, Harvard, April 27, 2018

What to expect…

• Inevitable context, history + perspective
• Deep Carbon Observatory (Integration and Collaboration)
  – Data Science Platform for an international science community
  – Lots of RED, and BLACK
• data.rpi.edu V2 (Integration and Transparency)
• Where we are headed
  – Integration, Transparency and Collaboration
  – University Infrastructure for Data Science

Working premise :== Mission Statement

Scientists – actually ANYONE – should be able to access a global, distributed knowledge base of scientific data and information that:
• appears to be integrated
• appears to be locally available
• is in a language (written, programming, or science) that is understandable and can be shared

Data intensive – volume, complexity, mode, scale, heterogeneity, … in an OPEN WORLD

Deep Carbon Observatory (DCO) …
• “We are dedicated to achieving transformational understanding of carbon’s chemical and biological roles in Earth.”

www.deepcarbon.net

Collaboration and Integration needs …

• “Enable DCO team leaders to create new groups and associate a number of content types --- documents, discussions, blog posts, tasks, links, and bibliographic entries --- with the group, as well as simple event management (a private event calendar for the group) and embedding of external services (e.g. and esp. Google Calendar)”
• … more… (data, publications, projects)… a Knowledge Network … and a Virtual Organization (> 1000 people)

[Figure: ecosystem metaphor. Labels: Producers, Consumers; Experience; Data, Information, Knowledge; Creation, Gathering, Presentation, Organization, Integration, Conversation; Context]

DCO Data Science Platform (2012)

Science Network of Things (Objects) (2012)

deepcarbon.net

info.deepcarbon.net

data.deepcarbon.net

dx.deepcarbon.net (2012)

Dataset Browser, People, Field Sites (2015)

All information is linked and traceable! (2013)

State to date… (2014)
• Knowledge network
  – implements both the collaboration and the integration, reporting implements the transparency
  – It’s being USED
• Many means of population
  – User generation
  – Machine generation
• Contributing these enhancements back to open-source communities (CKAN, VIVO)

There’s more – Jupyter notebooks on top (2016)

And this: https://news.rpi.edu/content/2018/04/23/applying-network-analysis-natural-history

data.rpi.edu (V2) (2013)

Insert data.rpi screen shots (2013)

Internal transparency and integration

Thus…

• Integrative – semantics
• Transparent – semantics
• Collaborative – semantics
• Application integration – Yep – semantics

• So… where are we headed?

Research-grade but not “University-grade”

• Adoption of RDA outputs/recommendations
  – Data Type Registry
  – Permanent ID Types
  – Dynamic Data Citation*
  – Scholix*
• Improvements to VIVO
• Science network of things
• CIOs approach
  – “We only run the applications we know how to run”
• Library (not a research library)
  – Helped to start
  – Hurt in University adoption (hope^)

* underway
^ New Library Director

Progress toward a University of Things

[email protected] and the DCO Data Science Team (2018)
• @taswegian #twcrpi

• http://tw.rpi.edu
• http://tw.rpi.edu/web/project/DCO-DS
• http://deepcarbon.net

Garden shed

Framework v. systems v. platforms

• Rough definitions
  – Systems have very well-defined entry and exit points. A user tends to know when they are using one. Options for extensions are limited and usually require engineering
  – Frameworks have many entry and use points. A user often does not know when they are using one. Extension points are part of the design
  – Platforms ~ arise from frameworks

VIVO Extension: Shibboleth Single-Sign-On

VIVO Extension: Dataset deposit in attached data repository (2012)

[Figure: flowchart of the dataset deposit workflow. Recoverable labels: Begin; Need DCO-ID? (YES/NO); Generate & register DCO-ID (unique suffix, blank URL); Collect metadata (includes multi-level metadata collection); Review; Revise metadata; Data deposit? (YES/NO); Deposit in CKAN & generate URL (includes CKAN interaction and persistent identifier (DCO-ID) generation); External data? (YES/NO); Add URL to data in external repository (URL to a dedicated repository that accepts third-party deposits, or URL to the downloadable data); Update DCO-ID (map the DCO-ID to the CKAN URL); Update DCO-ID record; Object without data URL; End. Other labels: DCO-ID & DCO-ID metadata; deposited DCO data or URL to external data]

We identify ‘everything’ = DCO-ID (2012)

• Two-part: all objects are issued Handles, and all published objects are also issued DOIs
  – DCO issues Handles, registration number is 11121
  – We obtain DOIs from DataCite
  – If it is a person, we support ORCID, ResearcherID, ScopusID, eRA Commons, etc.

• You may see (note EPIC style identifier syntax): http://hdl.handle.net/11121/5676-3964-8313-5126-CC and http://dx.deepcarbon.net/11121/5676-3964-8313-5126-CC

• E.g. adding a bibliography is easy: just enter the DOIs, or paste a BibTeX record, and we do the rest. The same goes for people (ORCID, ResearcherID, etc.) -> open world – linked to other sources
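A minimal sketch of the two URL forms shown above, assuming that both resolvers name the same object whenever the prefix and suffix match. The function names (`dco_id_urls`, `parse_dco_id`, `same_object`) are hypothetical and are not part of the DCO platform; only the prefix 11121 and the two resolver hosts come from the slides.

```python
from typing import List, Tuple
from urllib.parse import urlparse

# Handle prefix (DCO's registration number) and resolver hosts, as given on the slide.
DCO_HANDLE_PREFIX = "11121"
RESOLVERS = ("http://hdl.handle.net", "http://dx.deepcarbon.net")


def dco_id_urls(suffix: str) -> List[str]:
    """Build both resolver URLs for a DCO-ID suffix (hypothetical helper)."""
    return [f"{base}/{DCO_HANDLE_PREFIX}/{suffix}" for base in RESOLVERS]


def parse_dco_id(url: str) -> Tuple[str, str]:
    """Split a resolver URL into (prefix, suffix); reject non-DCO prefixes."""
    parts = urlparse(url).path.strip("/").split("/", 1)
    if len(parts) != 2 or parts[0] != DCO_HANDLE_PREFIX or not parts[1]:
        raise ValueError(f"not a DCO-ID URL: {url}")
    return parts[0], parts[1]


def same_object(url_a: str, url_b: str) -> bool:
    """Assumption: two URLs identify the same object if prefix and suffix match."""
    return parse_dco_id(url_a) == parse_dco_id(url_b)
```

For example, `dco_id_urls("5676-3964-8313-5126-CC")` yields the two URLs quoted on the slide, and `same_object` treats them as one identifier regardless of which resolver host is used.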

VIVO Extension: Retrieval of DOI metadata for publications from CrossRef
* Expedites entry of e.g. journal articles by retrieving metadata based on DOI
* Preserves author rank

Core and Framework Semantics - Multi-tiered interoperability (2012)
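As a rough illustration of the CrossRef retrieval extension above, the sketch below fetches a work record from the public CrossRef REST API and keeps authors in the order CrossRef returns them, recording that position as an explicit rank. It is not the VIVO extension's actual code: the function names and the shape of the output record are assumptions made for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_crossref_message(doi, timeout=10.0):
    """Fetch the 'message' object for a DOI from the CrossRef REST API (needs network)."""
    with urlopen(CROSSREF_WORKS + quote(doi), timeout=timeout) as resp:
        return json.load(resp)["message"]


def to_publication_record(message):
    """Map a CrossRef message to a small record (hypothetical shape).

    Authors are kept in the order given, with a 1-based rank attached.
    """
    authors = []
    for rank, person in enumerate(message.get("author", []), start=1):
        parts = [person.get("given"), person.get("family")]
        # Organisational authors carry 'name' instead of given/family.
        name = " ".join(p for p in parts if p) or person.get("name", "")
        authors.append({"rank": rank, "name": name})
    return {
        "doi": message.get("DOI"),
        "title": (message.get("title") or [""])[0],
        "journal": (message.get("container-title") or [""])[0],
        "authors": authors,
    }
```

Calling `to_publication_record(fetch_crossref_message(doi))` for a DOI registered with CrossRef would give a record ready for review before entry. Storing the rank explicitly means the author order survives even if the list is later stored in an unordered structure.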

[Figure: multi-tiered interoperability diagram; “Mediation!” is marked at three points]