Vocabulary Services with Rich(er) Semantic Content for Earth and Space Sciences

Peter Fox - Tetherless World Research Constellation Chair (RPI and adjunct scientist WHOI)

With thanks to Cyndy Chandler (WHOI), Robert Groman, Dicky Allison, Andy Maffei (WHOI), Tobias Work (WHOI), Patrick West, Stephan Zednik (RPI)

IGSL 2010 Berlin

Motivation: Integrated Ecosystems Assessment

Multi-tiered interoperability

[Figure: multi-tiered interoperability diagram; surviving labels: "used by", "Vocab", "ML"]

But more than that …

• As these layers of the data and information flow are traversed and vocabularies are used…
• Are scientists able to explore/confirm/deny their ‘hunches’?

Accountability, Identity, Explanation, Justification, Verifiability, Proof, Trust, Provenance

Transparency is what we want
But translucency is what we get

Multi-domain Knowledge Base

Basis of effort

• RPI staff and BCO-DMO* team members have been working with oceanographers, data managers, ontology modelers, software engineers and other experts to iteratively design and develop a semantically enabled prototype. It shows how domain scientists can perform better and smarter searches for data, access and manipulate more data sets, and begin to keep track of data provenance, based on the rich accumulation of vocabularies being made available today.

Tetherless World Constellation

Modern informatics enables a new scale-free framework approach

• Use cases
• Stakeholders
• Distributed authority
• Access control
• Ontologies
• Maintaining Identity
• Open World

Use cases

1. Do you have any data online from Hutchins from award number OCE-0423418?
2. I want to download CTD profiler (temperature, biological, ...) data in the following areas (N. Atlantic, bounding box, where a JGOFS survey was done, ...)
3. What new data has been added since last year (and organize it by project)?
4. Show me all the places where the surface temperature in the North Atlantic is 25 degrees during June.

Populating the knowledge base

• Modeled the use cases and main concepts and relations (ontology on later slide)
• Migrated instances from the DB to a triple store (Jena/TDB with Joseki for query)
• Developed several application front-ends to address use cases and present results
• Led to need to enrich vocabularies (and be ‘more’ common) -> leverage ‘URI’/‘URN’ identifier references into community governed vocabulary
  – SeaDataNet (community, EU effort)
  – British Oceanographic Data Centre (BODC) vocab server (originated in the NERC DataGrid)
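The migration step above (relational instances into a triple store) can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's actual pipeline: the `dataset` table, its columns, the second row, and the `http://example.org/bcodmo/` namespace are all hypothetical, and a plain Python list of tuples stands in for Jena/TDB.

```python
import sqlite3

# Hypothetical namespace for illustration; not the project's real URIs.
BASE = "http://example.org/bcodmo/"


def build_demo_db():
    """Create a tiny in-memory table standing in for the source database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset (id TEXT, investigator TEXT, award TEXT, instrument TEXT)"
    )
    conn.executemany(
        "INSERT INTO dataset VALUES (?, ?, ?, ?)",
        [
            ("ds1", "Hutchins", "OCE-0423418", None),
            # Hypothetical second record, present only to make the search meaningful.
            ("ds2", "Example PI", "OCE-0000000", None),
        ],
    )
    conn.commit()
    return conn


def rows_to_triples(conn, table, key_column):
    """Turn each row into (subject, predicate, object) triples, skipping NULLs."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key_column]}"
        for column, value in record.items():
            if column == key_column or value is None:
                continue
            triples.append((subject, f"{BASE}{column}", value))
    return triples


def subjects_with(triples, predicate, value):
    """Return subjects that have the given predicate/value pair."""
    return [s for (s, p, o) in triples if p == predicate and o == value]


if __name__ == "__main__":
    triples = rows_to_triples(build_demo_db(), "dataset", "id")
    # Use case 1: data from Hutchins under award OCE-0423418.
    by_award = set(subjects_with(triples, BASE + "award", "OCE-0423418"))
    by_person = set(subjects_with(triples, BASE + "investigator", "Hutchins"))
    print(sorted(by_award & by_person))
```

In a real deployment the triples would be loaded into the store and the final lookup would be a query against it; the set intersection here only mimics a two-pattern match.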

Quick prototype of use case 1

http://vocab.ndg.nerc.ac.uk/term/L221/11/TOOL0035
urn:SDN:L221:11:TOOL0035
SDN:L221:11:TOOL0035
Sea-Bird SBE 911 CTD
SBE 911 CTD
High precision and accuracy CTD made up from a Sea-Bird SBE 9 underwater unit and a SBE 11 deck unit. The underwater unit comprises a protective cage (usually with a rosette) holding a pressure unit and a temperature/conductivity unit. The latter is connected to a pump-fed plastic tubing circuit that may include other sensors. All plumbed and non-plumbed instruments (e.g. transmissometers and light meters) on the package are logged by the SBE 11. The unit was replaced by the SBE 911plus in 1997.
2010-08-26T04:40:06.123+0000

http://vocab.ndg.nerc.ac.uk/term/L211/2/ICAT04
SDN:L211:2:ICAT04
SeaDataNet in-situ sensor and instrument package categories
in-situ sensors and packages
Categories used in the SeaDataNet project to classify devices that make in-situ measurements of phenomena from fixed or mobile platforms.
2010-08-26T04:39:37.902+0000
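The two records above show the same term under three identifier forms: a URL, a `urn:SDN:` URN, and a short `SDN:` key. A minimal sketch of converting between them follows, assuming the pattern visible on the slides holds in general. The function names are hypothetical, and reading the middle segment as a list version is an assumption, not something the slides state.

```python
import re

# URL prefix as shown on the slides.
TERM_URL_PREFIX = "http://vocab.ndg.nerc.ac.uk/term/"


def parse_term(identifier):
    """Split any of the three forms into (list_id, version, term_id).

    Assumption: the middle segment is a list version number.
    """
    if identifier.startswith(TERM_URL_PREFIX):
        parts = identifier[len(TERM_URL_PREFIX):].split("/")
    else:
        match = re.fullmatch(r"(?:urn:)?SDN:(.+)", identifier)
        if match is None:
            raise ValueError(f"unrecognised identifier: {identifier!r}")
        parts = match.group(1).split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected list/version/term in {identifier!r}")
    return tuple(parts)


def to_url(identifier):
    """Return the URL form of a term identifier."""
    return TERM_URL_PREFIX + "/".join(parse_term(identifier))


def to_urn(identifier):
    """Return the urn:SDN:... form of a term identifier."""
    return "urn:SDN:" + ":".join(parse_term(identifier))


def to_short(identifier):
    """Return the short SDN:... form of a term identifier."""
    return "SDN:" + ":".join(parse_term(identifier))


if __name__ == "__main__":
    print(to_urn("http://vocab.ndg.nerc.ac.uk/term/L221/11/TOOL0035"))
    print(to_url("SDN:L211:2:ICAT04"))
```

Normalizing every reference to one form before storing it keeps instance data from pointing at the same term under several spellings.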

Current version of knowledge model

Current version

Summary

• Migrated a driven, highly programmed implementation into an ontology and smart query driven search with modest effort (okay, a few brain cells died along the way)
  – Use case driven
  – Ontology driven at many levels but also uses vocabulary relations (e.g. Broader, Narrower)
  – Ontology essential for disambiguation of colloquial terms (e.g. CTD Profiler data)
  – Application oriented, rapid prototyping
• Distributed and vocabulary service driven
• Richness arises from various knowledge bases
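The role of Broader/Narrower relations and colloquial-term disambiguation in search can be illustrated with a small sketch. Everything in it is hypothetical: the `NARROWER` and `COLLOQUIAL` tables are made-up stand-ins that reuse labels from the earlier slides, not relations taken from the actual vocabularies or ontology.

```python
from collections import deque

# Hypothetical narrower-term relations, for illustration only.
NARROWER = {
    "in-situ sensors and packages": ["CTD"],
    "CTD": ["SBE 911 CTD", "SBE 911plus CTD"],
}

# Hypothetical mapping from colloquial phrases to a controlled term.
COLLOQUIAL = {"ctd profiler": "CTD"}


def resolve(term):
    """Map a colloquial phrase to its controlled term, if one is known."""
    return COLLOQUIAL.get(term.strip().lower(), term)


def expand(term):
    """Return the resolved term plus all transitively narrower terms."""
    start = resolve(term)
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in NARROWER.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    # A search for "CTD Profiler" would also match data tagged with narrower terms.
    print(expand("CTD Profiler"))
```

A search front-end could pass the expanded list to the query layer so that data tagged with a specific instrument is still found by a broader or colloquial request.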

Further Information

• http://tw.rpi.edu/portal/BCO-DMO
• Contacts:
  – [email protected], [email protected]
