Next Generation Semantic Data Environments (or Linked Data, Semantics, and Standards in Scientific Applications)

Deborah L. McGuinness, Tetherless World Senior Constellation Chair, Professor of Computer and Cognitive Science, Web Science Research Center Director, Rensselaer Polytechnic Institute, Troy, NY

With thanks to the extended RPI Tetherless World Team

OMG Semantics: From Research to Reality: Implementing the Semantic Web, March 20, 2013, Reston, VA

Trends: More Data & More Diversity

• More data
  – More open data
  – More authoritative data
  – More interest in and generation of metadata
  – More enthusiast generated / maintained data
  – More vocabularies, taxonomies, ontologies
• More diversity
  – Broader human participation
    • Trained scientists, citizens, enthusiasts, indigenous, …
  – More locations – mobile as well as global
  – More sensors – human, robots, implants, …
  – Real time feeds
  – Social sources – Twitter, Facebook, …

Increasing Requirements

• Data and data environments should:
  – Support usability – not just by original authors
  – Include (usable) documentation – metadata concerning collection methods, sources, recency, assumptions, …
  – Provide accessibility with transparent access policies
  – Include schema / ontology information – including mapping information used in integration along with rationales, …
  – Support queries (with usable and understandable interfaces)
  – Document verification and curation methods, including access to tools
  – Support AND encourage interactions; users should be able to comment, question, contribute, discuss, …

Path moves from Portal -> Virtual Observatory -> Online Community

Next: examples, foundations, and discussion

Semantic Environmental and Ecological Monitoring

• Enable/Empower citizens & scientists to explore pollution sites, facilities, regulations, and health impacts along with provenance
• Demonstrates semantic monitoring possibilities
• Extend to endangered species and resource mgr issues
• Explanations and Provenance available

http://was.tw.rpi.edu/swqp/map.html and http://aquarius.tw.rpi.edu/projects/semantaqua

Numbered callouts in the screenshot:
1. Map view of analyzed results
2. Explanation of pollution
3. Possible health effect of contaminant (from EPA)
4. Filtering by facet to select type of data
5. Link for reporting problems
6. Extended with input from USGS, with population counts for birds & fish

Example Workflow (SemantAqua)
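The conversion step in this workflow turns tabular source data into RDF. The sketch below is not CSV2RDF4LOD itself and does not reproduce its configuration. It is a minimal, standard-library illustration of the idea, assuming a hypothetical `http://example.org/` namespace, invented column names, and made-up sample rows.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, not from the slides


def csv_to_ntriples(text, id_column):
    """Emit one N-Triples line per non-empty cell; each row becomes one subject."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}measurement/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or value == "":
                continue  # the id names the subject; empty cells carry no triple
            predicate = f"<{BASE}vocab/{column}>"
            # Escape backslashes and quotes so the literal stays valid N-Triples.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


# Made-up sample rows for illustration only.
SAMPLE = "id,site,contaminant,value\nm1,site-a,arsenic,0.02\nm2,site-b,lead,0.001\n"

triples = csv_to_ntriples(SAMPLE, "id")
for line in triples:
    print(line)
```

A real conversion would also type the literals and link values to shared vocabularies, which is the kind of refinement an enhancement pass adds.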

[Workflow diagram; recoverable labels: Publish, CSV2RDF4LOD Direct, visualize, derive, archive, Archive, CSV2RDF4LOD Enhance]

Reusable Ontologies

• Pollution ontology describes the relationship between a regulation violation (a measurement), a polluted thing, and a polluted site
• Combined with other ontologies (e.g. W3C Geo), users can ask “Tell me all of the polluted things within 1 mile of my location”
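The "within 1 mile of my location" question above can be sketched as a filter over typed, geolocated resources. This is only an illustration under stated assumptions: the `ex:` class and resource names and the coordinates are invented, the `geo:lat` / `geo:long` property names mirror the W3C Geo vocabulary, and a plain Python list stands in for a triple store.

```python
import math

# Hypothetical triples; only the geo:lat / geo:long names follow W3C Geo.
TRIPLES = [
    ("ex:well1", "rdf:type", "ex:PollutedThing"),
    ("ex:well1", "geo:lat", 42.7300),
    ("ex:well1", "geo:long", -73.6800),
    ("ex:lake9", "rdf:type", "ex:PollutedThing"),
    ("ex:lake9", "geo:lat", 42.9000),
    ("ex:lake9", "geo:long", -73.6800),
    ("ex:park3", "rdf:type", "ex:CleanThing"),
    ("ex:park3", "geo:lat", 42.7301),
    ("ex:park3", "geo:long", -73.6801),
]

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def value(subject, predicate):
    """Return the first object for (subject, predicate), or None if absent."""
    for s, p, o in TRIPLES:
        if s == subject and p == predicate:
            return o
    return None


def polluted_things_near(lat, lon, radius_miles=1.0):
    """List polluted things whose coordinates fall inside the radius."""
    hits = []
    for s, p, o in TRIPLES:
        if p == "rdf:type" and o == "ex:PollutedThing":
            d = haversine_miles(lat, lon, value(s, "geo:lat"), value(s, "geo:long"))
            if d <= radius_miles:
                hits.append(s)
    return sorted(hits)


nearby = polluted_things_near(42.7298, -73.6789)
print(nearby)
```

The point of combining ontologies is visible in the filter: the type test comes from the pollution vocabulary and the distance test from the geo vocabulary, and neither needs to know about the other.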

Ontologies

• Water quality ontology extends the pollution ontology to describe water-related pollution
• Further extended by regulation ontologies to provide “regulation violation” inference
• Allows the reasoner to match specific regulations to measurements that violate them

Interface
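The regulation-violation matching described on the Ontologies slide can be approximated in plain code. The sketch below is not the project's reasoner or its ontologies: the class names, regulation names, and threshold values are all hypothetical and chosen only to show one measurement violating one regulation but not another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regulation:
    name: str
    contaminant: str
    limit_mg_per_l: float


@dataclass(frozen=True)
class Measurement:
    site: str
    contaminant: str
    value_mg_per_l: float


def find_violations(measurements, regulations):
    """Pair each measurement with every regulation whose limit it exceeds."""
    violations = []
    for m in measurements:
        for r in regulations:
            if m.contaminant == r.contaminant and m.value_mg_per_l > r.limit_mg_per_l:
                violations.append((m.site, r.name))
    return violations


# Illustrative limits only; these are not real regulatory thresholds.
REGULATIONS = [
    Regulation("rule-strict", "arsenic", 0.010),
    Regulation("rule-lenient", "arsenic", 0.050),
]
MEASUREMENTS = [
    Measurement("site-a", "arsenic", 0.020),
    Measurement("site-b", "arsenic", 0.005),
]

violations = find_violations(MEASUREMENTS, REGULATIONS)
print(violations)
```

In an ontology-based version the thresholds would live in the regulation ontologies as class restrictions, so swapping in a different jurisdiction's rules changes the inferred violations without changing any code.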

Semantic Methodology and Semantic Application Evolution

SemantAqua -> SemantEco -> DataOne: modularizing, broadening, provenance, interaction

VSTO -> SESDI -> SPCDIS: modularizing, broadening, provenance, interaction

Originally developed for Virtual Observatories (in solar-terrestrial), now in water quality, sea ice, volcanology, mycology, …

McGuinness, Fox, West, Garcia, Cinquini, Benedict, Middleton. The Virtual Solar-Terrestrial Observatory: A Deployed Semantic Web Application Case Study for Scientific Research. Proc. 19th Conf. on Innovative Applications of Artificial Intelligence (IAAI-07). http://www.vsto.org

Population Sciences Grid: Interventions, Behaviors, and Policy

• Extensible Mashups via Linked Data
  – Diverse datasets from NIH
  – Exploring interventions along with correlations with behavior changes – in this case tobacco interventions and smoking prevalence
• Accountable Mashups via Provenance
• Award winning paper on multi-dimensional analysis

An Example: Hawaii

Changes in cigarette use viewed against policy changes

We link each state's yearly records to that same state across time, adding data for each year.

Ontology as API: Adding Dimensions

This RDF creates this visual.

[Figure callouts: graph, dataset, x axis, y axis]
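One way to read "ontology as API" is that a small RDF description names the dataset and the properties to plot, and generic code does the rest. The sketch below assumes an invented `viz:` vocabulary (`viz:dataset`, `viz:xAxis`, `viz:yAxis`), invented `ex:` resources, and made-up observation values. It is not the vocabulary used in the talk.

```python
# Hypothetical chart description; the viz: terms are invented for illustration.
CHART_DESCRIPTION = [
    ("ex:chart1", "rdf:type", "viz:Graph"),
    ("ex:chart1", "viz:dataset", "ex:smoking"),
    ("ex:chart1", "viz:xAxis", "ex:year"),
    ("ex:chart1", "viz:yAxis", "ex:prevalence"),
]

# Made-up observations; the numbers are placeholders, not survey results.
OBSERVATIONS = [
    ("ex:obs1", "ex:inDataset", "ex:smoking"),
    ("ex:obs1", "ex:year", 2001),
    ("ex:obs1", "ex:prevalence", 20.5),
    ("ex:obs2", "ex:inDataset", "ex:smoking"),
    ("ex:obs2", "ex:year", 2002),
    ("ex:obs2", "ex:prevalence", 19.0),
    ("ex:obs3", "ex:inDataset", "ex:other"),
    ("ex:obs3", "ex:year", 2001),
    ("ex:obs3", "ex:prevalence", 99.0),
]


def objects(triples, subject, predicate):
    """All objects matching (subject, predicate) in a list of triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def chart_points(chart, description, observations):
    """Resolve the chart's dataset and axis properties, then collect (x, y) pairs."""
    dataset = objects(description, chart, "viz:dataset")[0]
    x_prop = objects(description, chart, "viz:xAxis")[0]
    y_prop = objects(description, chart, "viz:yAxis")[0]
    members = [s for s, p, o in observations if p == "ex:inDataset" and o == dataset]
    points = [
        (objects(observations, m, x_prop)[0], objects(observations, m, y_prop)[0])
        for m in members
    ]
    return sorted(points)


points = chart_points("ex:chart1", CHART_DESCRIPTION, OBSERVATIONS)
print(points)
```

Under this reading, adding a dimension means adding or changing a triple in the description (for example, pointing the y axis at a different property) rather than editing the plotting code.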

Social Observatory – First Responder effort (NIST funded)

Social Media use is on the rise. Every day, we write:
• 294 billion emails
• 2 million blog posts
• Over 40 million Tweets*

[Figure panel: Finding Users]

First Responders, including Emergency Medical Personnel, Firefighters, and Police Officers, have active online communities on Social Media websites.

How can we leverage Social Media sites
• … to gather requirements for active First Responders?
• … to identify stakeholders within those First Responder communities?

[Figure panel: Finding Topics]

Web Data “Challenge Response” Enablers

• HHS award-winning platform
• Target questions: “good hospital for my context”
• Prizm, DataCube Explorer, …

Open Government Data

TWC – Intl Open Government Data Sets

Mobile, Distributed, and Context-Aware Computing

Rensselaer Tetherless World Constellation Web Observatory Foundations & Directions

THEMES
• Multi-Dimensional Data Portals
• Observatories: Science, Open Government, Health and Life Science, Social

Web Science Research Foundations
• Making Data Transparent and Actionable
• Provenance
• Semantic Methodology
• Social Network Analysis
• Semantically-Enabled Visualization
• Web Data "Challenge Response" Enablers

[Diagram labels: “Social Media: Reasoning on”, “Open Data Workflow Graph Database”]

International Open Government Data Sets; First Responder Network; Health and Human Services Data Challenge

Semantic eScience Data Portals

Foundations: Web Layer Cake

[Figure: Semantic Web layer cake annotated with related work; recoverable labels:]
• Visualization APIs, S2S, Govt Data
• Inference Web, Proof Markup Language, IW Trust, Air + Trust
• W3C Provenance Working Group formal model, W3C incubator group
• DL, KIF, CL, N3Logic, …
• OWL 1 & 2 WG: edited main OWL docs, quick reference, OWL profiles (OWL RL)
• Earlier languages: DAML, DAML+OIL, Classic
• Ontology repositories (Ontolingua); ontology evolution env: Chimaera
• Semantic eScience ontologies, MANY other ontologies
• RIF WG, AIR accountability tool
• SPARQL WG; earlier QL – OWL-QL, Classic QL, …
• SPARQL to XQuery translator
• Govt metadata search, Linked Open Govt Data
• RDFS materialization (Billion triple winner)
• Transparent Accountable Datamining Initiative (TAM

Inference Web: Making Data Transparent and Actionable Using Semantic Technologies

• How and when does it make sense to use smart system results & how do we interact with them?

Cognitive Asst -> CPOF & SIRI
(Mobile) Intelligent Agents
Knowledge Provenance in Virtual Observatories
NSF Interops: SONET, SSIII – Sea Ice
Intelligence Analyst Tools -> Watson
Hypothesis Investigation / Policy Advisors

Moving to the Next Generation

Some focus areas to move to the next generation:
• Provenance – e.g., not just the sources and dates, but enough to know when to depend on something
• Policy – balance between sharing data, getting credit, and making data accessible to all (or all willing to follow the rules)
• Social aspects – incentives, rewards, evolution, customization
• Distributed, Mobile, and Context-aware
• Education – scientific method: promote creating testable hypotheses, how to verify / replicate, etc.
• Broadly usable semantic methodology
• Moving to truly integrated communities

Discussion

• Semantic foundations are being used in a wide range of areas.
• They are not just for semantic practitioners any more.
• Open as well as commercial software is available.
• Come join us!

• And if you are already there…
  – What do you want from evolving observatory / collaboratory infrastructure?
  – What do you need from provenance and explanation infrastructures?
  – Do you have tools, tool templates, and/or tool requirements?
  – Do you have use cases?
  – Are you using our (or another) semantic methodology?

More info – Deborah McGuinness [email protected]

Extra

Semantic Web (RPI) 2013

[Figure labels: Research, Innovation, RDFa]

What is an Ontology?

[Figure: ontology spectrum, from least to most expressive]

Catalog/ID -> Terms/glossary -> Thesauri “narrower term” relation -> Informal is-a -> Formal is-a -> Formal instance -> Frames (properties) -> Value Restrs. -> Disjointness, Inverse, part-of… -> General Logical constraints

From AAAI Panel 99 – McGuinness, Welty, Uschold, Gruninger, Lehmann; plus basis of Ontologies Come of Age – McGuinness, 2001 and 2003

Interface

Core and Framework Semantics - Multi-tiered interoperability

used by