eXtended MetaData Registry (XMDR): Input for the Open Ontology Repository (OOR) Panel - “Ontology Registry and Repository Technology & Infrastructure Landscape”

February 28, 2008

Bruce Bargmeyer
Lawrence Berkeley National Laboratory and University of California, Berkeley
Tel: +1 510-495-2905

Topics

✦ Describe the technology/infrastructure that XMDR brings to the table for the OOR project.
✦ How does that contribute to the overall OOR initiative?
✦ How does that fit in with the other things that the rest of the teams are bringing to the table?

What XMDR Brings to the Table

✦ Use cases, semantics challenges, and requirements
✦ Proposed specifications for ISO/IEC 11179 Edition 3 – model, definitions, ontology
✦ Modular software architecture and open source software modules
✦ Open source XMDR software
✦ Test content

Challenge: Combine Data, Metadata & Concept Systems

Inference search query: “find water bodies downstream from Fletcher Creek where chemical contamination was over 10 micrograms per liter between December 2001 and March 2003”

Concept system:
  Contamination
    Biological
    Radioactive
    Chemical
      mercury
      lead
      cadmium

Data:
  ID  Date      Temp  Hg
  A   06-09-13  4.4   4
  B   06-09-13  9.3   2
  X   06-09-13  6.7   78

Metadata:
  Name  Datatype  Definition                     Units
  ID    text      Monitoring Station Identifier  not applicable
  Date  date      Date                           yy-mm-dd
  Temp  number    Temperature (to 0.1 degree C)  degrees Celsius
  Hg    number    Mercury contamination          micrograms per liter

Challenge: Find and process non-explicit data

For example: patient data on drugs contains brand names (e.g., Tylenol, Anacin-3, Datril, …); however, we want to study patients taking analgesic agents.

Concept hierarchy:
  Analgesic Agent
    Non-Narcotic Analgesic
      Analgesic and Antipyretic
        Nonsteroidal Antiinflammatory Drug
        Acetaminophen
          Tylenol
          Anacin-3
          Datril

Challenge: Specify and compute across Relations, e.g., within a food web in an Arctic ecosystem

An organism is connected to another organism for which it is a source of food energy and material by an arrow representing the direction of biomass transfer.
Source: http://en.wikipedia.org/wiki/Food_web#Food_web (from SPIRE)

Challenge: Use data from systems that record the same facts with different terms

✦ Reduce the human toil of drawing information together and performing analysis -> shift to processing.
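A minimal sketch of what shifting this toil to processing could look like for one simple case: records from different systems that identify the same country by English name, French name, or an ISO 3166 code. The ISO 3166 values are standard; the three source systems, the record layout, and the function names are hypothetical and are not part of XMDR.

```python
# Illustrative ISO 3166 entries: (English name, French name, alpha-2, alpha-3, numeric).
ISO_3166 = [
    ("Algeria", "L'Algérie", "DZ", "DZA", "012"),
    ("Belgium", "Belgique", "BE", "BEL", "056"),
    ("China", "Chine", "CN", "CHN", "156"),
    ("Denmark", "Danemark", "DK", "DNK", "208"),
    ("Egypt", "Egypte", "EG", "EGY", "818"),
    ("France", "La France", "FR", "FRA", "250"),
    ("Zimbabwe", "Zimbabwe", "ZW", "ZWE", "716"),
]


def build_index(rows):
    """Map every known term (case-insensitive) to the alpha-3 code, used here as the canonical key."""
    index = {}
    for row in rows:
        canonical = row[3]
        for term in row:
            index[term.casefold()] = canonical
    return index


INDEX = build_index(ISO_3166)


def canonical_country(term):
    """Return the canonical alpha-3 code for a term, or None if the term is unknown."""
    return INDEX.get(str(term).strip().casefold())


# Hypothetical source systems that record the same fact with different terms.
records = [
    {"source": "system_a", "country": "DK", "value": 10},
    {"source": "system_b", "country": "Danemark", "value": 5},
    {"source": "system_c", "country": "208", "value": 7},
    {"source": "system_c", "country": "ZWE", "value": 1},
]


def totals_by_country(rows):
    """Sum values per country after mapping each term to its canonical code."""
    totals = {}
    for rec in rows:
        key = canonical_country(rec["country"])
        if key is None:
            continue  # unknown terms are skipped here; a real system would flag them for review
        totals[key] = totals.get(key, 0) + rec["value"]
    return totals


print(totals_by_country(records))
```

The lookup table stands in for registered value domains; in practice the mappings would come from a registry rather than being hard-coded.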

Challenge: Use data from systems that record the same facts with different terms

[Figure: registries and repositories that share common content. Registry types shown: Database Catalogs, ISO 11179 Registries, UDDI Registries, OASIS/ebXML Registries, CASE Tool Repositories, Software Component Registries, Ontological Registries, Dublin Core Registries. Terms used for the common content: Table, Column, Data Element, Business Specification, XML, Attribute, Business Object, Term Hierarchy, Coverage. Example content: Country Identifier.]

Same Fact, Different Terms

Data Element Concept
  Name: Country Identifiers
  Context:
  Definition:
  Unique ID: 5769
  Conceptual Domain: Algeria, Belgium, China, Denmark, Egypt, France, . . ., Zimbabwe
  Maintenance Org.:
  Steward:
  Classification:
  Registration Authority:
  Others

Data Elements
  Name:
  Context:
  Definition:
  Unique ID: 4572
  Value Domain:
  Maintenance Org.:
  Steward:
  Classification:
  Registration Authority:
  Others

  ISO 3166      ISO 3166     ISO 3166      ISO 3166      ISO 3166
  English Name  French Name  2-Alpha Code  3-Alpha Code  3-Numeric Code
  Algeria       L'Algérie    DZ            DZA           012
  Belgium       Belgique     BE            BEL           056
  China         Chine        CN            CHN           156
  Denmark       Danemark     DK            DNK           208
  Egypt         Egypte       EG            EGY           818
  France        La France    FR            FRA           250
  ...
  Zimbabwe      Zimbabwe     ZW            ZWE           716

Challenge: Draw information together from a broad range of studies, reports, etc.

Challenge: Gain Common Understanding of meaning between Data Creators and Data Users

A common interpretation of what the data represents

[Figure: data creators, information systems, and users (EEA, USGS, DoD, EPA, others) each hold text and data on the same topics under different labels: environ, agriculture, climate, human health, industry, tourism, soil, water, air; ambiente, agricultura, tiempo, salud humana, industria, turismo, tierra, agua, aero.]

Semantics Challenges

✦ Managing, harmonizing, and vetting semantics is important for traditional data management.
  ◆ In the past we just covered the basics
✦ Managing, harmonizing, and vetting semantics is essential to enable enterprise

Enterprise Vocabulary Services (EVS) Concepts Unite NCI MDR

[Figure: NCI metadata registry components united by EVS concepts. Components: Conceptual Domain, Object Class, Property, Data Element Concept, Value Domain, Valid Values, Representation, Data Element, Classification Schemes, Context. Example content: Agent; Chemopreventive Agent; Chemopreventive Agent NSC Number; NSC Number; NSC Code; NSCNumber; Code; Chemopreventive Agent Name; Cyclooxygenase Inhibitor, Doxercalciferol, Eflornithine, …, Ursodiol; caDSRTraining; caCORE.]

Source: Denise Warzel, National Cancer Institute

XMDR Prototype

Demonstrate capabilities:
✦ Register existing concept systems, based on their underlying structures, such as graphs of varying complexity.
✦ Interrelate concept systems with each other.
    ■ E.g., register mappings between multiple vocabularies
✦ Support harmonization and vetting of concept systems for a community of interest.
    ■ E.g., register, harmonize, validate, and vet definitions and relations
✦ Interrelate concepts in concept systems with concepts in metadata and concepts in databases, knowledgebases, and text.
✦ Provide semantic services needed to support traditional computing as well as semantic computing.
  ◆ E.g., dereferencing the URIs used in creating RDF statements, by providing relevant information describing the referenced concept and its authoritative standing within some community of interest.
✦ Register and manage the provenance of data

✦ XMDR is part of the infrastructure for semantics and data management.

XMDR Use

✦ Upside
  ◆ Collaborative
    ■ Supports interaction with community of interest
    ■ Shared evolution and dissemination
    ■ Enables Review Cycle
  ◆ Standards-based – don’t lock semantics into proprietary technology
  ◆ Foundation for strategic data centric applications
  ◆ Lays the foundation for Ontology-based Information Management
  ◆ Content is reusable for many purposes
✦ Downside
  ◆ Managing semantics is HARD WORK - no matter how friendly the tools
  ◆ Needs integration with other components

Modular XMDR Architecture

[Figure: modular XMDR architecture. Components as labeled in the diagram:]
  Metadata Sources: concept systems, data elements
  Users: Web browsers, client software, third party software
  Content Loading & Transformation (LexGrid & custom)
  Human User Interface (HTML from JSP and JavaScript; Exhibit)
  Application Program Interface (REST)
  Authentication Service
  Validation (XML Schema)
  Mapping Engine
  Metamodel specs (UML & Editing) (Poseidon, Protege)
  Search & Content Serving (Jena, Lucene)
  XMDR data model & exchange format: XML, RDF, OWL
  Logic Indexer (Jena & Pellet)
  Text Indexer (Lucene)
  Registry Store: standard XMDR files
  XMDR metamodel (OWL & schema)
  Logic Index
  Text Index
  Postgres

Initial XMDR REST-style Application Programming Interface (API)

✦ Search Methods (GET)
  ◆ Text Search
  ◆ SPARQL Search
  ◆ XMDR Search (not documented yet)
✦ Registry Information Methods
  ◆ Summary information
  ◆ Registered models
  ◆ Identified Items
✦ Method Parameters
  ◆ Can be included as part of any method
  ◆ As part of URL
  ◆ Accept_type (what XML components to expect)
  ◆ Stylesheet (how to display results)

REST API (Search Methods)

Method: Text Search
  Resource URI (relative to application root): search/text?query={queryText}
  Request: GET
  Accept: Any (ignores)
  Representation: application/xml (searchResult)
  Description: Start a text search.

Method: Text Search Results
  Resource URI (relative to application root): search/text/{queryID}?offset={offset}&maxResults={maxResults}
  Request: GET
  Accept -> Representation:
    application/xml, application/*, or */* -> application/xml (textResultSet)
    application/exhibit -> application/exhibit *
  Description: Retrieve the results of a text search.
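As an illustration of the two text search methods, here is a sketch that builds the GET requests with the Python standard library. The application root `APP_ROOT` and the function names are assumptions made for this example; the slides do not say how the {queryID} is read out of the searchResult document, so sending the requests and parsing the responses are left out.

```python
from urllib.parse import quote, urlencode, urljoin
from urllib.request import Request

# Hypothetical application root; substitute the root of an actual XMDR deployment.
APP_ROOT = "http://localhost:8080/xmdr/"


def text_search_request(query_text, app_root=APP_ROOT):
    """Build the GET request that starts a text search: search/text?query={queryText}."""
    url = urljoin(app_root, "search/text") + "?" + urlencode({"query": query_text})
    return Request(url, method="GET")


def text_results_request(query_id, offset=0, max_results=10,
                         accept="application/xml", app_root=APP_ROOT):
    """Build the GET request that retrieves one page of results for an earlier text search."""
    path = "search/text/" + quote(str(query_id), safe="")
    params = urlencode({"offset": offset, "maxResults": max_results})
    url = urljoin(app_root, path) + "?" + params
    # The Accept header selects the representation (e.g. application/xml or application/exhibit).
    return Request(url, headers={"Accept": accept}, method="GET")


start = text_search_request("mercury contamination")
page = text_results_request("42", offset=0, max_results=10)
print(start.full_url)
print(page.full_url, page.get_header("Accept"))
```

Passing either request object to `urllib.request.urlopen` would issue the call against a running server.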

Method: SPARQL Search
  Resource URI (relative to application root): search/?query={queryText}&model={modelNameN}
  Request: GET
  Accept: Any (ignores)
  Representation: application/xml (searchResult)
  Description: Start a SPARQL search.

Method: SPARQL Search Results
  Resource URI (relative to application root): search/sparql/{queryID}?offset={offset}&maxResults={maxResults}
  Request: GET
  Accept -> Representation:
    application/xml, application/*, or */* -> application/xml (sparqlResultSet)
    application/sparql-results+xml -> application/sparql-results+xml **
    application/sparql-results+json, application/json -> application/sparql-results+json ***
    application/exhibit -> application/exhibit *
  Description: Retrieve the results of a SPARQL search.

XMDR

✦ Content (selected portions of):
  ◆ ISO/IEC 11179
  ◆ ISO 3166 – Country codes
  ◆ ISO 4217 – Currency codes
  ◆ EPA Environmental Data Registry content (ISO/IEC 11179 based registry)
  ◆ Standard Industrial Classification (SIC) codes
  ◆ North American Industry Classification System (NAICS)
  ◆ Mapping NAICS 02 to SIC 87
  ◆ Adult Mouse Anatomical Dictionary
  ◆ Defense Technical Information Center (DTIC) Thesaurus
  ◆ NBII Biocomplexity Thesaurus
  ◆ GEneral Multilingual Environmental Thesaurus
  ◆ NCI_Thesaurus
  ◆ Cancer Data Standards Repository (NCI registry based on ISO/IEC 11179)
✦ Loading new content (ongoing)
  ◆ OMEGA linguistic ontology
  ◆ OpenCyc ontology
  ◆ SIC – NAICS codes
  ◆ Mapping of NAICS to SIC codes

Contribution

How does that contribute to the overall OOR initiative?

✦ It is free for the taking
✦ Saves time on development of use cases, specifications, architectures, software, etc.

Fitting In

How does that fit in with the other things that the rest of the teams are bringing to the table?
✦ Collaboration on standards development
✦ Collaboration on prototype development and demonstration
✦ Collaboration on proposals?

Align, Coordinate, Integrate Standards/Recommendations/Specifications for Semantic Computing

[Figure: users drawing on standards from several communities for the same concept. Labels recovered from the figure: ISO/IEC 11179 Metadata Registries; Terminology; Object Management; Semantic Web; Metadata Registry, Terminology, Thesaurus, Taxonomy, Ontology, Data Standards, Structured Metadata; MOF, ODM, CWM, IMM; Graph, Node, Edge, Node; RDF Subject, Predicate, Object; CONCEPT, Refers To, Symbolizes, Stands For, Referent; “Rose”, “ClipArt Rose”.]

ISO/IEC JTC 1/SC 32, ISO TC 37, OMG, W3C

Standards Development Semantics Management and Semantics Services – Semantic Computing

Align, Co-develop, Fast Track, PAS Submission …

OMG, W3C, ISO/IEC JTC 1 SC 32, OASIS, ISO TC 154

Standards Development Semantics Management and Semantics Services – Semantic Computing

Align, integrate, co-develop, Fast Track, PAS Submission …
Can we coordinate content?

OMG, W3C, ISO/IEC JTC 1 SC 32, OASIS / ISO TC 154

A Success

Some text and figures are identical in the two standards.

ISO/IEC JTC 1 SC 32: ISO/IEC 24707 – Common Logic
OMG: OMG ODM – OMG Ontology Definition Metamodel

Standards Development Semantics Management and Semantics Services – Semantic Computing

Ongoing effort

ISO/IEC JTC 1 SC 32: ISO/IEC 11179 (Edition 3)

Standards Development Semantics Management and Semantics Services – Semantic Computing

Hopeful?

OMG: IMM
&
ISO/IEC JTC 1 SC 32: ISO/IEC 11179 (Edition 3)

Other Possibilities

✦ OASIS ebXML Registry
✦ W3C Deployment WG
✦ TC 37

Acknowledgements

✦ John McCarthy, LBNL
✦ Kevin Keck, LBNL
✦ Harold Solbrig, Apelon

✦ This material is based upon work supported by the National Science Foundation under Grant No. 0637122, USEPA and USDOD. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, USEPA or USDOD.
