Using formal thesauri and controlled vocabulary as the interface between the unstructured data and axiomatic ontologies

Farhad Ameri NIST TLP COI Event April 2021, Gaithersburg, MD

April 23, 2021 1 High-level Research questions

How Knowledge Organization Systems (KOS) can be used in an integrated fashion to support formal knowledge extraction from unstructured data?

• Supply Chain Use Case : • Data type: Manufacturing Capability data • Data source: company website • Maintenance Use Case : • Data type: MWO records • Data Source: CMMS April 23, 2021 2 Maintenance Use Case Objectives

• To expose and formalize the semantics of unstructured maintenance data. • To visualize MWO records as RDF knowledge graphs • To use RDF knowledge graphs for maintenance diagnostics and root-cause analysis.

April 23, 2021 3 Knowledge Organization Systems (KOS)

KOS Types • Knowledge organization system is any system of terms or scheme that is created Controlled term lists vocabulary for synonym rings to organize, manage, and retrieve Information name authorities information. Retrieval taxonomies thesauri • They vary in complexity, structure, function, glossaries dictionaries and expressivity. gazetteers • Services: categorization schemes classification systems • Abstraction & Indexing subject heading schemes • Tagging semantic networks ontologies • Information retrieval and term disambiguation • Navigation and translation • Reasoning and inference Source: Introduction to taxonomies and other knowledge organization systems, Heather Hedden and Helmut Nagy, Company April 23, 2021 4 Semantic Spectrum of Knowledge Organization Systems

Ontology Conceptual Model OWL, KIF Taxonomy

Expressivity RDFS, UML Thesaurus RDF/RDFS Semantic interoperability Informal Syntactic/Structural Hierarchy SKOS interoperability Name Authorities XML Term List Glossary, Dictionary, Controlled Vocabulary Development Cost / Time

April 23, 2021 5 Thesaurus

• A controlled vocabulary that represents three type of relationships between terms or concepts: • Hierarchical: broader term/narrower terms • Associative: related terms • Equivalence: preferred label /alternative label • Created in accordance with standards: • ISO 25964 (2011, 2013) and ANSI/NISO Z39.19

April 23, 2021 6 SKOS model (Simple Knowledge Organization System)

SKOS is a standard for representation of knowledge organization systems. • W3C Standard Galvanization • Web-native syntax Chemical Reaction • Encoded using XML/RDF Coating

• Each concept has a unique URI SKOS:Broader Rus t

• Machine-readable SKOS:Related Equivalence Relationship • Simple semantics Oxidation SKOS:AltLabel Rus ting

SKOS:Narrower

SKOS:Definition Homolytic Oxidation

Heterolytic Oxidation

Oxidation is the loss of electrons during a April 23, 2021 reaction by a molecule, atom or ion. 7 Why Thesaurus?

Thesaurus They can accommodate large or constantly growing vocabulary They can be used to directly tag unstructured data They are easier to develop and extend compared to ontologies

We want to use the best of both worlds Ontology Thesaurus Lexical semantics Logic-based semantics High level expressivity and Ease of extension formality

April 23, 2021 8 Overall Approach: Incremental structuring and semantic enrichment process

Ontology RDF Triples

Expressivity Thesaurus

Intermediary SKOS concept model

Development Cost / Time

April 23, 2021 9 Example

Raw text (portion of reality) from a MWO Record Explicit Entities Implicit Entities

Artifact States Solenoid valve on upper lift cyl leaking. Hyd pressure reduced on Failure Events Boom. Technician called. valve Agent replaced, works normal Process

April 23, 2021 10 Ontological Representation of problem-solution description

<<< Ideal Situation >>> Solenoid valve on upper lift cyl leaking. Hyd pressure reduced on Boom. Technician called. valve replaced, works normal

April 23, 2021 11 2-step process: First annotate the text with SKOS concepts

Solenoid valve on upper lift cyl leaking. Hyd pressure reduced on Boom. Technician called. valve replaced works normal

April 23, 2021 12 Create an intermediary representation (structured)

Intermediary representation (Concept Model)

Solenoid valve on upper lift cyl leaking. Hyd pressure reduced on Boom. Technician called. valve replaced works normal • Graph • Table

Semantic Web Zone

SKOS, RFD/XML, RDFS. OWL April 23, 2021 13 Maintenance Diagnostics Thesaurus

For this purpose, we developed Maintenance Diagnostics Thesaurus or MDT

A SKOS model for representation of controlled vocabulary in the maintenance domain.

April 23, 2021 14 The scope of Maintenance Diagnostic Thesaurus (MDT)

Scope of MDT:

• Failures • Treatment • Quality (color, temperature, viscosity, lumen) • Feature (crack, hole, breakage) • States • Working, not working • Process • Malfunctioning (e.g., leaking, rotating slowly) • Including not functioning or reduced functioning • Items • Component • Equipment/Machine • Sub-system April 23, 2021 15 MD Thesaurus

Top-level Thesaurus

April 23, 2021 16 Entity Extractor

April 23, 2021 17 Detected Concepts Detected as skos:prefLabel

Detected as skos:altLabel

April 23, 2021 18 Exported Concept Model for the MWO record

Exported Table

Formats available: • CSV • RDF/JSON April 23, 2021 19 Thesaurus Development challenges

• To adequately annotated all WO records even in a single company, we need a large thesaurus. • Thesaurus extension is a bottleneck • Mainly manual process • One concept at a time • Batch import

April 23, 2021 20 Semi-automated Extension of MDT

We are here

CSV CSV or rj RDF Triples

• Provides NLP Services • More complex Taxonomy • Shallow taxonomy of • Lexical and associative terms relationships

April 23, 2021 21 Categorizing Items (from Nestor) in MDT

Nestor Output (Item column)

Candidate Terms

April 23, 2021 22 Next Steps

• Extending the thesaurus and ontology in tandem • Better integration with Nestor • More diverse dataset is needed to feed the thesaurus extension process • Tool Development : • SKOS to OWL : Receives SKOS Concept Model and converts it to RDF graph • Reasoning to extract inferred knowledge (hidden / implicit) • Diagnosis Tool: Provides diagnostic support by querying the RDF graph

April 23, 2021 23