OPEN INFORMATION EXTRACTOR FOR SEVA

Jitin Krishnan Mentor: Patrick Coronado

Other Project Members: Trevor Reed (NASA JPL) Ben Brumback (George Mason University) Youngyo Na (Rowan University) SEVA: A Systems Engineer’s Virtual

Question-Answering

Explainable System

Systems Dynamic Information Personal Engineer Assistant Time & Scheduling

Hypothetical Scenarios Information Assimilation

Capture Experience Documents (Eg: SE Handbook) SEVA: ARCHITECTURAL COMPONENTS

INPUT NATURAL LANGUAGE QUESTION DYNAMIC PROCESSING UNIT KNOWLEDGE BASE TARGETTED (WITH TEMPORAL FACTS/STATEMENTS OPEN INFORMATION RELATIONS) EXTRACTION SYSTEM BASE KB + RULES DICTIONARY & COMMANDS VERB NORMALIZATION CUSTOM KB + RULES OUTPUT RULE EXTRACTION

SE QUERY LANGUAGE ANSWER INFERENCE ENGINE Motivation QUESTION* ONTOLOGY LANGUAGE HYPOTHETICAL MODE REASONER § Information Assimilation SYSTEM CASE-BASED EXPERIENCE RULE REASONER § Explainability RESPONSES § Personal System *in conversational mode TEMPORAL REASONER § Domain-specific (rule/experience building) Common Sense 2 SEVA: SYSTEM CONCEPT

NATURAL INTERACTIVE LANGUAGE Q&A PROCESSING LOGICKER ENDOWED MODELS INFERENCE MONITOR ENGINE

INTER-MODULE COMMUNICATION DYNAMIC PROTOCOL ONTOLOGY SEVA: Knowledge Representation Example

Sentence: Instruments have thermal zones

Subject Verb/Predicate Object NLP

First Order Logic (FOL) Formalism: ∀x ∃y Instrument(x) ∧ Thermal Zone(y) ⇒ have(x,y)

Concepts : Instrument, Thermal Zone (like classes in OOP) Instances : x, y Relationship : have KB ∀ = for all ∃ = there exists SEVA: Knowledge Representation Example

Instrument(STI) Instantiations and relation STI is an instrument between instances ABox part of(MassSpectrometerAP8717, STI) MassSpectrometerAP8717 is a part of STI

Conduit ≡ Pipe Conduit is same as Pipe Concepts and relations TBox MassSpectrometer ⊑ Spectrometer RBox Subclass relationship "#$%&' ∘ "#$%&' ⊑ "#$%&' Transitive property of the role

− Relations between "#$%&' ≡ *#+,-."-/0/% relations Inverse property of two roles SEVA: Types of Knowledge

STI is an instrument. STI has a length of 200 cm. Mass Spectrometer MS81Z is [STI; is-a; instrument] Project [STI; has; length] Specific a part of STI ABox OIE [length; has-value; 200 Documents . . . cm] [Mass Spectrometer MS81Z; part of; STI] Conduit is same as Pipe. Mass Spectrometer is a type of Spectrometer. Instruments have TBox SE Handbook, [Conduit; same as; Pipe] thermal zones. [Mass Spectrometer; Common-Sense OIE Knowledge . . . type of; Spectrometer] [Instruments; have; Entity Linking thermal zones] & Verb Normalization Verb Vocabulary, RBox Verb-oriented Net, DBPedia, [partOf: transitive] Common-Sense contextual synonyms OIE + [partOf; inverse-of; knowledge supervised hasComponent] KB Construction . . . & Population Open

• Identifies wide range of domain-independent relations • Traditional Information Extraction: uses predetermined templates

Examples: Stanford Open IE, Open IE by AI2, ClausIE

• For the experiment the domain language complexity is reduced – we work with only simple English sentences NLP Basics SEVA – Targeted Open IE (TOIE) Sentence: STI, an instrument, weighs 56 kg Pattern Matching on the dependency tree

Dependency Extracting relations of type: is-a, transitive-verb, has-property, Parse Tree has-value

NN = Noun A set of : VBZ = Verb form nsubj, dobj, case, nmod, , amod

Chunking example: "NP:{(|)+}"

NLTK, Stanford CoreNLP, POS Tagger SEVA-TOIE: Comparison THANK YOU!