Ontologies and the Semantic Web

SMBM-2006 Jena, April 9, 2006

Steffen Staab http://isweb.uni-koblenz.de

Steffen Staab (1) ISWeb – Informationssysteme & Semantic Web Agenda

1. Ontologies
2. Semantic Web
3. Semantic Web Languages
4. Some Applications (Ontoprise)
5. Ontologies & Text

Part I

Introduction to Ontologies

Origin and History

• Ontology in philosophy: a philosophical discipline, the branch of philosophy that deals with the nature and the organization of reality
• Science of Being (Aristotle, Metaphysics, IV, 1)
• Tries to answer the questions:
  – What characterizes being?
  – Eventually, what is being?

Aristotle - Ontology

• Before: study of the nature of being

• Since Aristotle: study of knowledge representation and reasoning
• Terminology:
  – Genus: (Classes)
  – Species: (Subclasses)
  – Differentiae: (Characteristics that allow objects to be grouped or distinguished from each other)
• Syllogisms (Inference Rules)

Example for differentiae (adapted from Uta Priss, in preparation)

            real  cartoon  cat  dog  rabbit  fish  gorilla  koala  mammal
Garfield    -     X        X    -    -       -     -        -      X
Snoopy      -     X        -    X    -       -     -        -      X
Bugs Bunny  -     X        -    -    X       -     -        -      X
Nemo        -     X        -    -    -       X     -        -      -
Copito      X     -        -    -    -       -     X        -      X
Osmond      X     -        -    -    -       -     -        X      X

Organizing the Objects as a Lattice

What is an Ontology? Gruber 93:

An ontology is a
  formal specification      ⇒ executable, discussable
  of a shared               ⇒ group of persons
  conceptualization         ⇒ about concepts
  of a domain of interest   ⇒ between application and "unique truth"

Why Develop an Ontology?

• To make domain assumptions explicit
  – Easier to change domain assumptions
  – Easier to understand and update legacy data
• To separate domain knowledge from operational knowledge
  – Re-use domain and operational knowledge separately
• A community reference for applications
• To share a consistent understanding of what information means

Taxonomy

Object

Person Topic Document

Student Researcher Semantics

Doctoral Student PhD Student F-Logic Ontology

Taxonomy := Segmentation, classification and ordering of elements into a classification system according to the relationships between them
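The taxonomy above can be sketched as a simple child-to-parent mapping. This is only an illustration: the parent assignments are read off the flattened figure and are assumptions, and the helper names `ancestors` and `is_a` are hypothetical.

```python
# Hypothetical encoding of the slide's taxonomy as child -> parent links.
# The parent assignments are assumptions read off the flattened figure.
PARENT = {
    "Person": "Object",
    "Topic": "Object",
    "Document": "Object",
    "Student": "Person",
    "Researcher": "Person",
    "Doctoral Student": "Student",
    "PhD Student": "Student",
    "Semantics": "Topic",
    "F-Logic": "Semantics",
    "Ontology": "Semantics",
}


def ancestors(term):
    """Return all broader terms of `term`, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term, broader):
    """True if `term` equals `broader` or lies below it in the taxonomy."""
    return term == broader or broader in ancestors(term)
```

A taxonomy supports exactly this kind of question ("is a PhD Student a Person?"); synonyms, typed relations and rules need the richer structures on the following slides.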

Thesaurus

Object

Person Topic Document

Student Researcher Semantics

Doctoral Student PhD Student F-Logic Ontology

synonym similar

• Terminology for a specific domain
• Taxonomy plus fixed relationships (similar, synonym, related to)
• Originates from bibliography

Topic Map

Object

knows described_in Person Topic Document writes

Student Researcher Semantics

Doctoral Student PhD Student F-Logic Ontology

synonym similar Tel Affiliation

• Topics (nodes), relationships and occurrences (to documents)
• ISO standard
• Typically used for navigation and visualisation

Ontology (in our sense)

[Figure: the concept hierarchy of the previous slides (Object; Person, Topic, Document; Student, Researcher, Semantics; Doctoral Student, PhD Student, F-Logic, Ontology), now with typed links is_a, subTopicOf, similar and instance_of, the relations knows, writes, described_in and is_about, and the attributes Tel and Affiliation.]

Instance: York Sure, Tel +49 721 608 6592, Affiliation AIFB

Rules (over T, D, P):
  T described_in D, D is_about T
  P writes D, D is_about T, P knows T

• Representation Language: Predicate Logic (F-Logic)
• Standards: RDF(S); OWL

Ontologies - Some Examples

• General purpose ontologies:
  – DOLCE, http://www.loa-cnr.it/DOLCE.html
  – The Upper Cyc Ontology, http://www.cyc.com/cyc-2-1/index.html
  – IEEE Standard Upper Ontology, http://suo.ieee.org/
• Domain and application-specific ontologies:
  – GALEN, http://www.openclinical.org/prj_galen.html
  – Foundational Model of Anatomy, http://sig.biostr.washington.edu/projects/fm/AboutFM.html
  – RETSINA Calendaring Agent, http://ilrt.org/discovery/2001/06/schemas/ical-full/hybrid.rdf
  – Dublin Core, http://dublincore.org/
• Semantic Desktop Ontologies
  – Semantics-Aware instant Messaging: SAM Ontology, http://www.uni-koblenz.de/FB4/Institutes/IFI/AGStaab/Research/sam
  – Haystack, http://haystack.lcs.mit.edu/
  – Gnowsis, http://www.gnowsis.org/
  – Piggybank, http://simile.mit.edu/piggy-bank/
• Web Services Ontologies
  – Core ontology of services, http://cos.ontoware.org
  – Web Service Modeling ontology, http://www.wsmo.org
  – OWL-S, http://www.daml.org/services/owl-s/1.0/
• Ontologies in a wider sense
  – GO - Gene Ontology, http://www.geneontology.org/
  – UMLS, http://www.nlm.nih.gov/research/umls/
  – Agrovoc, http://www.fao.org/agrovoc/
  – Art and Architecture, http://www.getty.edu/research/tools/vocabulary/aat/
  – DTD standardizations, e.g. HR-XML, http://www.hr-xml.org/
  – WordNet / EuroWordNet, http://www.cogsci.princeton.edu/~wn

Ontologies and Their Relatives

[Figure: a spectrum of increasing formality]
Catalog / ID → Terms / Glossary → Thesauri → Informal Is-a → Formal Is-a → Formal Instance → Frames → Value Restrictions → Axioms (Disjoint, Inverse Relations, ...) → General logical constraints

Ontologies and Their Relatives (cont'd)

Topic Maps Front-End Thesauri

Navigation Taxonomies Information Retrieval Query Expansion Sharing of Knowledge

Queries Ontologies Semantic Networks Consistency Checking EAI Mediation Reasoning

Extended ER-Models Predicate Logic Back-End

Applications of Ontologies

• Natural Language Processing and Machine Translation, e.g. Nirenburg et al. 2004, Maedche et al. 2001, Agirre et al. 1996, Beale et al. 1995
• Semantic Web, see http://www.w3.org/2001/sw/ and http://www.w3.org/2001/sw/WebOnt/
• Knowledge Engineering & Management, e.g. Fensel 2001, Mullholland et al. 2000; Staab & Schnurr, 2000; Sure et al., 2000, Abecker et al. 1997
• Electronic Commerce, e.g. RosettaNet and Ontology.org
• Information Retrieval and Information Integration, e.g. Kashyap, 1999; Mena et al., 1998; Wiederhold, 1992
• Intelligent Search Engines, e.g. WebKB (Martin et al. 2000), SHOE (Heflin & Hendler, 2000), OntoSeek (Guarino et al., 1999), Ontobroker (Decker et al., 1999)
• Digital Libraries, e.g. Amann & Fundulaki, 1999
• Enhanced User Interfaces, e.g. (Kesseler, 1996), Inxight
• Software Agents, e.g. OnTo-agents, FIPA, (Gluschko et al., 1999; Smith & Poulter, 1999)
• Business Process Modeling, e.g. Decker et al., 1997; TOVE, 1995; Uschold et al., 1998

Overview Literature

S. Staab, R. Studer. Handbook on Ontologies. Springer, 2004.

Part II

Introduction to Semantic Web

Syntax is not sufficient

Andreas • Tel • E-Mail

Information Convergence

• Convergence not just in devices, also in “information” – Your personal information (phone, PDA,…) Calendar, photo, home page, files… – Your “professional” life (laptop, desktop, … Grid) Web site, publications, files, databases, … – Your “community” contexts (Web) Hobbies, blogs, fanfic, social networks…

• The Web teaches us that people will work to share – How do we CREATE, SEARCH, and BROWSE in the non-text based parts of our lives?

WWW vs. Semantic Web

WWW := Hypertext & Internet & Social Phenomenon
Semantic Web := Semantic Web Language/Data & Ontologies & Internet & Social Phenomenon

Let's try XML

XML is unspecific:
(1) No predetermined vocabulary
(2) No semantics for relationships

⇒ (1) & (2) must be specified upfront

Only possible in close cooperations
– Small, reasonably stable group
– Common interests or authorities
Not possible in the Web or on a broad scale in general!

Meaning of Information: (or: what it means to be a computer)

name

education

CV work

private

XML ≠ Meaning, XML = Structure

<name>       <ναµε>
<education>  <εδυχατιον>
<CV>         <Χς>
<work>       <ωορκ>
<private>    <πριϖατε>

Some Principal Ideas

• URI – uniform resource identifiers
• XML – common syntax
• Interlinked
• Layers of semantics – from database to knowledge base to proofs

(Tim Berners-Lee, Weaving the Web)

Design principles of WWW applied to Semantics!!

The Semantic Web on one Slide

cooperatesWith Ontology rdfs:Domain rdfs:Range Person rdfs:subClass

Employee rdfs:subClass rdfs:subClass PostDoc Professor rdf:type rdf:type

Siegfried Handschuh rdf:ID="person_sst"> Steffen Staab Meta- "http://www.uni-koblenz.de/~staab/ data #person_sst"/> ... ... swrc:cooperatesWith

Web page

URL http://www.deri.ie/~sha http://www.uni-koblenz.de/~staab

The Semantic Web - Inference

Nepomuk

swrc:project swrc:homepage

swrc:name swrc:cooperatesWith swrc:project Handschuh

swrc:affiliation

swrc:member swrc:member

DERI

Visualization of a Logic Representation: OWL, F-Logic, etc.

The new Semantic Web Stack

Tim Berners-Lee, ISWC November 2005, http://www.w3.org/2005/Talks/1110-iswc-tbl/#(12) Trust Proof Logic framework

OWL Rules Signature

DLP bit of OWL/Rule Encryption SparQL RDF Schema RDF Core XML Namespaces URI Unicode

Knowledge Provisioning

Tools for markup...

PhotoStuff Demo

Semi-automatic

Not tied to specific domains

Visual VDE plug-in Shape Shape Descriptor launch selection erasure selection

Save Shape Color Descriptor Prototype selection extraction Instances

Domain Ontology Browser Selected region

Draw panel

M-OntoMat is publicly available: http://acemedia.org/aceMedia/results/software/m-ontomat-annotizer.html

Shared Workspace (Xarop + Screenshot)

Social networks: e.g. Friend of a Friend (FOAF)

• Say stuff about yourself (or others) in OWL files, link to who you “know”

Estimates of the number of FOAF users range from 2M to 5M.

Using FOAF in other contexts

Jennifer Golbeck http://trust.mindswap.org

Get a B&N price (In Euros)

Of a particular book

In its German edition?

Now.

• RDF, RDFS and OWL are ready for prime time

– Designs are stable, implementations maturing • Major Research investment translating into application development and commercial spinoffs

– Adobe 6.0 embraces RDF – IBM releases tools, data and partnering – HP extending Jena to OWL – OWL Engines by Ontoprise GmbH, Network Inference, Racer GmbH – Ontoprise is a strategic partner for Oracle and Software AG – Proprietary OWL ontologies for vertical markets • c.f. pharmacology, HMO/health care, ... Soft drinks

Now: Plenty of annotations – unfortunately, not in the open

• Taggings are daily practice:
  – Flickr, http://www.flickr.com/
  – Delicious, http://del.icio.us/
  – Cite-u-like, http://www.citeulike.org/
  – Bibsonomy, …
• Plenty of annotations
  – Dooyoo, E-pinions
  – Quipe, http://www.quipe.com/
  – Froogle, http://froogle.google.com/
  – Google Base, http://base.google.com/
  – RSS
  – E-Science data curation, http://www.jisc.ac.uk/index.cfm?name=pub_escience
  – Semantic Wikis
• Web 2.0 – would be easier with Semantic Web!

The Semantic Wave

YOU ARE HERE 2006

YOU ARE HERE 2003

(Berners-Lee, 03)

Semantic Technologies vs. Semantic Web

Semantic Technologies
• Used by "Early Adopters"
• Mature
  – Deductive Databases (Research since early 80ies)
  – Description logics (Research since late 70ies)
  – Ontobroker (Research prototype since 1990; commercial since 1999)
• A lot of knowledge about integration with existing technology (databases, modelling, …)

Semantic Web
• Still "research-oriented"
• Currently: Used in Intranets
• Currently: Used for internet applications with simple ontologies (Dublin Core, RSS, PICS, FOAF, …)
• Quite some way to go for full-fledged success, initial take-up now by some focus groups

Application areas for Semantic Technologies

• Software engineering: conceptual approaches need semantic interchange language
• Data description:
  – Databases in bioinformatics
  – Multimedia data (complementary to MPEG 7/21)
• Data integration: data exchange benefits from semantic interchange language
• "Plug'n'play" for dynamic (not necessarily "automatic"!!!) business process configuration: needs rich semantic descriptions

Prospects of the Semantic Web, or: WWW vs. Semantic Web revisited

WWW := Hypertext & Internet & Social Phenomenon
Semantic Web := Semantic Web Language/Data & Ontologies & Internet & Social Phenomenon

WWW without Social Phenomenon = Intranet
Semantic Web without Social Phenomenon = Semantic Data Integration
Both: new and important paradigms at their time, but "less" outreach

"Less" vs "More" Outreach

„Less“ equals a multi-billion dollar market „More“ equals a change as radical as triggered by the WWW

Overview Literature

Grigoris Antoniou, Frank van Harmelen. A Semantic Web Primer, MIT Press 2005.

Part III

Semantic Web Languages

RDF

RDF Data Model

• Resources
  – A resource is a thing you talk about (can reference)
  – Resources have URIs
  – RDF definitions are themselves Resources (linkage)
• Properties
  – Slots; they define relationships to other resources or atomic values
• Statements
  – "Resource has Property with Value"
  – (Values can be resources or atomic XML data)
• Similar to Frame Systems

A simple Example

• Statement
  – "Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila"
• Structure
  – Resource (subject) http://www.w3.org/Home/Lassila
  – Property (predicate) http://www.schema.org/#Creator
  – Value (object) "Ora Lassila"
• Directed graph
  http://www.w3.org/Home/Lassila --s:Creator--> Ora Lassila
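The statement above can be sketched as a (subject, predicate, object) triple in plain Python. The `Triple` type and the `objects` helper are illustrative assumptions, not part of any RDF library; the URIs and the literal are taken from the slide.

```python
from collections import namedtuple

# A statement is a (subject, predicate, object) triple.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# URIs and the literal value are the ones shown on the slide.
statement = Triple(
    subject="http://www.w3.org/Home/Lassila",
    predicate="http://www.schema.org/#Creator",
    object="Ora Lassila",
)

# An RDF graph can be viewed as a set of such triples.
graph = {statement}


def objects(graph, subject, predicate):
    """Return all values of `predicate` for `subject`."""
    return [t.object for t in graph
            if t.subject == subject and t.predicate == predicate]
```

Querying the graph for the creator of the page then amounts to `objects(graph, "http://www.w3.org/Home/Lassila", "http://www.schema.org/#Creator")`.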

Another Example

• To add properties to Creator, point through an intermediate Resource.

http://www.w3.org/Home/Lassila

s:Creator

Person://fi/654645635

Name Email

Ora Lassila [email protected]

Collection Containers

• Multiple occurrences of the same PropertyType do not establish a relation between the values
  – The Millers own a boat, a bike, and a TV set
  – The Millers need (a car or a truck)
  – (Sarah and Bob) bought a new car
• RDF defines three special Resources:
  – Bag: unordered values (rdf:Bag)
  – Sequence: ordered values (rdf:Seq)
  – Alternative: single value (rdf:Alt)
• Core RDF does not enforce 'set' semantics amongst values

Example: Bag

The students in course 6.001 are Amy, Tim, John, Mary, and Sue.

/courses/6.001 --students--> bagid1
bagid1 --rdf:type--> rdf:Bag
bagid1 --rdf:_1--> /Students/Amy
bagid1 --rdf:_2--> /Students/Tim
bagid1 --rdf:_3--> /Students/John
bagid1 --rdf:_4--> /Students/Mary
bagid1 --rdf:_5--> /Students/Sue
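The Bag example can be spelled out as triples. The following is a minimal sketch in plain Python; `make_container` is a hypothetical helper, and the node and property names are taken from the slide.

```python
# Namespace of the RDF vocabulary (rdf:type, rdf:Bag, rdf:_1, ...).
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def make_container(node, kind, members):
    """Return the triples describing an RDF container (Bag, Seq or Alt)."""
    triples = [(node, RDF + "type", RDF + kind)]
    # Membership properties are numbered rdf:_1, rdf:_2, ...
    for i, member in enumerate(members, start=1):
        triples.append((node, RDF + "_" + str(i), member))
    return triples


students = ["/Students/Amy", "/Students/Tim", "/Students/John",
            "/Students/Mary", "/Students/Sue"]
bag = make_container("bagid1", "Bag", students)

# The course points to the container node, not to the students directly.
course = [("/courses/6.001", "students", "bagid1")] + bag
```

The same helper covers `Seq` and `Alt`; only the container type changes, while the interpretation of the numbering (ordered, unordered, alternatives) is left to the application.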

Example: Alternative

• The source code for X11 may be found at ftp.x.org, ftp.cs.purdue.edu, or ftp.eu.net

http://x.org/package/X11 --> altid
altid --rdf:type--> rdf:Alt
altid --rdf:_1--> ftp.x.org
altid --rdf:_2--> ftp.cs.purdue.edu
altid --rdf:_3--> ftp.eu.net

Statements about Statements (Requirement 2: Dispute Statements)

• Making statements about statements requires a process for transforming them into Resources
  – subject: the original referent
  – predicate: the original property type
  – object: the original value
  – type: rdf:Statement

Example: Reification

• Ralph Swick believes that – the creator of the resource http://www.w3.org/Home/Lassila is Ora Lassila

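http://www.w3.org/Home/Lassila --s:Creator--> Ora Lassila

genid1 --rdf:type--> rdf:Statement
genid1 --rdf:subject--> http://www.w3.org/Home/Lassila
genid1 --rdf:predicate--> s:Creator
genid1 --rdf:object--> Ora Lassila
genid1 --b:believedBy--> Ralph Swick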
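The reification example can be written down as triples. This is a sketch only; `reify` is a hypothetical helper and the prefixed names are kept as plain strings exactly as they appear on the slide.

```python
def reify(node, subject, predicate, obj):
    """Describe the statement (subject, predicate, obj) as a resource `node`."""
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
    ]


triples = reify("genid1", "http://www.w3.org/Home/Lassila",
                "s:Creator", "Ora Lassila")

# A statement about the statement: who believes it.
triples.append(("genid1", "b:believedBy", "Ralph Swick"))
```

Note that the reified description does not by itself assert the original statement; the triple `(http://www.w3.org/Home/Lassila, s:Creator, Ora Lassila)` would have to be added separately.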

RDF Syntax I

• Datamodel does not enforce particular syntax • Specification suggests many different syntaxes based on XML • General form: Subject (OID) Starts an RDF-Description Ora Lassila Literal

Resource (possibly another RDF-description) Properties

Resulting Graph

http://www.w3.org/Home/Lassila

s:Creator s:createdWith

Ora Lassila http://www.w3c.org/amaya

Ora Lassila

RDF Syntax II: Syntactic Varieties

Typing Information Subject (OID) In-Element Property

Description Logics (Terminological Logics, DLs)

• Fragments of FOL
• Most often decidable
• Moderately expressive
• Stem from semantic networks
• W3C Standard OWL DL corresponds to SHOIN(D)

DLs – general structure

• DLs are a family of logic-based formalisms for knowledge representation
• Special language characterized by:
  – Constructors to define complex concepts and roles based on simpler ones.
  – A set of axioms to express facts using concepts, roles and individuals.

• ALC is the smallest DL that is propositionally closed:
  – ∧, ∨, ¬ are constructors, written ⊓, ⊔, ¬.
  – Quantifiers define how roles are to be interpreted:

Man ⊓ ∃hasChild.Female ⊓ ∃hasChild.Male ⊓ ∀hasChild.(Rich ⊔ Happy)
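To make the meaning of the constructors concrete, the following sketch evaluates an ALC concept over one small, finite interpretation. It is a model checker for a given interpretation, not a DL reasoner; the tuple encoding of concepts, the function name `ext`, and the individuals are assumptions made for illustration.

```python
def ext(concept, domain, concepts, roles):
    """Return the individuals of `domain` that satisfy `concept`.

    A concept is an atomic name (str) or a tuple:
      ("and", C, D), ("or", C, D), ("not", C),
      ("some", role, C)  for an existential restriction,
      ("all", role, C)   for a universal restriction.
    """
    if isinstance(concept, str):
        return set(concepts.get(concept, set()))
    op = concept[0]
    if op == "and":
        return (ext(concept[1], domain, concepts, roles)
                & ext(concept[2], domain, concepts, roles))
    if op == "or":
        return (ext(concept[1], domain, concepts, roles)
                | ext(concept[2], domain, concepts, roles))
    if op == "not":
        return set(domain) - ext(concept[1], domain, concepts, roles)
    if op in ("some", "all"):
        pairs = roles.get(concept[1], set())
        fillers = ext(concept[2], domain, concepts, roles)
        if op == "some":
            # At least one role successor lies in the filler concept.
            return {x for x in domain
                    if any((x, y) in pairs for y in fillers)}
        # Every role successor lies in the filler (vacuously true if none).
        return {x for x in domain
                if all(y in fillers for (s, y) in pairs if s == x)}
    raise ValueError("unknown constructor: %r" % (op,))


# Hypothetical interpretation: individuals and their memberships are made up.
domain = {"john", "mary", "tom", "bob", "ann"}
concepts = {
    "Man": {"john", "bob", "tom"},
    "Male": {"john", "bob", "tom"},
    "Female": {"mary", "ann"},
    "Rich": {"mary"},
    "Happy": {"tom"},
}
roles = {"hasChild": {("john", "mary"), ("john", "tom"), ("bob", "ann")}}

# The concept from the slide, written with binary conjunctions.
slide_concept = (
    "and", "Man",
    ("and", ("some", "hasChild", "Female"),
     ("and", ("some", "hasChild", "Male"),
      ("all", "hasChild", ("or", "Rich", "Happy")))))

members = ext(slide_concept, domain, concepts, roles)
```

In this interpretation only `john` qualifies: he has a daughter and a son, and each of his children is rich or happy.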

Further DL concepts and role constructors

• Number restrictions (cardinality constraints) for roles: ≥3 hasChild, ≤1 hasMother

• Qualified number restrictions: ≥2 hasChild.Female, ≤1 hasParent.Male

• Nominals (definition by extension): {Italy, France, Spain}

• Concrete domains (datatypes): hasAge.(≥21)

• Inverse roles: hasChild⁻ ≡ hasParent
• Transitive roles: hasAncestor* (descendant)
• Role composition: hasParent.hasBrother (uncle)
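Read over a finite set of role assertions, the role constructors above correspond to simple operations on binary relations. The sketch below illustrates that reading; the function names and the family data are hypothetical.

```python
def inverse(role):
    """Inverse role: swap subject and object of every pair."""
    return {(y, x) for (x, y) in role}


def compose(r, s):
    """Role composition: (x, z) whenever r(x, y) and s(y, z) for some y."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


def transitive_closure(role):
    """Smallest transitive relation containing `role`."""
    closure = set(role)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra


def at_least(n, role, domain, fillers=None):
    """Individuals with at least n role successors (optionally in `fillers`)."""
    return {
        x for x in domain
        if len({y for (s, y) in role
                if s == x and (fillers is None or y in fillers)}) >= n
    }


# Hypothetical family data.
domain = {"anna", "ben", "carl", "dora"}
has_child = {("anna", "ben"), ("anna", "carl"), ("ben", "dora")}
has_brother = {("ben", "carl"), ("carl", "ben")}

has_parent = inverse(has_child)                 # inverse role
has_uncle = compose(has_parent, has_brother)    # hasParent.hasBrother
has_ancestor = transitive_closure(has_parent)   # transitive role
two_children = at_least(2, has_child, domain)   # >=2 hasChild
```

Here `dora` has the uncle `carl`, `anna` is an ancestor of `dora`, and only `anna` satisfies ≥2 hasChild.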

DL Knowledge Bases

• DL Knowledge Bases consist of two parts (in general):
  – TBox: Axioms describing the structure of a modelled domain (conceptual schema):
    • HappyFather ≡ Man ⊓ ∃hasChild.Female ⊓ …
    • Elephant ⊑ Animal ⊓ Large ⊓ Grey
    • transitive(hasAncestor)

  – ABox: Axioms describing concrete situations (data, facts):
    • HappyFather(John)
    • hasChild(John, Mary)

• The distinction between TBox and ABox has no deep logical significance … but it is common, useful modelling practice.

General DL Architecture

Knowledge Base

Tbox (schema)

Man ≡ Human ⊓ Male
Happy-Father ≡ Man ⊓ ∃has-child.Female ⊓ …

Abox (data) Interface

Happy-Father(John)

has-child(John, Mary) Inference System

Knowledge modelling in OWL

Example ontology and conclusion from http://owl.man.ac.uk/2003/why/latest/#2
• Also an example for OWL Abstract Syntax.

Namespace(a = )
Ontology(
  ObjectProperty(a:drives)
  ObjectProperty(a:eaten_by)
  ObjectProperty(a:eats inverseOf(a:eaten_by) domain(a:animal))
  …
  Class(a:adult partial annotation(rdfs:comment "Things that are adult."))
  Class(a:animal partial restriction(a:eats someValuesFrom (owl:Thing)))
  Class(a:animal_lover complete intersectionOf(restriction(a:has_pet minCardinality(3)) a:person))
  …)

Knowledge modelling: examples

Class(a:bus_driver complete intersectionOf(a:person restriction(a:drives someValuesFrom (a:bus))))
  bus_driver ≡ person ⊓ ∃drives.bus
Class(a:driver complete intersectionOf(a:person restriction(a:drives someValuesFrom (a:vehicle))))
  driver ≡ person ⊓ ∃drives.vehicle
Class(a:bus partial a:vehicle)
  bus ⊑ vehicle

• A bus driver is a person that drives a bus.
• A bus is a vehicle.
• A bus driver drives a vehicle, so must be a driver.

The subclass is inferred due to subclasses being used in existential quantification.

Knowledge modelling: examples

Class(a:driver complete intersectionOf(a:person restriction(a:drives someValuesFrom (a:vehicle))))
  driver ≡ person ⊓ ∃drives.vehicle

Class(a:driver partial a:adult)
  driver ⊑ adult

Class(a:grownup complete intersectionOf(a:adult a:person))
  grownup ≡ adult ⊓ person

• Drivers are defined as persons that drive vehicles (complete definition)
• We also know that drivers are adults (partial definition)
• So all drivers must be adult persons (i.e. grownups)

An example of axioms being used to assert additional necessary information about a class. We do not need to know that a driver is an adult in order to recognize one, but once we have recognized a driver, we know that they must be adult.
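The two inferences on these slides (bus_driver ⊑ driver and driver ⊑ grownup) can be reproduced with a very small saturation procedure. This is only a sketch for conjunctions of named classes and existential restrictions with acyclic definitions; it is not how an OWL reasoner works in general, and the dictionary names `NECESSARY` and `SUFFICIENT` are assumptions.

```python
# Necessary conditions: what holds for every member of a class.
# Both "partial" and "complete" definitions contribute here.
NECESSARY = {
    "bus": ["vehicle"],
    "bus_driver": ["person", ("some", "drives", "bus")],
    "driver": ["person", ("some", "drives", "vehicle"), "adult"],
    "grownup": ["adult", "person"],
}

# Sufficient conditions: only "complete" definitions allow recognising members.
SUFFICIENT = {
    "bus_driver": ["person", ("some", "drives", "bus")],
    "driver": ["person", ("some", "drives", "vehicle")],
    "grownup": ["adult", "person"],
}


def implied(cls):
    """Everything that follows for an arbitrary member of `cls`.

    Assumes acyclic definitions (the recursion on fillers must terminate).
    """
    facts = {cls}
    while True:
        new = set()
        for fact in facts:
            if isinstance(fact, str):
                new.update(NECESSARY.get(fact, []))
            else:
                # ("some", role, filler): weaken the filler to its superclasses.
                _, role, filler = fact
                new.update(("some", role, sup)
                           for sup in implied(filler)
                           if isinstance(sup, str))
        # Recognise classes whose complete definition is satisfied.
        for name, conditions in SUFFICIENT.items():
            if all(c in facts for c in conditions):
                new.add(name)
        if new <= facts:
            return facts
        facts |= new


def subsumed_by(sub, sup):
    """True if every member of `sub` is necessarily a member of `sup`."""
    return sup in implied(sub)
```

With these definitions, `subsumed_by("bus_driver", "driver")` and `subsumed_by("driver", "grownup")` both hold, while the converse directions do not.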

Knowledge modelling: Examples

Class(a:cow partial a:vegetarian)
DisjointClasses(unionOf(restriction(a:part_of someValuesFrom (a:animal)) a:animal) unionOf(a:plant restriction(a:part_of someValuesFrom (a:plant))))
  (∃part_of.animal ⊔ animal) ⊓ (plant ⊔ ∃part_of.plant) ⊑ ⊥
Class(a:vegetarian complete intersectionOf( restriction(a:eats allValuesFrom (complementOf(restriction(a:part_of someValuesFrom (a:animal))))) restriction(a:eats allValuesFrom (complementOf(a:animal))) a:animal))
Class(a:mad_cow complete intersectionOf(a:cow restriction(a:eats someValuesFrom (intersectionOf(restriction(a:part_of someValuesFrom (a:sheep)) a:brain)))))
Class(a:sheep partial a:animal restriction(a:eats allValuesFrom (a:grass)))

• Cows are naturally vegetarians
• A mad cow is one that has been eating sheep's brains
• Sheep are animals

Thus a mad cow has been eating part of an animal, which is inconsistent with the definition of a vegetarian.

Knowledge modelling: Example

Individual(a:Walt type(a:person) value(a:has_pet a:Huey) value(a:has_pet a:Louie) value(a:has_pet a:Dewey))
Individual(a:Huey type(a:duck))
Individual(a:Dewey type(a:duck))
Individual(a:Louie type(a:duck))
DifferentIndividuals(a:Huey a:Dewey a:Louie)
Class(a:animal_lover complete intersectionOf(a:person restriction(a:has_pet minCardinality(3))))
ObjectProperty(a:has_pet domain(a:person) range(a:animal))

• Walt has pets Huey, Dewey and Louie. • Huey, Dewey and Louie are all distinct individuals. • Walt has at least three pets and is thus an animal lover.

Note that in this case, we don’t actually need to include person in the definition of animal lover (as the domain restriction will allow us to draw this inference).
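The role of DifferentIndividuals in this example can be illustrated with a small check. Without a unique-name assumption, three pet names only prove "at least three pets" if the names are known to denote different individuals. The sketch below is a simplification under that reading; the function names are hypothetical.

```python
from itertools import combinations


def provably_distinct(values, different_pairs):
    """Size of the largest subset of `values` that is pairwise declared different."""
    values = sorted(set(values))
    for size in range(len(values), 1, -1):
        for group in combinations(values, size):
            if all(frozenset(pair) in different_pairs
                   for pair in combinations(group, 2)):
                return size
    return 1 if values else 0


def is_animal_lover(person, has_pet, different_pairs):
    """minCardinality(3) on has_pet, counting only provably distinct pets."""
    pets = has_pet.get(person, [])
    return provably_distinct(pets, different_pairs) >= 3


# Facts from the slide.
has_pet = {"Walt": ["Huey", "Louie", "Dewey"]}
different = {frozenset(pair)
             for pair in combinations(["Huey", "Dewey", "Louie"], 2)}
```

With the DifferentIndividuals axiom, `is_animal_lover("Walt", has_pet, different)` holds; with an empty set of difference statements it does not, because the three names could refer to the same duck.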

Knowledge modelling: Some Research Challenges

• Reasoning with
  – Uncertainty (fuzzy, probabilistic)
  – Inconsistencies (paraconsistent)
  – Rules
  – Further AI paradigms (nonmonotonic reasoning, preferences, …)
• Maintenance (updates, infrastructure, etc.)
• Scalability of reasoning
• …

Application Scenario: Semantic Inference

www.ontoprise.de

© 2006 ontoprise GmbH

Audi: Semantic Testcar Configuration

Background

Complex dependencies decrease the speed of development Knowledge is distributed over different departments

Goal

Design of a Semantic Guide for capturing the dependencies Configuration of components Integration into existing order system Engineers can concentrate on creative efforts

Inference: to conclude implicit facts

Testengine is ready to test in a car

Rule 2: All parts have to be tested and released

Testengine has been tested and released

Fit of Testengine and Chassis 17

Rule 1: A Chassis has to be suited for the power of the engine

Chassis 17 is suited for 110 KW

Testengine has 104 KW
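The inference chain on this slide can be sketched as forward chaining over a small fact base. The encoding of facts as tuples, the predicate names, and the exact reading of Rule 2 (a released engine that fits a chassis is ready to test) are assumptions made for illustration; this is not the F-Logic formulation used in the project.

```python
# Facts from the slide, encoded as tuples (predicate, arguments...).
facts = {
    ("power_kw", "Testengine", 104),
    ("suited_for_kw", "Chassis 17", 110),
    ("tested_and_released", "Testengine"),
}


def rule_fit(facts):
    """Rule 1: a chassis has to be suited for the power of the engine."""
    engines = [(f[1], f[2]) for f in facts if f[0] == "power_kw"]
    chassis = [(f[1], f[2]) for f in facts if f[0] == "suited_for_kw"]
    return {("fits", engine, frame)
            for (engine, kw) in engines
            for (frame, limit) in chassis
            if kw <= limit}


def rule_ready(facts):
    """Rule 2 (as read here): a released engine that fits a chassis is ready."""
    released = {f[1] for f in facts if f[0] == "tested_and_released"}
    return {("ready_to_test", f[1])
            for f in facts
            if f[0] == "fits" and f[1] in released}


def forward_chain(facts, rules):
    """Apply all rules until no new facts can be derived."""
    facts = set(facts)
    while True:
        derived = set().union(*(rule(facts) for rule in rules)) - facts
        if not derived:
            return facts
        facts |= derived


result = forward_chain(facts, [rule_fit, rule_ready])
```

The derived facts include `("fits", "Testengine", "Chassis 17")` and `("ready_to_test", "Testengine")`, mirroring the two inference steps shown on the slide.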

Application Scenario: Semantic Data Integration and Search

Knowledge Management for your Projects

• Users keep their established software tools
• A knowledge model (ontology) both integrates and structures the information
• The ontology is enriched with specific expertise
• The ontology empowers a context-aware and easy-to-use search and navigation system
• All information stays in its original place

[Figure: Object; Person, Topic, Document; Decision Maker, Technician, Methodology, Content, Application]

Deutsche Post IT Solutions (DHL group)

A Companywide Search- & Integration-Project

Goals
• Improve the effectiveness and quality of work
• Integrated search over multiple sources
• Usage of an ontology to improve results
• Simple interface
• Proof of concept for Semantic Web technology for the whole group

Facts
• Users: 1000 people
• Project duration: 2 months

Editorial Process for Ontology Evolution

Application Scenario: Semantic Data Integration - II

Integration Problems

Languages Name Conflict Value Conflict Structure Conflict Duplicates Missing Information Multiple Interfaces

Import and Mapping of DB-Structures

Application Scenario: Intelligent Question Answering

Vulcan Inc: OntoBroker passes Advanced Placement Test

Background • Development of a Digital Aristotle • Phase 1 successfully closed in 2003 • Phase 2 since January 2004

Functions • Capturing of extensive set of chemical knowledge • System passed the „Advanced Placement Test“ • Query is answered and answer is explained

Ontobroker™ passed the Advanced Placement Test!
• Correct Answers
• Correct Explanations

Performance
• CYCORP: 1650 Minutes
• Student: 240 Minutes
• Stanford Research: 38 Minutes
• Ontoprise: 9 Minutes

Semantic Web Applications – on the Internet

[IEEE Data Engineering, 2002] Conceptual architecture for semantic portal

[Figure: layered architecture, top to bottom]
Presentation & Use (HTML): HTML Page, HTML Form, RDF Output, Navigation, Query API
Views: Presentation View, Input View, Navigation View, Selection View, ...
Common Semantics: Ontology, Database
Integration
Common data model: RDF API
Wrappers: X(HT)ML Wrapper, Rel-DB Wrapper, K-Edutella Wrapper, ...
Sources: Files, RDF Sources, Relational Database, P2P, ...

[CRIS 2002] OntoWeb-Portal http://www.ontoweb.org

[Figure: Participating Site 1, Participating Site 2, ..., Participating Site n; Annotated Web Pages; Content Objects; Content Syndication Service; Ontology; Generated Browse & Query Front End; OntoWeb Community]

OntoWeb: EU IST Project

P2P Application: Bibster

Ontologies & Text

Part V

OL from Text as Reverse Engineering

Shared World Model

Reverse Engineering

Write

Some pre-History of Ontology Learning

• AI: Knowledge Acquisition

– Since 60s/70s: Semantic Network Extraction and similar for Story Understanding • Systems: e.g. MARGIE (Schank et al., 1973), LUNAR (Woods, 1973)

• NLP: Lexical Knowledge Extraction

– 70s/80s: Extraction of Lexical Semantic Representations from Machine Readable Dictionaries • Systems: e.g. ACQUILEX LKB (Copestake et al.)

– 80s/90s: Extraction of Semantic Lexicons from Corpora for Information Extraction Systems • Systems: e.g. AutoSlog (Riloff, 1993), CRYSTAL (Soderland et al., 1995)

• IR: Thesaurus Extraction

– Since 60s: Extraction of Keywords, Thesauri and Controlled Vocabularies • Based on construction and use of thesauri in IR (Sparck-Jones, 1966/1986, 1971) • Systems: e.g. Sextant (Grefenstette, 1992), DR-Link (Liddy, 1994)

Some Current Work on Ontology Learning from Text

Term Extraction • Statistical Analysis • Patterns • (Shallow) Linguistic Parsing • Term Disambiguation & Compositional Interpretation • Combinations

Taxonomy Extraction • Statistical Analysis & Clustering (e.g. FCA) • Patterns • (Shallow) Linguistic Parsing • WordNet • Combinations

Relation Extraction • Anonymous Relations (e.g. with Association Rules) • Named Relations (Linguistic Parsing) • (Linguistic) Compound Analysis • Web Mining, Social Network Analysis • Combinations

Relation Label Extraction • Extension of Association Rules Algorithm

Definition Extraction • (Linguistic) Compound Analysis (incl. WordNet)

Some Current Work on Ontology Learning from Text

AIFB – TextToOnto (Maedche and Staab, 2000; Cimiano et al., 2005) – Term Extraction and Taxonomy Extraction • Statistical Analysis • Conceptual Clustering (FCA), Patterns, WordNet (+ Combination) – Relation Extraction • Anonymous Relations (Association Rules) • Named Relations (Subcategorization Frames)

CNTS Univ. Antwerpen, VUB (Reinberger et al., 2004) – Concept Formation + Relation Extraction • Shallow Linguistic Parsing • Clustering

DFKI – OntoLT (Buitelaar et al., 2004), RelExt (Schutz and Buitelaar, 2005) – Term Extraction • Shallow Linguistic Parsing & Statistical Analysis – Taxonomy and Relation Extraction • Shallow Linguistic Parsing & manually defined mapping rules • Named Relations (Subcategorization Frames)

Some Current Work on Ontology Learning from Text

Economic Univ., Prague (Kavalec and Svatek, 2005) – Relation Label Extraction • Extension of Association Rules Algorithm

Free Univ. Amsterdam (Sabou, 2005) – Term and Taxonomy Extraction (for Web Service Ontologies) • Shallow Linguistic Analysis & Patterns

Jozef Stefan Inst., Ljubljana -- OntoGen (Fortuna et al., 2005) – Term and Taxonomy Extraction • Statistical Analysis & Clustering – Relations • Web Mining, Social Network Analysis

Univ. Paris -- ASIUM (Faure and Nedellec, 1998) – Taxonomy Extraction (& Subcategorization Frames) • Shallow Linguistic Parsing • Clustering

Some Current Work on Ontology Learning from Text

Univ. Rome – OntoLearn (Navigli and Velardi, 2004; Velardi et al., 2005)
– Term Extraction and Interpretation
• Shallow Linguistic Parsing & Term Disambiguation & Compositional Interpretation
– Relations
• Classification of the relation between terms in a compound into a predefined set of (thematic) relations
– Definitions
• Rules for Gloss Generation

Univ. of Zürich (Rinaldi et al., 2005) – Term and Taxonomy Extraction • Shallow Linguistic Analysis & Patterns

Ontology Learning Layer Cake

• Rules & Axioms: ∀x, y (sufferFrom(x, y) → ill(x))
• Relations: cure(dom:DOCTOR, range:DISEASE)
• Taxonomy: is_a(DOCTOR, PERSON)
• Concepts: DISEASE:=
• (Multilingual) Synonyms: {disease, illness, Krankheit}
• Terms: disease, illness, hospital

Introduced in: Philipp Cimiano, PhD Thesis University of Karlsruhe, forthcoming / also available as Springer book, end of 2006


Terms

Terms are at the basis of the ontology learning process

– Terms express more or less complex semantic units – But what is a term?

Huge Selection of Top Brand Computer Terminals Available for Immediate Delivery Because Vecmar carries such a large inventory of high-quality computer terminals, including: ADDS terminals, Boundless terminals, DEC terminals, HP terminals, IBM terminals, LINK terminals, NCR terminals and Wyse terminals, your order can often ship same day. Every computer terminal shipped to you is protected with careful packing, including thick boxes. All of our shipping options - including international - are available through major carriers.

– Extracted term candidates (phrases)

- computer
- terminal
- computer terminal
- ? high-quality computer terminal
- ? top brand computer terminal
- ? HP terminal, DEC terminal, …

Term Extraction

Determine most relevant phrases as terms

– Linguistic Methods
• Rules over linguistically analyzed text
– Linguistic analysis: Part-of-Speech Tagging, Morphological Analysis, …
– Extract patterns: Adjective-Noun, Noun-Noun, Adj-Noun-Noun, …
– Ignore Names (DEC, HP, …), Certain Adjectives (quality, top, …), etc.

– Statistical Methods
• Co-occurrence (collocation) analysis for term extraction within the corpus
• Comparison of frequencies between domain and general corpora
– Computer Terminal will be specific to the Computer domain
– Dining Table will be less specific to the Computer domain

– Hybrid Methods
• Linguistic rules to extract term candidates
• Statistical (pre- or post-) filtering
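The frequency comparison between a domain corpus and a general corpus can be sketched as follows. This is a minimal illustration, not the method of any particular system named in these slides: the scoring function (a ratio of add-one-smoothed relative frequencies), the function name `domain_specificity` and all counts are assumptions made up for the example.

```python
from collections import Counter

def domain_specificity(domain_counts, general_counts, smoothing=1.0):
    """Ratio of smoothed relative frequencies: domain corpus vs. general corpus."""
    vocab = set(domain_counts) | set(general_counts)
    d_total = sum(domain_counts.values()) + smoothing * len(vocab)
    g_total = sum(general_counts.values()) + smoothing * len(vocab)
    scores = {}
    for term in domain_counts:
        p_domain = (domain_counts[term] + smoothing) / d_total
        p_general = (general_counts.get(term, 0) + smoothing) / g_total
        scores[term] = p_domain / p_general
    return scores

# Hypothetical phrase counts, invented for illustration only.
domain = Counter({"computer terminal": 30, "dining table": 1, "order": 19})
general = Counter({"computer terminal": 2, "dining table": 20, "order": 28})

scores = domain_specificity(domain, general)
# Keep candidates that are relatively more frequent in the domain corpus.
terms = [t for t, s in sorted(scores.items(), key=lambda kv: -kv[1]) if s > 1.0]
print(terms)
```

In a hybrid setup, the candidate phrases fed into such a filter would first be selected by linguistic patterns such as Adjective-Noun or Noun-Noun.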


Extraction of Synonyms

Term Classification and Clustering

– Classification • Classifying terms to existing class systems, e.g., by extending WordNet (with SynSets corresponding to classes)

– Clustering • Clusters according to similar distributions, e.g., by measuring co-occurrence between terms


The Semiotic Triangle

Ogden & Richards, 1923

• based on Structural Linguistics studies (de Saussure, 1916)

• adopted in Knowledge Representation (e.g. Sowa, 1984)

Concepts: Intension, Extension, Lexicon

A term may indicate a concept, if we can define its

– Intension
• (in)formal definition of the set of objects that this concept describes
– a disease is an impairment of health or a condition of abnormal functioning

– Extension • a set of objects (instances) that the definition of this concept describes – influenza, cancer, heart disease, …

Discussion: what is an instance? - ‘heart disease’ or ‘my uncle’s heart disease’

– Lexical Realizations • the term itself and its multilingual synonyms – disease, illness, Krankheit, maladie, …

Discussion: synonyms vs. instances – ‘disease’, ‘heart disease’, ‘cancer’, …

Concepts: Intension

Extraction of a Definition for a Concept from Text

– Informal Definition • e.g., a gloss for the concept as used in WordNet • OntoLearn (Navigli and Velardi, 2004; Velardi et al., 2005) uses natural language generation to compositionally build up a WordNet gloss for automatically extracted concepts – ‘Integration Strategy’ : “strategy for the integration of …”

– Formal Definition • e.g., a logical form that defines all formal constraints on class membership • Inductive Logic Programming, Formal Concept Analysis, …

Concepts: Extension

Extraction of Instances for a Concept from Text

– Commonly referred to as Ontology Population – Relates to Knowledge Markup (Semantic Metadata) – Uses Named-Entity Recognition and Information Extraction

– Instances can be:

• Names for objects, e.g. – Person, Organization, Country, City, …

• Event instances (with participant and property instances), e.g.
– Football Match (with Teams, Players, Officials, ...)
– Disease (with Patient-Name, Symptoms, Date, …)

Concepts: Lexicon

Extraction of Synonyms and Translations for a Concept from Text

– (Multilingual) Term Extraction – see previous slides
– Representation of Lexical Information in Ontologies

[Diagram: RDF(S) representation of lexical and image features. Legend: URI, property, rdf:type, rdfs:subClassOf; layers: meta-classes, classes, instances. Meta-class: feat:ClassWithFeats (an rdfs:Class). Classes of type feat:ClassWithFeats: o:StorageProduct, o:Cupboard, o:Refrigerator, connected by rdfs:subClassOf. Properties feat:lingFeat and feat:imgFeat link classes to feature instances. lf:LingFeat instances: lf:lang “de”, lf:term “Schrank”, lf:morph, lf:context ...; and lf:lang “de”, lf:term “Kühlschrank”, lf:morph, lf:context .... if:ImgFeat instance: if:color “#111111”, if:shape “cuboid”, if:texture “&keypatchSet_223”. lf:Morph instance: lf:head “Schrank”, lf:pos “noun”.]

The Mathematical Definition of an Ontology

[Stumme et al.; abbrev. from Cimiano-06]

• Structure: C := (C,

– L-Axiom System: Arbitrary Axioms (may include patterns)

Lexicon

Def: A Lexicon for an ontology is a structure

Lex := {S_C, S_R, Ref_C, Ref_R}

S_C, S_R are called signs for concepts and relations, respectively.

Ref_C, Ref_R are binary relations denoting lexical references for concepts and relations, respectively.

Example:

Ref_C(„car“) = {car-concept1, car-concept2}
Ref_C(„automobile“) = {car-concept1}
Ref_C^{-1}(car-concept1) = {„car“, „automobile“}
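The lexical reference relation for concepts, with its inverse, can be sketched as a set of (sign, concept) pairs. This is only an illustration of the car/automobile example from the lexicon definition; the class name `Lexicon` and its method names are hypothetical and not part of the cited formalization.

```python
class Lexicon:
    """Lexical references for concepts as a binary relation."""

    def __init__(self):
        # Each element is one (sign, concept) pair of the reference relation.
        self.ref_c = set()

    def add(self, sign, concept):
        self.ref_c.add((sign, concept))

    def concepts_of(self, sign):
        """All concepts a sign may refer to (a sign can be ambiguous)."""
        return {c for s, c in self.ref_c if s == sign}

    def signs_of(self, concept):
        """Inverse relation: all signs that refer to a concept (synonyms)."""
        return {s for s, c in self.ref_c if c == concept}

lex = Lexicon()
lex.add("car", "car-concept1")
lex.add("car", "car-concept2")
lex.add("automobile", "car-concept1")

print(lex.concepts_of("car"))
print(lex.signs_of("car-concept1"))
```

Keeping the relation as pairs, instead of a function from signs to a single concept, is what lets one sign be ambiguous and one concept have several synonyms.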

Distributional Hypothesis & Vector Space Model

• Harris, 1986
– „Words are (semantically) similar to the extent to which they share similar words“
• Firth, 1957
– „You shall know a word by the company it keeps“

• Idea: collect context information and represent it as a vector:

              book_obj  rent_obj  drive_obj  ride_obj  join_obj
apartment        X         X
car              X         X         X
motor-bike       X         X         X          X
excursion        X                                         X
trip             X                                         X

• compute similarity among vectors with respect to a measure
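One common choice of similarity measure is the cosine. The sketch below assumes the feature assignments of the tourism example (apartment: book_obj, rent_obj; car: book_obj, rent_obj, drive_obj; motor-bike: additionally ride_obj; excursion and trip: book_obj, join_obj) and treats each word as a binary vector; the slides do not prescribe this particular measure.

```python
import math

# Binary context vectors, stored as the set of features marked with X.
CONTEXTS = {
    "apartment": {"book_obj", "rent_obj"},
    "car": {"book_obj", "rent_obj", "drive_obj"},
    "motor-bike": {"book_obj", "rent_obj", "drive_obj", "ride_obj"},
    "excursion": {"book_obj", "join_obj"},
    "trip": {"book_obj", "join_obj"},
}

def cosine(a, b):
    """Cosine of two binary vectors given as feature sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def most_similar(word):
    """Nearest neighbour of `word` among the other words."""
    others = (w for w in CONTEXTS if w != word)
    return max(others, key=lambda w: cosine(CONTEXTS[word], CONTEXTS[w]))

print(most_similar("car"))   # shares three features with motor-bike
print(most_similar("trip"))  # identical vector to excursion
```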

Context Features

• Four-grams [Schuetze 93]

• Word-windows [Grefenstette 92]

• Predicate-Argument relations (every man loves a woman)
• Modifier Relations (fast car, the hood of the car)
– [Grefenstette 92, Cimiano 04b, Gasperin et al. 03]

• Appositions (Ferrari, the fastest car in the world) – [Hahn & Schnattinger 98, Caraballo 99]

• Coordination (ladies and gentlemen)
– [Caraballo 99, Dorow and Widdows 03]

Overall Process

[Diagram: overall process; annotation: Or other clustering mechanism]

Using Syntactic Surface Dependencies

Mopti is the biggest city along the Niger with one of the most vibrant ports and a large bustling market. Mopti has a traditional ambience that other towns seem to have lost. It is also the center of the local tourist industry and suffers from hard-sell overload. The nearby junction towns of Gao and San offer nice views over the Niger’s delta.

city: biggest(1)
ambience: traditional(1)
center: of_tourist_industry(1)
junction town: nearby(1)
market: bustling(1)
port: vibrant(1)
overload: suffer_from(1)
tourist industry: center_of(1), local(1)
town: seem_subj(1)
view: nice(1), offer_obj(1)

Context Extraction Process

• extract syntactic dependencies from text ⇒ verb/object, verb/subject, verb/PP relations ⇒ car: drive_obj, crash_subj, sit_in, …

[Diagram: parse tree (s, dp, vp, vdp) from LoPar, pattern extraction with tgrep, then lemmatization, e.g. crashed_subj(cars) → crash_subj(car), sat_in(car) → sit_in(car), drove_obj(car) → drive_obj(car)]

Weighting

• Observation:
– output of the parser can be erroneous
– not all attribute/object pairs are significant

• Conditional Probability: P(n | v_arg)
• Consider attribute/object pairs with weight over threshold t
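The conditional-probability weighting can be sketched as a relative-frequency estimate, P(n | v_arg) ≈ f(n, v_arg) / f(v_arg), followed by a cut at threshold t. The estimator, the function name `weight_pairs` and the counts below are assumptions for illustration only.

```python
from collections import Counter

def weight_pairs(pair_counts, t):
    """Keep (noun, feature) pairs whose estimated P(noun | feature) exceeds t."""
    feature_totals = Counter()
    for (_noun, feature), count in pair_counts.items():
        feature_totals[feature] += count
    weighted = {}
    for (noun, feature), count in pair_counts.items():
        p = count / feature_totals[feature]
        if p > t:
            weighted[(noun, feature)] = p
    return weighted

# Hypothetical counts of extracted dependencies, invented for illustration.
counts = Counter({
    ("car", "drive_obj"): 8,
    ("apartment", "drive_obj"): 2,   # e.g. a parser error
    ("car", "rent_obj"): 5,
    ("apartment", "rent_obj"): 5,
})

kept = weight_pairs(counts, t=0.3)
print(sorted(kept))
```

Raising t removes more of the noisy pairs but also shrinks the set of objects and attributes that reach the clustering step.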

Set Theoretical & Probabilistic Clustering

• Set theoretical
– Formal Concept Analysis [Ganter and Wille 1999]

              bookable  rentable  drivable  ridable  joinable
apartment        X         X
car              X         X         X
motor-bike       X         X         X         X
excursion        X                                      X
trip             X                                      X

Tourism Formal Context

              bookable  rentable  driveable  rideable  joinable
apartment        X         X
car              X         X         X
motor-bike       X         X         X          X
excursion        X                                         X
trip             X                                         X
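The formal concepts of this small context can be enumerated by brute force: take every subset of objects, collect the attributes they share (the intent), and then collect all objects carrying those attributes (the extent). The sketch below is exponential in the number of objects and is meant only to make the definition concrete; it is not the algorithm used in the evaluation.

```python
from itertools import combinations

CONTEXT = {
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "motor-bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
    "trip": {"bookable", "joinable"},
}

def common_attributes(objects, context, all_attrs):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(all_attrs)
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)

def objects_having(attrs, context):
    """Objects that carry every attribute in `attrs`."""
    return frozenset(o for o, has in context.items() if attrs <= has)

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for size in range(len(context) + 1):
        for objs in combinations(context, size):
            intent = common_attributes(objs, context, all_attrs)
            concepts.add((objects_having(intent, context), intent))
    return concepts

concepts = formal_concepts(CONTEXT)
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by extent inclusion gives the lattice, from which the concept hierarchy is read off.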

Tourism Lattice

[Figure: concept lattice of the tourism formal context]

Concept Hierarchy

bookable
  rentable
    driveable
      rideable
        motor-bike
      car
    apartment
  joinable
    excursion
    trip

Compacting the hierarchy

bookable
  rentable
    driveable
      motor-bike
      car
    apartment
  joinable
    excursion
    trip

Evaluation - Data Sets

• Tourism (118 million tokens):
– http://www.all-in-all.de/english
– http://www.lonelyplanet.com
– British National Corpus (BNC)
– handcrafted tourism ontology (289 concepts)
• Finance (185 million tokens):
– Reuters news from 1987
– GETESS finance ontology (1178 concepts)

Precision/Recall/F-Measure

[Plot: FCA (Tourism): precision, recall and F-measure (y-axis) against threshold t (x-axis, 0 to 1)]

Lexical Recall, F‘

[Plot: FCA (Tourism): F, lexical recall (LR) and F' (y-axis, 0 to 0,5) against threshold t (x-axis, 0 to 1)]
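For orientation, the F-measure is the harmonic mean of precision and recall. The slides do not spell out how F' is computed; the sketch below assumes it is the harmonic mean of F and lexical recall, which is an assumption of this example and should be checked against the cited work.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two non-negative scores (0.0 if both are zero)."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)

def f_measure(precision, recall):
    """Standard balanced F-measure."""
    return harmonic_mean(precision, recall)

def f_prime(f, lexical_recall):
    # Assumption: F' combines F with lexical recall by a harmonic mean.
    return harmonic_mean(f, lexical_recall)

# Hypothetical scores, invented for illustration only.
f = f_measure(precision=0.5, recall=0.5)
print(f, f_prime(f, lexical_recall=0.25))
```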

Comparison (Tourism, F‘)

[Plot: Comparison (Tourism): F' (y-axis, 0 to 0,5) against threshold t (x-axis, 0 to 1) for FCA, Complete Linkage, Average Linkage, Single Linkage and Bi-Section-KMeans]

Comparison (Finance, F‘)

[Plot: Comparison (Finance): F' (y-axis, 0 to 0,45) against threshold t (x-axis, 0 to 1) for FCA, Complete Linkage, Average Linkage, Single Linkage and Bi-Section-KMeans]

Clustering – Comparison

                          F-Measure      Worst Case Time Complexity      Understandability
FCA                       43.81/41.02%   O(2^n) (in practice better!)    Good
Agglomerative Clustering  36.78/33.35%   O(n^2 log(n))                   Fair
                          36.55/32.92%   O(n^2)
                          38.57/32.15%   O(n^2)
Divisive Clustering       36.42/32.77%   O(n^2)                          Weak-Fair

Problem 1: Labeling of Clusters

• Caraballo’s Method [1999]:
– Agglomerative Clustering
– Labeling Clusters with hypernyms derived from Hearst patterns
– Removing unlabeled concepts thus compacting the hierarchy

• Evaluation: select 20 nouns with at least 20 hypernyms and present them to human judges with the 3 best hypernyms for each

• Results:
– Best Hypernym: 33% (Majority) / 39% (Any)
– Any Hypernym: 47.5% (Majority) / 60.5% (Any)
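A Hearst pattern is a lexico-syntactic template such as "X such as A, B and C", from which (hyponym, hypernym) pairs can be read off. The sketch below covers only this one pattern with a plain regular expression over single-word nouns; it does no parsing or lemmatization and is not Caraballo's implementation.

```python
import re

# "HYPERNYM such as HYPO1, HYPO2 and HYPO3" with single-word noun slots.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+(?:, | and | or )?)+)")

def hearst_such_as(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        hyponyms = re.split(r", | and | or ", match.group(2))
        pairs.extend((h, hypernym) for h in hyponyms if h)
    return pairs

pairs = hearst_such_as("We rent vehicles such as cars, bikes and buses to tourists.")
print(pairs)
```

Counting how often each hypernym is proposed for the members of a cluster gives candidate labels for that cluster. Multi-word terms and variants such as a comma before "and" would need a proper noun-phrase chunker instead of `\w+`.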

Problem 2: Spurious Similarities

• Guided Clustering [Cimiano 2005c]:
– Integrate an externally derived hypernym oracle into the agglomerative clustering algorithm
– Two terms are only clustered if they have a common hypernym according to the oracle
– Label the cluster with the common hypernym
⇒ Demonstrably better hierarchies
⇒ Labels for the cluster

⇒ Reuse techniques from Clustering with constraints!
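The guiding idea can be sketched as agglomerative clustering in which a merge is allowed only when both clusters share a hypernym in the oracle. This is a strongly simplified illustration, not the published algorithm: it uses average linkage, returns only the final flat clusters, and the function name, similarity values and oracle entries are all hypothetical.

```python
from itertools import combinations

def guided_clustering(terms, sim, oracle):
    """Greedy agglomerative clustering constrained by a hypernym oracle.

    sim:    dict mapping frozenset({a, b}) to a similarity score
    oracle: dict mapping a term to its set of known hypernyms
    """
    clusters = [
        {"members": frozenset([t]), "hypernyms": set(oracle.get(t, ())), "label": None}
        for t in terms
    ]

    def linkage(c1, c2):
        # Average linkage over all cross-cluster term pairs.
        pairs = [(a, b) for a in c1["members"] for b in c2["members"]]
        return sum(sim.get(frozenset(p), 0.0) for p in pairs) / len(pairs)

    while True:
        # Only cluster pairs with a common hypernym are merge candidates.
        candidates = [
            (linkage(c1, c2), i, j)
            for (i, c1), (j, c2) in combinations(enumerate(clusters), 2)
            if c1["hypernyms"] & c2["hypernyms"]
        ]
        if not candidates:
            return clusters
        _, i, j = max(candidates)
        common = clusters[i]["hypernyms"] & clusters[j]["hypernyms"]
        merged = {
            "members": clusters[i]["members"] | clusters[j]["members"],
            "hypernyms": common,
            "label": sorted(common)[0],  # arbitrary but deterministic choice
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

# Toy data: "car" and "trip" look similar in text but share no hypernym.
sim = {
    frozenset(("car", "trip")): 0.9,
    frozenset(("car", "bike")): 0.6,
    frozenset(("trip", "excursion")): 0.7,
}
oracle = {"car": {"vehicle"}, "bike": {"vehicle"},
          "trip": {"journey"}, "excursion": {"journey"}}
result = guided_clustering(["car", "bike", "trip", "excursion"], sim, oracle)
print({c["label"]: sorted(c["members"]) for c in result})
```

The spurious car/trip similarity never leads to a merge, because the oracle offers no common hypernym for the two terms.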

Conclusion about Comparison

• FCA is an interesting alternative to similarity-based clustering approaches
– high traceability due to intensional description of clusters
– Problem: worst case exponential in the size of the formal context
– But: Zipfian distribution of attributes

Using Ontologies with Text Retrieval

Using Ontologies

Ontologies as:

• background knowledge for text clustering and classification • basis for recommender systems • background knowledge in ILP • knowledge for models in Statistical Relational Learning

Text Clustering & Classification Approaches

[Diagram: documents are turned into a bag-of-words matrix (rows Obj1, Obj2, Obj3, Obj4, …; columns oman, has, granted, …; cells hold term counts), which, together with background knowledge, is passed to the clustering/classification algorithm]
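A bag-of-words vector with the usual preprocessing steps (stopword removal, stemming) can be sketched as follows. The stopword list is a tiny illustrative sample and `naive_stem` is a hypothetical stand-in that only strips a plural "s"; a real system would use a full stopword list and a proper stemmer.

```python
import re
from collections import Counter

STOPWORDS = {"has", "the", "of", "to", "and", "on", "from"}  # illustrative sample

def naive_stem(token):
    """Crude plural stripping; stands in for a real stemming algorithm."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def bag_of_words(text, remove_stopwords=True, stem=True):
    """Lower-case, tokenize and count terms, with optional preprocessing."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return Counter(tokens)

bow = bag_of_words("Oman has granted term crude oil customers retroactive discounts")
print(bow)
```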

Text Clustering & Classification Approaches

Documents

Doc 17892 (crude):
Oman has granted term crude oil customers retroactive discounts from official prices of 30 to 38 cents per barrel on liftings made during February, March and April, the weekly newsletter Middle East Economic Survey (MEES) said. MEES said the price adjustments, arrived at through negotiations between the Omani oil ministry and companies concerned, are designed to compensate for the difference between market-related prices and the official price of 17.63 dlrs per barrel adopted by non-OPEC Oman since February.
REUTER

Bag of Words

Oman 2
has 1
granted 1
term 1
crude 1
oil 2
customers 1
retroactive 1
discounts 1
...

Further preprocessing steps
- Stopwords
- Stemming

WordNet as an example ontology

[Diagram: WordNet excerpt around "oil". Noun nodes: Root; entity, something; substance; chemical compound; organic compound; lipid; oil; crude oil; and physical object; artifact; covering; coating; paint; oil paint; oil color. Verb nodes: bless; cover; anoint; oil (cover with oil). Lexical entries shown: EN:oil, EN:anoint, EN:inunct]

109377 concepts (synsets)
144684 lexical entries

Strategies: all, first, context

Use of superconcepts (hypernyms in WordNet)
• Exploit more generalized concepts
• e.g.: chemical compound is the 3rd superconcept of oil

Ontology-based representation

1                2                  3
Oman 1           Oman 1             Oman 1
has 1            granted 1          granted 1
granted 1        term 1             term 1
term 1           (C) term 1         (C) term 1
crude 1          crude 1            crude 1
oil 1            (C) crude 1        (C) crude 1
customers 1      oil 1              oil 1
retroactive 1    (C) oil 1          (C) oil 1
discounts 1      customer 1         (C) lipid 1
...              (C) customer 1     (C) compound 1
                 ...                ...
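Enriching a term vector with concepts and their superconcepts can be sketched as below. The reading of the three strategies (add concepts to the terms, replace terms by concepts, keep concepts only) and the toy hypernym table are assumptions of this example; the table merely mimics the oil → lipid → organic compound → chemical compound chain, and word sense disambiguation (all / first / context) is left out.

```python
from collections import Counter

# Toy background knowledge (hypothetical stand-in for WordNet).
HYPERNYM = {
    "oil": "lipid",
    "lipid": "organic compound",
    "organic compound": "chemical compound",
}
CONCEPTS = {"term", "crude", "oil", "customer"}  # terms that map to a concept

def superconcepts(concept, depth):
    """Follow the hypernym chain upwards for at most `depth` steps."""
    chain = []
    while depth > 0 and concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
        depth -= 1
    return chain

def enrich(term_counts, strategy="add", depth=0):
    """Build a term/concept vector; strategy is 'add', 'repl' or 'only'."""
    out = Counter()
    for term, n in term_counts.items():
        if term not in CONCEPTS:
            if strategy != "only":       # terms without a concept are kept
                out[term] += n
            continue
        if strategy == "add":            # keep the term next to its concept
            out[term] += n
        for concept in [term] + superconcepts(term, depth):
            out["(C) " + concept] += n
    return out

doc = Counter({"oman": 1, "granted": 1, "oil": 1})
print(enrich(doc, strategy="add", depth=2))
```

With `depth=2` the vector for "oil" also receives counts for "(C) lipid" and "(C) organic compound", so documents that mention different oils or lipids become more similar to each other.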

strategy: add

Evaluation of Text Clustering

• Evaluation parameters: min 15, max 100, 2619 documents of the Reuters corpus
• cluster k = 60, with BiSec-KMeans

[Bar chart: average purity (CLUSTERCOUNT 60, EXAMPLE 100, MINCOUNT 15; y-axis 0,300 to 0,650). Bars are grouped by ONTO (add / repl / only), HYPDIS (context / first / all), HYPDEPTH (0 / 5) and HYPINT (false / true), for WEIGHT tfidf and PRUNE 30, next to a run without background knowledge. Values labeled in the chart: 0,570, 0,616, 0,618]

Evaluation: OHSUMED Classification Results

Top 50 classes with WordNet and AdaBoost

Combine FCA & Text Clustering

1. preprocess Reuters documents and enrich them with background knowledge (WordNet)
2. calculate a reasonable number k (100) of clusters with BiSec-k-Means using cosine similarity
3. extract a description for all clusters
4. relate clusters (objects) with FCA
5. use the visualization of the concept lattice for better understanding

Explaining Clustering Results with FCA

[Lattice figure; highlighted labels: refiner; oil; compound, chemical compound. Annotation: chain of concepts with increasing specificity]

Explaining Clustering Results with FCA

[Lattice figure; highlighted labels: crude oil, barrel]

Explaining Clustering Results with FCA

[Lattice figure; highlighted labels: resin, palm]

• Resulting concept lattice can also be interpreted as a concept hierarchy directly on the documents
• all documents in one cluster obtain exactly the same description

Conclusion: Ontologies + Text

• Ontologies may be discovered as regularities underlying some text

• Ontologies improve access to text
– By annotation (cf. Part 2)
– By retrieval (this part)
