Ontologies and the Semantic Web
SMBM-2006 Jena, April 9, 2006
Steffen Staab http://isweb.uni-koblenz.de
Steffen Staab (1) ISWeb – Informationssysteme & Semantic Web Agenda
1. Ontologies
2. Semantic Web
3. Semantic Web Languages
4. Some Applications (Ontoprise)
5. Ontologies & Text
Part I
Introduction to Ontologies
Origin and History
• Ontology in philosophy: a philosophical discipline, the branch of philosophy that deals with the nature and the organization of reality
• Science of Being (Aristotle, Metaphysics, IV, 1)
• Tries to answer the questions:
  – What characterizes being?
  – Eventually, what is being?
Aristotle - Ontology
• Before: study of the nature of being
• Since Aristotle: study of knowledge representation and reasoning
• Terminology:
  – Genus: (Classes)
  – Species: (Subclasses)
  – Differentiae: (Characteristics that allow objects to be grouped or distinguished from each other)
• Syllogisms (Inference Rules)
Example for differentiae (adapted from Uta Priss, in preparation)
Attributes (columns): real, cartoon, cat, dog, rabbit, fish, gorilla, koala, mammal
Garfield X X X
Snoopy X X X
Bugs Bunny X X X
Nemo X X
Copito X X X
Osmond X X X
Organizing the Objects as a Lattice
What is an Ontology?
Gruber 93: An ontology is a
• formal specification ⇒ executable, discussable
• of a shared ⇒ group of persons
• conceptualization ⇒ about concepts
• of a domain of interest ⇒ between application and "unique truth"
Why Develop an Ontology?
• To make domain assumptions explicit
  – Easier to change domain assumptions
  – Easier to understand and update legacy data
• To separate domain knowledge from operational knowledge
  – Re-use domain and operational knowledge separately
• A community reference for applications
• To share a consistent understanding of what information means
Taxonomy
[Figure: taxonomy tree. Levels, top to bottom: Object; Person, Topic, Document; Student, Researcher, Semantics; Doctoral Student, PhD Student, F-Logic, Ontology]
Taxonomy := Segmentation, classification and ordering of elements into a classification system according to the relationships between them
Thesaurus
[Figure: the taxonomy tree (Object; Person, Topic, Document; Student, Researcher, Semantics; Doctoral Student, PhD Student, F-Logic, Ontology) extended with the relationships "synonym" and "similar"]
• Terminology for a specific domain
• Taxonomy plus fixed relationships (similar, synonym, related to)
• Originates from bibliography
Topic Map
[Figure: the thesaurus (Object; Person, Topic, Document; Student, Researcher, Semantics; Doctoral Student, PhD Student, F-Logic, Ontology; synonym, similar) extended with the relations knows, described_in and writes and the properties Tel and Affiliation]
• Topics (nodes), relationships and occurrences (to documents)
• ISO standard
• Typically used for navigation and visualisation
Ontology (in our sense)
[Figure: the topic map extended with typed relations, instances and rules. Concepts: Object, Person, Topic, Document, Student, Researcher, Semantics, Doctoral Student, PhD Student, F-Logic, Ontology. Relations: is_a, instance_of, subTopicOf, similar, knows, writes, described_in, is_about; properties Tel and Affiliation. Instance: York Sure, Tel +49 721 608 6592, Affiliation AIFB. Rules relate described_in and is_about between a topic T and a document D, and derive "P knows T" from "P writes D" and "D is_about T".]
• Representation Language: Predicate Logic (F-Logic)
• Standards: RDF(S); OWL
Ontologies - Some Examples
• General purpose ontologies:
  – DOLCE, http://www.loa-cnr.it/DOLCE.html
  – The Upper Cyc Ontology, http://www.cyc.com/cyc-2-1/index.html
  – IEEE Standard Upper Ontology, http://suo.ieee.org/
• Domain and application-specific ontologies:
  – GALEN, http://www.openclinical.org/prj_galen.html
  – Foundational Model of Anatomy, http://sig.biostr.washington.edu/projects/fm/AboutFM.html
  – RETSINA Calendaring Agent, http://ilrt.org/discovery/2001/06/schemas/ical-full/hybrid.rdf
  – Dublin Core, http://dublincore.org/
• Semantic Desktop Ontologies
  – Semantics-Aware instant Messaging: SAM Ontology, http://www.uni-koblenz.de/FB4/Institutes/IFI/AGStaab/Research/sam
  – Haystack, http://haystack.lcs.mit.edu/
  – Gnowsis, http://www.gnowsis.org/
  – Piggybank, http://simile.mit.edu/piggy-bank/
• Web Services Ontologies
  – Core ontology of services, http://cos.ontoware.org
  – Web Service Modeling Ontology, http://www.wsmo.org
  – OWL-S, http://www.daml.org/services/owl-s/1.0/
• Ontologies in a wider sense
  – GO - Gene Ontology, http://www.geneontology.org/
  – UMLS, http://www.nlm.nih.gov/research/umls/
  – Agrovoc, http://www.fao.org/agrovoc/
  – Art and Architecture, http://www.getty.edu/research/tools/vocabulary/aat/
  – DTD standardizations, e.g. HR-XML, http://www.hr-xml.org/
  – WordNet / EuroWordNet, http://www.cogsci.princeton.edu/~wn
Ontologies and Their Relatives
[Figure: spectrum of ontology-like artifacts, ranging from informal to formal: Catalog / ID; Terms / Glossary; Thesauri; Informal Is-a; Formal Is-a; Formal Instance; Frames; Value Restrictions; Disjoint, Inverse Relations, ...; General logical constraints; Axioms]
Ontologies and Their Relatives (cont'd)
[Figure: relatives of ontologies arranged between Front-End and Back-End. Formalisms: Topic Maps, Thesauri, Taxonomies, Ontologies, Semantic Networks, Extended ER-Models, Predicate Logic. Uses: Navigation, Information Retrieval, Query Expansion, Sharing of Knowledge, Queries, Consistency Checking, EAI, Mediation, Reasoning]
Applications of Ontologies
• Natural Language Processing and Machine Translation, e.g. Nirenburg et al. 2004, Maedche et al. 2001, Agirre et al. 1996, Beale et al. 1995
• Semantic Web, see http://www.w3.org/2001/sw/ and http://www.w3.org/2001/sw/WebOnt/
• Knowledge Engineering & Management, e.g. Fensel 2001, Mullholland et al. 2000; Staab & Schnurr, 2000; Sure et al., 2000, Abecker et al. 1997
• Electronic Commerce, e.g. RosettaNet and Ontology.org
• Information Retrieval and Information Integration, e.g. Kashyap, 1999; Mena et al., 1998; Wiederhold, 1992
• Intelligent Search Engines, e.g. WebKB (Martin et al. 2000), SHOE (Heflin & Hendler, 2000), OntoSeek (Guarino et al., 1999), Ontobroker (Decker et al., 1999)
• Digital Libraries, e.g. Amann & Fundulaki, 1999
• Enhanced User Interfaces, e.g. (Kesseler, 1996), Inxight
• Software Agents, e.g. OnTo-agents, FIPA, (Gluschko et al., 1999; Smith & Poulter, 1999)
• Business Process Modeling, e.g. Decker et al., 1997; TOVE, 1995; Uschold et al., 1998
Overview Literature
S. Staab, R. Studer. Handbook on Ontologies. Springer, 2004.
Part II
Introduction to Semantic Web
Syntax is not sufficient
Andreas • Tel • E-Mail
Information Convergence
• Convergence not just in devices, also in "information"
  – Your personal information (phone, PDA, …): calendar, photo, home page, files, …
  – Your "professional" life (laptop, desktop, … Grid): web site, publications, files, databases, …
  – Your "community" contexts (Web): hobbies, blogs, fanfic, social networks, …
• The Web teaches us that people will work to share
  – How do we CREATE, SEARCH, and BROWSE in the non-text based parts of our lives?
WWW vs. Semantic Web
WWW := Hypertext & Internet & Social Phenomenon
Semantic Web := Semantic Web Language/Data & Ontologies & Internet & Social Phenomenon
Let's try XML
XML is unspecific:
(1) No predetermined vocabulary
(2) No semantics for relationships
⇒ (1) & (2) must be specified upfront
Only possible in close cooperation
– Small, reasonably stable group
– Common interests or authorities
Not possible on the Web or on a broad scale in general!
Meaning of Information: (or: what it means to be a computer)
[Figure: a CV with the sections name, education, work, private]
XML ≠ Meaning, XML = Structure
[Figure: the same CV marked up in XML, next to how the tags look to a machine]
<name>       <ναµε>
<education>  <εδυχατιον>
<CV>         <Χς>
<work>       <ωορκ>
<private>    <πριϖατε>
Some Principal Ideas
• URI – uniform resource identifiers
• XML – common syntax
• Interlinked
• Layers of semantics – from database to knowledge base to proofs
(Tim Berners-Lee, Weaving the Web)
Design principles of WWW applied to Semantics!!
The Semantic Web on one Slide
[Figure: an ontology with the classes Person, Employee, PostDoc and Professor connected by rdfs:subClass, and the property cooperatesWith with rdfs:Domain and rdfs:Range Person; instances are linked to the classes via rdf:type and correspond to web pages (URLs) http://www.deri.ie/~sha and http://www.uni-koblenz.de/~staab]
The Semantic Web - Inference
[Figure: RDF graph using the properties swrc:project, swrc:homepage, swrc:name, swrc:cooperatesWith, swrc:affiliation and swrc:member, with the values Nepomuk, Handschuh and DERI]
Visualization of a Logic Representation: OWL, F-Logic, etc.
The new Semantic Web Stack
Tim Berners-Lee, ISWC November 2005, http://www.w3.org/2005/Talks/1110-iswc-tbl/#(12)
[Figure: layered stack. Labels, top to bottom: Trust; Proof; Logic framework; OWL, Rules; DLP bit of OWL/Rule, SparQL, RDF Schema; RDF Core; XML, Namespaces; URI, Unicode. Signature and Encryption run alongside the layers]
Knowledge Provisioning
Tools for markup...
PhotoStuff Demo
Semi-automatic
Not tied to specific domains
[Screenshot: M-OntoMat-Annotizer. Recoverable labels: Visual Descriptor Extraction (VDE) plug-in launch, shape selection, shape erasure, color selection, descriptor selection, descriptor extraction, save prototype instances, domain ontology browser, selected region, draw panel]
M-OntoMat is publicly available: http://acemedia.org/aceMedia/results/software/m-ontomat-annotizer.html
Shared Workspace (Xarop + Screenshot)
Social networks: e.g. Friend of a Friend (FOAF)
• Say stuff about yourself (or others) in OWL files, link to who you "know"
Estimates of the number of FOAF users range from 2M to 5M
Using FOAF in other contexts
Jennifer Golbeck http://trust.mindswap.org
Get a B&N price (in Euros)
Of a particular book
In its German edition?
Now.
• RDF, RDFS and OWL are ready for prime time
  – Designs are stable, implementations maturing
• Major research investment translating into application development and commercial spinoffs
  – Adobe 6.0 embraces RDF
  – IBM releases tools, data and partnering
  – HP extending Jena to OWL
  – OWL engines by Ontoprise GmbH, Network Inference, Racer GmbH
  – Ontoprise is a strategic partner for Oracle and Software AG
  – Proprietary OWL ontologies for vertical markets
    • cf. pharmacology, HMO/health care, ... soft drinks
Now: Plenty of annotations – unfortunately, not in the open
• Taggings are daily practice:
  – Flickr, http://www.flickr.com/
  – Delicious, http://del.icio.us/
  – Cite-u-like, http://www.citeulike.org/
  – Bibsonomy, …
• Plenty of annotations
  – Dooyoo, E-pinions
  – Quipe, http://www.quipe.com/
  – Froogle, http://froogle.google.com/
  – Google Base, http://base.google.com/
  – RSS
  – E-Science data curation, http://www.jisc.ac.uk/index.cfm?name=pub_escience
  – Semantic Wikis
• Web 2.0 – would be easier with Semantic Web!
The Semantic Wave
[Figure: the "Semantic Wave" curve (Berners-Lee, 03) with two "YOU ARE HERE" markers, one for 2003 and one for 2006]
Semantic Technologies vs. Semantic Web
Semantic Technologies
• Used by "early adopters"
• Mature
  – Deductive databases (research since the early 80s)
  – Description logics (research since the late 70s)
  – Ontobroker (research prototype since 1990; commercial since 1999)
• A lot of knowledge about integration with existing technology (databases, modelling, …)
Semantic Web
• Still "research-oriented"
• Currently: used in intranets
• Currently: used for internet applications with simple ontologies (Dublin Core, RSS, PICS, FOAF, …)
• Quite some way to go for full-fledged success; initial take-up now by some focus groups
Application areas for Semantic Technologies
• Software engineering: conceptual approaches need a semantic interchange language
• Data description:
  – Databases in bioinformatics
  – Multimedia data (complementary to MPEG 7/21)
• Data integration: data exchange benefits from a semantic interchange language
• "Plug 'n' play" for dynamic (not necessarily "automatic"!!!) business process configuration: needs rich semantic descriptions
Prospects of the Semantic Web, or: WWW vs. Semantic Web revisited
WWW := Hypertext & Internet & Social Phenomenon
Semantic Web := Semantic Web Language/Data & Ontologies & Internet & Social Phenomenon
WWW without Social Phenomenon = Intranet
Semantic Web without Social Phenomenon = Semantic Data Integration
Both were new and important paradigms at their time, but with "less" outreach
"Less" vs. "More" Outreach
"Less" equals a multi-billion dollar market
"More" equals a change as radical as the one triggered by the WWW
Overview Literature
Frank van Harmelen, Grigoris Antoniou. Semantic Web Primer, MIT Press 2005.
Part III
Semantic Web Languages
RDF
RDF Data Model
• Resources
  – A resource is a thing you talk about (can reference)
  – Resources have URIs
  – RDF definitions are themselves resources (linkage)
• Properties
  – Slots; they define relationships to other resources or atomic values
• Statements
  – "Resource has Property with Value"
  – (Values can be resources or atomic XML data)
• Similar to Frame Systems
A simple Example
• Statement
  – "Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila"
• Structure
  – Resource (subject): http://www.w3.org/Home/Lassila
  – Property (predicate): http://www.schema.org/#Creator
  – Value (object): "Ora Lassila"
• Directed graph
  http://www.w3.org/Home/Lassila --s:Creator--> Ora Lassila
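The statement above can be written down as a (subject, predicate, object) triple. The following Python sketch is only an illustration of the data model, not an RDF library: the `Triple` type and the `objects` helper are hypothetical names, and the sketch does not distinguish resources from literal values.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # resource URI
    predicate: str  # property URI
    obj: str        # resource URI or atomic value


# Namespace that the slide abbreviates as "s:".
S = "http://www.schema.org/#"

# A graph is simply a set of statements.
graph = {
    Triple("http://www.w3.org/Home/Lassila", S + "Creator", "Ora Lassila"),
}


def objects(graph, subject, predicate):
    """All values that `subject` has for `predicate`."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}


creators = objects(graph, "http://www.w3.org/Home/Lassila", S + "Creator")
```

Following the edge labelled s:Creator from the resource node yields the value node, which is exactly the directed-graph reading of the statement.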
Another Example
• To add properties to the Creator, point through an intermediate resource.
  http://www.w3.org/Home/Lassila --s:Creator--> Person://fi/654645635
  Person://fi/654645635 --Name--> Ora Lassila
  Person://fi/654645635 --Email--> [email protected]
Collection Containers
• Multiple occurrences of the same PropertyType do not establish a relation between the values
  – The Millers own a boat, a bike, and a TV set
  – The Millers need (a car or a truck)
  – (Sarah and Bob) bought a new car
• RDF defines three special resources:
  – Bag: unordered values (rdf:Bag)
  – Sequence: ordered values (rdf:Seq)
  – Alternative: single value (rdf:Alt)
• Core RDF does not enforce 'set' semantics amongst values
Example: Bag
The students in course 6.001 are Amy, Tim, John, Mary, and Sue
  /courses/6.001 --students--> bagid1
  bagid1 --rdf:type--> rdf:Bag
  bagid1 --rdf:_1--> /Students/Amy
  bagid1 --rdf:_2--> /Students/Tim
  bagid1 --rdf:_3--> /Students/John
  bagid1 --rdf:_4--> /Students/Mary
  bagid1 --rdf:_5--> /Students/Sue
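As a rough sketch, the bag can be spelled out as plain triples in Python. The helper names `bag_triples` and `bag_members` are hypothetical; the membership properties rdf:_1, rdf:_2, ... are the ones shown in the example, and the property "students" is left unqualified as on the slide.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def bag_triples(owner, prop, bag_id, members):
    """Describe `members` as an rdf:Bag that `owner` points to via `prop`."""
    triples = [
        (owner, prop, bag_id),
        (bag_id, RDF + "type", RDF + "Bag"),
    ]
    # Container membership properties rdf:_1, rdf:_2, ...
    for index, member in enumerate(members, start=1):
        triples.append((bag_id, f"{RDF}_{index}", member))
    return triples


def bag_members(triples, bag_id):
    """Members of a bag as a set: the numbering carries no order for rdf:Bag."""
    prefix = RDF + "_"
    return {o for s, p, o in triples if s == bag_id and p.startswith(prefix)}


students = ["/Students/Amy", "/Students/Tim", "/Students/John",
            "/Students/Mary", "/Students/Sue"]
bag = bag_triples("/courses/6.001", "students", "bagid1", students)
```

For rdf:Seq the same membership properties would be read in order; for rdf:Alt they list the alternatives.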
Example: Alternative
• The source code for X11 may be found at ftp.x.org, ftp.cs.purdue.edu, or ftp.eu.net
[Figure: http://x.org/package/X11 points to the node altid; altid has rdf:type rdf:Alt, rdf:_1 ftp.x.org, rdf:_2 ftp.cs.purdue.edu and rdf:_3 ftp.eu.net]
Statements about Statements (Requirement 2: Dispute Statements)
• Making statements about statements requires a process for transforming them into resources
  – subject: the original referent
  – predicate: the original property type
  – object: the original value
  – type: rdf:Statement
Example: Reification
• Ralph Swick believes that
  – the creator of the resource http://www.w3.org/Home/Lassila is Ora Lassila
[Figure: the statement http://www.w3.org/Home/Lassila --s:Creator--> Ora Lassila, and the node genid1 with rdf:subject http://www.w3.org/Home/Lassila, rdf:predicate s:Creator, rdf:object Ora Lassila, rdf:type rdf:Statement and b:believedBy Ralph Swick]
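The reified statement can be sketched as four triples about a new resource, plus the statement made about it. This Python fragment is an illustration only; `reify` is a hypothetical helper, and the prefixed names s:Creator and b:believedBy are kept as plain strings as on the slide.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_id, subject, predicate, obj):
    """Describe one statement as a resource of type rdf:Statement."""
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]


original = ("http://www.w3.org/Home/Lassila", "s:Creator", "Ora Lassila")
belief = reify("genid1", *original)
# The statement about the statement:
belief.append(("genid1", "b:believedBy", "Ralph Swick"))
```

Note that the list describes the statement without asserting it: the original triple is not an element of `belief`, so recording Ralph Swick's belief does not commit the graph to the claim itself.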
RDF Syntax I
• Data model does not enforce a particular syntax
• Specification suggests many different syntaxes based on XML
• General form: [XML listing not preserved; its callouts read "Starts an RDF-Description", "Subject (OID)", "Properties" and "Resource (possibly another RDF-description)"]
Resulting Graph
[Figure: http://www.w3.org/Home/Lassila --s:Creator--> Ora Lassila; http://www.w3.org/Home/Lassila --s:createdWith--> http://www.w3c.org/amaya]
RDF Syntax II: Syntactic Varieties
[XML listing not preserved; its callouts read "Typing Information", "Subject (OID)" and "In-Element Property"]
Description Logics (Terminological Logics, DLs)
• Fragments of FOL
• Most often decidable
• Moderately expressive
• Stem from semantic networks
• W3C Standard OWL DL corresponds to SHOIN(D)
DLs – general structure
• DLs are a family of logic-based formalisms for knowledge representation
• Special language characterized by:
  – Constructors to define complex concepts and roles based on simpler ones
  – A set of axioms to express facts using concepts, roles and individuals
• ALC is the smallest DL that is propositionally closed:
  – ∧, ∨, ¬ are constructors, written ⊓, ⊔, ¬
  – Quantifiers define how roles are to be interpreted:
    Man ⊓ ∃hasChild.Female ⊓ ∃hasChild.Male ⊓ ∀hasChild.(Rich ⊔ Happy)
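To make the meaning of the constructors concrete, here is a small Python sketch that evaluates an ALC concept over one finite interpretation. It is model checking in a single, fully known world, not DL reasoning; the nested-tuple encoding, the function name `extension` and the individuals john, mary, tom, bob and ann are all assumptions made for illustration.

```python
def extension(concept, domain, concepts, roles):
    """Set of individuals in `domain` that satisfy an ALC concept expression.

    Atomic concepts are strings; complex concepts are tuples:
    ("not", C), ("and", C1, ...), ("or", C1, ...), ("some", r, C), ("all", r, C).
    """
    if isinstance(concept, str):
        return set(concepts.get(concept, set()))
    op = concept[0]
    if op == "not":
        return set(domain) - extension(concept[1], domain, concepts, roles)
    if op in ("and", "or"):
        parts = [extension(c, domain, concepts, roles) for c in concept[1:]]
        return set.intersection(*parts) if op == "and" else set.union(*parts)
    if op in ("some", "all"):
        _, role, filler = concept
        filler_ext = extension(filler, domain, concepts, roles)
        result = set()
        for x in domain:
            successors = {y for (a, y) in roles.get(role, set()) if a == x}
            if op == "some" and successors & filler_ext:
                result.add(x)   # at least one role successor is in the filler
            if op == "all" and successors <= filler_ext:
                result.add(x)   # every role successor is in the filler
        return result
    raise ValueError(f"unknown constructor: {op!r}")


# Hypothetical interpretation.
domain = {"john", "mary", "tom", "bob", "ann"}
concepts = {
    "Man": {"john", "tom", "bob"}, "Male": {"john", "tom", "bob"},
    "Female": {"mary", "ann"}, "Rich": {"mary"}, "Happy": {"tom"},
}
roles = {"hasChild": {("john", "mary"), ("john", "tom"), ("bob", "ann")}}

# Man ⊓ ∃hasChild.Female ⊓ ∃hasChild.Male ⊓ ∀hasChild.(Rich ⊔ Happy)
expr = ("and", "Man",
        ("some", "hasChild", "Female"),
        ("some", "hasChild", "Male"),
        ("all", "hasChild", ("or", "Rich", "Happy")))
result = extension(expr, domain, concepts, roles)
```

In this interpretation only john qualifies: bob has no son, and tom has no children at all, so the existential restrictions fail for him even though the universal one holds vacuously.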
Further DL concepts and role constructors
• Number restrictions (cardinality constraints) for roles: ≥3 hasChild, ≤1 hasMother
• Qualified number restrictions: ≥2 hasChild.Female, ≤1 hasParent.Male
• Nominals (definition by extension): {Italy, France, Spain}
• Concrete domains (datatypes): hasAge.(≥21)
• Inverse roles: hasChild⁻ ≡ hasParent
• Transitive roles: hasAncestor* (descendant)
• Role composition: hasParent.hasBrother (uncle)
DL Knowledge Bases
• DL knowledge bases consist of two parts (in general):
  – TBox: axioms describing the structure of a modelled domain (conceptual schema):
    • HappyFather ≡ Man ⊓ ∃hasChild.Female ⊓ …
    • Elephant ⊑ Animal ⊓ Large ⊓ Grey
    • transitive(hasAncestor)
  – ABox: axioms describing concrete situations (data, facts):
    • HappyFather(John)
    • hasChild(John, Mary)
• The TBox/ABox distinction has no deep logical significance, but it is a common and useful modelling practice.
General DL Architecture
[Figure: a Knowledge Base consisting of a TBox (schema) and an ABox (data), connected to an Inference System and an Interface]
TBox (schema):
  Man ≡ Human ⊓ Male
  Happy-Father ≡ Man ⊓ ∃has-child.Female ⊓ …
ABox (data):
  Happy-Father(John)
  has-child(John, Mary)
Knowledge modelling in OWL
• Example ontology and conclusions from http://owl.man.ac.uk/2003/why/latest/#2
• Also an example of OWL Abstract Syntax.
Namespace(a =
Knowledge modelling: examples
Class(a:bus_driver complete intersectionOf(a:person restriction(a:drives someValuesFrom (a:bus))))
  bus_driver ≡ person ⊓ ∃drives.bus
Class(a:driver complete intersectionOf(a:person restriction(a:drives someValuesFrom (a:vehicle))))
  driver ≡ person ⊓ ∃drives.vehicle
Class(a:bus partial a:vehicle)
  bus ⊑ vehicle
• A bus driver is a person that drives a bus.
• A bus is a vehicle.
• A bus driver drives a vehicle, so must be a driver.
The subclass relationship is inferred because subclasses are used in the existential quantification.
Knowledge modelling: examples
Class(a:driver complete intersectionOf(a:person restriction(a:drives someValuesFrom (a:vehicle))))
  driver ≡ person ⊓ ∃drives.vehicle
Class(a:driver partial a:adult)
  driver ⊑ adult
Class(a:grownup complete intersectionOf(a:adult a:person))
  grownup ≡ adult ⊓ person
• Drivers are defined as persons that drive vehicles (complete definition)
• We also know that drivers are adults (partial definition)
• So all drivers must be adult persons (i.e. grownups)
An example of axioms being used to assert additional necessary information about a class. We do not need to know that a driver is an adult in order to recognize one, but once we have recognized a driver, we know that they must be adult.
Knowledge modelling: Examples
Class(a:cow partial a:vegetarian)
DisjointClasses(unionOf(restriction(a:part_of someValuesFrom (a:animal)) a:animal) unionOf(a:plant restriction(a:part_of someValuesFrom (a:plant))))
  (∃part_of.animal ⊔ animal) is disjoint from (plant ⊔ ∃part_of.plant)
Class(a:vegetarian complete intersectionOf(restriction(a:eats allValuesFrom (complementOf(restriction(a:part_of someValuesFrom (a:animal))))) restriction(a:eats allValuesFrom (complementOf(a:animal))) a:animal))
Class(a:mad_cow complete intersectionOf(a:cow restriction(a:eats someValuesFrom (intersectionOf(restriction(a:part_of someValuesFrom (a:sheep)) a:brain)))))
Class(a:sheep partial a:animal restriction(a:eats allValuesFrom (a:grass)))
• Cows are naturally vegetarians
• A mad cow is one that has been eating sheep's brains
• Sheep are animals
Thus a mad cow has been eating part of an animal, which is inconsistent with the definition of a vegetarian.
Knowledge modelling: Example
Individual(a:Walt type(a:person) value(a:has_pet a:Huey) value(a:has_pet a:Louie) value(a:has_pet a:Dewey))
Individual(a:Huey type(a:duck))
Individual(a:Dewey type(a:duck))
Individual(a:Louie type(a:duck))
DifferentIndividuals(a:Huey a:Dewey a:Louie)
Class(a:animal_lover complete intersectionOf(a:person restriction(a:has_pet minCardinality(3))))
ObjectProperty(a:has_pet domain(a:person) range(a:animal))
• Walt has pets Huey, Dewey and Louie.
• Huey, Dewey and Louie are all distinct individuals.
• Walt has at least three pets and is thus an animal lover.
Note that in this case, we don’t actually need to include person in the definition of animal lover (as the domain restriction will allow us to draw this inference).
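The role of DifferentIndividuals in this example can be illustrated with a short Python sketch. It is not an OWL reasoner: the function names are hypothetical, and the check only asks whether three of Walt's pets are pairwise asserted to be different, which is what the minCardinality(3) restriction needs when two names might otherwise denote the same individual. As the note above says, the person conjunct is not checked separately.

```python
from itertools import combinations

# ABox facts from the example.
has_pet = {"Walt": {"Huey", "Louie", "Dewey"}}
# DifferentIndividuals(Huey Dewey Louie), stored as unordered pairs.
different = {frozenset(pair) for pair in combinations(["Huey", "Dewey", "Louie"], 2)}


def provably_at_least(individuals, n, different):
    """True if some n of the individuals are pairwise asserted to be different."""
    return any(
        all(frozenset(pair) in different for pair in combinations(group, 2))
        for group in combinations(sorted(individuals), n)
    )


def is_animal_lover(person, has_pet, different):
    """has_pet minCardinality(3), judged only from explicit difference assertions."""
    return provably_at_least(has_pet.get(person, set()), 3, different)


walt_is_animal_lover = is_animal_lover("Walt", has_pet, different)
```

Dropping the difference assertions makes the check fail, which mirrors why the ontology states that the three ducks are distinct.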
Knowledge modelling: Some Research Challenges
• Reasoning with
  – Uncertainty (fuzzy, probabilistic)
  – Inconsistencies (paraconsistent)
  – Rules
  – Further AI paradigms (nonmonotonic reasoning, preferences, …)
• Maintenance (updates, infrastructure, etc.)
• Scalability of reasoning
• …
Application Scenario: Semantic Inference
www.ontoprise.de
© 2006 ontoprise GmbH Home | Menu | Technology | References | End -1- Audi: Semantic Testcar Configuration
Background
Complex dependencies decrease the speed of development Knowledge is distributed over different departments
Goal
Design of a Semantic Guide for capturing the dependencies Configuration of components Integration into existing order system Engineers can concentrate on creative efforts
Inference: to conclude implicit facts
Testengine is ready to test in a car
Rule 2: All parts have to be tested and released
Testengine has been tested and released
Fit of Testengine and Chassis 17
Rule 1: A Chassis has to be suited for the power of the engine
Chassis 17 is suited for 110 KW
Testengine has 104 KW
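Read bottom-up, the chain above derives the implicit fact "Testengine is ready to test in a car" from three explicit facts and two rules. A minimal Python sketch of that chain follows; the fact encoding and the function names are assumptions for illustration and say nothing about how the actual system represents its rules.

```python
# Explicit facts from the slide, keyed by (property, part).
facts = {
    ("power_kw", "Testengine"): 104,
    ("suited_for_kw", "Chassis 17"): 110,
    ("tested_and_released", "Testengine"): True,
}


def fits(engine, chassis, facts):
    """Rule 1: a chassis has to be suited for the power of the engine."""
    return facts[("suited_for_kw", chassis)] >= facts[("power_kw", engine)]


def ready_to_test(engine, chassis, facts):
    """Rule 2 applied to the engine: it must fit and be tested and released."""
    return fits(engine, chassis, facts) and facts.get(("tested_and_released", engine), False)


ready = ready_to_test("Testengine", "Chassis 17", facts)
```

With a 104 KW engine and a chassis suited for 110 KW, both rules fire; raising the engine power above 110 KW would block the first rule and therefore the conclusion.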
Application Scenario: Semantic Data Integration and Search
Knowledge Management for your Projects
• Users keep their established software tools
• A knowledge model (ontology) both integrates and structures the information
• The ontology is enriched with specific expertise
• The ontology empowers a context-aware and easy-to-use search and navigation system
• All information stays in its original place
[Figure: ontology fragment with Object; Person, Topic, Document; Decision Maker, Technician, Methodology, Content, Application]
Deutsche Post IT Solutions (DHL group)
A Companywide Search & Integration Project
Goals
• Improve the effectiveness and quality of work
• Integrated search over multiple sources
• Usage of an ontology to improve results
• Simple interface
• Proof of concept for Semantic Web technology for the whole group
Facts
• Users: 1000 people
• Project duration: 2 months
Editorial Process for Ontology Evolution
Application Scenario: Semantic Data Integration - II
Integration Problems
• Languages
• Name Conflict
• Value Conflict
• Structure Conflict
• Duplicates
• Missing Information
• Multiple Interfaces
Import and Mapping of DB-Structures
Application Scenario: Intelligent Question Answering
Vulcan Inc: OntoBroker passes Advanced Placement Test
Background
• Development of a Digital Aristotle
• Phase 1 successfully closed in 2003
• Phase 2 since January 2004
Functions
• Capturing of an extensive set of chemical knowledge
• System passed the "Advanced Placement Test"
• Query is answered and the answer is explained
Ontobroker™ passed the Advanced Placement Test!
[Charts: Correct Answers; Correct Explanations]
Performance
  CYCORP             1650 minutes
  Student             240 minutes
  Stanford Research    38 minutes
  Ontoprise             9 minutes
Semantic Web Applications – on the Internet
Conceptual architecture for semantic portal [IEEE Data Engineering, 2002]
[Figure: layered architecture. Presentation & Use: RDF output, HTML page, HTML form, navigation, query API; presentation views, input view, navigation view, selection view. Common Semantics: ontology, database. Integration: common data model with X(HT)ML wrapper, Rel-DB wrapper, RDF API and K-Edutella wrapper over files, RDF sources, relational databases and P2P]
OntoWeb-Portal [CRIS 2002], http://www.ontoweb.org
[Figure: Participating Site 1, Participating Site 2, ..., Participating Site n with annotated web pages; Content Syndication Service; Ontology; Content Objects; Generated Browse & Query Front End; OntoWeb Community (EU IST Project OntoWeb)]
P2P Application: Bibster
Part V
Ontologies & Text
OL from Text as Reverse Engineering
[Figure labels: Shared World Model; Write; Reverse Engineering]
Some pre-History of Ontology Learning
• AI: Knowledge Acquisition
  – Since 60s/70s: Semantic Network Extraction and similar for Story Understanding
    • Systems: e.g. MARGIE (Schank et al., 1973), LUNAR (Woods, 1973)
• NLP: Lexical Knowledge Extraction
  – 70s/80s: Extraction of Lexical Semantic Representations from Machine Readable Dictionaries
    • Systems: e.g. ACQUILEX LKB (Copestake et al.)
  – 80s/90s: Extraction of Semantic Lexicons from Corpora for Information Extraction Systems
    • Systems: e.g. AutoSlog (Riloff, 1993), CRYSTAL (Soderland et al., 1995)
• IR: Thesaurus Extraction
  – Since 60s: Extraction of Keywords, Thesauri and Controlled Vocabularies
    • Based on construction and use of thesauri in IR (Sparck-Jones, 1966/1986, 1971)
    • Systems: e.g. Sextant (Grefenstette, 1992), DR-Link (Liddy, 1994)
Some Current Work on Ontology Learning from Text
Term Extraction
• Statistical Analysis
• Patterns
• (Shallow) Linguistic Parsing
• Term Disambiguation & Compositional Interpretation
• Combinations
Taxonomy Extraction
• Statistical Analysis & Clustering (e.g. FCA)
• Patterns
• (Shallow) Linguistic Parsing
• WordNet
• Combinations
Relation Extraction
• Anonymous Relations (e.g. with Association Rules)
• Named Relations (Linguistic Parsing)
• (Linguistic) Compound Analysis
• Web Mining, Social Network Analysis
• Combinations
Relation Label Extraction
• Extension of Association Rules Algorithm
Definition Extraction
• (Linguistic) Compound Analysis (incl. WordNet)
Some Current Work on Ontology Learning from Text
AIFB – TextToOnto (Maedche and Staab, 2000; Cimiano et al., 2005)
– Term Extraction and Taxonomy Extraction
  • Statistical Analysis
  • Conceptual Clustering (FCA), Patterns, WordNet (+ Combination)
– Relation Extraction
  • Anonymous Relations (Association Rules)
  • Named Relations (Subcategorization Frames)
CNTS Univ. Antwerpen, VUB (Reinberger et al., 2004)
– Concept Formation + Relation Extraction
  • Shallow Linguistic Parsing
  • Clustering
DFKI – OntoLT (Buitelaar et al., 2004), RelExt (Schutz and Buitelaar, 2005)
– Term Extraction
  • Shallow Linguistic Parsing & Statistical Analysis
– Taxonomy and Relation Extraction
  • Shallow Linguistic Parsing & manually defined mapping rules
  • Named Relations (Subcategorization Frames)
Some Current Work on Ontology Learning from Text (cont.)
Economic Univ., Prague (Kavalec and Svatek, 2005)
– Relation Label Extraction
  • Extension of Association Rules Algorithm
Free Univ. Amsterdam (Sabou, 2005)
– Term and Taxonomy Extraction (for Web Service Ontologies)
  • Shallow Linguistic Analysis & Patterns
Jozef Stefan Inst., Ljubljana – OntoGen (Fortuna et al., 2005)
– Term and Taxonomy Extraction
  • Statistical Analysis & Clustering
– Relations
  • Web Mining, Social Network Analysis
Univ. Paris – ASIUM (Faure and Nedellec, 1998)
– Taxonomy Extraction (& Subcategorization Frames)
  • Shallow Linguistic Parsing
  • Clustering
Some Current Work on Ontology Learning from Text (cont.)
Univ. Rome – OntoLearn (Navigli and Velardi, 2004; Velardi et al., 2005)
– Term Extraction and Interpretation
  • Shallow Linguistic Parsing & Term Disambiguation & Compositional Interpretation
– Relations
  • Classification of the relation between terms in a compound into a predefined set of (thematic) relations
– Definitions
  • Rules for Gloss Generation
Univ. of Zürich (Rinaldi et al., 2005)
– Term and Taxonomy Extraction
  • Shallow Linguistic Analysis & Patterns
Ontology Learning Layer Cake
Layers, from top to bottom:
• Rules & Axioms: ∀x, y (sufferFrom(x, y) → ill(x))
• Relations: cure(dom:DOCTOR, range:DISEASE)
• Taxonomy: is_a(DOCTOR, PERSON)
• (Multilingual) Synonyms: DISEASE := {disease, illness, Krankheit}
• Terms: disease, illness, hospital
Introduced in: Philipp Cimiano, PhD Thesis, University of Karlsruhe, forthcoming / also available as Springer book, end of 2006
Terms
Terms are at the basis of the ontology learning process
– Terms express more or less complex semantic units
– But what is a term?
Huge Selection of Top Brand Computer Terminals Available for Immediate Delivery
Because Vecmar carries such a large inventory of high-quality computer terminals, including: ADDS terminals, Boundless terminals, DEC terminals, HP terminals, IBM terminals, LINK terminals, NCR terminals and Wyse terminals, your order can often ship same day. Every computer terminal shipped to you is protected with careful packing, including thick boxes. All of our shipping options - including international - are available through major carriers.
– Extracted term candidates (phrases)
  - computer
  - terminal
  - computer terminal
  - ? high-quality computer terminal
  - ? top brand computer terminal
  - ? HP terminal, DEC terminal, …
Term Extraction
Determine most relevant phrases as terms
– Linguistic Methods
  • Rules over linguistically analyzed text
    – Linguistic analysis: Part-of-Speech Tagging, Morphological Analysis, …
    – Extract patterns: Adjective-Noun, Noun-Noun, Adj-Noun-Noun, …
    – Ignore Names (DEC, HP, …), Certain Adjectives (quality, top, …), etc.
– Statistical Methods
  • Co-occurrence (collocation) analysis for term extraction within the corpus
  • Comparison of frequencies between domain and general corpora
    – "Computer Terminal" will be specific to the Computer domain
    – "Dining Table" will be less specific to the Computer domain
– Hybrid Methods
  • Linguistic rules to extract term candidates
  • Statistical (pre- or post-) filtering
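The frequency comparison between a domain corpus and a general corpus can be sketched as a simple ratio of relative frequencies. This is only an illustration under stated assumptions: the function names, the smoothing constant and the toy counts are hypothetical and not taken from any of the systems cited above.

```python
from collections import Counter

def relative_freq(counts):
    """Turn raw phrase counts into relative frequencies."""
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}

def domain_specificity(domain_counts, general_counts, smoothing=1e-6):
    """Score each candidate by (relative freq. in domain) / (relative freq. in general corpus).

    `smoothing` (an assumed constant) avoids division by zero for phrases
    that never occur in the general corpus.
    """
    dom = relative_freq(domain_counts)
    gen = relative_freq(general_counts)
    return {t: p / (gen.get(t, 0.0) + smoothing) for t, p in dom.items()}

# Hypothetical toy counts for three candidate phrases.
domain_counts = Counter({"computer terminal": 40, "dining table": 1, "order": 20})
general_counts = Counter({"computer terminal": 2, "dining table": 30, "order": 200})

scores = domain_specificity(domain_counts, general_counts)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In a hybrid setup, the candidates passed to such a score would first be filtered by the linguistic patterns (Adjective-Noun, Noun-Noun, ...).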
Extraction of Synonyms
Term Classification and Clustering
– Classification
  • Classifying terms into existing class systems, e.g., by extending WordNet (with synsets corresponding to classes)
– Clustering
  • Grouping terms with similar distributions, e.g., by measuring co-occurrence between terms
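A minimal sketch of distribution-based clustering: each term is represented by the counts of words it co-occurs with, and terms are grouped when their vectors are similar. The cosine measure, the greedy single-pass grouping, the threshold and the counts are all assumptions chosen for illustration, not the method of a particular system.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster(contexts, threshold):
    """Greedy single pass: add a term to the first cluster whose
    representative (first member) is similar enough, else open a new cluster."""
    clusters = []
    for term, vec in contexts.items():
        for members in clusters:
            if cosine(vec, contexts[members[0]]) >= threshold:
                members.append(term)
                break
        else:
            clusters.append([term])
    return clusters

# Hypothetical co-occurrence counts (term -> co-occurring verb -> count).
contexts = {
    "disease": {"suffer": 3, "cure": 2, "diagnose": 1},
    "illness": {"suffer": 2, "cure": 3},
    "hospital": {"visit": 4, "build": 1},
}

groups = cluster(contexts, threshold=0.8)
print(groups)
```

With these toy counts, "disease" and "illness" end up in one group as synonym candidates, while "hospital" stays apart.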
The Semiotic Triangle (Ogden & Richards, 1923)
• based on Structural Linguistics studies (de Saussure, 1916)
• adopted in Knowledge Representation (e.g. Sowa, 1984)
Concepts: Intension, Extension, Lexicon
A term may indicate a concept, if we can define its
– Intension
  • (in)formal definition of the set of objects that this concept describes
    – a disease is an impairment of health or a condition of abnormal functioning
– Extension
  • a set of objects (instances) that the definition of this concept describes
    – influenza, cancer, heart disease, …
  Discussion: what is an instance? ‘heart disease’ or ‘my uncle’s heart disease’
– Lexical Realizations
  • the term itself and its multilingual synonyms
    – disease, illness, Krankheit, maladie, …
  Discussion: synonyms vs. instances – ‘disease’, ‘heart disease’, ‘cancer’, …
Concepts: Intension
Extraction of a Definition for a Concept from Text
– Informal Definition
  • e.g., a gloss for the concept as used in WordNet
  • OntoLearn (Navigli and Velardi, 2004; Velardi et al., 2005) uses natural language generation to compositionally build up a WordNet gloss for automatically extracted concepts
    – ‘Integration Strategy’: “strategy for the integration of …”
– Formal Definition
  • e.g., a logical form that defines all formal constraints on class membership
  • Inductive Logic Programming, Formal Concept Analysis, …
Concepts: Extension
Extraction of Instances for a Concept from Text
– Commonly referred to as Ontology Population
– Relates to Knowledge Markup (Semantic Metadata)
– Uses Named-Entity Recognition and Information Extraction
– Instances can be:
  • Names for objects, e.g.
    – Person, Organization, Country, City, …
  • Event instances (with participant and property instances), e.g.
    – Football Match (with Teams, Players, Officials, ...)
    – Disease (with Patient-Name, Symptoms, Date, …)

Concepts: Lexicon
Extraction of Synonyms and Translations for a Concept from Text
– (Multilingual) Term Extraction – see previous slides
– Representation of Lexical Information in Ontologies
[Figure: RDF graph with three levels (meta-classes, classes, instances). The classes o:StorageProduct, o:Cupboard and o:Refrigerator are of type feat:ClassWithFeats (itself an rdfs:Class), with o:Cupboard and o:Refrigerator linked to o:StorageProduct by rdfs:subClassOf. Via feat:lingFeat and feat:imgFeat the classes point to instances of lf:LingFeat (lf:lang "de", lf:term "Schrank" and "Kühlschrank", each with lf:morph and lf:context) and of if:ImgFeat (if:color "#111111", if:shape "cuboid", if:texture "&keypatchSet_223"). An lf:Morph instance carries lf:head "Schrank" and lf:pos "noun".]
The Mathematical Definition of an Ontology
[Stumme et al.; abbrev. from Cimiano-06]
• Structure: C := (C,
– L-Axiom System: Arbitrary Axioms (may include patterns)

Lexicon
Def: A Lexicon for an ontology is a structure Lex := {S_C, S_R, Ref_C, Ref_R}
• S_C, S_R are called signs for concepts and relations, respectively.
• Ref_C, Ref_R are binary relations denoting lexical references for concepts and relations, respectively.
Example:
  Ref_C("car") = {car-concept1, car-concept2}
  Ref_C("automobile") = {car-concept1}
  Ref_C^-1(car-concept1) = {"car", "automobile"}

Distributional Hypothesis & Vector Space Model
• Harris, 1986
  – "Words are (semantically) similar to the extent to which they share similar words"
• Firth, 1957
  – "You shall know a word by the company it keeps"
• Idea: collect context information and represent it as a vector:

              book_obj  rent_obj  drive_obj  ride_obj  join_obj
  apartment      X         X
  car            X         X         X
  motor-bike     X         X         X          X
  excursion      X                                         X
  trip           X                                         X

• compute similarity among vectors wrt. a measure

Context Features
• Four-grams [Schuetze 93]
• Word-windows [Grefenstette 92]
• Predicate-Argument relations (every man loves a woman)
• Modifier Relations (fast car, the hood of the car)
  – [Grefenstette 92, Cimiano 04b, Gasperin et al. 03]
• Appositions (Ferrari, the fastest car in the world)
  – [Hahn & Schnattinger 98, Caraballo 99]
• Coordination (ladies and gentlemen)
  – [Caraballo 99, Dorow and Widdows 03]

Overall Process
[Figure: overall process diagram; annotation on the FCA step: "Or other clustering mechanism"]

Using Syntactic Surface Dependencies
Mopti is the biggest city along the Niger with one of the most vibrant ports and a large bustling market. Mopti has a traditional ambience that other towns seem to have lost. It is also the center of the local tourist industry and suffers from hard-sell overload. The nearby junction towns of Gao and San offer nice views over the Niger’s delta.
Extracted attributes (with counts):
  city: biggest(1)
  ambience: traditional(1)
  center: of_tourist_industry(1)
  junction town: nearby(1)
  market: bustling(1)
  port: vibrant(1)
  overload: suffer_from(1)
  tourist industry: center_of(1), local(1)
  town: seem_subj(1)
  view: nice(1), offer_obj(1)

Context Extraction Process
• extract syntactic dependencies from text
  ⇒ verb/object, verb/subject, verb/PP relations
  ⇒ car: drive_obj, crash_subj, sit_in, …
[Figure: parse tree (nodes s, dp, vp, vdp) processed by the pipeline LoPar → tgrep → lemmatization; lemmatization maps crashed_subj(cars) to crash_subj(car), sat_in(car) to sit_in(car) and drove_obj(car) to drive_obj(car)]

Weighting
• Observation:
  – output of the parser can be erroneous
  – not all attribute/object pairs are significant
• Conditional Probability: P(n | v_arg)
• Consider attribute/object pairs with weight over threshold t

Set Theoretical & Probabilistic Clustering
• Set theoretical
  – Formal Concept Analysis [Ganter and Wille 1999]

Tourism Formal Context

              bookable  rentable  driveable  rideable  joinable
  apartment      X         X
  car            X         X         X
  motor-bike     X         X         X          X
  excursion      X                                         X
  trip           X                                         X

Tourism Lattice
[Figure: concept lattice of the tourism formal context]

Concept Hierarchy
  bookable
    rentable
      driveable
        rideable
          motor-bike
        car
      apartment
    joinable
      excursion
      trip

Compacting the hierarchy
  bookable
    rentable
      driveable
        motor-bike
        car
      apartment
    joinable
      excursion
      trip

Evaluation - Data Sets
• Tourism (118 million tokens):
  – http://www.all-in-all.de/english
  – http://www.lonelyplanet.com
  – British National Corpus (BNC)
  – handcrafted tourism ontology (289 concepts)
• Finance (185 million tokens):
  – Reuters news from 1987
  – GETESS finance ontology (1178 concepts)

Precision/Recall/F-Measure
[Figure: FCA (Tourism); precision, recall and F plotted against threshold t from 0 to 1]

Lexical Recall, F'
[Figure: FCA (Tourism); F, lexical recall (LR) and F' plotted against threshold t from 0 to 1]

Comparison (Tourism, F')
[Figure: F' against threshold t for FCA, Complete Linkage, Average Linkage, Single Linkage and Bi-Section-KMeans]

Comparison (Finance, F')
[Figure: F' against threshold t for FCA, Complete Linkage, Average Linkage, Single Linkage and Bi-Section-KMeans]

Clustering – Comparison

  Method                     F-Measure       Worst Case Time Complexity   Understandability
  FCA                        43.81/41.02%    O(2^n) (pract. better!)      Good
  Agglomerative Clustering   36.78/33.35%    O(n^2 log(n))                Fair
                             36.55/32.92%    O(n^2)
                             38.57/32.15%    O(n^2)
  Divisive Clustering        36.42/32.77%    O(n^2)                       Weak-Fair

Problem 1: Labeling of Clusters
• Caraballo’s Method [1999]:
  – Agglomerative Clustering
  – Labeling clusters with hypernyms derived from Hearst patterns
  – Removing unlabeled concepts, thus compacting the hierarchy
• Evaluation: select 20 nouns with at least 20 hypernyms and present them to human judges with the 3 best hypernyms for each
• Results:
  – Best Hypernym: 33% (Majority) / 39% (Any)
  – Any Hypernym: 47.5% (Majority) / 60.5% (Any)

Problem 2: Spurious Similarities
• Guided Clustering [Cimiano 2005c]:
  – Integrate an externally derived hypernym oracle into the agglomerative clustering algorithm
  – Two terms are only clustered if they have a common hypernym according to the oracle
  – Label the cluster with the common hypernym
  ⇒ Demonstrably better hierarchies
  ⇒ Labels for the cluster
  ⇒ Reuse techniques from clustering with constraints!
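To make the FCA step concrete, the following sketch enumerates the formal concepts of the tourism formal context. The cross positions in `context` are an assumption reconstructed from the concept hierarchy shown for the tourism example, and the brute-force enumeration over all object subsets is chosen for readability; it is not the algorithm used in the evaluated system.

```python
from itertools import combinations

# Assumed tourism formal context: object -> set of attributes.
context = {
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "motor-bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
    "trip": {"bookable", "joinable"},
}
ALL_ATTRIBUTES = frozenset().union(*context.values())

def intent(objects):
    """Attributes shared by all given objects (all attributes for the empty set)."""
    attrs = set(ALL_ATTRIBUTES)
    for obj in objects:
        attrs &= context[obj]
    return frozenset(attrs)

def extent(attrs):
    """Objects that have every attribute in `attrs`."""
    return frozenset(o for o, a in context.items() if attrs <= a)

def formal_concepts():
    """Close every subset of objects; exponential in the number of objects."""
    concepts = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            shared = intent(subset)
            concepts.add((extent(shared), shared))
    return concepts

concepts = formal_concepts()
for ext, itt in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(ext), sorted(itt))
```

Enumerating all subsets mirrors the exponential worst case listed in the comparison table; ordering the resulting concepts by extent inclusion gives the lattice from which the concept hierarchy is read off.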
Conclusion about Comparison
• FCA is an interesting alternative to similarity-based clustering approaches
  – high traceability due to intensional description of clusters
  – Problem: worst case exponential in the size of the formal context
  – But: Zipfian distribution of attributes

Using Ontologies with Text Retrieval

Using Ontologies
Ontologies as:
• background knowledge for text clustering and classification
• basis for recommender systems
• background knowledge in ILP
• knowledge for models in Statistical Relational Learning

Text Clustering & Classification Approaches
[Figure: documents are turned into a bag-of-words matrix (rows Obj1, Obj2, …; columns such as "oman", "has", "granted", …), which is passed, together with background knowledge, to the clustering/classification algorithm]

Text Clustering & Classification Approaches (cont.)
Document (Doc. 17892, crude):
Oman has granted term crude oil customers retroactive discounts from official prices of 30 to 38 cents per barrel on liftings made during February, March and April, the weekly newsletter Middle East Economic Survey (MEES) said. MEES said the price adjustments, arrived at through negotiations between the Omani oil ministry and companies concerned, are designed to compensate for the difference between market-related prices and the official price of 17.63 dlrs per barrel adopted by non-OPEC Oman since February. REUTER
Bag of Words:
  Oman 2, has 1, granted 1, term 1, crude 1, oil 2, customers 1, retroactive 1, discounts 1, ...
Further preprocessing steps:
- Stopwords
- Stemming

WordNet as an example ontology
• 109377 concepts (synsets)
• 144684 lexical entries
• Strategies: all, first, context
• Use of superconcepts (hypernyms in WordNet)
  – Exploit more generalized concepts
  – e.g.: chemical compound is the 3rd superconcept of oil
[Figure: excerpt of the WordNet hierarchy below the root "entity", with nodes including something, physical object, substance, chemical compound, organic compound, lipid, oil, crude oil; artifact, covering, coating, paint, oil paint, oil color; and cover, bless, anoint, "cover with oil". Lexical entries shown: EN:oil, EN:anoint, EN:inunct]

Ontology-based representation

  1                2                 3
  Oman 1           Oman 1            Oman 1
  has 1            granted 1         granted 1
  granted 1        term 1            term 1
  term 1           (C) term 1        (C) term 1
  crude 1          crude 1           crude 1
  oil 1            (C) crude 1       (C) crude 1
  customers 1      oil 1             oil 1
  retroactive 1    (C) oil 1         (C) oil 1
  discounts 1      customer 1        (C) lipid 1
  ...              (C) customer 1    (C) compound 1
                   ...               ...

strategy: add

Evaluation of Text Clustering
Evaluation parameters
• min 15, max 100, 2619 documents of the Reuters corpus
• cluster k = 60, with BiSec-KMeans
[Figure: bar chart of mean purity (data labels 0.570, 0.616, 0.618) for CLUSTERCOUNT 60, EXAMPLE 100, MINCOUNT 15, with WEIGHT tfidf or without and PRUNE 30; the bars vary the settings ONTO (add, repl, only), HYPDEPTH (0, 5), HYPDIS (context, first, all) and HYPINT (false, true)]

Evaluation: OHSUMED Classification Results
Top 50 classes with WordNet and AdaBoost

Combine FCA & Text-clustering
1. preprocess Reuters documents and enrich them with background knowledge (WordNet)
2. calculate a reasonable number k (100) of clusters with BiSec-k-Means using cosine similarity
3. extract a description for all clusters
4. relate clusters (objects) with FCA
5. use the visualization of the concept lattice for better understanding

Explaining Clustering Results with FCA
[Figure: concept lattice with nodes labelled refiner, oil, compound, chemical compound; annotation: chain of concepts with increasing specificity]

Explaining Clustering Results with FCA (cont.)
[Figure: concept lattice with nodes labelled crude oil, barrel]

Explaining Clustering Results with FCA (cont.)
[Figure: concept lattice with nodes labelled resin, palm]
• Resulting concept lattice can also be interpreted as a concept hierarchy directly on the documents
• all documents in one cluster obtain exactly the same description

Conclusion: Ontologies + Text
• Ontologies may be discovered as regularities underlying some text
• Ontologies improve access to text
  – By annotation (cf. part 2)
  – By retrieval (this part)
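As an illustration of the ontology-based document representation used in this part, the following sketch enriches a bag of words with concepts and their superconcepts. The strategy names add, repl and only follow the labels in the evaluation chart; `LEXICON`, `HYPERNYMS` and `enrich` are hypothetical names, the hypernym chain is a hand-written excerpt following the oil example on the WordNet slide, and word sense disambiguation (all / first / context) is deliberately left out.

```python
from collections import Counter

# Hypothetical mini-lexicon: term -> concept (a real system would query WordNet).
LEXICON = {"oil": "oil"}

# Hand-written hypernym chain for the oil example: concept -> direct superconcept.
HYPERNYMS = {
    "oil": "lipid",
    "lipid": "organic compound",
    "organic compound": "chemical compound",
}

def superconcepts(concept, depth):
    """Return up to `depth` superconcepts of `concept`, nearest first."""
    chain = []
    while depth > 0 and concept in HYPERNYMS:
        concept = HYPERNYMS[concept]
        chain.append(concept)
        depth -= 1
    return chain

def enrich(bag, strategy="add", depth=0):
    """Build a term/concept vector from a bag of words.

    add:  keep all terms and add their concepts
    repl: replace terms that have a concept by that concept
    only: keep concepts only
    Concept features are prefixed with "(C) " as on the slide.
    """
    result = Counter()
    for term, count in bag.items():
        concept = LEXICON.get(term)
        if concept is None:
            if strategy != "only":
                result[term] += count
            continue
        if strategy == "add":
            result[term] += count
        for c in [concept] + superconcepts(concept, depth):
            result["(C) " + c] += count
    return result

bag = Counter({"oman": 1, "crude": 1, "oil": 2})
print(enrich(bag, "add", depth=2))
print(enrich(bag, "only"))
```

The enriched vectors can then be passed to the same clustering or classification algorithm as plain bag-of-words vectors.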