Introduction to the

Bangalore, 21 February, 2007

Ivan Herman, W3C

2007-02-02 Ivan Herman Introduction

2007-02-02 Ivan Herman Towards a Semantic Web

The current Web represents information using natural language (English, Hungarian, Hindi, Kannada,…) graphics, multimedia, page layout Humans can process this easily can deduce facts from partial information can create mental associations are used to various sensory information (well, sort of… people with disabilities may have serious problems on the Web with rich media!)

2007-02-02 Ivan Herman Towards a Semantic Web

Tasks often require to combine data on the Web: hotel and travel information may come from different sites searches in different digital libraries etc. Again, humans combine these information easily even if different terminologies are used!

2007-02-02 Ivan Herman However…

However: machines are ignorant! partial information is unusable difficult to make sense from, e.g., an image drawing analogies automatically is difficult difficult to combine information is same as ? how to combine different XML hierarchies? …

2007-02-02 Ivan Herman Example: Searching

The best-known example… Google et al. are great, but there are too many false or missin e.g., if you search in for “yacht racing”, the America’s Cup will be far down in the ranking order adding (maybe application specific) descriptions to resources s

g hits

hould improve this

2007-02-02 Ivan Herman Example: Automatic Airline Reservation

Your automatic airline reservation knows about your preferences builds up using your past can combine the local knowledge with remote services: airline preferences dietary requirements calendaring etc It communicates with remote information (i.e., on the Web!) (M. Dertouzos: The Unfinished Revolution)

2007-02-02 Ivan Herman Example: Data(base) Integration

Databases are very different in structure, in content Lots of applications require managing several after company mergers combination of administrative data for e-Government biochemical, genetic, pharmaceutical research etc. Most of these data are now on the Web (though not necessarily public yet) The semantics of the data(bases) should be known (how this semantics is mapped on internal structures is immaterial)

2007-02-02 Ivan Herman Example: Digital Libraries

It is a bit like the search example It means catalogs on the Web librarians have known how to do that for centuries goal is to have this on the Web, World-wide extend it to multimedia data, too But it is more: software agents should also be librarians! help you in finding the right publications

2007-02-02 Ivan Herman Introductory Example

We will use a simplistic example to introduce the main Semantic Web concepts We take, as an example area, data integration

2007-02-02 Ivan Herman The Rough Structure of Data Integration

1. Map the various data onto an abstract data representation make the data independent of its internal representation… 2. Merge the resulting representations 3. Start making queries on the whole! queries that could not have been done on the individual data se

ts

2007-02-02 Ivan Herman A Simplifed Bookstore Data (Dataset “A”)

ID Author Title Publisher Year ISBN 0-00-651409-X id_xyz The Glass Palace id_qpr 2000

ID Name Home page id_xyz Amitav Ghosh http://www.amitavghosh.com/

ID Publisher Name City id_qpr Harper Collins London

2007-02-02 Ivan Herman 1st Step: Export Your Data as a Set of Relations

2007-02-02 Ivan Herman Some Notes on the Exporting the Data

Relations form a graph the nodes refer to the “real” data or contain some literal how the graph is represented in machine is immaterial for now Data export does not necessarily mean physical conversion of the data relations can be generated on-the-fly at query time via SQL “bridges” scraping HTML pages extracting data from Excel sheets etc. One can export part of the data

2007-02-02 Ivan Herman Another Bookstore Data (dataset “F”)

ID Titre Auteur Traducteur Original ISBN 2020386682 Le Palais des miroirs i_abc i_qrs ISBN 0-00-651409-X

ID Nom i_abc Amitav Ghosh i_qrs Christiane Besse

2007-02-02 Ivan Herman 2nd Step: Export Your Second Set of Data

2007-02-02 Ivan Herman 3rd Step: Start Merging Your Data

2007-02-02 Ivan Herman 3rd Step: Start Merging Your Data (cont.)

2007-02-02 Ivan Herman 3rd Step: Merge Identical Resources

2007-02-02 Ivan Herman Start Making Queries…

User of data “F” can now ask queries like: « donnes-moi le titre de l’original » (ie: “give me the title of the original”) This information is not in the dataset “F”… …but can be automatically retrieved by merging with dataset “A”!

2007-02-02 Ivan Herman However, More Can Be Achieved…

We “feel” that a:author and f:auteur should be the same But an automatic merge doest not know that! Let us add some extra information to the merged data: a:author same as f:auteur both identify a “Person”: a term that a community may have already defined: a “Person” is uniquely identified by his/her name and, say, homepage or email it can be used as a “category” for certain type of resources

2007-02-02 Ivan Herman 3rd Step Revisited: Use the Extra Knowledge

2007-02-02 Ivan Herman Start Making Richer Queries!

User of dataset “F” can now query: « donnes-moi la page d’acceuil de l’auteur de l’original » (ie, “give me the home page of the original’s author”) The data is not in dataset “F”… …but was made available by: merging datasets “A” and “F” adding three simple extra statements as an extra knowledge using existing terminologies as part of that extra knoweledge

2007-02-02 Ivan Herman Combine With Different Datasets

Using, e.g., the “Person”, the dataset can be combined with other sources For example, data in Wikipedia can be extracted using simple (e.g., XSLT) tools there is an active development to add some simple semantic “tag we tacitly presuppose their existence in our example…

” to wikipedia entries

2007-02-02 Ivan Herman Merge with Wikipedia Data

2007-02-02 Ivan Herman Is That Surprising?

Maybe but, in fact, no… What happened via automatic means is done all the time, every day by the users of the Web! The difference: a bit of extra rigor (e.g., naming the relationships) is necessary so that machines could do this, too

2007-02-02 Ivan Herman What Did We Do?

We combined different datasets all may be of different origin somewhere on the web all may have different formats (mysql, excel sheet, XHTML, etc) all may have different names for relations (e.g., multilingual) We could combine the data because some URI-s were identical (the ISBN-s in this case) We could add some simple additional knowledge, also using common terminologies that a community has produced As a result, new relations could be found and retrieved

2007-02-02 Ivan Herman It Could Become Even More Powerful

The added extra knowledge could be much more complex to the merged datasets e.g., a full classification of various type of library data, ty etc) geographical information information on inventories, prices etc. This is where ontologies, extra rules, etc, may come in Even more powerful queries can be asked as a resultpes of books (literature or not, fiction, poetry,

2007-02-02 Ivan Herman What did we do? (cont)

2007-02-02 Ivan Herman The Abstraction Pays Off Because…

… the graph representation is independent on the exact structures in, say, a relational … a change in local database schemas, XHTML structures, etc, do not affect the whole, only the “export” step “schema independence” … new data, new connections can be added seamlessly, regardless of the structure of other datasources

2007-02-02 Ivan Herman So Where is the Semantic Web?

The Semantic Web provides technologies to make such integration (hopefully you get a full picture at the end of the tutorial…)

possible!

2007-02-02 Ivan Herman Basic RDF

2007-02-02 Ivan Herman RDF Triples

Let us begin to formalize what we did! we “connected” the data… but a simple connection is not enough… it should be named someh hence the RDF Triples: a labelled connection between two resour

ow ces

2007-02-02 Ivan Herman RDF Triples (cont.)

An RDF Triple (s,p,o) is such that: “s”, “p” are URI-s, ie, resources on the Web; “o” is a URI or a conceptually: “p” connects, or relates the “s” and ”o” note that we use URI-s for naming: i.e., we can use http://www.example.org/original here is the complete triple:

(, , )

RDF literal is a general model for such triples (with machine readable formats like RDF/XML, , n3, RXR, …) … and that’s it! (simple, isn't it? )

2007-02-02 Ivan Herman RDF Triples (cont.)

RDF Triples are also referred to as “triplets”, or “statement” The s, p, o resources are also referred to as “subject”, “predicate”, ”object”, or “subject”, ”property”, ”object” Resources can use any URI; e.g., it can denote an element within an XML file on the Web, not only a “full” resource, e.g.: http://www.example.org/file.xml#xpointer(id('home')) http://www.example.org/file.html#home RDF Triples form a directed, labelled graph (best way to think about them!)

2007-02-02 Ivan Herman A Simple RDF Example (in RDF/XML)

Le palais des mirroirs

(Note: namespaces are used to simplify the URI-s)

2007-02-02 Ivan Herman A Simple RDF Example (in Turtle)

f:titre "Le palais des mirroirs"@fr; f:original .

2007-02-02 Ivan Herman Few Words About Syntax

Syntax (RDF/XML, Turtle) is only that: syntax The important thing is the underlying model and concepts In what follows, we will see examples both in RDF/XML and in Turtle, but we will not go into all the syntactical rules here this is a mechanical thing only, well documented

2007-02-02 Ivan Herman URI-s Play a Fundamental Role

URI-s made the merge possible Anybody can create (meta)data on any resource on the Web e.g., the same XHTML file could be annotated through other term semantics is added to existing Web resources via URI-s URI-s make it possible to link (via properties) data with one a URI-s ground RDF into the Web information can be retrieved using existing tools this makes the “Semantic Web”, well… “Semantic Web”

s

nother

2007-02-02 Ivan Herman “Internal” Nodes

Consider the following statement: “the publisher is a «thing» that has a name and an address” Until now, nodes were identified with a URI. But… …what is the URI of «thing»?

2007-02-02 Ivan Herman One Solution: Define Extra URI-s

Give an id with rdf:ID (essentially, defining a URI)

HarpersCollins HarpersCollins

Defines a fragment identifier within the RDF file Identical to the id in HTML, SVG, … (i.e., it can be referred to with regular URI-s from the outside) Note: this is an RDF/XML feature, not part of the RDF model! Turtle has something similar, too

2007-02-02 Ivan Herman Blank Nodes

Use an internal identifier

HarpersCollins HarpersCollins a:publisher _:A234. _:A234 a:p_name "HarpersCollins".

A234 is invisible from outside the file (it is not a “real” URI! ); it is an internal identifier for a resource

2007-02-02 Ivan Herman Blank Nodes: the System Can Also Do It

Let the system create a nodeID internally (you do not really care about the name…)

HarpersCollins

2007-02-02 Ivan Herman Same in Turtle

a:publisher [ a:p_name "HarpersCollins"; … ].

2007-02-02 Ivan Herman Blank Nodes: Some More Remarks

Blank nodes require attention when merging blanks nodes with identical nodeID-s in different graphs are di the implementation must be be careful with its naming schemes w From a logic point of view, blank nodes represent an “existential” statement (“there is a resource such that…”)

fferent hen merging

2007-02-02 Ivan Herman RDF in Programming Practice

For example, using Java+Jena (HP’s Bristol Lab): a “Model” object is created the RDF file is parsed and results stored in the Model the Model offers methods to retrieve: triples (property,object) pairs for a specific subject (subject,property) pairs for specific object etc. the rest is conventional programming… Similar tools exist in Python, PHP, etc. (see later)

2007-02-02 Ivan Herman Jena Example

// create a model Model model=new ModelMem(); Resource subject=model.createResource("URI_of_Subject") // 'in' refers to the input file model.read(new InputStreamReader(in)); StmtIterator iter=model.listStatements(subject,null,null); while(iter.hasNext()) { st = iter.next(); p = st.getProperty(); o = st.getObject(); do_something(p,o); }

2007-02-02 Ivan Herman Merge in Practice

Environments merge graphs automatically e.g., in Jena, the Model can load several files the load merges the new statements automatically

2007-02-02 Ivan Herman RDFSchemas

2007-02-02 Ivan Herman Need for RDF Schemas

This is the simple form of our “extra knowledge”: define the terms we can use what restrictions apply what extra relationships are there? This is where RDF Schemas come in officially: “RDF Vocabulary Description Language”; the term “Sc reasons…

hema” is retained for historical

2007-02-02 Ivan Herman Classes, Resources, …

Think of well known in traditional ontologies: use the term “novel” “every novel is a fiction” “«The Glass Palace» is a novel” etc. RDFS defines resources and classes: everything in RDF is a “resource” “classes” are also resources, but… …they are also a collection of possible resources (i.e., “indiv “fiction”, “novel”, …

iduals”)

2007-02-02 Ivan Herman Classes, Resources, … (cont.)

Relationships are defined among classes/resources: “typing”: an individual belongs to a specific class (“«The Glas to be more precise: “«isbn:000651409X» is a novel” “subclassing”: instance of one is also the instance of the othe RDFS formalizes these notions in RDF

s Palace» is a novel”)

r (“every novel is a fiction”)

2007-02-02 Ivan Herman Classes, Resources in RDF(S)

RDFS defines rdfs:Resource, rdfs:Class as nodes; rdf:type, rdfs:subClassOf as properties (these are all special URI-s, we just use the namespace abbrevi

ation)

2007-02-02 Ivan Herman Schema Example in RDF/XML

The schema part (“application’s data types”):

The RDF data on a specific novel (“using the type”):

In traditional knowledge representation this separation is often referred to as: “Terminological axioms” and “Assertions”

2007-02-02 Ivan Herman Further Remarks on Types

A resource may belong to several classes rdf:type is just a property… “«The Glass Palace» is a novel, but «The Glass Palace» is also i.e., it is not like a datatype! The type information may be very important for applications e.g., it may be used for a categorization of possible nodes probably the most frequently used rdf predicate… (remember the “Person” in our example?)

an «inventory item»…”

2007-02-02 Ivan Herman Inferred Properties

( rdf:type #Fiction) is not in the original RDF data… …but can be inferred from the RDFS rules Better RDF environments return that triplet, too

2007-02-02 Ivan Herman Inference: Let Us Be Formal…

The RDF Semantics document has a list of (44) entailment rules: “if such and such triplets are in the graph, add this and this do that recursively until the graph does not change this can be done in polynomial time for a specific graph The relevant rule for our example:

If: uuu rdfs:subClassOf xxx . vvv rdf:type uuu . triplet” Then add: vvv rdf:type xxx .

Whether those extra triplets are physically added to the graph, or deduced when needed is an implementation issue

2007-02-02 Ivan Herman Properties

Property is a special class (rdf:Property) properties are also resources identified by URI-s Properties are constrained by their range and domain i.e., what individuals can serve as object and subject There is also a possibility for a “sub-property” all resources bound by the “sub” are also bound by the other

2007-02-02 Ivan Herman Properties (cont.)

Properties are also resources (named via URI–s)… So properties of properties can be expressed as… RDF properties this twists your mind a bit, but you can get used to it For example, (P rdfs:range C) means: 1. P is a property 2. C is a class instance 3. when usingP , the “object” must be an individual in C this is an RDF statement with subject P, object C, and property rdfs:range

2007-02-02 Ivan Herman Property Specification Example

2007-02-02 Ivan Herman Property Specification Serialized

In XML/RDF:

In Turtle:

:title rdf:type rdf:Property; rdf:domain :Fiction; rdf:range rdfs:Literal.

2007-02-02 Ivan Herman Literals

Literals may have a data type floats, integers, booleans, etc, defined in XML Schemas one can also define complex structures and restrictions via regular expressions, … full XML fragments (Natural) language can also be specified (via xml:lang)

2007-02-02 Ivan Herman A Bit of RDFS Can Take You Far…

Remember the power of merge? We could have used, in our example: f:auteur is a subproperty of a:author and vice versa (although we will see other ways to do that…) Of course, in some cases, more complex knowledge is necessary are necessary (see later…)

2007-02-02 Ivan Herman Some Predefined Classes (Collections, Containers)

2007-02-02 Ivan Herman Predefined Classes and Properties

RDF(S) has some predefined classes and properties They are not new “concepts” in the RDF Model, just resources with an agreed semantics Examples: collections (a.k.a. lists) containers: sequence, bag, alternatives reification rdfs:comment, rdf:seeAlso, rdf:value

2007-02-02 Ivan Herman Collections (Lists)

We could have the following statement: “The book inventory is a «thing» that consists of ,…” But we also want to express the constituents in this order Using blank nodes is not enough

n/000651409X>,

2007-02-02 Ivan Herman Collections (Lists) (cont.)

Familiar structure for Lisp programmers…

2007-02-02 Ivan Herman The Same in RDF/XML and Turtle

:Inventory axsvg:consistsOf (

2007-02-02 Ivan Herman RDF Data Access, a.k.a. Query (SPARQL)

2007-02-02 Ivan Herman Querying RDF Graphs/Repositories

Remember the Jena idiom:

StmtIterator iter=model.listStatements(subject,null,null); while(iter.hasNext()) { st = iter.next(); p = st.getProperty(); o = st.getObject(); do_something(p,o);

In practice, more complex queries into the RDF data are necessary something like: “give me the (a,b) pair of resources, for which there is an x such that (x parent a) and (b brother x) holds” (ie, return the uncles) these rules may become quite complex Queries become very important for distributed RDF data! This is the goal of SPARQL (Query Language for RDF)

2007-02-02 Ivan Herman Analyze the Jena Example

StmtIterator iter=model.listStatements(subject,null,null); while(iter.hasNext()) { st = iter.next(); p = st.getProperty(); o = st.getObject(); do_something(p,o);

The (subject,?p,?o) is a pattern for what we are looking for (with ?p and ?o as “unknowns”)

2007-02-02 Ivan Herman General: Graph Patterns

The fundamental idea: generalize the approach to graph patterns: the pattern contains unbound symbols by binding the symbols (if possible), subgraphs of the RDF grap if there is such a selection, the query returns the bound resou SPARQL is based on similar systems that already existed in some enviro is a programming language-independent query language

h are selected rces

nments

2007-02-02 Ivan Herman Our Jena Example in SPARQL

SELECT ?p ?o WHERE {subject ?p ?o}

The triplets in WHERE define the graph pattern, with ?p and ?o “unbound” symbols The query returns a list of matching p,o pairs

2007-02-02 Ivan Herman Simple SPARQL Example

SELECT ?isbn ?price ?currency # note: not ?x! WHERE { ?isbn a:price ?x. ?x rdf:value ?price. ?x p:currency ?currency.

Returns: [[<..49X>,33,£], [<..49X>,50,€], [<..6682>,60,€], [<..6682>,78,$]]

2007-02-02 Ivan Herman Pattern Constraints

SELECT ?isbn ?price ?currency WHERE { ?isbn a:price ?x. ?x rdf:value ?price. ?x p:currency ?currency. FILTER(?currency == €) }

Returns: [ [<..49X>,50,€], [<..6682>,60,€]] SPARQL defines a base set of operators and functions

2007-02-02 Ivan Herman Other SPARQL Features

Optional triples: if they are available, return the values, otherwise ignore but return everything else Limit the number of returned results; remove duplicates, sort them, … Construct a graph combining a separate pattern and the query results Use datatypes and/or language tags when matching a pattern SPARQL is quite mature already recommendation expected 3rdQ of 2007 there are a number of

implementations

already

2007-02-02 Ivan Herman SPARQL Usage in Practice

Locally, i.e., bound to a programming environments like Jena Remotely, e.g., over the network or into a database separate documents define the protocol and the result format SPARQL Protocol for RDF with HTTP and SOAP bindings SPARQL results in XML or JSON formats There are already a number of applications, demos, etc.,

2007-02-02 Ivan Herman Get to RDF(S) Data

2007-02-02 Ivan Herman Simplest: Write your own RDF Data…

The simplest aproach: write your own RDF data in your preferred syntax… You may add RDF to XML directly (in its own namespace); e.g., in SVG:

... ... ...

However: this does not scale!

2007-02-02 Ivan Herman RDF Can Also Be Extracted/Generated

Use intelligent “scrapers” or “wrappers” to extract a structure (hence RDF) from a Web page… using conventions in, e.g., class names or meta elements … and then generate RDF automatically (e.g., via an XSLT script) This is similar to what “” do (without referring to RDF, though) they may not extract RDF but use the data directly instead in Web&2.0 applications, different other applications may extract it to yield RDF (e.g., RSS1.0)

but the application is not all that

2007-02-02 Ivan Herman Formalizing the Scraper Approach: GRDDL

GRDDL formalizes the scraper approach. For example:

Some Document ... ... 2006-01-02 ...

yields, by running the file through dc-extract.xsl

Some subject 2006-01-02

2007-02-02 Ivan Herman GRDDL (cont)

The user has to provide dc-extract.xsl and use its conventions (making use of the corresponding meta-s, class id-s, etc…) … but, by using the profile attribute, a client is instructed to find and run the transformation processor automatically There is a mechanism for XML in general a transformation can also be defined on an XML schema level A “bridge” to “” Currently a Working Group, with a recommendation planned in the 2nd Quarter of 2007

2007-02-02 Ivan Herman via dedicated properties)

Another Upcoming Solution: RDFa

RDFa extends (X)HTML a bit by: defining general attributes to add metadata to any elements (a class in microformats, but

provides an almost complete “serialization” of RDF in XHTML the same mechanism can also be used for general XML content

bit like the

2007-02-02 Ivan Herman RDFa (cont.)

For example

March 23, 2004 Rollers hit casino for £1.3m By Steve Bird. See also video footage

yields, by running the file through a processor:

dc:date "March 23, 2004"; dc:title "Rollers hit casino for £1.3m; dc:creator "Steve Bird"; dcmtype:MovingImage .

2007-02-02 Ivan Herman RDFa (cont.)

It is a bit like the microformats approach but with more rigor and fully generic makes it easy to mix different vocabularies, which is not that It can easily be combined with GRDDL

easy with microformats

2007-02-02 Ivan Herman RDFa and GRDDL

Both solutions aim at “binding” existing structural data with RDF GRDDL “brings” structured data to RDF RDFa “brings” RDF to structured data (HTML) The same URI may be interpreted as a web page to be displayed by a browser as RDF data to be integrated (compare to a credit card: a human can read its number and owner; a card reader can access to its data)

2007-02-02 Ivan Herman Bridge to Relational Databases

Most of the data are stored in relational databases “RDFying” them is an impossible task “Bridges” are being defined: a layer between RDF and the database RDB tables are “mapped” to RDF graphs on the fly in some cases the mapping is generic (columns represent propert references to other tables via blank nodes)… … in other cases separate mapping files define the details This is a very important source of RDF data

ies, cells are, e.g., literals or

2007-02-02 Ivan Herman SPARQL As a Unifying Force

2007-02-02 Ivan Herman RDF(S) in Practice

2007-02-02 Ivan Herman We have seen Jena

// create a model Model model=new ModelMem(); Resource subject=model.createResource("URI_of_Subject") // 'in' refers to the input file model.read(new InputStreamReader(in)); StmtIterator iter=model.listStatements(subject,null,null); while(iter.hasNext()) { st = iter.next(); p = st.getProperty(); o = st.getObject(); do_something(p,o); }

2007-02-02 Ivan Herman Jena (cont)

But Jena is much more; it has a large number of classes/methods adding triplets to a graph, serialize it comparing full RDF graphs manage typed literals etc. an “RDFS Reasoner” a full SPARQL implementation a layer (Joseki) to create a triple database and more… Probably the most widely used RDF environment in Java today

2007-02-02 Ivan Herman Lots of Other tools

Lots of tools are available. Are listed on W3C’s : RDF programming environment for 14+ languages, including C, C++ PHP,… (no Cobol or Ada yet !) 13+ Triple Stores, ie, database systems to store (possibly huge specialized editors, validators, … SPARQL “endpoints” (ie, you can experiment with RDF data withou etc Some of the tools are Open Source, some are not; some are very mature, some are not : it is the usual picture of software tools, nothing special, Python, any more! Java, Javascript, Ruby, Anybody can start developing RDF-based applications!) today datasets

t any installations)

2007-02-02 Ivan Herman Ontologies (OWL)

2007-02-02 Ivan Herman Ontologies

RDFS is useful, but does not solve all the possible requirements Complex applications may want more possibilities: can a program reason about some terms? E.g.: “if «Person» resources «A» and «B» have the same «:email» property, then «A» and «B» are identical” if somebody else defines a set of terms: are they the same? construct classes, not just name them restrict a property range when used for a specific class disjointness or equivalence of classes etc.

2007-02-02 Ivan Herman Ontologies (cont.)

There is a need to support ontologies on the Semantic Web:

“defines the concepts and relationships used to describe and re knowledge”

We need Web Ontology Languages RDFS can be considered as a simple ontology language OWL gives a much more complex set of possibilities Languages should be a compromise between rich semantics for meaningful applications feasibility, implementability present an area of

2007-02-02 Ivan Herman Classes in OWL

In RDFS, you can subclass existing classes… that’s all In OWL, you can construct classes from existing ones: enumerate its content through intersection, union, complement through property restrictions To do so, OWL introduces its own Class and Thing to differentiate the classes from individuals

2007-02-02 Ivan Herman OWL Classes can be “Enumerated”

The OWL solution, where possible content is explicitly listed:

2007-02-02 Ivan Herman Same Serialized

:£ rdf:type owl:Thing. :€ rdf:type owl:Thing. :$ rdf:type owl:Thing. :Currency rdf:type owl:Class; owl:oneOf (:€ :£ :$).

The class consists of exactly of those individuals

2007-02-02 Ivan Herman Union of Classes

Essentially, like a set-theoretical union:

2007-02-02 Ivan Herman Same Serialized

:Novel rdf:type owl:Class. :Short_Story rdf:type owl:Class. :Poetry rdf:type owl:Class. :Literature rdf:type owlClass; owl:unionOf (:Novel :Short_Story :Poetry).

Other possibilities: complementOf, intersectionOf

2007-02-02 Ivan Herman Property Restrictions

(Sub)classes created by restricting the property values on that class For example, “a listed price is a price which is in either €, £, or $” means: the value of “p:currency” when applied to a resource on listed in other cases it can also take the value of ¥, i.e., it is not the same as range! …thereby define the class of “listed price”

price must take one of those values…

2007-02-02 Ivan Herman Property Restrictions in OWL

Restriction may be by: value constraints (i.e., further restrictions on the range) all values must be from a class (like the price example) some value must be from a class cardinality constraints (i.e., how many times the property can be used on an instance?) minimum cardinality maximum cardinality exact cardinality

2007-02-02 Ivan Herman Property Restriction Example

“the value of “p:currency” when applied to a resource on listed price must take one of those values…”:

2007-02-02 Ivan Herman Same Serialized

:Listed_Price rdf:type owl:Class; rdfs:subClassOf [ rdf:type owl:Restriction; owl:onProperty ; owl:allValuesFrom :Currency. ].

“allValuesFrom” could be replaced by “someValuesFrom”, “cardinality”, “minCardinality”, or “maxCardinality”

2007-02-02 Ivan Herman Property Characterization

In OWL, one can characterize the behavior of properties (symmetric, transitive, functional, inverse functional…) OWL also separates data properties “datatype property” means that its range are typed literals

2007-02-02 Ivan Herman Characterization Example

“foaf:email” is inverse functional

Could also be FunctionalProperty, TransitiveProperty, SymmetricProperty

2007-02-02 Ivan Herman OWL: Additional Requirements

Ontologies may be extremely large: their management requires special care they may consist of several modules come from different places and must be integrated Ontologies are on the Web. That means applications may use several, different ontologies, or… … same ontologies but in different languages equivalence of, and relations among terms become an issue

2007-02-02 Ivan Herman Term Equivalence/Relations

For classes: owl:equivalentClass: two classes have the same individuals owl:disjointWith: no individuals in common For properties: owl:equivalentProperty remember the a:author vs. f:auteur? owl:inverseOf: inverse relationship For individuals: owl:sameAs: two URI refer to the same individual (e.g., concept) owl:differentFrom: negation of owl:sameAs

2007-02-02 Ivan Herman Example: Connecting to French

2007-02-02 Ivan Herman Versioning, Annotation

Special class owl:Ontology with special properties: owl:imports, owl:versionInfo, owl:priorVersion owl:backwardCompatibleWith, owl:incompatibleWith rdfs:label, rdfs:comment can also be used One instance of such class is expected in an ontology file Deprecation control: owl:DeprecatedClass, owl:DeprecatedProperty types

2007-02-02 Ivan Herman However: Ontologies are Hard!

A full ontology-based application is a very complex system Hard to implement, may be heavy to run… … and not all applications may need it! Three layers of OWL are defined: Lite, DL, and Full decreasing level of complexity and expressiveness “Full” is the whole thing “DL ()” restricts Full in some respects “Lite” restricts DL even more

2007-02-02 Ivan Herman OWL Full

No constraints on the various constructs owl:Class is equivalent to rdfs:Class owl:Thing is equivalent to rdfs:Resource This means that: Class can also be an individual (it is possible to talk about class one can make statements on RDFS constructs (e.g., declare rdf:type to be functional…) etc. A real superset of RDFS But: an OWL Full ontology may be undecidable!

of classes, etc.)

2007-02-02 Ivan Herman OWL Description Logic (DL)

Goal: maximal subset of OWL Full against which current research a decidable reasoning procedure is realizable

Class, Thing, ObjectProperty, DatatypePropery are strictly separated : a class cannot be an individual of another class object properties’ values must usually be an owl:Thing (except, e.g., for rdf:type) No mixture of owl:Class and rdfs:Class in definitions (essentially: use OWL concepts only!) No statements on RDFS resources can assure that No characterization of datatype properties possible …

2007-02-02 Ivan Herman OWL Lite

Goal: provide a minimal useful subset, easily implemented

All of DL’s restrictions, plus some more: class construction can be done only through intersection or pro cardinality restriction with 0 and 1 only … Simple class hierarchies can be built Property constraints and characterizations can be used

perty constraints

2007-02-02 Ivan Herman Note on OWL layers

OWL Layers were defined to reflect compromises: expressibility vs. implementability Some application just need to express and interchange terms (with possible scruffiness): OWL Full is fine they may build application specific reasoning instead of using Some applications need rigor; then OWL DL/Lite might be the good choice Research may lead to new decidable subsets of OWL see, e.g., H.J. ter Horst’s paper at ISWC2004 or in the Journal

a general one

of Web Semantics (October 2005)

2007-02-02 Ivan Herman Ontology Development

The hard work is to create the ontologies requires a good knowledge of the area to be described some communities have good expertise already (e.g., librarians) OWL is just a tool to formalize ontologies Large scale ontologies are often developed in a community process Ontologies should be shared and reused can be via the simple namespace mechanisms… …or via explicit inclusions Applications can also be developed with very small ontologies, though

2007-02-02 Ivan Herman Ontology Examples

International country list example for an OWL Lite ontology There are also some large ontologies in the public: eClassOwl the Gene Ontology: eBusiness ontology for products and services, 75,000 classes UniProt : protein sequence: to describe and annotation gene and genedata, producthundreds attributes of millions in any of organism

and 5,500 properties

triples(!)

2007-02-02 Ivan Herman Simple Knowledge Organization System (SKOS)

2007-02-02 Ivan Herman Simple Knowledge Organization System

Goal: porting (“Webifying”) thesauri: representing and sharing classifications, glossaries, thesauri, etc, as developed in the “Print World”. For example: Dewey Decimal Classification terms… DMOZ categories (a.k.a. The system must be simple , toArt allow and Architecture for a quick Thesaurus, port of ACMtraditio classificationnal data of(done keyword by

“traditional” people…) Open Directory Project This is where SKOS comes in

)

s and

2007-02-02 Ivan Herman Example: Entries in a Glossary (1)

“Assertion” “(i) Any expression which is claimed to be true. (ii) The act of claiming something to be true.”

“Class” “A general concept, category or classification. Something used primarily to classify or categorize other things.”

“Resource” “(i) An entity; anything in the universe. (ii) As a class name: the class of everything; the most inclusive category possible.”

(from the RDF Semantics Glossary)

2007-02-02 Ivan Herman Example: Entries in a Glossary (2)

2007-02-02 Ivan Herman Example: Taxonomy (1)

Illustrates “broader” and “narrower”

General Travelling Politics

SemWeb RDF OWL

(From MortenF’s weblog categories. Note that the categorization is arbitrary!)

2007-02-02 Ivan Herman Example: Taxonomy (2)

2007-02-02 Ivan Herman Example: Thesaurus (1)

Term Economic cooperation

Used For Economic co-operation

Broader terms Economic policy

Narrower terms Economic integration, European economic cooperation, …

Related terms Interdependence

Scope Note Includes cooperative measures in banking, trade, …

(from UK Archival Thesaurus)

2007-02-02 Ivan Herman Example: Thesaurus (2)

2007-02-02 Ivan Herman SKOS Core Overview

Classes and Predicates: Basic description (Concept, ConceptScheme, …) Labelling (prefLabel, altLabel, prefSymbol, altSymbol …) Documentation (definition, scopeNote, changeNote, …) Semantic relations (broader, narrower, related) Subject indexing (subject, isSubjectOf, …) Grouping (Collection, OrderedCollection, …) Subject Indicator (subjectIndicator) Some simple inference rules (a bit like the RDFS inference rules) to define some semantics

2007-02-02 Ivan Herman Why Having SKOSand OWL?

OWL’s precision not always necessary or even appropriate “OWL a sledge hammer/SKOS a nutcracker”, or “OWL a Harley/SKOS complement each other, can be used in combination to optimize c Role of SKOS is to bring the worlds of library classification and Web technolog to be simple and undemanding enough in terms of cost and requir A typical example: the Glossary of project of W3C stores all terms in SKOS (and extracted from W3C documents) a bike” ost/benefit

y together

ed expertise

2007-02-02 Ivan Herman Rules

OWL-DL and OWL-Lite are based on Description Logic; there are things that DL cannot express a well known examples is Horn rules (eg, the “uncle” relationsh (P1 ∧ P2 ∧ …) → C e.g.: for any «X», «Y» and «Z»: “if «Y» is a parent of «X», and «Z» is a brother of «Y» then «Z» is the uncle of «X»” there are a number of attempts to combined these: There is also an increasing number of rule-based system that want to interchange rules a new type of data (potentially) on the Web to be interchanged… Hence the ongoing work on Rules at W3C (still at its earlyip): phase!) RuleML , SWRL , cwm, …

2007-02-02 Ivan Herman Some Typical Use Cases

Negotiate eBusiness contracts across platforms: supply vendor-neutral representation of your business rules so that others may find you Describe privacy requirements and policies, and let clients “merge” those (e.g., when paying with a credit card) Medical decision support, combining rules on diagnoses, drug prescription conditions, etc, Extend RDFS (or OWL) with rule-based statements (e.g., the uncle example)

2007-02-02 Ivan Herman What Have We achieved?

2007-02-02 Ivan Herman Remember the integration example?

2007-02-02 Ivan Herman Same With What We Learnt

2007-02-02 Ivan Herman What is Coming?

2007-02-02 Ivan Herman Beyond Rules: Trust

Can I trust a (meta)data on the Web? is the author the one who claims he/she is, can I check his/her can I trust the inference engine? etc. There are issues to solve, e.g., how to “name” a full graph protocols and policies to encode/sign full or partial graphs (b uniqueness) how to “express” trust? (e.g., trust in context) credentials? It is on the “future” stack of W3C and the SW Community …

lank nodes may be a problem to achieve

2007-02-02 Ivan Herman Other Issues…

Improve the inference algorithms and implementations, scalability, reasoning with OWL Full Better modularization (import or refer to part of ontologies) Ontology management on the Web Extensions of RDF and/or OWL (based on experience and theoretical advances) “Sub OWL Lite” ontology level for some applications (RDFS++, OWL Feather, pD*,…) Temporal, spatial, fuzzy, probablistic, etc, reasoning …

2007-02-02 Ivan Herman Available Documents, Tools

2007-02-02 Ivan Herman Available Specifications: Primers, Guides

The “RDF Primer” and the “OWL Guide” give a formal introduction to RDF(S) and OWL SKOS has its separate “SKOS Core Guide” The Semantic Web Activity Homepage has links for all the specifications

2007-02-02 Ivan Herman “Core” Vocabularies

A number of public “core” vocabularies evolve to be used by applications, e.g.: digital right: managementabout information resources, digital libraries, with extensio FOAF DOAP: about people and their organizations MusicBrainz: on the descriptions of software projects SIOC : Semantically-Interlinked: on the description Online of CDs, Communities music tracks, … vCard in RDF … They share the underlying RDF model (provides mechanismsns for for exrights,tensibility, permissions, sharing, …)

2007-02-02 Ivan Herman Some Books

J. Davies, D. Fensel, F. van Harmelen: Towards the Semantic Web (2002) S. Powers: Practical RDF (2003) F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. Patel-Schneider: The Description Logic Handbook (2003) G. Antoniu, F. van Harmelen: Semantic Web Primer (2004) A. Gómez-Pérez, M. Fernández-López, O. Corcho: Ontological Engineering (2004) …

See the separate Wiki page collecting books

2007-02-02 Ivan Herman Further Information

Dave Beckett’s Resources at Bristol University huge list of documents, publications, tools, … Semantic Web Community Portals, e.g.: Semanticweb.org SW Tutorials on XML.com Planet RDF: a blog aggregator on SW topics The Semantic Web Activity has a number of further links

2007-02-02 Ivan Herman SWBP Working Group Documents

Documents for Survey of RDF/ Maps Interoperability “Ontology Driven Architectures in Software Engineering”

2007-02-02 Ivan Herman Further Information (cont)

Description Logic links: online course teaching material and links by Enrico Franconi, “Ontology Development 101” OWL Reasoning Examples by Ian Horrocks Lots of papers at WWW2003, WWW2004, WWW2005, and WWW2006; see also the ISWC200X conference proceedings (online since ISWC2007) and their European and Asian local variants

2007-02-02 Ivan Herman Public Fora and Resources at W3C

Semantic Web Interest Group a forum developers with archived (and public) mailing list, and a constant IRC presence on freenode.net#swig; anybody can sign up on the list

Semantic Web Education and Outreach Interest Group public archives of the Interest Group (although only members can sign up on the list directly)

Semantic Web Deployment Working Group public archives of the Working Group (although only members can sign up on the list directly)

2007-02-02 Ivan Herman Tools

The W3C Wiki page: http://esw.w3.org/topic/SemanticWebTools lists lots of tools we have already seen some also: ontology editors, validators, OWL reasoners, program deve You should go there for further details! Worth noting: major companies offer (or will offer) Semantic Web tools or systems using Semantic Web: Adobe, Oracle, IBM, HP, Software AG, webMethods, Northrop Gruman, Altova,…

lopment tools,…

2007-02-02 Ivan Herman Some Application Examples

2007-02-02 Ivan Herman Semantic Web ≠ an academic research only!

SW has indeed a strong foundation in research results But remember: (1) the Web was born at CERN… (2) …was first picked up by high energy physicists… (3) …then by academia at large… (4) …then by small businesses and start-ups… (5) “big business” came only later! network effect kicked in early… Semantic Web is now at #4, and moving to #5!

2007-02-02 Ivan Herman May start with small communities

The needs of a deployment application area: have serious problem or opportunity have the intellectual interest to pick up new things have motivation to fix the problem its data connects to other application areas have an influence as a showcase for others The high energy physics community played this role for the Web in the 90’s

2007-02-02 Ivan Herman Some RDF deployment areas

Library metadata Defense Life sciences yes, connections among Problem to single-domain yes, serious data genetics, proteomics, clinical solve? integration integration needs trials, regulatory,… yes: funded early Willingness to yes: OCLC push and yes: intellectual level high, DAML (OWL) adopt? Dublin Core initiative(*) much modeling done already. work Motivation strong strong very strong phone calls chemistry, regulatory, medical, Links to other library data records, etc etc

(*) note that the Dublin Core Initiative’s work go way beyond digital libraries these days

2007-02-02 Ivan Herman Some RDF deployment areas (cont)

These are just examples Others are coming to the fore: eGovernment, energy sector (oil industry), financial services,… Health care and life science sector is now very active also at W3C, in the form of an Interest Group

2007-02-02 Ivan Herman The “corporate” landscape is moving

Some of the names of active participants in W3C SW related groups: ILOG, HP, Agfa, SRI International, Fair Isaac Corp., Oracle, Boeing, IBM, Chevron, Siemens, Nokia, Merck, Pfizer, AstraZeneca, Sun, Citigroup,… “Corporate Semantic Web” listed as major technology by Gartner in 2006 The Semantic Technology Conference series also attract lots of participants speakers in 2006: from IBM, Cisco, BellSouth, GE, Walt Disney, not all referring to Semantic Web (eg, RDF, OWL,…) but semantic but they might come around!

Nokia, Oracle, … s in general

2007-02-02 Ivan Herman Applications are not always very complex…

Eg: simple semantic annotations of patients’ data greatly enhances communications among doctors What is needed: some simple ontologies, an RDFa/microformat type editing environment Simple but powerful!

2007-02-02 Ivan Herman Data Integration R&D

Boeing, MITRE Corp., Elsevier, EU Projects like Sculpteur and Artiste, national projects like MuseoSuomi, DartGrid from Zhe Jiang University, … A general question: can I access your (RDF) data directly?

2007-02-02 Ivan Herman Example: antibodies demo

Scenario: find the known antibodies for a protein in a specific species Combine (“scrape”…) three different data sources Use SPARQL as an integration tool (see also demo online)

2007-02-02 Ivan Herman Portals

Vodafone's Live Mobile Portal search application (e.g. ringtone, game, picture) using RDF page views per download decreased 50% ringtone up 20% in 2 months A number of other portal examples: Sun’s White Paper Collections and System Handbook collections; Nokia’s S60 support portal; Harper’s Online magazine linking items via an internal ontology; Oracle’s virtual press room; Opera’s community site, Yahoo! Food… A general question again: can I access your (RDF) data directly?

2007-02-02 Ivan Herman Adobe's XMP

Adobe’s tool to add RDF-based metadata to most of their file formats used for more effective organization supported in Adobe Creative Suite support from 30+ major asset management vendors, with separate Windows Vista The tool is available for all!

XMP conferences; will be used in

2007-02-02 Ivan Herman Improved Search via Ontology: GoPubMed

Improved search on top of pubmed.org search results are ranked using the specialized ontologies extra search terms are generated and terms are highlighted Importance of domain specific ontologies for search improvement

2007-02-02 Ivan Herman Other Application Areas Come to the Fore

Knowledge management Business intelligence Linking virtual communities Management of multimedia data (e.g., video and image depositories) Content adaptation and labeling (e.g., for mobile usage) etc

2007-02-02 Ivan Herman

Thank you for your attention!

These slides are publicly available on:

http://www.w3.org/2007/Talks/0221-Bangalore-IH/

in XHTML and PDF formats; the XHTML version has active links that you can follow

2007-02-02 Ivan Herman