The Semantic Web: an Introduction 2/24/04 2:29 AM
Total Page:16
File Type:pdf, Size:1020Kb
The Semantic Web: An Introduction 2/24/04 2:29 AM The Semantic Web: An Introduction This document is designed as being a simple but comprehensive introductory publication for anybody trying to get into the Semantic Web: from beginners through to long time hackers. Recommended pre-reading: the Semantic Web in Breadth. Table Of Contents 1. What Is The Semantic Web? 2. Simple Data Modelling: Schemata 3. Ontologies, Inferences, and DAML 4. The Power Of Semantic Web Languages 5. Trust and Proof 6. Ambient Information and SEM 7. Evolution 8. Does It Work? What Semantic Web Applications Are There? 9. What Now? Further Reading What Is The Semantic Web? The Semantic Web is a mesh of information linked up in such a way as to be easily processable by machines, on a global scale. You can think of it as being an efficient way of representing data on the World Wide Web, or as a globally linked database. The Semantic Web was thought up by Tim Berners-Lee, inventor of the WWW, URIs, HTTP, and HTML. There is a dedicated team of people at the World Wide Web consortium (W3C) working to improve, extend and standardize the system, and many languages, publications, tools and so on have already been developed. However, Semantic Web technologies are still very much in their infancies, and although the future of the project in general appears to be bright, there seems to be little consensus about the likely direction and characteristics of the early Semantic Web. What's the rationale for such a system? Data that is geneally hidden away in HTML files is often useful in some contexts, but not in others. The problem with the majority of data on the Web that is in this form at the moment is that it is difficult to use on a large scale, because there is no global system for publishing data in such a way as it can be easily processed by anyone. For example, just think of information about local sports events, weather information, plane times, Major League Baseball statistics, and television guides... all of this information is presented by numerous sites, but all in HTML. The problem with that is that, is some contexts, it is difficult to use this data in the ways that one might want to do so. So the Semantic Web can be seen as a huge engineering solution... but it is more than that. We will find that as it becomes easier to publish data in a repurposable form, so more people will want to pubish data, and there will be a knock-on or domino effect. We may find that a large number of Semantic Web applications can be used for a variety of different tasks, increasing the modularity of applications on the Web. But enough subjective reasoning... onto how this will be accomplished. The Semantic Web is generally built on syntaxes which use URIs to represent data, usually in triples based structures: i.e. many triples of URI data that can be held in databases, or interchanged on the http://infomesh.net/2001/swintro/ Page 1 of 17 The Semantic Web: An Introduction 2/24/04 2:29 AM world Wide Web using a set of particular syntaxes developed especially for the task. These syntaxes are called "Resource Description Framework" syntaxes. URI - Uniform Resource Identifier A URI is simply a Web identifier: like the strings starting with "http:" or "ftp:" that you often find on the World Wide Web. Anyone can create a URI, and the ownership of them is clearly delegated, so they form an ideal base technology with which to build a global Web on top of. In fact, the World Wide Web is such a thing: anything that has a URI is considered to be "on the Web". The syntax of URIs is carefully governed by the IETF, who published RFC 2396 as the general URI specification. The W3C maintains a list of URI schemes. RDF - Resource Description Framework A triple can simply be described as three URIs. A language which utilises three URIs in such a way is called RDF: the W3C have developed an XML serialization of RDF, the "Syntax" in the RDF Model and Syntax recommendation. RDF XML is considered to be the standard interchange format for RDF on the Semantic Web, although it is not the only format. For example, Notation3 (which we shall be going through later on in this article) is an excellent plain text alternative serialization. Once information is in RDF form, it becomes easy to process it, since RDF is a generic format, which already has many parsers. XML RDF is quite a verbose specification, and it can take some getting used to (for example, to learn XML RDF properly, you need to understand a little about XML and namespaces beforehand...), but let's take a quick look at an example of XML RDF right now:- <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:foaf="http://xmlns.com/0.1/foaf/" > <rdf:Description rdf:about=""> <dc:creator rdf:parseType="Resource"> <foaf:name>Sean B. Palmer</foaf:name> </dc:creator> <dc:title>The Semantic Web: An Introduction</dc:title> </rdf:Description> </rdf:RDF> This piece of RDF basically says that this article has the title "The Semantic Web: An Introduction", and was written by someone whose name is "Sean B. Palmer". Here are the triples that this RDF produces:- <> <http://purl.org/dc/elements/1.1/creator> _:x0 . this <http://purl.org/dc/elements/1.1/title> "The Semantic Web: An Introduction" . _:x0 <http://xmlns.com/0.1/foaf/name> "Sean B. Palmer" . This format is actually a plain text serialization of RDF called "Notation3", which we shall be covering later on. Note that some people actually prefer using XML RDF to Notation3, but it is generally accepted that Notation3 is easier to use, and is of course convertable to XML RDF anyway. Why RDF? When people are confronted with XML RDF for the first time, they usually have two questions: "why use RDF rather than XML?", and "do we use XML Schema in conjunction with RDF?". http://infomesh.net/2001/swintro/ Page 2 of 17 The Semantic Web: An Introduction 2/24/04 2:29 AM The answer to "why use RDF rather than XML?" is quite simple, and is twofold. Firstly, the benefit that one gets from drafting a language in RDF is that the information maps directly and unambiguously to a model, a model which is decentralized, and for which there are many generic parsers already available. This means that when you have an RDF application, you know which bits of data are the semantics of the application, and which bits are just syntactic fluff. And not only do you know that, everyone knows that, often implicitly without even reading a specification because RDF is so well known. The second part of the twofold answer is that we hope that RDF data will become a part of the Semantic Web, so the benefits of drafting your data in RDF now draws parallels with drafting your information in HTML in the early days of the Web. The answer to "do we use XML Schema in conjunction with RDF?" is almost as brief. XML Schema is a language for restricting the syntax of XML applications. RDF already has a built in BNF that sets out how the language is to be used, so on the face of it the answer is a solid "no". However, using XML Schema in conjunction with RDF may be useful for creating datatypes and so on. Therefore the answer is "possibly", with a caveat that it is not really used to control the syntax of RDF. This is a common misunderstanding, perpetuated for too long now. Screen Scraping, and Forms For the Semantic Web to reach its full potential, many people need to start publishing data as RDF. Where is this information going to come from? A lot of it can be derived from many data publications that exist today, using a process called "screen scraping". Screen scraping is the act of literally getting the data from a source into a more manageable form (i.e. RDF) using whatever means come to hand. Two useful tools for screen scraping are XSLT (an XML transformations language), and RegExps (in Perl, Python, and so on). However, screen scraping is often a tedious solution, so another way to approach it is to build proper RDF systems that take input from the user and then store it straight away in RDF. Data such as you may enter when signing up for a new mail account, buying some CDs online, or searching for a used car can all be stored as RDF and then used on the Semantic Web. Notation3: RDF Made Easy As you will have seen above, XML RDF can be rather difficult, but thankfully, there is are simpler teaching forms of RDF. One of these is called "Notation3", and was developed by Tim Berners-Lee. There is some documentation covering N3, including a specification, and an excellent Primer. The design criteria behind Notation3 were fairly simple: design a simple easy to learn scribblable RDF format, that is easy to parse and build larger applications on top of. In Notation3, we can simply write out the URIs in a triple, delimiting them with a "<" and ">" symbols. For example, here's a simple triple consisting of three URI triples:- <http://xyz.org/#a> <http://xyz.org/#b> <http://xyz.org/#c> .