The Semantic Web

OIL: An Infrastructure for the Semantic Web

Dieter Fensel and Frank van Harmelen, Vrije Universiteit, Amsterdam , University of Manchester, UK Deborah L. McGuinness, Stanford University Peter F. Patel-Schneider, Bell Laboratories

Ontologies play a major role in supporting information exchange across various networks. A prerequisite for such a role is the development of a joint standard for specifying and exchanging ontologies. The authors present OIL, a proposal for such a standard.

Researchers in artificial intelligence first developed ontologies to facilitate knowledge sharing and reuse. Since the beginning of the 1990s, ontologies have become a popular research topic, and several AI research communities (including knowledge engineering, natural language processing, and knowledge representation) have investigated them. More recently, the notion of an ontology has become widespread in fields such as intelligent information integration, cooperative information systems, information retrieval, electronic commerce, and knowledge management. Ontologies are becoming popular largely because of what they promise: a shared and common understanding that reaches across people and application systems.

Currently, ontologies applied to the World Wide Web are creating the Semantic Web.1 Originally, the Web grew mainly around HTML, which provides a standard for structuring documents that browsers can translate in a canonical way to render those documents. On the one hand, HTML's simplicity helped spur the Web's fast growth; on the other, its simplicity seriously hampered more advanced Web applications in many domains and for many tasks. This led to XML (see Figure 1), which lets developers define arbitrary domain- and task-specific extensions (even HTML appears as an XML application, XHTML).

XML is basically a defined way to provide a serialized syntax for tree structures. It is an important first step toward building a Semantic Web, where application programs have direct access to data semantics. The resource description framework2 has taken an important additional step by defining a syntactical convention and a simple data model for representing machine-processable data semantics. RDF is a standard for Web metadata developed by the World Wide Web Consortium (www.w3c.org/rdf), and it defines a data model based on triples: object, property, and value. The RDF Schema3 takes a step further into a richer representation formalism and introduces basic ontological modeling primitives into the Web. With RDFS, we can talk about classes, subclasses, subproperties, domain and range restrictions of properties, and so forth in a Web-based context.

We took RDFS as a starting point and enriched it into a full-fledged Web-based ontology language called OIL.4 We included these aspects:

• A more intuitive choice of some of the modeling primitives and richer ways to define concepts and attributes.
• The definition of a formal semantics for OIL.
• The development of customized editors and inference engines to work with OIL.
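
The triple data model and the RDFS primitives mentioned above (subclasses, domain and range restrictions) can be illustrated with a minimal Python sketch. It is not an RDF library and not OIL: the `Triple` type, the `infer` function, the plain-string vocabulary, and the example facts are all hypothetical names chosen for illustration, and the four inference rules are a simplified reading of what subclass, domain, and range statements are usually taken to imply.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    """One statement in the (object, property, value) form described in the text."""
    obj: str
    prop: str
    value: str


# Hypothetical plain-string vocabulary; real RDF/RDFS identifies these terms by URI.
SUBCLASS_OF = "subClassOf"
TYPE = "type"
DOMAIN = "domain"
RANGE = "range"


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Return the input triples plus everything the four rules below derive."""
    closed = set(triples)
    while True:
        new = set()
        for t in closed:
            for u in closed:
                # Subclass transitivity: A sub B and B sub C give A sub C.
                if t.prop == SUBCLASS_OF and u.prop == SUBCLASS_OF and t.value == u.obj:
                    new.add(Triple(t.obj, SUBCLASS_OF, u.value))
                # Type inheritance: x type A and A sub B give x type B.
                if t.prop == TYPE and u.prop == SUBCLASS_OF and t.value == u.obj:
                    new.add(Triple(t.obj, TYPE, u.value))
                # Domain restriction: p domain C and x p y give x type C.
                if t.prop == DOMAIN and u.prop == t.obj:
                    new.add(Triple(u.obj, TYPE, t.value))
                # Range restriction: p range C and x p y give y type C.
                if t.prop == RANGE and u.prop == t.obj:
                    new.add(Triple(u.value, TYPE, t.value))
        if new <= closed:  # nothing new was derived, so the closure is complete
            return closed
        closed |= new


# Made-up example facts, for illustration only.
facts = {
    Triple("Herbivore", SUBCLASS_OF, "Animal"),
    Triple("Giraffe", SUBCLASS_OF, "Herbivore"),
    Triple("eats", DOMAIN, "Animal"),
    Triple("eats", RANGE, "Food"),
    Triple("geoffrey", TYPE, "Giraffe"),
    Triple("leo", "eats", "steak"),
}
closure = infer(facts)
```

With these facts, the closure also contains statements that were never written down explicitly, such as `geoffrey` being an `Animal` (through the subclass chain) and `steak` being a `Food` (through the range of `eats`). Repeating the rules until nothing changes keeps the sketch short; an actual inference engine would index triples rather than compare every pair.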

1094-7167/01/$10.00 © 2001 IEEE, IEEE INTELLIGENT SYSTEMS

Figure 1. The layer language model for the Web. Rows in the figure, top to bottom: DAML-O and OIL; RDFS; XHTML and RDF; HTML and XML.

Ontologies: A revolution for information access and integration

Many definitions of ontologies have surfaced in the last decade, but the one that in our opinion best characterizes an ontology's essence is this: "An ontology is a formal, explicit specification of a shared conceptualization."5 In this context, conceptualization refers to an abstract model of some phenomenon in the world that identifies that phenomenon's relevant concepts. Explicit means that the type of concepts used and the constraints on their use are explicitly defined, and formal means that the ontology should be machine-understandable. Different degrees of formality are possible. Large ontologies such as WordNet (www.cogsci.princeton.edu/~wn) provide a thesaurus for over 100,000 terms explained in natural language. On the other end of the spectrum is Cyc (www.cyc.com), which provides formal axiomatizing theories for many aspects of commonsense knowledge. Shared reflects the notion that an ontology captures consensual knowledge: it is not restricted to some individual but is accepted by a group.

The three main application areas of ontology technology are knowledge management, Web commerce, and electronic business.

Knowledge management

KM is concerned with acquiring, maintaining, and accessing an organization's knowledge. Its purpose is to exploit an organization's intellectual assets for greater productivity, new value, and increased competitiveness. Owing to globalization and the Internet's impact, many organizations are increasingly geographically dispersed and organized around virtual teams. With the large number of online documents, several document management systems have entered the market. However, these systems have weaknesses:

tion spread over different sources.
• Maintaining: Sustaining weakly structured text sources is difficult and time-consuming when such sources become large. Keeping such collections consistent, correct, and up to date requires a mechanized representation of semantics and constraints that can help detect anomalies.
• Automatic document generation: Adaptive Web sites that enable dynamic reconfiguration according to user profiles or other relevant aspects could prove very useful. The generation of semistructured information presentations from semistructured data requires a machine-accessible representation of the semantics of these information sources.

Using ontologies, semantic annotations will allow structural and semantic definitions of documents. These annotations could provide completely new possibilities: intelligent search instead of keyword matching, query answering instead of information retrieval, document exchange between departments through ontology mappings, and definitions of views on documents.

Web commerce

E-commerce is an important and growing business area for two reasons. First, e-commerce extends existing business models: it reduces costs, extends existing distribution channels, and might even introduce new distribution possibilities. Second, it enables completely new business models and gives them a much greater importance than they had before. What has up to now been a peripheral aspect of a business field can suddenly receive its own important revenue flow.

Examples of business field extensions are online stores; examples of new business fields are shopping agents and online marketplaces and auction h

tomer as an instant market overview. Their functionality is provided through wrappers, which use keyword search to find product information together with assumptions on regularities in presentation format and text extraction heuristics. This technology has two severe limitations:

• Effort: Writing a wrapper for each online store is time-consuming, and changes in store presentation or organization increase maintenance.
• Quality: The extracted product information is limited (it contains mostly price information), error-prone, and incomplete. For example, a wrapper might extract the product price, but it usually misses indirect costs such as shipping.

Most product information is provided in natural language; automatic text recognition is still a research area with significant unsolved problems. However, the situation will drastically change in the near future when standard representation formalisms for data structure and semantics are available. Software agents will then understand product information. Meta online stores will grow with little effort, which will enable complete market transparency in the various dimensions of the diverse product properties. Ontology mappings, which translate different