Knowledge Representation Formalism for Building Semantic Web Ontologies
by Basak Taylan

A survey submitted to the Graduate Faculty in Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy, The City University of New York, 2017.

Contents

1 INTRODUCTION
2 SEMANTIC WEB
3 KNOWLEDGE REPRESENTATION
  3.1 KNOWLEDGE
  3.2 KNOWLEDGE REPRESENTATION FORMALISM
    3.2.1 Roles of Knowledge Representation
  3.3 KNOWLEDGE REPRESENTATION METHODS
    3.3.1 NETWORKED REPRESENTATION
      3.3.1.1 SEMANTIC NETWORKS
      3.3.1.2 CONCEPTUAL GRAPHS
    3.3.2 STRUCTURAL REPRESENTATION
      3.3.2.1 FRAMES
      3.3.2.2 KL-ONE KNOWLEDGE REPRESENTATION SYSTEMS
    3.3.3 LOGIC-BASED REPRESENTATION
      3.3.3.1 PROPOSITIONAL LOGIC (PL)
      3.3.3.2 FIRST-ORDER LOGIC (FOL)
      3.3.3.3 DESCRIPTION LOGICS
      3.3.3.4 FRAME LOGIC (F-LOGIC)
4 APPLICATIONS
  4.1 The Open Mind Common Sense Project (OMCS)
  4.2 ConceptNet 3: a Flexible, Multilingual Semantic Network for Common Sense Knowledge
  4.3 WordNet
  4.4 FrameNet
  4.5 VerbNet
  4.6 The Brandeis Semantic Ontology (BSO)
5 CONCLUSION

Abstract

When Tim Berners-Lee designed the World Wide Web in the late 1980s, his initial idea was to create a commonly accessible area on the network for sharing information by means of hyperlinks, without concern for platform dependency [28]. Since then, the Web has grown dramatically and has become a major means of information retrieval; by 2014, the number of websites had passed one billion. Searching for a particular piece of information within these globally connected pages is like looking for a black cat in a coal cellar, and manual search becomes more and more difficult as the number of web pages increases. This difficulty led to the addition of another layer, "the meaning," on top of the current Web. This additional layer, known as the Semantic Web, adds machine readability to web pages that were designed for human consumption. With machine-processable data, information becomes suitable for both machine and human consumption, can be accessed faster, and yields more accurate search results. In addition, by supporting inference over Web data, the Semantic Web can combine pieces of an answer that are scattered across different web pages: instead of returning a list of pages that each contain part of the answer, it can return the answer itself, assembled from multiple web resources.

Knowledge representation and ontology construction play a crucial role in establishing such a layer on top of the current Web. In Chapter 1, we present a literature review of Web history and the evolution of the Web since its invention. In Chapter 2, we introduce the structure of the Semantic Web. In Chapter 3, we present the major knowledge representation formalisms and methods that have influenced the construction of ontologies for the Semantic Web. In Chapter 4, we introduce some of the applications that are used to build ontologies. In Chapter 5, we conclude.

Chapter 1
INTRODUCTION

The World Wide Web, "the embodiment of human knowledge" [1], was first proposed in 1989 by Tim Berners-Lee at the European Organization for Nuclear Research (CERN) in Geneva. The idea behind creating the Web was to provide an area on a computer that other people could access.
After his proposal, Berners-Lee wrote the first browser and the first Web server, both running on NeXT computers [78, 18, 16, 20]. After the invention of the platform-independent "line mode" browser [17], developed by Nicola Pellow in 1991, the Web evolved rapidly. The Web's evolution consists of three phases: Web 1.0, Web 2.0, and Web 3.0 [30].

Web 1.0, the Web of documents, covers the development of the World Wide Web between 1989 and 2005. This first generation of the Web consisted of static pages, where information was accessed in "read-only" mode. Users had limited interaction with the pages, so communication was unidirectional. Web 1.0 includes the core web protocols and formats: HTML, HTTP, and URI [32, 30, 66, 95].

Web 2.0, the second generation of the Web (a.k.a. "the wisdom web, people-centric web, participative web, or read-write web"), emerged from a brainstorming session between O'Reilly and MediaLive International at a conference. It allows users to become content creators through participation, collaboration, and information sharing on the web. Since users can both read from and write to web pages, communication is bidirectional. Web 2.0 differs from Web 1.0 in technological aspects (Ajax, JavaScript, Cascading Style Sheets (CSS), the Document Object Model (DOM), Extensible HTML (XHTML), XSL Transformations (XSLT)/XML, and Adobe Flash), structural aspects (page layout), and sociological aspects (the notion of friends and groups) [32, 79, 30, 66, 95]. YouTube, Flickr, personal blogs, search engines, Wikipedia, Voice over IP, chat applications, and instant messaging are examples of Web 2.0 applications.

The World Wide Web has become an irreplaceable means of accessing information. According to [64], the indexed Web contained at least 4.6 billion pages as of 18 August 2017, and it keeps growing even as this survey is being written. In this rapidly growing environment, accessing the correct information within an acceptable time limit is a challenge that every Internet user experiences. Technological developments have changed the way we seek information, and most of us have become dependent on search engines. The difficulty of finding a small piece of information in such a large environment is like looking for a black cat in a coal cellar in under a second. Despite the many advanced search algorithms they employ, search engines still return completely irrelevant results alongside the correct answers. One reason for this undesired situation is that textual and graphical resources on the Internet are designed mostly for human consumption [9]. In addition, query results are independent web pages; if we are looking for information spread over multiple pages, current web technology falls short of satisfying our needs [4].

Web 3.0, the third generation of the Web (a.k.a. "Semantic Web, executable Web, or read-write-execute Web"), is an extension of Web 2.0 that aims to add semantics to the Web by enabling machine-processable documents [30, 66, 95, 19, 56]. The Semantic Web can be considered a globally linked database in which information is suitable for both human and machine consumption. Google Squared, Hakia, Wolfram Alpha, IBM Watson, the browser plugin Zemanta, Facebook's "Like" button, and the e-commerce travel service TripIt are some of the applications built on Semantic Web technologies.
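To make "machine-processable documents" concrete, the following minimal sketch (in Python, using the open-source rdflib library; the example.org URIs and the facts are invented purely for illustration) encodes a few statements as RDF triples, the basic data model of the Semantic Web, and serializes them in Turtle, a notation readable by both humans and machines:

    # Minimal sketch of machine-processable data as RDF triples,
    # using the rdflib library (pip install rdflib). The example.org
    # URIs and the facts below are illustrative assumptions only.
    from rdflib import Graph, Literal, Namespace, RDF, URIRef
    from rdflib.namespace import FOAF

    EX = Namespace("http://example.org/")  # hypothetical vocabulary

    g = Graph()
    g.bind("foaf", FOAF)
    g.bind("ex", EX)

    tim = URIRef("http://example.org/TimBernersLee")

    # Each fact is one (subject, predicate, object) triple.
    g.add((tim, RDF.type, FOAF.Person))
    g.add((tim, FOAF.name, Literal("Tim Berners-Lee")))
    g.add((tim, EX.invented, EX.WorldWideWeb))

    # Turtle output is readable by humans and processable by machines.
    print(g.serialize(format="turtle"))

Unlike an HTML page, which only specifies how such facts should be displayed, a graph of this kind can be queried, merged with other graphs, and reasoned over by software.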
Figures 1.1 and 1.2 summarize the differences among Web 1.0, Web 2.0, and Web 3.0.

Web 4.0 (a.k.a. the read-write-concurrency web, or the symbiotic web) is the future generation of the Web. It is still at the idea stage. It aims to create symbiotic human-machine interaction; with Web 4.0, it will be possible to build more powerful interfaces, such as mind-controlled interfaces [30, 95, 47].

Figure 1.1: Comparison of Web 1.0, Web 2.0, and Web 3.0 [30]

Figure 1.2: Comparison of Web 1.0, Web 2.0, and Web 3.0 [95]

Chapter 2
SEMANTIC WEB

The World Wide Web was initially designed to create a universal environment for document sharing. Over the years, the main purpose of the Internet has shifted from document sharing to information retrieval, and search engines have become an irreplaceable part of our daily life. As a result, the information presented on web pages has mainly been designed for ease of human consumption. However, accessing the correct information in such a rapidly growing environment within a reasonable amount of time is getting harder and harder; looking for information on the Web is like trying to find a needle in a haystack.

The Semantic Web was first introduced by Tim Berners-Lee in 2001. As Berners-Lee stated in [19], the Semantic Web is not a separate Web but an extension of the current one. The Semantic Web (a.k.a. Web 3.0) is designed for both human and machine consumption; in other words, it aims to apply a machine-processable layer on top of the human-readable one. Although HTML tags are used to create web pages on the current Web, those tags do not carry any information about the structure of the content; they only control its presentation [80]. This makes current keyword-based search engines sensitive to vocabulary: documents that use terminology different from the keywords often do not appear in the search results. In addition, search results contain not only relevant answers but also mildly relevant or completely irrelevant documents, so the ratio of correct information to total results becomes very small. Furthermore, current search engines are not capable of answering a question; they return the locations of individual documents that contain the keywords [4]. With a Semantic Web layer, search engines will not only return the locations of documents but will also be able to perform question answering.
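As a sketch of what such question answering could look like at the data level (again in Python with rdflib; the ex: vocabulary and the facts are invented for this illustration, not taken from any real ontology), the following example merges two statements that could have been published on different pages and then answers a question with a single SPARQL query, returning the answer itself rather than a list of documents:

    # Illustrative sketch only: the ex: properties and facts are invented
    # to show how an answer can be assembled from statements that might
    # originate on different pages.
    from rdflib import Graph

    g = Graph()

    # A fact that might come from page A:
    g.parse(data="""
        @prefix ex: <http://example.org/> .
        ex:WorldWideWeb ex:inventedBy ex:TimBernersLee .
    """, format="turtle")

    # A fact that might come from page B:
    g.parse(data="""
        @prefix ex: <http://example.org/> .
        ex:TimBernersLee ex:workedAt ex:CERN .
    """, format="turtle")

    # "Where did the inventor of the World Wide Web work?"
    query = """
        PREFIX ex: <http://example.org/>
        SELECT ?place WHERE {
            ex:WorldWideWeb ex:inventedBy ?inventor .
            ?inventor ex:workedAt ?place .
        }
    """
    for row in g.query(query):
        print(row.place)  # -> http://example.org/CERN

A keyword-based engine could only point to the two source pages separately; here the graph pattern joins the two statements and returns the combined answer directly, which is the behavior the abstract describes.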