Semantic Web Search Using Natural Language

Total Page:16

File Type:pdf, Size:1020Kb

Semantic Web Search Using Natural Language University of West Bohemia Faculty of Applied Sciences Semantic Web Search Using Natural Language Ivan Habernal Doctoral Thesis Submitted in partial fulllment of the requirements for a degree of Doctor of Philosophy in Computer Science and Engineering Supervisor: Professor Václav Matou²ek Department of Computer Science and Engineering Plze¬, 2012 Západo£eská univerzita v Plzni Fakulta aplikovaných v¥d Vyhledávání v Sémantickém webu pouºitím p°irozeného jazyka Ing. Ivan Habernal diserta£ní práce k získání akademického titulu doktor v oboru Informatika a výpo£etní technika kolitel: Prof. Ing. Václav Matou²ek, CSc. Katedra informatiky a výpo£etní techniky Plze¬ 2012 Prohl´aˇsen´ı Pˇredkl´ad´amt´ımto k posouzen´ıa obhajobˇedisertaˇcn´ıpr´aci zpracovanou na z´avˇer doktorsk´eho studia na Fakultˇeaplikovan´ych vˇedZ´apadoˇcesk´euniver- zity v Plzni. Prohlaˇsuji t´ımto, ze tuto pr´acijsem vypracoval samostatnˇe, s pouˇzit´ım od- born´eliteratury a dostupn´ych pramen˚uuveden´ych v seznamu, jenˇzje sou- ˇc´ast´ıt´eto pr´ace. V Plzni dne 31. bˇrezna 2012 Ing. Ivan Habernal i Abstract This thesis presents a complete end-to-end system for the Semantic Web search using a Natural Language. The system is placed into the context of recent research in Information Retreival, Semantic Web, Natural Language Understanding, and Natural Language Interfaces. The key feature of Natu- ral Language Interfaces is that users can search for the required information by posing their questions using natural language instead of e.g. filling web forms. The developed system uses the Semantic Web technologies in both tradi- tional and new forms. The idea of the Semantic Web has brought many interesting concepts into domain modeling and data sharing. Furthermore, the development in Natural Language Interfaces to Semantic Web has shown that bridging the gap between the Semantic Web and Natural Language In- terfaces can uncover new research challenges. The main contributions of this thesis are as follows. First, a unique for- malism for capturing a natural language question semantics, based upon Semantic Web standards, was proposed. Second, the statistical model for the semantic analysis based upon supervised training was developed. Third, the evaluation of the fully functional end-to-end system with a real data and real queries was conducted. The system was tested in the accommodation domain using real data ac- quired from the Web as well as the corpus of real queries in natural lan- guage. The thesis deals with both theoretical and practical issues that must be solved in a fully functional system. A complete work-flow is described, including preparation of data, natural language corpus, ontology design, annotation, semantic model and search. Finally, a very detailed evaluation with promising results is presented and discussed. Special attention is also paid to open issues, such as a perfor- mance, a usability, a portability, or sources of a real Web data. ii Abstrakt Disertaˇcn´ıpr´acepopisuje kompletn´ısyst´em pro vyhled´av´an´ıv s´emantick´em webu pouˇzit´ım pˇrirozen´eho jazyka. Syst´emje pˇredstaven v kontextu v´yzku- mu na poli Information Retrieval, s´emantick´eho webu, porozumˇen´ıpˇriro- zen´emu jazyku a rozhran´ı vyuˇz´ıvaj´ıc´ıch pˇrirozen´yjazyk. Hlavn´ı v´yhodou rozhran´ıvyuˇz´ıvaj´ıc´ıch pˇrirozen´yjazyk je moˇznost zadat ot´azku celou vˇetou nam´ısto vyplˇnov´an´ıwebov´ych formul´aˇr˚unebo pouˇzit´ıpouze kl´ıˇcov´ych slov. Vyvinut´ysyst´emvyuˇz´ıv´atechnologie s´emantick´eho webu tradiˇcn´ımi i no- v´ymi zp˚usoby. Myˇslenka s´emantick´eho webu vnesla mnoho zaj´ımav´ych kon- cept˚udo modelov´an´ıdom´ena sd´ılen´ıdat napˇr´ıˇcdom´enami. Nav´ıckombi- nace s´emantick´eho webu a rozhran´ıvyuˇz´ıvaj´ıch pˇrirozen´yjazyk sk´yt´anov´e moˇznosti pro vylepˇsen´ıuˇzivatelsk´eho komfortu pˇri vyhled´av´an´ı. Disertaˇcn´ıpr´acem´atyto hlavn´ıpˇr´ınosy. Zaprv´e:byl navrˇzen nov´yformalis- mum pro zachycen´ıs´emantiky ot´azky v pˇrirozen´emjazyce. Tento formalis- mum vyuˇz´ıv´atechnologi´ıs´emantick´eho webu. Zadruh´e:byl vyvinut statis- tick´ymodel pro s´emantickou anal´yzu zaloˇzen´yna strojov´emuˇcen´ı. Zatˇret´ı: syst´embyl otestov´anna re´aln´ych datech a re´aln´ych ot´azk´ach. Syst´embyl testov´anna dom´enˇepro vyhled´av´an´ıubytov´an´ı. Data byla z´ısk´a- na z re´aln´ych webov´ych port´al˚ustejnˇejako testovac´ıot´azky v pˇrirozen´em jazyce. Pr´ace se zab´yv´ateoretick´ymi i praktick´ymi probl´emy, kter´emus´ıb´yt ve funkˇcn´ım syst´emu vyˇreˇseny. Je pops´ancel´ypostup z´ısk´an´ıdat, korpus ot´azek, n´avrh ontologi´ı, anotace, s´emantick´aanal´yza a vyhled´av´an´ı. Na z´avˇer je provedeno velmi d˚ukladn´evyhodnocen´ıfunkˇcnosti syst´emu. Po- zornost je tak´ezamˇeˇrena na otevˇren´eprobl´emy, napˇr. v´ykon, pouˇzitelnost, pˇrenositelnost na jinou dom´enu a jazyk a zdroje webov´ych dat. iii Contents I Introduction 1 1 Motivation and Introduction 2 1.1 Thesis Outline . .3 1.2 Discussion of the Terminology Used . .4 II State of the Art 7 2 Natural Language Interfaces to Structured Data 8 2.1 Natural Language Interfaces to Databases . .9 2.2 Semantic Web and Ontologies . .9 2.3 Architecture of NLISW . 11 2.3.1 Knowledge Base and Ontologies . 11 2.3.2 Question Understanding . 12 2.3.3 Knowledge Base Querying and Answer Representation 14 2.4 Evaluation Metrics and Testing Datasets . 14 2.5 Overview of Existing Systems . 15 2.5.1 PowerAqua and AquaLog . 15 2.5.2 ORAKEL . 16 2.5.3 FREyA . 16 2.5.4 PANTO . 17 2.5.5 QACID . 17 2.5.6 QUETAL QA . 18 2.5.7 Other related work . 19 iv 3 Statistical Methods for Natural Language Understanding 20 3.1 Introduction to NLU . 21 3.1.1 Basic Approaches . 21 3.1.2 Semantic Representation . 22 3.2 Sequential models . 22 3.2.1 Hidden understanding model . 22 3.2.2 Flat-concept parsing . 24 3.2.3 Conditional Random Fields . 25 3.3 Stochastic Semantic Parsing . 26 3.3.1 Preprocessing . 26 3.3.2 Probabilistic semantic grammars . 26 3.3.3 Vector-state Markov model . 28 3.3.4 Hidden Vector-state Markov model . 30 3.3.5 Context-based Stochastic Parser . 32 3.4 Other approaches to semantic parsing . 33 3.5 Evaluation of NLU Systems . 34 3.5.1 Exact match . 34 3.5.2 PARSEVAL . 34 3.5.3 Tree edit distance . 35 3.6 Existing corpora . 35 3.6.1 ATIS . 35 3.6.2 DARPA . 36 3.6.3 Other corpora . 36 3.7 Existing systems . 37 3.7.1 HVS Parser . 37 3.7.2 Scissor . 37 3.7.3 Wasp . 37 3.7.4 Krisp . 38 3.7.5 Other systems . 38 III Natural Language-Based Semantic Web Search Sys- v tem 39 4 Target Domain 40 4.1 Domain Requirements . 40 4.2 Natural Language Question Corpus . 42 4.2.1 NL Question Corpus Statistics . 43 4.2.2 Question Examples . 44 4.3 Knowledge Base and Domain Ontology . 45 4.3.1 Pattern-based Web Information Extraction . 45 4.3.2 Domain Ontology . 46 4.3.3 Qualitative Analysis of the Knowledge Base . 49 4.3.4 Semantic Inconsistencies Causing Practical Issues . 52 4.4 Test Data Preparation . 53 4.4.1 Search Results . 54 4.4.2 Semantic Interpretation . 54 4.4.3 Assigning the Correct Results . 54 5 Semantic Annotation of NL Questions 57 5.1 Describing NL Question Semantics . 57 5.1.1 Domain Ontology versus NL Question Ontology . 58 5.1.2 Two-Layer Annotation Approach . 58 5.1.3 Ontology of NL Question . 58 5.2 NL Question Corpus Annotation . 62 6 Semantic Analysis of Natural Language Questions 64 6.1 Semantic Analysis Model Overview . 64 6.2 Formal Definition of Semantic Annotation . 65 6.3 Statistical model . 66 6.4 Named Entity Recognition . 67 6.4.1 Maximum Entropy NER . 68 6.4.2 LINGVOParser . 68 6.4.3 String Similarity Matching . 68 vi 6.4.4 OntologyNER . 68 7 Semantic Interpretation 69 7.1 Transformation of Semantic Annotation into Query Language 69 7.1.1 Semantic Interpretation of Named Entities . 70 7.2 Practical Issues of Semantic Interpretation . 71 7.3 Semantic Reasoning and Result Representation . 72 8 Evaluation 74 8.1 Semantic Analysis Evaluation . 74 8.1.1 Evaluation of NER . 74 8.1.2 Evaluation of the Semantic Analysis Model . 76 8.2 Matching Named Entities to KB Instances . 78 8.3 End-to-end Performance Evaluation . 79 8.3.1 Fulltext Search . 79 8.3.2 Simulation of End-to-end Performance With Correct Semantic Annotation . 80 8.3.3 The end-to-end results of SWSNL system . 80 8.4 Evaluation on Other Domains and Languages . 81 8.4.1 ConnectionsCZ Corpus . 81 8.4.2 ATIS Corpus Subset . 82 9 Conclusion 85 9.1 Open Issues . 85 9.1.1 Performance Issues . 85 9.1.2 Deployment and Development . 86 9.1.3 Problems Caused by Real Web Data . 86 9.1.4 Research-related Issues . 86 9.1.5 Use in the Business Sector . 87 9.2 Future Work . 87 9.3 Final Conclusion . 88 9.3.1 Major Contributions . 88 9.3.2 Review of Aims of Ph.D. thesis . 88 vii A Hybrid Semantic Analysis System { ATIS Data Evaluation 90 A.1 Introduction . 91 A.2 Related Work . 91 A.3 Semantic Representation . 92 A.4 Data................................ 93 A.4.1 LINGVOSemantics corpus . 93 A.4.2 ATIS corpus . 93 A.5 System Description . 94 A.5.1 Lexical Class Identification .
Recommended publications
  • A Data-Driven Framework for Assisting Geo-Ontology Engineering Using a Discrepancy Index
    University of California Santa Barbara A Data-Driven Framework for Assisting Geo-Ontology Engineering Using a Discrepancy Index A Thesis submitted in partial satisfaction of the requirements for the degree Master of Arts in Geography by Bo Yan Committee in charge: Professor Krzysztof Janowicz, Chair Professor Werner Kuhn Professor Emerita Helen Couclelis June 2016 The Thesis of Bo Yan is approved. Professor Werner Kuhn Professor Emerita Helen Couclelis Professor Krzysztof Janowicz, Committee Chair May 2016 A Data-Driven Framework for Assisting Geo-Ontology Engineering Using a Discrepancy Index Copyright c 2016 by Bo Yan iii Acknowledgements I would like to thank the members of my committee for their guidance and patience in the face of obstacles over the course of my research. I would like to thank my advisor, Krzysztof Janowicz, for his invaluable input on my work. Without his help and encour- agement, I would not have been able to find the light at the end of the tunnel during the last stage of the work. Because he provided insight that helped me think out of the box. There is no better advisor. I would like to thank Yingjie Hu who has offered me numer- ous feedback, suggestions and inspirations on my thesis topic. I would like to thank all my other intelligent colleagues in the STKO lab and the Geography Department { those who have moved on and started anew, those who are still in the quagmire, and those who have just begun { for their support and friendship. Last, but most importantly, I would like to thank my parents for their unconditional love.
    [Show full text]
  • Semi-Automated Ontology Based Question Answering System Open Access
    Journal of Advanced Research in Computing and Applications 17, Issue 1 (2019) 1-5 Journal of Advanced Research in Computing and Applications Journal homepage: www.akademiabaru.com/arca.html ISSN: 2462-1927 Open Semi-automated Ontology based Question Answering System Access 1, Khairul Nurmazianna Ismail 1 Faculty of Computer Science, Universiti Teknologi MARA Melaka, Malaysia ABSTRACT Question answering system enable users to retrieve exact answer for questions submit using natural language. The demand of this system increases since it able to deliver precise answer instead of list of links. This study proposes ontology-based question answering system. Research consist explanation of question answering system architecture. This question answering system used semi-automatic ontology development (Ontology Learning) approach to develop its ontology. Keywords: Question answering system; knowledge Copyright © 2019 PENERBIT AKADEMIA BARU - All rights reserved base; ontology; online learning 1. Introduction In the era of WWW, people need to search and get information fast online or offline which make people rely on Question Answering System (QAS). One of the common and traditionally use QAS is Frequently Ask Question site which contains common and straight forward answer. Currently, QAS have emerge as powerful platform using various techniques such as information retrieval, Knowledge Base, Natural Language Processing and Hybrid Based which enable user to retrieve exact answer for questions posed in natural language using either pre-structured database or a collection of natural language documents [1,4]. The demand of QAS increases day by day since it delivers short, precise and question-specific answer [10]. QAS with using knowledge base paradigm are better in Restricted- domain QA system since it ability to focus [12].
    [Show full text]
  • EVERYTHING YOU NEED to KNOW ABOUT SEMANTIC SEO by GENNARO CUOFANO, ANDREA VOLPINI, and MARIA SILVIA SANNA the Semantic Web Is Here
    WHITE PAPER EVERYTHING YOU NEED TO KNOW ABOUT SEMANTIC SEO BY GENNARO CUOFANO, ANDREA VOLPINI, AND MARIA SILVIA SANNA The Semantic Web is here. Those that are taking advantage of Semantic Technologies to build a Semantic SEO strategy are benefiting from staggering results. From a research paper put together with the team atWordLift , presented at SEMANTiCS 2017, we documented that structured data is compelling from the digital marketing standpoint. For instance, on the analysis of the design-focused website freeyork.org, after three months of using structured data in their WordPress website we saw the following improvements: • +12.13% new users • +18.47% increase in organic traffic • +2.4 times increase in page views • +13.75% of sessions duration In other words, many still think of Semantic Technologies belonging to the future, when in reality quite a few players in the digital marketing space are taking advantage of them already. Semantic SEO is a new and powerful way to make your content strategy more effective. In this article/guide I will explain from scratch what Semantic SEO is and why it’s important. Why Semantic SEO? In a nutshell, search engines need context to understand a query properly and to fetch relevant results for it. Contexts are built using words, expressions, and other combinations of words and links as they appear in bodies of knowledge such as encyclopedias and large corpora of text. Semantic SEO is a marketing technique that improves the traffic of a website byproviding meaningful data that can unambiguously answer a specific search intent. It is also a way to create clusters of content that are semantically grouped into topics rather than keywords.
    [Show full text]
  • On Using Product-Specific Schema.Org from Web Data Commons: an Empirical Set of Best Practices
    On using Product-Specific Schema.org from Web Data Commons: An Empirical Set of Best Practices Ravi Kiran Selvam Mayank Kejriwal [email protected] [email protected] USC Information Sciences Institute USC Information Sciences Institute Marina del Rey, California Marina del Rey, California ABSTRACT both to advertise and find products on the Web, in no small part Schema.org has experienced high growth in recent years. Struc- due to the ability of search engines to make good use of this data. tured descriptions of products embedded in HTML pages are now There is also limited evidence that structured data plays a key role not uncommon, especially on e-commerce websites. The Web Data in populating Web-scale knowledge graphs such as the Google Commons (WDC) project has extracted schema.org data at scale Knowledge Graph that are essential to modern semantic search from webpages in the Common Crawl and made it available as an [17]. RDF ‘knowledge graph’ at scale. The portion of this data that specif- However, even though most e-commerce platforms have their ically describes products offers a golden opportunity for researchers own proprietary datasets, some resources do exist for smaller com- and small companies to leverage it for analytics and downstream panies, researchers and organizations. A particularly important applications. Yet, because of the broad and expansive scope of this source of data is schema.org markup in webpages. Launched in data, it is not evident whether the data is usable in its raw form. In the early 2010s by major search engines such as Google and Bing, this paper, we do a detailed empirical study on the product-specific schema.org was designed to facilitate structured (and even knowl- schema.org data made available by WDC.
    [Show full text]
  • Implementation of Intelligent Semantic Web Search Engine
    International Journal of Engineering Research & Technology (IJERT) ISSN: 2278-0181 Vol. 4 Issue 04, April-2015 Implementation of Intelligent Semantic Web Search Engine Dinesh Jagtap*1, Nilesh Argade1, Shivaji Date1, Sainath Hole1, Mahendra Salunke2 1Department of Computer Engineering, 2Associate Professor, Department of Computer Engineering Sinhgad Institute of Technology and Science, Pune, Maharashtra, India 411041. II. BACKGROUND Abstract—Search engines are design for to search particular The idea of search engine and info retrieval from search information for a large database that is from World Wide engine is not a new concept. The interesting thing about Web. There are lots of search engines available. Google, traditional search engine is that different search engine will yahoo, Bing are the search engines which are most widely provide different result for the same query. While used search engines in today. The main objective of any search engines is to provide particular or required information was available in web, we have some fields of information with minimum time. The semantics web search problem in search engines. Information retrieval by engines are the next version of traditional search engines. The searching information on the web is not a fresh idea but has main problem of traditional search engines is that different challenges when it is compared to general information retrieval from the database is difficult or takes information retrieval. Different search engines return long time. Hence efficiency of search engines is reduced. To different search results due to the variation in indexing and overcome this intelligent semantic search engines are search process. Google, Yahoo, and Bing have been out introduced.
    [Show full text]
  • Expert Knowledge Management Based on Ontology in a Digital Library
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by idUS. Depósito de Investigación Universidad de Sevilla EXPERT KNOWLEDGE MANAGEMENT BASED ON ONTOLOGY IN A DIGITAL LIBRARY Antonio Martín, Carlos León Departamento de Tecnología Electrónica, Seville University, Avda. Reina Mercedes S/N, Seville, Spain [email protected], [email protected] Keywords: Ontology, web services, Case-based reasoning, Digital Library, knowledge management, Semantic Web. Abstract: The architecture of the future Digital Libraries should be able to allow any users to access available knowledge resources from anywhere and at any time and efficient manner. Moreover to the individual user, there is a great deal of useless information in addition to the substantial amount of useful information. The goal is to investigate how to best combine Artificial Intelligent and Semantic Web technologies for semantic searching across largely distributed and heterogeneous digital libraries. The Artificial Intelligent and Semantic Web have provided both new possibilities and challenges to automatic information processing in search engine process. The major research tasks involved are to apply appropriate infrastructure for specific digital library system construction, to enrich metadata records with ontologies and enable semantic searching upon such intelligent system infrastructure. We study improving the efficiency of search methods to search a distributed data space like a Digital Library. This paper outlines the development of a Case- Based Reasoning prototype system based in an ontology for retrieval information of the Digital Library University of Seville. The results demonstrate that the used of expert system and the ontology into the retrieval process, the effectiveness of the information retrieval is enhanced.
    [Show full text]
  • Semantically Enhanced and Minimally Supervised Models for Ontology Construction, Text Classification, and Document Recommendation Alkhatib, Wael (2020)
    Semantically Enhanced and Minimally Supervised Models for Ontology Construction, Text Classification, and Document Recommendation Alkhatib, Wael (2020) DOI (TUprints): https://doi.org/10.25534/tuprints-00011890 Lizenz: CC-BY-ND 4.0 International - Creative Commons, Namensnennung, keine Bearbei- tung Publikationstyp: Dissertation Fachbereich: 18 Fachbereich Elektrotechnik und Informationstechnik Quelle des Originals: https://tuprints.ulb.tu-darmstadt.de/11890 SEMANTICALLY ENHANCED AND MINIMALLY SUPERVISED MODELS for Ontology Construction, Text Classification, and Document Recommendation Dem Fachbereich Elektrotechnik und Informationstechnik der Technischen Universität Darmstadt zur Erlangung des akademischen Grades eines Doktor-Ingenieurs (Dr.-Ing.) genehmigte Dissertation von wael alkhatib, m.sc. Geboren am 15. October 1988 in Hama, Syrien Vorsitz: Prof. Dr.-Ing. Jutta Hanson Referent: Prof. Dr.-Ing. Ralf Steinmetz Korreferent: Prof. Dr.-Ing. Steffen Staab Tag der Einreichung: 28. January 2020 Tag der Disputation: 10. June 2020 Hochschulkennziffer D17 Darmstadt 2020 This document is provided by tuprints, e-publishing service of the Technical Univer- sity Darmstadt. http://tuprints.ulb.tu-darmstadt.de [email protected] Please cite this document as: URN:nbn:de:tuda-tuprints-118909 URL:https://tuprints.ulb.tu-darmstadt.de/id/eprint/11890 This publication is licensed under the following Creative Commons License: Attribution-No Derivatives 4.0 International https://creativecommons.org/licenses/by-nd/4.0/deed.en Wael Alkhatib, M.Sc.: Semantically Enhanced and Minimally Supervised Models, for Ontology Construction, Text Classification, and Document Recommendation © 28. January 2020 supervisors: Prof. Dr.-Ing. Ralf Steinmetz Prof. Dr.-Ing. Steffen Staab location: Darmstadt time frame: 28. January 2020 ABSTRACT The proliferation of deliverable knowledge on the web, along with the rapidly in- creasing number of accessible research publications, make researchers, students, and educators overwhelmed.
    [Show full text]
  • Download Special Issue
    Scientific Programming Scientific Programming Techniques and Algorithms for Data-Intensive Engineering Environments Lead Guest Editor: Giner Alor-Hernandez Guest Editors: Jezreel Mejia-Miranda and José María Álvarez-Rodríguez Scientific Programming Techniques and Algorithms for Data-Intensive Engineering Environments Scientific Programming Scientific Programming Techniques and Algorithms for Data-Intensive Engineering Environments Lead Guest Editor: Giner Alor-Hernández Guest Editors: Jezreel Mejia-Miranda and José María Álvarez-Rodríguez Copyright © 2018 Hindawi. All rights reserved. This is a special issue published in “Scientific Programming.” All articles are open access articles distributed under the Creative Com- mons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Editorial Board M. E. Acacio Sanchez, Spain Christoph Kessler, Sweden Danilo Pianini, Italy Marco Aldinucci, Italy Harald Köstler, Germany Fabrizio Riguzzi, Italy Davide Ancona, Italy José E. Labra, Spain Michele Risi, Italy Ferruccio Damiani, Italy Thomas Leich, Germany Damian Rouson, USA Sergio Di Martino, Italy Piotr Luszczek, USA Giuseppe Scanniello, Italy Basilio B. Fraguela, Spain Tomàs Margalef, Spain Ahmet Soylu, Norway Carmine Gravino, Italy Cristian Mateos, Argentina Emiliano Tramontana, Italy Gianluigi Greco, Italy Roberto Natella, Italy Autilia Vitiello, Italy Bormin Huang, USA Francisco Ortin, Spain Jan Weglarz, Poland Chin-Yu Huang, Taiwan Can Özturan, Turkey
    [Show full text]
  • Ontology Learning and Its Application to Automated Terminology Translation
    Natural Language Processing Ontology Learning and Its Application to Automated Terminology Translation Roberto Navigli and Paola Velardi, Università di Roma La Sapienza Aldo Gangemi, Institute of Cognitive Sciences and Technology lthough the IT community widely acknowledges the usefulness of domain ontolo- A gies, especially in relation to the Semantic Web,1,2 we must overcome several bar- The OntoLearn system riers before they become practical and useful tools. Thus far, only a few specific research for automated ontology environments have ontologies. (The “What Is an Ontology?” sidebar on page 24 provides learning extracts a definition and some background.) Many in the and is part of a more general ontology engineering computational-linguistics research community use architecture.4,5 Here, we describe the system and an relevant domain terms WordNet,3 but large-scale IT applications based on experiment in which we used a machine-learned it require heavy customization. tourism ontology to automatically translate multi- from a corpus of text, Thus, a critical issue is ontology construction— word terms from English to Italian. The method can identifying, defining, and entering concept defini- apply to other domains without manual adaptation. relates them to tions. In large, complex application domains, this task can be lengthy, costly, and controversial, because peo- OntoLearn architecture appropriate concepts in ple can have different points of view about the same Figure 1 shows the elements of the architecture. concept. Two main approaches aid large-scale ontol- Using the Ariosto language processor,6 OntoLearn a general-purpose ogy construction. The first one facilitates manual extracts terminology from a corpus of domain text, ontology engineering by providing natural language such as specialized Web sites and warehouses or doc- ontology, and detects processing tools, including editors, consistency uments exchanged among members of a virtual com- checkers, mediators to support shared decisions, and munity.
    [Show full text]
  • Semi-Automatic Terminology Ontology Learning Based on Topic Modeling
    The paper has been accepted at Engineering Applications of Artificial Intelligence on 9 May 2017. This paper can cite as: Monika Rani, Amit Kumar Dhar, O.P. Vyas, Semi-automatic terminology ontology learning based on topic modeling, Engineering Applications of Artificial Intelligence, Volume 63, August 2017, Pages 108-125,ISSN 0952- 1976,https://doi.org/10.1016/j.engappai.2017.05.006. (http://www.sciencedirect.com/science/article/pii/S0952197617300891. Semi-Automatic Terminology Ontology Learning Based on Topic Modeling Monika Rani*, Amit Kumar Dhar and O. P. Vyas Department of Information Technology, Indian Institute of Information Technology, Allahabad, India [email protected] Abstract- Ontologies provide features like a common vocabulary, reusability, machine-readable content, and also allows for semantic search, facilitate agent interaction and ordering & structuring of knowledge for the Semantic Web (Web 3.0) application. However, the challenge in ontology engineering is automatic learning, i.e., the there is still a lack of fully automatic approach from a text corpus or dataset of various topics to form ontology using machine learning techniques. In this paper, two topic modeling algorithms are explored, namely LSI & SVD and Mr.LDA for learning topic ontology. The objective is to determine the statistical relationship between document and terms to build a topic ontology and ontology graph with minimum human intervention. Experimental analysis on building a topic ontology and semantic retrieving corresponding topic ontology for the user‟s query demonstrating the effectiveness of the proposed approach. Keywords: Ontology Learning (OL), Latent Semantic Indexing (LSI), Singular Value Decomposition (SVD), Probabilistic Latent Semantic Indexing (pLSI), MapReduce Latent Dirichlet Allocation (Mr.LDA), Correlation Topic Modeling (CTM).
    [Show full text]
  • Folksannotation: a Semantic Metadata Tool for Annotating Learning Resources Using Folksonomies and Domain Ontologies
    FolksAnnotation: A Semantic Metadata Tool for Annotating Learning Resources Using Folksonomies and Domain Ontologies Hend S. Al-Khalifa and Hugh C. Davis Learning Technology Research Group, ECS, The University of Southampton, Southampton, UK {hsak04r/hcd}@ecs.soton.ac.uk Abstract Furthermore, web resources stored in social bookmarking services have potential for educational There are many resources on the Web which are use. In order to realize this potential, we need to add suitable for educational purposes. Unfortunately the an extra layer of semantic metadata to the web task of identifying suitable resources for a particular resources stored in social bookmarking services; this educational purpose is difficult as they have not can be done via adding pedagogical values to the web typically been annotated with educational metadata. resources so that they can be used in an educational However, many resources have now been annotated in context. an unstructured manner within contemporary social In this paper, we will focus on how to benefit from 2 bookmaking services. This paper describes a novel tool social bookmarking services (in particular del.icio.us ), called ‘FolksAnnotation’ that creates annotations with as vehicles to share and add semantic metadata to educational semantics from the del.icio.us bookmarked resources. Thus, the research question we bookmarking service, guided by appropriate domain are tackling is: how folksonomies can support the ontologies. semantic annotation of web resources from an educational perspective ? 1. Introduction The organization of the paper is as follows: Section 2 introduces folksonomies and social bookmarking Metadata standards such as Dublin Core (DC) and services. In section 3, we review some recent research IEEE-LOM 1 have been widely used in e-learning on folksonomies and social bookmarking services.
    [Show full text]
  • A Comparative Analysis for English and Ukrainian Texts Processing Based on Semantics and Syntax Approach
    A Comparative Analysis for English and Ukrainian Texts Processing Based on Semantics and Syntax Approach Victoria Vysotska, Svitlana Holoshchuk and Roman Holoshchuk Lviv Polytechnic National University, S. Bandera Street, 12, Lviv, 79013, Ukraine Abstract The article deals with the application of generating grammars in linguistic modelling. Analysis of sentence syntax modelling is used to automate studying and synthesising natural language texts. The complex research of automatic processing of English and Ukrainian texts considering the semantics and syntax of natural languages is conducted. A comparative linguistic approach to the syntax and semantics of English and Ukrainian languages synthesises the corpora of natural language texts. The automatic operation of relevant text data is carried out. Comparative analysis of these languages facilitates the development of the linguistic algorithm for converting natural language texts of Germanic and Slavic type. Keywords 1 Natural language processing, word order, Ukrainian language, NLP, English language, noun group, verb group, direct word order, grammatical category, base form, structural scheme, information resource processing, natural language text, terminal chain identification, terminal chain production, keywords identification, blocked words, rubrics 1. Introduction Each language is unique in its structure and characteristics of the linguistic unit’s construction for delivering meaningful texts. For automatic processing of coherent data texts, developing the algorithms for analysis and
    [Show full text]