Knowledge Extraction for Hybrid Question Answering

Total Page:16

File Type:pdf, Size:1020Kb

Knowledge Extraction for Hybrid Question Answering KNOWLEDGEEXTRACTIONFORHYBRID QUESTIONANSWERING Von der Fakultät für Mathematik und Informatik der Universität Leipzig angenommene DISSERTATION zur Erlangung des akademischen Grades Doctor rerum naturalium (Dr. rer. nat.) im Fachgebiet Informatik vorgelegt von Ricardo Usbeck, M.Sc. geboren am 01.04.1988 in Halle (Saale), Deutschland Die Annahme der Dissertation wurde empfohlen von: 1. Professor Dr. Klaus-Peter Fähnrich (Leipzig) 2. Professor Dr. Philipp Cimiano (Bielefeld) Die Verleihung des akademischen Grades erfolgt mit Bestehen der Verteidigung am 17. Mai 2017 mit dem Gesamtprädikat magna cum laude. Leipzig, den 17. Mai 2017 bibliographic data title: Knowledge Extraction for Hybrid Question Answering author: Ricardo Usbeck statistical information: 10 chapters, 169 pages, 28 figures, 32 tables, 8 listings, 5 algorithms, 178 literature references, 1 appendix part supervisors: Prof. Dr.-Ing. habil. Klaus-Peter Fähnrich Dr. Axel-Cyrille Ngonga Ngomo institution: Leipzig University, Faculty for Mathematics and Computer Science time frame: January 2013 - March 2016 ABSTRACT Over the last decades, several billion Web pages have been made available on the Web. The growing amount of Web data provides the world’s largest collection of knowledge.1 Most of this full-text data like blogs, news or encyclopaedic informa- tion is textual in nature. However, the increasing amount of structured respectively semantic data2 available on the Web fosters new search paradigms. These novel paradigms ease the development of natural language interfaces which enable end- users to easily access and benefit from large amounts of data without the need to understand the underlying structures or algorithms. Building a natural language Question Answering (QA) system over heteroge- neous, Web-based knowledge sources requires various building blocks. These build- ing components include (a) knowledge extraction approaches over textual data such as blogs, news, templated websites or product descriptions to capture the semantics of the content of a document, (b) linguistic parsing modules for natural language and (c) efficient query execution and ranking functions. However, ex- isting knowledge extraction building blocks lack comparable evaluation settings, performance or quality. In this thesis, three main research challenges are identified and addressed: 1. The ongoing transition from the current Document Web of textual data to the Web of Data requires scalable and accurate approaches for the extraction of structured data in Resource Description Framework (RDF) from Web-based documents. We address this key step for bridging the Semantic Gap, i.e., ex- tracting RDF from text, with three approaches. First, this thesis presents CE- TUS, an approach for recognizing entity types in textual documents to popu- late RDF knowledge bases. Second, we describe the knowledge base-agnostic, multi-lingual, high-quality and efficient named entity disambiguation frame- work AGDISTIS. Third, we tackle the scalable extraction of consistent RDF data from highly templated websites via our framework REX. 2. The need to bridge between the textual Document Web and the structured data on the Web of Data has led to the development of a considerable number of annotation tools and frameworks. However, these approaches are currently hard to compare since the published evaluation results are calculated on diverse datasets and evaluated based on different measures The resulting Evaluation Gap is tackled by a novel set of benchmarking corpora as well as with GERBIL, our evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation approaches on multiple datasets. 3. The decentral architecture behind both Webs generated a distributed infor- mation landscape across data sources with varying structure. Furthermore, current keyword-based search paradigms contradict the human demand for expressing information needs in natural language. Especially, complex infor- mation relationships are often expressed as complex questions by humans. 1 http://www.worldwidewebsize.com/ 2 http://stats.lod2.eu/ The author will make use of the we-form as narrative. v These observations describe an Information Gap, i.e., the user generated de- mand for natural language interfaces based on the structured Web of Data and the Document Web. Thus, we introduce HAWK, a novel natural language search framework for hybrid QA. Our framework is able to combine struc- tured and textual data sources to answer complex natural language ques- tions. vi PUBLICATIONS This thesis is based on the following publications and proceedings. References to the appropriate publications are included at the respective chapters and sections. awards and notable mentions • Best Research Paper Award for Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo, Lorenz Bühmann, Michael Röder, Daniel Gerber, Sandro Athaide Coelho, Sören Auer, and Andreas Both. AGDISTIS - Graph-Based Disambiguation of Named Entities Using Linked Data. In International Semantic Web Confer- ence (ISWC), pages 457–471, 2014 • 2nd prize at Question Answering over Linked Data (QALD) 5 for Ricardo Usbeck and Axel-Cyrille Ngonga Ngomo. HAWK@QALD5 – Trying to An- swer Hybrid Questions with Various Simple Ranking Techniques. In CLEF 2015 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings (CEUR- WS.org/Vol-1391), 2015 • Best Demo Award for Ricardo Usbeck, Michael Röder, and Axel-Cyrille Ngonga Ngomo. Evaluating entity annotators using GERBIL. In The Seman- tic Web: ESWC 2015 Satellite Events, Revised Selected Papers, Demos & Posters, pages 159–164, 2015 • 1st prize at Task 2 of Open Knowledge Extraction (OKE) Challenge for Michael Röder, Ricardo Usbeck, René Speck, and Axel-Cyrille Ngonga Ngomo. CETUS – A Baseline Approach to Type Extraction. In 1st Open Knowledge Ex- traction Challenge at International Semantic Web Conference, 2015 • Best of Workshop (WASABI) for Didier Cherix, Ricardo Usbeck, Andreas Both, and Jens Lehmann. CROCUS: Cluster-based Ontology Data Cleansing. In Joint Second International Workshop on Semantic Web Enterprise Adoption and Best Practice and Second International Workshop on Finance and Economics on the Semantic Web at ESWC, 2014 • Best of Workshop (LD4IE) for Axel-Cyrille Ngonga Ngomo, Michael Röder, and Ricardo Usbeck. Cross-document coreference resolution using latent features. In 2nd International Workshop on Linked Data for Information Extraction at the International Semantic Web Conference, pages 33–44, 2014 conferences, peer-reviewed 1. Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo, Lorenz Bühmann, Michael Röder, Daniel Gerber, Sandro Athaide Coelho, Sören Auer, and Andreas Both. AGDISTIS - Graph-Based Disambiguation of Named Entities Using Linked Data. In International Semantic Web Conference (ISWC), pages 457–471, 2014 vii 2. Ricardo Usbeck, Michael Röder, Axel-Cyrille Ngonga Ngomo, Ciro Baron, Andreas Both, Martin Brümmer, Diego Ceccarelli, Marco Cornolti, Didier Cherix, Bernd Eickmann, Paolo Ferragina, Christiane Lemke, Andrea Moro, Roberto Navigli, Francesco Piccinno, Giuseppe Rizzo, Harald Sack, René Speck, Raphaël Troncy, Jörg Waitelonis, and Lars Wesemann. GERBIL – Gen- eral Entity Annotation Benchmark Framework. In 24th International Confer- ence on World Wide Web, 2015 3. Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo, Lorenz Bühmann, and Chris- tina Unger. HAWK – Hybrid Question Answering Using Linked Data. In Extended Semantic Web Conference, pages 353–368, 2015 4. Michael Röder, Ricardo Usbeck, Daniel Gerber, Sebastian Hellmann, and An- dreas Both. N3 - A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format. In Conference on language resources and evaluation (LREC), 2014 5. Lorenz Bühmann, Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo, Muham- mad Saleem, Andreas Both, Valter Crescenzi, Paolo Merialdo, and Disheng Qiu. Web-Scale Extension of RDF Knowledge Bases from Templated Web- sites. In International Semantic Web Conference (ISWC), pages 66–81, 2014 book chapters, peer-reviewed 6. Anja Jentzsch, Ricardo Usbeck, and Denny Vrandecic. An Incomplete and Simplifying Introduction to Linked Data. In Jens Lehmann and Johanna Voelker, editors, Perspectives on Ontology Learning, pages 21–33. AKA / IOS Press, 2014 workshops, peer-reviewed 7. Ricardo Usbeck, Erik Körner, and Axel-Cyrille Ngonga Ngomo. Answering Boolean Hybrid Questions with HAWK. In NLIWOD workshop at International Semantic Web Conference (ISWC), 2015 8. Michael Röder, Ricardo Usbeck, René Speck, and Axel-Cyrille Ngonga Ngomo. CETUS – A Baseline Approach to Type Extraction. In 1st Open Knowledge Ex- traction Challenge at International Semantic Web Conference, 2015 9. Ricardo Usbeck. Combining Linked Data and Statistical Information Re- trieval - Next Generation Information Systems. In The Semantic Web: Trends and Challenges - 11th International Conference ESWC 2014, pages 845–854, 2014 10. Michael Röder, Ricardo Usbeck, and Axel-Cyrille Ngonga Ngomo. Develop- ing a Sustainable Platform for Entity Annotation Benchmarks. In Developers Workshop 2015 at ESWC, 2015 11. Ricardo Usbeck and Axel-Cyrille Ngonga Ngomo. HAWK@QALD5 – Trying to Answer Hybrid Questions with Various Simple Ranking Techniques. In CLEF 2015 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings (CEUR-WS.org/Vol-1391), 2015 viii demos and posters, peer-reviewed 12. Ricardo Usbeck, Axel-Cyrille
Recommended publications
  • Realising the Potential of Algal Biomass Production Through Semantic Web and Linked Data
    LEAPS: Realising the Potential of Algal Biomass Production through Semantic Web and Linked data ∗ Monika Solanki Johannes Skarka Craig Chapman KBE Lab Karlsruhe Institute of KBE Lab Birmingham City University Technology (ITAS) Birmingham City University [email protected] [email protected] [email protected] ABSTRACT In order to derive fuels from biomass, algal operation plant Recently algal biomass has been identified as a potential sites are setup that facilitate biomass cultivation and con- source of large scale production of biofuels. Governments, version of the biomass into end use products, some of which environmental research councils and special interests groups are biofuels. Microalgal biomass production in Europe is are funding several efforts that investigate renewable energy seen as a promising option for biofuels production regarding production opportunities in this sector. However so far there energy security and sustainability. Since microalgae can be has been no systematic study that analyses algal biomass cultivated in photobioreactors on non-arable land this tech- potential especially in North-Western Europe. In this paper nology could significantly reduce the food vs. fuel dilemma. we present a spatial data integration and analysis frame- However, until now there has been no systematic analysis work whereby rich datasets from the algal biomass domain of the algae biomass potential for North-Western Europe. that include data about algal operation sites and CO2 source In [20], the authors assessed the resource potential for mi- sites amongst others are semantically enriched with ontolo- croalgal biomass but excluded all areas not between 37◦N gies specifically designed for the domain and made available and 37◦S, thus most of Europe.
    [Show full text]
  • Knowledge Extraction by Internet Monitoring to Enhance Crisis Management
    El Hamali Knowledge Extraction by Internet Monitoring to Enhance Crisis Management Knowledge Extraction by Internet Monitoring to Enhance Crisis Management El Hamali Samiha Nadia Nouali-TAboudjnent, Omar Nouali National School for Computer Science ESI, Research Center of Scientific and Technique Algiers, Algeria Information CERIST, Algeria [email protected] {nnouali, onouali}@cerist.dz ABSTRACT This paper presents our work on developing a system for Internet monitoring and knowledge extraction from different web documents which contain information about disasters. This system is based on ontology of the disasters domain for the knowledge extraction and it presents all the information extracted according to the kind of the disaster defined in the ontology. The system disseminates the information extracted (as a synthesis of the web documents) to the users after a filtering based on their profiles. The profile of a user is updated automatically by interactively taking into account his feedback. Keywords Crisis management, internet monitoring, knowledge extraction, ontology, information filtering. INTRODUCTION Several crises and disasters happened in the last decades. After the southern Asian tsunami, the Katrina hurricane in the United States, and the Kashmir earthquake, people throughout the world used the Web as a source of information and a means of communication. In the hours and days immediately following a disaster, a lot of information becomes available on the web. With a single Google search, 17 millions “emergency management” sites and documents have been listed (Min & Peishih, 2008). The user finds a lot of information which comes from different sources, for instance : during the first 3 days of tsunami disaster in south-east Asia, 4000 reports were gathered by the Google News service, and within a month of the incident, 200000 distinct news reports from several online sources (Yiming, et al., 2007 IEEE).
    [Show full text]
  • Injection of Automatically Selected Dbpedia Subjects in Electronic
    Injection of Automatically Selected DBpedia Subjects in Electronic Medical Records to boost Hospitalization Prediction Raphaël Gazzotti, Catherine Faron Zucker, Fabien Gandon, Virginie Lacroix-Hugues, David Darmon To cite this version: Raphaël Gazzotti, Catherine Faron Zucker, Fabien Gandon, Virginie Lacroix-Hugues, David Darmon. Injection of Automatically Selected DBpedia Subjects in Electronic Medical Records to boost Hos- pitalization Prediction. SAC 2020 - 35th ACM/SIGAPP Symposium On Applied Computing, Mar 2020, Brno, Czech Republic. 10.1145/3341105.3373932. hal-02389918 HAL Id: hal-02389918 https://hal.archives-ouvertes.fr/hal-02389918 Submitted on 16 Dec 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Injection of Automatically Selected DBpedia Subjects in Electronic Medical Records to boost Hospitalization Prediction Raphaël Gazzotti Catherine Faron-Zucker Fabien Gandon Université Côte d’Azur, Inria, CNRS, Université Côte d’Azur, Inria, CNRS, Inria, Université Côte d’Azur, CNRS, I3S, Sophia-Antipolis, France I3S, Sophia-Antipolis, France
    [Show full text]
  • Handling Semantic Complexity of Big Data Using Machine Learning and RDF Ontology Model
    S S symmetry Article Handling Semantic Complexity of Big Data using Machine Learning and RDF Ontology Model Rauf Sajjad *, Imran Sarwar Bajwa and Rafaqut Kazmi Department of Computer Science & IT, The Islamia University Bahawalpur, Bahawalpur 63100, Pakistan; [email protected] (I.S.B.); [email protected] (R.K.) * Correspondence: [email protected]; Tel.: +92-62-925-5466 Received: 3 January 2019; Accepted: 14 February 2019; Published: 1 March 2019 Abstract: Business information required for applications and business processes is extracted using systems like business rule engines. Since the advent of Big Data, such rule engines are producing rules in a big quantity whereas more rules lead to more complexity in semantic analysis and understanding. This paper introduces a method to handle semantic complexity in rules and support automated generation of Resource Description Framework (RDF) metadata model of rules and such model is used to assist in querying and analysing Big Data. Practically, the dynamic changes in rules can be a source of conflict in rules stored in a repository. It is identified during the literature review that there is a need of a method that can semantically analyse rules and help business analysts in testing and validating the rules once a change is made in a rule. This paper presents a robust method that not only supports semantic analysis of rules but also generates RDF metadata model of rules and provide support of querying for the sake of semantic interpretation of the rules. The results of the experiments manifest that consistency checking of a set of big data rules is possible through automated tools.
    [Show full text]
  • Ts 124 623 V9.3.0 (2011-10)
    ETSI TS 124 623 V9.3.0 (2011-10) Technical Specification Digital cellular telecommunications system (Phase 2+); Universal Mobile Telecommunications System (UMTS); LTE; Extensible Markup Language (XML) Configuration Access Protocol (XCAP) over the Ut interface for Manipulating Supplementary Services (3GPP TS 24.623 version 9.3.0 Release 9) 3GPP TS 24.623 version 9.3.0 Release 9 1 ETSI TS 124 623 V9.3.0 (2011-10) Reference RTS/TSGC-0124623v930 Keywords GSM,LTE,UMTS ETSI 650 Route des Lucioles F-06921 Sophia Antipolis Cedex - FRANCE Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16 Siret N° 348 623 562 00017 - NAF 742 C Association à but non lucratif enregistrée à la Sous-Préfecture de Grasse (06) N° 7803/88 Important notice Individual copies of the present document can be downloaded from: http://www.etsi.org The present document may be made available in more than one electronic version or in print. In any case of existing or perceived difference in contents between such versions, the reference version is the Portable Document Format (PDF). In case of dispute, the reference shall be the printing on ETSI printers of the PDF version kept on a specific network drive within ETSI Secretariat. Users of the present document should be aware that the document may be subject to revision or change of status. Information on the current status of this and other ETSI documents is available at http://portal.etsi.org/tb/status/status.asp If you find errors in the present document, please send your comment to one of the following services: http://portal.etsi.org/chaircor/ETSI_support.asp Copyright Notification No part may be reproduced except as authorized by written permission.
    [Show full text]
  • A Comparison of Knowledge Extraction Tools for the Semantic Web
    A Comparison of Knowledge Extraction Tools for the Semantic Web Aldo Gangemi1;2 1 LIPN, Universit´eParis13-CNRS-SorbonneCit´e,France 2 STLab, ISTC-CNR, Rome, Italy. Abstract. In the last years, basic NLP tasks: NER, WSD, relation ex- traction, etc. have been configured for Semantic Web tasks including on- tology learning, linked data population, entity resolution, NL querying to linked data, etc. Some assessment of the state of art of existing Knowl- edge Extraction (KE) tools when applied to the Semantic Web is then desirable. In this paper we describe a landscape analysis of several tools, either conceived specifically for KE on the Semantic Web, or adaptable to it, or even acting as aggregators of extracted data from other tools. Our aim is to assess the currently available capabilities against a rich palette of ontology design constructs, focusing specifically on the actual semantic reusability of KE output. 1 Introduction We present a landscape analysis of the current tools for Knowledge Extraction from text (KE), when applied on the Semantic Web (SW). Knowledge Extraction from text has become a key semantic technology, and has become key to the Semantic Web as well (see. e.g. [31]). Indeed, interest in ontology learning is not new (see e.g. [23], which dates back to 2001, and [10]), and an advanced tool like Text2Onto [11] was set up already in 2005. However, interest in KE was initially limited in the SW community, which preferred to concentrate on manual design of ontologies as a seal of quality. Things started changing after the linked data bootstrapping provided by DB- pedia [22], and the consequent need for substantial population of knowledge bases, schema induction from data, natural language access to structured data, and in general all applications that make joint exploitation of structured and unstructured content.
    [Show full text]
  • From Cataloguing Cards to Semantic Web 1
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Scientific Open-access Literature Archive and Repository Archeologia e Calcolatori 20, 2009, 111-128 REPRESENTING KNOWLEDGE IN ARCHAEOLOGY: FROM CATALOGUING CARDS TO SEMANTIC WEB 1. Introduction Representing knowledge is the basis of any catalogue. The Italian Ca- talogue was based on very valuable ideas, developed in the late 1880s, with the basic concept of putting objects in their context. Initially cataloguing was done manually, with typewritten cards. The advent of computers led to some early experimentations, and subsequently to the de�nition of a more formalized representation schema. The web produced a cultural revolution, and made the need for technological and semantic interoperability more evi- dent. The Semantic Web scenario promises to allow a sharing of knowledge, which will make the knowledge that remained unexpressed in the traditional environments available to any user, and allow objects to be placed in their cultural context. In this paper we will brie�y recall the principles of cataloguing and lessons learned by early computer experiences. Subsequently we will descri- be the object model for archaeological items, discussing its strong and weak points. In section 5 we present the web scenario and approaches to represent knowledge. 2. Cataloguing: history and principles The Italian Catalogue of cultural heritage has its roots in the experiences and concepts, developed in the late 1880s and early 1900s, by the famous art historian Adolfo Venturi, who was probably one of the �rst scholars to think explicitly in terms of having a frame of reference to describe works of art, emphasizing as the main issue the context in which the work had been produced.
    [Show full text]
  • CN-Dbpedia: a Never-Ending Chinese Knowledge Extraction System
    CN-DBpedia: A Never-Ending Chinese Knowledge Extraction System Bo Xu1,YongXu1, Jiaqing Liang1,2, Chenhao Xie1,2, Bin Liang1, B Wanyun Cui1, and Yanghua Xiao1,3( ) 1 Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China {xubo,yongxu16,jqliang15,xiech15,liangbin,shawyh}@fudan.edu.cn, [email protected] 2 Data Eyes Research, Shanghai, China 3 Shanghai Internet Big Data Engineering and Technology Center, Shanghai, China Abstract. Great efforts have been dedicated to harvesting knowledge bases from online encyclopedias. These knowledge bases play impor- tant roles in enabling machines to understand texts. However, most cur- rent knowledge bases are in English and non-English knowledge bases, especially Chinese ones, are still very rare. Many previous systems that extract knowledge from online encyclopedias, although are applicable for building a Chinese knowledge base, still suffer from two challenges. The first is that it requires great human efforts to construct an ontology and build a supervised knowledge extraction model. The second is that the update frequency of knowledge bases is very slow. To solve these chal- lenges, we propose a never-ending Chinese Knowledge extraction system, CN-DBpedia, which can automatically generate a knowledge base that is of ever-increasing in size and constantly updated. Specially, we reduce the human costs by reusing the ontology of existing knowledge bases and building an end-to-end facts extraction model. We further propose a smart active update strategy to keep the freshness of our knowledge base with little human costs. The 164 million API calls of the published services justify the success of our system.
    [Show full text]
  • Download Slides
    a platform for all that we know savas parastatidis http://savas.me savasp transition from web to apps increasing focus on information (& knowledge) rise of personal digital assistants importance of near-real time processing http://aptito.com/blog/wp-content/uploads/2012/05/smartphone-apps.jpg today... storing computing computers are huge amounts great tools for of data managing indexing example google and microsoft both have copies of the entire web (and more) for indexing purposes tomorrow... storing computing computers are huge amounts great tools for of data managing indexing acquisition discovery aggregation organization we would like computers to of the world’s information also help with the automatic correlation analysis and knowledge interpretation inference data information knowledge intelligence wisdom expert systems watson freebase wolframalpha rdbms google now web indexing data is symbols (bits, numbers, characters) information adds meaning to data through the introduction of relationship - it answers questions such as “who”, “what”, “where”, and “when” knowledge is a description of how the world works - it’s the application of data and information in order to answer “how” questions G. Bellinger, D. Castro, and A. Mills, “Data, Information, Knowledge, and Wisdom,” Inform. pp. 1–4, 2004 web – the data platform web – the information platform web – the knowledge platform foundation for new experiences “wisdom is not a product of schooling but of the lifelong attempt to acquire it” representative examples wolframalpha watson source:
    [Show full text]
  • Datatone: Managing Ambiguity in Natural Language Interfaces for Data Visualization Tong Gao1, Mira Dontcheva2, Eytan Adar1, Zhicheng Liu2, Karrie Karahalios3
    DataTone: Managing Ambiguity in Natural Language Interfaces for Data Visualization Tong Gao1, Mira Dontcheva2, Eytan Adar1, Zhicheng Liu2, Karrie Karahalios3 1University of Michigan, 2Adobe Research 3University of Illinois, Ann Arbor, MI San Francisco, CA Urbana Champaign, IL fgaotong,[email protected] fmirad,[email protected] [email protected] ABSTRACT to be both flexible and easy to use. General purpose spread- Answering questions with data is a difficult and time- sheet tools, such as Microsoft Excel, focus largely on offer- consuming process. Visual dashboards and templates make ing rich data transformation operations. Visualizations are it easy to get started, but asking more sophisticated questions merely output to the calculations in the spreadsheet. Asking often requires learning a tool designed for expert analysts. a “visual question” requires users to translate their questions Natural language interaction allows users to ask questions di- into operations on the spreadsheet rather than operations on rectly in complex programs without having to learn how to the visualization. In contrast, visual analysis tools, such as use an interface. However, natural language is often ambigu- Tableau,1 creates visualizations automatically based on vari- ous. In this work we propose a mixed-initiative approach to ables of interest, allowing users to ask questions interactively managing ambiguity in natural language interfaces for data through the visualizations. However, because these tools are visualization. We model ambiguity throughout the process of often intended for domain specialists, they have complex in- turning a natural language query into a visualization and use terfaces and a steep learning curve. algorithmic disambiguation coupled with interactive ambigu- Natural language interaction offers a compelling complement ity widgets.
    [Show full text]
  • Directions in AI Research and Applications at Siemens Corporate
    AI Magazine Volume 11 Number 1 (1991)(1990) (© AAAI) Research in Progress of linguistic phenomena. The com- Directions in AI Research putational work concerns questions of adequate processing models and algorithms, as embodied in the and Applications at actual interfaces being developed. These topics are explored in the Siemens Corporate Research framework of three projects: The nat- ural language consulting dialogue and Development project (Wisber) takes up research- oriented topics (descriptive grammar formalisms and linguistically adequate grammar specification, handling of Wolfram Buettner, Klaus Estenfeld, discourse, and so on), the data-access Hans Haugeneder, and Peter Struss project (Sepp) focuses on the practi- cal application of state-of-the-art technology, and the work on gram- mar-development environments ■ Many barriers exist today that prevent particular, Prolog extensions; and (4) (Ape) centers on the problem of ade- effective industrial exploitation of current design and analysis of neural networks. quate tools for specifying linguistic and future AI research. These barriers can The lab’s 26 researchers are orga- knowledge sources and efficient pro- only be removed by people who are work- nized into four groups corresponding cessing methods. ing at the scientific forefront in AI and to these areas. Together, they provide Wisber is jointly funded by the know potential industrial needs. Siemens with innovative software The Knowledge Processing Laboratory’s German government and several research and development concentrates in technologies, appropriate applications, industrial and academic partners. the following areas: (1) natural language prototypical implementations of AI The goal of the project is to develop interfaces to knowledge-based systems and systems, and evaluations of new a knowledge-based advice-giving databases; (2) theoretical and experimen- techniques and trends.
    [Show full text]
  • Real-Time Population of Knowledge Bases: Opportunities and Challenges
    Real-time Population of Knowledge Bases: Opportunities and Challenges Ndapandula Nakashole, Gerhard Weikum Max Planck Institute for Informatics Saarbrucken,¨ Germany {nnakasho,weikum}@mpi-inf.mpg.de Abstract 2011; Nakashole 2011) which is now three years old. The NELL system (Carlson 2010) follows a Dynamic content is a frequently accessed part “never-ending” extraction model with the extraction of the Web. However, most information ex- process going on 24 hours a day. However NELL’s traction approaches are batch-oriented, thus focus is on language learning by iterating mostly not effective for gathering rapidly changing data. This paper proposes a model for fact on the same ClueWeb09 corpus. Our focus is on extraction in real-time. Our model addresses capturing the latest information enriching it into the the difficult challenges that timely fact extrac- form of relational facts. Web-based news aggrega- tion on frequently updated data entails. We tors such as Google News and Yahoo! News present point out a naive solution to the main research up-to-date information from various news sources. question and justify the choices we make in However, news aggregators present headlines and the model we propose. short text snippets. Our focus is on presenting this information as relational facts that can facilitate re- 1 Introduction lational queries spanning new and historical data. Challenges. Timely knowledge extraction from fre- Motivation. Dynamic content is an important part quently updated sources entails a number of chal- of the Web, it accounts for a substantial amount of lenges: Web traffic. For example, much time is spent read- ing news, blogs and user comments in social media.
    [Show full text]