
From: AAAI Technical Report SS-00-01. Compilation copyright © 2000, AAAI (www.aaai.org). All rights reserved.

Adaptive User Interfaces for Information Extraction

Peter Vanderheyden and Robin Cohen
Department of Computer Science, University of Waterloo
Waterloo, Ontario, Canada

Abstract

Information extraction systems distill specific items of information from large quantities of text. Current literature in the field of information extraction focuses largely on how different language models and machine learning approaches can improve system accuracy, with little mention of the role of interaction between the user and the system. In this paper, we explore many opportunities for interaction and adaptation during information extraction, and propose a system design to investigate these opportunities further.

Introduction

Much has been written about the flood of information currently available, and the challenge of locating relevant items amid that flood. Several areas of applied research -- information extraction (IE), information retrieval (IR), and knowledge discovery in text (KDT) -- seek to address this challenge. We are specifically interested in the field of information extraction. In this paper, we discuss opportunities for interaction between user and system, and for adaptation by the system to facilitate this interaction. We then develop objectives for designing a system that supports the user effectively during information extraction.

We motivate this discussion by observing that little is mentioned in current literature about the interaction between an information extraction system and a user. In contrast, for example, information retrieval literature discusses interactions between users and computer-based systems as well as between human librarians and library patrons (Twidale & Nichols 1998).

The following will be an ongoing example to which we refer throughout the paper: Person P is concerned that there has been a high number of terrorism cases reported recently [1], and is looking for more information on the matter. What P really wants to know (P's "global information need") is the following: "Is terrorism on the rise?" System S is able to accept many different kinds of queries, and then to search for information in a corpus of documents. How should P and S interact to satisfy P's information need?

[1] The reader may insert any other topic in place of "terrorism" (e.g., management successions, cancer, rap music).

Information Extraction: What Is It?

Information extraction involves "extraction or pulling out of pertinent information from large volumes of texts" (MUC-7 1999). It is often defined by example, in terms of the results the system would return for a given query, and often contrasted with what an information retrieval system might return for a similar query.

The query-result perspective

An information retrieval system generally interprets a query as a string of unrelated words or word phrases [2], matching it against the words in the corpus documents. Output from the IR system consists of either whole documents or segments, deemed relevant to the query words. Thus, for P's question "Is terrorism on the rise?", if the corpus contains a document that mentions a 'rise' in 'terrorism', and assuming P has confidence in the document, such a document will have answered P's question.

[2] Most current information retrieval systems consider only minor variations (e.g., morphological) on the query words, though natural language approaches to IR exist as well (Lewis & Sparck Jones 1996).

An IE system takes as its input a user's query expressed (or expressible) as a sentence template, containing unfilled slots, and relations between those slots. For example, a query to identify terrorism events contained in a document could be written as:

"A terrorism event occurred in which {GroupA} {ActB} {GroupB}, and {ActC} {ObjectC}."

The system then processes each document in a corpus and, for each occurrence in the document of an event that adequately fills the query template, the system returns the filled template (e.g., {ActB}="kidnapped", {GroupB}="10 civilians", etc.). Given P's question, converted into an appropriate template form, the system would ideally return all terrorism events mentioned in the corpus. Thus one possible scenario is that, if the IR system failed to answer P's question satisfactorily, the IE system could be used as a follow-up approach (Gaizauskas & Robertson 1997). The data returned by the IE system could be analyzed (the number of events counted) and plotted as a function of time, to see whether terrorism cases were indeed increasing in frequency.
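As an illustration only (this is not a component of an implemented system, and the class and slot names are ours), a query template and a filled extraction result for the running example might be represented as simple data structures:

from dataclasses import dataclass, field

@dataclass
class QueryTemplate:
    # A sentence template with named, unfilled slots.
    pattern: str     # template text with {Slot} placeholders
    slots: list      # slot names the system should try to fill

@dataclass
class FilledTemplate:
    # One event extracted from one document: a mapping from slots to text spans.
    source_doc: str
    fills: dict = field(default_factory=dict)   # e.g. {"ActB": "kidnapped"}

# The terrorism query from the running example.
terrorism_query = QueryTemplate(
    pattern="A terrorism event occurred in which {GroupA} {ActB} {GroupB}, and {ActC} {ObjectC}.",
    slots=["GroupA", "ActB", "GroupB", "ActC", "ObjectC"],
)

# One result the system might return for one document (values are hypothetical);
# note that a partially filled template may be acceptable, as discussed later.
result = FilledTemplate(
    source_doc="doc-0042",
    fills={"GroupA": "an armed group", "ActB": "kidnapped", "GroupB": "10 civilians"},
)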
Knowledge discovery in text (Feldman & Dagan 1995) begins with the kind of well-structured data often returned by an information extraction system. The system and user then cooperate, the system using statistical calculations and the user using intuition and knowledge of the world, to find correlative patterns among data categories, possibly defining new categories in the process. For example, a KDT system might analyze the terrorist event data returned by the IE query mentioned earlier, to identify patterns of occurrence over a range of years. Thus S would compute the answer to P's question on its own, rather than relying on a document (that P trusts) in the corpus containing the answer. Finding information often does involve trying one approach, getting partial answers, integrating this new knowledge, and trying again with the same approach or with a different one.

The natural language perspective

Some degree of language understanding is required to recognize how the terms in the query template are related to one another. The documents in the corpus may contain either unstructured natural language (e.g., news wire articles, as in the MUCs; e.g., (MUC-6 1995) and (MUC-7 1999)) or semi-structured text (e.g., formulaic reviews of restaurants or movies, with consistent structure; discussed in many of the papers in (AAAI'98-AIII 1998)). When documents contain unstructured natural language, information extraction can be seen as a natural language understanding task restricted to the domain of knowledge relevant to the user's query. The system typically contains a knowledge base of words and phrases, semantic relations, and rules of syntax in the language. How these syntax rules are expressed is very much affected by the specific language model being used (e.g., finite state automata (Appelt et al. 1995); a statistical dependency model (Weischedel 1995)).

Information extraction systems acquire additional knowledge (about the lexicon, semantics, etc.) during the course of the task through learning techniques. Typically, this involves "batch" learning -- learning in large increments -- with no user involvement within each batch, and the specific approach taken will be very much dependent on the language model.

Summary: there's still something missing

A good deal of the process involved in information extraction has yet to be discussed. For example, the design of an information extraction system should consider the kinds of users, queries, and corpora that will be involved. And information extraction may occur not in isolation, but in the context of other activities.
The Information Task In Context

In this section, we briefly mention some of the parameters that will vary from one task to another, from one user to another, or within the span of a single task; this is by no means an exhaustive list. It is not our intention to suggest that any one system should explicitly make use of all of these parameters. However, the designers of a system should at least be aware of them, in order to take into account any that are relevant to the specific design and intended application of a given system. Many parameters mentioned in this section come from (Allen 1996) and are not specific to information extraction, but are relevant to various information tasks.

Information-based variations

What kind of information is the user interested in? What kind of information is contained in the documents? Different kinds of information may not be conducive to the same techniques for representation or interaction. (For example, diagrammatic information is more successfully conveyed in a diagram than as text.)

Why does the user want the information that the system is trying to find? For example, is the user:
- searching for something in particular;
- generally browsing, or exploring, to become acquainted with information available on the topic;
- integrating new information into what the user already knows; or,
- trying to gain new perspectives on what the user already knows?

The user's motivation for performing the IE task will affect how she examines each document and what kind of information she focuses on. As a result, it may affect how the user prefers to interact with the system.

How many solutions to the query is the user looking for? If the user is interested in finding a small number of solutions, then a less accurate (and possibly faster) procedure may suffice. This is in contrast to a user interested in all solutions that occur in the corpus, or in the "best" solution (which may require finding all solutions, then using some criterion to select the best). Similarly, is the user satisfied with finding some of the elements in the query (i.e., in information extraction, this means that partially filled templates are acceptable)? If so, how many and which elements (if they are not weighted equally in importance) are required?

With respect to documents in the corpus, does it matter how documents are related to one another? For example, in the legal domain it is important to relate a given case to possible precedents (Rose & Belew 1991).

User-based variations

How does the user process different kinds of information? In any given situation, different users will vary with respect to:

- what the user perceives, identifies, or focuses on [3];
- how the user interprets the information;
- what courses of action the user will identify, and which one the user will pursue; and,
- what the user knows, and meta-knowledge about what the user doesn't know.

[3] Allen (1996) discusses individuals' "bounded rationality", the fact that people do not always make use of all available information -- i.e., real people are not ideal users.

As well, users will vary in how quickly they learn over the course of a given task, and in what they require in order to be confident of the results. In addition to their abilities, users will also vary with respect to preferences -- how information should be presented, how much information should be presented at once, etc.

Within-task variations

Has the user firmly established the query and domain of interest? An information task can be described in several phases:
- task initiation: recognizing the information need;
- topic selection: identifying the general topic;
- prefocus exploration: investigating information on the general topic;
- focus formulation: deciding on a narrower topic;
- information collection: investigating information on the narrower topic; and,
- search closure: completing the information search.

If the query and domain are not already well defined at the beginning of the task, they will evolve over the course of the task as a result of:
- additional information learned from documents,
- constraints learned about information in documents,
- changes in user needs due to external influences.

How they evolve will depend in part on the user-based variations mentioned above, and may involve broadening, narrowing, replacing, or reorganizing individual elements within the current configuration, or changing the scope of the entire problem definition.

Between-task variations

How is the current query related to previous queries? For example, the current query may be building on the results of earlier queries in the same task (e.g., information extraction), or on the results of other tasks (e.g., information retrieval, or knowledge discovery). Similarly, a query may refer to a domain similar to that of an earlier query; some elements from the earlier task may be relevant, while others may not.

Information Extraction In Context

We now focus our discussion on information extraction specifically, and consider how this task can vary from one time to another. The model of information extraction that we have adopted is the annotation model: the results of analyzing a text are incorporated into that text as annotations (Bird & Liberman 1999); e.g., semantic features are inserted into the document text as SGML markups [4].

[4] This is consistent with a number of the MUC systems (e.g., FASTUS (Appelt et al. 1995) and Alembic Workbench (Aberdeen et al. 1995)), and we believe it is rich enough to also describe most other systems.
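For concreteness, the following sketch shows what such inline annotation might look like; the tag names are hypothetical and the function is only an illustration of the annotation model, not the markup scheme of any particular MUC system.

def annotate(text, spans):
    # Wrap each (phrase, concept) pair found in the text with an SGML-style tag.
    for phrase, concept in spans:
        text = text.replace(phrase, f"<{concept}>{phrase}</{concept}>")
    return text

sentence = "Guerrillas kidnapped 10 civilians near the capital."
spans = [("Guerrillas", "GroupA"), ("kidnapped", "ActB"), ("10 civilians", "GroupB")]

print(annotate(sentence, spans))
# -> <GroupA>Guerrillas</GroupA> <ActB>kidnapped</ActB> <GroupB>10 civilians</GroupB> near the capital.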
Phases of an IE task

A typical information extraction task might be described as a series of five phases or steps. Many of these phases involve elements that are unconstrained in the specifications of the task, and these elements have been italicized; an italicized term X can be read as "some set of X", which may be the same or a different set than an X occurring elsewhere.

1. The user recognizes an information need, then defines and represents a specific problem:
   (a) the user identifies Concepts relevant to the query,
   (b) the user annotates Concepts as they occur in Documents, using Annotations.
2. The user and system explore possible strategies and generate possible intermediate solutions:
   (a) the system derives Rules for identifying Concepts in Documents,
   (b) the user reviews and possibly modifies directly how Rules are defined in the system,
   (c) the system applies Rules and annotates Concepts as they occur in Documents, using Annotations.
3. The user reviews and evaluates Annotations in Documents.
4. The user may choose to repeat the cycle if one of the following conditions holds, or to proceed to 5:
   - the system is missing a relevant concept -- loop back to 1(a) and modify the representation, or loop back to 2(c) and overlook the omission;
   - the system rules are incorrect, but the relevant concept is defined correctly -- loop back to 2(b) and modify the rules, or loop back to 2(c) and overlook the error;
   - the system solution is acceptable for the documents annotated so far, but additional unannotated documents remain -- loop back to 2(c).
5. The system produces a report of final results.
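The cycle in phases 2 through 4 can be read as a simple control loop. The sketch below is only an illustration of that flow; the user and system interfaces it calls are hypothetical, not a description of an implemented system.

def ie_task(user, system, corpus, query):
    # Illustrative control flow for the five phases above (interfaces assumed).
    concepts = user.identify_concepts(query)                # 1(a)
    examples = user.annotate_examples(corpus, concepts)     # 1(b)
    rules = system.derive_rules(concepts, examples)         # 2(a)

    while True:
        rules = user.review_rules(rules)                    # 2(b)
        annotations = system.apply_rules(rules, corpus)     # 2(c)
        problem = user.evaluate(annotations)                # 3

        if problem == "missing concept":                    # 4, first condition
            concepts = user.identify_concepts(query)        # back to 1(a)
            rules = system.derive_rules(concepts, examples)
        elif problem == "incorrect rules":                  # 4, second condition
            continue                                        # back to 2(b)
        elif problem == "unannotated documents remain":     # 4, third condition
            continue                                        # back to 2(c) on the next pass
        else:
            break                                           # solution acceptable

    return system.report(annotations)                       # 5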

Without any specification on how all information extraction systems should set these parameters, developers must make these decisions for each system. For example, the (italicized) elements may vary in:
- number or size -- e.g., how many documents does a user examine before proceeding to the next phase?
- selection -- e.g., which documents does the user examine? are they the same documents as in the previous phase, or novel documents?
- order -- e.g., the system may process a next rule in some fixed sequence, or in order of how often it has been activated in the past; the order may be increasing, decreasing, or random.
- motives -- e.g., in phase 2(b), is the user's motive to identify new rules, to correct current rules, or to compare and contrast rules that overlap?
- control -- e.g., is it the system or the user who constrains each of these elements; if both, then which has priority during conflicts?
- timing -- e.g., when can a parameter be changed, and are these times predetermined or variable?

Proposed design. We argue that these parameters be variable through the course of an IE task, being set initially to some default value. Furthermore, while the user has priority in changing the parameter values, the system should be capable of suggesting values to the user on the basis of its evaluation of data patterns and, if the user does not object, resetting these values and thus adapting the interface to the task conditions.
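To make the proposal concrete, a task parameter with a user-overridable default might be represented as below. This is a sketch under our own naming assumptions, not part of an existing system.

from dataclasses import dataclass

@dataclass
class TaskParameter:
    # One adjustable task parameter: the user has priority, the system may suggest.
    name: str
    value: object
    set_by: str = "default"          # "default", "user", or "system"

    def user_set(self, value):
        self.value, self.set_by = value, "user"

    def system_suggest(self, value, user_objected=False):
        # The system may reset a value it proposed, but never overrides the user.
        if not user_objected and self.set_by != "user":
            self.value, self.set_by = value, "system"

# Example: how many documents to examine before moving to the next phase.
batch_size = TaskParameter("documents_per_phase", 10)
batch_size.system_suggest(25)        # accepted: the user has not set or objected
batch_size.user_set(5)               # the user takes priority from now on
batch_size.system_suggest(50)        # ignored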
Knowledge representation

An information extraction system must maintain an internal representation of the language model for processing document text. This will require such components as a lexicon of part-of-speech and semantic features for words and phrases, and a rule set for syntactic and discourse structures. When documents are written in a semi-structured language, these components can be appropriately simplified.

In addition, a representation of the information contained inside a document will need to be maintained; as relevant concepts are recognized in the document, they will be added to this representation and integrated into it by being related to other concepts already present. Such a semantic model may be maintained either explicitly (e.g., as a semantic network (Morgan et al. 1995)) or implicitly in the system's language components. Relations between documents may also need to be represented.

We have proposed a "tripartite" semantic model, dividing the represented information into a query model, domain model, and corpus model (Vanderheyden & Cohen 1998). Briefly, the domain model is a network of concepts relevant to the domain -- the general subject area relevant to the query. The query model is a sub-network of the domain model, containing only the concepts asked for in the query and thus expected in the results returned by the system; the query model concepts, however, may have additional constraints (e.g., values may be restricted to a subset of those allowed in the corresponding domain concept). Finally, the corpus model connects the domain model to information in documents that the system may have processed (in part or in full) but that is not contained in the domain. As new concepts are identified by the system or user, they will need to be integrated into the system's various language components and related both to the text patterns in which they appear in a document as well as to the concepts already in the semantic model.

Proposed design. At present, we allow concepts to be related to one another vertically in a simple subsumption hierarchy (containing only AND and OR links), and horizontally using simple pre-defined relations (IS-A and HAS-A); all other relations are themselves defined as concepts (e.g., relation:HAS-MORE-THAN-ONE-OF might be defined as relation:HAS-A -> concept:MORE-THAN-ONE -> relation:IS-A). Relations could therefore be involved in subsumption hierarchies as well.

Such a semantic representation provides a clear contrast to representing concepts implicitly using a rule syntax, and allows us to investigate issues related to both interfaces and interaction methods (see below).
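A minimal sketch of this kind of concept representation follows: vertical subsumption links plus pre-defined horizontal relations, with other relations themselves modelled as concepts. The class and instance names are ours and purely illustrative, not an implemented API.

class Concept:
    # A node in the semantic model; relations between concepts are named links,
    # and relations other than IS-A and HAS-A are themselves concepts.
    def __init__(self, name):
        self.name = name
        self.parents = []        # vertical subsumption (AND/OR links simplified here)
        self.relations = []      # horizontal links: (relation name, target concept)

    def subsume_under(self, parent):
        self.parents.append(parent)

    def relate(self, relation, target):
        self.relations.append((relation, target))

# A small domain-model fragment for the terrorism example.
event = Concept("TerrorismEvent")
kidnapping = Concept("Kidnapping")
kidnapping.subsume_under(event)                 # Kidnapping is subsumed by TerrorismEvent

perpetrator = Concept("Perpetrator")
kidnapping.relate("HAS-A", perpetrator)         # pre-defined horizontal relation

# A relation defined as a concept, roughly following the HAS-MORE-THAN-ONE-OF example.
more_than_one = Concept("MORE-THAN-ONE")
has_more_than_one_of = Concept("HAS-MORE-THAN-ONE-OF")
has_more_than_one_of.relate("HAS-A", more_than_one)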

Interface

The IE system stores its knowledge in a variety of repositories and forms, including:
- annotated document text,
- the semantic model,
- rule syntax,
- the lexicon, or lists of known words,
- a listing of query results found so far.

Any of these can be large, with the corpus possibly containing gigabytes of text. It is challenging, therefore, to provide the user with intuitive access to all of this data. In knowledge discovery applications, which generally involve large amounts of numerical data, graphical interfaces are common; however, it may be difficult to visualize textual information graphically. A natural alternative would be to use a natural language interface, though this also presents a number of challenges.

One approach to information retrieval (Rose & Belew 1991) involves a graphical representation of a small amount of textual information; only the concepts deemed most relevant are shown. The user is then able to manipulate these concepts, as well as query them for further information.

Proposed design. We encourage making many views of the system knowledge available and allowing the user to select the preferred view according to personal, task, and information characteristics. And we are encouraged by the interface developed by Rose & Belew, in which the user had access to several of these views in parallel. User-driven navigation enables both purposeful search behaviour as well as opportunistic browsing. Navigation in one view, as well as any modifications made to the system knowledge, require the other views to be updated accordingly. There are some factors, however, that may not allow continuous real-time updates across all views. For example, too many changes occurring at once may be distracting to the user, while also taxing the system performance.
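The parallel-views proposal implies some mechanism for propagating changes in the system knowledge to every open view, with notifications possibly batched or delayed to avoid distracting the user. The following is a bare-bones sketch of that idea, using names of our own invention.

class KnowledgeStore:
    # Shared system knowledge observed by several interface views.
    def __init__(self):
        self.views = []

    def register(self, view):
        self.views.append(view)

    def changed(self, item):
        # Notify every registered view; a real interface might batch or delay
        # these notifications so the user is not flooded with changes.
        for view in self.views:
            view.refresh(item)

class TextView:
    def refresh(self, item):
        print(f"text view: re-render annotations for {item}")

class GraphView:
    def refresh(self, item):
        print(f"graph view: redraw concept node {item}")

store = KnowledgeStore()
store.register(TextView())
store.register(GraphView())
store.changed("Kidnapping")   # both views update in step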
Interaction

The interface provides the user with a window into the system knowledge, allowing the user to learn about:
- document content,
- the current definition of the query and domain models,
- the current mapping from query and domain models to document text.

With this information, the user may see that the system is missing a concept, or a mapping from the concept to the text. Information extraction can be performed as a completely user-driven task, with the system performing only those functions that the user specifies. However, the user would need to be paying attention to a great deal of information in order to perform the task in an efficient manner. Are any of the annotations in the annotated documents incorrect? Are any of the rules of syntax or discourse, or semantic relations, more general or more specific than they should be? Are the documents being examined in the optimal order?

In any task involving this much information, the system should provide as much support to the user's actions and decisions as possible. There is some very interesting and relevant work in the area of "mixed-initiative" planning and system design (we discuss some of this work in (Vanderheyden & Cohen 1999a)), in which system and user cooperate in performing a task, each contributing the skills and knowledge for which they are best suited. While the user has the advantage of high-level knowledge of the user's own information need and (at least to some extent) of the domain, the system can have the advantage of a low-level representation of patterns in the corpus as a result of its ability to process large amounts of text very quickly. Thus, the user and system can, in general, play complementary roles in guiding the task.

The mixed initiative must be monitored. The system and user need to make each other aware of what each is doing (referred to as "context registration"), in order that conflicts can be avoided.

When the system has relevant information, it may choose to "take the initiative" of interacting with the user. There will be constraints on when and how often interaction should occur; not so often as to annoy the user, for example. An opportunity for the system to initiate interaction with the user is when a partial answer to the user's query is identified in a document, or the document offers a critical opportunity for deciding whether a subset of the system's current rules is accurate or not.

Proposed design. When the user is setting task parameters, the system could provide information about relevant patterns in the document text. For example, by performing some lookahead to documents not yet seen by the user, the system could suggest which documents the user should examine next and in what order. Where possible, the system processes information while the user is idle, but without causing undue delay when the user is ready to resume their activity.

Learning and adaptation

While it is important that the system support various opportunities for interaction with the user, it is nevertheless impractical for user and system to interact on every decision. The ability of the system to learn about the documents that it processes, as well as about the user, can help to keep this amount of interaction bounded. For example, information extraction systems use learning methods in order to integrate new words into their lexicon, acquire rules for parsing syntax and discourse, and recognize relations between domain concepts (Wermter, Riloff, & Scheler 1996). A system is thus able to adapt itself autonomously to changes in some aspects of the task.

The general rule of thumb has often been that learning in larger increments (i.e., batch learning) improves accuracy. However, learning in smaller increments (i.e., online learning) allows better adaptivity to changing elements in the task. Furthermore, when the system is able to control how data for learning is selected (i.e., active learning), the requirement for interaction with the user is often further reduced.

Caution must be taken in how the system updates its knowledge bases as a result of patterns identified in the text. The user's information need may be evolving, for example, as the user learns more about the domain, and about what kind of information the system is identifying; this kind of "evolving query" (Vanderheyden & Cohen 1999b) is a natural consequence of an information gathering task. For example, a user may have a query about terrorist activities, asking for the names of perpetrators and the locations of targets; seeing an interim output from the system prompts the user to refine the query to define terrorist activities as only involving certain kinds of weapons.

The query is not the only element of the task that may change over time. Certainly the domain evolves as additional documents are processed. As well, when the corpus is very large or dynamic (e.g., the Web), the corpus itself may be seen as evolving, if rules for mapping text patterns to query items that apply at one time or for one subcorpus no longer apply for another. The idea that task requirements evolve is consistent with much research within the information retrieval community, but in contrast to traditional approaches to information extraction -- e.g., as seen in recent MUC-6 (1995) and MUC-7 (1999) papers -- which implicitly assume that task requirements are static for the duration of the task.

Proposed design. Using machine learning techniques, the system is able to suggest what values are appropriate for task parameters. For example, one of the parameters the system should monitor is how well the documents fit its current rule set. In this way, critical documents can be identified, and the system can decide between alternatives more efficiently while preventing the user from annotating redundant documents that provide it with no new knowledge.
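One way to read the "critical documents" idea is as uncertainty-based selection: ask the user about the unannotated document on which the current rules are least decisive, and skip documents the rules already handle confidently. The scoring function below is our own illustration under assumed interfaces (a per-document confidence score and an annotated flag), not a description of the algorithm of any existing system.

def pick_document_for_user(documents, rules):
    # Return the unannotated document whose rule matches are least decisive.
    # `rules.confidence(doc)` is assumed to return a score in [0, 1]; documents
    # the rules already handle confidently are filtered out so the user is not
    # asked to annotate redundant material.
    candidates = [d for d in documents if not d.annotated]
    informative = [d for d in candidates if rules.confidence(d) < 0.9]
    if not informative:
        return None                      # nothing new to learn from the user
    # The closer a score is to 0.5, the less decisive the current rules are.
    return min(informative, key=lambda d: abs(rules.confidence(d) - 0.5))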

Conclusions

In our continuing work, we would like to show that the interactive approach that we have suggested for information extraction is both possible and practical from the perspectives of the user and of the system.

From the perspective of the user, research in the library sciences and in user-centered design both strongly suggests that an interactive approach is natural and necessary to identify and solve the user's information need, particularly when the user is not completely confident about the topic area, the kinds of information that are available, or the specifics of the system. People are not always effective at expressing their information need, perhaps not having completely formulated it in their own minds.

From the perspective of the system, there is evidence to suggest that interactive approaches to learning perform as well as or better than approaches involving less interaction. However, the most convincing evidence will be a demonstration that this is the case.

In conclusion, and for the purpose of raising discussion, we make the following claims:

Claim 1. The user must be involved in constructing and maintaining the system's knowledge base.

Claim 2. The knowledge represented in the system, the process for gathering information, and the way in which it is presented to the user must be able to change and evolve as the user's needs evolve.

Claim 3. The system and user should be responsible for those aspects of the task that they are each best qualified to perform, and these may change as the task progresses.

We look forward to an interesting symposium!
References

AAAI'98-AIII. 1998. Proceedings of the AAAI'98 Workshop on AI and Information Integration, Madison, Wisconsin, USA.

Aberdeen, J.; Burger, J.; Day, D.; Hirschman, L.; Robinson, P.; and Vilain, M. 1995. MITRE: Description of the Alembic system used for MUC-6. In MUC-6 (1995), 141-155.

Allen, B. L. 1996. Information Tasks: Toward a User-Centered Approach to Information Systems. Library and Information Science. Academic Press: San Diego, CA.

Appelt, D. E.; Hobbs, J. R.; Bear, J.; Israel, D.; Kameyama, M.; Kehler, A.; Martin, D.; Myers, K.; and Tyson, M. 1995. SRI International FASTUS system: MUC-6 test results and analysis. In MUC-6 (1995), 237-248.

Bird, S., and Liberman, M. 1999. Annotation graphs as a framework for multidimensional linguistic data analysis. In Towards Standards and Tools for Discourse Tagging, Proceedings of the Workshop, 1-10. Association for Computational Linguistics.

Feldman, R., and Dagan, I. 1995. Knowledge discovery in textual databases (KDT). In Proceedings of the First International Conference on Knowledge Discovery (KDD-95).

Gaizauskas, R., and Robertson, A. M. 1997. Coupling information retrieval and information extraction: A new text technology for gathering information from the web. In Conference Proceedings of RIAO 97: Computer-Assisted Information Searching on the Internet, volume 1, 356-370.

Lewis, D. D., and Sparck Jones, K. 1996. Natural language processing for information retrieval. Communications of the ACM 39(1):92-101.

Morgan, R.; Garigliano, R.; Callaghan, P.; Poria, S.; Smith, M.; Urbanowicz, A.; Collingham, R.; Costantino, M.; Cooper, C.; and the LOLITA Group. 1995. University of Durham: Description of the LOLITA system as used in MUC-6. In MUC-6 (1995), 71-85.

MUC-6. 1995. Proceedings of the Sixth Message Understanding Conference (MUC-6), Columbia, Maryland, USA. Morgan Kaufmann: San Francisco, USA.

MUC-7. 1999. Proceedings of the Seventh Message Understanding Conference (MUC-7). Available online from the SAIC website at http://www.muc.saic.com/.

Rose, D. E., and Belew, R. K. 1991. Toward a direct-manipulation interface for conceptual information retrieval systems. Greenwood Press: Westport, Connecticut, USA. Chapter 3, 39-53.

Twidale, M. B., and Nichols, D. M. 1998. Designing interfaces to support collaboration in information retrieval. Interacting With Computers 10(2):177-193.

Vanderheyden, P., and Cohen, R. 1998. Information extraction and the casual user. In AAAI'98-AIII (1998), 137-142.

Vanderheyden, P., and Cohen, R. 1999a. Designing a mixed-initiative information extraction system. In Working Notes of the AAAI'99 Workshop on Mixed-Initiative Intelligence, 142-146. Orlando, Florida, USA: AAAI'99-MII.

Vanderheyden, P., and Cohen, R. 1999b. A dynamic interface for a dynamic task: Information extraction and evolving queries. In Working Notes of the AAAI'99 Spring Symposium on Agents With Adjustable Autonomy, 120-123. Stanford, California, USA: AAAI'99-SSAWAA.

Weischedel, R. 1995. BBN: Description of the PLUM system as used for MUC-6. In MUC-6 (1995), 55-69.

Wermter, S.; Riloff, E.; and Scheler, G., eds. 1996. Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing. Lecture Notes in Artificial Intelligence. Springer, Berlin.
