A Re-Examination of IR Techniques in QA System

Total Page:16

File Type:pdf, Size:1020Kb

A Re-Examination of IR Techniques in QA System A re-examination of IR techniques in QA system Yi Chang Hongbo Xu Shuo Bai Institute of Computing Institute of Computing Institute of Computing Technology, Chinese Technology, Chinese Technology, Chinese Academy of Sciences, Academy of Sciences, Academy of Sciences, Beijing 100080 Beijing 100080 Beijing 100080 [email protected] [email protected] [email protected] t.ac.cn .cn tive mature while IE techniques are still under de- Abstract veloping, most of current researches have focused on answer extraction (Moldovan et al., 2002; The performance of Information Retrieval Soubbotin et al., 2001; Srihari and Li, 1999). There in the Question Answering system is not is little detailed investigation into the IR perform- satisfactory from our experiences in ance which impacts on overall QA system per- TREC QA Track. In this article, we take a formance. Clarke et al. (2000) proposed a passage comparative study to re-examine IR tech- retrieval technique based on passage length and niques on document retrieval and sen- term weights. Tellex et al. (2003) make a quantita- tence level retrieval respectively. Our tive evaluation of various passage retrieval alg o- study shows: 1) query reformulation rithms for QA. Monz (2003) compares the should be a necessary step to achieve a effectiveness of some common document retrieval better retrieval performance; 2) The tech- techniques when they were used in QA. Roberts niques for document retrieval are also ef- and Gaizauskas (2003) use coverage and answer fective in sentence level retrieval, and redundancy to evaluate a variety of passage re- single sentence will be the appropriate re- trieval approaches with TREC QA questions. trieval granularity. In most of current researches, the granularity for information retrieval in QA is passage or document. What is the potential of IR in QA and 1 Introduction what is the most appropriate granularity for re- trieval still need to be explored thoroughly. Information Retrieval (IR) and Information Extrac- We have built our QA system based on the co- tion (IE) are generally regarded as the two key operation of IE and IR. According to our score and techniques to Natural Language Question Answer- rank on past several TREC conferences, although ing (QA) System that returns exact answers. IE we are making progress each year, the results are techniques are incorporated to identify the exact still far from satisfactory. As our recent study answer while IR techniques are used to narrow the shows, IR results in much more loss comparing search space that IE will process, that is, the output with IE. Therefore, we re-examine two important of the IR is the input of IE. questions that have ever been overlooked: According to TREC QA overview (Voorhees, 2001; Voorhees, 2002), most current question an- · Whether a question is a good query for re- swering systems rely on document retrieval to pro- trieval in QA? vide doc uments or passages that are likely to contain the answer to a question. Since document- oriented information retrieval techniques are rela- · Whether the techniques for document re- two consecutive Bi-sentences; a phrase means a trieval are effective on sentence level re- sequence of keywords or one keyword in a ques- trieval? tion, where a keyword is a word in the question but not in Stop-word list. In this paper, we compare some alternative IR To answer each question, Question Analyzing techniques and all our experiments are based on Module makes use of NLP techniques to identify TREC 2003 QA AQUAINT corpus. To make a the right type of information that the question re- thorough analysis, we focus on those questions quires. Question Analyzing Module also preproc- with short, fact-based answers, called Factoid esses the question and makes a query for further que stions in TREC QA. retrieval. Document Retrieval Engine use the ques- In Section 2, we describe our system architec- n ture and evaluate the performance of each module. tion to get relevant documents and selects top ranked relevant documents. Since the selected Then in section 3, according to the comparison of four document retrieval methods, we find the rea- doc uments contain too much information , Sentence Level Retrieval Module matches the question with son to limit our retrieval performance. We then the selected relevant documents to get relevant Bi- present in Section 4 the results of four sentence m level retrieval methods and in Section 5 we re- sentences, and selects top ranked Bi-sentences. Entity Recognizing Module identifies the candi- search different retrieval granularities. Finally, date entities from the selected Bi-sentences, and Section 6 summarizes the conclusions. Answering Selecting Module chooses the answer in a voting method. 2 System Description In our TREC 2003 runs, we incorporate PRISE Our system to answer Factoid questions contains and Multilevel retrieval method and we select 50 five major modules, namely Question Analyzing documents and 20 Bi-sentences. As our recent Module, Document Retrieval Engine, Sentence study shows: among the 383 Factoid questions Level Retrieval Module, Entity Recognizing Mod- whose answer is not NIL, there are only 275 ques- ule and Answer Selecting Module. Figure 2.1 illus- tions whose answer could be got from the top 50 trates the architecture. relevant documents, while there are only 132 out of 275 questions whose answer could be extracted from the top 20 relevant Bi-sentences. All our sta- tistics are based on the examination of both answer and the corresponding Document ID. Figure 2.2 illustrates the loss distribution of each module in our QA system. It is amazing that more than sixty percent of loss is caused by IR while we used to take it for granted that IE is the In this paper, a Bi-sentence means two consecu- bottleneck of QA system. So we should re-examine tive sentences and there is no overlapping between the IR techniques in QA system, and we take ac- count of document retrieval and sentence level re- basic SMART system, the performance of docu- trieval respectively. ment retrieval was greatly improved. In TREC 2002 Web Topic Distillation Task, this method has 3 Document Retrieval Methods been proven to be very effective and efficient. Inspired by the success in Web Track, we Is a question a good query? If the answer is YES, studied the performance of Enhanced-SMART sys- the retrieval algorithm will determine the perform- tem in TREC QA Track. ance of document retrieval in QA system. When we implement a document retrieval system with 3.4 Enhanced-SMART with pseudo-relevant high performance, we could get a satisfactory re- feedback (E-SMART-FB) sult. To explore this, we retrieve relevant docu- We think one of the difficulties of IR in QA system ments for each question from the AQUAINT is that the number of keywords is too limited in a document set with four IR systems: PRISE, basic question. It is natural to expand the query with SMART, Enhanced-SMART and Enhanced pseudo-relevant feedback methods. SMART with pseudo-relevant feedback. In these Pseudo-relevant feedback is an effective method experiments, we use the question itself as the query to make up for lack of keywords in a query. In our for IR system. pseudo-relevant feedback, several keywords with 3.1 PRISE the highest term frequency in the top k ranked documents returned by the first retrieval are added The NIST PRISE is a full text retrieval system into the initial query, and then we make a second based on Tfidf weighting, which uses fast and retrieval. In the experiments on Web Track after space efficient indexing and searching algorithms TREC 2002, we correct a bug in SMART feedback proposed by Harman and Candela (1989). To fa- component. Our experiments shows that the cilitate those participants in the TREC QA Track evaluation performance in Web Topic Distillation who do not have ready access to a document re- Task can outperform that of any other system when trieval system, NIST also publishes a ranked we introduce pseudo-relevant feedback into the document list retrieved by PRISE, in which the top Enhanced-SMART. 1000 ranked documents per question are provided. We try to explore the potential of document re- trieval in TREC QA Track, so we also do experi- 3.2 Basic SMART ments in QA document retrieval by Enhanced- SMART (Salton, 1971) is a well-known and effec- SMART with pseudo-relevant feedback. tive information retrieval system based on Vector Space Model (VSM), which was invented by Sal- 3.5 Performance Comparison ton in the 1960s. Here we give the experiments Figure 3.1 shows the comparison of four document with basic SMART and compare its performance retrieval methods. X coordinate means the number with PRISE to make a complete evaluation. of documents we select to make statistics. Y coor- dinate means the number of the questions that 3.3 Enhanced-SMART (E-SMART) could be correctly answered, and here we mean the According to the study of Xu and Yang (2002) , the exact answers of these questions are contained in classical weighting methods used in basic SMART the selected documents. such as lnc-ltc do not behave well in TREC. According to figure 3.1, the results of Enhanced- In TREC 2002 Web Track, Yang improved the SMART retrieval technique, which proved to be traditional Lnu-Ltu weighting method. Following one of the best in TREC Web Track, are almost the the general hypothesis that relevance of a docu- same as PRISE; basic SMART performs worst, ment in retrieval is irrelevant to its length, Yang and the performances of other three methods are thought over the normalization of the query’s similar.
Recommended publications
  • Technical Reference Manual for the Standardization of Geographical Names United Nations Group of Experts on Geographical Names
    ST/ESA/STAT/SER.M/87 Department of Economic and Social Affairs Statistics Division Technical reference manual for the standardization of geographical names United Nations Group of Experts on Geographical Names United Nations New York, 2007 The Department of Economic and Social Affairs of the United Nations Secretariat is a vital interface between global policies in the economic, social and environmental spheres and national action. The Department works in three main interlinked areas: (i) it compiles, generates and analyses a wide range of economic, social and environmental data and information on which Member States of the United Nations draw to review common problems and to take stock of policy options; (ii) it facilitates the negotiations of Member States in many intergovernmental bodies on joint courses of action to address ongoing or emerging global challenges; and (iii) it advises interested Governments on the ways and means of translating policy frameworks developed in United Nations conferences and summits into programmes at the country level and, through technical assistance, helps build national capacities. NOTE The designations employed and the presentation of material in the present publication do not imply the expression of any opinion whatsoever on the part of the Secretariat of the United Nations concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. The term “country” as used in the text of this publication also refers, as appropriate, to territories or areas. Symbols of United Nations documents are composed of capital letters combined with figures. ST/ESA/STAT/SER.M/87 UNITED NATIONS PUBLICATION Sales No.
    [Show full text]
  • A Structural Analysis of Mide Chants
    A Structural Analysis of Mide Chants GEORGE FULFORD McMaster University Introduction In this paper I shall investigate the relationship between words and im­ agery in seven song scrolls used by members of an Ojibwa religious society known as the Midewiwin. These texts were collected in the late 1880s by W.J. Hoffman for the Bureau of American Ethnology and subsequently published in their seventh Annual Report (Hoffman 1891).1 All the pictographs which I shall discuss were inscribed on birch bark and used by members of the Midewiwin to record chants used in their ceremonies. According to Hoffman (1891:192) these chants consisted of only a few words or short phrases. They were sung by single individuals — never in chorus — and were repeated over and over again, usually to the accompaniment of a wooden kettle drum. In a previous study (Fulford 1989) I analyzed patterns of structural variation among these pictographs and outlined how three complex symbols — the otter, bear and bird — evolved from clan emblems into pictographic markers. The focus of my earlier study was purely iconographic; in this paper I shall explore the verbal structure of Midewiwin chants in order to show some of the ways in which they were pictographically encoded. For the sake of convenience, I have limited my discussion to song scrolls sharing the otter symbol. Six of the seven scrolls that I shall examine contain this marking device. Although one (designated Scroll C in the appendix) lacks an otter, it displays many other formal similarities with Hoffman published transcriptions and translations of 23 songs performed at the White Earth Reservation in northwestern Minnesota.
    [Show full text]
  • 5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721
    Internet Engineering Task Force (IETF) P. Faltstrom, Ed. Request for Comments: 5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721 The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) Abstract This document specifies rules for deciding whether a code point, considered in isolation or in context, is a candidate for inclusion in an Internationalized Domain Name (IDN). It is part of the specification of Internationalizing Domain Names in Applications 2008 (IDNA2008). Status of This Memo This is an Internet Standards Track document. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc5892. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
    [Show full text]
  • Etir Code Lists
    eTIR Code Lists Code lists CL01 Equipment size and type description code (UN/EDIFACT 8155) Code specifying the size and type of equipment. 1 Dime coated tank A tank coated with dime. 2 Epoxy coated tank A tank coated with epoxy. 6 Pressurized tank A tank capable of holding pressurized goods. 7 Refrigerated tank A tank capable of keeping goods refrigerated. 9 Stainless steel tank A tank made of stainless steel. 10 Nonworking reefer container 40 ft A 40 foot refrigerated container that is not actively controlling temperature of the product. 12 Europallet 80 x 120 cm. 13 Scandinavian pallet 100 x 120 cm. 14 Trailer Non self-propelled vehicle designed for the carriage of cargo so that it can be towed by a motor vehicle. 15 Nonworking reefer container 20 ft A 20 foot refrigerated container that is not actively controlling temperature of the product. 16 Exchangeable pallet Standard pallet exchangeable following international convention. 17 Semi-trailer Non self propelled vehicle without front wheels designed for the carriage of cargo and provided with a kingpin. 18 Tank container 20 feet A tank container with a length of 20 feet. 19 Tank container 30 feet A tank container with a length of 30 feet. 20 Tank container 40 feet A tank container with a length of 40 feet. 21 Container IC 20 feet A container owned by InterContainer, a European railway subsidiary, with a length of 20 feet. 22 Container IC 30 feet A container owned by InterContainer, a European railway subsidiary, with a length of 30 feet. 23 Container IC 40 feet A container owned by InterContainer, a European railway subsidiary, with a length of 40 feet.
    [Show full text]
  • Management of Stroke in Neonates and Children: a Scientific Statement from the American Heart Association/American Stroke Association
    UCSF UC San Francisco Previously Published Works Title Management of Stroke in Neonates and Children: A Scientific Statement From the American Heart Association/American Stroke Association. Permalink https://escholarship.org/uc/item/0sw5j7c1 Journal Stroke, 50(3) ISSN 0039-2499 Authors Ferriero, Donna M Fullerton, Heather J Bernard, Timothy J et al. Publication Date 2019-03-01 DOI 10.1161/str.0000000000000183 Peer reviewed eScholarship.org Powered by the California Digital Library University of California AHA/ASA Scientific Statement Management of Stroke in Neonates and Children A Scientific Statement From the American Heart Association/American Stroke Association The American Academy of Neurology affirms the value of this statement as an educational tool for neurologists. Donna M. Ferriero, MD, MS, FAHA, Co-Chair; Heather J. Fullerton, MD, MAS, Co-Chair; Timothy J. Bernard, MD, MSCS; Lori Billinghurst, MD, MSc, FRCPC; Stephen R. Daniels, MD, PhD; Michael R. DeBaun, MD, MPH; Gabrielle deVeber, MD; Rebecca N. Ichord, MD; Lori C. Jordan, MD, PhD, FAHA; Patricia Massicotte, MSc, MD, MHSc; Jennifer Meldau, MSN; E. Steve Roach, MD, FAHA; Edward R. Smith, MD; on behalf of the American Heart Association Stroke Council and Council on Cardiovascular and Stroke Nursing Purpose—Much has transpired since the last scientific statement on pediatric stroke was published 10 years ago. Although stroke has long been recognized as an adult health problem causing substantial morbidity and mortality, it is also an important cause of acquired brain injury in young patients, occurring most commonly in the neonate and throughout childhood. This scientific statement represents a synthesis of data and a consensus of the leading experts in childhood cardiovascular disease and stroke.
    [Show full text]
  • K191201 Trade/Device Name
    November 15, 2019 Hiossen, Inc. Peter Lee QA/RA Manager 85 Ben Fairless Drive Fairless Hills, Pennsylvania 19030 Re: K191201 Trade/Device Name: EM SA Implant System Regulation Number: 21 CFR 872.3640 Regulation Name: Endosseous Dental Implant Regulatory Class: Class II Product Code: DZE Dated: November 14, 2019 Received: November 15, 2019 Dear Peter Lee: We have reviewed your Section 510(k) premarket notification of intent to market the device referenced above and have determined the device is substantially equivalent (for the indications for use stated in the enclosure) to legally marketed predicate devices marketed in interstate commerce prior to May 28, 1976, the enactment date of the Medical Device Amendments, or to devices that have been reclassified in accordance with the provisions of the Federal Food, Drug, and Cosmetic Act (Act) that do not require approval of a premarket approval application (PMA). You may, therefore, market the device, subject to the general controls provisions of the Act. Although this letter refers to your product as a device, please be aware that some cleared products may instead be combination products. The 510(k) Premarket Notification Database located at https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm identifies combination product submissions. The general controls provisions of the Act include requirements for annual registration, listing of devices, good manufacturing practice, labeling, and prohibitions against misbranding and adulteration. Please note: CDRH does not evaluate information related to contract liability warranties. We remind you, however, that device labeling must be truthful and not misleading. If your device is classified (see above) into either class II (Special Controls) or class III (PMA), it may be subject to additional controls.
    [Show full text]
  • Partitions with Prescribed Hook Differences
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Elsevier - Publisher Connector Europ. J. Combinatorics (1987) 8, 341-350 Partitions with Prescribed Hook Differences GEORGE E. ANDREWS,· R. J. BAXTER, D. M. BRESSOUD,t W. H. BURGE, P. J. FORRESTER AND G. VIENNOT We investigate partition identities related to off-diagonal hook differences. Our results generalize previous extensions of the Rogers-Ramanujan identities. The identity of the related polynomials with constructs in statistical mechanics is discussed. 1. INTRODUCTION I. Schur's polynomial proof of the Rogers-Ramanujan identities [15] has been the starting point for several diverse studies. The polynomials Schur introduced were generalized to provide partition identities related to successive ranks of partitions [1], [2], [5], [10), [11], [12], [13]. Subsequently these same polynomials were found to be the key element in providing Rogers-Ramanujan type identities for Regime II of the hard hexagon model [4], [9; ch. 14]. In [6], three of us considered an extensive generalization of the hard hexagon model, and we found that polynomial approximations to certain sums of eigenvalues played an essen­ tial role. Again these polynomials were clearly of the same general structure as those introduced by Schur. In this paper we consider in full generality the families of polynomials that arose in [6]. The statistic of partitions that plays the main role is that of 'off-diagonal hook difference', a topic introduced in [13]. In Section 2 we provide the necessary combinatorial preliminaries. In Section 3 we prove a general partition theorem which generalized the results in [1], [2], [5], [10], [11].
    [Show full text]
  • BASHKIR MANUAL Nicholas Poppe (University of Washington), and Andreas Tietze (University of California, Los Angeles), Consulting Editors
    f,;7 . I i / Indiana University Publications Graduate School Uralic and Altaic Series THOMAS A. SEBEOK, Editor Billie Greene, Assistant to the Editor Fred W. Householder, Jr., John R. Krueger, Felix J. Oinas, Alo Raun, Denis Sinor, and Odan Schiltz, Associate Editors GyuIa Decsy (University of Hamburg), Lawrence Krader (Syracuse Univer­ sity), John Lotz (Columbia University), Samuel E. Martin (Yale University), BASHKIR MANUAL Nicholas Poppe (University of Washington), and Andreas Tietze (University of California, Los Angeles), Consulting Editors Vol. American Studies in Uralic Linguistics, Edited by the Indiana University Committee on Uralic Studies (1960) o.p. Vol. 2 Buriat Grammar, by Nicholas Poppe (1960) - $3.00 Vol. 3 The Structure and Development of the Finnish Language, by Lauri Hakulinen (trans. by John Atkinson) (1961) $7.00 Vol. 4 Dagur Mongolian: Grammar Texts, and Lexicon, by Samuel E. Martin (1961) $5.00 Vol. 5 An Eastern Cheremis Manual: Phonology, Grammar, Texts, and Glossary, by Thomas A. Sebeok and Frances J. Ingemann (1961) - $4.00 Vol. 6 The Phonology of Modern Standard Turkish, byR.B.Lees(1961) - $3.50 Vol. 7 Chuvash Manual: Introduction, Grammar, Reader, and Vocabulary, by John R. Krueger (1961) $5.50 Vol. 8 Buriat Reader, by James E. Bosson (supervised and edited by Nicholas Poppe) (1962) - $2.00 Vol. 9 Latvian and Finnic Linguistic Convergences, by Valdis J. Zeps (1962) - $5.00 Vol. 10 Uzhek Newspaper Reader (with Glossary), by Nicholas Poppe, Jr. (1962) - $2.00 Vol. 11 Hungarian Reader (Folklore and Literature) With Notes, Edited by John Lotz (1962) $1.00 (continued inside back cover) t._._ Vt5f ~ HL9~1 AMERICAN COUNCIL OF LEARNED SOCIETIES Research and Studies in Uralic and Altaic Languages BASHKIR MANUAL Descriptive Grammar and Texts Project No.
    [Show full text]
  • 2Nd Stage International Geopolitical Competition TEST
    2nd stage International Geopolitical Competition TEST 11 VII 2020 RULES • The test duration is 60 minutes and it contains 60 test questions. You shall put your answers in previously sent form and it shouldn’t be sent back later than midday (Warsaw time). • You are prohibited to use any help from other people. Your personal notes, books, or the Internet is allowed. • For correct response you will gain 1 point, for no answer there is no penalty, and for wrong one there is -1 point penalty. • You are not allowed to change previously given answers. To answer a question put X mark on a table in your printed form. 1. What does powermetrics as subdiscipline of geopolitics deal with? A. History of geopolitical doctrines B. Statistics of international agreements C. Biographies of outstanding geopoliticians D. Measuring the length of national borders E. Measuring the power of political units F. Measurement of the soldiers’ physical ability QA: prof.. dr hab.. M. Sułek 2. Who introduced the concept of soft power to international relations science? A. Niccoló Machiavelli B. Rudolf Kjellén C. John Mearsheimer D. Karl Haushofer E. Joseph Nye F. Henry Kissinger QA: prof.. dr hab.. M. Sułek 3. Smart power is the connected way of using of: A. Hard power and sharp power B. Hard power and soft power C. Hard power and sticky power D. Soft power and sticky power E. Sharp power and soft power F. Sharp power and sticky power QA: prof.. dr hab.. M. Sułek 4. The main determinant of the state’s international position is: A.
    [Show full text]
  • Delaware River Water Quality ^Ristol to Marcus Hook Pennsylvania August 1949 to December 1963 R«» W
    Delaware River Water Quality ^ristol to Marcus Hook Pennsylvania August 1949 to December 1963 r«» W. B. KEIGHTON ONTRIBUTIONS TO THE HYDROLOGY OF THE UNITED STATES GEOLOGICAL SURVEY WATER-SUPPLY PAPER 1809-O ^repared in cooperation with the :ity of Philadelphia '.JNITED STATES GOVERNMENT PRINTING OFFICE, WASHINGTON : 1965 UNITED STATES DEPARTMENT OF THE INTERIOR STEWART L. UDALL, Secretory GEOLOGICAL SURVEY Thomas B. Nolan, Director For sale by the Superintendent of Documents, U.S. Government Printing Office Washington, D.C. 20402 - Price 25 cents (paper cover) CONTENTS Page Abstract_______________________________________________________ 01 Introduction. _____________________________________________________ 1 Purpose and scope_____________________________________________ 4 Previous investigations and acknowledgments.____________________ 4 Some basic physical and hydrologic facts______.______________________ 5 Graphic summaries of water-quality facts__________________________ 9 Water temperature____________________________________________ 9 Hydrogen-ion concentration (pH)______________________________ 10 Dissolved-solids concentration_________________________________ 10 Composition of dissolved solids__________________________________ 19 Chloride___________________________________-_ 24 Dissolved oxygen._____________________________________________ 27 Suspended sediment.__________________________________________ 30 Tabular summaries of water-quality characteristics ____________________ 36 Local variations in water-quality patterns___________________-__--___-
    [Show full text]
  • Is Multihop QA in DIRE Condition? Measuring and Reducing Disconnected Reasoning
    Is Multihop QA in DIRE Condition? Measuring and Reducing Disconnected Reasoning Harsh Trivedi†∗ Niranjan Balasubramanian† Tushar Khot‡ Ashish Sabharwal‡ † Stony Brook University, Stony Brook, U.S.A. fhjtrivedi,[email protected] ‡ Allen Institute for AI, Seattle, U.S.A. ftushark,[email protected] Abstract Input Which country got independence when the cold war started? Has there been real progress in multi-hop The war started in 1950. question-answering? Models often exploit The cold war started in 1947. dataset artifacts to produce correct answers, France finally got its independence. without connecting information across multi- India got independence from UK in 1947. Set of Facts ple supporting facts. This limits our ability 30 countries were involved in World War 2. to measure true progress and defeats the pur- pose of building multi-hop QA datasets. We Disconnected Reasoning Output make three contributions towards addressing Answer SF Simple Combination this. First, we formalize such undesirable be- Answer havior as disconnected reasoning across sub- India India sets of supporting facts. This allows develop- No Interaction Supporting Facts (SF) ing a model-agnostic probe for measuring how much any model can cheat via disconnected reasoning. Second, using a notion of con- trastive support sufficiency, we introduce an Figure 1: Example of disconnected reasoning, a form automatic transformation of existing datasets of bad multifact reasoning: Model arrives at the answer that reduces the amount of disconnected rea- by simply combining its outputs from two subsets of soning. Third, our experiments1 suggest that the input, neither of which contains all supporting facts. there hasn’t been much progress in multifact From one subset, it identifies the blue supporting fact QA in the reading comprehension setting.
    [Show full text]
  • A Comparison of Circle and J Hook Performance Within the Grenadian Pelagic Longline Fishery Anthony G
    Nova Southeastern University NSUWorks HCNSO Student Theses and Dissertations HCNSO Student Work 4-25-2019 A Comparison of Circle and J Hook Performance within the Grenadian Pelagic Longline Fishery Anthony G. Burns Nova Southeastern University, [email protected] Follow this and additional works at: https://nsuworks.nova.edu/occ_stuetd Part of the Marine Biology Commons, and the Oceanography and Atmospheric Sciences and Meteorology Commons Share Feedback About This Item NSUWorks Citation Anthony G. Burns. 2019. A Comparison of Circle and J Hook Performance within the Grenadian Pelagic Longline Fishery. Master's thesis. Nova Southeastern University. Retrieved from NSUWorks, . (510) https://nsuworks.nova.edu/occ_stuetd/510. This Thesis is brought to you by the HCNSO Student Work at NSUWorks. It has been accepted for inclusion in HCNSO Student Theses and Dissertations by an authorized administrator of NSUWorks. For more information, please contact [email protected]. Thesis of Anthony G. Burns Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science M.S. Marine Biology Nova Southeastern University Halmos College of Natural Sciences and Oceanography April 2019 Approved: Thesis Committee Major Professor: David Kerstetter Committee Member: Amy C. Hirons, Ph.D. Committee Member: Paul T. Arena, Ph.D. This thesis is available at NSUWorks: https://nsuworks.nova.edu/occ_stuetd/510 HALMOS COLLEGE OF NATURAL SCIENCES AND OCEANOGRAPHY A COMPARISON OF CIRCLE AND J HOOK PERFORMANCE WITHIN THE GRENADIAN PELAGIC LONGLINE FISHERY By: Anthony G. Burns Submitted to the Faculty of Halmos College of Natural Sciences and Oceanography in partial fulfillment of the requirements for the degree of Master of Science with a specialty in: Marine Biology Nova Southeastern University May 2019 Thesis of Anthony G.
    [Show full text]