
Question Answering

Evangelos Kanoulas

Examples: EVI, Google, Apple

Question answering

http://youtu.be/WFR3lOm_xhE?t=20s

Connections to Related Fields

• Information retrieval
• Natural language processing
• Machine learning
• Artificial intelligence

Question Answering

Types of Questions in Modern Systems
• Factoid questions
  – Who wrote “The Universal Declaration of Human Rights”?
  – How many calories are there in two slices of apple pie?
  – What is the average age of the onset of autism?
  – Where is Apple Computer based?
• Complex (narrative) questions
  – In children with an acute febrile illness, what is the efficacy of acetaminophen in reducing fever?
  – What do scholars think about Jefferson’s position on dealing with pirates?

Commercial systems: mainly factoid questions

• Where is the Louvre Museum located? – In Paris, France
• What’s the abbreviation for limited partnership? – L.P.
• What are the names of Odin’s ravens? – Huginn and Muninn
• What currency is used in China? – The yuan
• What kind of nuts are used in marzipan? – almonds
• What instrument does Max Roach play? – drums
• What is the telephone number? – 650-723-2300

Paradigms for QA

• IR-based approaches
  – TREC; Google
• Knowledge-based approaches
  – Apple Siri; Wolfram Alpha; Amazon Evi
• Hybrid approaches
  – IBM Watson

Many questions can already be answered by web search

IR-based Question Answering

IR-based Factoid QA

[Figure: IR-based factoid QA pipeline with stages Question Processing (Query Formulation, Answer Type Detection), Document Retrieval from an indexed document collection, Passage Retrieval, and Answer Processing]

Knowledge-based approaches

• Build a semantic representation of the query
  – Times, dates, locations, entities, numeric quantities
• Map this semantic representation to queries over structured resources
  – Geospatial databases
  – Ontologies (… infoboxes, dbPedia, WordNet, …)
  – Restaurant review sources and reservation services
  – Scientific databases
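The two knowledge-based steps above (build a semantic form, then query a structured resource) can be sketched in a few lines. This is a minimal illustration under stated assumptions: two hand-written regular expressions stand in for a semantic parser, and a Python dict stands in for the structured resource. `PATTERNS`, `KB`, `to_semantic_form`, and `answer` are hypothetical names; the two facts come from the example questions earlier in this section.

```python
import re

# Hypothetical structured resource: (relation, subject) -> value.
# Both facts are taken from the example questions in this section.
KB = {
    ("currency", "China"): "The yuan",
    ("location", "Louvre Museum"): "Paris, France",
}

# Hypothetical surface patterns standing in for a semantic parser:
# each maps one question shape to a relation and captures the subject.
PATTERNS = [
    (re.compile(r"^What currency is used in (?P<subj>.+)\?$"), "currency"),
    (re.compile(r"^Where is (?:the )?(?P<subj>.+) located\?$"), "location"),
]


def to_semantic_form(question):
    """Map a question to a (relation, subject) pair, or None if no pattern fits."""
    for pattern, relation in PATTERNS:
        match = pattern.match(question.strip())
        if match:
            return relation, match.group("subj")
    return None


def answer(question):
    """Query the structured resource with the semantic form."""
    form = to_semantic_form(question)
    return KB.get(form) if form else None


print(answer("What currency is used in China?"))      # The yuan
print(answer("Where is the Louvre Museum located?"))  # Paris, France
```

A real system would use a much richer parser and resources such as the geospatial databases and ontologies listed above; the sketch only shows the two-step shape.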

Hybrid approaches (IBM Watson)

• Build a shallow semantic representation of the query
• Generate answer candidates using IR methods
  – Augmented with ontologies and semi-structured data
• Score each candidate using richer knowledge sources
  – Geospatial databases
  – Temporal reasoning
  – Taxonomical classification
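The hybrid recipe above (generate candidates with IR, then rescore them with knowledge sources) can be sketched as a weighted combination of evidence. This is not IBM Watson's actual scoring: the `TAXONOMY` table, the single taxonomy-based evidence function, and the weights `w_ir` and `w_type` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical taxonomy standing in for a richer knowledge source.
TAXONOMY = {"Paris": "CITY", "France": "COUNTRY", "Louvre": "MUSEUM"}


@dataclass
class Candidate:
    text: str
    ir_score: float  # evidence from retrieval, e.g. the passage's retrieval score


def taxonomy_evidence(candidate, answer_type):
    """1.0 if the taxonomy assigns the candidate the expected answer type, else 0.0."""
    return 1.0 if TAXONOMY.get(candidate.text) == answer_type else 0.0


def rank(candidates, answer_type, w_ir=0.5, w_type=1.0):
    """Sort candidates by a weighted sum of IR and knowledge-source evidence."""
    def score(c):
        return w_ir * c.ir_score + w_type * taxonomy_evidence(c, answer_type)
    return sorted(candidates, key=score, reverse=True)


candidates = [Candidate("Louvre", 0.9), Candidate("France", 0.7), Candidate("Paris", 0.6)]
print([c.text for c in rank(candidates, "CITY")])  # ['Paris', 'Louvre', 'France']
```

"Paris" has the lowest retrieval score but is promoted because the taxonomy confirms it is a CITY. Other knowledge sources (geospatial containment, temporal consistency) would enter as further weighted terms in `score`.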

IR-based Factoid QA

• Question processing
  – Detect question type, answer type, focus, relations
  – Formulate queries to send to a search engine
• Passage retrieval
  – Retrieve ranked documents
  – Break into suitable passages and rerank
• Answer processing
  – Extract candidate answers
  – Rank candidates using evidence from the text and external sources

Question Processing
Things to extract from the question
• Answer Type Detection
  – Decide the named entity type (person, place) of the answer
• Question Type classification
  – Is this a definition question, a math question, a list question?
• Focus Detection
  – Find the question words that are replaced by the answer
• Relation Extraction
  – Find relations between entities in the question
• Query Formulation
  – Choose query keywords for the IR system

Question Processing

“They are two states you could be re-entering, if you're crossing Florida's northern border.”

• Answer type: US state
• Query: two states, entering, crossing, Florida, northern, border
• Focus: two states
• Relations: borders(Florida, ?x, north)

Answer Type Detection: Named Entities

• Who founded Virgin Airlines?
  – PERSON
• What Canadian city has the largest population?
  – CITY

Answer Type Taxonomy

Coarse answer types (upper case) with sample fine-grained types:
• ABBREVIATION: abbreviation, expression
• DESCRIPTION: definition, reason
• ENTITY: animal, currency, food
• HUMAN: group, individual, title
• LOCATION: city, country, state
• NUMERIC: date, distance, money, percent, size

Answer Type Detection

• Hand-written rules
• Machine learning
• Hybrids

Answer Type Detection

• Regular-expression-based rules can catch some cases:
  – Who {is|was|are|were} → PERSON
• Other rules use the question headword, i.e. the headword of the first noun phrase after the wh-word:
  – Which city in China has the largest number of foreign financial companies? (headword: city)
  – What is the state flower of California? (headword: flower)
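The two kinds of hand-written rules above can be sketched as follows. Real systems find the headword with a syntactic parser; here, as a crude stand-in, the "first noun phrase" is assumed to be the words after the wh-word (skipping copulas and determiners) up to the first preposition or verb in a small hypothetical `BOUNDARY` set. `HEADWORD_TYPES` is likewise a hypothetical lookup table.

```python
import re

# Regular-expression rule from the slide: Who {is|was|are|were} -> PERSON.
WHO_RULE = re.compile(r"^who\s+(?:is|was|are|were)\b", re.IGNORECASE)

WH_WORDS = {"what", "which", "who", "where", "when", "how"}
# Words skipped before the noun phrase starts (copulas and determiners).
SKIP = {"is", "was", "are", "were", "the", "a", "an"}
# Words assumed to end the first noun phrase (prepositions and common verbs).
BOUNDARY = {"in", "of", "on", "at", "for", "with", "has", "have", "had", "is", "was", "does", "did"}

# Hypothetical headword -> answer type table.
HEADWORD_TYPES = {"city": "CITY", "country": "COUNTRY", "state": "STATE"}


def headword(question):
    """Crude stand-in for a parser: last word of the first phrase after the wh-word."""
    tokens = re.findall(r"[a-z]+", question.lower())
    if not tokens or tokens[0] not in WH_WORDS:
        return None
    i = 1
    while i < len(tokens) and tokens[i] in SKIP:
        i += 1
    phrase = []
    while i < len(tokens) and tokens[i] not in BOUNDARY:
        phrase.append(tokens[i])
        i += 1
    return phrase[-1] if phrase else None


def detect_answer_type(question):
    """Apply the regular-expression rule first, then fall back to the headword table."""
    if WHO_RULE.match(question):
        return "PERSON"
    return HEADWORD_TYPES.get(headword(question))


print(detect_answer_type("Who is the prime minister of India?"))  # PERSON
print(headword("What is the state flower of California?"))        # flower
```

"flower" has no entry in the toy table, so `detect_answer_type` returns `None` for the California question; coverage gaps like this are one motivation for the machine-learning approach.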

• Most often, we treat the problem as machine-learning classification:
  – Define a taxonomy of question types
  – Annotate training data for each question type
  – Train classifiers for each question class using a rich set of features
    • Features include those hand-written rules!

Features for Answer Type Detection
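The machine-learning formulation above can be sketched with a tiny classifier. This is a minimal illustration, not a recommended model: the training set `TRAIN` is a hypothetical handful of questions adapted from this section and labelled with coarse classes from the taxonomy, the `features` function uses only the wh-word plus the bag of lower-cased words, and multinomial Naive Bayes with add-one smoothing is chosen only because it fits in a few lines. Richer features, such as those in the list that follows, would be added inside `features`.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training set labelled with coarse answer types.
TRAIN = [
    ("Who wrote the novel Neuromancer?", "HUMAN"),
    ("Who is the prime minister of India?", "HUMAN"),
    ("What Canadian city has the largest population?", "LOCATION"),
    ("Where is Apple Computer based?", "LOCATION"),
    ("How many calories are there in two slices of apple pie?", "NUMERIC"),
    ("How tall is Mt. Everest?", "NUMERIC"),
]


def features(question):
    """Toy feature set: the wh-word plus every lower-cased word."""
    tokens = re.findall(r"[a-z]+", question.lower())
    if not tokens:
        return []
    return ["wh=" + tokens[0]] + ["w=" + t for t in tokens]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        vocab = set()
        for question, label in examples:
            feats = features(question)
            self.class_counts[label] += 1
            self.feature_counts[label].update(feats)
            vocab.update(feats)
        self.vocab_size = len(vocab)
        self.n_examples = sum(self.class_counts.values())
        return self

    def predict(self, question):
        feats = features(question)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.vocab_size
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n / self.n_examples)
            score += sum(math.log((counts[f] + 1) / denom) for f in feats)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayes().fit(TRAIN)
print(model.predict("Who founded Virgin Airlines?"))  # HUMAN
print(model.predict("How far is the moon?"))          # NUMERIC
```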

• Question words and phrases
• Part-of-speech tags
• Parse features (headwords)
• Named entities
• Semantically related words

Keyword Selection Algorithm

1. Select all non-stop words in quotations
2. Select all NNP words (proper nouns) in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with their adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select all adverbs
9. Select the question focus words
10. Select all other words

Choosing keywords from the query

Who coined the term “cyberspace” in his novel “Neuromancer”?

Selected keywords, each tagged with the number of the rule that selected it:
cyberspace/1 Neuromancer/1 term/4 novel/4 coined/7
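The keyword-selection priorities can be sketched over a part-of-speech-tagged question. This sketch assumes a tagger has already run (the Penn Treebank tags are supplied by hand) and that quotation marks arrive as separate `"` tokens. It implements only rules 1, 2, 4, 6, 7, 8 and 10; rules 3, 5 and 9 need a chunker or focus detector and are omitted, and rule 4 is approximated by a toy heuristic (a noun immediately followed by a quotation, as in *the term "cyberspace"*). `STOPWORDS` and `keyword_priorities` are hypothetical names.

```python
# Hypothetical stop-word list for this example.
STOPWORDS = {"who", "what", "the", "a", "an", "in", "of", "his", "her", "is", "was"}


def keyword_priorities(tagged):
    """Assign each non-stop word the number of the first rule that selects it.

    `tagged` is a list of (word, Penn Treebank tag) pairs; '"' tokens mark
    quotations.
    """
    selected = []
    in_quotes = False
    for i, (word, tag) in enumerate(tagged):
        if word == '"':
            in_quotes = not in_quotes
            continue
        if not word.isalnum() or word.lower() in STOPWORDS:
            continue
        next_word = tagged[i + 1][0] if i + 1 < len(tagged) else None
        if in_quotes:
            rule = 1   # non-stop word in quotations
        elif tag in ("NNP", "NNPS"):
            rule = 2   # proper noun
        elif tag.startswith("NN") and next_word == '"':
            rule = 4   # toy proxy for a complex nominal such as: the term "X"
        elif tag.startswith("NN"):
            rule = 6   # other nouns
        elif tag.startswith("VB"):
            rule = 7   # verbs
        elif tag.startswith("RB"):
            rule = 8   # adverbs
        else:
            rule = 10  # all other words
        selected.append((word, rule))
    # sorted() is stable, so words with equal priority keep their question order.
    return sorted(selected, key=lambda pair: pair[1])


question = [("Who", "WP"), ("coined", "VBD"), ("the", "DT"), ("term", "NN"),
            ('"', '"'), ("cyberspace", "NN"), ('"', '"'), ("in", "IN"),
            ("his", "PRP$"), ("novel", "NN"), ('"', '"'), ("Neuromancer", "NNP"),
            ('"', '"'), ("?", ".")]
print(" ".join(f"{w}/{r}" for w, r in keyword_priorities(question)))
# cyberspace/1 Neuromancer/1 term/4 novel/4 coined/7
```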

Passage Retrieval

• Step 1: the IR engine retrieves documents using the query terms
• Step 2: segment the documents into shorter units, something like paragraphs
• Step 3: passage ranking
  – Use the answer type to help rerank passages
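Steps 2 and 3 can be sketched as follows, assuming step 1 has already produced a ranked list of document strings. Paragraphs are taken to be blocks separated by blank lines, and a regular expression `LENGTH` stands in for a LENGTH named-entity tagger; both are assumptions, as is the tie-breaking order (answer-type matches, then query-term overlap, then document rank).

```python
import re

# Hypothetical answer-type recognizer: a regular expression standing in
# for a LENGTH named-entity tagger.
LENGTH = re.compile(r"\b\d[\d,]*\s+(?:feet|foot|metres|meters)\b")


def segment(document):
    """Step 2: split a document into paragraph-like passages on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def rerank(ranked_docs, query_terms, answer_type_pattern):
    """Step 3: order passages by answer-type matches, then by query-term
    overlap, then by the rank of the document they came from."""
    scored = []
    for doc_rank, doc in enumerate(ranked_docs, start=1):
        for passage in segment(doc):
            words = set(re.findall(r"\w+", passage.lower()))
            type_hits = len(answer_type_pattern.findall(passage))
            overlap = len(words & query_terms)
            scored.append(((-type_hits, -overlap, doc_rank), passage))
    scored.sort(key=lambda item: item[0])
    return [passage for _, passage in scored]


# Step 1 is assumed done: the IR engine returned these documents in rank order.
docs = [
    "Mount Everest lies on the border of Nepal and China.\n\n"
    "The official height of Mount Everest is 29035 feet.",
    "Everest attracts many climbers every spring.",
]
for passage in rerank(docs, {"tall", "everest"}, LENGTH):
    print(passage)
```

The passage containing "29035 feet" moves to the top even though it is the second paragraph of its document, because it is the only one containing an entity of the expected answer type.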

Features for Passage Ranking

• Number of named entities of the right type in the passage
• Number of query words in the passage
• Number of question n-grams also in the passage
• Proximity of query keywords to each other in the passage
• Longest sequence of question words
• Rank of the document containing the passage
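The six passage-ranking features above can be computed as in the sketch below. Several interpretation choices are assumptions: n-grams are fixed to bigrams, proximity is measured as the width of the smallest token window covering every query keyword present, and "longest sequence of question words" is read as the longest contiguous token run shared by question and passage. `LENGTH` is again a hypothetical regular-expression tagger. In practice these features would feed a learned ranker.

```python
import re

LENGTH = re.compile(r"\b\d[\d,]*\s+(?:feet|foot|metres|meters)\b")  # hypothetical LENGTH tagger


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def longest_common_run(a, b):
    """Length of the longest contiguous token sequence shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def keyword_window(tokens, keywords):
    """Width of the smallest window covering every keyword present in tokens."""
    present = {t for t in tokens if t in keywords}
    if not present:
        return None
    best = None
    for start, token in enumerate(tokens):
        if token not in present:
            continue
        seen = set()
        for end in range(start, len(tokens)):
            if tokens[end] in present:
                seen.add(tokens[end])
            if seen == present:
                width = end - start + 1
                best = width if best is None else min(best, width)
                break
    return best


def passage_features(passage, question, keywords, answer_type_pattern, doc_rank):
    p, q = tokenize(passage), tokenize(question)
    return {
        "answer_type_entities": len(answer_type_pattern.findall(passage)),
        "query_words": len(set(p) & keywords),
        "question_bigrams": len(set(zip(q, q[1:])) & set(zip(p, p[1:]))),
        "keyword_window": keyword_window(p, keywords),
        "longest_question_run": longest_common_run(q, p),
        "doc_rank": doc_rank,
    }


print(passage_features("The official height of Mount Everest is 29035 feet.",
                       "How tall is Mount Everest?",
                       {"tall", "mount", "everest"}, LENGTH, doc_rank=1))
```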

Answer Extraction

• Run an answer-type named-entity tagger on the passages
  – Each answer type requires a named-entity tagger that detects it
  – If the answer type is CITY, the tagger has to tag CITY
• The tagger can be full NER, simple regular expressions, or a hybrid
• Return the string with the right type:
  – Who is the prime minister of India? (PERSON)
    • Manmohan Singh, Prime Minister of India, had told left leaders that the deal would not be renegotiated.
  – How tall is Mt. Everest? (LENGTH)
    • The official height of Mount Everest is 29035 feet

Candidate Answer Ranking

• Answer type match: the candidate contains a phrase with the correct answer type
• Question keywords: the number of question keywords in the candidate
• Keyword distance: the distance in words between the candidate and the query keywords
• Novelty factor: a word in the candidate is not in the query
• Punctuation location: the candidate is immediately followed by a comma, period, quotation mark, semicolon, or exclamation mark
• Sequences of question terms: the length of the longest sequence of question terms that occurs in the candidate answer
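Answer extraction and candidate ranking can be combined in one small sketch. A regular expression `LENGTH` plays the role of the answer-type tagger, so every extracted candidate already satisfies the answer-type match; three of the remaining features (novelty, punctuation location, keyword distance) are computed, and the others are omitted for brevity. The linear scoring and the `WEIGHTS` values are hypothetical; a real system would learn them.

```python
import re

# Hypothetical LENGTH tagger (regular expression) used for answer extraction.
LENGTH = re.compile(r"\b\d[\d,]*\s+(?:feet|foot|metres|meters)\b")
PUNCT = set(',.;!"')

# Hypothetical hand-set weights; a real system would learn them.
WEIGHTS = {"novelty": 1.0, "punctuation": 0.5, "keyword_distance": -0.2}


def candidate_features(passage, match, keywords):
    """Compute a subset of the ranking features for one tagged candidate."""
    words = [(m.group().lower(), m.start()) for m in re.finditer(r"\w+", passage)]
    # Index of the first word token of the candidate.
    cand_idx = next(i for i, (_, pos) in enumerate(words) if pos >= match.start())
    kw_positions = [i for i, (w, _) in enumerate(words) if w in keywords]
    distance = min((abs(i - cand_idx) for i in kw_positions), default=len(words))
    cand_words = re.findall(r"\w+", match.group().lower())
    return {
        "novelty": int(any(w not in keywords for w in cand_words)),
        "punctuation": int(match.end() < len(passage) and passage[match.end()] in PUNCT),
        "keyword_distance": distance,
    }


def rank_candidates(passage, keywords, tagger=LENGTH):
    """Extract typed candidates with the tagger, then sort by weighted score."""
    scored = []
    for match in tagger.finditer(passage):
        feats = candidate_features(passage, match, keywords)
        score = sum(WEIGHTS[name] * value for name, value in feats.items())
        scored.append((score, match.group()))
    return [text for score, text in sorted(scored, key=lambda s: s[0], reverse=True)]


passage = "Mount Everest is 29035 feet high, and its base camp sits at 17,598 feet."
print(rank_candidates(passage, {"tall", "mount", "everest"}))
# ['29035 feet', '17,598 feet']
```

Both candidates have the right type and are novel; "17,598 feet" gains from being followed by a period but loses more for sitting far from the keywords "Mount Everest", so "29035 feet" ranks first under these weights.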

Knowledge-based approaches

• Build a semantic representation of the query
  – Times, dates, locations, entities, numeric quantities
• Map this semantic representation to queries over structured resources