Towards Generating Human-like Deep Questions

Liangming Pan

Curiosity is the beginning of wisdom

Can machines ask questions like humans?

Question Generation

Context → generate → Question

• Texts: "Why do humans take O2 to produce CO2 while plants do the opposite?"
• Knowledge: "Were Bill Gates and Satya Nadella once colleagues?"
• Images: "Who is having a birthday?"

What can question generation be used for?


Applications of Question Generation
• Education: generate quiz questions from a knowledge graph (KG)
• Dialogue: generate clarification questions

Example clarification dialogue:
User: "I am working for an employer in Canada. Do I need to carry on paying UK National Insurance?"
Generated clarification question: "Have you been working abroad 52 weeks or less?"

(Seyler et al., ICTIR 2017) (Saeidi et al., EMNLP 2018)

Applications of Question Generation
• Question Answering: data augmentation for QA
• Reading Comprehension: QA-style summarization

[Figure: documents → generated questions → train a QA model → evaluate it on human-written questions.]

(Duan et al., EMNLP 2017) (Lewis et al., ACL 2019) (Krishna and Iyyer, ACL 2019) (Puri et al., EMNLP 2020)

Question depth: Bloom’s Taxonomy

From the highest level to the lowest:
• Create (produce new or original ideas): How would you design a new...? What would happen if you...?
• Evaluate (justify a stand or decision): Do you agree that...? What do you think about...?
• Analyze (draw connections among ideas): What are the parts or features of...? Classify ... according to ...
• Apply (use information in new situations): How is ... an example of ...? How is ... related to ...?
• Understand (explain ideas or concepts): Why does ... happen? What does ... mean?
• Remember (recall facts and basic concepts): Who, what, when, where, how...? Describe...?

Example of a shallow, fact-recall question:
Context: Current faculty include the anthropologist Marshall Sahlins and the Shakespeare scholar David Bevington.
Question: What Shakespeare scholar is currently on the faculty?

Example of a deep question:
Context: Most of the astronomers, scientists, and philosophers had ideas about the universe that we now know were not entirely correct.
Question: In what ways do you think ideas that have turned out to be “wrong” are valuable in developing our understanding of the universe?

Why is asking deep questions hard?

Context: The ancient Greeks believed that the earth was the center of the universe. In 1543, Copernicus proposed that the sun was at the center of the universe. Both ideas were later proved wrong.

Asking a deep question about this context involves:
• Understanding, reasoning: Most astronomers, scientists, and philosophers had ideas about the universe that we now know were not entirely correct.
• Prior knowledge: Failures are also valuable in scientific discovery.
• Information-seeking: Are those “wrong ideas” about the center of the universe also valuable? If so, why are they valuable?

Deep question: In what ways do you think ideas that have turned out to be “wrong” are valuable in developing our understanding of the universe?

Core Problems (My Interests)

How to generate questions that promote deep reasoning?

[Figure: a semantic graph for the “All Join Hands” example (nodes such as All Join Hands, Song, Rock Band, British, Rise to prominence, Early 1970); documents and tables → generated QA pairs → train a QA model.]

[Pan et al., ACL 2020] [Pan et al., NAACL 2021]

How does QG benefit downstream NLP applications?

[Figure: generated QA pairs → fact verification (Supported / Refuted / NEI).] [Pan et al., ACL 2021]

Core Problems (My Interests)

How to generate questions that promote deep reasoning?


[Pan et al., ACL 2020]

Reasoning with Semantic Graphs

[Figure: semantic graph linking All Join Hands, Song, Slade, Rock Band, British, Rise to prominence, and Early 1970.]

Liangming Pan, Yuxi Xie, Yansong Feng, Tat-Seng Chua, Min-Yen Kan

[ACL 2020] Pan et al.: Semantic Graphs for Generating Deep Questions

Multi-hop Question Generation

• We choose multi-hop questions, a typical type of deep question, to explore reasoning in QG.
• Generating a multi-hop question requires integrating and reasoning over different pieces of information.
• In contrast, shallow QG only requires local context and rarely involves reasoning.

Example from HotpotQA (Yang et al., EMNLP 2018):
Paragraph A (All Join Hands): "All Join Hands" is a song by the British rock band Slade, released in 1984 as the lead single from the band's twelfth studio album "Rogues Gallery".
Paragraph B (Slade): Slade are an English glam rock band from Wolverhampton. They rose to prominence during the early 1970s with 17 consecutive top 20 hits and six number ones on the UK Singles Chart.
Question: When did the rock band that sang "All Join Hands" rise to prominence?
Answer: The early 1970s

The Seq2Seq Baseline

[Figure: an encoder-decoder model reads the answer, passage 1, and passage 2 as one token sequence and decodes the question.]

The key to multi-hop reasoning is to:
• Understand the relationships between entities.
• Select reasoning chains and perform multi-step reasoning.

Both are difficult when the inputs are represented as a flat sequence of tokens.

Semantic Graphs

How about we represent the documents in a structured way?


[Figure: the two paragraphs converted into a semantic graph. Nodes include All Join Hands, Song, Slade, Rock Band, British, and Early 1970; edges include Is, From, and Rise to prominence.]

• Different reasoning chains correspond to different questions.

Reasoning chain 1: All Join Hands → Song → Slade → Rise to prominence → Early 1970
Question: When did the rock band that sang "All Join Hands" rise to prominence?
Answer: The early 1970s


Reasoning chain 2: All Join Hands → Song → Slade → Is → Rock Band → From → British
Question: Which country is the rock band that sang “All Join Hands” from?
Answer: Britain
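A minimal Python sketch of why different reasoning chains yield different questions. The (subject, relation, object) triples are hand-written toy data rather than parser output, and merging nodes by exact string match is a simplification of connecting similar nodes; `build_graph` and `reasoning_chains` are hypothetical helpers, not the paper's code. The chain ending at "Early 1970s" corresponds to the rise-to-prominence question, and the one ending at "British" to the country question.

```python
from collections import defaultdict

# Hypothetical, hand-written toy triples for the two HotpotQA paragraphs.
# In the described system they would come from dependency parsing / SRL
# followed by pruning and merging.
PARAGRAPH_A = [
    ("All Join Hands", "song", "Slade"),
    ("Slade", "is", "Rock Band"),
    ("Rock Band", "from", "British"),
]
PARAGRAPH_B = [
    ("Slade", "rise to prominence", "Early 1970s"),
]


def build_graph(*triple_lists):
    """Merge per-paragraph triples into one undirected, labeled graph.

    Nodes with identical text (here "Slade") collapse into a single node,
    a simplified stand-in for the "connect similar nodes" step.
    """
    graph = defaultdict(list)
    for triples in triple_lists:
        for subj, rel, obj in triples:
            graph[subj].append((rel, obj))
            graph[obj].append((rel, subj))
    return graph


def reasoning_chains(graph, start, end, path=None):
    """Yield every simple path from start to end as [node, relation, node, ...]."""
    path = path or [start]
    if start == end:
        yield path
        return
    for rel, nxt in graph[start]:
        if nxt not in path[::2]:  # nodes sit at even positions; skip revisits
            yield from reasoning_chains(graph, nxt, end, path + [rel, nxt])


if __name__ == "__main__":
    graph = build_graph(PARAGRAPH_A, PARAGRAPH_B)
    for target in ("Early 1970s", "British"):
        for chain in reasoning_chains(graph, "All Join Hands", target):
            print(" -> ".join(chain))
```

Each printed chain is a candidate reasoning path; a generator conditioned on a different chain would be expected to produce a different multi-hop question.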

Constructing Semantic Graphs

1) Semantic Parsing
• Dependency Parsing (DP)
• Semantic Role Labeling (SRL)

2) Pruning and Merging
• Based on heuristic rules
• Merge the tokens within a phrase into a single node
• Filter out unimportant nodes (for example, punctuation)

3) Connect Similar Nodes
• Merge the semantic graphs of different sentences into a unified graph.

Model Architecture

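[Figure: an LSTM encoder reads the paragraphs and a gated graph neural network encodes the semantic graph; an LSTM decoder with attention generates the question (e.g., “When did the rock band that sang "All Join Hands" rise to prominence?”), and an FNN layer over the graph nodes selects the relevant nodes (All Join Hands, Slade, Rock Band).]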
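The graph encoder in this architecture is a gated graph neural network (GGNN). Below is a minimal pure-Python sketch of one generic GGNN propagation step: sum edge-type-specific messages from neighbors, then apply a GRU-style gated update. The matrix names, the dict-based layout, and the omission of bias terms are assumptions for illustration; this is not the authors' implementation.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def vadd(*vecs):
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vecs)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ggnn_step(states, edges, edge_weights, gru):
    """One GGNN propagation step (illustrative sketch, no bias terms).

    states:       dict node -> hidden vector h_v
    edges:        list of (src, dst, edge_type) tuples
    edge_weights: dict edge_type -> d x d matrix W_type
    gru:          dict with d x d matrices "Wz", "Uz", "Wr", "Ur", "Wh", "Uh"
    """
    dim = len(next(iter(states.values())))
    # 1) Message passing: a_v = sum over incoming edges (u -> v) of W_type h_u.
    messages = {node: [0.0] * dim for node in states}
    for src, dst, etype in edges:
        messages[dst] = vadd(messages[dst], matvec(edge_weights[etype], states[src]))

    # 2) GRU-style gated update of every node state.
    new_states = {}
    for node, h in states.items():
        a = messages[node]
        z = [sigmoid(x) for x in vadd(matvec(gru["Wz"], a), matvec(gru["Uz"], h))]  # update gate
        r = [sigmoid(x) for x in vadd(matvec(gru["Wr"], a), matvec(gru["Ur"], h))]  # reset gate
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        candidate = [math.tanh(x) for x in vadd(matvec(gru["Wh"], a), matvec(gru["Uh"], reset_h))]
        new_states[node] = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]
    return new_states


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zero = [[0.0, 0.0], [0.0, 0.0]]
    states = {"All Join Hands": [1.0, 0.0], "Slade": [0.0, 1.0], "Early 1970s": [0.0, 0.0]}
    edges = [
        ("All Join Hands", "Slade", "song"), ("Slade", "All Join Hands", "song"),
        ("Slade", "Early 1970s", "rise to prominence"), ("Early 1970s", "Slade", "rise to prominence"),
    ]
    weights = {"song": identity, "rise to prominence": identity}
    gru_params = {"Wz": identity, "Uz": zero, "Wr": zero, "Ur": zero, "Wh": identity, "Uh": zero}
    for _ in range(2):  # two propagation steps spread information two hops
        states = ggnn_step(states, edges, weights, gru_params)
    print(states)
```

Stacking several steps lets information travel along multi-hop paths (here, from "All Join Hands" through "Slade" to "Early 1970s"), which is the property the semantic-graph encoder relies on.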

Experimental Results

BLEU4 & METEOR on HotpotQA
[Figure: bar chart comparing Text Only (Seq2Seq), Graph Only, Ours (Text + Graph), and Ours + MultiTask. Ours (Text + Graph) reaches 14.66 BLEU4 and 19.25 METEOR, Ours + MultiTask reaches 15.53 BLEU4 and 20.15 METEOR, and both single-input baselines score lower on both metrics.]

Human Evaluation
• 300 random test samples
• 3 human annotators
• Evaluation metrics: Fluency, Relevance, Complexity

[Figure: mean Fluency, Relevance, and Complexity ratings for questions from Seq2Seq, Ours, Ours + MultiTask, and the ground truth; the reported scores range from 3.45 to 4.74.]

Error Analysis
• Semantic Error: the question has a logical or commonsense error.
• Answer Revealing: the question reveals the answer.
• Ghost Entity: the question refers to entities that do not occur in the document.
• Redundant: the question contains unnecessary repetition.
• Unanswerable: the question has none of the above errors but cannot be answered from the document.

[Figure: frequency of each error type (Semantic Error, Answer Revealing, Ghost Entity, Redundant, Unanswerable) for Seq2Seq and Ours.]


Limitations
• Implicit Reasoning
  • Lacks explicit modeling of the reasoning process.
  • For example: retrieve(A); retrieve(B); Compare(A, B)
• Supervised Learning
  • Requires large-scale human-written multi-hop questions.
  • In practice, unsupervised question generation is more desirable.
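To make "explicit reasoning" concrete, here is a minimal Python sketch in which the reasoning process is a visible sequence of operator calls, retrieve(A); retrieve(B); Compare(A, B), for the comparison question "Who was born first, Terry Southern or Neal Town Stephenson?". The operator functions, the dictionary fact store, and the birth years (1924 and 1959) are illustrative assumptions; the semantic-graph model above performs such reasoning only implicitly.

```python
# Toy fact store used only for this illustration (birth years).
BIRTH_YEAR = {"Terry Southern": 1924, "Neal Town Stephenson": 1959}


def retrieve(entity):
    """Hypothetical retrieval operator: fetch (entity, birth year)."""
    return entity, BIRTH_YEAR[entity]


def compare_born_first(fact_a, fact_b):
    """Hypothetical comparison operator: return whoever was born earlier."""
    (name_a, year_a), (name_b, year_b) = fact_a, fact_b
    return name_a if year_a <= year_b else name_b


def answer_comparison(entity_a, entity_b):
    # The reasoning process is explicit: retrieve(A); retrieve(B); Compare(A, B)
    fact_a = retrieve(entity_a)
    fact_b = retrieve(entity_b)
    return compare_born_first(fact_a, fact_b)


if __name__ == "__main__":
    print(answer_comparison("Terry Southern", "Neal Town Stephenson"))  # Terry Southern
```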

How can we generate multi-hop questions (1) with an explicit reasoning process and (2) without seeing any human-written examples?

Core Problems (My Interests)

How to generate questions that promote deep reasoning?

[Pan et al., NAACL 2021]

Composing shallow questions into deep questions

Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, William Yang Wang

[NAACL 2021] Pan et al.: Unsupervised Multi-hop Question Answering by Question Generation

Answer Multi-hop Questions

Multi-hop question: When did the rock band that sang "All Join Hands" rise to prominence?

Paragraph A (All Join Hands): "All Join Hands" is a song by the British rock band Slade, released in 1984 as the lead single from the band's twelfth studio album "Rogues Gallery".
Single-hop question: Which rock band sang "All Join Hands"? Ans = Slade

Paragraph B (Slade): Slade are an English glam rock band from Wolverhampton. They rose to prominence during the early 1970s with 17 consecutive top 20 hits and six number ones on the UK Singles Chart.
Single-hop question: When did Slade rise to prominence? Ans = Early 1970s

Generate Multi-hop Questions

Generation reverses this decomposition: from the same two paragraphs, generate the single-hop questions "Which rock band sang "All Join Hands"?" (Ans = Slade) and "When did Slade rise to prominence?", then compose them into the multi-hop question "When did the rock band that sang "All Join Hands" rise to prominence?"
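Composing a bridge question starts with finding the entity that links the two paragraphs. A minimal sketch, assuming each paragraph is a (title, text) pair and that the bridge entity is the second paragraph's title mentioned verbatim in the first; `find_bridge` is a hypothetical helper, and a real system might rely on entity recognition and linking rather than exact string matching.

```python
PARAGRAPH_A = (
    "All Join Hands",
    "\"All Join Hands\" is a song by the British rock band Slade, released in 1984 "
    "as the lead single from the band's twelfth studio album \"Rogues Gallery\".",
)
PARAGRAPH_B = (
    "Slade",
    "Slade are an English glam rock band from Wolverhampton. They rose to prominence "
    "during the early 1970s with 17 consecutive top 20 hits and six number ones on the UK Singles Chart.",
)


def find_bridge(para_a, para_b):
    """Return para_b's title if para_a's text mentions it, otherwise None.

    Assumption for this sketch: the bridge entity is the title of the
    second paragraph and appears verbatim in the first paragraph.
    """
    title_b, _ = para_b
    _, text_a = para_a
    return title_b if title_b in text_a else None


if __name__ == "__main__":
    print(find_bridge(PARAGRAPH_A, PARAGRAPH_B))  # Slade
    print(find_bridge(PARAGRAPH_B, PARAGRAPH_A))  # None
```

Once the bridge entity is known, one single-hop question is generated with it as the answer (from Paragraph A) and one is generated about it (from Paragraph B).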

Zero-shot Multi-hop QA

[Figure: MQA-QG generates QA pairs from input sources (documents and tables); a QA model is trained on the generated QA pairs and evaluated on human-written QA pairs.]

Example: bridging two text paragraphs
Paragraph A (All Join Hands): "All Join Hands" is a song by the British rock band Slade, released in 1984 ⋯
Paragraph B (Slade): Slade are an English glam rock band from Wolverhampton. They rose to prominence during the early 1970s with ⋯
• Bridge entity: Slade
• Question about Paragraph A whose answer is the bridge entity: What rock band sang “All Join Hands”? Answer: Slade
• Question about Paragraph B that mentions the bridge entity: When did Slade rise to prominence? Answer: Early 1970s
• Blended multi-hop question: When did the rock band that sang "All Join Hands" rise to prominence? Answer: Early 1970s

Example: bridging a table and a text
Table:
Medal | Championship | Name | Event
Silver | 2010 Pruszkow | Tim Veldt | Men’s omnium
Bronze | 2011 Apeldoorn | Kirsten Wild | Women’s omnium
Gold | 2013 Apeldoorn | Elis Ligtlee | Women’s keirin
Text: Kirsten Carlijn Wild (born 15 October 1982) is a Dutch professional racing cyclist, ⋯ Wild competed in two track cycling events at the 2012 Summer Olympics.
• Bridge entity: Kirsten Wild
• Question about the text that mentions the bridge entity: What is the birthdate of Kirsten Wild? Answer: 15 October 1982
• Description of the bridge entity from the table: Kirsten Wild of Netherlands won the bronze medal in the 2011 Apeldoorn.
• Blended multi-hop question: What is the birthdate of the athlete that of Netherlands won the bronze medal in the 2011 Apeldoorn? Answer: 15 October 1982

The Blending Function

• Bridge entity: Kirsten Wild
• Single-hop question: What is the birthdate of Kirsten Wild? Answer: 15 October 1982
• Entity description: Kirsten Wild of Netherlands won the bronze medal in the 2011 Apeldoorn.

What is the birthdate of the _____ that of Netherlands won the bronze medal in the 2011 Apeldoorn?

BERT fills in the blank: What is the birthdate of the athlete that of Netherlands won the bronze medal in the 2011 Apeldoorn? Answer: 15 October 1982
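The blending step can be sketched as string manipulation plus a masked-word prediction. In this sketch (an assumption, not the authors' code), `blend` swaps the bridge entity in the single-hop question for "the [MASK] that <rest of the description>", and `fill_mask` accepts any callable as a stand-in for the BERT model; the stub used below always predicts "athlete".

```python
MASK = "[MASK]"


def blend(entity, question, description):
    """Replace the bridge entity in `question` with a masked relative clause.

    Assumes `description` starts with the entity (e.g. "Kirsten Wild of ...")
    and that `question` mentions the entity verbatim.
    """
    if not description.startswith(entity) or entity not in question:
        raise ValueError("entity must start the description and appear in the question")
    clause = description[len(entity):].strip().rstrip(".")
    return question.replace(entity, f"the {MASK} that {clause}", 1)


def fill_mask(masked_question, predict_word):
    """Fill the mask using `predict_word`, a stand-in for a masked LM such as BERT."""
    return masked_question.replace(MASK, predict_word(masked_question), 1)


if __name__ == "__main__":
    masked = blend(
        "Kirsten Wild",
        "What is the birthdate of Kirsten Wild?",
        "Kirsten Wild of Netherlands won the bronze medal in the 2011 Apeldoorn.",
    )
    print(masked)
    # Hypothetical stub: a real system would query a masked language model here.
    print(fill_mask(masked, lambda _: "athlete"))
```

Keeping the fill-in step behind a callable makes it easy to swap the stub for an actual masked language model without touching the blending logic.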

Evaluation Datasets
• HotpotQA (Yang et al., EMNLP 2018): text + text
• HybridQA (Chen et al., EMNLP 2020): table + text

Supervised vs. Zero-shot QA Performance

F1 scores:

HotpotQA | Bridge | Comparison | Overall
Supervised | 83.5 | 80.3 | 82.8
Zero-shot | 72.2 | 54.4 | 68.6

HybridQA | In-Table | In-Text | Overall
Supervised | 58.6 | 46.4 | 50.0
Zero-shot | 40.6 | 25.0 | 30.5
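To put the gap in perspective, the short calculation below divides each zero-shot Overall F1 by the corresponding supervised Overall F1 reported above. The dictionary only restates those reported numbers; the ratio itself is a derived figure, not one stated on the slides.

```python
# Overall F1 scores as reported above (supervised vs. zero-shot).
OVERALL_F1 = {
    "HotpotQA": {"supervised": 82.8, "zero_shot": 68.6},
    "HybridQA": {"supervised": 50.0, "zero_shot": 30.5},
}


def relative_performance(scores):
    """Zero-shot F1 as a percentage of supervised F1, rounded to one decimal."""
    return round(100 * scores["zero_shot"] / scores["supervised"], 1)


if __name__ == "__main__":
    for dataset, scores in OVERALL_F1.items():
        print(dataset, relative_performance(scores))
```

By this measure the zero-shot model recovers roughly 83% of supervised F1 on HotpotQA and 61% on HybridQA.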


Few-shot QA Performance
[Figure: F1 on HotpotQA and HybridQA for progressively larger training set sizes used for fine-tuning.]

Examples of Generated Questions

Type | Question | Answer
Table-Text | When did the one that won the Eurovision Song Contest in 1966 join Gals and Pals? | 1963
Table-Text | How many students attend the teams that played in the Dryden Township Conference? | 1900
Text-Table | What album did the Oak Ridge Boys release in 1989? | American Dreams
Text-Table | When was the name that is the name of the bridge that crosses Youngs Bay completed? | Summer
Text-Text | Which Canadian cinematographer is best known for his work on Fargo? | Craig Wrobleski
Text-Text | What is illegal in the country that is Bashar Hafez al-Assad’s father? | Cannabis
Comparison | Who was born first, Terry Southern or Neal Town Stephenson? | Terry Southern
Comparison | Are Beth Ditto and Mary Beth Patterson of the same nationality? | Yes

Core Problems (My Interests)

How does QG benefit downstream NLP applications?

[Pan et al., ACL 2021]

Question Generation for Fact Verification

Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, William Yang Wang

[ACL 2021] Pan et al.: Few-shot Fact Checking with Claim Generation


Fact Checking

Example claim: “Immigrants are a drain on the economy”

• Labels: Supports, Refutes, Not Enough Info
• Pipeline: document retrieval → sentence retrieval → claim verification

Fact Verification

[Figure: a fact verification model takes a claim and its evidence as input and predicts a label.]

Motivation

Training fact verification models requires large-scale human-annotated (evidence, claim, label) data, such as FEVER.

• It is time-consuming to ask humans to annotate the training data.

• Could we automatically generate (Evidence, Claim, Label) to train the fact verification model?

Zero/Few-shot Fact Verification

[Figure: (evidence, claim) pairs generated from documents and labeled Supported, Refuted, or Not Enough Info are used to pre-train a fact-checking model, which is then fine-tuned on human-labeled (evidence, claim) pairs and predicts SUPPORT, REFUTE, or NEI.]

Claim Generation with QG

Question Generator → QA-to-Claim Model

Evidence (1992 Los Angeles riots): The 1992 Los Angeles riots, also known as the Rodney King riots, were a series of riots, lootings, arsons, and civil disturbances that occurred in Los Angeles County, California in April and May 1992. ⋯

• SUPPORTED: Q: Where did the Rodney King riots happen? A: Los Angeles County → Claim: The Rodney King riots took place in Los Angeles County.
• REFUTED (answer replacement): Q: Where did the Rodney King riots happen? A: San Francisco County → Claim: The Rodney King riots took place in San Francisco County.
• NOT ENOUGH INFO (extra context): "By the time the riots ended, 63 people had been killed." Q: How many people were killed in the Rodney King riots? A: 63 → Claim: 63 people were killed in the Rodney King riots.
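The three labels can be produced from QA pairs as sketched below. The regex templates in `TEMPLATES` are hypothetical stand-ins for the trained QA-to-claim model, the replacement answer is supplied by hand, and the NOT ENOUGH INFO claim is assumed to come from a QA pair whose source sentence lies outside the evidence. Treat this as an illustration of the data flow, not the paper's implementation.

```python
import re

# Hypothetical regex templates standing in for a trained QA-to-claim model.
TEMPLATES = [
    (re.compile(r"^Where did (.+) happen\?$"),
     lambda m, ans: f"{m.group(1)} took place in {ans}."),
    (re.compile(r"^How many (.+?) were (.+)\?$"),
     lambda m, ans: f"{ans} {m.group(1)} were {m.group(2)}."),
]


def qa_to_claim(question, answer):
    """Turn a (question, answer) pair into a declarative claim."""
    for pattern, build in TEMPLATES:
        match = pattern.match(question)
        if match:
            claim = build(match, answer)
            return claim[0].upper() + claim[1:]
    raise ValueError(f"no template for: {question!r}")


def generate_claims(evidence_qa, replacement_answer, extra_context_qa):
    """Build one claim per label from QA pairs.

    evidence_qa:        QA pair generated from the evidence    -> SUPPORTED
    replacement_answer: a wrong answer swapped into that pair  -> REFUTED
    extra_context_qa:   QA pair from text outside the evidence -> NOT ENOUGH INFO
    """
    question, answer = evidence_qa
    return [
        (qa_to_claim(question, answer), "SUPPORTED"),
        (qa_to_claim(question, replacement_answer), "REFUTED"),
        (qa_to_claim(*extra_context_qa), "NOT ENOUGH INFO"),
    ]


if __name__ == "__main__":
    claims = generate_claims(
        ("Where did the Rodney King riots happen?", "Los Angeles County"),
        "San Francisco County",
        ("How many people were killed in the Rodney King riots?", "63"),
    )
    for claim, label in claims:
        print(f"{label}: {claim}")
```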

Examples of Generated Claims

Zero-shot Fact Verification

[Figure: F1 scores on FEVER-S/R, FEVER-S/R/N, and FEVER-Symmetric for a supervised model, QACG (zero-shot), and the LM for FC and Perplexity baselines.]

Few-shot Fact Verification

Discussion
• Our model requires a good question generator.
• To generate deep claims, we need to generate deep questions.

Summary

How to generate questions that promote deep reasoning?

[Figure: a semantic graph for the “All Join Hands” example (nodes such as All Join Hands, Song, Slade, Rock Band, British, Rise to prominence, Early 1970); documents and tables → generated QA pairs → train a QA model.]

[Pan et al., ACL 2020] [Pan et al., NAACL 2021]

How does QG benefit downstream NLP applications?

[Figure: generated QA pairs → fact verification (Supported / Refuted / NEI).] [Pan et al., ACL 2021]


We are still far from generating the deep questions that humans ask.

Most of the astronomers, scientists, and philosophers had ideas about the universe that we now know were not entirely correct. In what ways do you think ideas that have turned out to be “wrong” are valuable in developing our understanding of the universe?

Understanding, Reasoning | Prior Knowledge | Information-Seeking

Future Direction: Prior Knowledge

Context: Photosynthesis in green plants converts water, carbon dioxide, and minerals into oxygen and energy-rich organic compounds.

Prior knowledge: Humans take in oxygen and breathe out carbon dioxide to maintain life. Sunlight is important for plants to maintain life.

Questions: "Why do humans take O2 to produce CO2 while plants do the opposite?" "Does sunlight take any part in photosynthesis?"

Future Direction: Information-Seeking

I am sitting in a restaurant with a friend. He orders: 'Tournedos, medium rare ...' I order spaghetti. Soon my friend is served baked chicken with French fries and I am served spaghetti; we are both happy with our dishes.

Now, suppose I would like to understand what really happened.

What question should I ask?

• Didn't you order tournedos?
• You really don't mind that they changed your order?
• Oh, have you changed your order in the meantime?
• Why are you in the restaurant?
• Is the spaghetti delicious?

Learn more about QG

• Survey on Neural Question Generation: https://arxiv.org/pdf/1905.08949.pdf
• Question Generation Paper List: https://github.com/teacherpeterpan/Question-Generation-Paper-List

Collaborators

Yuxi Xie, Liangming Pan, Min-Yen Kan, Wenhu Chen, Wenhan Xiong, William Wang

References

[1] Knowledge Questions from Knowledge Graphs. Dominic Seyler, Mohamed Yahya, Klaus Berberich. ICTIR 2017.
[2] Interpretation of Natural Language Rules in Conversational Machine Reading. Marzieh Saeidi, Max Bartolo, Patrick Lewis, Sameer Singh, Tim Rocktäschel, Mike Sheldon, Guillaume Bouchard, Sebastian Riedel. EMNLP 2018.
[3] Question Generation for Question Answering. Nan Duan, Duyu Tang, Peng Chen, Ming Zhou. EMNLP 2017.
[4] Unsupervised Question Answering by Cloze Translation. Patrick Lewis, Ludovic Denoyer, Sebastian Riedel. ACL 2019.
[5] Training Question Answering Models From Synthetic Data. Raul Puri, Ryan Spring, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro. EMNLP 2020.
[6] Generating Question-Answer Hierarchies. Kalpesh Krishna, Mohit Iyyer. ACL 2019.
[7] Recent Advances in Neural Question Generation. Liangming Pan, Wenqiang Lei, Tat-Seng Chua, Min-Yen Kan. arXiv 2019.
[8] SQuAD: 100,000+ Questions for Machine Comprehension of Text. Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang. EMNLP 2016.
[9] Semantic Graphs for Generating Deep Questions. Liangming Pan, Yuxi Xie, Yansong Feng, Tat-Seng Chua, Min-Yen Kan. ACL 2020.
[10] HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning. EMNLP 2018.
[11] Exploring Question-Specific Rewards for Generating Deep Questions. Yuxi Xie, Liangming Pan, Dongzhe Wang, Min-Yen Kan, Yansong Feng. COLING 2020.
[12] HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data. Wenhu Chen, Hanwen Zha, Zhiyu Chen, Wenhan Xiong, Hong Wang, William Wang. EMNLP 2020.
[13] Unsupervised Multi-hop Question Answering by Question Generation. Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, William Yang Wang. NAACL 2021.

Thanks! Any questions?

Liangming Pan
