Advances in Debating Technologies: Building AI That Can Debate Humans

Roy Bar-Haim   Liat Ein-Dor   Matan Orbach   Elad Venezian   Noam Slonim
IBM Research
{roybar, liate, matano, eladv, noams}@il.ibm.com

1 Tutorial Description

1.1 Background and Goals

Argumentation and debating are fundamental capabilities of human intelligence. They are essential for a wide range of everyday activities that involve reasoning, decision making or persuasion. Computational Argumentation is defined as "the application of computational methods for analyzing and synthesizing argumentation and human debate" (Gurevych et al., 2016). Over the last few years, this field has been rapidly evolving, as evidenced by the growing research community and the increasing number of publications in top NLP and AI conferences.

The tutorial focuses on Debating Technologies, a sub-field of computational argumentation defined as "computational technologies developed directly to enhance, support, and engage with human debating" (Gurevych et al., 2016). A recent milestone in this field is Project Debater, revealed in 2019 as the first AI system that can debate human experts on complex topics.¹ Project Debater is the third in the series of IBM Research AI's grand challenges, following Deep Blue and Watson. It has been developed for over six years by a large team of researchers and engineers, and its live demonstration in February 2019 received massive media attention. This research effort has resulted in more than 50 scientific papers to date, and many datasets freely available for research purposes.

¹ https://www.research.ibm.com/artificial-intelligence/project-debater/

In this tutorial, we aim to answer the question: "what does it take to build a system that can debate humans?" Our main focus is on the scientific problems such a system must tackle. Some of these intriguing problems include argument retrieval for a given debate topic, argument quality assessment and stance classification, identifying relevant principled arguments to be used in conjunction with corpus-mined arguments, organizing the arguments into a compelling narrative, and recognizing the arguments made by the human opponent in order to make a rebuttal. For each of these problems we will present relevant scientific work from various research groups as well as our own. Many of the underlying capabilities of Project Debater have been made freely available for academic research, and the tutorial will include a detailed explanation of how to use and leverage these tools.

A complementary goal of the tutorial is to provide a holistic view of a debating system. Such a view is largely missing in the academic literature, where each paper typically addresses a specific problem in isolation. We present a complete pipeline of a debating system, and discuss the information flow and the interaction between its various components. We will also share our experience and lessons learned from developing such a complex, large-scale NLP system. Finally, the tutorial discusses practical applications and future challenges of debating technologies.

1.2 Contents

In this section we provide more details about the contents of the tutorial. The tutorial outline and estimated schedule are listed in Section 3.

Introduction. The tutorial first provides an introduction to computational argumentation. It then introduces the Project Debater grand challenge and provides a high-level view of the building blocks that comprise a debating system. The next parts of the tutorial describe each of these building blocks in depth.

Argument mining. The core of a debating system is argument mining – finding relevant arguments and argument components (claim/conclusion, evidence/premise) for a given debate topic, either in a given article or in a large corpus.
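As a concrete, deliberately simplified illustration of the retrieval side of this task, the sketch below ranks the sentences of a tiny corpus by semantic similarity to a debate topic. This is not the Project Debater argument mining pipeline; it assumes the open-source sentence-transformers package and its pretrained all-MiniLM-L6-v2 encoder are available, and a real system would follow such retrieval with dedicated claim and evidence classifiers and stance analysis.

```python
# Minimal sketch of topic-oriented sentence retrieval (illustration only, not the
# Project Debater pipeline). Assumes the sentence-transformers package is installed.
from sentence_transformers import SentenceTransformer, util

topic = "We should ban alcohol"
corpus = [
    "Prohibition in the 1920s led to a thriving black market for liquor.",
    "Alcohol abuse is a major cause of traffic accidents.",
    "The city council meets every Tuesday.",
    "Banning a popular product rarely eliminates the demand for it.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
topic_emb = model.encode([topic], convert_to_tensor=True)
corpus_emb = model.encode(corpus, convert_to_tensor=True)

# Rank candidate sentences by cosine similarity to the topic. A real debating
# system would follow this retrieval step with claim/evidence classifiers.
scores = util.cos_sim(topic_emb, corpus_emb)[0]
ranked = sorted(zip(corpus, scores.tolist()), key=lambda pair: pair[1], reverse=True)
for sentence, score in ranked:
    print(f"{score:.2f}  {sentence}")
```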

Argument evaluation and analysis. The next tasks in the pipeline involve analysis of the extracted arguments. Argument quality assessment aims to select the more convincing arguments. Stance classification aims to distinguish between arguments that support our side in the debate and those supporting the opponent's side.

Modeling human dilemma. A complementary source of argumentation, widely used by professional human debaters, is principled arguments, which are relevant to a wide variety of topics. A common example is the black market argument, potentially relevant in the context of debates on banning a specific product or service (e.g., "we should ban alcohol"). By this argument, imposing a ban leads to the creation of a black market, which in turn makes the products or services obtained therein less safe, leads to exploitation, attracts criminal elements, and so on. We discuss recent work on creating a taxonomy of common principled arguments and automatically matching relevant arguments from this taxonomy to a given debate topic.
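As a toy illustration of this matching step, the sketch below pairs a debate topic with entries from a tiny, hand-written taxonomy of principled arguments using simple trigger patterns. The taxonomy entries, patterns and argument texts are invented for the example; the actual work (Bilu et al., 2019) is considerably more sophisticated about deciding when a principled argument applies.

```python
import re

# Toy taxonomy of principled arguments, keyed by the kind of motion they fit.
# All entries, patterns and texts below are invented for this illustration.
PRINCIPLED_ARGUMENTS = [
    {
        "name": "black market",
        "trigger": re.compile(r"\b(ban|prohibit|outlaw|criminalize)\b", re.IGNORECASE),
        "text": "Imposing a ban creates a black market, making the product or "
                "service less safe and attracting criminal elements.",
    },
    {
        "name": "personal freedom",
        "trigger": re.compile(r"\b(ban|prohibit|mandate|force)\b", re.IGNORECASE),
        "text": "Adults should be free to make their own choices as long as "
                "they do not harm others.",
    },
]


def match_principled_arguments(topic):
    """Return the principled arguments whose trigger pattern matches the topic."""
    return [arg for arg in PRINCIPLED_ARGUMENTS if arg["trigger"].search(topic)]


for arg in match_principled_arguments("We should ban alcohol"):
    print(f"[{arg['name']}] {arg['text']}")
```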
Listening comprehension and rebuttal. In addition to presenting one side of the debate, engaging in a competitive debate further requires a debating system to effectively rebut arguments raised by the human opponent. The system must listen to an argumentative speech in real time, understand the main arguments, and produce persuasive counter-arguments.

The nature of the argumentation domain and the characteristics of competitive debates make the understanding of such spoken content challenging. Expressed ideas often span multiple, non-consecutive sentences, and many arguments are alluded to rather than explicitly stated. Further difficulty stems from the requirement to identify and rebut the most important parts of a speech that is several minutes long. This contrasts with today's conversational agents, which aim at understanding a single functional command from short inputs.

Core NLP capabilities. This section describes several core NLP capabilities developed as part of Project Debater, including thematic clustering, highly scalable Wikification, and semantic similarity for phrases and Wikipedia concepts.

From arguments to narrative. A debating system must arrange the arguments obtained from various sources (arguments mined from a corpus, principled arguments, and counter-arguments for rebuttal) into a coherent and persuasive narrative that would keep the audience's attention for several minutes. This section describes the various steps in the narrative generation pipeline. We also discuss the role of humor in keeping a debate lively.

Moving forward – applications and implications. In this part we discuss possible applications and future directions for debating technologies. As an example, we present Speech by Crowd, a platform for crowdsourcing decision support. This platform collects arguments from large audiences on debatable topics and generates meaningful narratives summarizing the arguments for each side of the debate. We also discuss Key Point Analysis, a novel method for extracting the main points in a large collection of arguments and quantifying the prevalence of each point in the data.
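As a rough sketch of the matching-and-counting idea behind Key Point Analysis (not the actual trained models), the example below maps each argument to its most similar key point using an off-the-shelf sentence encoder and reports how prevalent each key point is. The key points, arguments and similarity threshold are invented for the illustration, and the sentence-transformers package is assumed to be available.

```python
# Minimal sketch of matching arguments to key points and measuring prevalence
# (illustration only, not the actual Key Point Analysis models). Assumes the
# sentence-transformers package is installed.
from collections import Counter

from sentence_transformers import SentenceTransformer, util

key_points = [
    "A ban would create a dangerous black market.",
    "Alcohol causes serious health problems.",
]
arguments = [
    "Prohibition in the 1920s handed the liquor trade to organized crime.",
    "Illegal, unregulated alcohol is far more likely to be toxic.",
    "Heavy drinking damages the liver and raises cancer risk.",
    "People enjoy a glass of wine with dinner.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
kp_emb = model.encode(key_points, convert_to_tensor=True)
arg_emb = model.encode(arguments, convert_to_tensor=True)

# Map each argument to its most similar key point, if the similarity clears a
# (heuristically chosen) threshold; count how many arguments each key point covers.
similarities = util.cos_sim(arg_emb, kp_emb)
coverage = Counter()
for i in range(len(arguments)):
    best = int(similarities[i].argmax())
    if float(similarities[i][best]) > 0.4:  # threshold picked arbitrarily for the sketch
        coverage[key_points[best]] += 1

for key_point, count in coverage.most_common():
    print(f"{count}/{len(arguments)} arguments match: {key_point}")
```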

Demo session – using debating technologies in your application. Many of the Project Debater components presented in this tutorial have been recently released as cloud APIs, and are freely available for academic use.² In the final part of the tutorial, we provide an overview of these APIs, and demonstrate their use for building practical applications.

² https://early-access-program.debater.res.ibm.com/academic_use

1.3 Relevance to the Computational Linguistics Community

The tutorial is relevant to a broad audience of NLP researchers and practitioners working on problems related to argumentation mining, stance classification, discourse analysis, text summarization, NLG, dialogue systems, and more.

2 Tutorial Type

This is a cutting-edge tutorial. The main difference between this tutorial and previous tutorials on computational argumentation or argument mining (Slonim et al., 2016; Budzynska and Reed, 2019) is that we focus on the science behind debating systems — systems that can engage in a live debate with humans. Accordingly, a large portion of the tutorial's topics, e.g., listening comprehension, rebuttal, narrative generation and modeling human dilemma, was not covered in previous tutorials. Some of the topics, like argument mining, argument quality and stance classification, were previously discussed in the tutorial of Slonim et al. (2016); however, we will mostly focus on more recent advancements in these areas. The tutorial of Budzynska and Reed (2019) focused on argument structure parsing based on argumentation theory, which can be viewed as complementary to the content of the current tutorial.

3 Outline and Estimated Schedule

Part 1: Introduction (20 min)
• What is Computational Argumentation?
• Project Debater – AI that can debate human experts; outside the AI comfort zone
• Building blocks: decomposing the grand challenge

Part 2: Argument Mining (25 min)
• What is argument mining?
• Identification of argument components
• Document-level vs. sentence-level approach
• Corpus-wide argument mining
• Debate topic expansion
• Token-level argument mining

Part 3: Argument Evaluation and Analysis (25 min)
• Argument stance classification
• Argument quality

Part 4: Modeling Human Dilemma (15 min)
• Common principled arguments
• When do principled arguments apply?

Coffee break

Part 5: Listening Comprehension and Rebuttal (25 min)
• Debate vs. classical conversation systems
• Understanding the gist of long, spontaneous speech
• From understanding to rebuttal

Part 6: Core NLP Capabilities (10 min)
• Thematic clustering
• Wikification
• Multi-word and concept-level similarity

Part 7: From Arguments to Narrative (10 min)
• Narrative generation pipeline: argument filtering, redundancy removal, clustering, theme extraction, rephrasing and speech generation
• Keeping a live debate lively: the importance of humor

Part 8: Moving Forward – Applications and Implications (20 min)
• Possible applications
• Speech by Crowd
• Key Point Analysis
• Future directions

Part 9: Demo Session – Using Debating Technologies in Your Application (30 min)
• Overview of Project Debater APIs
• Usage examples

4 Prerequisites

The tutorial will be self-contained. We assume basic knowledge of NLP and machine learning, at the level of introductory courses in these areas.

5 Reading List

1. A survey on argument mining: Lawrence and Reed (2019)
2. Project Debater: Slonim et al. (2021)
3. Identification of argument components within an article: Levy et al. (2014), Rinott et al. (2015), Lippi and Torroni (2015)
4. Corpus-wide argument mining: Stab et al. (2018), Ein-Dor et al. (2020)
5. Argument quality: Wachsmuth et al. (2017), Habernal and Gurevych (2016)
6. Stance classification: Bar-Haim et al. (2017)
7. Modeling human dilemma: Bilu et al. (2019)
8. Listening comprehension: Mirkin et al. (2018)

6 Tutorial Presenters

• Roy Bar-Haim, IBM Research AI
  [email protected]
  https://researcher.watson.ibm.com/researcher/view.php?person=il-ROYBAR

• Liat Ein-Dor, IBM Research AI
  [email protected]
  https://researcher.watson.ibm.com/researcher/view.php?person=il-LIATE

• Matan Orbach, IBM Research AI
  [email protected]
  https://researcher.watson.ibm.com/researcher/view.php?person=il-MATANO

• Elad Venezian, IBM Research AI
  [email protected]
  https://researcher.watson.ibm.com/researcher/view.php?person=il-ELADV

• Noam Slonim, IBM Research AI
  [email protected]
  https://researcher.watson.ibm.com/researcher/view.php?person=il-NOAMS

Roy Bar-Haim is a Research Staff Member at IBM Research AI. Since joining Project Debater in 2013, he has been leading a global research team working on stance classification, sentiment analysis, argument mining and argument summarization. He has published in leading NLP and AI conferences and journals, including ACL, EMNLP, AAAI, COLING, EACL, JAIR and JNLE. He regularly reviews for top NLP and AI conferences, and serves as a member of the TACL elite reviewer team. Roy received his Ph.D. in Computer Science from Bar-Ilan University. Before joining IBM, he led NLP research teams in several startup companies. Roy delivered a one-hour talk about Project Debater at the NeurIPS 2018 Expo.

Liat Ein-Dor is a Research Staff Member at IBM Research AI. She received her Ph.D. in theoretical physics from Bar-Ilan University in 2001 and has taught several courses there. In 2002 she was a postdoctoral fellow at the Laboratoire de Physique Théorique de l'École Normale Supérieure, Paris, and from 2003 to 2006 she was a Postdoctoral Fellow and a Research Consultant at the Weizmann Institute of Science. Since 2006, Liat has been working as a research scientist in the hi-tech industry, and joined IBM's Haifa Research Lab in 2010. She has been leading research activities within Project Debater on tasks such as semantic similarity and argumentation mining. She has a diverse background in machine learning, having worked in a variety of domains including computational linguistics, computational biology, fraud detection and theoretical physics, and has publications in all these fields.

Matan Orbach is a Research Staff Member at IBM Research AI. Since joining IBM in 2014, he has worked on a diverse set of NLP tasks, recently focusing on multilingual stance detection and targeted sentiment analysis. Within Project Debater, Matan has led a team working on rebuttal generation through the use of principled arguments. Prior to joining IBM, he received his M.Sc. from the Faculty of Electrical Engineering at the Technion, where his research focused on graph-based semi-supervised learning.

Elad Venezian is a Research Staff Member at IBM Research AI. He is currently the chief architect of Project Debater, with a focus on making Project Debater technologies available to academia and business. Prior to this role, Elad served in various technical and leadership roles in the Project Debater grand challenge, among them leading the speech generation team. Elad received his M.Sc. from the Faculty of Electrical Engineering at Tel Aviv University, where his research focused on non-linear systems.

Noam Slonim is a Distinguished Engineer at IBM Research AI. He received his doctorate from the Interdisciplinary Center for Neural Computation at the Hebrew University and held a post-doc position at the Genomics Institute at Princeton University. Noam proposed to develop Project Debater in 2011 and has been serving as the Principal Investigator of the project since then. He has published around 60 peer-reviewed articles, focusing in the last few years on advancing the emerging field of Computational Argumentation. Noam initiated and co-organized the ACL 2016 tutorial on NLP Approaches to Computational Argumentation and the 2015 Dagstuhl workshop on Debating Technologies. At EMNLP 2018 he co-chaired the Argument Mining workshop. Noam delivered a keynote speech on Project Debater at EMNLP 2019.

References

Roy Bar-Haim, Indrajit Bhattacharya, Francesco Dinuzzo, Amrita Saha, and Noam Slonim. 2017. Stance classification of context-dependent claims. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 251–261, Valencia, Spain. Association for Computational Linguistics.

Yonatan Bilu, Ariel Gera, Daniel Hershcovich, Benjamin Sznajder, Dan Lahav, Guy Moshkowich, Anael Malet, Assaf Gavron, and Noam Slonim. 2019. Argument invention from first principles. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1013–1026, Florence, Italy. Association for Computational Linguistics.

Katarzyna Budzynska and Chris Reed. 2019. Advances in argument mining. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 39–42, Florence, Italy. Association for Computational Linguistics.

Liat Ein-Dor, Eyal Shnarch, Lena Dankin, Alon Halfon, Benjamin Sznajder, Ariel Gera, Carlos Alzate, Martin Gleize, Leshem Choshen, Yufang Hou, Yonatan Bilu, Ranit Aharonov, and Noam Slonim. 2020. Corpus wide argument mining - a working solution. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, pages 7683–7691. AAAI Press.

Iryna Gurevych, Eduard H. Hovy, Noam Slonim, and Benno Stein. 2016. Debating Technologies (Dagstuhl Seminar 15512). Dagstuhl Reports, 5(12):18–46.

Ivan Habernal and Iryna Gurevych. 2016. Which argument is more convincing? Analyzing and predicting convincingness of web arguments using bidirectional LSTM. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1589–1599, Berlin, Germany. Association for Computational Linguistics.

John Lawrence and Chris Reed. 2019. Argument mining: A survey. Computational Linguistics, 45(4):765–818.

Ran Levy, Yonatan Bilu, Daniel Hershcovich, Ehud Aharoni, and Noam Slonim. 2014. Context dependent claim detection. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 1489–1500, Dublin, Ireland. Dublin City University and Association for Computational Linguistics.

Marco Lippi and Paolo Torroni. 2015. Context-independent claim detection for argument mining. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, pages 185–191. AAAI Press.

Shachar Mirkin, Guy Moshkowich, Matan Orbach, Lili Kotlerman, Yoav Kantor, Tamar Lavee, Michal Jacovi, Yonatan Bilu, Ranit Aharonov, and Noam Slonim. 2018. Listening comprehension over argumentative content. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 719–724, Brussels, Belgium. Association for Computational Linguistics.

Ruty Rinott, Lena Dankin, Carlos Alzate Perez, Mitesh M. Khapra, Ehud Aharoni, and Noam Slonim. 2015. Show me your evidence - an automatic method for context dependent evidence detection. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 440–450, Lisbon, Portugal. Association for Computational Linguistics.

Noam Slonim, Yonatan Bilu, Carlos Alzate, Roy Bar-Haim, Ben Bogin, Francesca Bonin, Leshem Choshen, Edo Cohen-Karlik, Lena Dankin, Lilach Edelstein, Liat Ein-Dor, Roni Friedman-Melamed, Assaf Gavron, Ariel Gera, Martin Gleize, Shai Gretz, Dan Gutfreund, Alon Halfon, Daniel Hershcovich, Ron Hoory, Yufang Hou, Shay Hummel, Michal Jacovi, Charles Jochim, Yoav Kantor, Yoav Katz, David Konopnicki, Zvi Kons, Lili Kotlerman, Dalia Krieger, Dan Lahav, Tamar Lavee, Ran Levy, Naftali Liberman, Yosi Mass, Amir Menczel, Shachar Mirkin, Guy Moshkowich, Shila Ofek-Koifman, Matan Orbach, Ella Rabinovich, Ruty Rinott, Slava Shechtman, Dafna Sheinwald, Eyal Shnarch, Ilya Shnayderman, Aya Soffer, Artem Spector, Benjamin Sznajder, Assaf Toledo, Orith Toledo-Ronen, Elad Venezian, and Ranit Aharonov. 2021. An autonomous debating system. Nature, 591(7850):379–384.

Noam Slonim, Iryna Gurevych, Chris Reed, and Benno Stein. 2016. NLP approaches to computational argumentation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, Berlin, Germany. Association for Computational Linguistics.

Christian Stab, Tristan Miller, Benjamin Schiller, Pranav Rai, and Iryna Gurevych. 2018. Cross-topic argument mining from heterogeneous sources. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3664–3674, Brussels, Belgium. Association for Computational Linguistics.

Henning Wachsmuth, Nona Naderi, Yufang Hou, Yonatan Bilu, Vinodkumar Prabhakaran, Tim Alberdingk Thijm, Graeme Hirst, and Benno Stein. 2017. Computational argumentation quality assessment in natural language. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 176–187, Valencia, Spain. Association for Computational Linguistics.
