A Corpus of Argument Networks: Using Graph Properties to Analyse Divisive Issues
Barbara Konat1, John Lawrence1, Joonsuk Park2, Katarzyna Budzynska3,1, Chris Reed1
1Centre for Argument Technology, University of Dundee, Scotland
2Department of Computer Science, Cornell University, New York, USA
3Institute of Philosophy and Sociology, Polish Academy of Sciences, Poland

Abstract

Governments are increasingly utilising online platforms in order to engage with, and ascertain the opinions of, their citizens. Whilst policy makers could potentially benefit from such enormous feedback from society, they first face the challenge of making sense of the large volumes of data produced. This creates a demand for tools and technologies which will enable governments to quickly and thoroughly digest the points being made and to respond accordingly. By determining the argumentative and dialogical structures contained within a debate, we are able to determine the issues which are divisive and those which attract agreement. This paper proposes a method of graph-based analytics which uses properties of graphs representing networks of pro- and con-arguments in order to automatically analyse issues which divide citizens about new regulations. By future application of the most recent advances in argument mining, the results reported here will have a chance to scale up to enable sense-making of the vast amount of feedback received from citizens on the directions that policy should take.

Keywords: Argument Analysis, Argument Structure, Automatic Text Interpretation, Divisiveness Measures, e-Participation, Graph-based Analytics

1. Introduction

The aim of this paper is to show how the analysis of argumentative and dialogical structures, and in particular pro- and con-arguments, allows for the principled identification of divisive issues in an online corpus of debates. More precisely, we show that structuring data as networks of interacting arguments, claims, reasons and conflicts means that the data is represented in the form of a graph. This representation, in turn, offers the possibility of using graph properties to automatically analyse which issues attracted the most attention and which most divided disputants.

We present an annotated corpus which builds on previous work in argument mining (Park and Cardie, 2014) by adding further mark-up capturing dialogical interactions between participants in an eRulemaking system for online deliberative democracy. We demonstrate how the rich network structure available in this style of annotation supports multiple interpretations of divisiveness, which in turn provide powerful insights into the nature of the debate – insights which can be valuable in creating summaries and overviews of complex, multi-faceted debates for decision-makers who need sense-making tools to develop a coherent understanding of large volumes of data from public contributions – in this case on contentious policy issues within the remit of the Department of Transportation in the US.

The paper is structured as follows. Section 2 discusses related work in argument analysis and controversy detection. Section 3 presents the language resources used in this study. Section 4 demonstrates the type of information which can be extracted by applying our model – we show how the structure of argument networks can be used to calculate measures of divisiveness with respect to citizens' opinions about new regulations. Finally, Section 5 describes the automation of the model.
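To make this graph-based representation concrete, the following minimal Python sketch stores pro- and con-relations as directed, polarity-labelled edges and reads off how much attention each claim attracted from the number of reactions it received. The comment and claim identifiers are invented purely for illustration and do not reflect the corpus format or annotation tools described later.

    from collections import defaultdict

    # Illustrative edges only: each triple records that a comment reacts to a
    # claim, either supporting it ("pro") or attacking it ("con").
    edges = [
        ("comment_1", "claim_A", "pro"),
        ("comment_2", "claim_A", "con"),
        ("comment_3", "claim_A", "con"),
        ("comment_4", "claim_B", "pro"),
    ]

    # Tally incoming pro- and con-edges for every claim node.
    reactions = defaultdict(lambda: {"pro": 0, "con": 0})
    for _source, target, polarity in edges:
        reactions[target][polarity] += 1

    # A claim's "attention" is simply its in-degree: the total number of
    # reactions it received, regardless of their polarity.
    attention = {claim: c["pro"] + c["con"] for claim, c in reactions.items()}
    print(sorted(attention.items(), key=lambda kv: kv[1], reverse=True))

Keeping the polarity on the edge rather than on the node is what allows measures of agreement and divisiveness to be computed from the same structure, as sketched at the end of Section 2.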
2. Related work

Automated identification of divisive issues in the context of argumentative dialogue is closely related to two research areas: argument analysis and controversy detection.

2.1. Argument analysis

Argument analysis is one of the two key areas, together with argument evaluation, of the theory of argumentation (see e.g. (van Eemeren et al., 2014)). Recent advances in computational linguistics have explored methods and techniques for making the automation of this process possible and efficient. Argument mining aims at the extraction of argument structures from natural language texts (see e.g. (Peldszus and Stede, 2013)). Models of argument adopted in computational approaches vary with regard to their complexity. In some studies an argument is understood as a single proposition, whereas in others it is a binary reason-conclusion pair. The theory of argumentation, however, allows for more developed and complex conceptualisations, using argument diagrams (Toulmin, 1958), argument schemes (Walton et al., 2008), and distinctions between convergent and linked arguments (Freeman, 1991) or between supports and attacks (Dung, 1995). The initial stage in argument mining consists of manual annotation of an argument dataset, using tools which implement these theoretical concepts. Several resources and tools for manual annotation and visualisation of arguments are available, including Carneades (Gordon et al., 2007), AIFdb (Lawrence et al., 2012) and Araucaria (Reed and Rowe, 2004).

In the last decade, attempts have been made to automate the process of argument analysis, as manual annotation is very time consuming. Text genres used in these studies range from restricted, formalised language (legal texts or academic papers) to more unstructured natural language (such as spoken dialogue or Internet fora). The field of argument mining started to gain interest over a decade ago with Argumentative Zoning, an attempt at automated extraction of arguments from scientific papers (Teufel, 1999; Teufel and Moens, 2002). In that study, arguments are understood as spans of text serving various argumentative functions; automated recognition obtained the highest F-score (0.86) for recognising the parts of papers in which an author refers to their own research, and the lowest (0.26) for the parts in which an author presents arguments against other approaches. Similarly, in (Moens et al., 2007) spans of text (sentences) are classified as either argumentative or non-argumentative, with the finding that automated extraction of arguments is more effective from well-structured texts than from informal Internet debates (for example, newspaper articles achieved 73.22% accuracy whereas discussion fora achieved 68.4%). Mining of arguments understood as graphs, explored in (Palau and Moens, 2009), results in 73% accuracy on the Araucaria corpus and 80% on the ECHR corpus (a corpus of texts issued by the European Court of Human Rights). Another set of well-structured argumentative texts (“microtexts” in German) is used in (Peldszus, 2014), yielding a highest F-score of 0.7 for automated extraction of reason-conclusion structures.

Mining arguments from less structured texts can be supported with various methods. Combining a statistical approach with the application of argument schemes to argument mining (Lawrence and Reed, 2015) yields an F-score of 0.83 on a corpus of Internet arguments. Naturally occurring dialogues (either face-to-face or online) do not have a pre-defined structure, which makes the extraction of arguments even more difficult. This is why mining arguments from dialogue (Budzynska et al., 20xx) explores how argumentative structures are built upon dialogical structures (mainly illocutionary connections) in interaction.

2.2. Controversy detection

Controversy detection seeks to develop automated methods for identifying events or issues which attract conflicting opinions. Controversy of a given [...]

[...] properties to automatically compute controversial issues. This means that the work presented here constitutes only one (final) stage of the argument mining process, i.e. the automated interpretation and summarisation of mined data. Second, automatic extraction of just two categories (pro- and con-arguments) allows for a more fine-grained overview of comments on policies. Using the apparatus of graph theory, we can introduce different and precise definitions of divisiveness (see Section 4.2 for two such definitions) in order to capture various interpretations of what divides people. In order to make a clear distinction between our approach and that of controversy mining, we introduce the distinctive term ‘divisive’ instead of ‘controversial’.
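The two definitions of divisiveness used in this work are given in Section 4.2 and are not reproduced in this section. Purely to illustrate the kind of measure that polarity-labelled edges make possible, the hypothetical sketch below scores a claim by how evenly its incoming reactions split between support and attack; it is an assumed example, not one of the paper's definitions.

    def divisiveness(pro: int, con: int) -> float:
        """Illustrative score in [0, 1]: 0.0 when every reaction agrees,
        1.0 when pro- and con-reactions are perfectly balanced.
        (An assumed example measure, not one of the paper's definitions.)"""
        total = pro + con
        if total == 0:
            return 0.0
        return 1.0 - abs(pro - con) / total

    # A claim with 5 supports and 5 attacks is maximally divisive,
    # whereas one with 9 supports and 1 attack is not.
    print(divisiveness(5, 5))  # 1.0
    print(divisiveness(9, 1))  # 0.2

Any measure of this shape can be computed directly from the per-claim reaction counts tallied in the earlier sketch.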
3. Material and methods

3.1. e-Participation in the US

eRulemaking (online deliberative democracy) is a multi-step process of social media outreach that US federal agencies use to consult with citizens about new regulations on health and safety, finance, and other complex topics. To ensure public awareness of, and participation in, new regulations, agencies are required to publish materials describing the legal basis, factual and technical support, policy rationale, and costs and benefits of a proposal. They must specify a comment period, usually 60 to 90 days, during which anyone may send the agency their comments. Further, agencies are required to respond to the information, arguments, and criticisms presented by the public as part of the final rule (Lubbers, 2012).

Once the agency introduces the new regulation (either in the original form or in a form modified as a result of public consultation), it is obliged to summarise the comments it received, respond to questions and criticisms, and explain why it did not make changes. In the case of complaints from citizens, a court will use this documentation in order to make sure that the agency addressed public comments in a satisfactory way. The court will return the regulation to the agency for modification if it decides that some citizens'