
VIANA: Visual Interactive Annotation of Argumentation

Fabian Sperrle∗ Rita Sevastjanova∗ Rebecca Kehlbeck∗ Mennatallah El-Assady∗

Figure 1: VIANA is a system for interactive annotation of argumentation. It offers five different analysis layers, each represented by a different view and tailored to a specific task. The layers are connected with semantic transitions. With increasing progress of the analysis, users can abstract away from the text representation and seamlessly transition towards distant reading interfaces.

ABSTRACT are actively developing techniques for the extraction of argumen- Argumentation Mining addresses the challenging tasks of identi- tative fragments of text and the relations between them [41]. To fying boundaries of argumentative text fragments and extracting develop and train these complex, tailored systems, experts rely on their relationships. Fully automated solutions do not reach satis- large corpora of annotated gold-standard training data. However, factory accuracy due to their insufficient incorporation of seman- these training corpora are difficult and expensive to produce as they tics and domain knowledge. Therefore, experts currently rely on extensively rely on the fine-grained manual annotation of argumen- time-consuming manual annotations. In this paper, we present a tative structures. An additional barrier to unifying and streamlining visual analytics system that augments the manual annotation pro- this annotation process and, in turn, the generation of gold-standard cess by automatically suggesting which text fragments to annotate corpora is the subjectivity of the task. A reported agreement with next. The accuracy of those suggestions is improved over time by Cohen’s κ [11] of 0.610 [64] between human annotators is consid- incorporating linguistic knowledge and language modeling to learn ered “substantial” [38] and is the state-of-the-art in the field. How- a measure of argument similarity from user interactions. Based on ever, for the development of automated techniques, we have to rely a long-term collaboration with domain experts, we identify and on the extraction of decisive features. Cabrio et al. [9] present a model five high-level analysis tasks. We enable close reading and mapping between discourse indicators and argumentation schmes, note-taking, annotation of arguments, argument reconstruction, ex- indicating a promising direction for more automation. We use such traction of argument relations, and exploration of argument graphs. 
automatically extracted discourse indicators as a reliable foundation To avoid context switches, we transition between all views through for annotation-guidance. After discourse units have been annotated seamless morphing, visually anchoring all text- and graph-based with discourse indicators and enriched with sentence-embedding layers. We evaluate our system with a two-stage expert user study vectors, they are used to train a measure of argument similarity. based on a corpus of presidential debates. The results show that ex- This measure is updated over time as users annotate more text. To perts prefer our system over existing solutions due to the speedup speed up training and remove clutter from the visual interface we in- provided by the automatic suggestions and the tight integration be- troduce a novel, viewport-dependent approach to suggestion decay. tween text and graph views. Including machine learning and visual analytics into annotation- systems presents a substantial step towards semi-automated argu- Keywords: Argumentation annotation, machine learning, user mentation annotation. Systems can rely on a bi-directional learning interaction, layered interfaces, semantic transitions loop to improve performance: first, they can learn from the avail- 1 INTRODUCTION able input data, providing both better recommendations and guid- ance for users. Second, systems can also learn from user interac- Argument mining is a flourishing research area that enables various tions to improve and guide the used machine learning algorithms. novel, linguistically-informed applications like semantic search en- Tackling these challenges and employing such progressive mixed- gines, or human-like discussion systems, as convincingly initiative learning, we present a novel visual analytics approach for demonstrated by IBM’s project debater [58]. To achieve reliable per- argumentation annotation in this paper. 
We base our design choices formance in these complex tasks, modern systems rely on the analy- on a long-term collaboration with experts from the humanities and sis of the underlying linguistic structures that characterize success- social sciences. We observed their work processes and underlying ful argumentation, rhetoric, and persuasion. Consequently, to dis- theories to gain insight into their respective fields [29]. The well- till the building blocks of argumentation from a , it is established Inference Anchoring Theory [8] (IAT) provides the solid not sufficient to employ off-the-shelf Natural Language Processing foundation of a theoretical framework that is capable of represent- techniques [65], which are typically developed for coarser analytical ing argumentative processes. Having acquired direct insight into ar- tasks (see [42] for an overview), such as with the high-level tasks of gumentation from our collaborations, we present a requirement and topic modeling [19] or [5]. task analysis that informs the development of VIANA, our annota- Hence, to master the challenge of identifying argumentative sub- tion system, as well as future approaches in Fig. 3. structures in large text corpora, computational linguistic researchers Contributions – While we present VIANA in the context of the *University of Konstanz; e-mail: fi[email protected] Inference Anchoring Theory in this paper, its concepts are readily adaptable to other domain-specific theories and annotation prob- lems, for example from linguistics or political sciences. Thus, this paper’s contribution is two-fold. (i) We contribute a requirement- and task-analysis for effectively developing visual analytics sys- tems in the field of argumentation annotation. (ii) We further taneously annotate texts from multiple sources. Like VIANA, both contribute the visual analytics application, VIANA, including a OVA and Araucaria enable annotation according to IAT. 
However, novel design of layered visual abstractions for a targeted analysis VIANA automatically aligns extracted argumentation graphs with the through semantic transitions, as well as a recommendation system text view and provides automatically updating suggestions to speed learning from both domain knowledge and user interaction, intro- up the annotation process. Stab et al. [60] have created a web-based ducing viewport-dependent suggestion decay. annotation tool, combining a text and graph view side by side. Users create arguments directly in the transcript, and each component is 2 RELATED WORK assigned an individual color. Relations between arguments (attack, support, sequence) are shown in a simple graph structure. All intro- Recent years have seen a rise of interactive machine learning [22] duced systems offer graph and text views and suffer from similar is- and such techniques are now commonly integrated into visual an- sues. It is usually hard to relate the graph structure to the original alytics systems, as recently surveyed by Endert et al. [21]. Often, text, and large input corpora make the presented information hard they are used to learn model refinements from user interaction [18] to manage. VIANA tackles these issues with task-specific interface or provide semantic interactions [20]. Semantic interactions are typ- layers and seamless transitions between text- and graph views. ically performed with the intent of refining or steering a machine- Interactive Recommender Systems – There are generally learning model. In VIANA, expert users perform implicit semantic three approaches to recommender systems: collaborative filtering, interactions, as their primary goal is the annotation of argumenta- content-based, and hybrid approaches [26]. Collaborative filtering tion. The result is a concealed machine teaching process [56] that is systems utilize ratings or interactions from other users to recom- not an end in itself, but a “by-product” of the annotation. 
mend items [28, 40, 53], while content-based systems [44] make pre- Close Reading and Annotation Interfaces – In their survey, dictions purely on attributes of the items under consideration. Hy- Janicke¨ et al. [32] present an overview of visualization techniques brid approaches combine both methods. Annotation suggestions by which support close and distant reading tasks. According to the VIANA are content-based. Various approaches have been developed authors, “close reading retains the ability to read the source text to react to changing ratings [37, 69] and evolving user preference without dissolving its structure.” [32] Distant reading generalizes or over time [24]. Feedback to recommender systems is either explicit, abstracts the text by presenting it using global features. for example in the form of ratings, or implicit, like in the number of Several systems combine the close and distant reading metaphors times a song has been played or skipped [47] or how often an item to provide deeper insights into textual data, such as LeadLine [13] has been preferred over another [39, 51]. Jannach et al. recently or EMDialog [30]. Koch et al. [36] have developed a tool called surveyed implicit feedback in recommender systems [34]. VIANA VarifocalReader, which combines focus- and context-techniques accepts both explicit (acceptance or rejection of a suggestion) and to support the analysis of large text documents. The tool enables implicit feedback. To incorporate implicit feedback we propose a exploration of text through novel navigation methods and allows the novel approach to recommendation decay. Based on the current extraction of entities and other concepts. VarifocalReader places viewport VIANA penalizes those suggestions that are visible, but ig- all close and distant-reading views next to each other, following the nored. Penalties increase with decreasing pixel distance between the SmoothScroll metaphor by Worner¨ and Ertl [68]. 
VIANA instead suggestion and a user interaction. Previous work in recommender “stacks” the different views into task-dependent layers. systems has often focussed on temporal influence decay [31,43] to In recent years, several web-based interfaces have been created lower to influence of older actions on current recommendations. to support users in various text annotation tasks. For example, BRAT [61] can be used for the annotation of POS tags or named 3 BACKGROUND:ARGUMENTATION ANNOTATION entities. In this interface, annotations are made directly in the text The study of argumentation in political discourse has a history that by dragging the mouse over multiple words or clicking on a single spans over 2000 years. Since the foundational theories of Aristo- word. VIANA employs the same interactions for text annotation. An- tle [3], scholars have been studying the building blocks of success- other web-based annotation tool is called Anafora [10]; it allows ful argumentation and methods of persuasion. This research evolved annotations of named entities and their relations. Lu et al. [45] use from a mostly theoretical disputation to the thriving field of data- automatic entity extraction for annotating relationships between me- driven, computational argumentation mining. In a review of the dia streams. TimeLineCurator [23] automatically extracts temporal landmark book “Argumentation Machines: New Frontiers in Argu- events from unstructured text data and enables users to curate them mentation and Computation” [49], Zukerman defines argumentation in a visual, annotated timeline. Bontcheva et al. [6] have presented as the “study of different aspects of (human) interactions whose ob- a collaborative text annotation framework and emphasize the im- jective is to reach a conclusion about the truth of a proposition or portance of pre-annotation to significantly reduce annotation costs. the adoption of a course of action.” [70] To introduce the specific Skeppstedt et al. 
have presented a framework that creates BRAT- terminology of the field, we provide a simplified example usage of compatible pre-annotations [57] and discuss (dis-)advantages of pre- our system. The expert user’s goal is the extraction of argumenta- annotation. The initial suggestions of VIANA could be seen as pre- tive structures and their relations; data that is typically presented annotations, but are automatically updated after each interaction. in Inference Anchoring Theory (IAT) graphs, as shown in Fig. 2. Argument Annotation – Scheuer et al. [54] offer a comprehen- To gather the underlying data, she begins by close reading the text sive overview of computer-supported argumentation systems. They and annotating fragments (called locutions) of text. In the example, characterize five visual argument representations, including graph the text is a short discussion between FS and RK about the weather. views, and focus on both systems that allow students to practice the rules of argumentation and those that incorporate collaboration. Araucaria [50] and its more recent online variant OVA+ [33] sup- port the interactive diagramming of argument structures. OVA+, the de-facto standard for argumentation annotation, offers a text view and a graph view side by side. It supports a vast set of argumen- tation theories and their peculiarities. When annotating, users cre- ate detailed argument graphs (see Fig. 2) through text selection and align them with drag and drop, or rely on a rudimentary automatic layout engine. Monkeypuzzle [14] relies on the user interface and in- Figure 2: Simple example of an IAT (Inference Anchoring The- teractions introduced in Araucaria, but adds the possibility to simul- ory [8]) structure as produced by OVA+ [33]. The extracted locutions form the right-hand side of the graph and notation. For the design and implementation of VIANA, we translate are connected with a transition, showing their logical connection. 
the five tasks to five distinct, interactive views that support complet- VIANA creates those transitions automatically based on the temporal ing them: the Note Taking, Locution Identification, Link Extraction, order of locutions. Typically, locutions have exactly one associated Argument Reconstruction, and Argument Exploration views from proposition (left-hand side of the graph) that is also automatically Fig. 1 will be introduced in detail in Sect. 4. All views are presented created by the system. Propositions include, for example, premises as stacked layers and connected via semantic transitions. With ad- and conclusions and form the building blocks of arguments. Having vancing annotation progress, users transition through the layers, in- identified locutions, the expert identifies a support relation–called creasing the interface abstraction and transitioning from a pure text- inference–between the associated propositions. Alternative types based view to a high-level graph abstraction. of interpropositional relations are attacks, also called conflict, and rephrases. Next, the expert user reconstructs the propositions. Re- 4 WORKSPACE DESIGN CONSIDERATIONS construction entails, for example, correction of grammatical issues As introduced in the previous section, our system is grounded in lin- caused by the extraction of text fragments from their context or capi- guistic argumentation theory. While the number of existing anno- talization. Consequently, the propositional content differs slightly tation systems conforming to the theory is limited, domain experts from the locution. The yellow boxes in Fig. 2 are called illocution- actively use those available tools. We thus anchored our design deci- ary connectors and form a cornerstone of the IAT. 
While the connec- sions in those accepted applications, as they can represent the lin- tions are automatically created with their associated left-hand sides guistic theory and have already formed the experts’ mental models of the graph, the user proceeds to select the labels from a wide ar- that are not easily changed now, as our expert user study confirmed. ray, including “asserting”, “questioning” or “challenging”. Having completed the annotation she proceeds to an overview map showing From the previous long-term collaboration with said experts, we the different concepts and topics contained in the propositions to also gathered several issues with available annotation systems. One validate her results. frequently mentioned complaint was the lack of a connection be- tween extracted locutions and their context in the original transcript. Apart from the simplistic example presented above, VIANA can We overcome this limitation by directly annotating locutions in the be used to annotate more complex IAT scenarios that can only transcript by highlighting the respective words. To emphasize the be introduced very briefly here due to space limitations; the sup- “hand-made” nature of the annotation, we offer the option to em- plementary material provides more detailed explanations. Linked ploy sketch-rendering techniques when displaying locution annota- Arguments describe propositions that can only create an inter- tions as well as the connections between them to encourage users propositional relation together and not on their own. Instead of at- to keep refining them. While we initially considered mapping the tacking a proposition, undercuts attack inter-propositional relations. roughness of the sketch to the uncertainty of the annotation we re- Indexicals [7] are locutions like “Of course not!” that rely on the jected this idea as comparing different levels of “sketchiness” is ex- content of the previous locution and lose their meaning when sepa- tremely difficult. 
Wood et al. [67] compared “normal” and sketchy rated. Further, IAT resolves reported speech into artificial locutions. visualizations for different use cases. They conclude that user en- Requirement and Task Analysis– From our long-term collabo- gagement increases with sketchy visualizations when compared to ration with experts in philosophy and computational linguistics, we non-sketchy ones. Additionally, they note that the overall interaction identify several requirements for systems tailored to the task of text with a tool is perceived as more positive if it uses sketchy render- annotation and, in particular, argumentation annotation. We catego- ing. Our study did, however, not fully confirm this finding. While rize our collection of requirements into general needs and items that some experts appreciated the sketchy design (thanks to the “hand- are specific to the domain of argumentation mining. made” look), others rejected it. This feedback prompted us to add In general, text annotation tools should include close and distant a “‘sketchiness slider” after the first phase of the study. While the reading interfaces to provide ways to work on the text while also application loads without sketchiness by default, users can now se- allowing to abstract and generate higher-level insights. These in- lect between no , some , or strong sketchiness. ?? shows terfaces need to be connected in such a way that users can easily the system with sketchiness; all other screenshots include no sketch- switch between them and avoid unnecessary losses of context. This iness. As sketchiness is employed only for locutions in text-based includes keeping the interface clean and easy to use, removing clut- views the risk of it use visually cluttering the workspace is low, but ter and distractions. User guidance can greatly speed up the analy- further research is needed to determine which presentation is most sis process and facilitate the curation of results and their exploration. 
effective [4]. Once users have compiled results or insights, tools should offer The typical size of a corpus annotated for argumentation ranges ways to export and share those results using visualizations suitable from 10,000 to 100,000 words. As annotating the transcript of one for communicating the findings to both experts and non-experts. hour of a debate or discussion can take up to fifty hours, annotators Systems for an efficient argumentation annotation need to usually split this input into manageable chunks, annotate them sepa- [R1] deal with large amounts of text and [R2] extract graph struc- rately, and carefully merge the intermediate results. The main rea- tures from that text. Experts’ requirements for such systems fur- son for chunking the input is that it is otherwise difficult to main- ther include the possibility to [R3] extract argumentative fragments tain an overview of what has been annotated. VIANA highlights the of text as locutions and [R4] reconstruct propositions from them. identified locutions in the input text, enabling experts to relate ar- Once propositions have been extracted, they also need to be able guments and their relations to their origin and simplifying the task to [R5] connect propositions with relations like inference and con- of keeping an overview. Consequently, annotators can increase the flict and [R6] capture argumentation schemes and annotate illocu- amount of text they load into the application. tionary forces. Furthermore, discussions often revisit previously Some existing systems rely on manual node placement on the mentioned topics, necessitating appropriate functionality to [R7] annotation canvas. With increasing sizes of the argumentation graph, connect propositions with large temporal gaps. more and more time is spent on keeping the canvas organized. 
While From this requirement analysis we derive five abstract, high-level automated layout routines do exist, they are not always as effective tasks for argumentation annotation, tailored to expert annotators and as our experts would expect due to the complexity of IAT graphs. analysts: [T1] Close Reading and Note-Taking, [T2] Text Segmenta- With three different argument graph views using automatic layouts tion and Locution Identification, [T3] Relationship Extraction, [T4] and only showing task-relevant information we free users from this Argumentation Reconstruction, [T5] Argument Exploration. These strenuous task, enabling them to focus on annotating instead. tasks need to be supported by systems catering to argumentation an- All text and graph views mentioned above will be introduced in imation in statistical data graphics and found that “animated tran- sitions can significantly improve graphical perception.” [27]. We argue that their result “animation is significantly better than static across all conditions” [27] in object tracking tasks is also applicable during layer switches in VIANA. Consequently, we employ a layered approach rather than using tabs or multiple windows. 4.2 Visual Representation of Illocutionary Connectors Due to the introduction of interface layers, propositions and locu- tions are not always shown on the screen at the same time. As a result, illocutionary connectors can no longer be rendered like in Fig. 2 (yellow nodes). They are, however, a fundamental part of the underlying Inference Anchoring Theory and need to be repre- sented. We thus map them to the left-hand side of the graph, show- Figure 3: VIANA system overview. The top bar shows the currently ing them as badges on both propositions and interpropositional rela- active annotation layer. Here, a graph view is shown next to a text tions between them. While the existence of these connectors is fun- view. An interaction log is displayed at the bottom of the screen. 
damental to the theory, the importance of their particular values is task-dependent and they can often be initialized with sensible de- fault values. VIANA thus initializes connections between locutions detail in Sect. 5. In the following section, we introduce the layering and propositions as “Asserting” and those between transitions and and transitions between these views. inferences as “Arguing”. We provide a setting hiding the illocution- 4.1 Layered Interface ary connectors, allowing users to focus on other tasks and checking the connectors at a different time. The previous section has already alluded to the typical sizes of ar- gumentation corpora. While Scheuer et al. claim that scrolling in- 4.3 Automated Suggestions terfaces can cause users to “lose the big picture” [54] they are diffi- VIANA highlights keywords that are of specific interest to the anno- cult to avoid in text-based systems. We instead prioritize reducing tation based on pre-defined word lists [25]. Connectors like “so”, the amount of information on screen through the introduction of lay- “if”, “because” or “for” are bold and keywords like “appreciate”, ers. By providing the task-specific layers shown in Fig. 1 users only “promise” or “complain” that are associated with speech acts are see the information that is currently relevant to them. By advanc- italicized and underlined. These highlighting-techniques have been ing from one task to another on the spectrum, users transition away found to work comparatively well in the presence of “distractors” from text-level views towards a graph-based overview. The differ- like the boxes around locutions [62]. ent views are intended to enable both close and distant reading. To In addition to highlighting keywords as guidance for manual switch between layers users scroll their mouse wheel while press- annotation, we provide proposed fragments of text that should be ing the control key. 
In order to avoid context switches and make the annotated as locutions. These fragments are discourse units that changes as easy to follow as possible, we smoothly morph all ele- have been classified as potential locutions by our recommendation ments on screen. The Note Taking, Locution Identification and Link system introduced in Sect. 6. Possible interactions like confirming Extraction views are aligned to minimize positional movement. Bar- or rejecting suggested locutions follow guidelines for Human-AI ring overlap removal, the respective top-left corners of locution an- interaction [2] and will be introduced together with their influence notations and graph nodes are placed at the same screen position, on future suggestions in Sect. 5.2 and Sect. 6.2, respectively. leaving users with morphing, but stationary rectangles and graph Locutions that have been confirmed by users are shown in a dark edges. When switching to the Argument Reconstruction or Argu- blue , while those suggested by linguistic rules and predic- ment Exploration views, transitions and targeted scrolling support users in keeping the context. To make changes easier to follow, ele- tions based on user-interactions are light blue and teal, respec- ments under the mouse remain there after the transition, if possible. tively. The certainty of suggested annotations is mapped to their This concept is familiar from zooming in maps or image viewers. opacity. We deliberately chose shades of blue to avoid conflicts with As the layers are organized by task progression and often provide the colors used for relations between arguments. We selected three functionality for multiple tasks, frequent back-and-forth switches relatively similar colors to avoid overwhelming users with too much between layers can be avoided. After the first evaluation phase, we information; an approach that was validated by experts in our user added two additional layers at the request of some users. They are study. 
An earlier design of the system colored locutions based on indicated by dashed lines in Fig. 1 and contain two visualizations the presence of six different types of discourse unit connectors (see side-by-side to enable parallel work on multiple tasks, at the cost of Sect. 6.1) as shown on the side. We chose a simplified color higher information density. As the individual layers remain available, scheme based on expert feedback that such information was users can select whichever representation is most effective for them interesting but not helpful during the annotation process. in their current context. The system overview in Fig. 3 presents such a combined layer showing both the link extraction and locution 5 TASK-DRIVEN INTERFACE LAYERS identification views at the same time. In the following section, we introduce the layered views that VIANA We initially decided to introduce layers rather than employing offers for specific tasks. Several layers offer functionality suited for multiple coordinated views or a tabbed or multi-window interface multiple tasks, and two intermediate layers merge graph- and text to facilitate relating the resulting argument graph structure to the views. To transition between layers, users can either select a target original text. Scrolling through the layers maps the graph directly layer from a list at the top of their screen or press the control key into the original text fragments. While linking and brushing in coor- while using their mouse wheel. dinated views could offer similar functionality, it would require at least three views (propositions, locutions, text). Some expert users 5.1 Slow Analytics and Note-Taking in our study preferred the layered approach over multiple parallel The Note-Taking and Slow Analytics View represents the “distraction views as it enabled them to reduce the amount of information to a free” mode of VIANA and is presented in Fig. 4a. Depending on level they were comfortable with. 
Heer and Robertson studied an- user settings, it shows only the raw text or includes the initially (a) The Note Taking View with proposed annotations (b) The Locution Identification View showing the result of an annotation

Figure 4: The two text-based views tailored towards slow analytics and locution identification. In the Slow Analytics View, users can read the text and jot down notes. In the Locution Identification View, they can mark locutions and introduce relations between their propositions.

proposed locutions. Interviews with experts have revealed different approaches to argument annotation. While one approach starts by immediately annotating locutions, another approach initially scans the text for passages of particular interest. The Slow Analytics View offers a note-taking interface, addressing task [T1] presented in the introduction, that allows users to jot down "free-form" notes on fragments of text without being forced to mark them as locutions. This is an advantage over other annotation systems [33] where users employ such tactics to compensate for missing functionality. Once users have gained an overview of the corpus at hand, they progress to the Text View for locution identification and relation extraction.

5.2 Text View and Locution Identification

Users transition to the Locution Identification View to perform [T2] and annotate locutions. To create a new locution boundary, they select a fragment of the text by clicking and dragging. Once the users let go of the mouse button, both the locution and the corresponding proposition are automatically extracted. The locution is connected to the temporally preceding locution via a transition. However, to avoid cluttering the view, these automatically created transitions are not displayed on the screen and are only contained in the result extracted at the end of the analysis. As an alternative to manual annotation, users can explore the proposed locutions. VIANA displays them in more muted colors and with a lower opacity, as can be seen in Fig. 4a. The colors encode the different origins of these fragments as introduced in Sect. 4.3. An area chart on the right-hand side of the screen summarizes the annotations and can show regions with fewer annotations than expected. Those regions might either be of less interest to the analysis or indicate missed locutions.

Every locution displays a toolbar when hovering over it. This toolbar allows confirming or unconfirming a locution, opening the edit tooltip to change its type or provide an annotation (as introduced in the Slow Analytics View), deriving a locution from it, for example, to resolve reported speech, or deleting it outright.

To draw a connection between two locutions, users click and drag from one source locution to a target. Pressing the shift key while dragging will result in a transition, while pressing control will result in an inference. A transition is shown as a light grey line connecting the two locutions directly. By default, any non-transition edge is drawn as an inference. Double-clicking on the edge or its label iterates through the available edge types, including conflict and rephrase. The associated colors green, red and yellow are well-established in the argumentation community. A toolbar similar to that described for locutions is available for propositional relations as well. It allows to open an edit tooltip or delete the edge. Users can change the type of edge in the tooltip using a drop-down menu. Users can also set an argumentation scheme [66] in the tooltip. Such a scheme describes the type of the relation more precisely and is displayed instead of the default "inference", "conflict", and "rephrase" as an edge-label once selected. Additionally, the tooltip allows users to set the illocutionary connector between the transition and the propositional relation. It will be shown as a badge in the Graph View.

To create a linked argument, users can draw an edge to the arrowhead of an existing propositional relation. This causes the arrowhead to move backward from the end of the edge and receive both incoming edges there. The increased distance between arrowhead and locution emphasizes the merge and clearly distinguishes linked arguments from converging arguments. Converging arguments are achieved by simply drawing multiple edges ending in the same locution. Contrary to linked arguments, the premises of a converging argument can all support or attack the conclusion individually. Consequently, no unique visual mapping showing their connection is necessary or warranted.

To create an undercut, i.e., the (typically) attack or support of an existing propositional relation, a new edge is drawn to the label of an existing edge. Users can link up more than two arguments and undercut or support an existing undercutting relation.

These simple interactions allow users to annotate the text with locutions without having to switch between separate text and graph visualizations. By staying within the same visualization, the context of the original utterances remains available, facilitating the annotation. By overplotting the annotations over the text view, the information density on this level rises progressively during the annotation. Nonetheless, we chose this design to prevent users from having to continuously switch layers or visual representations. Fig. 4b shows a typical density of annotation. ?? shows an utterance by Donald Trump and is denser due to his typical style of speech with many repetitions and short arguments rather than long explanations. Alternative designs would remove either the relation annotation (note-taking view) or the text (graph views) to lower the information content displayed on screen and free up pixels. As those views are implemented in the system, users are free to choose whichever view fits their workflow best. Such freedom of choice proved important in our user study with experts that have a clear workflow in mind.

5.3 Relationship Extraction

While the previously introduced text view also enables the creation of relations, it already contains a lot of information. To enable users to focus on task [T3], extracting relations between already annotated locutions, VIANA contains the Link Extraction View shown in Fig. 5a. The nodes of this graph are propositions (i.e., the left-hand side of an IAT diagram) associated with locutions visible in the text view. To enable smooth transitions between the text and graph view, while avoiding overlap and maintaining readability, this graph shows nodes with shortened text representations at the position of the locution in the text. This positioning allows the outlines of locutions to seamlessly morph into graph nodes, and vice versa. In some cases, minimal position changes are necessary to remove node overlap.

(a) The Link Extraction View with collapsed propositions. (b) The Argument Reconstruction View with ordered propositions. (c) The Argument Exploration Map giving an overview of the result of an annotation. This excerpt of the map focuses on topics related to taxes and the economy.
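The edge interactions described above (shift-drag for a transition, plain drag for an inference, double-click to cycle through the edge types) can be summarized as a small state model. The following Python sketch is purely illustrative; the class and function names are ours and not part of VIANA, which implements these interactions in its web front end.

```python
from dataclasses import dataclass

# Non-transition edge types in cycling order; VIANA renders them
# green, red, and yellow, respectively.
EDGE_TYPES = ["inference", "conflict", "rephrase"]


@dataclass
class Edge:
    source: str  # id of the source locution
    target: str  # id of the target locution
    kind: str    # "transition" or one of EDGE_TYPES


def create_edge(source: str, target: str, shift_pressed: bool) -> Edge:
    """Dragging with shift creates a transition; otherwise an inference."""
    kind = "transition" if shift_pressed else "inference"
    return Edge(source, target, kind)


def double_click(edge: Edge) -> Edge:
    """Double-clicking a non-transition edge cycles through the edge types."""
    if edge.kind != "transition":
        i = EDGE_TYPES.index(edge.kind)
        edge.kind = EDGE_TYPES[(i + 1) % len(EDGE_TYPES)]
    return edge
```

A drag without shift thus yields an inference that two double-clicks turn into a rephrase, while transitions are unaffected by double-clicking.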

Figure 5: The graph views enable users to explore the extracted propositions. The Argument Reconstruction View temporally orders propositions (same data as in Fig. 4) and highlights groups and disconnects in the argument graph. The Argument Exploration Map provides a distant-reading overview and shows that the annotated text discussed the impact of tax reductions on people.

The view provides a quick, compact overview of the current annotations and makes it easy to relate the propositions to their respective locutions as they morph into each other. Hovering over a proposition expands it to fit the entire text. Missing propositional relations can be drawn using the same interactions as in the text view.

To prevent overlap issues we utilize a modified version of WebCoLa by Dwyer et al. based on their previous work on graph layouting [15]. It produces a gridified layout with no node-edge overlap and fewer edge crossings than drawing direct links between nodes. The remaining edge crossings are easy to make sense of as they are always intersections of orthogonal lines. The same techniques are applied in the two graph views presented in the following sections.

5.4 Argument Reconstruction

While the Link Extraction View still contains the propositions of unconfirmed proposed locutions, they are removed in the Argument Reconstruction View shown in Fig. 5b. All remaining nodes are expanded to fit the entire proposition. To resolve the resulting overlap issues, the nodes are redistributed along the y-axis. Whenever the nodes of two propositions overlap, the one mentioned in the original text at a later time is shifted downwards until the overlap is resolved. The result is an automatically arranged timeline graph of propositions that can be read from the top left to the bottom right. This temporal alignment identifies connected components of the graph and facilitates detecting breaks and cuts in the annotation.

Double-clicking a node enables users to edit a proposition in-place. Changing the text of a proposition is called reconstruction ([T4]) and is introduced in Sect. 3. The necessary amount of reconstruction varies depending on the analysis task at hand. Typical changes include rephrasing the locution to form a complete sentence with subject, verb, and objects, lowercasing the first letter, and resolving pronouns. Once users have reconstructed a proposition, the graph nodes show the reconstructed text as their label. The text of the underlying locution is available when hovering over the node. In addition to reconstructing propositions, users can also change the illocutionary connector when editing a proposition. This connector is initialized with "Asserting". Depending on the proposition, other possible values include "Questioning", "Restating", or combinations like "Assertive Questioning". While the temporally ordered graph with expanded propositions is designed for reconstruction in particular, all three graph views available in VIANA support the task using the same double-click interaction. Hence, any graph view can also be used to introduce new relations between propositions.

The initial version of VIANA showed only one visualization at a given time. After the first study phase we added two layers that combine the Locution Identification view on the right and either the Argument Reconstruction or Link Extraction view on the left, as shown in Fig. 3. This addition is a direct response to requests from some experts wanting to frequently switch between the tasks.

5.5 Argument Exploration

The Argument Exploration Map provides a distant reading interface enabling exploration ([T5]) by giving an overview of groups of arguments and their respective important keywords. Besides exploration, such a map facilitates the communication of annotation results and enables progress checking during the annotation. It is also useful when continuing the annotation after a short break, as our user study revealed. The graph shown in this view retains all interactive functionality available in the other graph views. The possibility to create new relations in the map view is helpful to create so-called "long-distance relations". These appear, for example, in political debates where speakers frequently refer back to something that was said some time ago. The topic map reduces the distance between the propositions if they have similar content, facilitating the creation of a relation, and avoiding scrolling.

To create the map, we first extract all nouns and named entities from the input corpus [17]. Additionally, we gather the top five corpus words each according to word frequency, TF-IDF, G2 and average likelihood ratio. We represent each keyword as a vector using a word embedding tailored to concept words [59] that combines and optimizes the results of [46] and GloVe [48]. We then project these high-dimensional word vectors to 2D using tSNE [63] and fit the created space to the dimensions of the screen. As showing all keywords at once would clutter the screen, we only place those on the map that appear in potential suggested locutions (see Sect. 4.3). After scaling keywords according to their frequency in the text, we obtain a map of the most relevant keywords.

We represent each proposition as a weighted average word embedding vector of its nouns and named entities. The initial weights for named entities are twice those for nouns to capture their typical importance. If a proposition contains no keywords that are visible on the topic map, we employ a nearest-neighbor search in the word-embedding space to find appropriate, representative nouns. This step is especially crucial for propositions containing no nouns or named entities. Having obtained a representative aggregated embedding vector for a proposition, we can determine its position in the projection. While we prevent node overlap in the map by nudging nodes apart, we cannot prevent all node-keyword overlap. As a consequence, we offer a toggle switch that swaps the z-order of keywords and nodes to reveal keywords hidden behind nodes.

5.6 Interaction Log

The interaction timeline at the bottom of the screen summarizes the annotation session and is visible from every layer. It shows information about identified and deleted locutions. Employing the same visual metaphors, it also informs about added, retyped and deleted relations between both locutions and propositions. All colors correspond to the colors of the affected entities at the time of each interaction. The timeline reveals different annotation patterns that also became apparent in our expert user study: some users identify locutions and directly connect them with links wherever possible, while others focus on extracting all locutions first and create links in a second step. The confirmation and rejection of locutions as well as annotation changes, including those to illocutionary connectors and argumentation schemes, are displayed as well.

The timeline does not show any contextual information for the recorded annotation changes. According to the experts we consulted, this is not necessary as they are aware of the changes they performed during the last few minutes. However, they did express the wish to keep modifying the respective elements directly from the timeline, for example, to adjust the previous interaction based on a new insight. The interaction tracking serves as a data collection method for further improvements to VIANA. While the current version uses "locution interactions" to improve suggestions, future versions of the system will use the data to provide undo and redo actions. Furthermore, the timeline is a first step towards providing analytic provenance.

6 INTERACTION-DRIVEN ANNOTATION SUGGESTIONS

To guide and support users in their analysis process, VIANA highlights important keywords and recommends text fragments that should be annotated as locutions. While linguistic rules provide a good basis for initial suggestions, there is great potential for efficient human-machine collaboration. As there is generally not enough training data for good-quality classifiers or fully-automated annotation systems, VIANA utilizes the knowledge encoded in a language model and refines it over time to train a measure of argument similarity from user interactions.

We employ a BERT [12] model that was obtained by fine-tuning the "BERT-Base" checkpoint for evidence- and claim-detection. We gathered the training samples from several IBM Project Debater training datasets [1, 52, 55] and chose the BERT base model because of its state-of-the-art performance. However, the recommendation component in VIANA is not specific to BERT and could also be used with embeddings from any other language model.

The system first enumerates so-called "elementary discourse units" [25], (sub)sentences delimited by punctuation and clausal connectors, as fragments. Each fragment is represented as a tuple (embedding, label, weight, points, state). Besides the embedding e, the tuple contains a label l ∈ {−1, 0, 1} that indicates whether a fragment should be an annotation (l = 1), not be an annotation (l = −1) or is still undecided (l = 0). The weight w is used in similarity computations between two fragments. It is initialized to w = 1 and updated through user interactions as described in Sect. 6.2. The points are initialized to p = 5 and decay over time if users ignore suggested fragments but interact in their vicinity. The state c of a fragment can be either CREATED, LINGUISTICS or CONFIRMED.

VIANA only recommends locutions for annotation, not relations between them or their extracted propositions. While this is a field that we will explore in future work, discussions with annotation experts showed that they are more reserved with respect to proposed relations than proposed locutions. This skepticism stems from the fact that annotating relations is a significantly more complex task, and experts prefer to do a good job manually rather than having to correct imperfect suggestions. Therefore, we do not include proposed relations yet.

6.1 Initializing Suggestions

To avoid a cold-start of the recommendation system, we initialize it with the output of a linguistic discourse-unit annotation pipeline [25] and refine it throughout the annotation process. The pipeline identifies discourse units that have a connection of type conclusion, reason, condition or consequence to the surrounding discourse units. We add all discourse units that contain a speech act of type agreement or disagreement and set the labels for the suggestions to 1 and their state to LINGUISTICS. All other discourse units are labeled 0 and remain in state CREATED.

To suggest fragments to users, we compute the score s_i for each fragment f_i as the average of the weighted cosine similarity:

s_i = (1 / |C|) · Σ_{j ∈ C} cos(e_i, e_j) · w_j · l_j · p_i

where C is the set of fragments f such that C = { f | f_c = CONFIRMED ∨ f_c = LINGUISTICS } and cos(a, b) is the cosine similarity between a and b. Already (partly) decayed points p_i and a higher similarity with fragments f_j with a negative label l_j lead to a lower score, and hence a lower probability of f_i being suggested as an annotation. We chose the cosine similarity over the dot product, and the normalization factor 1/|C| over (Σ_{j ∈ C} w_j)^{−1}, as this combination led to the best separation between confirmed and rejected suggestions in our tests. After sorting all fragments with c = CREATED according to their score, we return the n suggestions with the highest score. Any fragments with c = LINGUISTICS are shown as suggestions (if they have not been manually deleted), independent of their score. While the correct number n depends on the dataset, our study participants noted they wanted rather more than fewer suggestions.

6.2 Promotion and Decay of Suggestions

To refine the suggestions over time, we update the weights and labels of fragments through user interaction. When users confirm or disconfirm (i.e., mark as a draft) a locution, we triple the weight of the associated fragment or divide it by three, respectively. Deleting a locution leads to the weight of the associated fragment being doubled, while the label is changed from 1 to −1. Manually added locutions are initialized with a weight of 4. These weights and updates ensure that items that have been interacted with take a more important role in the similarity calculation. Compared to confirmed locutions we keep the weights for deleted locutions lower to ensure that the system keeps proposing fragments that should be annotated, rather than those that should not be. The concrete values of the weights were identified through initial experimentation. Their general distribution (lowest values for negative feedback, highest values for manual intervention) follows our previous work on model optimization through progressive learning [18].

To avoid cluttering the screen with suggested annotations, we introduce a viewport-dependent suggestion decay function. While there are simple ways to learn from direct user interaction as described above, the number of interactions in a system is limited. VIANA, thus, also learns from the items that users do not interact with. Recall the points p associated with every fragment. Whenever users interact with an item, suggestions that are close on screen but are not interacted with lose some points. This process captures that the suggestion was not relevant to the users, and they rather performed a different action. We calculate the point loss of fragment f_i after an interaction with fragment f_j as

pl_i = log_32(D) − log_32(d(f_i, f_j))

where D is the maximum (absolute) decay distance and d(a, b) is the distance in the text, in number of words, between fragments a and b.
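The fragment tuple, the suggestion score, the interaction-driven weight updates, and the point-loss rule described above can be sketched compactly. The following Python is an illustrative sketch, not VIANA's implementation: names are ours, toy 2D vectors stand in for the high-dimensional BERT embeddings, and clamping the point loss at zero for fragments beyond the decay distance is our assumption.

```python
import math
from dataclasses import dataclass, field

CREATED, LINGUISTICS, CONFIRMED = "CREATED", "LINGUISTICS", "CONFIRMED"


@dataclass
class Fragment:
    embedding: list          # language-model embedding e
    label: int = 0           # l in {-1, 0, 1}
    weight: float = 1.0      # w, updated through user interactions
    points: float = 5.0      # p, decays when nearby suggestions are ignored
    state: str = CREATED     # c in {CREATED, LINGUISTICS, CONFIRMED}


def cos(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def score(f_i, fragments):
    """s_i: average weighted cosine similarity against the set C of
    confirmed or linguistically derived fragments."""
    C = [f for f in fragments if f.state in (CONFIRMED, LINGUISTICS)]
    if not C:
        return 0.0
    total = sum(cos(f_i.embedding, f_j.embedding) * f_j.weight * f_j.label * f_i.points
                for f_j in C)
    return total / len(C)


# Weight updates on direct interaction (factors taken from the text).
def confirm(f):
    f.weight *= 3.0; f.label = 1; f.state = CONFIRMED

def disconfirm(f):
    f.weight /= 3.0  # mark as draft

def delete(f):
    f.weight *= 2.0; f.label = -1


def point_loss(dist_words, D=200):
    """Viewport-dependent decay: pl = log_32(D) - log_32(d); at most
    log_32(200) ~ 1.5 points per interaction. Clamping at 0 is our guess
    for fragments farther away than D."""
    return max(0.0, math.log(D, 32) - math.log(max(dist_words, 1), 32))
```

For example, a suggestion one word away from an interaction loses roughly 1.5 points, while one 200 words away loses none, matching the bound stated in the text.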
The maximum decay distance D is dependent on the amount of text visible on screen and ensures that only those suggestions that are visible to users and likely to be "interaction alternatives" lose points. VIANA currently sets D to 200. As a result, each fragment can lose, in theory, at most 1.5 points after each interaction. In practice, the maximum loss is closer to 1 due to the typical length of a locution. We update the points of visible suggestions after each interaction, with the exception of confirming suggestions. Once a fragment has lost all points, we set its weight to 2, its state to CONFIRMED, and its label to −1, mimicking a user manually deleting the suggestion. Fragments that lose all points are taken into consideration as negative samples when generating new suggestions. This novel approach for viewport-dependent suggestion decay differs from previous work on suggestion decay in recommender systems, which is typically based on temporal evolution or ignores on-screen context. Temporal decay is not suitable for argumentation annotation as there are no changes to annotation guidelines during an individual annotation. Our approach penalizes individual items in addition to updating the content-based similarity function. It incorporates the on-screen distance between ignored suggestions and interactions to inform the speed of decay. This takes into account that users are likely to be much more aware of the content in the direct vicinity of their interaction, especially in text-based systems.

7 EVALUATION

To validate the effectiveness of our approach, we conducted an expert user study with five participants over a period of three months. We present its results after introducing two independent use-cases that demonstrate the usefulness and practical applicability of VIANA. Expert E2 (who will be introduced in the following section) validated the choice to forgo a quantitative evaluation and stated that inter-annotator agreement studies for argumentation often fail because of the multi-stage process of annotation. Once annotators disagree in the identified locution structure, their propositions, relations or argumentation schemes are automatically not in accordance. Furthermore, the time needed for annotation is not necessarily a useful metric as experts already spend many hours annotating and prefer exact over fast results. Much time is spent on reasoning about the underlying argument structure during annotation. Consequently, computing speed-up factors between two subsequent annotation runs in different annotation systems is meaningless. As the argumentative structure will have been identified during the first annotation, the second one will always be easier. Hence, we present qualitative feedback from five study participants. In particular, we highlight their feedback with respect to the design and usability of the system, as well as the usefulness and quality of the suggestions.

7.1 Expert User Study

In addition to the use-cases presented above, we evaluate VIANA in a two-stage Expert User Study carried out in pair analytics sessions [35]. In each session, one domain expert and one visual analytics expert (one of the authors) were present. The two stages of the study were performed three months apart. The system evaluated in the second stage incorporates expert feedback from the first study.

7.1.1 Methodology

We divided the 90-minute long study sessions into three parts. In the first 20 minutes, we presented the system to the expert and explained the functionality. We also elicited initial feedback on the design choices and usefulness of VIANA through a semi-structured interview. In the following 40 minutes, we gave the expert control over the system interface and let them explore the dataset. We supported the expert with clarifications on the functionality of the user interface and the controls whenever they had questions. Occasionally we also proposed to progress to a different view to ensure that the expert got a holistic impression of the system. Each expert was encouraged to think aloud and explain the rationale for their actions. After the exploration period, we transitioned into another semi-structured interview of 30 minutes. Here we asked the expert for detailed feedback based on their experience with the system to receive a general assessment. We focussed on the design, the usefulness of the tool to the expert, the quality of suggested annotations, and potential missing features. We recorded both audio and video from the screen during all study sessions.

Participants – E1 holds a Ph.D. in computational linguistics and currently works as a postdoctoral researcher. As argumentation has a crucial role in her research, she currently spends multiple hours a day annotating argumentation data. She also teaches courses about argumentation and the underlying theory. E2 just completed his Ph.D. at the intersection of computer science and computational linguistics, working on argumentation and ethos mining. He estimates to have worked on manual argumentation annotation for over one full year in the last 3.5 years of his Ph.D. During the design phase, E2 provided his domain knowledge about less common IAT annotations that he felt should be possible in VIANA. As a consequence, he had seen, but never used, the system before the study. He did not contribute to the visual design, the interaction design, or any of the analysis layers in particular. Participants S1, S2, and S3 are master's students in Speech and Language Processing, have received special training in argumentation annotation over six months, and now work as student assistants in argumentation annotation. E1 and E2 participated in the first, and S1, S2, and S3 in the second phase of the study. None of the experts are authors of this paper.

Dataset – As datasets, we chose transcripts of presidential debates as all experts had experience with annotating political debates. In the first study phase, we used the second 2012 debate between Barack Obama and Mitt Romney because E2 had previously annotated all three of the more recent debates between Trump and Clinton. In the second phase of the study, we used the first 2016 debate between Trump and Clinton. None of the experts had annotated the respective data used in the study before. All participants were presented with 40 utterances from the end of the debates. This section of the text has been selected to skip the non-argumentative introduction phase of the debate and to fit the text length to a typical annotation session that our experts are used to.

7.1.2 Results and Feedback

In this section, we report the feedback from the three phases of our study sessions. We summarize the comments of the participants, providing a selection of the most insightful feedback.

Initial Feedback – Both E1 and E2 highlighted the importance of manually annotated argumentation for their research. None of the five experts had previously worked with visual analytics systems for argumentation annotation. However, they were excited about the idea of working with the proposed locutions, with E2 stating "I think this would speed up the whole process [of annotating] quite significantly." S1 agreed and articulated "I wish I just had to check if the locutions were already extracted correctly." E1 highlighted the importance of human-in-the-loop analysis. She liked the idea of proposed locutions "as long as I can change them." This response mirrors the generally reserved attitude of the experts towards fully automated approaches. At the same time, all experts expressed that they were aware that suggested annotations might bias the result towards the suggestion, especially in hard-to-decide situations.

While she thought the extracted tasks were relevant and captured her actual work, E1 was skeptical of the individual analysis layers we introduce for each task. As her recent work has often focussed on illocutionary connections, she stated to be "suspicious about the illocutionary structure." S2 had similar skepticisms at first. However, she changed her opinion after being introduced to the system and mentioned she thought "it is more manageable." S3 agreed that it is "good to see the different layers. Sometimes it makes things easier to understand." One of E2's biggest complaints about the system he currently uses is that "if you do not constantly move nodes around to make it look good you lose track of what is going on" and that "significant time is spent making the graph readable." Consequently, he liked the idea of automated layouts and separated layers introducing a more rigid visual structure to the graph. Later system use revealed two annotation patterns. While some users directly interconnect extracted locutions, others first extract some locutions before transitioning through the layers to link and reconstruct them. They then transitioned back to the text view, effectively annotating data in "mini-batches".

Design and Usability – When presented with the system for the first time, E1 remarked that the sketchy rendering of nodes and relations might be confusing and could suggest "that you are not actually sure what you are doing." She argued that she was performing a "precise analysis" that did not warrant a sketchy representation. In her expectation, only proposed locutions should have been sketchy. This sketchiness should then have been removed once users confirmed the entities, or never be introduced for those created manually. E2 was more fond of our design choice and described the visual design as "modern." He also felt that the sketchy rendering made him more inclined to keep editing the annotation, as opposed to existing systems where he feels like no changes can be made anymore. As a reaction to these opinions, we made the sketchiness configurable for the second phase of the study. Here, S3 chose the sketchy version because it felt "just like working on a piece of paper", while S2 dismissed the sketchy option saying "it does not feel as official."

Both S1 and S2 were concerned with the superposition of the extracted locutions and the text, unanimously calling it "very dense." They later stated that they preferred the offered graph views to create new links between arguments. After using the system, S2 felt that the design was "not cramped" and that the "highlighted parts seem clearly separated." Additionally, she now thought the superposition made it easier to add, delete, or refine locutions. In the current version of the system, users have to remove irrelevant suggestions manually or wait for them to decay over time. S3 wanted the system to automatically remove suggested annotations as soon as she manually annotated an overlapping locution, a request that we plan to include in future work.

General Assessment – The experts unanimously praised the system for its proposed locutions and the potential they bring for reducing the overall time needed for annotation. They did also have ideas for small improvements, like resizing an already identified locution to add or remove individual words from the beginning or end. S1 noted that having suggestions "makes things a lot easier." However, she also noted that a careful trade-off had to be made between showing too many and too few suggestions and matching the users' expected density. In the subsequent study, S2 reached three consecutive sentences without suggestions and exclaimed "it makes me wonder whether I am over-annotating." As she continued annotating, suggestions appeared in the previously empty area, leading her to conclude that the system was learning from her interaction.

After her annotation session, S2 began to reason about the quality of suggestions. She liked "that [VIANA] has two 'tracks' of suggestions", referring to the different colors assigned to suggestions from the linguistic pipeline and the learned user interactions, and said she was "trying to figure out which ones [she] trust[s] more." She concluded that she found the linguistic suggestions more reliable in the beginning, but would become more reliant on those learned from her interactions over time. E1, S1 and S2 expressed that they did not feel that the suggestions had biased the annotation result, with S1 saying "you still need to think about the suggestions."

While none of the experts found the Argument Exploration view essential during the annotation of a section of text, E1 stated she found it useful to communicate the results to colleagues. E2 liked it in particular as a means to introduce long-distance relations between nodes that would be far apart in the other views. He also imagined using it whenever coming back to the annotation after a break to get back into the context quickly. S3 found the map "really useful" when annotating longer texts and envisioned that you could "use it to check on yourself" and observe your progress.

Summarizing her experience with the system, computational linguist expert E1 said "This is really nice, and it will help a lot." One important factor for her was the fact that VIANA enables her to load and annotate larger amounts of data at once: "I definitely like that you can put in a lot of text." This saves time that is otherwise spent on combining the results of the annotation of multiple chunks. However, she was not in favor of splitting the locutions and propositions onto two different analysis layers, preferring a side-by-side view instead of having to switch layers during the annotation process, as this would be "distracting." She stated "we want locutions and propositions next to each other" and suggested introducing a fifth layer in the middle of our current set of views. We have since added such a combined view after the first phase of the study. E1 also expressed that she would still use the other views before and after the annotation process, naming the note-taking interface ("it would be really great to have [VIANA], also because of the comment functionality") and the topic map to communicate the final results.

After the study had ended, E2 remained seated in front of the system and kept transitioning between the layers, saying "this is so cool!", validating our approach of bringing visual analytics to argumentation annotation. While the expert users have strong mental models reinforced through the long-time use of existing systems, they are open to new developments. Despite the learning period of new systems, it is promising that users found the system engaging in the little time they had with it.

7.2 Discussion and Lessons Learned

The expert user study highlighted the demand for a flexible tool that is capable of adapting to the users' needs and expectations. While they are interested in new systems and eager to try them out, they do have very specific layouts and functionalities in mind. It is only through personalization of the interface and exactly understanding their work processes that we can provide them with an efficient system [29]. The current version of VIANA makes the sketchiness of the tool configurable, giving users the option to keep it active at various degrees of intensity or disable it outright. As the annotation process is subjective and highly time-consuming, we need to ensure that we cater to the different mental models of users to create a good user experience for them.

Users already liked the relatively high degree of automation that we provide. Despite being generally reserved concerning fully-automated argumentation mining tools, the experts were more at ease once they knew that they would still be able to manually "overwrite" the system later. The general wish for more automation became apparent when S1 asked "do I need to do this manually?" when changing an illocutionary connector. She was already comfortable with the interaction model of suggested locutions and would have preferred to confirm a suggested change here as well. Summarizing all study sessions, the list of requested automation steps includes the resolution of reported speech and pronouns, the lowercasing of propositions, the selection of argumentation schemes, and the "hiding" of non-argumentative areas of text.

As the pre-usage interviews revealed, all experts were aware of potential bias introduced by suggestions. None of them did, however, feel that their final annotation was biased. Consequently, future research is needed to determine whether, and to what degree, users are exposed to various kinds of bias. The study showed not only the effectiveness of our learning approach but also trust-building processes facilitated by simple interaction principles. Before using the system, most users were unsure about the quality of suggestions. After performing a few interactions and realizing that they had full control, they started to build trust and looked forward to the next suggestions, showcasing a successful human-machine collaboration.

Limitations – The design of any system is often a trade-off between expressiveness, complexity, and ease of use. Using layers and semantic transitions, we aim to reduce the complexity of the system and make it intuitive to use. Our evaluation shows that some users have existing workflows that are not well-suited to such a layered approach, while others preferred the reduced complexity. As VIANA provides various layers and combinations of views, users can customize their workspace to their tasks and needs. The current implementation of VIANA is tailored towards texts of up to 10,000 words and is thus suitable for typical text lengths in argumentation annotation, according to experts. The argument exploration view becomes crowded for longer texts and would require additional navigation interactions to accommodate the space requirements. Furthermore, annotation is still a manual process and does consequently not scale to very large amounts of data.

[5] R. Bembenik and P. Andruszkiewicz. Towards Automatic Argument Extraction and Visualization in a Deliberative Model of Online Consultations for Local Governments. In Advances in Databases and Information Systems, pp. 74–86. Springer International Publishing, 2016. doi: 10.1007/978-3-319-44039-2_6
[6] K. Bontcheva, H. Cunningham, I. Roberts, A. Roberts, V. Tablan, N. Aswani, and G. Gorrell. GATE Teamware: A web-based, collaborative text annotation framework. Language Resources and Evaluation, 47(4):1007–1029, 2013. doi: 10.1007/s10579-013-9215-6
[7] K. Budzynska, M. Janier, J. Kang, C. Reed, P. Saint-Dizier, M. Stede, and O. Yaskorska. Towards Argument Mining from Dialogue. In Frontiers in Artificial Intelligence and Applications, pp. 185–196, 2014. doi: 10.3233/978-1-61499-436-7-185
[8] K. Budzynska and C. Reed. Whence Inference? Technical report, University of Dundee, 2011.
[9] E. Cabrio, S. Tonelli, and S. Villata. From Discourse Analysis to Argumentation Schemes and Back: Relations and Differences. In
With increasing text length and, more Workshop on Computational Logic in Multi-Agent Systems at LPNMR, importantly, increasing annotation density, views like ?? become pp. 1–17, 2013. more typical. An even denser annotation is theoretically possible but [10] W.-T. Chen and W. Styler. Anafora: A Web-based General Purpose is not likely on real data in an expert-user system. In future work, Annotation Tool. In Proc. NAACL HLT Demo. Session, pp. 14–19. we will investigate grouping of arguments to enable focusing on par- ACL, Atlanta, Georgia, 2013. ticular areas of the data. Showing and hiding such groups can en- [11] J. Cohen. A Coefficient of Agreement for Nominal Scales. Educational able scaling to larger datasets. Further, we plan to utilize the design and Psychological Measurement, 20(1):37–46, 1960. doi: 10.1177/ element of sketchiness to encode (non-continuous) information. 001316446002000104 [12] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 8 CONCLUSION e-prints, pp. 1–16, Oct 2018. We have presented VIANA, a web-based, integrated approach that [13] W. Dou, X. Wang, D. Skau, W. Ribarsky, and M. X. Zhou. LeadLine: enables both the annotation of argumentation, as well as the ex- Interactive Visual Analysis of Text Data through Event Identifcation and Exploration. In Proc. Conf. Visual Analytics Science and Technol- ploration of the results. It provides stacked, task-specific interface ogy, pp. 93–102, 2012. doi: 10.1109/VAST.2012.6400485 layers that we connect with smooth semantic transitions. VIANA [14] J. Douglas and S. Wells. Monkeypuzzle - Towards Next Generation, automatically suggests fragments of text for annotation and lays Free and Open-Source, Argument Analysis Tools. In Proc. Workshop the foundation for an extensible platform that will be developed to- on Computational Models of Natural Argument at ICAIL, pp. 50–53. 
wards semi-automated argumentation annotation and argument min- London, 2017. ing. The suggestions are refined over time by learning from both the [15] T. Dwyer, Y. Koren, and K. Marriott. IPSep-CoLa: An Incremental presence and absence of user interaction, introducing a novel ap- Procedure for Separation Constraint Layout of Graphs. IEEE Trans. proach to suggestion decay. The web-based architecture of VIANA on Visualization and Computer Graphics, 12(5):821–828, 9 2006. doi: makes it easy to distribute it to new users and opens possibilities for 10.1109/TVCG.2006.156 further extensions towards a remote-collaborative tool with which [16] M. El-Assady, W. Jentner, F. Sperrle, R. Sevastjanova, A. Hautli-Janisz, multiple users can annotate individual segments of text at the same M. Butt, and D. Keim. lingvis.io - A Linguistic Visual Analytics time. This is especially interesting in fast-paced environments like Framework. In Proc. of Association for Computational Linguistics, the live annotation of radio- or TV-shows. VIANA will be made ACL System Demonstrations. ACL, 2019. available as part of lingvis.io [16] under http://viana.lingvis.io. [17] M. El-Assady, R. Kehlbeck, C. Collins, D. Keim, and O. Deussen. Se- mantic concept spaces: Guided refinement using word- embedding projections. IEEE Transactions on Visualization and Com- ACKNOWLEDGMENTS puter Graphics, 2019. This work has been funded by the DFG with Grants 350399414 and [18] M. El-Assady, R. Sevastjanova, F. Sperrle, D. Keim, and C. Collins. 376714276 (VALIDA/SPP-1999 RATIO). Progressive Learning of Topic Modeling Parameters: A Visual Analyt- ics Framework. IEEE Trans. on Visualization and Computer Graphics, 24(1):382–391, 2018. doi: 10.1109/TVCG.2017.2745080 REFERENCES [19] M. El-Assady, F. Sperrle, O. Deussen, D. Keim, and C. Collins. Vi- [1] E. Aharoni, A. Polnarov, T. Lavee, D. Hershcovich, R. Levy, R. Rinott, sual Analytics for Topic Model Optimization based on User-Steerable D. Gutfreund, and N. Slonim. 
A Benchmark Dataset for Automatic Speculative Execution. IEEE Trans. on Visualization and Computer Detection of Claims and Evidence in the Context of Controversial Graphics, 25(1):374–384, 1 2019. doi: 10.1109/TVCG.2018.2864769 Topics. In Proc. Workshop on Argumentation Mining at ACL, pp. 64– [20] A. Endert, P. Fiaux, and C. North. Semantic Interaction for Visual Text 68. Association for Computational Linguistics, Stroudsburg, PA, USA, Analytics. In Proc. Conf. Human Factors in Computing Systems, CHI 2014. doi: 10.3115/v1/W14-2109 ’12, pp. 473–482. ACM, New York, NY, USA, 2012. doi: 10.1145/ [2] S. Amershi, D. Weld, M. Vorvoreanu, A. Fourney, B. Nushi, P. Collis- 2207676.2207741 son, J. Suh, S. Iqbal, P. Bennett, K. Inkpen, J. Teevan, R. Kikin-Gil, [21] A. Endert, W. Ribarsky, C. Turkay, B. W. Wong, I. Nabney, I. D. Blanco, and E. Horvitz. Guidelines for Human-AI Interaction. In Proc. Conf. and F. Rossi. The State of the Art in Integrating Machine Learning into Human Factors in Computing Systems, pp. 1–19, 2019. doi: 10.1145/ Visual Analytics. Computer Graphics Forum, 36(8):458–486, 12 2017. 3290605.3300233 doi: 10.1111/cgf.13092 [3] Aristoteles and G. A. Kennedy. On rhetoric: A theory of civic discourse. [22] J. A. Fails and D. R. Olsen. Interactive machine learning. In Proc. Int. Oxford University Press, New York, 1991. Conf. Intelligent User Interfaces, p. 39. ACM Press, New York, New [4] M. Behrisch, M. Blumenschein, N. W. Kim, M. El-Assady, J. Fuchs, York, USA, 2003. doi: 10.1145/604045.604056 D. Seebacher, A. Diehl, U. Brandes, T. Schreck, D. Weiskopf, and [23] J. Fulda, M. Brehmer, and T. Munzner. Timelinecurator: Interactive D. A. Keim. Quality metrics for information visualization. Computer authoring of visual timelines from unstructured text. IEEE Trans. on Graphics Forum, 37(3):625–662, 2018. doi: 10.1111/cgf.13446 Visualization and Computer Graphics, 22(1):300–309, 2015. [24] J. Gama, I. Zliobaitˇ e,˙ A. Bifet, M. Pechenizkiy, and A. Bouchachia. 
A [43] Y. Liu, Z. Xu, B. Shi, and B. Zhang. Time-Based K-nearest Neighbor survey on concept drift adaptation. ACM Computing Surveys, 46(4):1– Collaborative Filtering. In IEEE Int. Conf. on Computer and Informa- 37, 3 2014. doi: 10.1145/2523813 tion Technology, pp. 1061–1065. IEEE, 10 2012. doi: 10.1109/CIT. [25] A. Hautli and M. El-Assady. Rhetorical Strategies in German Argu- 2012.217 mentative Dialogs. Argument & Computation, (2):153–174, 8 2017. [44] P. Lops, M. de Gemmis, and G. Semeraro. Content-based Recom- doi: 10.3233/AAC-170022 mender Systems: State of the Art and Trends. In Recommender Sys- [26] C. He, D. Parra, and K. Verbert. Interactive recommender systems: tems Handbook, pp. 73–105. Springer US, Boston, MA, 2011. A survey of the state of the art and future research challenges and [45] Y. Lu, H. Wang, S. Landis, and R. Maciejewski. A visual analytics opportunities. Expert Systems with Applications, 56:9–27, 9 2016. doi: framework for identifying topic drivers in media events. IEEE Trans. 10.1016/j.eswa.2016.02.013 on Visualization and Computer Graphics, 24(9):2501–2515, 2017. [27] J. Heer and G. Robertson. Animated Transitions in Statistical Data [46] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient Estimation of Graphics. IEEE Trans. on Visualization and Computer Graphics, Word Representations in Vector Space. In Proc. Int. Conf. Learning 13(6):1240–1247, 11 2007. doi: 10.1109/TVCG.2007.70539 Representations, pp. 1–12, 2013. [28] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl. An algorith- [47] D. Parra, A. Karatzoglou, I. Yavuz, and X. Amatriain. Implicit feedback mic framework for performing collaborative filtering. In Proc.Int. ACM recommendation via implicit-to-explicit ordinal logistic regression SIGIR Conf. on Research and development in information retrieval, pp. mapping. In Proc. of CARS, pp. 1–5, 2011. 230–237. ACM Press, New York, New York, USA, 8 1999. doi: 10. [48] Pennington Jeffrey, , R. 
Socher, and and Manning Christopher. Glove: 1145/312624.312682 Global Vectors for Word Representation. In Proc. Conf. Empirical [29] U. Hinrichs, M. El-Assady, A. Bradley, S. Forlini, and C. Collins. Methods in Natural Language Processing, pp. 1532–1543. Association Risk the Drift! Stretching Disciplinary Boundaries through Critical for Computational Linguistics, 2014. doi: 10.3115/v1/D14-1162 Collaborations between the Humanities and Visualization. In Workshop [49] C. Reed and T. Norman. Argumentation Machines: New Frontiers on Visualization for the at VIS, pp. 1–5, 2017. in Argument and Computation, vol. 9. Springer Science & Business [30] U. Hinrichs, H. Schmidt, and S. Carpendale. EMDialog: Bringing in- Media, 1 ed., 2003. doi: 10.1007/978-94-017-0431-1 formation visualization into the museum. IEEE Trans. on Visualization [50] C. Reed and G. Rowe. Araucaria: Software for argument analysis, and Computer Graphics, 14(6):1181–1188, 2008. doi: 10.1109/TVCG. diagramming and representation. Int. J. Artificial Intelligence Tools, 2008.127 13(4):961–979, 2004. doi: 10.1142/S0218213004001922 [31] M. Isik and H. Dag. A recommender model based on trust value and [51] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. Bpr: time decay: Improve the quality of product rating score in E-commerce Bayesian personalized ranking from implicit feedback. In Proc. Conf. platforms. In 2017 IEEE Int. Conf. on Big Data, pp. 1946–1955. IEEE, on Uncertainty in Artificial Intelligence, UAI ’09, pp. 452–461. AUAI 12 2017. doi: 10.1109/BigData.2017.8258140 Press, Arlington, Virginia, United States, 2009. [32] S. Janicke,¨ G. Franzini, M. F. Cheema, and G. Scheuermann. On [52] R. Rinott, L. Dankin, C. Alzate Perez, M. M. Khapra, E. Aharoni, Close and Distant Reading in Digital Humanities: A Survey and Future and N. Slonim. Show Me Your Evidence - an Automatic Method for Challenges. In R. Borgo, F. Ganovelli, and I. Viola, eds., Eurographics Context Dependent Evidence Detection. 
In Proc. Conf. Empirical Conf. Visualization - STARs. The Eurographics Association, 2015. doi: Methods in Natural Language Processing, pp. 440–450. Association 10.2312/eurovisstar.20151113 for Computational Linguistics, Stroudsburg, PA, USA, 2015. doi: 10. [33] M. Janier, J. Lawrence, and C. Reed. OVA+: An argument analysis 18653/v1/D15-1050 interface. In Proc. Computational Models of Argument, vol. 266, pp. [53] B. Sarwar, G. Karypis, J. Konstan, and J. Reidl. Item-based collabora- 463–464, 2014. tive filtering recommendation algorithms. In Proc. Int. Conf on World [34] D. Jannach, L. Lerche, and M. Zanker. Social Information Access: Wide Web, pp. 285–295. ACM Press, New York, New York, USA, 2001. Systems and Technologies, chap. Recommending Based on Implicit doi: 10.1145/371920.372071 Feedback, pp. 510–569. Springer International Publishing, Cham, [54] O. Scheuer, F. Loll, N. Pinkwart, and B. M. Mclaren. Computer- 2018. supported argumentation: A review of the state of the art. Int. J. [35] L. T. Kaastra and B. Fisher. Field Experiment Methodology for Pair Computer-supported Collaborative Learning, 5(1):43–102, 2010. doi: Analytics. In Proc. Workshop on Beyond Time and Errors: Novel 10.1007/s11412-009-9080-x Evaluation Methods for Visualization at VIS, BELIV ’14, pp. 152–159. [55] E. Shnarch, C. Alzate, L. Dankin, M. Gleize, Y. Hou, L. Choshen, ACM, New York, NY, USA, 2014. doi: 10.1145/2669557.2669572 R. Aharonov, and N. Slonim. Will it Blend? Blending Weak and Strong [36] S. Koch, M. John, M. Worner,¨ A. Muller,¨ and T. Ertl. VarifocalReader Labeled Data in a Neural Network for Argumentation Mining. In Proc. - In-depth visual analysis of large text documents. IEEE Trans. on An. Meeting of the Association for Computational Linguistics (Short Visualization and Computer Graphics, 2014. doi: 10.1109/TVCG.2014 Papers), pp. 599–605. Melbourne, Australia, 2018. .2346677 [56] P. Y. Simard, S. Amershi, D. M. Chickering, A. E. Pelton, S. Ghorashi, [37] Y. Koren. 
Collaborative filtering with temporal dynamics. In Proc. Int. C. Meek, G. Ramos, J. Suh, J. Verwey, M. Wang, and J. Wernsing. Conf. on Knowledge Discovery and Data Mining, p. 447. ACM Press, Machine Teaching: A New Paradigm for Building Machine Learning New York, New York, USA, 2009. doi: 10.1145/1557019.1557072 Systems. 7 2017. [38] J. R. Landis and G. G. Koch. The Measurement of Observer Agreement [57] M. Skeppstedt, C. Paradis, and A. Kerren. PAL, a tool for Pre- for Categorical Data. Biometrics, 33(1):159–174, 1977. doi: 10.2307/ annotation and Active Learning. J. for Language Technology and Com- 2529310 putational Linguistics, 31(1):81–100, 2016. [39] L. Lerche and D. Jannach. Using graded implicit feedback for bayesian [58] N. Slonim. Project Debater. In Proc. Computational Models of Argu- personalized ranking. In Proceedings of the 8th ACM Conference on ment, p. 4, 2018. doi: 10.3233/978-1-61499-906-5-4 Recommender Systems, RecSys ’14, pp. 353–356. ACM, New York, [59] R. Speer, J. Chin, and C. Havasi. ConceptNet 5.5: An Open Multilin- NY, USA, 2014. doi: 10.1145/2645710.2645759 gual Graph of General Knowledge. In Conf. Artificial Intelligence, pp. [40] G. Linden, B. Smith, and J. York. Amazon.com recommendations: 4444–4451, 2017. item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76– [60] C. Stab, C. Kirschner, J. Eckle-Kohler, and I. Gurevych. Argumentation 80, 1 2003. doi: 10.1109/MIC.2003.1167344 mining in persuasive essays and scientific articles from the discourse [41] M. Lippi and P. Torroni. Argumentation Mining: State of the Art and structure perspective. In CEUR Workshop Proceedings, 2014. Emerging Trends. ACM Trans. Internet Technology, 16(2):10:110:25, [61] P. Stenetorp, S. Pyysalo, G. Topic,´ T. Ohta, S. Ananiadou, and J. Tsujii. 3 2016. doi: 10.1145/2850417 BRAT: A Web-based Tool for NLP-assisted Text Annotation. In Proc. [42] S. Liu, X. Wang, C. Collins, W. Dou, F. Ouyang, M. El-Assady, Demo. Conf. 
European Chapter of the Association for Computational L. Jiang, and D. Keim. Bridging text visualization and mining: A task- Linguistics, EACL ’12, pp. 102–107. Association for Computational driven survey. IEEE Trans. on Visualization and Computer Graphics, Linguistics, Stroudsburg, PA, USA, 2012. 2018. [62] H. Strobelt, D. Oelke, B. C. Kwon, T. Schreck, and H. Pfister. Guide- lines for Effective Usage of Text Highlighting Techniques. IEEE Trans. Visualization and Computer Graphics, 22(1):489–498, 1 2016. doi: 10. 1109/TVCG.2015.2467759 [63] L. Van Der Maaten. Accelerating t-SNE Using Tree-based Algorithms. J. Machine Learning Research, 15(1):3221–3245, 1 2014. [64] J. Visser, J. Lawrence, J. Wagemans, and C. Reed. Revisiting computa- tional models of argument schemes: Classification, annotation, com- parison. In Proc. Computational Models of Argument, pp. 313–324. IOS Press, 2018. doi: 10.3233/978-1-61499-906-5-313 [65] H. Wachsmuth, N. Naderi, I. Habernal, Y. Hou, G. Hirst, I. Gurevych, and B. Stein. Argumentation Quality Assessment: Theory vs. Practice. In Proc. Ann. Meeting of the Association for Computational Linguistics, pp. 250–255, 2017. doi: 10.18653/v1/P17-2039 [66] D. N. Walton. Argumentation Schemes for Presumptive Reasoning. L. Erlbaum Associates, 1996. [67] J. Wood, P. Isenberg, T. Isenberg, J. Dykes, N. Boukhelifa, and A. Slingsby. Sketchy Rendering for Information Visualization candi- date depending on the first letter of their surname . IEEE Trans. on Vi- sualization and Computer Graphics, 18(12):2749–2758, 2012. doi: 10. 1109/TVCG.2012.262 [68] M. Worner¨ and T. Ertl. SmoothScroll: A Multi-scale, Multi-layer Slider. In Computer Vision, Imaging and Computer Graphics. Theory and Applications, pp. 142–154. Springer, Berlin, Heidelberg, 2013. [69] L. Xiong, X. Chen, T.-K. Huang, J. Schneider, and J. G. Carbonell. Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization. In Proc. Int. Conf. on Data Mining, pp. 211–222. 
Society for Industrial and Applied Mathematics, Philadelphia, PA, 4 2010. doi: 10.1137/1.9781611972801.19 [70] I. Zukerman. Book Review: Argumentation Machines: New Frontiers in Argumentation and Computation. Computational Linguistics, 31(1), 2005.