AlpacaTag: An Active Learning-based Crowd Annotation Framework for Sequence Tagging

Bill Yuchen Lin†∗  Dong-Ho Lee†∗  Frank F. Xu‡  Ouyu Lan†  Xiang Ren†
{yuchen.lin,dongho.lee,olan,xiangren}@usc.edu, [email protected]
†Computer Science Department, University of Southern California
‡Language Technologies Institute, Carnegie Mellon University
∗Both authors contributed equally.

Abstract

We introduce an open-source web-based data annotation framework (AlpacaTag) for sequence tagging tasks such as named-entity recognition (NER). The distinctive advantages of AlpacaTag are three-fold. 1) Active intelligent recommendation: dynamically suggesting annotations and sampling the most informative unlabeled instances with a back-end active learned model; 2) Automatic crowd consolidation: enhancing real-time inter-annotator agreement by merging inconsistent labels from multiple annotators; 3) Real-time model deployment: users can deploy their models in downstream systems while new annotations are being made. AlpacaTag is a comprehensive solution for sequence labeling tasks, ranging from rapid tagging with recommendations powered by active learning and auto-consolidation of crowd annotations to real-time model deployment.

Figure 1: Overview of the AlpacaTag framework.

1 Introduction

Sequence tagging is a major type of task in natural language processing (NLP), including named-entity recognition (detecting and typing entity names), keyword extraction (e.g., extracting aspect terms in reviews or essential terms in queries), chunking (extracting phrases), and word segmentation (identifying word boundaries in languages like Chinese). State-of-the-art supervised approaches to sequence tagging are highly dependent on numerous annotations. New annotations are usually necessary for a new domain or task, even though transfer learning techniques (Lin and Lu, 2018) can reduce the amount needed by reusing data from other related tasks. However, manually annotating sequences can be time-consuming, expensive, and thus hard to scale. Therefore, it remains an important research question how to develop a better annotation framework that largely reduces human effort.

Existing open-source sequence annotation tools (Stenetorp et al., 2012; Druskat et al., 2014a; de Castilho et al., 2016; Yang et al., 2018a) mainly focus on enhancing the friendliness of user interfaces (UI), such as data management, fast tagging with shortcut keys, support for more platforms, and multi-annotator analysis. We argue that there are still three important yet underexplored directions of improvement: 1) active intelligent recommendation, 2) automatic crowd consolidation, and 3) real-time model deployment. Therefore, we propose a novel web-based annotation tool named AlpacaTag¹ to address these three problems.

Active intelligent recommendation (§3) aims to reduce human effort at both the instance level and the corpus level by learning a back-end sequence tagging model incrementally with incoming human annotations. At the instance level, AlpacaTag applies the model predictions on current unlabeled sentences as tagging suggestions. Apart from that, we also greedily match frequent noun phrases and a dictionary of already annotated spans as recommendations. Annotators can easily confirm or decline such recommendations with our specialized UI. At the corpus level, we use active learning algorithms (Settles, 2009) for selecting the next batches of instances to annotate based on the model. The goal is to select the most informative instances for humans to tag and thus achieve a more cost-effective use of human effort.

Automatic crowd consolidation (§4) of the annotations from multiple annotators is an underexplored topic in developing crowd-sourcing annotation frameworks. As a crowdsourcing framework, AlpacaTag collects annotations from multiple (non-expert) contributors at lower cost and higher speed. However, annotators usually have different confidences, preferences, and biases in annotating, which can lead to high inter-annotator disagreement, and training models with such noisy crowd annotations has been shown to be very challenging (Nguyen et al., 2017; Yang et al., 2018b). We argue that consolidating crowd labels during annotation can lead annotators to reach real-time consensus, and thus decrease disagreement among annotations instead of requiring exhaustive post-processing.

Real-time model deployment (§5) is also a desired feature for users. We sometimes need to deploy a state-of-the-art sequence tagging model while the crowdsourcing is still ongoing, such that users can develop their tagging-dependent systems with our APIs.

To the best of our knowledge, there is no existing annotation framework that enjoys these three features. AlpacaTag is the first unified framework to address these problems, while inheriting the advantages of existing tools. It thus provides a more comprehensive solution to crowdsourcing annotations for sequence tagging tasks.

In this paper, we first present the high-level structure and design of the proposed AlpacaTag framework. Three key features are then introduced in detail: active intelligent recommendation (§3), automatic crowd consolidation (§4), and real-time model deployment (§5). Experiments (§6) are conducted to show the effectiveness of the proposed three features. Comparisons with related works are discussed in §7. Section §8 presents conclusions and future directions.

¹The source code is publicly available at http://inklab.usc.edu/AlpacaTag/.
2 Overview of AlpacaTag

As shown in Figure 1, AlpacaTag has an actively learned back-end model (top-left) in addition to a front-end web UI (bottom). Thus, we have two separate servers: a back-end model server and a front-end annotation server. The model server is built with PyTorch and supports a set of APIs for communication between the two servers and for model deployment. The annotation server is built on Django in Python, which interacts with administrators and annotators.

To start annotating for a domain of interest, admins first log in and create a new project. Then, they import their raw corpus (in CSV or JSON format) and define the tag space with associated colors and shortcut keys. Admins can further set the batch size used to sample instances and to update back-end models. Annotators can log in and annotate (actively sampled) unlabeled instances with tagging suggestions. We further present our three key features in the next sections.

3 Active Intelligent Recommendation

This section first introduces the back-end model (§3.1) and then presents how we use it for both instance-level recommendations (tagging suggestions, §3.2) and corpus-level recommendations (active sampling, §3.3).

3.1 Back-end Model: BLSTM-CRF

The core component of the proposed AlpacaTag framework is the back-end sequence tagging model, which is learned with an incremental active learning scheme. We use a state-of-the-art sequence tagging model as our back-end model (Lample et al., 2016; Lin et al., 2017; Liu et al., 2018), which is based on bidirectional LSTM networks with a CRF layer (BLSTM-CRF). It captures character-level patterns, encodes token sequences with pre-trained word embeddings, and uses CRF layers to capture structural dependencies in tagging. In this section, we assume the model is fixed, as we focus on how to use it to infer recommendations. How to update it by consolidating crowd annotations is described in Section §4.
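To make the described architecture concrete, the following is a minimal PyTorch-style sketch of such a tagger, assuming a character-level BiLSTM combined with pre-trained word embeddings; the CRF layer is only indicated by the emission (tag-score) layer it would sit on, and all module names and dimensions are illustrative rather than AlpacaTag's actual implementation.

```python
# A minimal sketch (not AlpacaTag's actual code): word + char BiLSTM encoder
# that produces per-token tag scores; a CRF layer would normally sit on top
# of these emission scores to model tag-transition dependencies.
import torch
import torch.nn as nn

class BLSTMTagger(nn.Module):
    def __init__(self, n_words, n_chars, n_tags, w_dim=100, c_dim=25, hidden=200):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, w_dim)          # ideally initialized from pre-trained vectors
        self.char_emb = nn.Embedding(n_chars, c_dim)
        self.char_lstm = nn.LSTM(c_dim, c_dim, bidirectional=True, batch_first=True)
        self.lstm = nn.LSTM(w_dim + 2 * c_dim, hidden, bidirectional=True, batch_first=True)
        self.emission = nn.Linear(2 * hidden, n_tags)          # per-token tag scores (CRF input)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, word_len)
        b, t, w = char_ids.shape
        chars = self.char_emb(char_ids).view(b * t, w, -1)
        _, (h, _) = self.char_lstm(chars)                      # final hidden states of both directions
        char_feat = torch.cat([h[0], h[1]], dim=-1).view(b, t, -1)
        tokens = torch.cat([self.word_emb(word_ids), char_feat], dim=-1)
        out, _ = self.lstm(tokens)
        return self.emission(out)                              # (batch, seq_len, n_tags)

# tiny smoke test with random ids
scores = BLSTMTagger(1000, 80, 9)(torch.randint(0, 1000, (2, 6)),
                                  torch.randint(0, 80, (2, 6, 10)))
print(scores.shape)  # torch.Size([2, 6, 9])
```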
3.2 Tagging Suggestions (Instance-level Rec.)

Apart from the inference results of the back-end model on the sentences, we also include the two other kinds of tagging suggestions mentioned in Fig. 1: (frequent) noun phrases and the dictionary of already tagged spans. Specifically, after admins upload raw corpora, AlpacaTag runs a phrase mining algorithm (Shang et al., 2018) to gather a list of frequent noun phrases, which are more likely to be entities. Admins can also optionally enable the framework to consider all noun phrases obtained by chunking as span candidates. These suggestions are not typed. Additionally, we maintain a dictionary mapping from already tagged spans in previous sentences to their majority tagged types. Frequent noun phrases and dictionary entries are matched by a greedy forward maximum matching algorithm. The final tagging suggestions are therefore merged from three sources with the following priority (ordered by their precision): dictionary matches > back-end model inferences > (frequent) noun phrases, while the coverage of the suggestions is in the opposite order.
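As an illustration of how these three suggestion sources could be combined, here is a small sketch of greedy forward maximum matching against a span dictionary plus a priority-based merge; the function names and the exact tie-breaking rules are our assumptions, not AlpacaTag's documented behavior.

```python
# Sketch (assumed logic, not AlpacaTag's actual implementation): greedily match
# the longest dictionary/noun-phrase span starting at each token, then merge
# suggestion sources by precision: dictionary > model inference > noun phrase.
def greedy_forward_match(tokens, span_dict, max_len=6):
    """Yield (start, end, type) for the longest dictionary match at each position."""
    i, n = 0, len(tokens)
    while i < n:
        for j in range(min(n, i + max_len), i, -1):       # try the longest span first
            span = " ".join(tokens[i:j]).lower()
            if span in span_dict:
                yield (i, j, span_dict[span])
                i = j
                break
        else:
            i += 1

def merge_suggestions(dict_spans, model_spans, np_spans):
    """Merge spans from the three sources; higher-precision sources win on overlap."""
    taken, merged = set(), []
    for source in (dict_spans, model_spans, np_spans):     # priority order
        for start, end, label in source:
            if not taken & set(range(start, end)):
                merged.append((start, end, label))
                taken |= set(range(start, end))
    return sorted(merged)

tokens = "Donald Trump had a dinner with Hillary Clinton".split()
span_dict = {"hillary clinton": "PER"}                     # majority type of previously tagged spans
dict_spans = list(greedy_forward_match(tokens, span_dict))
model_spans = [(0, 2, "PER")]                              # e.g., back-end model prediction
np_spans = [(4, 5, None)]                                  # untyped frequent noun phrase
print(merge_suggestions(dict_spans, model_spans, np_spans))
# [(0, 2, 'PER'), (4, 5, None), (6, 8, 'PER')]
```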

Figure 2 illustrates how AlpacaTag presents the tagging suggestions for the sentence "Donald Trump had a dinner with Hillary Clinton in the white house." The two suggestions are shown in the bottom section as underlined spans "Donald Trump" and "Hillary Clinton" (Fig. 2a). When annotators want to confirm "Hillary Clinton" as a true annotation, they first click the span and then click the right type in the floating window that appears (Fig. 2b). They can also press the customized shortcut key for fast tagging. Note that the PER button is underlined in Fig. 2b, meaning that it is the recommended type for annotators to choose. Fig. 2c shows that after confirming, the span is added to the final annotations. We want to emphasize that our designed UI solves the problem of confirming and declining suggestions well: the floating windows greatly reduce mouse movement and let annotators easily confirm or change types by clicking or pressing shortcuts.

Figure 2: The workflow for annotators to confirm a given tagging suggestion ("Hillary Clinton" as PER). (a) The sentence and annotations are in the upper section; tagging suggestions are shown as underlined spans in the lower section. (b) After clicking a suggested span, a floating window shows up nearby for confirming the type (the suggested type is outlined in red). (c) After clicking a suggested type or pressing a shortcut key (e.g., 'p'), confirmed annotations appear in the upper annotation section.

What if annotators want to tag spans that are not recommended? Normal annotation in AlpacaTag is as friendly as in other tools: annotators simply select the spans they want to tag in the upper section and click or press the associated type (Fig. 3). We implement this UI design based on an open-source Django framework named doccano².

Figure 3: Annotating a span without recommendations (1. select a span; 2. click a button or press a shortcut key; 3. get a tag).

²http://doccano.herokuapp.com/

3.3 Active Sampling (Corpus-level Rec.)

Tagging suggestions operate at the instance level, but which instances we ask annotators to label is also very important for saving human effort. The research question here is how to actively sample a set of the most informative instances from all the unlabeled sentences, which is known as active learning. The measure of informativeness is usually specific to an existing model: it reflects which instances, if labeled correctly, can most improve the

model performance. A typical example of active learning is Least Confidence (Culotta and McCallum, 2005), which applies the model to all unlabeled sentences and treats the negative of the inference confidence as the informativeness:

$$ 1 - \max_{y_1,\dots,y_n} P\big[\, y_1,\dots,y_n \mid \{x_{ij}\} \,\big] $$

For AlpacaTag, we apply an improved version named Maximum Normalized Log-Probability (MNLP) (Shen et al., 2018), which eliminates the effect of sequence length by averaging:

$$ \max_{y_1,\dots,y_n} \frac{1}{n} \sum_{i=1}^{n} \log P\big[\, y_i \mid y_1,\dots,y_{i-1}, \{x_{ij}\} \,\big] $$

Simply put, we utilize the current back-end model to measure the informativeness of unlabeled sentences and then sample the optimal next batch for annotators to label.
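To show how this score could drive batch selection, here is a small sketch of MNLP-style ranking, assuming the back-end model exposes per-token log-probabilities of its best tag sequence; the scoring function and its interface are illustrative assumptions, not AlpacaTag's exact API.

```python
# Sketch (assumed interface): rank unlabeled sentences by the length-normalized
# log-probability of the model's best tag sequence (MNLP) and take the batch
# with the lowest scores, i.e., the least confident / most informative ones.
def mnlp_score(token_logprobs):
    """Average log-probability of the best tag sequence; lower = less confident."""
    return sum(token_logprobs) / max(len(token_logprobs), 1)

def sample_next_batch(unlabeled, best_path_logprobs, batch_size=100):
    """unlabeled: list of sentences; best_path_logprobs(sent) -> per-token log-probs."""
    scored = [(mnlp_score(best_path_logprobs(s)), s) for s in unlabeled]
    scored.sort(key=lambda pair: pair[0])          # least confident first
    return [s for _, s in scored[:batch_size]]

# toy example with fake per-token log-probabilities
fake_logprobs = {
    "Donald Trump visited Seoul .": [-0.1, -0.2, -0.05, -1.9, -0.01],
    "The cat sat .": [-0.02, -0.03, -0.01, -0.01],
}
batch = sample_next_batch(list(fake_logprobs), lambda s: fake_logprobs[s], batch_size=1)
print(batch)  # ['Donald Trump visited Seoul .']
```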
Figure 4: Reducing inter-annotator disagreement by training personal models for personalized active sampling and constructing consensus models.

4 Automatic Crowd Consolidation

As a crowd-sourcing annotation framework, AlpacaTag aims to reduce inter-annotator disagreement while keeping each annotator's individual strengths in labeling specific kinds of instances. We achieve this by applying annotator-specific back-end models ("personal models") and consolidating them into a shared back-end model named the "consensus model". Personal models are used for personalized active sampling, such that each annotator labels the regions of the data space that are most important for them to label.

Assume there are K annotators, referred to as {A_1, ..., A_K}. For each annotator $A_i$, we denote the sentences processed by them as $x_i = \{x_{i,j}\}_{j=1}^{l_i}$ and the corresponding annotations as $y^{A_i} \in \mathcal{Y}^{l_i}$. The target of consolidation is to generate predictions ŷ for the test data x. Note that annotators may annotate different portions of the data, and sentences can have inconsistent crowd labels.

As shown in Fig. 4, we model the behavior of every annotator by constructing annotator representation matrices C_k from the network parameters of the personal back-end models. The personal models share the same BLSTM layers for encoding sentences, which avoids over-parameterization. They are incrementally updated every batch, which usually consists of 50–100 sentences sampled actively by the algorithms mentioned in §3.3. We further merge the crowd representations C_k into a representation matrix C, and thus construct a consensus model by using C to generate tag scores in the BLSTM-CRF architecture, which improves on the approach proposed by Nguyen et al. (2017).

In summary, AlpacaTag periodically updates the back-end consensus model by consolidating crowd annotations for annotated sentences through merging personal models. The updated back-end model continues to offer suggestions on future instances for all annotators, and thus the consensus can be further solidified through recommendation confirmations.
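The merging step is not spelled out in detail here, so the following is only a rough sketch under the assumption that each personal model contributes a per-annotator linear scoring layer on top of the shared BLSTM and that the consensus layer is obtained by averaging them; the averaging choice and all names are our assumptions, not AlpacaTag's actual consolidation procedure.

```python
# Rough sketch (assumptions only): each annotator A_k has a personal linear layer
# C_k over the shared BLSTM features; a consensus scoring matrix C is formed here
# by simply averaging the personal layers' parameters.
import torch
import torch.nn as nn

hidden, n_tags, n_annotators = 400, 9, 3

# personal tag-scoring layers C_k, one per annotator, over shared BLSTM features
personal_layers = [nn.Linear(hidden, n_tags) for _ in range(n_annotators)]

with torch.no_grad():
    consensus = nn.Linear(hidden, n_tags)
    consensus.weight.copy_(torch.stack([p.weight for p in personal_layers]).mean(dim=0))
    consensus.bias.copy_(torch.stack([p.bias for p in personal_layers]).mean(dim=0))

# the consensus layer now produces tag scores for the shared encoder's output
features = torch.randn(2, 6, hidden)      # (batch, seq_len, hidden) from the shared BLSTM
print(consensus(features).shape)          # torch.Size([2, 6, 9])
```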

5 Real-time Model Deployment

Waiting for a model to be trained on finished human annotations can be a huge bottleneck when developing complicated pipeline systems in information extraction. Instead, users may want a sequence tagging model that is constantly updated, even while the annotation process is still ongoing. AlpacaTag naturally enjoys this feature since we maintain a back-end tagging model with incremental active learning techniques. Thus, we provide a suite of APIs for interacting with the back-end consensus model server. They serve both the communication with the annotation server and the real-time deployment of the back-end model. One can obtain inference results for a developing downstream system by querying such APIs.
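As an example of what querying the deployed model might look like, here is a small client-side sketch; the endpoint path, payload, and response fields are hypothetical placeholders we assume for illustration, since the concrete API of the model server is not specified here.

```python
# Hypothetical client sketch: query a deployed AlpacaTag-style model server for
# tag predictions on new sentences. The URL, route, and JSON fields below are
# placeholders assumed for illustration, not AlpacaTag's documented API.
import json
from urllib import request

def tag_sentences(sentences, server="http://localhost:8000", route="/api/predict"):
    payload = json.dumps({"sentences": sentences}).encode("utf-8")
    req = request.Request(server + route, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())      # e.g., {"tags": [["B-PER", "I-PER", "O", ...], ...]}

if __name__ == "__main__":
    print(tag_sentences(["Donald Trump had a dinner with Hillary Clinton."]))
```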

6 Experiments

To investigate the performance of our implemented back-end model with incremental active learning and consolidation, we conduct a preliminary experiment on the CoNLL2003 dataset. We set the iteration step size to 100 with the default order for single-user annotation, and incrementally update the back-end model for 10 iterations. Then, we apply our implementation of active sampling to dynamically select the optimal batches of samples to annotate. All results are tested on the official test set. As shown in Fig. 5, the performance with active learning is indeed improved by a large margin over vanilla training. We also find that, with active sampling, using annotations for only about 35% of the training data can match the performance of using 100%, confirming the results reported in previous studies (Siddhant and Lipton, 2018).

Figure 5: Effectiveness of incremental updating with and without active sampling (F1-score (%) vs. number of instances).

For consolidating multiple annotations, we test our implementation on the real-world crowdsourced data collected by Rodrigues et al. (2013) using Amazon Mechanical Turk (AMT). Our methods outperform existing approaches including MVT-SLM (Wang et al., 2018) and Crowd-X (Nguyen et al., 2017) (see Tab. 1).

7 Related Works

Many sequence annotation tools have been developed (Ogren, 2006; Bontcheva et al., 2013; Chen and Styler, 2013; Druskat et al., 2014b) covering basic annotation features including data management, shortcut-key annotation, and multi-annotation presentation and analysis. However, few of them enjoy the three features of AlpacaTag introduced above. It is true that a few existing tools also support tagging suggestions (instance-level recommendations), as follows:

• BRAT (Stenetorp et al., 2012) and GATE (Bontcheva et al., 2013) can offer suggestions with a fixed tagging model such as CoreNLP (Manning et al., 2014). However, this is hardly helpful when users need to annotate sentences with a customized label set specific to their domains of interest, which is the most common motivation for using an annotation framework.

• YEDDA (Yang et al., 2018a) simply generates suggestions by exact matching against a continuously updated lexicon of already annotated spans. This is a subset of AlpacaTag's recommendations; note that YEDDA cannot suggest any unseen spans.

• WebAnno (Yimam et al., 2013) integrates a learning component for suggestions, which is based on hand-crafted features and a generic online learning framework (MIRA). Nonetheless, its back-end model is far from the state-of-the-art methods for sequence tagging and is hard to customize for consolidation and real-time deployment.

AlpacaTag's tagging suggestions are more comprehensive, since they come from a state-of-the-art BLSTM-CRF architecture, an annotation dictionary, and frequent noun phrase mining. A unique advantage is that it also supports active learning to sample the instances most worth labeling, reducing human effort. In addition, AlpacaTag attempts to reduce inter-annotator disagreement during annotation by automatically consolidating personal models into consensus models, which goes further than merely presenting inter-annotator agreement.

While recent papers have shown the power of deep active learning with simulations (Siddhant and Lipton, 2018), a practical annotation framework with active learning is missing. AlpacaTag is not only an annotation framework but also a practical environment for evaluating different active learning methods.

Methods     Precision (%)    Recall (%)       F1 (%)
MVT-SLM     88.88 (±0.25)    65.04 (±0.80)    75.10 (±0.44)
Crowd-Add   89.74 (±0.10)    64.50 (±1.48)    75.03 (±1.02)
Crowd-Cat   89.72 (±0.47)    63.55 (±1.20)    74.39 (±0.98)
Ours        88.77 (±0.25)    72.79 (±0.04)    79.99 (±0.08)
Gold        92.12 (±0.31)    91.73 (±0.09)    91.92 (±0.21)

Table 1: Comparisons of consolidation methods.

8 Conclusion and Future Directions

To sum up, we propose an open-source web-based annotation framework, AlpacaTag, that provides users with advanced features for reducing human effort in annotation, while keeping existing basic annotation functions such as shortcut keys. Incremental active learning of back-end models with crowd consolidation facilitates intelligent recommendations and reduces disagreement. Future directions include supporting the annotation of other tasks, such as relation extraction and event extraction. We also plan to design a real-time console for showing the status of back-end models based on TensorBoard, so that admins can better track the quality of deployed models and stop annotating early to save human effort.

References

Kalina Bontcheva, Hamish Cunningham, Ian Roberts, Angus Roberts, Valentin Tablan, Niraj Aswani, and Genevieve Gorrell. 2013. GATE Teamware: a web-based, collaborative text annotation framework. In Proc. of LREC.

Richard Eckart de Castilho, Éva Mújdricza-Maydt, Seid Muhie Yimam, Silvana Hartmann, Iryna Gurevych, Anette Frank, and Christian Biemann. 2016. A web-based tool for the integrated annotation of semantic and syntactic structures. In Proc. of LT4DH@COLING.

Wei-Te Chen and Will Styler. 2013. Anafora: a web-based general purpose annotation tool. In Proc. of NAACL-HLT.

Aron Culotta and Andrew McCallum. 2005. Confidence estimation for information extraction. In Proc. of AAAI.

Stephan Druskat, Lennart Bierkandt, Volker Gast, Christoph Rzymski, and Florian Zipser. 2014a. Atomic: an open-source platform for multi-level corpus annotation. In KONVENS.

Stephan Druskat, Lennart Bierkandt, Volker Gast, Christoph Rzymski, and Florian Zipser. 2014b. Atomic: An open-source software platform for multi-level corpus annotation.

Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In Proc. of NAACL-HLT.

Bill Y. Lin, Frank F. Xu, Zhiyi Luo, and Kenny Q. Zhu. 2017. Multi-channel BiLSTM-CRF model for emerging named entity recognition in social media. In NUT@EMNLP.

Bill Yuchen Lin and Wei Lu. 2018. Neural adaptation layers for cross-domain named entity recognition. In Proc. of EMNLP.

Liyuan Liu, Jingbo Shang, Frank F. Xu, Xiang Ren, Huan Gui, Jian Peng, and Jiawei Han. 2018. Empower sequence labeling with task-aware neural language model. In Proc. of AAAI.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proc. of ACL.

An Thanh Nguyen, Byron C. Wallace, Junyi Jessy Li, Ani Nenkova, and Matthew Lease. 2017. Aggregating and predicting sequence labels from crowd annotations. In Proc. of ACL.

Philip V. Ogren. 2006. Knowtator: a Protégé plug-in for annotated corpus construction. In Proc. of NAACL-HLT.
Filipe Rodrigues, Francisco C. Pereira, and Bernardete Ribeiro. 2013. Sequence labeling with multiple annotators. Machine Learning.

Burr Settles. 2009. Active learning literature survey. Technical report, University of Wisconsin-Madison, Department of Computer Sciences.

Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R. Voss, and Jiawei Han. 2018. Automated phrase mining from massive text corpora. TKDE.

Yanyao Shen, Hyokun Yun, Zachary C. Lipton, Yakov Kronrod, and Animashree Anandkumar. 2018. Deep active learning for named entity recognition. In Proc. of ICLR.

Aditya Siddhant and Zachary Chase Lipton. 2018. Deep Bayesian active learning for natural language processing: Results of a large-scale empirical study. In Proc. of EMNLP.

Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou, and Jun'ichi Tsujii. 2012. brat: a web-based tool for NLP-assisted text annotation. In Proc. of EACL.

Xuan Wang, Yu Zhang, Xiang Ren, Yuhao Zhang, Marinka Zitnik, Jingbo Shang, Curtis P. Langlotz, and Jiawei Han. 2018. Cross-type biomedical named entity recognition with deep multi-task learning. Bioinformatics.

Jie Yang, Yue Zhang, Linwei Li, and Xingxuan Li. 2018a. YEDDA: A lightweight collaborative text span annotation tool. In Proc. of ACL.

YaoSheng Yang, Meishan Zhang, Wenliang Chen, Wei Zhang, Haofen Wang, and Min Zhang. 2018b. Adversarial learning for Chinese NER from crowd annotations. In Proc. of AAAI.

Seid Muhie Yimam, Iryna Gurevych, Richard Eckart de Castilho, and Christian Biemann. 2013. WebAnno: A flexible, web-based and visually supported system for distributed annotations. In Proc. of ACL.