YEDDA: a Lightweight Collaborative Text Span Annotation Tool
Jie Yang, Yue Zhang, Linwei Li, Xingxuan Li
Singapore University of Technology and Design
{jie yang, linwei li, xingxuan li}[email protected], yue [email protected]

arXiv:1711.03759v3 [cs.CL] 25 May 2018

Abstract

In this paper, we introduce YEDDA, a lightweight but efficient and comprehensive open-source tool for text span annotation. YEDDA provides a systematic solution for text span annotation, ranging from collaborative user annotation to administrator evaluation and analysis. It overcomes the low efficiency of traditional text annotation tools by annotating entities through both a command line and shortcut keys, which are configurable with custom labels. YEDDA also gives intelligent recommendations by learning from the up-to-date annotated text. An administrator client is developed to evaluate the annotation quality of multiple annotators and to generate a detailed comparison report for each annotator pair. Experiments show that the proposed system can reduce annotation time by half compared with existing annotation tools, and that annotation time can be further reduced by 16.47% through intelligent recommendation.

[Figure 1: Framework of YEDDA. Annotators annotate raw text through the annotator interface; the administrator uses the admin toolkits to produce multi-annotation analysis results and a detailed pairwise annotation report, with feedback to each annotator.]

1 Introduction

Natural Language Processing (NLP) systems rely on large-scale training data (Marcus et al., 1993) for supervised training. However, manual annotation can be time-consuming and expensive. Despite detailed annotation standards and rules, inter-annotator disagreement is inevitable because of human mistakes, language phenomena that are not covered by the annotation rules, and the ambiguity of language itself (Plank et al., 2014).

Existing annotation tools (Cunningham et al., 2002; Morton and LaCivita, 2003; Chen and Styler, 2013; Druskat et al., 2014) mainly focus on providing a visual interface for the user annotation process but rarely consider post-annotation quality analysis, which is necessary because of inter-annotator disagreement. In addition to annotation quality, efficiency is also critical in large-scale annotation tasks, yet it is relatively less addressed in existing annotation tools (Ogren, 2006; Stenetorp et al., 2012). Besides, many tools (Ogren, 2006; Chen and Styler, 2013) require a complex system configuration on either the local device or a server, which is not friendly to new users.

To address the challenges above, we propose YEDDA[1], a lightweight and efficient annotation tool for text span annotation. A snapshot is shown in Figure 2. Here text span boundaries are selected and assigned a label, which can be useful for Named Entity Recognition (NER) (Tjong Kim Sang and De Meulder, 2003), word segmentation (Sproat and Emerson, 2003), chunking (Tjong Kim Sang and Buchholz, 2000), etc. To keep annotation efficient and accurate, YEDDA provides systematic solutions across the whole annotation process, including shortcut annotation, batch annotation with a command line, intelligent recommendation, format exporting, and administrator evaluation/analysis.

[1] Code is available at https://github.com/jiesutd/YEDDA.

Figure 1 shows the general framework of YEDDA. It offers annotators a simple and efficient Graphical User Interface (GUI) for annotating raw text. For the administrator, it provides two useful toolkits to evaluate multi-annotated text and to generate a detailed comparison report for each annotator pair. YEDDA has the advantages of being:
• Convenient: it is lightweight, has an intuitive interface, and does not rely on specific operating systems or pre-installed packages.
• Efficient: it supports both shortcut and command line annotation modes to accelerate the annotation process.
• Intelligent: it offers users real-time system suggestions to avoid duplicated annotation.
• Comprehensive: it integrates useful toolkits that give statistical indices for analyzing multi-user annotation results and generate detailed content comparisons for annotator pairs.

This paper is organized as follows: Section 2 gives an overview of previous text annotation tools and compares them with ours. Section 3 describes the architecture of YEDDA and its functions in detail. Section 4 shows the efficiency comparison results of different annotation tools. Finally, Section 5 concludes this paper and gives future plans.

2 Related Work

There exists a range of text span annotation tools which focus on different aspects of the annotation process. Stanford manual annotation tool[2] is a lightweight tool but does not support result analysis or system recommendation. Knowtator (Ogren, 2006) is a general-task annotation tool which links to a biomedical ontology to help identify named entities and relations. It supports quality control during the annotation process by integrating simple inter-annotator evaluation, but it cannot identify the detailed labels on which annotators disagree. WordFreak (Morton and LaCivita, 2003) adds a system recommendation function and integrates active learning to rank unannotated sentences by recommendation confidence, but post-annotation analysis is not supported.

[2] http://nlp.stanford.edu/software/stanford-manual-annotation-tool-2004-05-16.tar.gz

Web-based annotation tools have been developed to build operating-system-independent annotation environments. GATE[3] (Bontcheva et al., 2013) includes a web-based collaborative annotation framework which allows users to work together by annotating online with shared text storage. BRAT (Stenetorp et al., 2012) is another web-based tool which has been widely used in recent years; it provides powerful annotation functions and rich visualization abilities, but it does not integrate a result analysis function. Anafora (Chen and Styler, 2013) and Atomic (Druskat et al., 2014) are also web-based, lightweight annotation tools, but they support neither automatic annotation nor quality analysis. WebAnno (Yimam et al., 2013; de Castilho et al., 2016) supports both automatic annotation suggestion and annotation quality monitoring, such as inter-annotator agreement measurement, data curation, and progress monitoring. However, it compares annotation disagreements only sentence by sentence and shows the comparison within its interface, while our system can generate a detailed disagreement report as a .pdf file over the whole annotated content. Besides, these web-based annotation tools require a server built through complex configuration, and some of the servers cannot be deployed on Windows systems.

[3] GATE is a general NLP tool which includes an annotation function.

The differences between YEDDA and related work are summarized in Table 1[4]. Here "Self Consistency" represents whether the tool works independently or relies on pre-installed packages. Compared to these tools, YEDDA provides a lighter but more systematic choice with more flexibility, higher efficiency, and less dependence on the system environment for text span annotation. Besides, YEDDA offers the administrator useful toolkits for evaluating annotation quality and analyzing the detailed disagreements between annotators.

[4] For web-based tools, we list their server-side dependency on operating systems.

Table 1: Annotation Tool Comparison.

Tool      | MacOS | Linux | Win | Self Consistency | Command Line Annotation | System Recommendation | Analysis | Size  | Language
WordFreak |   √   |   √   |  √  |        √         |            ×            |           √           |    ×     | 1.1M  | Java
GATE      |   √   |   √   |  √  |        √         |            ×            |           √           |    ×     | 544M  | Java
Knowtator |   √   |   √   |  √  |        ×         |            ×            |           ×           |    √     | 1.5M  | Java
Stanford  |   √   |   √   |  √  |        √         |            ×            |           ×           |    ×     | 88k   | Java
Atomic    |   √   |   √   |  √  |        √         |            ×            |           ×           |    ×     | 5.8M  | Java
WebAnno   |   √   |   √   |  √  |        ×         |            ×            |           √           |    √     | 13.2M | Java
Anafora   |   √   |   √   |  ×  |        ×         |            ×            |           ×           |    ×     | 1.7M  | Python
BRAT      |   √   |   √   |  ×  |        ×         |            ×            |           √           |    ×     | 31.1M | Python
YEDDA     |   √   |   √   |  √  |        √         |            √            |           √           |    √     | 80k   | Python

3 YEDDA

YEDDA is developed on the standard Python GUI library Tkinter[5] and hence needs only a Python installation as a prerequisite; it is compatible with all Operating System (OS) platforms on which Python is installed. It offers two user-friendly interfaces, one for annotators and one for the administrator, which are introduced in detail in Section 3.1 and Section 3.2, respectively.

[5] https://docs.python.org/2/library/tkinter.html

3.1 Annotator Client

The client is designed to accelerate the annotation process as much as possible. It supports shortcut annotation to reduce user operation time. Command line annotation is designed to annotate multiple spans in batch. In addition, the client provides system recommendations to lessen the workload of duplicated span annotation.

[Figure 2: Annotator Interface.]

Figure 2 shows the interface of the annotator client on an English entity annotation file. The interface consists of five parts. The working area in the upper left shows the text in different colors (blue: annotated entities; green: recommended entities; orange: selected text span). The entry at the bottom is the command line, which accepts annotation commands. Several control buttons in the middle of the interface are used to set the annotation mode. The status area below the control buttons shows the cursor position and the status of the recommendation model. The right side shows the shortcut map, where the shortcut key "a" or "A" means annotating the selected text span with the "Artificial" type, and likewise for the other shortcut keys. The shortcut map can be configured easily[6]. Details are introduced as follows.

[6] Type the corresponding labels into the entries following the shortcut keys and press the "ReMap" button.

3.1.1 Shortcut Key Annotation

YEDDA provides the function of annotating a text span by selecting it with the mouse and pressing a shortcut key that maps the selection to a specific label. This is a common annotation process in many annotation tools (Stenetorp et al., 2012; Bontcheva et al., 2013). YEDDA binds each label to one custom shortcut key, as shown in the "Shortcuts map Labels" part of Figure 2. The annotator needs only two steps to annotate one text span, i.e., "select and press". The annotated file is updated simultaneously with each key press.

[Figure 3: Administrator Interface.]
[Figure 4: Multiple annotators F1-score matrix.]
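To make the "select and press" mechanism of Section 3.1.1 concrete, the following Tkinter sketch binds shortcut keys so that a mouse selection is rewritten as a labeled span. This is an illustrative sketch, not YEDDA's actual implementation: the `SHORTCUT_LABELS` map, the function names, and the inline `[@span#Label*]` bracket format are assumptions.

```python
import tkinter as tk

# Illustrative shortcut-to-label map; in YEDDA the mapping is
# user-configurable (e.g., via the "ReMap" button).
SHORTCUT_LABELS = {"a": "Artificial", "b": "Event"}

def label_span(span: str, key: str) -> str:
    """Wrap a selected span with the label bound to a shortcut key.
    The inline bracket format is an assumption for illustration."""
    label = SHORTCUT_LABELS[key.lower()]
    return "[@" + span + "#" + label + "*]"

def bind_shortcuts(text: tk.Text) -> None:
    """Bind keys so that 'select and press' rewrites the selection in place."""
    def on_key(event):
        if event.char.lower() in SHORTCUT_LABELS and text.tag_ranges(tk.SEL):
            start = text.index(tk.SEL_FIRST)
            end = text.index(tk.SEL_LAST)
            span = text.get(start, end)
            text.delete(start, end)
            text.insert(start, label_span(span, event.char))
            return "break"  # stop the keystroke from inserting a character
    text.bind("<Key>", on_key)
```

Keeping `label_span` separate from the widget callback makes the labeling logic testable without a display; the callback itself only handles selection indices and event plumbing.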
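Figure 4 refers to a pairwise F1-score matrix over multiple annotators. A minimal sketch of how such a matrix can be computed, assuming exact-match entity-level F1 over (start, end, label) spans (the excerpt does not specify the matching criterion, so exact match is an assumption):

```python
def entity_f1(spans_a, spans_b):
    """Entity-level F1 between two annotators' span sets.
    Spans are (start, end, label) tuples; exact match is required."""
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0
    overlap = len(a & b)
    precision = overlap / len(b) if b else 0.0
    recall = overlap / len(a) if a else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def f1_matrix(annotations):
    """Pairwise F1 for every ordered annotator pair in a
    dict mapping annotator name -> list of spans."""
    names = sorted(annotations)
    return {(x, y): entity_f1(annotations[x], annotations[y])
            for x in names for y in names}

# Hypothetical annotations from two annotators over the same file.
ann = {"A1": [(0, 5, "PER"), (10, 14, "LOC")],
       "A2": [(0, 5, "PER"), (20, 24, "ORG")]}
matrix = f1_matrix(ann)
# matrix[("A1", "A2")] == 0.5: one span in common, precision = recall = 1/2
```

The diagonal of the matrix is always 1.0, and with exact matching the measure is symmetric, matching the matrix layout the figure caption suggests.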
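The intelligent recommendation described above learns from the up-to-date annotated text to suggest previously seen entities and so avoid duplicated annotation. A minimal sketch, assuming a simple longest-match-first dictionary lookup over entities harvested from earlier annotations (the actual recommendation algorithm is not detailed in this excerpt, and the lexicon entries are hypothetical):

```python
def recommend(text, lexicon):
    """Suggest (start, end, label) spans wherever the raw text contains an
    entity string already annotated elsewhere; longest match wins."""
    suggestions = []
    entries = sorted(lexicon, key=len, reverse=True)  # longest-first
    i = 0
    while i < len(text):
        for surface in entries:
            if text.startswith(surface, i):
                suggestions.append((i, i + len(surface), lexicon[surface]))
                i += len(surface)  # skip past the matched span
                break
        else:
            i += 1  # no entry matches at this offset
    return suggestions

# Lexicon collected from already-annotated files (hypothetical entries).
lexicon = {"Singapore": "LOC", "Yue Zhang": "PER"}
spans = recommend("Yue Zhang works in Singapore.", lexicon)
# spans == [(0, 9, "PER"), (19, 28, "LOC")]
```

Sorting candidates longest-first keeps suggestions non-overlapping and prefers the most specific known entity at each position; suggested spans could then be rendered in green in the working area, as the interface description indicates.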