Bibliographic Reference
"Using a unified taxonomy to annotate discourse markers in speech and writing"
Crible, Ludivine ; Zufferey, Sandrine

Abstract
We report an annotation experiment that assesses the use of a single functional taxonomy of sense relations for discourse markers (DMs) in spoken and written data. We start by presenting an operational definition of the category of DMs and its application to identifying DM tokens in corpora. We then present an original annotation experiment that uses a unified taxonomy to annotate written and spoken data in English and French. In this experiment, we test the reliability of the annotations made separately by two annotators and the applicability of the tag set across two languages in the spoken and written modes. Our experiment leads us to conclude that: i) spoken data is not more difficult to annotate than written data in terms of inter-annotator agreement, ii) recurrent problems are found across the two languages and modes, iii) the reliability of the annotation scheme is improved by the use of more explicit instructions and training.

Document type: Contribution to a collective work (Book Chapter)

Bibliographic reference
Crible, Ludivine ; Zufferey, Sandrine. Using a unified taxonomy to annotate discourse markers in speech and writing. In: Harry Bunt, Proceedings of the 11th Joint ACL - ISO Workshop on Interoperable Semantic Annotation (isa-11), 2015, p. 14-22
Available at: http://hdl.handle.net/2078.1/158968 [Downloaded 2019/04/19 at 05:41:24]

Proceedings of the 11th Joint ACL - ISO Workshop on Interoperable Semantic Annotation (isa-11)
April 14, 2015
Queen Mary University of London
London, UK
Harry Bunt, editor

Workshop at the 11th International Conference on Computational Semantics (IWCS 2015)
TiCC, Tilburg center for Cognition and Communication
Tilburg University, The Netherlands
ISBN/EAN: 978-90-74029-00-1

Workshop Programme
08:45 -- 09:00 Registration
09:00 -- 09:10 Opening by Workshop Chair
09:15 -- 09:45 Harry Bunt: On the Principles of Interoperable Semantic Annotation
09:45 -- 10:15 Kiyong Lee and Harry Bunt: ISO 24617-6: Principles of semantic annotation; Discussion of comments from DIS ballot
10:15 -- 10:45 Kiyong Lee: The annotation of measure expressions in ISO standards
10:45 -- 11:15 Coffee break
11:15 -- 11:45 Elisabetta Jezek and Rossella Varvara: Instrument subjects without Instrument role
11:45 -- 12:15 Jérémy Trione, Frédéric Béchet, Benoit Favre and Alexis Nasr: Rapid FrameNet annotation of spoken conversation transcripts
12:15 -- 12:30 Steven Neale, João Silva and António Branco: An Accessible Interface Tool for Manual Word Sense Annotation
12:30 -- 13:00 Julia Gil and James Pustejovsky: The Semantics of Image Annotation
13:00 -- 14:00 Lunch break
14:00 -- 14:30 Ludivine Crible and Sandrine Zufferey: Using a unified taxonomy to annotate discourse markers in speech and writing
14:30 -- 15:00 Rashmi Prasad and Harry Bunt: Semantic Relations in Discourse: The Current State of ISO 24617-8
15:00 -- 15:20 Jet Hoek and Sandrine Zufferey: Factors influencing implicitation of discourse relations across languages
15:20 -- 15:50 Tea break
15:50 -- 16:20 Silvia Pareti: Annotating Attribution Relations Across Languages and Genres
16:20 -- 16:50 Hegler Tissot,
Angus Roberts, Leon Derczynski, Genevieve Gorrell and Marcus Didonet Del Fabro: Analysing Temporal Expressions Annotated in Clinical Notes
16:50 -- 17:10 Volker Gast, Lennart Bierkandt and Christoph Rzymski: Creating and retrieving tense and aspect annotations with GraphAnno, a lightweight tool for multilevel annotation
17:10 -- 17:30 Kiyong Lee and Harry Bunt: Discussion of possible new ISO projects in areas of semantic annotation
17:30 Workshop Closing

Workshop Organizers/Organizing Committee
Harry Bunt, Tilburg University
Nancy Ide, Vassar College, Poughkeepsie, NY
Kiyong Lee, Korea University, Seoul
James Pustejovsky, Brandeis University, Waltham, MA
Laurent Romary, INRIA/Humboldt Universität Berlin

Workshop Programme Committee
Jan Alexandersson, DFKI, Saarbrücken
Harry Bunt, TiCC, Tilburg University
Nicoletta Calzolari, ILC-CNR, Pisa
Thierry Declerck, DFKI, Saarbrücken
Liesbeth Degand, Université Catholique de Louvain
Anna Esposito, Seconda Università di Napoli, Caserta
Alex Chengyu Fang, City University Hong Kong
Anette Frank, Universität Heidelberg
Robert Gaizauskas, University of Sheffield
Koiti Hasida, Tokyo University
Nancy Ide, Vassar College, Poughkeepsie
Daniel Hardt, Copenhagen Business School
Elisabetta Jezek, Università degli Studi di Pavia
Michael Kipp, University of Applied Sciences, Augsburg
Philippe Muller, IRIT, Université Paul Sabatier, Toulouse
Martha Palmer, University of Colorado, Boulder
Volha Petukhova, Universität des Saarlandes, Saarbrücken
Andrei Popescu-Belis, Idiap, Martigny, Switzerland
Rashmi Prasad, University of Wisconsin, Milwaukee
Laurent Prévot, Aix-Marseille University
James Pustejovsky, Brandeis University
Laurent Romary, INRIA/Humboldt Universität Berlin
Ted Sanders, Universiteit Utrecht
Thorsten Trippel, University of Bielefeld
Piek Vossen, Vrije Universiteit Amsterdam
Bonnie Webber, School of Informatics, University of Edinburgh
Annie Zaenen, Stanford University

Proceedings Editor
Harry Bunt, Tilburg University

Table of contents
Harry Bunt: On the Principles of Interoperable Semantic Annotation, p. 1
Ludivine Crible and Sandrine Zufferey: Using a unified taxonomy to annotate discourse markers in speech and writing, p. 14
Volker Gast, Lennart Bierkandt and Christoph Rzymski: Creating and retrieving tense and aspect annotations with GraphAnno, a lightweight tool for multilevel annotation, p. 23
Julia Bosque-Gil and James Pustejovsky: The Semantics of Image Annotation, p. 29
Jet Hoek and Sandrine Zufferey: Factors influencing implicitation of discourse relations across languages, p. 39
Elisabetta Jezek and Rossella Varvara: Instrument Objects without Instrument Role, p. 46
Kiyong Lee: The Semantic Annotation of Measure Expressions in ISO Standards, p. 55
Steven Neale, João Silva and António Branco: An Accessible Interface Tool for Manual Word Sense Annotation, p. 67
Silvia Pareti: Annotating Attribution Relations Across Languages and Genres, p. 72
Rashmi Prasad and Harry Bunt: Semantic Relations in Discourse: The Current State of ISO 24617-8, p. 80
Hegler Tissot, Angus Roberts, Leon Derczynski, Genevieve Gorrell and Marcus Didonet Del Fabro: Annotating Clinical Temporal Expressions in a Community Corpus, p. 93
Jérémy Trione, Frédéric Béchet, Benoit Favre and Alexis Nasr: Rapid FrameNet annotation of spoken conversation transcripts, p. 103

Author Index
Béchet, Frédéric 103
Bierkandt, Lennart 23
Bosque-Gil, Julia 39
Branco, António 77
Bunt, Harry 1, 80
Crible, Ludivine 14
Derczynski, Leon 93
Didonet Del Fabro, Marcus 93
Favre, Benoit 103
Gast, Volker 23
Gil, Julia 29
Gorrell, Genevieve 93
Hoek, Jet 39
Jezek, Elisabetta 46
Lee, Kiyong 55
Nasr, Alexis 103
Neale, Steven 77
Pareti, Silvia 72
Prasad, Rashmi 80
Pustejovsky, James 29
Roberts, Angus 93
Rzymski, Christoph 23
Silva, João 77
Tissot, Hegler 93
Trione, Jérémy 103
Varvara, Rossella 46
Zufferey, Sandrine 14, 39

On the principles of interoperable semantic annotation
Harry Bunt
TiCC, Tilburg Center for Cognition and Communication
Tilburg University, The Netherlands

Abstract
This paper
summarizes the research that is leading to ISO standard 24617-6, which describes the approach to semantic annotation that characterizes the ISO semantic annotation framework (SemAF). It investigates the consequences and the risks of the SemAF strategy of developing separate annotation schemes for certain classes of semantic phenomena, with the long-term aim of combining these schemes into a single, wide-coverage scheme for semantic annotation. The principles that underlie the SemAF effort are discussed, both for linguistic annotation in general and for semantic annotation in particular. The notions of abstract syntax and concrete syntax are described, along with their relation to the specification of a metamodel and the semantics of annotations. Overlaps between the annotation schemes defined in SemAF parts are discussed, as well as semantic phenomena that cut across these schemes.

1 Introduction

ISO standard 24617-6, "Principles of semantic annotation", sets out the approach to semantic annotation that characterizes the ISO semantic annotation framework (SemAF). In addition, it provides guidelines for dealing with two issues regarding the annotation schemes defined in the different parts of SemAF: inconsistencies that may arise due to overlaps between annotation schemes, and semantic phenomena that cut across SemAF parts, such as negation, modality, and quantification.

The purpose of ISO 24617-6 is to provide support for the establishment of a consistent and coherent set of international standards for semantic annotation. It does so in three ways. First, by making explicit which basic principles underlie the approach followed in the SemAF parts that have already produced ISO standards (Part 1, Time and events; Part 2, Dialogue acts) and in the parts that are under way (Part 4, Semantic roles; Part 7, Spatial information; Part 8, Discourse relations).
This approach lends methodological coherence to SemAF and helps to ensure consistency between existing, developing, and future SemAF parts. Second, by identifying overlaps between SemAF parts, and indicating how these may be dealt with. Third, by identifying common issues that cut across SemAF parts and which are not or only partially covered,