SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications

Isabelle Augenstein (1), Mrinal Das (2), Sebastian Riedel (1), Lakshmi Vikraman (2), and Andrew McCallum (2)
(1) Department of Computer Science, University College London (UCL), UK
(2) College of Information and Computer Sciences, University of Massachusetts Amherst, USA

Abstract

We describe the SemEval task of extracting keyphrases and relations between them from scientific documents, which is crucial for understanding which publications describe which processes, tasks and materials. Although this was a new task, we had a total of 26 submissions across 3 evaluation scenarios. We expect the task and the findings reported in this paper to be relevant for researchers working on understanding scientific content, as well as the broader knowledge base population and information extraction communities.

1 Introduction

Empirical research requires gaining and maintaining an understanding of the body of work in a specific area. For example, typical questions researchers face are which papers describe which tasks and processes, use which materials and how those relate to one another. While there are review papers for some areas, such information is generally difficult to obtain without reading a large number of publications.

Current efforts to address this gap are search engines such as Google Scholar (https://scholar.google.co.uk/), Scopus (http://www.scopus.com/) or Semantic Scholar (https://www.semanticscholar.org/), which mainly focus on navigating author and citation graphs.

The task tackled here is mention-level identification and classification of keyphrases, e.g. Keyphrase Extraction (TASK), as well as extracting semantic relations between keywords, e.g. Keyphrase Extraction HYPONYM-OF Information Extraction. These tasks are related to the tasks of named entity recognition, named entity classification and relation extraction. However, keyphrases are much more challenging to identify than e.g. person names, since they vary significantly between domains, lack clear signifiers and contexts and can consist of many tokens. For this purpose, a double-annotated corpus of 500 publications with mention-level annotations was produced, consisting of scientific articles of the Computer Science, Material Sciences and Physics domains.

Extracting keyphrases and relations between them is of great interest to scientific publishers as it helps to recommend articles to readers, highlight missing citations to authors, identify potential reviewers for submissions, and analyse research trends over time. Note that organising keyphrases in terms of synonym and hypernym relations is particularly useful for search scenarios, e.g. a reader may search for articles on information extraction, and through hypernym prediction would also receive articles on named entity recognition or relation extraction.

We expect the outcomes of the task to be relevant to the wider information extraction, knowledge base population and knowledge base construction communities, as it offers a novel application domain for methods researched in that area, while still offering domain-related challenges. Since the dataset is annotated for three tasks dependent on one another, it could also be used as a testbed for joint learning or structured prediction approaches to information extraction (Kate and Mooney, 2010; Singh et al., 2013; Augenstein et al., 2015; Goyal and Dyer, 2016).
Furthermore, we expect the task to be interesting for researchers studying tasks aiming at understanding scientific content, such as keyphrase extraction (Kim et al., 2010b; Hasan and Ng, 2014; Sterckx et al., 2016; Augenstein and Søgaard, 2017), semantic relation extraction (Tateisi et al., 2014; Gupta and Manning, 2011; Marsi and Öztürk, 2015), topic classification of scientific articles (Ó Séaghdha and Teufel, 2014), citation context extraction (Teufel, 2006; Kaplan et al., 2009), extracting author and citation graphs (Peng and McCallum, 2006; Chaimongkol et al., 2014; Sim et al., 2015) or a combination of those (Radev and Abu-Jbara, 2012; Gollapalli and Li, 2015; Guo et al., 2015).

The expected impact of the task is interest from the above-mentioned research communities beyond the task itself due to the release of a new corpus, leading to novel research methods for information extraction from scientific documents. What will be particularly useful about the proposed corpus are annotations of hypernym and synonym relations on mention-level, as existing hypernym and synonym relation resources are on type-level, e.g. WordNet (https://wordnet.princeton.edu/). Further, we expect that these methods will directly impact industrial solutions to making sense of publications, partly due to the task organisers' collaboration with Elsevier (https://www.elsevier.com/).

2 Task Description

The task is divided into three subtasks:

A) Mention-level keyphrase identification

B) Mention-level keyphrase classification. Keyphrase types are PROCESS (including methods, equipment), TASK and MATERIAL (including corpora, physical materials)

C) Mention-level semantic relation extraction between keyphrases with the same keyphrase types. Relation types used are HYPONYM-OF and SYNONYM-OF.

We will refer to the above subtasks as Subtask A, Subtask B, and Subtask C respectively.

A shortened (artificial) example of a data instance for the Computer Science area is displayed in Example 1; examples for Material Science and Physics are included in the appendix. The first part is the plain text paragraph (with keyphrases in italics for better readability), followed by stand-off keyphrase annotations based on character offsets, followed by relation annotations.

Example 1.

Text: Information extraction is the process of extracting structured data from unstructured text, which is relevant for several end-to-end tasks, including question answering. This paper addresses the tasks of named entity recognition (NER), a subtask of information extraction, using conditional random fields (CRF). Our method is evaluated on the ConLL-2003 NER corpus.

Keyphrase annotations:
ID  Type      Start  End
0   TASK        0     22
1   TASK      150    168
2   TASK      204    228
3   TASK      230    233
4   TASK      249    271
5   PROCESS   279    304
6   PROCESS   306    309
7   MATERIAL  343    364

Relation annotations:
ID1  ID2  Type
2    0    HYPONYM-OF
2    3    SYNONYM-OF
5    6    SYNONYM-OF
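To make the stand-off format concrete, the short sketch below (ours, not part of the official task resources) stores the Example 1 annotations as plain tuples and recovers each keyphrase by slicing the paragraph at its character offsets; the offsets in the table are 0-based with exclusive ends, so text[start:end] yields the annotated mention.

    # Stand-off annotations from Example 1, represented as plain Python tuples.
    text = (
        "Information extraction is the process of extracting structured data "
        "from unstructured text, which is relevant for several end-to-end "
        "tasks, including question answering. This paper addresses the tasks "
        "of named entity recognition (NER), a subtask of information "
        "extraction, using conditional random fields (CRF). Our method is "
        "evaluated on the ConLL-2003 NER corpus."
    )

    # (id, type, start, end): keyphrase annotations; end offsets are exclusive.
    keyphrases = [
        (0, "TASK", 0, 22),
        (1, "TASK", 150, 168),
        (2, "TASK", 204, 228),
        (3, "TASK", 230, 233),
        (4, "TASK", 249, 271),
        (5, "PROCESS", 279, 304),
        (6, "PROCESS", 306, 309),
        (7, "MATERIAL", 343, 364),
    ]

    # (id1, id2, type): relations between keyphrases.
    relations = [
        (2, 0, "HYPONYM-OF"),
        (2, 3, "SYNONYM-OF"),
        (5, 6, "SYNONYM-OF"),
    ]

    # Recover the surface string of each keyphrase from its character offsets.
    mention = {kid: text[start:end] for kid, _, start, end in keyphrases}

    for kid, ktype, start, end in keyphrases:
        print(kid, ktype, repr(mention[kid]))

    for id1, id2, rtype in relations:
        print(repr(mention[id1]), rtype, repr(mention[id2]))

The last loop prints 'named entity recognition' HYPONYM-OF 'Information extraction', 'named entity recognition' SYNONYM-OF 'NER', and 'conditional random fields' SYNONYM-OF 'CRF', matching the relation table above.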
3 Resources for SemEval-2017 Task

3.1 Corpus

A corpus for the task was built from ScienceDirect (http://www.sciencedirect.com/) open access publications and was freely available to participants, without the need to sign a copyright agreement. Each data instance consists of one paragraph of text, drawn from a scientific paper.

Publications were provided in plain text, in addition to xml format, which included the full text of the publication as well as additional metadata. 500 paragraphs from journal articles evenly distributed among the domains Computer Science, Material Sciences and Physics were selected.

The training data part of the corpus consists of 350 documents, 50 for development and 100 for testing. This is similar to the pilot task described in Section 5, for which 144 articles were used for training, 40 for development and 100 for testing.

We present statistics about the dataset in Table 1. Notably, the dataset contains many long keyphrases: 22% of all keyphrases in the training set consist of 5 or more tokens. This contributes to making the task of keyphrase identification very challenging. However, 93% of those keyphrases are noun phrases, which is valuable information for simple heuristics to identify keyphrase candidates. Lastly, 31% of keyphrases contained in the training dataset appear in it only once, so systems will have to generalise well to unseen keyphrases.

3.2 Annotation Process

Mention-level annotation is very time-consuming, and only a handful of semantic relations such as hypernymy and synonymy can be found in each publication. We therefore only annotate paragraphs of publications likely to contain relations. We originally intended to identify suitable documents by automatically extracting a knowledge graph of relations from a large scientific dataset using Hearst-style patterns (Hearst, 1991; Snow et al., 2005), then using those to find potential relations in a distinct set of documents, similar to the distant supervision (Mintz et al., 2009; Snow et al., 2005) heuristic. Documents containing a high number of such potential relations would then be selected.

[...] student newsletter, which reaches all of its students. Students were shown example annotations and the annotation guidelines and, if they were still interested in participating in the annotation exercise, were then asked to select beforehand how many documents they wanted to annotate. Approximately 50% of students were still interested, having seen annotated documents and read annotation guidelines. They were then given two weeks to annotate documents with the BRAT tool (Stenetorp et al., 2012), which was hosted on an Amazon EC2 instance as a web service. Students were compensated for annotations per document. Annotation time was estimated as approximately 12 minutes per document and annotator, on which basis they were paid roughly 10 GBP per hour. They were only compensated upon completion of all annotations, i.e. compensation was conditioned on annotating all documents. The annotation cost was covered by Elsevier. To develop annotation guidelines, a small pilot annotation exercise on 20 documents was performed with one annotator, after which annotation guidelines were refined.

We originally intended for student annotators to triple annotate documents and apply majority voting on the annotations, but due to difficulties with recruiting high-quality annotators we instead opted to double-annotate documents, where the second annotator was an expert annotator. Where annotations disagreed, we opted for the expert's annotation. Pairwise inter-annotator agreement between the student annotator and the expert annotator measured with Cohen's kappa is shown in Ta-
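As a reference for how such agreement figures are computed, here is a minimal sketch of Cohen's kappa for two annotators (ours, not the task's official agreement script), assuming one categorical label per token; the label sequences below are hypothetical, not real task data.

    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        # Observed agreement: fraction of items on which the two annotators agree.
        assert len(labels_a) == len(labels_b) and labels_a
        n = len(labels_a)
        p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        # Chance agreement: expected agreement given each annotator's label marginals.
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
        return (p_o - p_e) / (1 - p_e)

    # Hypothetical token-level labels for one short paragraph.
    annotator_a = ["TASK", "TASK", "O", "PROCESS", "O", "MATERIAL"]
    annotator_b = ["TASK", "O",    "O", "PROCESS", "O", "MATERIAL"]
    print(round(cohens_kappa(annotator_a, annotator_b), 2))  # 0.77

Kappa corrects raw agreement for agreement expected by chance, which matters here because the O (non-keyphrase) label dominates token-level annotations.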