
Construction of the Literature Graph in Semantic Scholar

Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey†, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi, Matthew Peters, Joanna Power, Sam Skjonsberg, Lucy Lu Wang, Chris Wilhelm, Zheng Yuan†, Madeleine van Zuylen, and Oren Etzioni

Allen Institute for Artificial Intelligence, Seattle WA 98103, USA
†Northwestern University, Evanston IL 60208, USA

Abstract

We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph consists of more than 280M nodes, representing papers, authors, entities and various interactions between them (e.g., authorships, citations, entity mentions). We reduce literature graph construction into familiar NLP tasks (e.g., entity extraction and linking), point out challenges due to differences from standard formulations of these tasks, and report empirical results for each task. The methods described in this paper are used to enable semantic features in www.semanticscholar.org.

[Figure 1: Part of the literature graph.]

1 Introduction

The goal of this work is to facilitate algorithmic discovery in the scientific literature. Despite notable advances in scientific search engines, data mining and digital libraries (e.g., Wu et al., 2014), researchers remain unable to answer simple questions such as:

- What is the percentage of female subjects in depression clinical trials?
- Which of my co-authors published one or more papers on coreference resolution?
- Which papers discuss the effects of Ranibizumab on the Retina?

In this paper, we focus on the problem of extracting structured data from scientific documents, which can later be used in natural language interfaces (e.g., Iyer et al., 2017) or to improve ranking of results in academic search (e.g., Xiong et al., 2017). We describe methods used in a scalable deployed production system for extracting structured information from scientific documents into the literature graph (see Fig. 1). The literature graph is a directed property graph which summarizes key information in the literature and can be used to answer the queries mentioned earlier as well as more complex queries. For example, in order to compute the Erdős number of an author X, the graph can be queried to find the number of nodes on the shortest undirected path between author X and Paul Erdős such that all edges on the path are labeled "authored".

We reduce literature graph construction into familiar NLP tasks such as sequence labeling, entity linking and relation extraction, and address some of the impractical assumptions commonly made in the standard formulations of these tasks. For example, most research on named entity recognition tasks reports results on large labeled datasets such as CoNLL-2003 and ACE-2005 (e.g., Lample et al., 2016), and assumes that entity types in the test set match those labeled in the training set (including work on domain adaptation, e.g., Daumé, 2007).

These assumptions, while useful for developing and benchmarking new methods, are unrealistic for many domains and applications. The paper also serves as an overview of the approach we adopt at www.semanticscholar.org in a step towards more intelligent academic search engines (Etzioni, 2011).

In the next section, we start by describing our symbolic representation of the literature. Then, we discuss how we extract metadata associated with a paper such as authors and references, then how we extract the entities mentioned in paper text. Before we conclude, we briefly describe other research challenges we are actively working on in order to improve the quality of the literature graph.

2 Structure of The Literature Graph

The literature graph is a property graph with directed edges. Unlike Resource Description Framework (RDF) graphs, nodes and edges in property graphs have an internal structure which is more suitable for representing complex data types such as papers and entities. In this section, we describe the attributes associated with nodes and edges of different types in the literature graph.

2.1 Node Types

Papers. We obtain metadata and PDF files of papers via partnerships with publishers (e.g., Springer), catalogs (e.g., DBLP, MEDLINE), pre-print services (e.g., arXiv, bioRxiv), as well as web-crawling. Paper nodes are associated with a set of attributes such as 'title', 'abstract', 'full text', 'venues' and 'publication year'. While some of the paper sources provide these attributes as metadata, it is often necessary to extract them from the paper PDF (details in §3). We deterministically remove duplicate papers based on string similarity of their metadata, resulting in 37M unique paper nodes. Papers in the literature graph cover a variety of scientific disciplines, including computer science, molecular biology, microbiology and neuroscience.

Authors. Each node of this type represents a unique author, with attributes such as 'first name' and 'last name'. The literature graph has 12M nodes of this type.

Entities. Each node of this type represents a unique scientific concept discussed in the literature, with attributes such as 'canonical name', 'aliases' and 'description'. Our literature graph has 0.4M nodes of this type. We describe how we populate entity nodes in §4.3.

Entity mentions. Each node of this type represents a textual reference of an entity in one of the papers, with attributes such as 'mention text', 'context', and 'confidence'. We describe how we populate the 237M mentions in the literature graph in §4.1.

2.2 Edge Types

Citations. We instantiate a directed citation edge from paper node p1 to paper node p2 for each p2 referenced in p1. Citation edges have attributes such as 'from paper id', 'to paper id' and 'contexts' (the textual contexts where p2 is referenced in p1). While some of the paper sources provide these attributes as metadata, it is often necessary to extract them from the paper PDF as detailed in §3.

Authorship. We instantiate a directed authorship edge between an author node a and a paper node p for each author of that paper.

Entity linking edges. We instantiate a directed edge from an extracted entity mention node to the entity it refers to.

Mention–mention relations. We instantiate a directed edge between a pair of mentions in the same sentential context if the textual relation extraction model predicts one of a predefined list of relation types between them in that context (due to space constraints, we do not discuss our relation extraction models in this paper). We encode a symmetric relation between m1 and m2 as two directed edges m1 → m2 and m2 → m1.

Entity–entity relations. While mention–mention edges represent relations between mentions in a particular context, entity–entity edges represent relations between entities.
These relations may be imported from an existing knowledge base (KB) or inferred from other edges in the graph.
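To make the graph structure concrete, the following minimal sketch shows how a small fragment of such a property graph could be represented and queried in Python with the networkx library, including an Erdős-number-style query over authorship edges. The attribute names mirror the description above, but the library choice, identifiers and query code are our illustration, not the deployed implementation.

# Minimal sketch (not the deployed system) of a property graph with typed,
# attributed nodes and labeled edges.
import networkx as nx

g = nx.MultiDiGraph()

# Paper and author nodes carry type-specific attributes.
g.add_node("paper:1", type="paper", title="A study of X", year=2016)
g.add_node("paper:2", type="paper", title="A survey of Y", year=2014)
g.add_node("author:erdos", type="author", first_name="Paul", last_name="Erdos")
g.add_node("author:x", type="author", first_name="X", last_name="Doe")

# Authorship and citation edges are directed and labeled.
g.add_edge("author:x", "paper:1", label="authored")
g.add_edge("author:erdos", "paper:1", label="authored")
g.add_edge("author:erdos", "paper:2", label="authored")
g.add_edge("paper:1", "paper:2", label="cites", contexts=["as shown in [2] ..."])

# Erdos-number-style query: shortest undirected path between two authors
# using only "authored" edges, then count author-paper-author hops.
authored = nx.Graph()
authored.add_edges_from(
    (u, v) for u, v, d in g.edges(data=True) if d["label"] == "authored"
)
path = nx.shortest_path(authored, "author:x", "author:erdos")
erdos_number = (len(path) - 1) // 2  # two edges per co-authorship hop
print(erdos_number)  # -> 1 in this toy graph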

3 Extracting Metadata

In the previous section, we described the overall structure of the literature graph. Next, we discuss how we populate paper nodes, author nodes, authorship edges, and citation edges.

Although some publishers provide sufficient metadata about their papers, many papers are provided with incomplete metadata. Also, papers obtained via web-crawling are not associated with any metadata. To fill in this gap, we built the ScienceParse system to predict structured data from the raw PDFs using recurrent neural networks (RNNs); the ScienceParse libraries can be found at http://allenai.org/software/. For each paper, the system extracts the paper title, list of authors, and list of references; each reference consists of a title, a list of authors, a venue, and a year.

Table 1: Results of the ScienceParse system.

Field                  Precision  Recall  F1
title                  85.5       85.5    85.5
authors                92.1       92.1    92.1
bibliography titles    89.3       89.4    89.3
bibliography authors   97.1       97.0    97.0
bibliography venues    91.7       89.7    90.7
bibliography years     98.0       98.0    98.0

Preparing the input layer. We split each PDF into individual pages, and feed each page to Apache's PDFBox library (https://pdfbox.apache.org) to convert it into a sequence of tokens, where each token has features, e.g., 'text', 'font size', 'space width', 'position on the page'.

We normalize the token-level features before feeding them as inputs to the model. For each of the 'font size' and 'space width' features, we compute three normalized values (with respect to the current page, the current document, and the whole training corpus), each value ranging between -0.5 and +0.5. The token's 'position on the page' is given in XY coordinate points. We scale the values linearly to range from (-0.5, -0.5) at the top-left corner of the page to (0.5, 0.5) at the bottom-right corner.

In order to capture case information, we add seven numeric features to the input representation of each token: whether the first/second letter is uppercase/lowercase, the fraction of uppercase/lowercase letters, and the fraction of digits.

To help the model make correct predictions for metadata which tend to appear at the beginning of papers (e.g., titles and authors) or at the end (e.g., references), we provide the current page number as two discrete variables (relative to the beginning and end of the PDF file) with values 0, 1 and 2+. These features are repeated for each token on the same page.

For the k-th token in the sequence, we compute the input representation i_k by concatenating the numeric features, an embedding of the 'font size', and the word embedding of the lowercased token. Word embeddings are initialized with GloVe (Pennington et al., 2014).

Model. The input token representations are passed through one fully-connected layer and then fed into a two-layer bidirectional LSTM (Long Short-Term Memory; Hochreiter and Schmidhuber, 1997), i.e.,

  \overrightarrow{g}_k = \mathrm{LSTM}(W i_k, \overrightarrow{g}_{k-1}), \qquad g_k = [\overrightarrow{g}_k; \overleftarrow{g}_k],
  \overrightarrow{h}_k = \mathrm{LSTM}(g_k, \overrightarrow{h}_{k-1}), \qquad h_k = [\overrightarrow{h}_k; \overleftarrow{h}_k],

where W is a weight matrix, and \overleftarrow{g}_k and \overleftarrow{h}_k are defined similarly to \overrightarrow{g}_k and \overrightarrow{h}_k but process the token sequence in the opposite direction.

Following Collobert et al. (2011), we feed the output of the second layer h_k into a dense layer to predict unnormalized label weights for each token and learn label bigram feature weights (often described as a conditional random field layer when used in neural architectures) to account for dependencies between labels.

Training. The ScienceParse system is trained on a snapshot of the data at PubMed Central. It consists of 1.4M PDFs and their associated metadata, which specify the correct titles, authors, and bibliographies. We use a heuristic labeling process that finds the strings from the metadata in the tokenized PDFs to produce labeled tokens. This labeling process succeeds for 76% of the documents. The remaining documents are not used in the training process. During training, we only use pages which have at least one token with a label that is not "none".
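As a concrete illustration of the feature scaling described under "Preparing the input layer" above, the sketch below rescales page-position coordinates and computes the page-level 'font size' normalization. The paper does not spell out the exact formulas, so the function names and details are our assumptions.

# Illustrative sketch of the token feature scaling (formulas are assumptions).
def scale_position(x, y, page_width, page_height):
    """Map XY coordinates to (-0.5, -0.5) at the top-left corner
    and (0.5, 0.5) at the bottom-right corner of the page."""
    return x / page_width - 0.5, y / page_height - 0.5

def normalize_font_size(size, page_sizes):
    """One of the three normalizations: rescale a font size relative to the
    range observed on the current page, into [-0.5, 0.5]."""
    lo, hi = min(page_sizes), max(page_sizes)
    if hi == lo:
        return 0.0
    return (size - lo) / (hi - lo) - 0.5

print(scale_position(306, 396, page_width=612, page_height=792))  # (0.0, 0.0)
print(normalize_font_size(12, page_sizes=[9, 10, 12, 18]))        # ~ -0.17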
Decoding. At test time, we use Viterbi decoding to find the most likely global sequence, with no further constraints. To get the title, we use the longest continuous sequence of tokens with the "title" label. Since there can be multiple authors, we use all continuous sequences of tokens with the "author" label as authors, but require that all authors of a paper are mentioned on the same page. If author labels are predicted on multiple pages, we use the page with the largest number of authors.
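A minimal sketch of the title heuristic described above (the longest contiguous run of "title"-labeled tokens); this is our reading of the heuristic, not the ScienceParse code itself.

# Sketch of the title heuristic: longest contiguous run of "title" tokens.
from itertools import groupby

def extract_title(tokens, labels):
    """tokens and labels are parallel lists produced by Viterbi decoding."""
    runs, i = [], 0
    for label, group in groupby(labels):
        n = len(list(group))
        if label == "title":
            runs.append((n, i))
        i += n
    if not runs:
        return ""
    length, start = max(runs)  # longest run wins
    return " ".join(tokens[start:start + length])

print(extract_title(
    ["Clinical", "review", ":", "Efficacy", "of", "catheters", "Smith"],
    ["title", "title", "title", "title", "title", "title", "author"],
))  # -> "Clinical review : Efficacy of catheters"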

Results. We run our final tests on a held-out set from PubMed Central, consisting of about 54K documents. The results are detailed in Table 1. We use a conservative evaluation where an instance is correct only if it exactly matches the gold annotation, with no credit for partial matching.

To give an example of the type of errors our model makes, consider the paper (Wang et al., 2013) titled "Clinical review: Efficacy of antimicrobial-impregnated catheters in external ventricular drainage - a systematic review and meta-analysis." The title we extract for this paper omits the first part, "Clinical review:". This is likely a result of the pattern "Foo: Bar Baz" appearing in many training examples with only "Bar Baz" labeled as the title.

4 Entity Extraction and Linking

In the previous section, we described how we populate the backbone of the literature graph, i.e., paper nodes, author nodes and citation edges. Next, we discuss how we populate mentions and entities in the literature graph using entity extraction and linking on the paper text. In order to focus on more salient entities in a given paper, we only use the title and abstract.

4.1 Approaches

We experiment with three approaches for entity extraction and linking:

I. Statistical: uses one or more statistical models for predicting mention spans, then uses another statistical model to link mentions to candidate entities in a KB.

II. Hybrid: defines a small number of hand-engineered, deterministic rules for string-based matching of the input text to candidate entities in the KB, then uses a statistical model to disambiguate the mentions. (We also experimented with a purely rule-based approach which disambiguates deterministically, but the hybrid approach consistently gave better results.)

III. Off-the-shelf: uses existing libraries, namely TagMe (Ferragina and Scaiella, 2010; APIs described at https://sobigdata.d4science.org/web/tagme/tagme-help) and MetaMap Lite (Demner-Fushman et al., 2017; we use v3.4 (L0), available at https://metamap.nlm.nih.gov/MetaMapLite.shtml), with minimal post-processing to extract and link entities to the KB.

We evaluate the performance of each approach in two broad scientific areas: computer science (CS) and biomedical research (Bio). For each unique (paper ID, entity ID) pair predicted by one of the approaches, we ask human annotators to label each mention extracted for this entity in the paper. We use CrowdFlower to manage human annotations and only include instances where three or more annotators agree on the label. If one or more of the entity mentions in that paper is judged to be correct, the pair (paper ID, entity ID) counts as one correct instance. Otherwise, it counts as an incorrect instance. We report 'yield' in lieu of 'recall' due to the difficulty of doing a scalable comprehensive annotation.

Table 2: Document-level evaluation of three approaches in two scientific areas: computer science (CS) and biomedical (Bio).

Approach       CS prec.  CS yield  Bio prec.  Bio yield
Statistical    98.4      712       94.4       928
Hybrid         91.5      1990      92.1       3126
Off-the-shelf  97.4      873       77.5       1206

Table 2 shows the results based on 500 papers using v1.1.2 of our entity extraction and linking components. In both domains, the statistical approach gives the highest precision and the lowest yield. The hybrid approach consistently gives the highest yield, but sacrifices precision. The TagMe off-the-shelf library used for the CS domain gives surprisingly good results, with precision within 1 point of the statistical models. However, the MetaMap Lite off-the-shelf library we used for the biomedical domain suffered a large loss in precision. Our error analysis showed that each of the approaches is able to predict entities not predicted by the other approaches, so we decided to pool their outputs in our deployed system, which gives significantly higher yield than any individual approach while maintaining reasonably high precision.
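The document-level metrics in Table 2 can be read as follows: precision is the fraction of predicted (paper ID, entity ID) pairs judged correct, and yield is the absolute number of correct pairs, reported in lieu of recall. The sketch below illustrates that computation; the data layout is our assumption.

# Sketch of the document-level evaluation: a (paper_id, entity_id) pair is
# correct if annotators judged at least one of its mentions correct.
def evaluate(judgments):
    """judgments maps (paper_id, entity_id) -> list of per-mention booleans."""
    correct = sum(1 for labels in judgments.values() if any(labels))
    precision = 100.0 * correct / len(judgments)
    yield_ = correct  # reported in lieu of recall
    return precision, yield_

prec, yld = evaluate({
    ("p1", "e7"): [True, False],
    ("p1", "e9"): [False],
    ("p2", "e7"): [True],
})
print(round(prec, 1), yld)  # -> 66.7 2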
4.2 Entity Extraction Models

Given the token sequence t_1, ..., t_N in a sentence, we need to identify spans which correspond to entity mentions. We use the BILOU scheme to encode labels at the token level. Unlike most formulations of named entity recognition (NER), we do not identify the entity type (e.g., protein, drug, chemical, disease) for each mention, since the output mentions are further grounded in a KB with further information about the entity (including its type) using an entity linking module.
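For readers unfamiliar with the scheme, the sketch below shows how gold mention spans could be converted into BILOU token labels; since we do not assign entity types, the labels carry no type suffix. The function is illustrative rather than part of our codebase.

# Sketch of BILOU encoding of mention spans (no entity types).
def bilou_encode(num_tokens, spans):
    """spans are (start, end) token offsets, end exclusive."""
    labels = ["O"] * num_tokens  # Outside
    for start, end in spans:
        if end - start == 1:
            labels[start] = "U"      # Unit-length mention
        else:
            labels[start] = "B"      # Beginning
            labels[end - 1] = "L"    # Last
            for i in range(start + 1, end - 1):
                labels[i] = "I"      # Inside
    return labels

# "... coreference resolution is ..." with one two-token mention.
print(bilou_encode(4, [(1, 3)]))  # -> ['O', 'B', 'L', 'O']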

Model. First, we construct the token embedding x_k = [c_k; w_k] for each token t_k in the input sequence, where c_k is a character-based representation computed using a convolutional neural network (CNN) with a filter of size 3 characters, and w_k are learned word embeddings initialized with the GloVe embeddings (Pennington et al., 2014).

We also compute context-sensitive word embeddings, denoted lm_k = [\overrightarrow{lm}_k; \overleftarrow{lm}_k], by concatenating the projected outputs of forward and backward recurrent neural network language models (RNN-LMs) at position k. The language model (LM) for each direction is trained independently and consists of a single-layer long short-term memory (LSTM) network followed by a linear projection layer. While training the LM parameters, \overrightarrow{lm}_k is used to predict t_{k+1} and \overleftarrow{lm}_k is used to predict t_{k-1}. We fix the LM parameters during training of the entity extraction model. See Peters et al. (2017) and Ammar et al. (2017) for more details.

Given the x_k and lm_k embeddings for each token k ∈ {1, ..., N}, we use a two-layer bidirectional LSTM to encode the sequence, with x_k and lm_k feeding into the first and second layer, respectively. That is,

  \overrightarrow{g}_k = \mathrm{LSTM}(x_k, \overrightarrow{g}_{k-1}), \qquad g_k = [\overrightarrow{g}_k; \overleftarrow{g}_k],
  \overrightarrow{h}_k = \mathrm{LSTM}([g_k; lm_k], \overrightarrow{h}_{k-1}), \qquad h_k = [\overrightarrow{h}_k; \overleftarrow{h}_k],

where \overleftarrow{g}_k and \overleftarrow{h}_k are defined similarly to \overrightarrow{g}_k and \overrightarrow{h}_k but process the token sequence in the opposite direction.

Similar to the model described in §3, we feed the output of the second LSTM into a dense layer to predict unnormalized label weights for each token and learn label bigram feature weights to account for dependencies between labels.

Results. We use the standard data splits of the SemEval-2017 Task 10 on entity (and relation) extraction from scientific papers (Augenstein et al., 2017). Table 3 compares three variants of our entity extraction model. The first line omits the LM embeddings lm_k, while the second line is the full model (including LM embeddings), showing a large improvement of 4.2 F1 points. The third line shows that creating an ensemble of 15 models further improves the results by 1.1 F1 points.

Table 3: Results of the entity extraction model on the development set of SemEval-2017 task 10.

Description                F1
Without LM                 49.9
With LM                    54.1
Avg. of 15 models with LM  55.2

Model instances. In the deployed system, we use three instances of the entity extraction model with a similar architecture, but trained on different datasets. Two instances are trained on the BC5CDR (Li et al., 2016) and CHEMDNER (Krallinger et al., 2015) datasets to extract key entity mentions in the biomedical domain such as diseases, drugs and chemical compounds. The third instance is trained on mention labels induced from articles in the computer science domain. The outputs of all model instances are pooled together and combined with the rule-based entity extraction module, then fed into the entity linking model (described below).
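The sketch below illustrates the two-layer architecture described above, in which the fixed, pre-trained LM embeddings are concatenated only into the input of the second bidirectional LSTM layer. It is written in PyTorch as our choice of framework; the dimensions are illustrative and the CRF-style label bigram weights are omitted.

# Sketch (our PyTorch rendering, not the deployed code) of the two-layer
# bidirectional LSTM tagger with LM embeddings feeding the second layer.
import torch
import torch.nn as nn

class EntityTagger(nn.Module):
    def __init__(self, token_dim=100, lm_dim=512, hidden=100, num_labels=5):
        super().__init__()
        self.lstm1 = nn.LSTM(token_dim, hidden, bidirectional=True, batch_first=True)
        self.lstm2 = nn.LSTM(2 * hidden + lm_dim, hidden, bidirectional=True, batch_first=True)
        # Dense layer producing unnormalized label scores; label bigram
        # (CRF-style) weights are omitted from this sketch.
        self.scores = nn.Linear(2 * hidden, num_labels)

    def forward(self, x, lm):           # x: [B, N, token_dim], lm: [B, N, lm_dim]
        g, _ = self.lstm1(x)            # g_k = [g_k forward; g_k backward]
        h, _ = self.lstm2(torch.cat([g, lm], dim=-1))
        return self.scores(h)           # [B, N, num_labels]

tagger = EntityTagger()
out = tagger(torch.randn(2, 7, 100), torch.randn(2, 7, 512))
print(out.shape)  # torch.Size([2, 7, 5])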
4.3 Knowledge Bases

In this section, we describe the construction of entity nodes and entity–entity edges. Unlike other knowledge extraction systems such as the Never-Ending Language Learner (NELL; http://rtw.ml.cmu.edu/rtw/) and OpenIE 4 (https://github.com/allenai/openie-standalone), we use existing knowledge bases (KBs) of entities to reduce the burden of identifying coherent concepts. Grounding the entity mentions in a manually-curated KB also increases user confidence in automated predictions. We use two KBs:

UMLS: The UMLS metathesaurus integrates information about concepts in specialized ontologies in several biomedical domains, and is funded by the U.S. National Library of Medicine.

DBpedia: DBpedia provides access to structured information in Wikipedia. Rather than including all Wikipedia pages, we used a short list of Wikipedia categories about CS and included all pages up to depth four in their category trees, in order to exclude irrelevant entities, e.g., "Lord of the Rings" in DBpedia.
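The depth-four restriction over Wikipedia category trees amounts to a breadth-first traversal of the category graph. The sketch below assumes a pre-built mapping from each category to its subcategories and pages, which is our simplification rather than a detail given in the paper.

# Sketch of collecting pages reachable within depth 4 of a seed list of CS
# categories (the category graph itself is assumed to be given).
from collections import deque

def collect_pages(seed_categories, subcategories, pages, max_depth=4):
    """subcategories / pages map a category to its child categories / pages."""
    keep, seen = set(), set(seed_categories)
    queue = deque((c, 0) for c in seed_categories)
    while queue:
        category, depth = queue.popleft()
        keep.update(pages.get(category, []))
        if depth < max_depth:
            for child in subcategories.get(category, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
    return keep

print(sorted(collect_pages(
    ["Machine_learning"],
    {"Machine_learning": ["Artificial_neural_networks"]},
    {"Machine_learning": ["Supervised_learning"],
     "Artificial_neural_networks": ["Long_short-term_memory"]},
)))  # -> ['Long_short-term_memory', 'Supervised_learning']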

4.4 Entity Linking Models

Given a text span s identified by the entity extraction model in §4.2 (or by the heuristics) and a reference KB, the goal of the entity linking model is to associate the span with the entity it refers to. A span and its surrounding words are collectively referred to as a mention. We first identify a set of candidate entities that a given mention may refer to. Then, we rank the candidate entities based on a score computed using a neural model trained on labeled data.

For example, given the string "... of facts, an ILP system will ...", the entity extraction model identifies the span "ILP" as a possible entity and the entity linking model associates it with "Inductive_Logic_Programming" as the referent entity (from among other candidates like "Integer_Linear_Programming" or "Instruction-level_Parallelism").

Datasets. We used two datasets: i) a biomedical dataset formed by combining MSH (Jimeno-Yepes et al., 2011) and BC5CDR (Li et al., 2016) with UMLS as the reference KB, and ii) a CS dataset we curated using Wikipedia articles about CS concepts with DBpedia as the reference KB.

Candidate selection. In a preprocessing step, we build an index which maps any token used in a labeled mention or an entity name in the KB to the associated entity IDs, along with the frequency with which this token is associated with that entity. This is similar to the index used in previous entity linking systems (e.g., Bhagavatula et al., 2015) to estimate the probability that a given mention refers to an entity. At train and test time, we use this index to find candidate entities for a given mention by looking up the tokens in the mention. This method also serves as our baseline in Table 4, by selecting the entity with the highest frequency for a given mention.

Scoring candidates. Given a mention (m) and a candidate entity (e), the neural model constructs a vector encoding of the mention and the entity. We encode the mention and entity using the functions f and g, respectively, as follows:

  f(m) = [v_{m.surface}; \mathrm{avg}(v_{m.lc}, v_{m.rc})],
  g(e) = [v_{e.name}; v_{e.def}],

where m.surface, m.lc and m.rc are the mention's surface form, left context and right context, and e.name and e.def are the candidate entity's name and definition, respectively. v_{text} is a bag-of-words sum encoder for text. We use the same encoder for the mention surface form and the candidate name, and another encoder for the mention contexts and entity definition.

Additionally, we include numerical features to estimate the confidence of a candidate entity based on the statistics collected in the index described earlier. We compute two scores based on the word overlap of (i) the mention's context and the candidate's definition and (ii) the mention's surface span and the candidate entity's name. Finally, we feed the concatenation of the cosine similarity between f(m) and g(e) and the intersection-based scores into an affine transformation followed by a sigmoid nonlinearity to compute the final score for the pair (m, e).

Results. We use the Bag of Concepts F1 metric (Ling et al., 2015) for comparison. Table 4 compares the performance of the most-frequent-entity baseline and our neural model described above.

Table 4: The Bag of Concepts F1 score of the baseline and neural model on the two curated datasets.

Model     CS    Bio
Baseline  84.2  54.2
Neural    84.6  85.8
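The candidate index and the most-frequent-entity baseline described under "Candidate selection" above can be sketched as follows; the data layout and function names are our assumptions.

# Sketch of the token -> (entity, frequency) index and the most-frequent-entity
# baseline used for candidate selection.
from collections import Counter, defaultdict

def build_index(labeled_mentions):
    """labeled_mentions: iterable of (mention_text, entity_id) pairs."""
    index = defaultdict(Counter)
    for text, entity_id in labeled_mentions:
        for token in text.lower().split():
            index[token][entity_id] += 1
    return index

def candidates(index, mention):
    """Return entity_id -> frequency for all candidates of a mention."""
    counts = Counter()
    for token in mention.lower().split():
        counts.update(index.get(token, Counter()))
    return counts

def baseline_link(index, mention):
    counts = candidates(index, mention)
    return counts.most_common(1)[0][0] if counts else None

index = build_index([("ILP system", "Inductive_Logic_Programming"),
                     ("ILP solver", "Integer_Linear_Programming"),
                     ("ILP", "Inductive_Logic_Programming")])
print(baseline_link(index, "ILP"))  # -> Inductive_Logic_Programming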
5 Other Research Problems

In the previous sections, we discussed how we construct the main components of the literature graph. In this section, we briefly describe several other related challenges we are actively working on.

Author disambiguation. Despite initiatives to establish global author IDs such as ORCID and ResearcherID, most publishers provide author information as names (e.g., arXiv). However, author names cannot be used as a unique identifier, since several people often share the same name. Moreover, different venues and sources use different conventions in reporting author names, e.g., "first initial, last name" vs. "last name, first name". Inspired by Culotta et al. (2007), we train a supervised binary classifier for merging pairs of author instances and use it to incrementally create author clusters. We only consider merging two author instances if they have the same last name and share the first initial. If the first name is spelled out (rather than abbreviated) in both author instances, we also require that the first name matches.
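A minimal sketch of the blocking rule described above for deciding whether two author records may even be considered for merging (same last name, matching first initial, and matching fully spelled-out first names when both are available); the record layout is our assumption.

# Sketch of the author-merge blocking rule (record fields are assumptions).
def may_merge(a, b):
    """a and b are dicts with 'first_name' and 'last_name' fields."""
    if a["last_name"].lower() != b["last_name"].lower():
        return False
    fa, fb = a["first_name"].strip("."), b["first_name"].strip(".")
    if not fa or not fb or fa[0].lower() != fb[0].lower():
        return False
    if len(fa) > 1 and len(fb) > 1 and fa.lower() != fb.lower():
        return False  # both spelled out but different
    return True

print(may_merge({"first_name": "P.", "last_name": "Erdos"},
                {"first_name": "Paul", "last_name": "Erdos"}))   # True
print(may_merge({"first_name": "Petra", "last_name": "Erdos"},
                {"first_name": "Paul", "last_name": "Erdos"}))   # False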

Ontology matching. Popular concepts are often represented in multiple KBs. For example, the concept of "artificial neural networks" is represented as entity ID D016571 in the MeSH ontology, and as page ID '21523' in DBpedia. Ontology matching is the problem of identifying semantically-equivalent entities across KBs or ontologies (variants of this problem are also known as deduplication or record linkage).

Limited KB coverage. The convenience of grounding entities in a hand-curated KB comes at the cost of limited coverage. Introduction of new concepts and relations in the scientific literature occurs at a faster pace than KB curation, resulting in a large gap in KB coverage of scientific concepts. In order to close this gap, we need to develop models which can predict textual relations as well as detailed concept descriptions in scientific papers. For the same reasons, we also need to augment the relations imported from the KB with relations extracted from text. Our approach to addressing both entity and relation coverage is based on distant supervision (Mintz et al., 2009). In short, we train two models for identifying entity definitions and relations expressed in natural language in scientific documents, and automatically generate labeled data for training these models using known definitions and relations in the KB.

We note that the literature graph currently lacks coverage for important entity types (e.g., affiliations) and domains (e.g., physics). Covering affiliations requires small modifications to the metadata extraction model, followed by an algorithm for matching author names with their affiliations. In order to cover additional scientific domains, more agreements need to be signed with publishers.

Figure and table extraction. Non-textual components such as charts, diagrams and tables provide key information in many scientific documents, but the lack of large labeled datasets has impeded the development of data-driven methods for scientific figure extraction. In Siegel et al. (2018), we induced high-quality training labels for the task of figure extraction in a large number of scientific documents, with no human intervention. To accomplish this, we leveraged the auxiliary data provided in two large web collections of scientific documents (arXiv and PubMed) to locate figures and their associated captions in the rasterized PDF. We use the resulting dataset to train a deep neural network for end-to-end figure detection, yielding a model that can be more easily extended to new domains compared to previous work.

Understanding and predicting citations. The citation edges in the literature graph provide a wealth of information (e.g., at what rate a paper is being cited and whether that rate is accelerating), and open the door for further research to better understand and predict citations. For example, in order to allow users to better understand what impact a paper had and to effectively navigate its citations, we experimented with methods for classifying a citation as important or incidental, as well as with more fine-grained classes (Valenzuela et al., 2015). The citation information also enables us to develop models for estimating the potential of a paper or an author. In Weihs and Etzioni (2017), we predict citation-based metrics such as an author's h-index and the future citation rate of a paper. Also related is the problem of predicting which papers should be cited in a given draft (Bhagavatula et al., 2018), which can help improve the quality of a paper draft before it is submitted for peer review, or be used to supplement the list of references after a paper is published.

6 Conclusion and Future Work

In this paper, we discuss the construction of a graph providing a symbolic representation of the scientific literature. We describe deployed models for identifying authors, references and entities in the paper text, and provide experimental results to evaluate the performance of each model.
Three research directions follow from this work and other similar projects, e.g., Hahn-Powell et al. (2017) and Wu et al. (2014): i) improving the quality and enriching the content of the literature graph (e.g., ontology matching and knowledge base population); ii) aggregating domain-specific extractions across many papers to enable a better understanding of the literature as a whole (e.g., identifying demographic information in clinical trial participants and summarizing empirical results on important tasks); and iii) exploring the literature via natural language interfaces.

In order to help future research efforts, we make the following resources publicly available: metadata for over 20 million papers (http://labs.semanticscholar.org/corpus/), the meaningful citations dataset (http://allenai.org/data.html), models for figure and table extraction (https://github.com/allenai/deepfigures-open), models for predicting citations in a paper draft (https://github.com/allenai/citeomatic), and models for extracting paper metadata (https://github.com/allenai/science-parse), among other resources (http://allenai.org/software/).

References

Waleed Ammar, Matthew E. Peters, Chandra Bhagavatula, and Russell Power. 2017. The AI2 system at SemEval-2017 Task 10 (ScienceIE): Semi-supervised end-to-end entity and relation extraction. In ACL workshop (SemEval).

Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman, and Andrew D. McCallum. 2017. SemEval 2017 Task 10 (ScienceIE): Extracting keyphrases and relations from scientific publications. In ACL workshop (SemEval).

Chandra Bhagavatula, Sergey Feldman, Russell Power, and Waleed Ammar. 2018. Content-based citation recommendation. In NAACL.

Chandra Bhagavatula, Thanapon Noraset, and Doug Downey. 2015. TabEL: Entity linking in web tables. In ISWC.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel P. Kuksa. 2011. Natural language processing (almost) from scratch. In JMLR.

Aron Culotta, Pallika Kanani, Robert Hall, Michael Wick, and Andrew D. McCallum. 2007. Author disambiguation using error-driven machine learning with a ranking loss function. In IIWeb Workshop.

Hal Daumé. 2007. Frustratingly easy domain adaptation. In ACL.

Dina Demner-Fushman, Willie J. Rogers, and Alan R. Aronson. 2017. MetaMap Lite: an evaluation of a new Java implementation of MetaMap. In JAMIA.

Oren Etzioni. 2011. Search needs a shake-up. Nature 476(7358):25-26.

Paolo Ferragina and Ugo Scaiella. 2010. TAGME: on-the-fly annotation of short text fragments (by Wikipedia entities). In CIKM.

Gus Hahn-Powell, Marco Antonio Valenzuela-Escarcega, and Mihai Surdeanu. 2017. Swanson linking revisited: Accelerating literature-based discovery across domains using a conceptual influence graph. In ACL.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation.

Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke S. Zettlemoyer. 2017. Learning a neural semantic parser from user feedback. In ACL.

Antonio J. Jimeno-Yepes, Bridget T. McInnes, and Alan R. Aronson. 2011. Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation. BMC Bioinformatics 12(1):223.

Martin Krallinger, Florian Leitner, Obdulia Rabal, Miguel Vazquez, Julen Oyarzabal, and Alfonso Valencia. 2015. CHEMDNER: The drugs and chemical names extraction challenge. In Journal of Cheminformatics.

Guillaume Lample, Miguel Ballesteros, Sandeep K. Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In HLT-NAACL.

Jiao Li, Yueping Sun, Robin J. Johnson, Daniela Sciaky, Chih-Hsuan Wei, Robert Leaman, Allan Peter Davis, Carolyn J. Mattingly, Thomas C. Wiegers, and Zhiyong Lu. 2016. BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database: The Journal of Biological Databases and Curation 2016.

Xiao Ling, Sameer Singh, and Daniel S. Weld. 2015. Design challenges for entity linking. Transactions of the Association for Computational Linguistics 3:315-328.

Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In ACL.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In EMNLP.

Matthew E. Peters, Waleed Ammar, Chandra Bhagavatula, and Russell Power. 2017. Semi-supervised sequence tagging with bidirectional language models. In ACL.

Noah Siegel, Nicholas Lourie, Russell Power, and Waleed Ammar. 2018. Extracting scientific figures with distantly supervised neural networks. In JCDL.

Marco Valenzuela, Vu Ha, and Oren Etzioni. 2015. Identifying meaningful citations. In AAAI Workshop (Scholarly Big Data).

Xiang Wang, Yan Dong, Xiang-qian Qi, Yi-Ming Li, Cheng-Guang Huang, and Lijun Hou. 2013. Clinical review: Efficacy of antimicrobial-impregnated catheters in external ventricular drainage - a systematic review and meta-analysis. In Critical Care.

Luca Weihs and Oren Etzioni. 2017. Learning to predict citation-based impact measures. In JCDL.

Jian Wu, Kyle Williams, Hung-Hsuan Chen, Madian Khabsa, Cornelia Caragea, Alexander Ororbia, Douglas Jordan, and C. Lee Giles. 2014. CiteSeerX: AI in a digital library search engine. In AAAI.

Chenyan Xiong, Russell Power, and Jamie Callan. 2017. Explicit semantic ranking for academic search via knowledge graph embedding. In WWW.
