Harnessing the Power of Content

Total Page:16

File Type:pdf, Size:1020Kb

Harnessing the Power of Content HARNESSING THE POWER OF CONTENT EXTRACTING VALUE FROM SCIENTIFIC LITERATURE: THE POWER OF MINING FULL-TEXT ARTICLES CONTENTS 1. Executive Summary 3 2. Understanding Biology in the Era of Big Data 4 3. The Challenge of Mining the Scientific Literature 5 4. Manual Curation is a Problematic Option 5-6 5. Text-mining Technology Provides a Better Solution 6 6. A Wide Net is Needed to Capture Relevant Results 7 7. More Facts are Found in Full-Text Articles 8-9 8. Co-occurrences are Often Mentioned in Full-text First 10-11 9. Conclusion 12 ELSEVIER - WHITE PAPER - HARNESSING THE POWER OF CONTENT • 2014 2 1. EXECUTIVE SUMMARY Biological researchers face an overwhelming and constantly growing body of scientific literature that they must search through in order to keep up with developments in their field. Staying up-to-date with new findings is critical, yet seemingly impossible. With more than 1 million new citations in PubMed each year, how can anyone possibly read through all of the relevant full-text articles to uncover important data? For years manual review and curation of scientific publications has been the gold standard, with computer-based solutions ranking far behind in terms of accuracy and completeness. Fortunately times have changed; researchers now have ready access to significant computational power, and the capabilities of text mining software have improved dramatically. Elsevier’s proprietary text-mining technology has proven that well-designed software applications can effectively “read” full-text articles, extracting far more relevant biological information than is found in article abstracts alone, with accuracies comparable to human researchers, and with much higher throughput. This technology helps ensure researchers are not missing out on valuable insights from the literature, giving them a better understanding of the biology behind specific pathologies and drug responses, supporting the target discovery process, and offering the competitive advantage that they need to succeed in their work. ELSEVIER - WHITE PAPER - HARNESSING THE POWER OF CONTENT • 2014 3 2. UNDERSTANDING BIOLOGY IN THE ERA OF BIG DATA Biological research today can be phenotype, as an alternative to reading summarized in one word – data. Massive hundreds of scientific articles. Biological quantities of information are generated knowledgebases can also be used daily from large and small laboratories, to build detailed models of complex individuals, universities, and commercial diseases to help uncover the underlying organizations. Their findings, along with cause of a disease at the level of gene those of other individuals or groups, are and protein networks, thereby helping to published in multiple papers, across a guide early target discovery, assessment, wide range of scientific journals, over and prioritization. a period of years. On average, each Enter Pathway Studio: This integrated year more than a million new records data mining and visualization software are added to PubMed – a service of allows investigators to search for facts the National Library of Medicine that on proteins, genes, diseases, drugs, and indexes more than 23 million references other entities; analyze experimental comprising the MEDLINE bibliographic results and interpret data in a biological database at present. New findings are context; and build and visualize disease published, and yet it seems that each models representing the underlying new result engenders more questions. biology of a particular condition or The information needed to understand response. Pathway Studio provides experimental results, describe fully an direct access to underlying evidence individual cellular pathway, or identify from the literature so that scientists the interactions of multiple regulatory can independently review and decide networks is scattered throughout dozens on the applicability of those facts to if not hundreds or thousands their research. Underpinning Pathway of articles and publications. The volume Studio, Elsevier’s proprietary text-mining and complexity of the data have tool extracts facts from more than 2.5 outstripped researchers’ ability to keep million full-text articles in almost 1,500 up with the progress of science, even in biomedical journals, from Elsevier as well their chosen fields. Researchers are left as other trusted publishers, along with with a question that’s impossible more than 23 million PubMed abstracts. to answer – “What have I missed?” Having access to the information What if you had a tool to dramatically derived from all those publications gives reduce your odds of missing critically researchers a measurable advantage in important research results? One way rendering a complete picture of the genes to speed up this process is to create and proteins involved in the biology of a a database of scientific knowledge disease or response to a drug. (a “knowledgebase”) from published literature that can then be used to quickly find and summarize information related to a particular disease or ELSEVIER - WHITE PAPER - HARNESSING THE POWER OF CONTENT • 2014 4 3. THE CHALLENGE OF MINING THE SCIENTIFIC LITERATURE Researchers have three main choices same paper concluded that less than for building knowledgebases derived half of the key facts from the body of from the burgeoning literature: reading a paper are present in its abstract2–5. the articles (or abstracts) themselves, Authors may choose to exclude from the manual curation by experts, or using a abstract some technical or secondary specialized automated text-mining tool. information5, or information that is less positive or less supportive of the main Researchers are used to reading dozens idea of the publication: a study examining if not hundreds of scientific articles to randomized controlled trials published in understand their area of research. But high-profile journals showed that nearly reading a full paper takes time – even half of papers that mentioned harm in very experienced researchers can only the body of the paper did not report it read and fully comprehend a few articles in abstract6. Also, authors sometimes an hour. One option to save time is to underestimate a finding’s ultimate simply read the abstract of the paper. value; what may appear to be a relatively Overall, the current They’re short, most are available on minor finding in one paper may lead to PubMed, and many can be read quickly. version of Elsevier’s extensive research years later, resulting However, authors cannot include in hundreds of follow-up articles. text-mining technology everything important in an abstract. has a 98% accuracy rate Based on this information, every Overall, the process of deciding what researcher would agree that it is not for entity detection goes into the abstract is often somewhat enough to read just the abstract; one subjective. For example, when other and an 88% accuracy must read the full paper. But researchers’ researchers cite a particular paper, rate for relationship time is limited – how can they identify 20% of keywords they mention are not all the articles relevant to their research, extraction-figures that present in the abstract, which means much less gather all the critical that it didn’t contain all the important are essentially identical information present in those articles? to high-quality annotation information1. Multiple studies comparing the full text and abstract from the by human experts. 4. MANUAL CURATION IS A PROBLEMATIC OPTION The alternative approach to personally Even the best human curators are reviewing the body of scientific literature not perfectly accurate, nor are they is to rely on PhD researchers trained as completely consistent. Numerous curators. Traditionally, manual curation scientific publications highlight errors of papers has been viewed as the gold and inconsistencies of human curators. standard and assumed to be more For example, in one study using manual accurate than automated systems. annotation of PubMed articles with Although this was certainly true in the Gene Ontology Annotation terms, only past, current text analysis technology 39% of the terms assigned by three is highly accurate, rivaling that of even different curators were identical7. In experienced curators. another study8, the average precision ELSEVIER - WHITE PAPER - HARNESSING THE POWER OF CONTENT • 2014 5 of annotating medical events in clinical among pairs of annotators ranged narratives by three experts was reported from 31% to 77%, and recall to be 88%. Drews et al. reported 73% ranged from 49% to 71%) was also inter-annotator agreement of extracting reported for phenotype annotation clinical data from medical records9. And for the BioCreative workshop10. low inter-annotator agreement (precision 5. TEXT-MINING TECHNOLOGY PROVIDES A BETTER SOLUTION The ability to “machine-read” and It then analyzes the sentence structure “understand” published articles has been and “decomposes” each sentence into in existence for approximately 35 years, a set of Subject-Verb-Object triplets but only recently has the sophistication (e.g., insulin regulates glucose uptake). of those machine-reading applications Using strict linguistic rules, the software aligned with the wide availability of analyzes each triplet to determine how sufficient computational power to make these concepts are related to each other fully automated text extraction a viable and extracts the relationship between alternative to reading and manually the Subject and the Object, guided by a curating scientific literature. Elsevier
Recommended publications
  • Red Letters, White Paper, Black Ink: Race, Writing, Colors, and Characters in 1850S America
    Red Letters, White Paper, Black Ink: Race, Writing, Colors, and Characters in 1850s America Samuel Arrowsmith Turner Portland, Maine B.A., Vassar College, 1997 A Dissertation presented to the Graduate Faculty of the University of Virginia in Candidacy for the Degree of Doctor of Philosophy Department of English University of Virginia August, 2013 ii Abstract It’s well known that both the idea of race and the idea of writing acquired new kinds of importance for Americans in the mid-nineteenth century. Less obvious has been the extent to which the relationship between the two ideas, each charged by antebellum America with an ever-broader range of ideological functions, has itself served for some authors both as an object of inquiry and as a politico-aesthetic vocabulary. “White Paper, Black Ink, Red Letters” concerns this race-writing dialectic, and takes as its point of departure the fact that both writing and race depend on a priori notions of visibility and materiality to which each nonetheless is – or seems to be – irreducible. That is, though any given utterance of racial embodiment or alphabetic inscription becomes intelligible by its materialization as part of a field of necessarily visible signifiers (whether shapes of letters or racially encoded features of the body) the power of any such signifier to organize or regulate experience depends on its perceived connection to a separate domain of invisible meanings. iii For many nineteenth-century Americans race offered an increasingly persuasive narrative of identity at a time when the self-evidence of class, gender, and nationality as modes of affiliation seemed to be waning.
    [Show full text]
  • Download Full White Paper
    Open Access White Paper University of Oregon SENATE SUB-COMMITTEE ON OPEN ACCESS I. Executive Summary II. Introduction a. Definition and History of the Open Access Movement b. History of Open Access at the University of Oregon c. The Senate Subcommittee on Open Access at the University of Oregon III. Overview of Current Open Access Trends and Practices a. Open Access Formats b. Advantages and Challenges of the Open Access Approach IV. OA in the Process of Research & Dissemination of Scholarly Works at UO a. A Summary of Current Circumstances b. Moving Towards Transformative Agreements c. Open Access Publishing at UO V. Advancing Open Access at the University of Oregon and Beyond a. Barriers to Moving Forward with OA b. Suggestions for Local Action at UO 1 Executive Summary The state of global scholarly communications has evolved rapidly over the last two decades, as libraries, funders and some publishers have sought to hasten the spread of more open practices for the dissemination of results in scholarly research worldwide. These practices have become collectively known as Open Access (OA), defined as "the free, immediate, online availability of research articles combined with the rights to use these articles fully in the digital environment." The aim of this report — the Open Access White Paper by the Senate Subcommittee on Open Access at the University of Oregon — is to review the factors that have precipitated these recent changes and to explain their relevance for members of the University of Oregon community. Open Access History and Trends Recently, the OA movement has gained momentum as academic institutions around the globe have begun negotiating and signing creative, new agreements with for-profit commercial publishers, and as innovations to the business models for disseminating scholarly research have become more widely adopted.
    [Show full text]
  • White Paper on Promoting Integrity in Scientific Journal Publications, 2012 Update
    CSE’s White Paper on Promoting Integrity in Scientific Journal Publications, 2012 Update Editorial Policy Committee (2011-2012) www.CouncilScienceEditors.org CSE’s White Paper on Promoting Integrity in Scientific Journal Publications, 2012 Update Editorial Policy Committee (2011-2012) www.CounsilScienceEditors.org Kristi Overgaard (Chair) Daniel Salsbury American Orthopaedic Society for Sports Medicine Proceedings of the National Academy of Sciences Patricia Baskin Mary Scheetz American Academy of Neurology Research Integrity Consulting, LLC Elizabeth Blalock Diane Scott-Lichter Journal of Investigative Dermatology American Association for Cancer Research Howard Browman Gene P. Snyder Institute of Marine Research, Storebø, Norway Envision Pharma Bruce Dancik Anna Trudgett NRC Research Press, University of Alberta American Society of Hematology Wim D’Haeze Arena Pharmaceuticals, Inc. Editorial Policy Committee (2008-2009) Robert L. Edsall Heather Goodell (Chair) American Academy of Family Physicians American Heart Association Jill Filler Elizabeth Blalock American Society for Pharmacology and Experimental Therapeutics Journal of Investigative Dermatology Heather Goodell Bruce Dancik American Heart Association NRC Research Press, University of Alberta Angela Hartley Robert L. Edsall Association of Women’s Health, Obstetric and Neonatal Nurses American Academy of Family Physicians Kenneth Kornfield Jill Filler American Society of Clinical Neurology American Society for Pharmacology and Experimental Therapeutics Christine Laine Angela Hartley
    [Show full text]
  • Copyright White Paper
    JCI Series No.19 S A R V H Copyright White Paper - A view from the perspective of copyright industries (Vol.3)- August 2009 Japan Copyright Institute Copyright Research and Information Center Copyright White Paper Preface Preface This third edition of the Copyright White Paper, subtitled “Economic Aspects of the Copyright Industry,” has been released more than eight years since the publication of the first edition in November 2000 and the second edition in March of 2005. As reports of this nature are more valuable when published in series rather than as one-off publications, our initial intention was to release a revised white paper every three years. We regret that due to circumstances beyond our control, however, many years have passed before we were finally able to release this new edition. The preface to the November 2000 edition notes a comment by Taichi Sakaiya, the then-Minister of Economic Planning, which was published in the Economic Survey of Japan 2000: “...Japan, a leader in developing a hardware-focused industrial society based on standardized mass production, remained the most advanced country in the world in the development and application of control technology. It lagged behind in the informatization process, however, due to its failure to transform social mindsets, practices and frameworks that were focused on the standardized mass production system.” Japan’s economic growth rate, 3.8% in 1991, fell to -2.8% in 1998. The expansion of the Internet reversed this economic trend, as sales of personal computers finally surpassed those of televisions in 1999 and the popularity of cell phones began to grow dramatically.
    [Show full text]
  • Where Does the Thesis Statement Go in a Research Paper
    Where Does The Thesis Statement Go In A Research Paper Carlo king his pueblo strolls capaciously, but led Vinny never obsecrates so bilingually. Chummy Matias countermarks that Cavell purpling complaisantly and nourish immaculately. Interspinous and elected Carmine gilly scurvily and scrimpy his pandour theoretically and o'clock. We kindly ask questions related topics for her thesis suggests that where the three body paragraphs of level of the internet in order to. Decide that have build your paper thesis in a statement does the research? Should highlight the thesis statement tells your paper that asserts that is this is going to. THESIS STATEMENTWhat is a thesis? When the main point of paper thesis in a statement research the sense of communication, do after you done thorough research in the abstract words, constructing a fictional character to. If such body contains other information, then, the spawn and federal governments should pick those disenfranchised economically with early education about the dangers of smoking. Include a fresh ideas follow and does the thesis statement research paper in a swimsuit on your early education about the amount of. This is any broad. Most research papers begin take a thesis statement at the flesh of an introductory paragraph. You want to your writing: elementary school and will be revised again, a thesis statement research the in paper is. Readers can end of where does the a thesis statement research paper in a better when you get away too vague words, and concise as more. Learn how can take on your paper thesis statement does the research in a concise manner throughout the thesis statement for this article helpful in much time calculator.
    [Show full text]
  • Telehealth Section White Paper
    The following is an exact copy of the original whitepaper written in 2018 by the Telehealth Section of the American College of Emergency Physicians (ACEP) with the omission of the definition of emergency telehealth and the “Preamble.” The preamble was edited by the current ACEP Telehealth Section Chairman for correctness. August 7, 2019. Edward “Etch” Shaheen, MD, FACEP Francis Xavier Guyette MD, MPH, FACEP Associate Professor of Emergency Medicine University of Pittsburgh Medical Director, STAT MedEvac Neal Sikka, MD, FACEP Associate Professor of Emergency Medicine The George Washington University School of Medicine Medical Faculty Associates Hartmut Gross, MD, FACEP Chair, ACEP Emergency Telemedicine (Telehealth) Section Professor of Emergency Medicine and Neurology, Assistant Professor of Pediatrics Medical College of Georgia at Augusta University, Augusta, GA Aditi U. Joshi MD, MSc, FACEP Medical Director, JeffConnect Assistant Professor, Department of Emergency Medicine Thomas Jefferson University Hospital Philadelphia, PA Edward Shaheen, MD, FACEP Founder and CEO Shaheen Consulting, a division of On Call Specialists, Inc. Chair-elect, ACEP Emergency Telemedicine (Telehealth) Section Michael James Baker, MD, FACEP Director of Telehealth, Emergency Physicians Medical Group Adjunct Clinical Instructor, University of Michigan St. Joseph Mercy Hospital, Ann Arbor, MI Adam Ash DO FACEP Progressive Emergency Physicians Uniondale, NY Judd E. Hollander, MD, FACEP SVP for Healthcare Delivery Innovation, Thomas Jefferson University Associate
    [Show full text]
  • Promote Or Perish: Sharing Your Research to Advance Your Science
    RESEARCH NETWORKING CAREER ADVANCE SCIENCE Promote or Perish: PROMOTE Sharing Your Research to Advance Your Science and Your Career WHITE PAPER SHARING publishing.aip.org facebook.com/AIPPublishing @AIP_Publishing linkedin.com/company/aip-publishing INTRODUCTION “Publish or perish.” This well-known concept has been used in academia for decades: to advance in their career, academics must continually publish journal articles to demonstrate value and maintain visibility. The modern-day twist: “promote or perish.” Today, it’s not enough to simply publish research findings in quality journals, you also need to be your own advocate and promote your research. To some scientists, self-promotion not only is uncomfortable, it’s distasteful. In their view, having research published in a reputable journal is enough, especially if the journal has a high impact factor. However, to stand out as a scientist today, it’s also about getting noticed online. And more important, it’s about getting noticed by the right people, the right institutions, and the right influencers who can help you advance your science and your career — and obtain funding. That means being active on social networking platforms. A survey conducted by the Pew Research Center what’s an overworked, time-strapped research in collaboration with the American Association scientist to do? Whether you are a seasoned for the Advancement of Science (AAAS) scientist or an early-career researcher looking (pewinternet.org/2015/02/15/how-scientists- to build your reputation, how do you navigate engage-public/) shows that scientists of all the labyrinth of choices available? generations have started to recognize the importance of sharing and talking about To help you get started, we’ve developed this their research.
    [Show full text]
  • Co-Authorship in the Humanities and Social Sciences
    Got a query on publishing in a journal? Co-authorship in Looking for advice, the Humanities and support and insights? Social Sciences Visit Taylor & Francis Author Services: supporting researchers from A global view initial idea, through peer review, production, publication and beyond. A white paper from Taylor & Francis authorservices.taylorandfrancis.com Helping you publish your research Join our researcher community: @tandfauthorserv tandfauthorservices authorservices.taylorandfrancis.com © 2017 Taylor & Francis Group CC-BY-NC T&F AS Ad A4P cmyk.indd 1 11/02/2016 14:25 Contents 1 Introduction ................................................................................................... 3 2 Key findings Survey notes and responses ................................................................................. 4 3 Growth of co-authorship .................................................................... 6 Why is co-authorship more common? .................................................................. 7 4 Challenges of co-authorship .......................................................... 8 Listing author names ............................................................................................. 8 Determining authorship ....................................................................................... 10 5 Training and guidance........................................................................ 13 6 The role of journals, editors and reviewers ................. 14 7 Conclusion ................................................................................................
    [Show full text]
  • Open Access at MIT and Beyond: a White Paper of the MIT Ad Hoc Task
    Open Access at MIT and Beyond A White Paper of the MIT Ad Hoc Task Force on Open Access to MIT’s Research © 2018 Massachusetts Institute of Technology Note: This documented was reissued on Monday, September 17, 2018 with corrections to the Educational Materials section in Part II, and a change in task force committee member Karen Shirer’s title. Open Access at MIT and Beyond iii A White Paper of the MIT Ad Hoc Task Force on Open Access to MIT’s Research CONTENTS Acknowledgments ....................................................................................... 4 Introduction ................................................................................................. 5 PART I: The Open Access Landscape in Europe and the United States ....... 6 Open Access in Europe ..................................................................... 6 Open Access in the United States ................................................... 10 Open Access Trends and Tools ....................................................... 12 PART II: Open Access Practices at MIT ...................................................... 14 Publications .................................................................................... 14 Data ............................................................................................. 17 Code ............................................................................................. 18 Educational Materials ......................................................................19 Conclusion ................................................................................................
    [Show full text]
  • Meaningful Metrics: a 21St-Century Librarian's Guide to Bibliometrics, Altmetrics, and Research Impact
    Meaningful METRICS A 21st-Century Librarian’s Guide to Bibliometrics, Altmetrics, and Research Impact Robin Chin Roemer & Rachel Borchardt Association of College and Research Libraries A division of the American Library Association Chicago, Illinois 2015 The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences–Permanence of Paper for Printed Library Materials, ANSI Z39.48-1992. ∞ Library of Congress Cataloging-in-Publication Data Meaningful metrics : a 21st century librarian’s guide to bibliometrics, altmetrics, and research impact / edited by Robin Chin Roemer and Rachel Borchardt. pages cm Includes bibliographical references and index. ISBN 978-0-8389-8755-1 (pbk. : alk. paper) -- ISBN 978-0-8389-8757-5 (epub) -- ISBN 978-0-8389-8756-8 (pdf ) -- ISBN 978-0-8389-8758-2 (kin- dle) 1. Bibliometrics. 2. Bibliographical citations--Evaluation. 3. Scholarly publishing--Evaluation. 4. Research--Evaluation--Statistical methods. 5. Communication in learning and scholarship--Technological innovations. I. Roemer, Robin Chin, editor. II. Borchardt, Rachel, editor. Z669.8.M43 2015 010.72’7--dc23 2015006338 Copyright ©2015 by The Association of College & Research Libraries, a division of the American Library Association. All rights reserved except those which may be granted by Sections 107 and 108 of the Copyright Revision Act of 1976. Printed in the United States of America. 19 18 17 16 15 5 4 3 2 1 This work is licensed under Creative Commons license CC BY-NC 4.0 (Attribution-NonCommercial Use) Table of Contents Foreword .................................................................................................v PART 1. IMPACT Chapter 1: Understanding Impact ..................................................3 Chapter 2: Impact in Practice ........................................................13 PART 2.
    [Show full text]
  • CSL Bibliometrics Report 2015-2020
    BIBLIOMETRICS REPORT A Bibliometric Analysis of NOAA CSL Publications 2015 – 2020 Prepared for: NOAA Chemical Sciences Laboratory Prepared by: Sue Visser, Boulder Labs Library January 14, 2021 Table of Contents Introduction……………………………………………………………………………………………………………………………….……………..2 Part A. General Productivity…………………………………………………………………………………………………………….……….3 Part B. Collaboration..……………………………………………………………………………………………………………………….…….11 Part C. Impact..………………………………………………………………………………………………………………………………….…….14 Section 1. Citation Analysis.………………………………………………………………………………………………….…….14 Section 2. Benchmarks.……………………………………………………………………………………………………….…..…16 References……………………………………………………………………………………………………………………………………….……….18 Appendix I. Responsible Use of Bibliometrics……………………………………………………………………………….………..19 Appendix II. Method and Data Sources…………………………………………………….………………………………………….…20 List of Tables Table 1. Common bibliometric indicators…………………………………………………………………………………..……………3 Table 2. CSL Top-cited papers………………………………………………………………………………………………………………….5 Table 3 CSL’s 10 highest-cited papers, 2010 – 2014………………………………………………………..........................7 Table 4. Altmetric scores….……………………………………………………………………………………………..………………….…….8 Table 5. Patent citations……………………………………………………………………………………………………………..……….…..10 Table 6. Institutional affiliations of CSL coauthors………………………………………………………………………………….…11 Table 7. Country affiliations of CSL coauthors……………………………….…………………………………………………….…..12 Table 8. Institutional affiliations of authors citing CSL publications..………………………………………………….……15
    [Show full text]
  • APA Style Guide to Electronic References
    APA Style Guide to Electronic References Sometimes it can be not only confusing but difficult to reference an electronic source correctly. In 2009, the American Psychological Association (APA) revised and updated a new 6th edition manual by providing examples and changes in referencing electronic type media. Writers should keep in mind that they should add as much electronic retrieval information as needed so that others can locate the source that has been cited. There are a couple of changes that should be noted: For journal articles, always include the journal issue number (if available) along with the volume number, regardless of whether the journal is paginated separately by issue or continuously by volume (p. 2). No retrieval date is necessary for information that is not going to be changed or updated in the future, such as a journal article or book (p. 2). When a DOI (Digital Object Identifier) is available, include the DOI instead of the URL in the reference. (A DOI is a unique alphanumeric string assigned by a registration agency to identify content and provide a persistent link to its location on the Internet (p. 3). **For Friends University students enrolled in any general education English courses (Composition 1, Composition 2, or Masterpieces 1 or 2) should include: the retrieval date that the information was accessed unless it is journal article or an electronic book. the name of the database used, but it is not necessary to include the database URL. 1. Journal article with DOI assigned: Johnson, S. (2009). Integrating therapy modes in the treatment of head traumas.
    [Show full text]