Information Retrieval and Text Mining Technologies for Chemistry

Total Page:16

File Type:pdf, Size:1020Kb

Information Retrieval and Text Mining Technologies for Chemistry Review pubs.acs.org/CR Information Retrieval and Text Mining Technologies for Chemistry † ○ ‡ ○ § ∥ ⊥ ‡ Martin Krallinger, , Obdulia Rabal, , Analiá Lourenco,̧ , , Julen Oyarzabal,*, # ∇ ■ and Alfonso Valencia*, , , † Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernandeź Almagro 3, Madrid E-28029, Spain ‡ Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain § ESEI - Department of Computer Science, University of Vigo, Edificio Politecnico,́ Campus Universitario As Lagoas s/n, Ourense E-32004, Spain ∥ Centro de Investigaciones Biomedicaś (Centro Singular de Investigacioń de Galicia), Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain ⊥ CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal # Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona, 29-31, Barcelona E-08034, Spain ∇ Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona, C/ Baldiri Reixac 10, Barcelona E-08028, Spain ■ InstitucióCatalana de Recerca i Estudis Avancatş (ICREA), Passeig de Lluís Companys 23, Barcelona E-08010, Spain ABSTRACT: Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field. CONTENTS 2.2.2. Character Encoding J 2.2.3. Document Segmentation J 1. Introduction B 2.2.4. Sentence Splitting J 1.1. Chemical Information: What D 2.2.5. Tokenizers and Chemical Tokenizers K 1.2. Chemical Information: Where E 2.3. Document Indexing and Term Weighting L 1.2.1. Journal Literature and Conference Pa- 2.3.1. Term Weighting O pers, Reports, Dissertations, Books, and 2.4. Information Retrieval of Chemical Data P Others E 2.4.1. Boolean Search Queries Q 1.2.2. Patents F 2.4.2. Using Metadata for Searching and 1.2.3. Regulatory Reports and Competitive Indexed Entries Q Intelligence Tools G 2.4.3. Query Expansion Q 2. Chemical Information Retrieval H 2.4.4. Keyword Searching and Subject Search- 2.1. Unstructured Data Repositories and Charac- ing R teristics I 2.2. Preprocessing and Chemical Text Tokeniza- tion I Received: December 30, 2016 2.2.1. Document Transformation I © XXXX American Chemical Society A DOI: 10.1021/acs.chemrev.6b00851 Chem. Rev. XXXX, XXX, XXX−XXX Chemical Reviews Review 2.4.5. Vector Space Retrieval Model and 4.4. Chemical Representation AV Extended Boolean Model R 4.4.1. Line Notations AW 2.4.6. Proximity Searches R 4.4.2. Connection Tables AX 2.4.7. Wildcard Queries S 4.4.3. InChI and InChIKey AX 2.4.8. Autocompletion T 4.5. Chemical Normalization or Standardization AX 2.4.9. Spelling Corrections T 5. Chemical Knowledgebases AY 2.4.10. Phrase Searches, Exact Phrase Search- 5.1. Management of Chemical Data AY ing, or Quoted Search U 5.2. Chemical Cartridges AY 2.5. Supervised Document Categorization U 5.3. Structure-Based Chemical Searches AZ 2.5.1. Text Classification Overview U 6. Integration of Chemical and Biological Data BB 2.5.2. Text Classification Algorithms W 6.1. Biomedical Text Mining BD 2.5.3. Text Classification Challenges W 6.1.1. General and Background BD 2.5.4. Documents Clustering X 6.1.2. Evaluations BE 2.6. BioCreative Chemical Information Retrieval 6.2. Detection of Chemical and Biomedical Assessment Y Entity Relations BE 2.6.1. Evaluation Metrics Y 6.2.1. General and Background BE 2.6.2. Community Challenges for IR Z 6.2.2. Chemical−Protein Entity Relation Ex- 3. Chemical Entity Recognition and Extraction AB traction BG 3.1. Entity Definition AB 6.2.3. Chemical Entity−Disease Relation Ex- 3.2. Historical Work on Named Entities AC traction BI 3.3. Methods for Chemical Entity Recognition AC 7. Future Trends BL 3.3.1. General Factors Influencing NER and 7.1. Technological and Methodological Chal- CER AC lenges BM 3.3.2. Chemical and Drug Names AC Author Information BO 3.3.3. Challenges for CER AD Corresponding Authors BO 3.3.4. General Flowchart of CER AE ORCID BO 3.4. Dictionary Lookup of Chemical Names AE Author Contributions BO 3.4.1. Definition and Background AE Notes BO 3.4.2. Lexical Resources for Chemicals AF Biographies BO 3.4.3. Types of Matching Algorithms AG Acknowledgments BO 3.4.4. Problems with Dictionary-Based Meth- Abbreviations BO ods AG References BQ 3.5. Pattern and Rule-Based Chemical Entity Detection AG 3.6. Supervised Machine Learning Chemical 1. INTRODUCTION Recognition AI Because of the transition from printed hardcopy scientific fi 3.6.1. De nition, Types, and Background AI papers to digitized electronic publications, the increasing 3.6.2. Machine Learning Models AJ amount of published documents accessible through the Internet 3.6.3. Data Representation for Machine Learn- and their importance not only for commercial exploitation but ing AK also for academic research, data mining technologies, and 3.6.4. Feature Types, Representation, and search engines have impacted deeply most areas of scientific Selection AK research, communication, and discovery. The Internet has 3.6.5. Phases: Training and Test AL greatly influenced the publication environment,1 not only when 3.6.6. Shortcomings with ML-Based CER AL considering the access and distribution of published literature, fl 3.7. Hybrid Entity Recognition Work ows AL but also during the actual review and writing process. In this 3.8. Annotation Standards and Chemical Corpo- respect, chemistry is a pioneering domain of online information ra AL representation as machine-readable encoding systems for fi 3.8.1. De nition, Types, and Background AL chemical compounds date back to the line notation systems 3.8.2. Biology Corpora with Chemical Entities AM of the late 1940s. 3.8.3. Chemical Text Corpora AN Currently, efficient access to chemical information (section 3.8.4. CHEMDNER Corpus and CHEMDNER 1.1) contained in scientific articles, patents, legacy reports, or Patents Corpus AN the Web is a pressing need shared by researchers and patent 3.9. BioCreative Chemical Entity Recognition attorneys from different chemical disciplines. From a Evaluation AO researcher’s viewpoint, chemists aim to locate documents that 3.9.1. Background AO describe particular aspects of a compound of interest (e.g., 3.9.2. CHEMDNER AP synthesis, physicochemical properties, biological activity, 4. Linking Documents to Structures AR industrial application, crystalline status, safety, and toxicology) 4.1. Name-to-Structure Conversion AR or a particular chemical reaction among the growing collection 4.2. Chemical Entity Grounding AS of published papers and patents. In fact, chemists are among fi 4.2.1. De nition, Types, and Background AS the researchers that spend more time reading articles, and after 4.2.2. Grounding of Other Entities AS medical researchers are the ones that overall read more articles 4.2.3. Grounding of Chemical Entities AT per person.2 For example, there are approximately 10 000 4.3. Optical Compound Recognition AT journals publishing “chemistry” articles.3 Every year, over B DOI: 10.1021/acs.chemrev.6b00851 Chem. Rev. XXXX, XXX, XXX−XXX Chemical Reviews Review 20 000 new compounds are published in medicinal and to items of structured data repositories (i.e., defined database biological chemistry journals.4 Together with journals, patents fields such as already indexed chemicals) or document metadata constitute a valuable information source for chemical (data about the document itself). Metadata attributes of a compounds and reactions:5 the Chemical Abstracts Service document (or data in general) do usually provide some (CAS) states that 77% of new chemical compounds added to descriptive information associated to the document, such as the CAS Registry are disclosed first in patent applications,6 and author, publication dates, author names keywords, or journal the percentage of new compounds added to the CAS information, which can be also exploited for retrieval purposes. REGISTRY database from patents raised from 14% in 1976 Metadata attributes are usually much more structured than the to 46% in 2010, with 576 new compounds being added to the actual document
Recommended publications
  • Serum Malondialdehyde Is Associated with Non-Alcoholic Fatty Liver and Related Liver Damage Differentially in Men and Women
    antioxidants Article Serum Malondialdehyde is Associated with Non-Alcoholic Fatty Liver and Related Liver Damage Differentially in Men and Women Shira Zelber-Sagi 1,2,*, Dana Ivancovsky-Wajcman 1, Naomi Fliss-Isakov 2,3, Michal Hahn 4, 2,3 2,3 2,3, 4, Muriel Webb , Oren Shibolet , Revital Kariv y and Oren Tirosh y 1 School of Public Health, University of Haifa, Haifa 3498838, Israel; [email protected] 2 Department of Gastroenterology, Tel Aviv Medical Center, Tel Aviv 6423914, Israel; naomifl@tlvmc.gov.il (N.F.-I.); [email protected] (M.W.); [email protected] (O.S.); [email protected] (R.K.) 3 Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 6997801, Israel 4 Institute of Biochemistry, Food Science and Nutrition, The RH Smit Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rechovot 76100001, Israel; [email protected] (M.H.); [email protected] (O.T.) * Correspondence: [email protected]; Tel.: +972-3-6973984 The last two authors equally contributed to the paper. y Received: 25 May 2020; Accepted: 25 June 2020; Published: 2 July 2020 Abstract: Background: Non-alcoholic fatty liver disease (NAFLD) and steatohepatitis (NASH) are associated with increased oxidative stress and lipid peroxidation, but large studies are lacking. The aim was to test the association of malondialdehyde (MDA), as a marker of oxidative damage of lipids, with NAFLD and liver damage markers, and to test the association between dietary vitamins E and C intake and MDA levels. Methods: A cross-sectional study was carried out among subjects who underwent blood tests including FibroMax for non-invasive assessment of NASH and fibrosis.
    [Show full text]
  • By in Vivo's Biopharma, Medtech and Diagnostics Teams
    invivo.pharmaintelligence.informa.com JANUARY 2018 Invol. 36 ❚ no. 01 Vivopharma intelligence ❚ informa 2018 OUTLOOK By In Vivo’s Biopharma, Medtech and Diagnostics Teams PAGE LEFT BLANK INTENTIONALLY invivo.pharmaintelligence.informa.com STRATEGIC INSIGHTS FOR LIFE SCIENCES DECISION-MAKERS CONTENTS ❚ In Vivo Pharma intelligence | January 2018 BIOPHARMA MEDTECH 2018 DIAGNOSTICS OUTLOOK 12 22 28 Biopharma 2018: Medtech 2018: Diagnostics 2018: Is There Still A Place For Pharma The Place For Innovation Steady Progress And In The New Health Care As Value-based Health Care The Big Get Bigger Economy? Gains Momentum MARK RATNER WILLIAM LOONEY ASHLEY YEO If the beginning of 2017 was marked 2018 will be a time of transition in health 2017 was a watershed year in many by doubts around whether and how care, when biopharma’s counterparts respects, politically, economically the FDA would act with respect to in adjacent industry segments scale up and commercially for many players complex diagnostics, we enter 2018 in a radical redesign of their traditional in the medtech field. Where will the feeling that slow-moving vessel may business models. Biopharma is not opportunities lie in 2018? Will finally be turning. moving as quickly, and it confronts a breakthrough medtech innovation still strategic dilemma on how to address the have a place among providers often prospect of a much more powerful set of riding on fumes when it comes to 36 rivals in the ongoing battle to own the budgets, and is it all as bad as some patient experience in medicine. would make out? Thirty-five Years Covering Health Care: The More Things Change… 30 PETER CHARLISH A Virtuous Cycle: What The The health care industry has come a Immuno-Oncology Revolution long way in the past 35 years, although Means For Other Disease Areas in some areas very little has changed.
    [Show full text]
  • Oxisresearch™ a Division of OXIS Health Products, Inc
    OxisResearch™ A Division of OXIS Health Products, Inc. BIOXYTECHÒ MDA-586™ Spectrophotometric Assay for Malondialdehyde For Research Use Only. Not For Use In Diagnostic Procedures. Catalog Number 21044 INTRODUCTION The Analyte Lipid peroxidation is a well-established mechanism of cellular injury in both plants and animals, and is used as an indicator of oxidative stress in cells and tissues. Lipid peroxides, derived from polyunsaturated fatty acids, are unstable and decompose to form a complex series of compounds. These include reactive carbonyl compounds, of which the most abundant is malondialdehyde (MDA). Therefore, measurement of malondialdehyde is widely used as an indicator of lipid peroxidation (1). Increased levels of lipid peroxidation products have been associated with a variety of chronic diseases in both humans (2, 3) and model systems (4, 5). MDA reacts readily with amino groups on proteins and other biomolecules to form a variety of adducts (1), including cross-linked products (6). MDA also forms adducts with DNA bases that are mutagenic (7, 8) and possibly carcinogenic (9). DNA-protein cross-links are another result of the reaction between DNA and MDA (10). The TBARS method is commonly used to measure MDA in biological samples (11). However, this reaction is relatively nonspecific; both free and protein-bound MDA can react. The MDA-586 method is designed to assay free MDA or, after a hydrolysis step, total MDA (i.e., free and protein-bound Schiff base conjugates). The assay conditions serve to minimize interference from other lipid peroxidation products, such as 4- hydroxyalkenals. PRINCIPLES OF THE PROCEDURE The MDA-586 method1 (12) is based on the reaction of a chromogenic reagent, N-methyl-2- phenylindole (R1, NMPI), with MDA at 45°C.
    [Show full text]
  • Big-Data Science in Porous Materials: Materials Genomics and Machine Learning
    Lawrence Berkeley National Laboratory Recent Work Title Big-Data Science in Porous Materials: Materials Genomics and Machine Learning. Permalink https://escholarship.org/uc/item/3ss713pj Journal Chemical reviews, 120(16) ISSN 0009-2665 Authors Jablonka, Kevin Maik Ongari, Daniele Moosavi, Seyed Mohamad et al. Publication Date 2020-08-01 DOI 10.1021/acs.chemrev.0c00004 Peer reviewed eScholarship.org Powered by the California Digital Library University of California This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes. pubs.acs.org/CR Review Big-Data Science in Porous Materials: Materials Genomics and Machine Learning Kevin Maik Jablonka, Daniele Ongari, Seyed Mohamad Moosavi, and Berend Smit* Cite This: Chem. Rev. 2020, 120, 8066−8129 Read Online ACCESS Metrics & More Article Recommendations ABSTRACT: By combining metal nodes with organic linkers we can potentially synthesize millions of possible metal−organic frameworks (MOFs). The fact that we have so many materials opens many exciting avenues but also create new challenges. We simply have too many materials to be processed using conventional, brute force, methods. In this review, we show that having so many materials allows us to use big-data methods as a powerful technique to study these materials and to discover complex correlations. The first part of the review gives an introduction to the principles of big-data science. We show how to select appropriate training sets, survey approaches that are used to represent these materials in feature space, and review different learning architectures, as well as evaluation and interpretation strategies.
    [Show full text]
  • Neuroprotective Effects of Geniposide from Alzheimer's Disease Pathology
    Neuroprotective effects of geniposide from Alzheimer’s disease pathology WeiZhen Liu1, Guanglai Li2, Christian Hölscher2,3, Lin Li1 1. Key Laboratory of Cellular Physiology, Shanxi Medical University, Taiyuan, PR China 2. Second hospital, Shanxi medical University, Taiyuan, PR China 3. Neuroscience research group, Faculty of Health and Medicine, Lancaster University, Lancaster LA1 4YQ, UK running title: Neuroprotective effects of geniposide corresponding author: Prof. Lin Li Key Laboratory of Cellular Physiology, Shanxi Medical University, Taiyuan, PR China Email: [email protected] Neuroprotective effects of geniposide Abstract A growing body of evidence have linked two of the most common aged-related diseases, type 2 diabetes mellitus (T2DM) and Alzheimer disease (AD). It has led to the notion that drugs developed for the treatment of T2DM may be beneficial in modifying the pathophysiology of AD. As a receptor agonist of glucagon- like peptide (GLP-1R) which is a newer drug class to treat T2DM, Geniposide shows clear effects in inhibiting pathological processes underlying AD, such as and promoting neurite outgrowth. In the present article, we review possible molecular mechanisms of geniposide to protect the brain from pathologic damages underlying AD: reducing amyloid plaques, inhibiting tau phosphorylation, preventing memory impairment and loss of synapses, reducing oxidative stress and the chronic inflammatory response, and promoting neurite outgrowth via the GLP-1R signaling pathway. In summary, the Chinese herb geniposide shows great promise as a novel treatment for AD. Key words: Alzheimer’s disease, geniposide, amyloid-β, neurofibrillary tangles, oxidative stress, inflammatation, type 2 diabetes mellitus, glucagon like peptide receptor, neuroprotection, tau protein Neuroprotective effects of geniposide 1.
    [Show full text]
  • Downloaded in MS Word Format (.Docx)
    Wang and Tseng J Cheminform (2019) 11:78 https://doi.org/10.1186/s13321-019-0401-4 Journal of Cheminformatics SOFTWARE Open Access IntelliPatent: a web-based intelligent system for fast chemical patent claim drafting Pei‑Hua Wang1 and Yufeng Jane Tseng1,2* Abstract The frst step of automating composition patent drafting is to draft the claims around a Markush structure with sub‑ stituents. Currently, this process depends heavily on experienced attorneys or patent agents, and few tools are avail‑ able. IntelliPatent was created to accelerate this process. Users can simply upload a series of analogs of interest, and IntelliPatent will automatically extract the general structural scafold and generate the patent claim text. The program can also extend the patent claim by adding commonly seen R groups from historical lists of the top 30 selling drugs in the US for all R substituents. The program takes MDL SD fle formats as inputs, and the invariable core structure and variable substructures will be identifed as the initial scafold and R groups in the output Markush structure. The results can be downloaded in MS Word format (.docx). The suggested claims can be quickly generated with IntelliPatent. This web‑based tool is freely accessible at https ://intel lipat ent.cmdm.tw/. Keywords: Web server, Markush structure, Pharmaceutical patent Introduction of visualizing and analyzing Markush structures from a Claims are the most important sections in composition composition patent [7]. Its web based application, iMa- patents [1]. In the pharmaceutical industry, claims should rVis, revised the underlying R group numbering system include key compounds and all structural derivatives that to deal with nested R group presentation [8].
    [Show full text]
  • Unstructured Data Is a Risky Business
    R&D Solutions for OIL & GAS EXPLORATION AND PRODUCTION Unstructured Data is a Risky Business Summary Much of the information being stored by oil and gas companies—technical reports, scientific articles, well reports, etc.,—is unstructured. This results in critical information being lost and E&P teams putting themselves at risk because they “don’t know what they know”? Companies that manage their unstructured data are better positioned for making better decisions, reducing risk and boosting bottom lines. Most users either don’t know what data they have or simply cannot find it. And the oil and gas industry is no different. It is estimated that 80% of existing unstructured information. Unfortunately, data is unstructured, meaning that the it is this unstructured information, often majority of the data that we are gen- in the form of internal and external PDFs, erating and storing is unusable. And PowerPoint presentations, technical while this is understandable considering reports, and scientific articles and publica- 18 managing unstructured data takes time, tions, that contains clues and answers money, effort and expertise, it results in regarding why and how certain interpre- 3 x 10 companies wasting money by making tations and decisions are made. ill-informed investment decisions. If oil A case in point of the financial risks of bytes companies want to mitigate risk, improve not managing unstructured data comes success and recovery rates, they need to from a customer that drilled a ‘dry hole,’ be better at managing their unstructured costing the company a total of $20 million data. In this paper, Phoebe McMellon, dollars—only to realize several years later Elsevier’s Director of Oil and Gas Strategy that they had drilled a ‘dry hole’ ten years & Segment Marketing, discusses how big Every day we create approximately prior two miles away.
    [Show full text]
  • The Impact of Academic Patenting on the Rate, Quality, and Direction of (Public) Research Output
    Methodology Data & Measurement Results Summary The Impact of Academic Patenting on the Rate, Quality, and Direction of (Public) Research Output Pierre Azoulay1 Waverly Ding2 Toby Stuart3 1Sloan School of Management MIT & NBER [email protected] 2Haas School of Business University of California — Berkeley 3Graduate School of Business Harvard University January 23, 2007 — Northwestern University Azoulay, Ding, Stuart The Impact of Academic Patenting Methodology Data & Measurement Results Summary Outline 1 Motivation(s) 2 Methodology Problems with existing approaches Selection on observables with staggered treatment decisions Implementing IPTCW estimation 3 Data & Measurement Data sources Measuring “patentability” Descriptive statistics 4 Results The determinants of selection into patenting The impact of academic patenting on the rate of publications The impact of academic patenting on the quality of publications The impact of academic patenting on the content of publications 5 Caveats, Summary & Future Directions Azoulay, Ding, Stuart The Impact of Academic Patenting Methodology Data & Measurement Results Summary Outline 1 Motivation(s) 2 Methodology Problems with existing approaches Selection on observables with staggered treatment decisions Implementing IPTCW estimation 3 Data & Measurement Data sources Measuring “patentability” Descriptive statistics 4 Results The determinants of selection into patenting The impact of academic patenting on the rate of publications The impact of academic patenting on the quality of publications The impact
    [Show full text]
  • 1 Application of Text Mining to Biomedical Knowledge Extraction: Analyzing Clinical Narratives and Medical Literature
    Amy Neustein, S. Sagar Imambi, Mário Rodrigues, António Teixeira and Liliana Ferreira 1 Application of text mining to biomedical knowledge extraction: analyzing clinical narratives and medical literature Abstract: One of the tools that can aid researchers and clinicians in coping with the surfeit of biomedical information is text mining. In this chapter, we explore how text mining is used to perform biomedical knowledge extraction. By describing its main phases, we show how text mining can be used to obtain relevant information from vast online databases of health science literature and patients’ electronic health records. In so doing, we describe the workings of the four phases of biomedical knowledge extraction using text mining (text gathering, text preprocessing, text analysis, and presentation) entailed in retrieval of the sought information with a high accuracy rate. The chapter also includes an in depth analysis of the differences between clinical text found in electronic health records and biomedical text found in online journals, books, and conference papers, as well as a presentation of various text mining tools that have been developed in both university and commercial settings. 1.1 Introduction The corpus of biomedical information is growing very rapidly. New and useful results appear every day in research publications, from journal articles to book chapters to workshop and conference proceedings. Many of these publications are available online through journal citation databases such as Medline – a subset of the PubMed interface that enables access to Medline publications – which is among the largest and most well-known online databases for indexing profes- sional literature. Such databases and their associated search engines contain important research work in the biological and medical domain, including recent findings pertaining to diseases, symptoms, and medications.
    [Show full text]
  • Enriching Chemistry Database Content with Human-Like Indexing of Documents
    WHITE PAPER Enriching Chemistry Database Content with Human-Like Indexing Of Documents APPLYING THE POWER OF A TAXONOMY WITH 450 MILLION TERMS Taxonomies are used to index content in databases, making facts and literature discoverable. The size of some taxonomies can make this task daunting. A comprehensive taxonomy to describe chemistry uses roughly 450 million terms, making it ~450 times the size of the English language and ~150 times the size of the next nearest discipline-specific taxonomy. To ensure that Reaxys® fully facilitates discoverability in the far-reaching and complex discipline of chemistry, two processes occur alongside each other: the familiar manual indexing and excerption method; and a novel automatic but human-like indexing process. How can the content in a large and diverse database be made highly discoverable? The English language has over 1 million words. How many do you think the average person can recognize? The online survey TestYourVocab.com enables visitors to test the size of their English vocabulary. The results reveal interesting trends. Data on roughly 450,000 native English speakers shows that most adult survey participants have a vocabulary size of 20,000 to 35,000 words. These are words that they can recognize and define. Topping the charts were individuals who came near 40,000 words. While these figures do not represent the average English-speaking population—it takes a certain type of person to respond to such surveys—and may be biased by the data collection methodology, the survey results hint at an upper limit to the word and concept retention capacity of the human brain.
    [Show full text]
  • Botanica Marina 2017 · FEBRUARY · Volume 60 · Number 1 · PAGES 1–84
    Botanica MARINA 2017 · FEBRUARY · VOLUME 60 · NUMBER 1 · PAGES 1–84 EDITOR-IN-CHIEF Matthew J. Dring, Belfast/Portaferry, Northern Ireland (–2020) ASSOCIATE EDITORS Kirsten Heimann, Townsville, Australia (–2020) Ka-Lai Pang, Keelung, Taiwan (–2020) Georg Pohnert, Jena, Germany (–2017) Michel Poulin, Ottawa, Canada (–2020) EDITORIAL BOARD Charles D. Amsler, Birmingham, USA (–2020) Inka Bartsch, Bremerhaven, Germany (−2018) John Beardall, Clayton, Australia (–2017) John A. Berges, Milwaukee, USA (–2020) Sung Min Boo, Daejeon, Korea (–2018) Olivier De Clerck, Ghent, Belgium (–2018) Jeffrey L. Gaeckle, Olympia, USA (−2018) Mona Hoppenrath, Wilhelmshaven, Germany (–2018) Catriona Hurd, Hobart, Australia (−2018) Gavin W. Maneveldt, Bellville, South Africa (–2018) Michael J. Wynne, Ann Arbor, USA (–2020) Abstracted/INDEXED IN AGRICOLA (National Agricultural Library); Baidu Scholar; Cabell’s Directory; CABI (over 50 subsections); Celdes; Chemical Abstracts Service (CAS) - CAplus; Chemical Abstracts Service (CAS) - SciFinder; CNKI Scholar (China National Knowledge Infrastructure); CNPIEC; EBSCO (relevant databases); EBSCO Discovery Service; Elsevier - BIOBASE/CABS (Current Awareness in Biological Sciences); Elsevier - Fluidex; Elsevier - Geobase; Elsevier - Reaxys; Elsevier - SCOPUS; Gale/Cengage; Genamics JournalSeek; Google Scholar; J-Gate; JournalTOCs; KESLI-NDSL (Korean National Discovery for Science Leaders); Naviga (Softweco); NISC SA: Fish and Fisheries Worldwide; NISC SA: Water Resources Worldwide; Primo Central (ExLibris); ProQuest (relevant
    [Show full text]
  • Construing Patentability of Chemical Technology Inventions, with Focus
    Construing Patentability of Chemical Technology Inventions, with Focus on their Patent Eligibility and Industrial Applicability: A Comparison on the Patent Examination Approach in the Philippines and in Japan By Anthea Kristine Y. Paculan Intellectual Property Office of the Philippines Supervised by Yorimasa Suwa, PhD., MBA Senior Researcher, Asia-Pacific Industrial Property Center, Japan Institution of Promoting Invention and Innovation Advised by Dr. Kazukiyo Nagai Professor, Department of Applied Chemistry, School of Science and Technology, Meiji University Tokiko Mizuochi, Ph.D Project Assistant Professor, Keio University Office for Open Innovation Final Report In fulfillment of the Study Cum Research fellowship program Sponsored by Japan Patent Office 27 June – 25 October 2019 The views and findings in this report are those of the author and do not necessarily reflect the views and policy of the organization or sponsor of this study. i ○c JPO 2020 Abstract The Intellectual Property Office of the Philippines (IPOPHL) must cope up with the emerging challenges in the patent examination of various fields of technology. The Chemical Technology field at IPOPHL covers a wide array of chemical-related subject-matters which in turn has resulted in handling concerns to the examiners assigned to perform substantive examination on such diverse technologies. Japan Patent Office (JPO) has provided comprehensive guidelines addressing various patentability issues, especially that of patent eligibility and industrial applicability of subject matters in the chemical field. By introducing conceptual aspects of the Japanese patent system as a model, this study allowed the investigation of the similarities and differences in JPO’s and IPOPHL’s examination procedure and assessment of patentability requirements, with focus on patent eligibility and industrial applicability of chemical technology inventions.
    [Show full text]