Practical Chemoinformatics Muthukumarasamy Karthikeyan • Renu Vyas

Muthukumarasamy Karthikeyan
Digital Information Resource Centre
National Chemical Laboratory
Pune, India

Renu Vyas
Scientist (DST)
Division of Chemical Engineering and Process Development
National Chemical Laboratory
Pune, India

ISBN 978-81-322-1779-4
ISBN 978-81-322-1780-0 (eBook)
DOI 10.1007/978-81-322-1780-0
Springer New Delhi Dordrecht Heidelberg London New York
Library of Congress Control Number: 2014931501

© Springer India 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Dedicated to our respected parents and loving children

Foreword

The term “cheminformatics” was coined only in 1998; nevertheless, over the last 15+ years the field has grown rapidly in the number of publications, conferences, and specialized journals, and in the diversity of its research. The editorial published in the inaugural issue of the Journal of Cheminformatics in January 2009 outlined major challenges facing cheminformatics, such as “overcoming stalled drug discovery … advancing green chemistry … understanding life from chemical prospective, and … enabling the network of the world’s chemical and biological information to be accessible and interpretable”. This visionary editorial emphasized that, despite the breadth and complexity of these problems, cheminformatics embodies the concepts and tools needed to tackle them effectively. Addressing the challenges facing cheminformatics is exciting, but it requires a deep understanding of cheminformatics theory as well as practical knowledge of the many important cheminformatics tools created by specialists working in the field. Practical Chemoinformatics by Karthikeyan and Vyas serves the critical purpose of bringing cheminformatics education and tools to researchers at all levels, from undergraduate students to specialists. The book comprises ten excellently written chapters that cover cheminformatics methods and applications from A to Z.
Not only do the authors provide a critical summary of major cheminformatics concepts but, most importantly, they incorporate many case studies illustrating how typical research problems can be addressed and solved using proprietary as well as open-source databases and computational tools. I am confident that the book will be of interest to all scientists working in chemical biology and drug discovery, but it will be particularly valuable for beginners and for undergraduate, graduate, and postgraduate students specializing in chemistry, biology, and allied sciences.

Alexander Tropsha, PhD
UNC Eshelman School of Pharmacy
University of North Carolina at Chapel Hill, USA

Preface

Chemoinformatics is a key technology for today’s synthetic and medicinal chemists. People with extensive knowledge of both chemistry and computing are in great demand in industry, and database producers, chemical software developers, and chemical publishers offer attractive opportunities to chemoinformaticians. This book is intended to be a practical guide to chemoinformatics for students at the graduate, postgraduate, and Ph.D. levels. There are a couple of books on the theory of chemoinformatics, and plenty of scattered information is available on the web, but a well-structured do-it-yourself book is urgently needed. The idea is that a reader from any background should be enthused to follow the book and start using the computer, and that a computer enthusiast can start learning the basics of computational chemistry. With this objective in mind, numerous step-by-step practice tutorials, source code snippets, and do-it-yourself exercises are provided for a quick grasp of the subject. The book aims to put students in the driver’s seat to test-drive the software, code snippets, and practice tutorials. Rules of thumb are given at the end of every chapter for specific practical guidance.
The language has been kept intentionally simple, and technical jargon, wherever used, has been thoroughly explained. An adequate bibliography is provided for readers seeking advanced knowledge on any of the topics. The chapters are linked to each other and yet can be read independently.

The book begins with an elementary chapter on how to read and write molecules in a computer and perform basic file format conversions. The second chapter teaches how to compute properties of molecules and store them in a database. The third chapter delves into the use of computed property data to build models employing machine learning methods. The fourth and fifth chapters deal with protein active site prediction and docking studies, both of which are essential for any successful drug design experiment. The sixth and seventh chapters focus on reaction-based and NMR chemical shift-based fingerprints, respectively, and their use in virtual screening, an important component of chemoinformatics. The eighth chapter deals with text mining and its role in chemoinformatics methods for discovering a lead molecule. The ninth and tenth chapters are technology focused: they demonstrate ways to handle big data using today’s state-of-the-art workflows and portals deployed on distributed and cloud computing platforms, and they cover Android-based app development. To sum up, the purpose of this book is to demystify chemoinformatics, help readers master it through a practical approach, and make students aware of the latest developments in the field. After working through the entire book, the reader will be able to appreciate the power of chemoinformatics tools and apply them in practice.

Acknowledgments

The authors express their deep sense of gratitude and heartfelt thanks to all the contributors to this book, without whose help the book would not have seen the light of day.
First and foremost, thanks are due to the young, enthusiastic team of Deepak Pandit, Chinmai P., Monalisa M., Soumya, Surojit Sadhu, Yogesh Pandit, and Apurva for their tireless efforts in compiling data, checking code, and proofreading the chapters. We wish to thank senior scientists and mentors Dr. B.D. Kulkarni and Dr. S.S. Tambe for inspiring the chapter on machine learning and for special guidance on the section on genetic programming. The help of academicians Dr. Sankar and Dr. Agila with the reaction ontology discussion in the chapter on reaction fingerprints and modelling is gratefully acknowledged. Support from industry came from Mr. Sameer Choudhary and Ms. Sapna, CEO of Rasa Life Science Informatics, for workflow-related topics in chapters 5 and 9. We wish to thank Dr. S. Krishnan for nurturing and guiding the growth of chemoinformatics at NCL. Sincere thanks are due to former NCL directors Dr. R.A. Mashelkar, Dr. Paul Ratnasamy, and Dr. S. Sivaram, and to the present director Dr. Sourav Pal, for being a source of inspiration and constant encouragement. We also wish to express our gratitude to all our chemoinformatics mentors, collaborators, and colleagues whose valuable interactions have helped our career development, to name a few: Dr. J. Gasteiger, Prof. Alex Tropsha, Dr. Janet Ash, Dr. Wendy Warr, Dr. Peter Murray-Rust, Dr. Peter Ertl, Dr. Andreas Bender, Dr. Robert Glen, Dr. Christoph Steinbeck, Prof. Igor Tetko, and Dr. Jonathan Goodman. Finally, we thank the publisher, Springer, for bringing out the book on time.

Contents

1 Open-Source Tools, Techniques, and Data in Chemoinformatics
1.1 Chemoinformatics
1.1.1 Open-Source Tools
1.1.2 Introduction to Programming Languages
1.2 Chemical Structure Representation
