Introduction to Bioinformatics (Elective) – SBB1609

Total Page:16

File Type:pdf, Size:1020Kb

Introduction to Bioinformatics (Elective) – SBB1609 SCHOOL OF BIO AND CHEMICAL ENGINEERING DEPARTMENT OF BIOTECHNOLOGY Unit 1 – Introduction to Bioinformatics (Elective) – SBB1609 1 I HISTORY OF BIOINFORMATICS Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biologicaldata. As an interdisciplinary field of science, bioinformatics combines computer science, statistics, mathematics, and engineering to analyze and interpret biological data. Bioinformatics has been used for in silico analyses of biological queries using mathematical and statistical techniques. Bioinformatics derives knowledge from computer analysis of biological data. These can consist of the information stored in the genetic code, but also experimental results from various sources, patient statistics, and scientific literature. Research in bioinformatics includes method development for storage, retrieval, and analysis of the data. Bioinformatics is a rapidly developing branch of biology and is highly interdisciplinary, using techniques and concepts from informatics, statistics, mathematics, chemistry, biochemistry, physics, and linguistics. It has many practical applications in different areas of biology and medicine. Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems. "Classical" bioinformatics: "The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information.” The National Center for Biotechnology Information (NCBI 2001) defines bioinformatics as: "Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three important sub- disciplines within bioinformatics: the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information 2 Even though the three terms: bioinformatics, computational biology and bioinformation infrastructure are often times used interchangeably, broadly, the three may be defined as follows: 1. bioinformatics refers to database-like activities, involving persistent sets of data that are maintained in a consistent state over essentially indefinite periods of time; 2. computational biology encompasses the use of algorithmic tools to facilitate biological analyses; while 3. bioinformation infrastructure comprises the entire collective of information management systems, analysis tools and communication networks supporting biology. Thus, the latter may be viewed as a computational scaffold of the former two. There are three important sub-disciplines within bioinformatics: • the development of new algorithms and statistics with which to assess relationships among members of large data sets; • the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; • and the development and implementation of tools that enable efficient access and management of different types of information Bioinformatics definition - other sources • Bioinformatics or computational biology is the use of mathematical and informational techniques, including statistics, to solve biological problems, usually by creating or using computer programs, mathematical models or both. One of the main areas of bioinformatics is the data mining and analysis of the data gathered by the various genome projects. Other areas are sequence alignment, protein structure prediction, systems biology, protein-protein interactions and virtual evolution. (source: www.answers.com) • Bioinformatics is the science of developing computer databases and algorithms for the purpose of speeding up and enhancing biological research. (source: www.whatis.com) • "Biologists using computers, or the other way around. Bioinformatics is more of a tool 3 than a discipline.(source: An Understandable Definition of Bioinformatics , The O'Reilly Bioinformatics Technology Conference, 2003) (4) • The application of computer technology to the management of biological information. Specifically, it is the science of developing computer databases and algorithms to facilitate and expedite biological research.(source: Webopedia) • Bioinformatics: a combination of Computer Science, Information Technology and Genetics to determine and analyze genetic information. (Definition from BitsJournal.com) • Bioinformatics is the application of computer technology to the management and analysis of biological data. The result is that computers are being used to gather, store, analyse and merge biological data.(EBI - 2can resource) • Bioinformatics is concerned with the creation and development of advanced information and computational technologies to solve problems in biology. • Bioinformatics uses techniques from informatics, statistics, molecular biology and high-performance computing to obtain information about genomic or protein sequence data. Bioinformaticist versus a Bioinformatician A bioinformaticist is an expert who not only knows how to use bioinformatics tools, but also knows how to write interfaces for effective use of the tools. A bioinformatician , on the other hand, is a trained individual who only knows to use bioinformatics tools without a deeper understanding. Aims of Bioinformatics In general, the aims of bioinformatics are three-fold. 1. The first aim of bioinformatics is to store the biological data organized in form of a database. This allows the researchers an easy access to existing information and submit new entries. These data must be annoted to give a suitable meaning or to assign its functional characteristics. The databases must also be able to correlate between different hierarchies of information. For example: GenBank for nucleotide and protein sequence information, Protein Data Bank for 3D macromolecular structures, etc. 4 2. The second aim is to develop tools and resources that aid in the analysis of data. For example: BLAST to find out similar nucleotide/amino-acid sequences, ClustalW to align two or more nucleotide/amino-acid sequences, Primer3 to design primers probes for PCR techniques, etc. 3. The third and the most important aim of bioinformatics is to exploit these computational tools to analyze the biological data interpret the results in a biologically meaningful manner. Goals The goals of bioinformatics thus is to provide scientists with a means to explain 1. Normal biological processes 2. Malfunctions in these processes which lead to diseases 3. Approaches to improving drug discovery To study how normal cellular activities are altered in different disease states, the biological data must be combined to form a comprehensive picture of these activities. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data. This includes nucleotide and amino acid sequences, protein domains, and protein structures. The actual process of analyzing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include: • Development and implementation of computer programs that enable efficient access to, use and management of, various types of information • Development of new algorithms (mathematical formulas) and statistical measures that assess relationships among members of large data sets. For example, there are methods to locate a gene within a sequence, to predict protein structure and/or function, and to cluster protein sequences into families of related sequences. The primary goal of bioinformatics is to increase the understanding of biological processes. What sets it apart from other approaches, however, is its focus on developing and applying 5 computationally intensive techniques to achieve this goal. Examples include: pattern recognition, data mining, machine learning algorithms, and visualization. Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein–protein interactions, genome-wide association studies, the modeling of evolution and cell division/mitosis. Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data. Tools: Used in three areas • Molecular Sequence Analysis • Molecular Structural Analysis • Molecular Functional Analysis Over the past few decades, rapid developments in genomic and other molecular research technologies and developments in information technologies have combined to produce a tremendous amount of information related to molecular biology. Bioinformatics is the name given to these mathematical
Recommended publications
  • Olaseni Sode
    Olaseni Sode Work Address: 5735 S. Ellis Ave. [email protected] Searle 231 +1 (314) 856-2373 The University of Chicago Chicago, IL 60615 Home Address: 1119 E Hyde Park Blvd [email protected] Apt. 3E +1 (314) 856-2373 Chicago, IL 60615 http://www.olaseniosode.com Education University of Illinois at Urbana-Champaign Urbana, IL Ph.D. in Chemistry September 2012 – Supervisor: Professor So Hirata – Thesis title: A Theoretical Study of Molecular Crystals – Transferred from the University of Florida in 2010 with Prof. Hirata Morehouse College Atlanta, GA B.S. & B.A. in Chemistry and French December 2006 – Studied at the Université de Nantes, Nantes, France (2004-2005) Research Experience The University of Chicago Chicago, IL Postdoctoral Research – research advisor: Dr. Gregory A. Voth 2012–present – Studied proposed proton transport minimum free-energy pathways in [FeFe]-hydrogenase via molecular dynamics simulations and the multistate empirical valence bond technique. – Collaborated to develop the multiscale string method to accelerate refinement of chemical reaction pathways of adenosine triphosphate (ATP) hydrolysis in globular and filamentous actin proteins. University of California, San Diego La Jolla, CA Visiting Scholar – research advisor: Dr. Francesco Paesani 2014, 2015 – Developed and implemented a reactive molecular dynamics scheme for condensed phase systems of the hydrated hydronium ions. University of Illinois at Urbana-Champaign Urbana, IL Doctoral Research – research advisor: Dr. So Hirata 2010-2012 – Characterized the vibrational structure of extended systems including infrared and Raman-active vibrations, as well as phonon dispersions and densities of states in the first Brillouin zone. – Developed zero-point vibrational energy analysis scheme for molecular clusters using a fragmented, local basis scheme.
    [Show full text]
  • Pysimm: a Python Package for Simulation of Molecular Systems
    SoftwareX 6 (2016) 7–12 Contents lists available at ScienceDirect SoftwareX journal homepage: www.elsevier.com/locate/softx pysimm: A python package for simulation of molecular systems Michael E. Fortunato a, Coray M. Colina a,b,∗ a Department of Chemistry, University of Florida, Gainesville, FL 32611, United States b Department of Materials Science and Engineering and Nuclear Engineering, University of Florida, Gainesville, FL 32611, United States article info a b s t r a c t Article history: In this work, we present pysimm, a python package designed to facilitate structure generation, simulation, Received 15 August 2016 and modification of molecular systems. pysimm provides a collection of simulation tools and smooth Received in revised form integration with highly optimized third party software. Abstraction layers enable a standardized 1 December 2016 methodology to assign various force field models to molecular systems and perform simple simulations. Accepted 5 December 2016 These features have allowed pysimm to aid the rapid development of new applications specifically in the area of amorphous polymer simulations. Keywords: ' 2016 The Authors. Published by Elsevier B.V. Amorphous polymers Molecular simulation This is an open access article under the CC BY license Python (http://creativecommons.org/licenses/by/4.0/). Code metadata Current code version v0.1 Permanent link to code/repository used for this code version https://github.com/ElsevierSoftwareX/SOFTX-D-16-00070 Legal Code License MIT Code versioning system used git Software code languages, tools, and services used python2.7 Compilation requirements, operating environments & Linux dependencies If available Link to developer documentation/manual http://pysimm.org/documentation/ Support email for questions [email protected] 1.
    [Show full text]
  • Quantitative Structure–Activity Relationships (Qsar) Study on Novel 4-Amidinoquinoline and 10- Amidinobenzonaphthyridine Derivatives As Potent Antimalaria Agent
    JCECThe Journal of Engineering and Exact Sciences – JCEC, Vol. 05 N. 03 (2019) journal homepage: https://periodicos.ufv.br/ojs/jcec doi: 10.18540/jcecvl5iss3pp0271-0282 OPEN ACCESS – ISSN: 2527-1075 QUANTITATIVE STRUCTURE–ACTIVITY RELATIONSHIPS (QSAR) STUDY ON NOVEL 4-AMIDINOQUINOLINE AND 10- AMIDINOBENZONAPHTHYRIDINE DERIVATIVES AS POTENT ANTIMALARIA AGENT A. W. MAHMUD1, G. A. SHALLANGWA1 and A. UZAIRU1 1Ahmadu Bello University, Department of Chemistry, Zaria, Kaduna, Nigeria. A R T I C L E I N F O A B S T R A C T Article history: Quantitative structure–activity relationships (QSAR) has been a reliable study in the Received 2019-02-16 development of models that predict biological activities of chemical substances based on Accepted 2019-06-27 Available online 2019-06-30 their structures for the development of novel chemical entities. This study was carried out on 44 compounds of 4-amidinoquinoline and 10-amidinobenzonaphthyridine derivatives to p a l a v r a s - c h a v e develop a model that relates their structures to their activities against Plasmodium QSAR falciparum. Density Functional Theory (DFT) with basis set B3LYP/6-31G was used to ∗ Antimalaria optimize the compounds. Genetic Function Algorithm (GFA) was employed in selecting Plasmodium falciparum descriptors and building the model. Four models were generated and the model with best 4-Amidinoquinolina internal and external validation has internal squared correlation coefficient ( 2) of 0.9288, k e y w o r d s adjusted squared correlation coefficient ( adj) of 0.9103, leave-one-out (LOO) cross- 2 2 QSAR validation coefficient ( cv) value of 0.8924 and external squared correlation coefficient ( ) Antimalaria value of 0.8188.
    [Show full text]
  • Open Data, Open Source, and Open Standards in Chemistry: the Blue Obelisk Five Years On" Journal of Cheminformatics Vol
    Oral Roberts University Digital Showcase College of Science and Engineering Faculty College of Science and Engineering Research and Scholarship 10-14-2011 Open Data, Open Source, and Open Standards in Chemistry: The lueB Obelisk five years on Andrew Lang Noel M. O'Boyle Rajarshi Guha National Institutes of Health Egon Willighagen Maastricht University Samuel Adams See next page for additional authors Follow this and additional works at: http://digitalshowcase.oru.edu/cose_pub Part of the Chemistry Commons Recommended Citation Andrew Lang, Noel M O'Boyle, Rajarshi Guha, Egon Willighagen, et al.. "Open Data, Open Source, and Open Standards in Chemistry: The Blue Obelisk five years on" Journal of Cheminformatics Vol. 3 Iss. 37 (2011) Available at: http://works.bepress.com/andrew-sid-lang/ 19/ This Article is brought to you for free and open access by the College of Science and Engineering at Digital Showcase. It has been accepted for inclusion in College of Science and Engineering Faculty Research and Scholarship by an authorized administrator of Digital Showcase. For more information, please contact [email protected]. Authors Andrew Lang, Noel M. O'Boyle, Rajarshi Guha, Egon Willighagen, Samuel Adams, Jonathan Alvarsson, Jean- Claude Bradley, Igor Filippov, Robert M. Hanson, Marcus D. Hanwell, Geoffrey R. Hutchison, Craig A. James, Nina Jeliazkova, Karol M. Langner, David C. Lonie, Daniel M. Lowe, Jerome Pansanel, Dmitry Pavlov, Ola Spjuth, Christoph Steinbeck, Adam L. Tenderholt, Kevin J. Theisen, and Peter Murray-Rust This article is available at Digital Showcase: http://digitalshowcase.oru.edu/cose_pub/34 Oral Roberts University From the SelectedWorks of Andrew Lang October 14, 2011 Open Data, Open Source, and Open Standards in Chemistry: The Blue Obelisk five years on Andrew Lang Noel M O'Boyle Rajarshi Guha, National Institutes of Health Egon Willighagen, Maastricht University Samuel Adams, et al.
    [Show full text]
  • Reversal of Multidrug Resistance in Mouse Lymphoma Cells by Extracts and Flavonoids from Pistacia Integerrima
    DOI:http://dx.doi.org/10.7314/APJCP.2016.17.1.51 Reversal of Multidrug Resistance in Mouse Lymphoma Cells by Extract and Flavonoids from Pistacia integerrima RESEARCH ARTICLE Reversal of Multidrug Resistance in Mouse Lymphoma Cells by Extracts and Flavonoids from Pistacia integerrima Abdur Rauf1*, Ghias Uddin2, Muslim Raza3, Bashir Ahmad4, Noor Jehan1, Bina S Siddiqui5, Joseph Molnar6, Akos Csonka6, Diana Szabo6 Abstract Phytochemical investigation of Pistacia integerrima has highlighted isolation of two known compounds naringenin (1) and dihydrokaempferol (2). A crude extract and these isolated compounds were here evaluated for their effects on reversion of multidrug resistance (MDR) mediated by P-glycoprotein (P-gp). The multidrug resistance P-glycoprotein is a target for chemotherapeutic drugs from cancer cells. In the present study rhodamine- 123 exclusion screening test on human mdr1 gene transfected mouse gene transfected L5178 and L5178Y mouse T-cell lymphoma cells showed excellent MDR reversing effects in a dose dependent manner. In-silico molecular docking investigations demonstrated a common binding site for Rhodamine123, and compounds naringenin and dihydrokaempferol. Our results showed that the relative docking energies estimated by docking softwares were in satisfactory correlation with the experimental activities. Preliminary interaction profile of P-gp docked complexes were also analysed in order to understand the nature of binding modes of these compounds. Our computational investigation suggested that the compounds interactions with the hydrophobic pocket of P-gp are mainly related to the inhibitory activity. Moreover this study s a platform for the discovery of novel natural compounds from herbal origin, as inhibitor molecules against the P-glycoprotein for the treatment of cancer.
    [Show full text]
  • A Study on Cheminformatics and Its Applications on Modern Drug Discovery
    Available online at www.sciencedirect.com Procedia Engineering 38 ( 2012 ) 1264 – 1275 Internatio na l Conference on Modeling Optimisatio n and Computing (ICMOC 2012) A Study on Cheminformatics and its Applications on Modern Drug Discovery B.Firdaus Begama and Dr. J.Satheesh Kumarb aResearch Scholar, Bharathiar University, Coimbatore, India, [email protected] bAssistant Professor, Bharathiar University, Coimbatore, India, [email protected] Abstract Discovering drugs to a disease is still a challenging task for medical researchers due to the complex structures of biomolecules which are responsible for disease such as AIDS, Cancer, Autism, Alzimear etc. Design and development of new efficient anti-drugs for the disease without any side effects are becoming mandatory in the recent history of human life cycle due to changes in various factors which includes food habit, environmental and migration in human life style. Cheminformaticds deals with discovering drugs based in modern drug discovery techniques which in turn rectifies complex issues in traditional drug discovery system. Cheminformatics tools, helps medical chemist for better understanding of complex structures of chemical compounds. Cheminformatics is a new emerging interdisciplinary field which primarily aims to discover Novel Chemical Entities [NCE] which ultimately results in design of new molecule [chemical data]. It also plays an important role for collecting, storing and analysing the chemical data. This paper focuses on cheminformatics and its applications on drug discovery and modern drug discovery techniques which helps chemist and medical researchers for finding solution to the complex disease. © 2012 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of Noorul Islam Centre for Higher Education.
    [Show full text]
  • Use of Bioinformatics Resources and Tools by Users of Bioinformatics Centers in India Meera Yadav University of Delhi, [email protected]
    University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Library Philosophy and Practice (e-journal) Libraries at University of Nebraska-Lincoln 2015 Use of Bioinformatics Resources and Tools by Users of Bioinformatics Centers in India meera yadav University of Delhi, [email protected] Manlunching Tawmbing Saha Institute of Nuclear Physisc, [email protected] Follow this and additional works at: http://digitalcommons.unl.edu/libphilprac Part of the Library and Information Science Commons yadav, meera and Tawmbing, Manlunching, "Use of Bioinformatics Resources and Tools by Users of Bioinformatics Centers in India" (2015). Library Philosophy and Practice (e-journal). 1254. http://digitalcommons.unl.edu/libphilprac/1254 Use of Bioinformatics Resources and Tools by Users of Bioinformatics Centers in India Dr Meera, Manlunching Department of Library and Information Science, University of Delhi, India [email protected], [email protected] Abstract Information plays a vital role in Bioinformatics to achieve the existing Bioinformatics information technologies. Librarians have to identify the information needs, uses and problems faced to meet the needs and requirements of the Bioinformatics users. The paper analyses the response of 315 Bioinformatics users of 15 Bioinformatics centers in India. The papers analyze the data with respect to use of different Bioinformatics databases and tools used by scholars and scientists, areas of major research in Bioinformatics, Major research project, thrust areas of research and use of different resources by the user. The study identifies the various Bioinformatics services and resources used by the Bioinformatics researchers. Keywords: Informaion services, Users, Inforamtion needs, Bioinformatics resources 1. Introduction ‘Needs’ refer to lack of self-sufficiency and also represent gaps in the present knowledge of the users.
    [Show full text]
  • Establishing the Basis for a CIS (Computer Information Systems) Undergraduate Program: on Seeking the Body of Knowledge
    Information Systems Education Journal (ISEDJ) 13 (5) ISSN: 1545-679X September 2015 Establishing the Basis for a CIS (Computer Information Systems) Undergraduate Program: On Seeking the Body of Knowledge Herbert E. Longenecker, Jr. [email protected] University of South Alabama Mobile, AL 36688, USA Jeffry Babb [email protected] West Texas A&M University Canyon, TX 79016, USA Leslie J. Waguespack [email protected] Bentley University Waltham, Massachusetts 02452, USA Thomas N. Janicki [email protected] University of North Carolina Wilmington Wilmington, NC 28403, USA David Feinstein [email protected] University of South Alabama Mobile, AL 36688, USA Abstract The evolution of computing education spans a spectrum from computer science (CS) grounded in the theory of computing, to information systems (IS), grounded in the organizational application of data processing. This paper reports on a project focusing on a particular slice of that spectrum commonly labeled as computer information systems (CIS) and reflected in undergraduate academic programs designed to prepare graduates for professions as software developers building systems in government, commercial and not-for-profit enterprises. These programs with varying titles number in the hundreds. This project is an effort to determine if a common knowledge footprint characterizes CIS. If so, an eventual goal would be to describe the proportions of those essential knowledge components and propose guidelines specifically for effective undergraduate CIS curricula. Professional computing societies (ACM, IEEE, AITP (formerly DPMA), etc.) over the past fifty years have sponsored curriculum guidelines for various slices of education that in aggregate offer a compendium of knowledge areas in ©2015 EDSIG (Education Special Interest Group of the AITP) Page 37 www.aitp-edsig.org /www.isedj.org Information Systems Education Journal (ISEDJ) 13 (5) ISSN: 1545-679X September 2015 computing.
    [Show full text]
  • Bioinformatics Study of Lectins: New Classification and Prediction In
    Bioinformatics study of lectins : new classification and prediction in genomes François Bonnardel To cite this version: François Bonnardel. Bioinformatics study of lectins : new classification and prediction in genomes. Structural Biology [q-bio.BM]. Université Grenoble Alpes [2020-..]; Université de Genève, 2021. En- glish. NNT : 2021GRALV010. tel-03331649 HAL Id: tel-03331649 https://tel.archives-ouvertes.fr/tel-03331649 Submitted on 2 Sep 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. THÈSE Pour obtenir le grade de DOCTEUR DE L’UNIVERSITE GRENOBLE ALPES préparée dans le cadre d’une cotutelle entre la Communauté Université Grenoble Alpes et l’Université de Genève Spécialités: Chimie Biologie Arrêté ministériel : le 6 janvier 2005 – 25 mai 2016 Présentée par François Bonnardel Thèse dirigée par la Dr. Anne Imberty codirigée par la Dr/Prof. Frédérique Lisacek préparée au sein du laboratoire CERMAV, CNRS et du Computer Science Department, UNIGE et de l’équipe PIG, SIB Dans les Écoles Doctorales EDCSV et UNIGE Etude bioinformatique des lectines: nouvelle classification et prédiction dans les génomes Thèse soutenue publiquement le 8 Février 2021, devant le jury composé de : Dr. Alexandre de Brevern UMR S1134, Inserm, Université Paris Diderot, Paris, France, Rapporteur Dr.
    [Show full text]
  • The Security and Management of Computer Network Database In
    The Security and Management of Computer Network Database in Coal Quality Detection Jianhua SHI1, a, Jinhong SUN2 1Guizhou Agency of Quality Supervision and Inspection of Coal Product,Liupanshui city 553001,China [email protected] Keywords: Computer Network Database; Database Security; Coal Quality Detection Abstract. This paper research the results of quality management information system at home and abroad, through the analysis of the domestic coal enterprises coal quality management links and management information system development present situation and existing problems, combining with related theory and system development method of management information system, and according to the coal mining enterprises of computer network security and management, analysis and design of database, and the implementation steps and the implementation of the coal quality management information system of the problem are given their own countermeasure and the suggestion, try to solve demand for management information system of coal enterprise management level, thus improve the coal quality management level and economic benefit of coal enterprise. Introduction With the improvement of China's coal mining mechanization degree and the increase of mining depth, coal quality is on the decline as a whole. At the same time, the user of coal product utilization way more and more widely, use more and more diversified, more and higher to the requirement of coal quality. In coal quality issue, therefore, countless contradictions increasingly acute, the gap of
    [Show full text]
  • Toolboxes for a Standardised and Systematic Study of Glycans
    Campbell et al. BMC Bioinformatics 2014, 15(Suppl 1):S9 http://www.biomedcentral.com/1471-2105/15/S1/S9 RESEARCH Open Access Toolboxes for a standardised and systematic study of glycans Matthew P Campbell1, René Ranzinger2, Thomas Lütteke3, Julien Mariethoz4, Catherine A Hayes5, Jingyu Zhang1, Yukie Akune6, Kiyoko F Aoki-Kinoshita6, David Damerell7,11, Giorgio Carta8, Will S York2, Stuart M Haslam7, Hisashi Narimatsu9, Pauline M Rudd8, Niclas G Karlsson4, Nicolle H Packer1, Frédérique Lisacek4,10* From Integrated Bio-Search: 12th International Workshop on Network Tools and Applications in Biology (NETTAB 2012) Como, Italy. 14-16 November 2012 Abstract Background: Recent progress in method development for characterising the branched structures of complex carbohydrates has now enabled higher throughput technology. Automation of structure analysis then calls for software development since adding meaning to large data collections in reasonable time requires corresponding bioinformatics methods and tools. Current glycobioinformatics resources do cover information on the structure and function of glycans, their interaction with proteins or their enzymatic synthesis. However, this information is partial, scattered and often difficult to find to for non-glycobiologists. Methods: Following our diagnosis of the causes of the slow development of glycobioinformatics, we review the “objective” difficulties encountered in defining adequate formats for representing complex entities and developing efficient analysis software. Results: Various solutions already implemented and strategies defined to bridge glycobiology with different fields and integrate the heterogeneous glyco-related information are presented. Conclusions: Despite the initial stage of our integrative efforts, this paper highlights the rapid expansion of glycomics, the validity of existing resources and the bright future of glycobioinformatics.
    [Show full text]
  • File Formats Exercises
    Introduction to high-throughput sequencing file formats – advanced exercises Daniel Vodák (Bioinformatics Core Facility, The Norwegian Radium Hospital) ( [email protected] ) All example files are located in directory “/data/file_formats/data”. You can set your current directory to that path (“cd /data/file_formats/data”) Exercises: a) FASTA format File “Uniprot_Mammalia.fasta” contains a collection of mammalian protein sequences obtained from the UniProt database ( http://www.uniprot.org/uniprot ). 1) Count the number of lines in the file. 2) Count the number of header lines (i.e. the number of sequences) in the file. 3) Count the number of header lines with “Homo” in the organism/species name. 4) Count the number of header lines with “Homo sapiens” in the organism/species name. 5) Display the header lines which have “Homo” in the organism/species names, but only such that do not have “Homo sapiens” there. b) FASTQ format File NKTCL_31T_1M_R1.fq contains a reduced collection of tumor read sequences obtained from The Sequence Read Archive ( http://www.ncbi.nlm.nih.gov/sra ). 1) Count the number of lines in the file. 2) Count the number of lines beginning with the “at” symbol (“@”). 3) How many reads are there in the file? c) SAM format File NKTCL_31T_1M.sam contains alignments of pair-end reads stored in files NKTCL_31T_1M_R1.fq and NKTCL_31T_1M_R2.fq. 1) Which program was used for the alignment? 2) How many header lines are there in the file? 3) How many non-header lines are there in the file? 4) What do the non-header lines represent? 5) Reads from how many template sequences have been used in the alignment process? c) BED format File NKTCL_31T_1M.bed holds information about alignment locations stored in file NKTCL_31T_1M.sam.
    [Show full text]