Three-Dimensional Structures of Carbohydrates and Where to Find Them

International Journal of Molecular Sciences

Review

Sofya I. Scherbinina 1,2,* and Philip V. Toukach 1,*

1 N.D. Zelinsky Institute of Organic Chemistry, Russian Academy of Science, Leninsky prospect 47, 119991 Moscow, Russia
2 Higher Chemical College, D. Mendeleev University of Chemical Technology of Russia, Miusskaya Square 9, 125047 Moscow, Russia
* Correspondence: [email protected] (S.I.S.); [email protected] (P.V.T.)

Received: 26 September 2020; Accepted: 16 October 2020; Published: 18 October 2020

Abstract: Analysis and systematization of accumulated data on carbohydrate structural diversity is a subject of great interest for structural glycobiology. Although challenging, the development of computational methods for efficient treatment and management of spatial (3D) structural features of carbohydrates breaks new ground in modern glycoscience. This review is dedicated to chemo- and glycoinformatics approaches to the generation, deposition and processing of 3D structural data on carbohydrates and their derivatives. Databases, molecular modeling and experimental data validation services, and structure visualization facilities developed over the last five years are reviewed.

Keywords: carbohydrate; spatial structure; model build; database; web-tool; glycoinformatics; structure validation; PDB glycans; structure visualization; molecular modeling

1. Introduction

Knowledge of carbohydrate spatial (3D) structure is crucial for investigation of glycoconjugate biological activity [1,2], vaccine development [3,4], estimation of ligand-receptor interaction energy [5–7], studies of conformational mobility of macromolecules [8], drug design [9], studies of cell wall construction aspects [10], glycosylation processes [11], and many other aspects of carbohydrate chemistry and biology.
Therefore, providing information support for carbohydrate 3D structure is vital for the development of modern glycomics and glycoproteomics. As a result of growing interest in glycoprofiling, glycan microarrays, carbohydrate active enzymes (CAZy) and glycan-binding proteins (GBP) involved in biological processes, several major international projects (e.g., GlySpace [12], GlyCosmos [13], Glycomics@ExPASy [14], GlyGen [15], JCGGDB [16], Glytoucan [17], MIRAGE [18], CFG [19], RINGS [20], GLIC (https://glic.glycoinfo.org/), SysGlyco (https://sysglyco.org/)) were launched to integrate the variety of data produced by glycobiological research. The main goal of existing glycoinformatics projects is to provide versatile resources with user-friendly access helpful for disease diagnostics [21,22], glycobioinformatics studies [23], glycosylation site prediction [24], CAZy activity prognosis [25,26] and other applications. Supplementing structural repositories with 3D structural data opens the way for computational glycobiology and modeling of carbohydrate structures at atomic resolution. Design of novel workflows and techniques to connect carbohydrate spatial structure modes and experimental data with verification, processing, analysis and deposition of associated data has gained increased popularity in the glycoscience community [27]. The Carbohydrate Structure Database (CSDB [28]) module for carbohydrate 3D structure modeling is an illustrative example of 3D structural data integration facilities (as a database) combined with a dedicated interface (as a glycoinformatics project). Further details on CSDB 3D facilities are discussed below.

Int. J. Mol. Sci. 2020, 21, 7702; doi:10.3390/ijms21207702 www.mdpi.com/journal/ijms
The typical types of knowledge about a carbohydrate 3D structure include (Figure 1):

• Primary structure (atom connectivity);
• Monosaccharide ring conformation;
• Rotational states of inter-residue and exocyclic linkages and their energies;
• Ring puckering and transitions of glycosidic linkage conformation on a time scale;
• Large-scale spatial arrangement (tertiary structure).

Figure 1. Typical components of a carbohydrate 3D structure exemplified on sucrose: (a) primary structure (in Symbol Nomenclature for Glycans (SNFG)); (b) superimposed conformational states and Cremer–Pople diagram; (c) conformational space of a two-torsion glycosidic linkage (Ramachandran plot); (d) transitions of glycosidic dihedrals.

Herein we focus on the important aspects of carbohydrate 3D structure availability to researchers: structural repositories; glycoinformatics tools and workflows to assist structure building, modeling and erroneous molecular geometry data detection and remediation; carbohydrate 3D structure presentation and visualization methods.
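The rotational states of glycosidic linkages listed above are usually described by torsion (dihedral) angles, which are the axes of the Ramachandran-type plot in Figure 1c. The following is a minimal sketch, not taken from the reviewed tools, of how such an angle can be computed from four atomic positions. The function name `dihedral_deg` and the sample coordinates are hypothetical. Which four atoms define each glycosidic torsion (for example, O5–C1–O–Cx for φ) is a convention that varies between sources and is only assumed here.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees, range -180..180) about the p1-p2 bond.

    p0..p3 are (x, y, z) tuples of four consecutively bonded atoms.
    The four points are assumed not to be collinear.
    """
    b0 = _sub(p0, p1)  # bond from the central atom p1 back to p0
    b1 = _sub(p2, p1)  # central bond, the rotation axis
    b2 = _sub(p3, p2)  # bond from p2 on to p3

    # Normalize the axis so that projections onto it are plain dot products.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)

    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))

    # atan2 of the sine-like and cosine-like terms gives the signed angle.
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates (not from any real structure): an eclipsed
# arrangement and a perpendicular one around a bond along the z axis.
eclipsed = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
perpendicular = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(eclipsed, perpendicular)
```

Applied to every frame of a molecular dynamics trajectory, a function of this kind would yield the time series of glycosidic dihedrals sketched in Figure 1d; plotting one torsion against the other gives the two-torsion conformational map.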
2. Structural Databases

Structural databases make a significant contribution to bringing information technologies to glycoscience [29]. With no focus on spatial structure, glycan databases and online tools have been recently reviewed [30–32]. Containing a huge number of carbohydrates with detailed data for each entry, databases are valuable sources of structural information, biological assignments, references and external links. Structural data are often accompanied by original and sometimes assigned experimental observables: NMR spectra, HPLC and MS profiles, etc. The services built on top of the databases can include 3D structure simulation, validation, and storage. The authors' view of the ideal integration of data resources and services in glycoinformatics is summarized in Figure 2. The subject of this review is databases providing theoretical or empirical 3D structures of carbohydrates and related data-mining tools.
Figure 2. Networking between glycoinformatics projects and related services that promotes achievement of data integration in glycomics. Reproduced with permission from [29], © 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

The majority of existing repositories for carbohydrate 3D structures offer open-access data via a web interface. Deposited datasets can be represented by glycoproteins, protein-carbohydrate complexes, poly- and oligosaccharides with 3D structure experimentally resolved or specified by means of NMR, X-ray crystallography, cryoEM, small angle X-ray scattering, etc. [27]. Several databases, such as GLYCAM-Web, EK3D, 3DSDSCAR and GlycoMapsDB, contain data from molecular dynamics simulations. We have also mentioned databases featuring information on protein structures involving a carbohydrate moiety in terms of glycosylation (as post-translational modification, dbPTM), carbohydrate
