A Web-Accessible Platform for Generating Various Molecular
Total Page:16
File Type:pdf, Size:1020Kb
Dong et al. J Cheminform (2016) 8:34 DOI 10.1186/s13321-016-0146-2 SOFTWARE Open Access BioTriangle: a web‑accessible platform for generating various molecular representations for chemicals, proteins, DNAs/RNAs and their interactions Jie Dong1†, Zhi‑Jiang Yao2†, Ming Wen2†, Min‑Feng Zhu3, Ning‑Ning Wang1, Hong‑Yu Miao4, Ai‑Ping Lu5, Wen‑Bin Zeng1 and Dong‑Sheng Cao1,5* Abstract Background: More and more evidences from network biology indicate that most cellular components exert their functions through interactions with other cellular components, such as proteins, DNAs, RNAs and small molecules. The rapidly increasing amount of publicly available data in biology and chemistry enables researchers to revisit interaction problems by systematic integration and analysis of heterogeneous data. Currently, some tools have been developed to represent these components. However, they have some limitations and only focus on the analysis of either small molecules or proteins or DNAs/RNAs. To the best of our knowledge, there is still a lack of freely-available, easy-to-use and integrated platforms for generating molecular descriptors of DNAs/RNAs, proteins, small molecules and their interactions. Results: Herein, we developed a comprehensive molecular representation platform, called BioTriangle, to emphasize the integration of cheminformatics and bioinformatics into a molecular informatics platform for computational biol‑ ogy study. It contains a feature-rich toolkit used for the characterization of various biological molecules and complex interaction samples including chemicals, proteins, DNAs/RNAs and even their interactions. By using BioTriangle, users are able to start a full pipelining from getting molecular data, molecular representation to constructing machine learning models conveniently. Conclusion: BioTriangle provides a user-friendly interface to calculate various features of biological molecules and complex interaction samples conveniently. The computing tasks can be submitted and performed simply in a browser without any sophisticated installation and configuration process. BioTriangle is freely available at http://biotri‑ angle.scbdd.com. Keywords: Molecular descriptors, Molecular representation, Interaction features, Online descriptor calculation, QSAR/QSPR, Cheminformatics Background has been increasingly recognized that a single biologi- Despite the indisputable success of the reductionism cal process often involves complex interactions among approaches in advancing our knowledge and under- a variety of molecules, especially DNA, RNA, proteins standing of individual molecules and their functions, it and small molecules [1, 2]. Systematic investigation and understanding of human interactome (i.e., complex *Correspondence: oriental‑[email protected] networks resulted from numerous interactions among †Jie Dong, Zhi-Jiang Yao and Ming Wen contributed equally to this work nucleotides, proteins, metabolites etc.) is thus becoming 1 School of Pharmaceutical Sciences, Central South University, Changsha, a key research area, which could fundamentally renovate People’s Republic of China Full list of author information is available at the end of the article our thinking on how to develop novel and more efficient © 2016 The Author(s). This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/ publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Dong et al. J Cheminform (2016) 8:34 Page 2 of 13 therapeutic or preventive interventions (e.g., the network RNAs have been previously developed, including PRO- medicine concept) [1, 3]. FEAT [41], BioJava [42], PseAAC, propy [43], repDNA In previous studies, particular attention has been paid [44], repRNA [45], protr [46] etc. In the cheminformat- to a variety of molecular interaction networks and their ics field, several open sources or commercial software potential roles in disease mechanism and drug develop- for drug discovery (e.g., QSAR/SAR [47], virtual screen- ment [1, 4–7], including transcriptional and post-tran- ing [48], database search [49], drug ADME/T prediction scriptional regulatory networks [8–10], functional RNA [50, 51]) have been implemented for computing molec- networks [11–13], protein–protein interaction networks ular descriptors of small molecules, including Dragon, [14, 15], and metabolic networks [16, 17]. Consequently, CODESSA, Chemistry Development Kit (CDK) [52], public databases for human-specific molecular interac- chemopy [53], Molconn-Z, OpenBabel [54], Cinfony [55], tion data have been undergoing a rapid growth within the Rcpi [56], Indigo, JOELib, Avogadro and RDKit. How- past decade, such as BIND [18], DIP [19], STITCH [20], ever, all the tools mentioned above only support a limited HPRD [21], TTD [22], DrugBank [23], ChEMBL [24], number of molecular types or descriptors, and they may KEGG [25], BindingDB [26], SuperTarget and Matador not be freely available or easily accessible. To the best of [27], to name a few. However, the heterogeneity of data our knowledge, there is still a lack of freely-available and in such databases poses a significant challenge to their integrated platforms for generating molecular descrip- integration and analysis in practice. In particular, the bio- tors of DNA/RNA, proteins, small molecules and their informatics and the cheminformatics communities have interactions [57]. evolved more or less independently, e.g., with an empha- In this study, we develop a comprehensive molecular sis on macro biomolecules and chemical compounds, representation tool, called BioTriangle, for characteriz- respectively. However, to investigate complex molecular ing various complex biological molecules and pairwise interactions, both biological and chemical knowledge on interactions. More specifically, BioTriangle can calcu- structures and functions of all the involved molecules late a large number of molecular descriptors of chemi- are required, especially in the scenarios of identifying cals from their topology, structural and physicochemical new drug targets and their potential ligands or discover- features of proteins and peptides from their amino acid ing potential biomarkers for complex diseases [28–30]. sequences, and composition and physicochemical fea- Therefore, it is necessary and useful to build informat- tures of DNAs/RNAs from their primary sequences. ics platforms for unified data or knowledge represen- Furthermore, BioTriangle can calculate the interaction tation that can integrate the existing efforts from both descriptors between two individual molecules. To ease communities. the use of the BioTriangle utilities and functionalities, Molecular descriptors are one of the most powerful we also provide users a friendly and uniform web inter- approaches to characterize the biological, physical, and face. For illustration purpose, we use five datasets from chemical properties of molecules and have long been different applications as representative examples to show used in various studies for understanding molecular how BioTriangle can be used in an analytical pipeline. We interactions or drug development [31–34]. In the bio- thus recommend BioTriangle when molecular or interac- informatics and cheminformatics fields, sequence- and tion representation is need for exploring questions con- structure-based constitutional, physicochemical, and cerning structures, functions and interactions of various topological features have been widely used in the devel- molecular data in the context of biomedical studies. opment of computing algorithms for predicting protein structural and functional classes [29], protein–protein Implementation and user interface interactions [35], subcellular locations and peptides of BioTriangle is designed as a web application imple- specific properties [36], drug-target pairs and associa- mented in an open source Python framework (Django) tions [37, 38], meiotic recombination hot spots [39], and for the Graphical User Interface (GUI) and MySQL for nucleosome positioning in genomes [40], etc. Besides its data retrieval. The Nginx + uWSGI architecture is used capability of describing and distinguishing nucleotides, to enable an efficient data exchange between dynamic proteins and small molecules of different structural, data from the server-side and static contents form client- functional and interaction profiles, we further stress that side. By employing this architecture, the balance between molecular descriptor provides a convenient and consist- system resource occupation and computational efficiency ent way of molecular or interaction representation, and is is maintained; the good independence and safety of a thus a suitable choice to meet the needs mentioned in the long time data operation and file access from different previous paragraph. requests are also guaranteed. The JavaScript and jQuery Several bioinformatics packages for computing struc- were employed to help accomplish some complex inter- tural and physicochemical features of proteins or DNAs/ action processes, result visualization and download at Dong et al. J Cheminform (2016) 8:34 Page 3 of 13 front-side.