Is It Time for Cognitive Bioinformatics?
Total Page:16
File Type:pdf, Size:1020Kb
g in Geno nin m i ic M s ta & a P Lisitsa et al., J Data Mining Genomics Proteomics 2015, 6:2 D r f o Journal of o t e l DOI: 10.4172/2153-0602.1000173 o a m n r i c u s o J ISSN: 2153-0602 Data Mining in Genomics & Proteomics Review Article Open Access Is it Time for Cognitive Bioinformatics? Andrey Lisitsa1,2*, Elizabeth Stewart2,3, Eugene Kolker2-5 1The Russian Human Proteome Organization (RHUPO), Institute of Biomedical Chemistry, Moscow, Russian Federation 2Data Enabled Life Sciences Alliance (DELSA Global), Moscow, Russian Federation 3Bioinformatics and High-Throughput Data Analysis Laboratory, Seattle Children’s Research Institute, Seattle, WA, USA 4Predictive Analytics, Seattle Children’s Hospital, Seattle, WA, USA 5Departments of Biomedical Informatics & Medical Education and Pediatrics, University of Washington, Seattle, WA, USA Abstract The concept of cognitive bioinformatics has been proposed for structuring of knowledge in the field of molecular biology. While cognitive science is considered as “thinking about the process of thinking”, cognitive bioinformatics strives to capture the process of thought and analysis as applied to the challenging intersection of diverse fields such as biology, informatics, and computer science collectively known as bioinformatics. Ten years ago cognitive bioinformatics was introduced as a model of the analysis performed by scientists working with molecular biology and biomedical web resources. At present, the concept of cognitive bioinformatics can be examined in the context of the opportunities represented by the information “data deluge” of life sciences technologies. The unbalanced nature of accumulating information along with some challenges poses currently intractable problems for researchers. The solutions to these problems at the micro-and macro-levels are considered with regards to the role of cognitive approaches in the field of bioinformatics. Keywords: Digital medicine; Exposome; Omics; Biomarkers stream of the information produced by “omics“ is a challenge to the Deluge of Life Sciences Big Data technology of data storage and processing [6]. The unbalanced nature of accumulating information redoubles the challenges of the 5Vs of Big Modern molecular biology studies the function and interactions Data: veracity/reproducibility [7], variety, value and, to a lesser extent, of biological molecules, processes and systems. Molecular biology velocity and volume [8]. primarily focuses on DNA, RNA, proteins and metabolites utilizing four corresponding „omics“ technologies: genomics, transcriptomics, The challenge of interpreting the genome is currently being tackled proteomics, and metabolomics. Advances in sequencing technologies in many labs by using brute-force algorithms, making exhaustive enable researchers not only to decipher the human genome, but also to searches for genetic variants, the laborious matching of scattered uncover features of epigenetic regulation. The level of gene expression IDs and hand-curating datasets. However, the data assembled from is the object of transcriptomics; proteomics provides information all sources are left to languish in repositories “till better times“ until about the diversity of the protein molecules encoded by the genome; the development of bioinformatics grows to the capability of gaining metabolomics focuses on studying metabolites. knowledge from the data sets accumulated. The technical capabilities of modern „omics“ significantly exceed Challenges and Opportunities the capabilities of researchers to meaningfully process incoming experimental information. A typical microarray study can generate Is there a background in modern bioinformatics to beat the challenge the data on 500,000 single-nucleotide substitutions of DNA that of current data deluge? In 2004, Kuchar et al. [9] introduced cognitive differentiate the genome of one person from another [1]. Transcriptome bioinformatics as an instrument of scenario analysis performed analysis allows measuring the activity levels of each of 20,000 human by scientists working with molecular biology and biomedical protein-coding genes, with the signal of each gene detected by two or web resources. The term “Big Data” is mainly used to describe a three different sequences [2]. An in-depth study of the proteome can massive volume of both structured and unstructured data that is quantify up to 10,000 protein products of the genome [3] and elucidate too large or complex to process it using traditional database and structural modifications of at least one third of them [4]. software techniques. Bioinformatics accumulates the data primarily borrowed from the statistics, machine learning and pattern recognition Current international projects are designed not only to read the methods (except for the specific tasks of molecular modeling and DNA sequence, but also to decode the meaningful message from this molecule. The question of how to re-interpret the genome in the context of harmful mutations and diseases is of great interest to biomedical researchers [5]. Such projects that are the sequels of the *Corresponding author: Lisitsa Andrey, The Russian Human Proteome Human Genome Project include: HapMap, EnCode, 1000 Genomes, Organization (RHUPO), Institute of Biomedical Chemistry, Moscow, Cancer Genome, Human Proteome and a number of other large- Russian Federation, Tel: +7(499)246-69-80; Fax: +7(499)245-08-57; E-mail: [email protected] scale initiatives. The projects like the NIH Brain Initiative, the Human Microbiome Project are generating the data that will build a framework Received June 05, 2015; Accepted July 07, 2015; Published July 14, 2015 for understanding the human condition. These projects combine Citation: Lisitsa A, Stewart E, Kolker E (2015) Is it Time for Cognitive Bioinformatics? the technical and intellectual capacities of numerous countries into J Data Mining Genomics Proteomics 6: 173. doi:10.4172/2153-0602.1000173 international efforts. Copyright: © 2015 Andrey L, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted There is no lack of baseline data on the state of molecular systems use, distribution, and reproduction in any medium, provided the original author and within a human body. On the contrary, the exponentially growing source are credited. J Data Mining Genomics Proteomics Volume 6 • Issue 2 • 1000173 ISSN: 2153-0602 JDMGP, an open access journal Citation: Lisitsa A, Stewart E, Kolker E (2015) Is it Time for Cognitive Bioinformatics? J Data Mining Genomics Proteomics 6: 173. doi:10.4172/2153- 0602.1000173 Page 2 of 3 comparative genomics). Thus, the algorithms of digital data processing Although it is possible for a clinician to get an enormous cannot be transferred to biological and biomedical outcomes without amount of the data on internal molecular systems of the body, he/she significant expert adaptation. It is not surprising, for example, that the usually lacks the detailed information with respect to the exposome. modern practice of clinical diagnostics has hardly been updated with Generally, the exposome is captured as a personal health record new biomarkers despite the petabytes of the data generated. Potential based on often-faulty memories of patients. Recent developments in solutions can lie in an analysis of network interactions from the micro- modern telecommunication sphere have resulted in an opportunity of level (biological molecules, e.g., [10]) to the macro-level (exposome, collecting the exposome data. First of all, it can be periodically received [11]). from our communicators such as geolocations [18]. With respect to the health characteristics of an individual, the sequence of geolocation Small World of Genes and Proteins is no less informative than the DNA sequence in the genome [19]. Socially relevant diseases affecting almost all people are multigenic. The information kept in electronic financial systems (for example, the Thus, association studies are to be conducted to establish the link information on purchases) along with Google searches and Facebook between multiple variations and the risk of disease development (GWAS activity can be used to accurately build the behavioral profile of an - genome wide association studies). Sometimes, it is relatively easy to individual. Some wrist trackers can measure physical activity, food/ determine variation/disease relationships in controlled environments. drug intake and other health parameters and generate such profiles as However, to establish a relationship between a set of variations and a well. To avoid faulty memories these profiles can then be shared with disease is rather difficult [12] due to the genome variability interfering a physician for a more accurate accounting of the patient’s exposome. with environmental factors. In addition, these profiles can be shared with others or connected to a community of like-minded (www.myfitnesspal.com [20], www. It is possible to reduce the level of genomic variability at the level mynetdiary.com [21]). Such detailed profiles can also be valuable for of proteins. The proteins, as the final products of the genome, have evaluating drug side effects as well [19]. more direct influence on the condition of the body and can serve as reliable diagnostic markers. This issue has been thoroughly studied in Geo-data are becoming a source of the prediction of behavioral the field of proteomics for the last 15 years. However, clinically useful profiles leading to the development