The Rising Field of Bioinformatics
Serena Hughes
Case Study, Spring 2019

Overview

Bioinformatics is an up-and-coming field at the intersection of biology and computer science. The two disciplines were initially thought of separately. As technology advanced, however, it became clear that they overlap: computer science algorithms can further biological discoveries, and biological systems can serve as a basis for computational models. The growth of research combining the two has driven the recent growth of bioinformatics as a field.

Bioinformatics is still relatively small, and universities often house it as a specialization within biology, computer science, or other departments. In 2009, only about 17% of bioinformatics programs were part of a dedicated bioinformatics department. Most were housed under biology or biomedical departments.[7] Those data are ten years old, but even U.S. News’ 2018 ranking of bioinformatics programs included only six schools in total.[11] Because bioinformatics is small but growing, the way the field is organized is becoming increasingly important.

What is Being Organized?

Bioinformatics has become increasingly interdisciplinary as it has grown, building on other scientific and mathematical fields. These subfields, and the types of research they enable, are the resources being organized. Organizing them sheds light on the structure of bioinformatics as a whole.

Why is it Being Organized?

The term “bioinformatics” has been in use since the 1970s. At that time, bioinformatics was “defined as the study of informatic processes in biotic systems”.[4] After 1990, however, the term came to refer to “computational methods for comparative analysis of genome data”.[4] There is currently little consistency about which subfields make up bioinformatics, perhaps partly because of this shift in definition. Today, Merriam-Webster defines bioinformatics as “the collection, classification, storage, and analysis of biochemical and biological information using computers especially as applied to molecular genetics and genomics”.[3] This definition seems to capture the essence of the field, but some aspects of bioinformatics, such as evolutionary research, appear unaccounted for. Some sources treat chemistry, biology, and biochemistry as separate subfields of bioinformatics. Others understandably combine them into one subfield.[2], [6], [10] Cognitive science offers a contrast: most sources agree that its subfields are linguistics, psychology, philosophy, computer science, neuroscience, and anthropology.[5]

How Much is it Being Organized?

The inconsistency in current definitions of the subfields of bioinformatics seems to stem from varying granularity. In the example above, some descriptions list biology, chemistry, and biochemistry separately, while others combine them into a single subfield. The level of granularity therefore plays a key role in how many subfields one counts. Bioinformatics can certainly be partitioned into many specific fields. This paper, however, seeks broader subfields that encompass the research of many of these smaller fields. A diagram of bioinformatics subfields by Boston University lists fourteen such fields.[6] The organization I present here considers three.

When is it Being Organized?

When bioinformatics first emerged as a field, it was not defined in terms of subfields.
Rather, it was intended as a field parallel to biophysics and biochemistry, using information processing to better understand living systems.[4] No categories were set when the field originated. As a result, universities tend to organize bioinformatics as they hire professors in the field, while journals organize it in ways that govern the papers submitted to them. The categories chosen by universities and journals therefore do not necessarily align. Professors at research universities may choose which projects interest them, and universities build their categories of bioinformatics subfields from those projects. This can introduce bias and irregularity into the organization. For example, UCLA lists a category under bioinformatics called “Foundations”, which does not appear anywhere else.[8] It refers to research that is methodological in nature and cuts across multiple core areas. No comparable category exists at other universities or in research journals. The journal Bioinformatics, on the other hand, offers a set of ten categories for researchers to choose from when submitting papers. Some of these categories, such as “systems biology”, are a common way of organizing bioinformatics and also appear in university organizing systems. Others, such as “data and text mining”, are not usually included in breakdowns of bioinformatics.[9]

How or By Whom is it Organized?

As discussed above, individual universities and research journals currently have their own methods of organizing bioinformatics. Their organizing systems are not identical, but they share many features. By combining these systems, I have devised an artifact that partitions bioinformatics into three broad subfields: data science, statistics, and chemistry.
Combined with biology and computer science, these three subfields account for the way bioinformatics is organized both at research universities and in scientific publications.

The overlap of chemistry with computer science and biology contains the wet-lab research that is part of bioinformatics, particularly research on the interactions of chemicals in organisms. The analysis of macromolecule structure, for instance, falls into this category. The overlap of statistics with computer science and biology includes probabilistic analysis of evolutionary trees and prediction of future mutations. This covers phylogenetic research on the taxonomies of living things. The overlap of data science with computer science and biology consists of developing databases to store, access, and analyze the large amounts of data collected. For example, storing and analyzing data on certain genes, and using the database to compare them with similar genes in other organisms, would fall under this overlap.

These subfields also overlap with one another. Bioinformatics that draws on both chemistry and statistics includes epigenetics, where scientists can study the probability of genes expressing a certain trait given the extracted genetic sequence. In the overlap of statistics and data science, scientists study populations. They use probabilities and large amounts of data on the relationship between genotypes and phenotypes to determine how likely a feature is in a given population. Finally, the overlap between chemistry and data science includes research on biological systems as a whole. Here, data collected from individual regions or genes is used to analyze how a change at one site might cause a change at another location, which would indicate that the two may belong to the same biological system.
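One way to make this partition concrete is to write it down as a lookup table. The following is a minimal sketch that assumes Python 3.9 or later. It keys each region of the organization by the set of subfields that region combines. Biology and computer science are treated as implicit in every region. The values are the example research areas named in this paper, including the three-way overlap discussed next. The names `SUBFIELDS`, `REGIONS`, and `classify` are hypothetical illustrations, not part of any existing tool.

```python
# Hypothetical encoding of the three-subfield organization of bioinformatics.
# Every region is assumed to also draw on biology and computer science.
SUBFIELDS = ("chemistry", "statistics", "data science")

# Each key is the set of subfields a region combines; each value lists the
# example research areas this paper places in that region.
REGIONS: dict[frozenset[str], list[str]] = {
    frozenset({"chemistry"}): [
        "wet-lab research on chemical interactions in organisms",
        "analysis of macromolecule structure",
    ],
    frozenset({"statistics"}): [
        "probabilistic analysis of evolutionary trees",
        "phylogenetic research on taxonomies",
    ],
    frozenset({"data science"}): [
        "databases for storing and comparing gene data across organisms",
    ],
    frozenset({"chemistry", "statistics"}): ["epigenetics"],
    frozenset({"statistics", "data science"}): ["population analysis"],
    frozenset({"chemistry", "data science"}): ["research on biological systems"],
    frozenset(SUBFIELDS): ["genome sequencing"],
}


def classify(subfields: set[str]) -> list[str]:
    """Return the example research areas for the region matching `subfields`."""
    unknown = subfields - set(SUBFIELDS)
    if unknown:
        raise ValueError(f"not part of this organization: {sorted(unknown)}")
    # A frozenset is hashable, so it can serve as a dictionary key.
    return REGIONS.get(frozenset(subfields), [])


if __name__ == "__main__":
    print(classify({"statistics", "chemistry"}))
```

Keying the table by sets means that the order in which the subfields are listed does not matter. For example, `classify({"statistics", "chemistry"})` and `classify({"chemistry", "statistics"})` both return `["epigenetics"]`.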
There is also a region where all three subfields overlap, in which chemistry, statistics, and data science all play a role. Genome sequencing is one example. Chemistry is used to extract the DNA, a database is used to store and analyze the nucleotide sequence, and statistics is used to determine how likely it is that a given gene is responsible for a hypothesized function, based on previous research.

Other Considerations?

One consideration not yet mentioned is the similarity between computational biology and bioinformatics. Many scientists seem to consider computational biology synonymous with bioinformatics. Others regard bioinformatics as more engineering-driven and computational biology as more scientifically driven.[1] Even under this interpretation, the two fields overlap extensively, and it can be difficult to decide which field a paper or project belongs to.

______________________________________________________________________________

[1] Altman, Russ. “Bioinformatics & Computational Biology = Same? No.” Building Confidence, 18 Feb. 2009, rbaltman.wordpress.com/2009/02/18/bioinformatics-computational-biology-same-no/.
[2] Can, Tolga. “Introduction to Bioinformatics.” Methods in Molecular Biology (Clifton, N.J.), U.S. National Library of Medicine, 2014, www.ncbi.nlm.nih.gov/pubmed/24272431.
[3] Gove, Philip Babcock. “Bioinformatics.” Webster's Third New International Dictionary of the English Language, Unabridged: A Merriam-Webster, Merriam-Webster, 2002.
[4] Hogeweg, Paulien. “The Roots of Bioinformatics in Theoretical Biology.” PLoS Computational Biology, vol. 7, no. 3, 2011, e1002021, doi:10.1371/journal.pcbi.1002021.
[5] Thagard, Paul. “Cognitive Science.” The Stanford Encyclopedia of Philosophy (Fall 2008 Edition), edited by Edward N. Zalta.
[6] “Bioinformatics | Boston University.” Bioinformatics RSS, www.bu.edu/bioinformatics/.
[7] “Bioinformatics Programs in the United States.” University of North Carolina, Information Technology Services, June 2009, ils.unc.edu/informatics_programs/doc/Bioinformatics_2008.html.
[8] “Faculty.” UCLA, bioinformatics.ucla.edu/faculty/.
[9] “Instructions to Authors.” Bioinformatics, Oxford University Press, academic.oup.com/bioinformatics/pages/instructions_for_authors.
[10] “MS Bioinformatics.” National University of Sciences and Technology, www.nust.edu.pk/INSTITUTIONS/Centers/RCMS/ap/pg/MSBioinformatics.
[11] “The Best Genetics / Genomics / Bioinformatics Programs in America, Ranked.” Best Science Schools, U.S. News & World Report, 2018, www.usnews.com/best-graduate-schools/top-science-schools/genetics-rankings.