Journal of Science Biological Science

A. Elumalai. / Journal of Science / Vol 2 / Issue 2 / 2012 / 71-80.
e ISSN 2277 - 3290
Print ISSN 2277 - 3282
www.journalofscience.net

AN OVERVIEW OF BIOINFORMATICS

A. Elumalai*
Anurag Pharmacy College, Ananthagiri (V), Kodad (M), Nalgonda (Dt), Andhra Pradesh, India, 508 206.
*Corresponding Author: A. Elumalai. Email: [email protected]

ABSTRACT
Bioinformatics is a branch of biological science which deals with the study of methods for storing, retrieving and analyzing biological data, such as nucleic acid (DNA/RNA) and protein sequence, structure, function, pathways and genetic interactions. It generates new knowledge that is useful in such fields as drug design, and in the development of new software tools to create that knowledge. This review deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, structural biology, software engineering, data mining, image processing, modeling and simulation, discrete mathematics, control and system theory, circuit theory, and statistics.

Keywords: Bioinformatics, Nucleic acid, Algorithms.

INTRODUCTION
Building on the recognition of the importance of information transmission, accumulation and processing in biological systems, Paulien Hogeweg coined the term bioinformatics in 1978 to refer to the study of information processes in biotic systems. This definition placed bioinformatics as a field parallel to biophysics and biochemistry. Examples of relevant biological information processes studied in the early days of bioinformatics are the formation of complex social interaction structures by simple behavioral rules, and information accumulation and maintenance in models of prebiotic evolution.

At the beginning of the genomic revolution, the term bioinformatics was re-discovered to refer to the creation and maintenance of a database to store biological information such as nucleotide sequences and amino acid sequences. Development of this type of database involved not only design issues but also the development of complex interfaces whereby researchers could access existing data as well as submit new or revised data.

In order to study how normal cellular activities are altered in different disease states, the biological data must be combined to form a comprehensive picture of these activities. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data. This includes nucleotide and amino acid sequences, protein domains, and protein structures. The actual process of analyzing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include (i) the development and implementation of tools that enable efficient access to, and use and management of, various types of information; and (ii) the development of new algorithms (mathematical formulas) and statistics with which to assess relationships among members of large data sets. Examples of the latter are methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences.

The primary goal of bioinformatics is to increase the understanding of biological processes. What sets it apart from other approaches, however, is its focus on developing and applying computationally intensive techniques to achieve this goal. Examples include pattern recognition, data mining, machine learning algorithms, and visualization. Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein–protein interactions, genome-wide association studies and the modeling of evolution.

Interestingly, the term bioinformatics was coined before the genomic revolution. Paulien Hogeweg and Ben Hesper defined the term in 1978 to refer to the study of information processes in biotic systems. This definition placed bioinformatics as a field parallel to biophysics or biochemistry (biochemistry is the study of chemical processes in biological systems). However, its primary use since at least the late 1980s has been to describe the application of computer science and information sciences to the analysis of biological data, particularly in those areas of genomics involving large-scale DNA sequencing. Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques and theory to solve formal and practical problems arising from the management and analysis of biological data.

Over the past few decades, rapid developments in genomic and other molecular research technologies and developments in information technologies have combined to produce a tremendous amount of information related to molecular biology. Bioinformatics is the name given to the mathematical and computing approaches used to glean understanding of biological processes.

Common activities in bioinformatics include mapping and analyzing DNA and protein sequences, aligning different DNA and protein sequences to compare them, and creating and viewing 3-D models of protein structures.

There are two fundamental ways of modelling a biological system (e.g., a living cell), both of which come under bioinformatic approaches [1].

Static
Sequences – Proteins, Nucleic acids and Peptides
Structures – Proteins, Nucleic acids, Ligands (including metabolites and drugs) and Peptides
Interaction data among the above entities, including microarray data and networks of proteins and metabolites

Dynamic
Systems biology comes under this category, including reaction fluxes and variable concentrations of metabolites
Multi-agent based modelling approaches capturing cellular events such as signalling, transcription and reaction dynamics

Since the Phage Φ-X174 was sequenced in 1977, the DNA sequences of thousands of organisms have been decoded and stored in databases. This sequence information is analyzed to determine genes that encode polypeptides (proteins), RNA genes, regulatory sequences, structural motifs, and repetitive sequences. A comparison of genes within a species or between different species can show similarities between protein functions, or relations between species (the use of molecular systematics to construct phylogenetic trees). With the growing amount of data, it long ago became impractical to analyze DNA sequences manually. Today, computer programs such as BLAST are used daily to search sequences from more than 260 000 organisms, containing over 190 billion nucleotides. These programs can compensate for mutations (exchanged, deleted or inserted bases) in the DNA sequence, to identify sequences that are related, but not identical.

A variant of this sequence alignment is used in the sequencing process itself. The so-called shotgun sequencing technique (which was used, for example, by The Institute for Genomic Research to sequence the first bacterial genome, Haemophilus influenzae) does not produce entire chromosomes. Instead it generates the sequences of many thousands of small DNA fragments (ranging from 35 to 900 nucleotides long, depending on the sequencing technology). The ends of these fragments overlap and, when aligned properly by a genome assembly program, can be used to reconstruct the complete genome. Shotgun sequencing yields sequence data quickly, but the task of assembling the fragments can be quite complicated for larger genomes. For a genome as large as the human genome, it may take many days of CPU time on large-memory, multiprocessor computers to assemble the fragments, and the resulting assembly will usually contain numerous gaps that have to be filled in later. Shotgun sequencing is the method of choice for virtually all genomes sequenced today, and genome assembly algorithms are a critical area of bioinformatics research.

Another aspect of bioinformatics in sequence analysis is annotation. This involves computational gene finding to search for protein-coding genes, RNA genes, and other functional sequences within a genome. Not all of the nucleotides within a genome are part of genes. Within the genomes of higher organisms, large parts of the DNA do not serve any obvious purpose. This so-called junk DNA may, however, contain unrecognized functional elements. Bioinformatics helps to bridge the gap between genome and proteome projects, for example in the use of DNA sequences for protein identification [2].

Genome annotation
In the context of genomics, annotation is the process of marking the genes and other biological features in a DNA sequence. The first genome annotation software system was designed in 1995 by Dr. Owen White, who was part of the team at The Institute for Genomic Research that sequenced and analyzed the first genome of a free-living organism to be decoded, the bacterium Haemophilus influenzae. Dr. White built a software system to find the genes (fragments of genomic sequence that encode proteins) and the transfer RNAs, and to make initial assignments of function to those genes. Most current genome annotation systems work similarly, but the programs available for analysis of genomic DNA, such as the GeneMark program trained and used to find protein-coding genes in Haemophilus influenzae, are in high-throughput
