Model Answer: B. Sc. 6th Semester and M. Sc. 2nd Semester 2017
Bioinformatics
Compiled by Mr. Nitin Swamy, Asst. Prof., Department of Biotechnology, St. Aloysius College (Autonomous), Jabalpur

Previous years' questions:
1. What is bioinformatics? Describe the basic need for bioinformatics applications from a biological perspective.
2. List various applications of bioinformatics in biotechnology.
3. Explain the role of the NCBI database in biotechnology. What other bioinformatics software is used for various biotechnological applications?
4. Define bioinformatics. Briefly discuss the major applications of bioinformatics in biotechnology.
5. Write the full form of BLAST.

Bioinformatics
Biological data are being produced at a phenomenal rate. For example, as of August 2000, the GenBank repository of nucleic acid sequences contained 8,214,000 entries and the SWISS-PROT database of protein sequences contained 88,166. On average, these databases were doubling in size every 15 months. In addition, since the publication of the H. influenzae genome, complete sequences for over 40 organisms had been released, ranging in size from 450 genes to over 100,000. Add to this the data from the myriad related projects that study gene expression, determine the protein structures encoded by the genes, and detail how these products interact with one another, and we can begin to imagine the enormous quantity and variety of information being produced. As a result of this surge in data, computers have become indispensable to biological research. Such an approach is ideal because of the ease with which computers can handle large quantities of data and probe the complex dynamics observed in nature. Bioinformatics is often defined as the application of computational techniques to understand and organize the information associated with biological macromolecules.
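The 15-month doubling figure is easiest to grasp as exponential growth, N(t) = N0 * 2^(t / 15), with t in months. The sketch below assumes the rate stays exactly constant (real database growth has not been that regular), so its projections are illustrative only and are not actual GenBank counts; the function names are hypothetical helpers, not part of any database tool.

```python
import math

DOUBLING_MONTHS = 15  # doubling period quoted in the text (assumed constant)


def projected_entries(n0: float, months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Entries after `months`, assuming steady growth N(t) = N0 * 2**(t / T_d)."""
    return n0 * 2 ** (months / doubling_months)


def months_to_reach(n0: float, target: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Invert N(t) = N0 * 2**(t / T_d) to get t = T_d * log2(target / N0)."""
    return doubling_months * math.log2(target / n0)


if __name__ == "__main__":
    genbank_aug_2000 = 8_214_000  # GenBank entries, August 2000 (figure from the text)
    for months in (15, 30, 60):
        print(months, round(projected_entries(genbank_aug_2000, months)))
    # Hypothetical milestone: months until 100 million entries under the same assumption
    print(round(months_to_reach(genbank_aug_2000, 100_000_000), 1))
```

Under this assumption, 15 months doubles the count to 16,428,000 and 60 months (four doublings) multiplies it by 16.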
This unexpected union between biology and computing is largely attributed to the fact that life itself is an information technology: an organism's physiology is largely determined by its genes, which at their most basic can be viewed as digital information.

Bioinformatics: a definition
(Molecular) bio-informatics: bioinformatics is conceptualising biology in terms of molecules (in the sense of physical chemistry) and applying "informatics techniques" (derived from disciplines such as applied maths, computer science and statistics) to understand and organise the information associated with these molecules, on a large scale. In short, bioinformatics is a management information system for molecular biology and has many practical applications.
(As submitted to the Oxford English Dictionary)

Aims of bioinformatics
The aims of bioinformatics are threefold:
1. First, at its simplest, bioinformatics organizes data in a way that allows researchers to access existing information and to submit new entries as they are produced, e.g., the Protein Data Bank for 3D macromolecular structures. While data management is an essential task, the information stored in these databases is essentially useless until analyzed. Thus the purpose of bioinformatics extends much further.
2. The second aim is to develop tools and resources that aid in the analysis of data. For example, having sequenced a particular protein, it is of interest to compare it with previously characterized sequences. This needs more than a simple text-based search: programs such as FASTA and PSI-BLAST must consider what constitutes a biologically significant match. Developing such resources requires expertise in computational theory as well as a thorough understanding of biology.
3. The third aim is to use these tools to analyze the data and interpret the results in a biologically meaningful manner.
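The second aim notes that comparing a new protein with known sequences needs more than a text search. As a rough illustration, the sketch below scores a Smith-Waterman local alignment. This is not how FASTA or PSI-BLAST work internally (both use faster heuristics, and PSI-BLAST builds position-specific profiles); the scores of +2 for a match, -1 for a mismatch and -2 per gap are arbitrary assumptions standing in for a real substitution matrix such as BLOSUM62, and the sequences are made up.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b, using a linear gap penalty.

    Scoring values are illustrative assumptions; real protein searches use
    substitution matrices (e.g. BLOSUM62) and affine gap penalties.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # residue of a aligned to a gap
            left = H[i][j - 1] + gap  # residue of b aligned to a gap
            H[i][j] = max(0, diag, up, left)  # 0 lets a local alignment start anywhere
            best = max(best, H[i][j])
    return best


query = "GKSTLL"       # made-up query fragment
target = "MAGKTTLLEV"  # made-up database sequence
print(query in target)                      # exact text search: False
print(smith_waterman_score(query, target))  # local alignment score: 9
```

The exact substring search finds nothing, while the alignment still scores 9 (five identical residues and one mismatch), which is the kind of near-match a plain text search misses.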
Traditionally, biological studies examined individual systems in detail and frequently compared them with a few related ones. In bioinformatics, we can now conduct global analyses of all the available data with the aim of uncovering common principles that apply across many systems and highlighting novel features.

BIOLOGICAL DATABASES
As biology has increasingly turned into a data-rich science, the need for storing and communicating large datasets has grown tremendously. The obvious examples are nucleotide sequences, protein sequences, and 3D structural data. Bioinformatics is the application of information technology to store, organize and analyze the vast amount of biological data available in the form of sequences and structures of proteins (the building blocks of organisms) and nucleic acids (the information carriers). The biological information of nucleic acids is available as sequences, while protein data are available as both sequences and structures. A sequence is a one-dimensional string of residues, whereas a structure contains the three-dimensional arrangement of that sequence. Sequences and structures are only two of the many types of data required in the practice of modern molecular biology. Other important data types include metabolic pathways and molecular interactions, mutations and polymorphisms in molecular sequences and structures, organelle structures and tissue types, genetic maps, physicochemical data, gene expression profiles, two-dimensional DNA chip images of mRNA expression, two-dimensional gel electrophoresis images of protein expression, data …

A biological database is a collection of data that is organized so that its contents can easily be accessed, managed, and updated. There are two main functions of biological databases:
• To make biological data available to scientists.
   o As much as possible of a particular type of information should be available in one single place (book, site, or database). Published data may be difficult to find or access, and collecting it from the literature is very time-consuming. And not all data is actually published explicitly in an article (genome sequences, for example).
• To make biological data available in computer-readable form.
   o Since analysis of biological data almost always involves computers, having the data in computer-readable form (rather than printed on paper) is a necessary first step.

Data Domains
• Types of data generated by molecular biology research:
   – Nucleotide sequences (DNA and mRNA)
   – Protein sequences
   – 3-D protein structures
   – Complete genomes and maps

When Sanger first discovered the method to sequence proteins, there was a lot of excitement in the field of molecular biology. Initial interest in bioinformatics was propelled by the need to create databases of biological sequences. Biological databases can be broadly classified into sequence and structure databases. Sequence databases hold both nucleic acid and protein sequences, whereas structure databases hold mainly protein structures. The first database was created within a short period after the insulin protein sequence was made available in 1956. Incidentally, insulin was the first protein to be sequenced. The sequence of insulin consists of just 51 residues (analogous to the letters of a sentence). While the initial databases of protein sequences were maintained at individual laboratories, the development of a consolidated formal database, the SWISS-PROT protein sequence database, was initiated in 1986; it now holds about 70,000 protein sequences from more than 5000 model organisms, a small fraction of all known organisms.
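Holding data in computer-readable form usually means a plain-text format such as FASTA, in which each record starts with a '>' header line followed by one or more sequence lines. The sketch below parses that layout into a dictionary. The records are invented examples, not real database entries, and the parser deliberately skips refinements some tools accept (such as ';' comment lines).

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (text after '>') to its full, upper-cased sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())  # a sequence may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)  # close the last record
    return records


example = """>seq1 hypothetical protein fragment
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical DNA fragment
ATGCGTACGT
"""

records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq))  # seq1 is 20 residues (two wrapped lines), seq2 is 10 bases
```

Because headers are used as dictionary keys, a repeated header would overwrite the earlier record; a parser for real downloads would keep records in a list instead.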
These huge varieties of divergent data resources are now available for study and research by both academic institutions and industry. They are made available as public-domain information, in the larger interest of the research community, through the Internet (www.ncbi.nlm.nih.gov).

Databases in general can be classified into primary, secondary and composite databases.
1. A primary database contains information on the sequence or structure alone. Examples include Swiss-Prot and PIR for protein sequences, GenBank and DDBJ for nucleotide sequences, and the Protein Data Bank for protein structures.
2. A secondary database contains information derived from primary databases. A secondary sequence database contains information such as the conserved sequences, signature sequences and active-site residues of protein families, arrived at by multiple sequence alignment of a set of related proteins. A secondary structure database contains entries of the PDB organized in a particular way: entries are classified according to their structure, such as all-alpha proteins, all-beta proteins, etc. These databases also contain information on conserved secondary structure motifs of a particular protein. Some of the secondary databases created and hosted by various researchers at their individual laboratories include SCOP, developed at Cambridge University; CATH, developed at University College London; PROSITE, of the Swiss Institute of Bioinformatics; and eMOTIF, at Stanford.
3. A composite database amalgamates a variety of different primary database sources, which obviates the need to search multiple resources. Different composite databases use different primary databases and different criteria in their search …
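Secondary databases such as PROSITE store signature sequences as patterns. In PROSITE pattern notation, x means any residue, [..] lists allowed residues, {..} lists forbidden ones, (n) or (n,m) gives a repeat count, elements are joined by '-', and < or > anchor the pattern to the N- or C-terminus. The sketch below translates a subset of that notation into a Python regular expression and scans a sequence; it is a simplified illustration, not PROSITE's own scanning tool. The example pattern [AG]-x(4)-G-K-[ST] is the widely cited P-loop (ATP/GTP-binding) signature, but the exact entry should be checked against PROSITE itself; the scanned sequence is made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression.

    Handles x, [..], {..}, (n) and (n,m) repeats, '-' separators, a trailing '.',
    and a leading '<' / trailing '>'. Does not handle '>' inside brackets, e.g. [G>].
    """
    pattern = pattern.strip().rstrip(".")
    head, tail = "", ""
    if pattern.startswith("<"):
        head, pattern = "^", pattern[1:]   # anchored at the N-terminus
    if pattern.endswith(">"):
        tail, pattern = "$", pattern[:-1]  # anchored at the C-terminus
    element_re = r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
    pieces = []
    for element in pattern.split("-"):
        m = re.fullmatch(element_re, element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            piece = "."                           # any residue
        elif residue.startswith("{"):
            piece = "[^" + residue[1:-1] + "]"    # any residue except those listed
        else:
            piece = residue                       # single letter or [..] is already valid regex
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        pieces.append(piece)
    return head + "".join(pieces) + tail


motif = prosite_to_regex("[AG]-x(4)-G-K-[ST].")
hit = re.search(motif, "MKKGPSGSGKSTLL")  # made-up sequence
if hit:
    print(motif, hit.start(), hit.group())
```

For the made-up sequence, the translated pattern [AG].{4}GK[ST] matches GPSGSGKS starting at position 3 (0-based).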
