Standard Operating Procedures

BIOINFORMATICS
Standard Operating Procedures for Biological Sequence Analysis

DBT Star College Scheme
Department of Computer Science
PSGR KRISHNAMMAL COLLEGE FOR WOMEN
College of Excellence
An Autonomous Institution - Affiliated to Bharathiar University
Reaccredited with ‘A’ Grade by NAAC
An ISO 9001:2015 Certified Institution
Peelamedu, Coimbatore – 641 004

Published by BLUE HILL PUBLISHERS
Coimbatore - 641 113, Tamil Nadu, India.
Web: www.bluehillpublishers.com

Title : Bioinformatics: Standard Operating Procedures for Biological Sequence Analysis
Language : English
Year : 2018
Author : Department of Computer Science, PSGR Krishnammal College for Women, Coimbatore – 641 004, Tamil Nadu, India.
ISBN Number : 9788193993613

The Copyright shall be vested with PSGR Krishnammal College for Women. All rights reserved. No part of this publication which is material protected by this copyright notice may be reproduced or transmitted or utilized or stored in any form or by any means now known or hereinafter invented, electronic, digital or mechanical, including photocopying, scanning, recording or by any information storage or retrieval system, without prior written permission from the PSGR Krishnammal College for Women, Coimbatore.

Information contained in this book has been published by Blue Hill Publishers and has been obtained by its Authors from sources believed to be reliable and is correct to the best of their knowledge. However, the PSGR Krishnammal College, Publisher and its Authors shall in no event be liable for any errors, omissions or damages arising out of use of this information and specifically disclaim any implied warranties of merchantability or fitness for any particular use.

Published By : Blue Hill Publishers, Coimbatore.

EDITORS
Dr. S. Karpagavalli
Dr. C. Arunpriya
Mrs. R. Kavitha
Mrs. A. S. Kavitha
Dr. J. VijiGripsy
Mrs. N. Deepa
Dr. R. Vishnupriya

PREFACE
Bioinformatics is the application of computer science to solve biology-based problems.
Massive amounts of raw biological data can be translated into insightful information with the help of computational methods. There is a need for the development and application of data-analytical tools, theoretical methods, mathematical modeling and software simulation techniques to explore biological systems. Hence, students of the Computer Science stream must acquire knowledge of the basic principles of biology as well as biological data processing skills. In this regard, the Standard Operating Procedure (SOP) presents a collection of fourteen experiments that teach students how to access genome databanks, perform sequence alignments, predict primary, secondary and tertiary structure, and use protein visualization tools. The purpose of developing these laboratory procedures is to familiarize undergraduate students of the Computer Science stream with the practical skills necessary to process biological data. The Standard Operating Procedure for each exercise is organized as: aim of the experiment, materials required, principle and theory, procedure, expected observations and outcome, and conclusion and interpretation. The unique feature of this SOP is that it explains the principles and theory related to the lab experiments in a clear manner. In other words, it focuses on improving students’ theoretical knowledge and experimental skills by motivating them to learn theory concepts through lab exercises. Students going through this procedure will gain insights into the topics and methods of structural bioinformatics and genome analysis. This SOP provides a platform for students to evaluate the different approaches, and to learn their advantages and disadvantages as well as where to obtain them and how to use them.
Going a step further, a reader who has gained deep insight into the lab exercises described in the Standard Operating Procedure will be able to develop standard algorithms for their own purposes, or even learn how to modify these algorithms for specific applications, given prior computing skills. We would like to thank the DBT Star College scheme and our management for their support in bringing out this Standard Operating Procedure.

FOREWORD
In the current era, a huge amount of biological data has been generated using genome sequencing, microarrays, proteomics, and functional and structural genomics methodology, which has promoted a new multidisciplinary approach: bioinformatics. Sophisticated computer system theories and computing algorithms have motivated researchers to develop powerful tools for analyzing, predicting and understanding data from gene expression, drug design and other emerging genomic and proteomic technologies. In a broader sense, this dramatic shift transformed biology from a purely laboratory-based science to an information science as well. The crucial step in this transformation is to train computer scientists in the knowledge of biological data. This major challenge requires both vision and hard work: vision to set an appropriate agenda for the computational biologist of the future, and hard work to develop a book which provides a base for understanding biological data and the methods to process it. Bioinformatics: Standard Operating Procedures for Biological Sequence Analysis focuses on learning theoretical concepts and practical components through well-defined procedures. This manual is specifically written for students and researchers with a computer science background. All key areas of bioinformatics, such as biological databases, sequence alignment of genes, prediction of protein structure and structural bioinformatics, are covered in this Standard Operating Procedure (SOP).
This SOP deserves praise, as it provides a comprehensive and critical examination of the computational methods needed for analyzing DNA, RNA and protein data. This Standard Operating Procedure is a sophisticated learning resource for students of computer science, as it explains the concepts behind the biological information and the computational methods used to process it. The authors of this Standard Operating Procedure have taken great efforts to explain the principles of biological processing and have presented the steps to carry out the exercises in an elaborate manner. The effective explanation of sequence analysis and the in-depth, up-to-date coverage of all key topics in bioinformatics make this an ideal SOP for computer science students and for researchers to enrich their knowledge of bioinformatics.

Dr. P. Ponmurugan
Associate Professor, Department of Botany,
Bharathiar University, Coimbatore-46

CONTENT
SOP No.  Title
01  Exploration of Resources Available in NCBI and PUBMED
02  Retrieval of a Genbank Entry
03  Retrieval and Analysis of a Gene Sequence “AF375082” in FASTA Format
04  Finding the Official Symbol, Alias Name, Chromosome Number and ID for Gene using NCBI
05  Retrieval and Analysis of a Protein Sequence from Protein Database
06  Primary Structure Analysis of a Protein
07  Secondary Structure Analysis of a Protein
08  Tertiary Protein Structure Analysis using RASMOL
09  Pair-Wise and Multiple Sequence Alignment Using ClustalW
10  Pair-Wise and Multiple Sequence Alignment Using BLAST
11  Alignment of Two Sequences and Determination of PAM Scoring Matrix
12  Alignment of Two Sequences and Determination of BLOSUM Scoring Matrix
13  Similarity Search using BLAST and Interpretation of Results
14  Conversion of Gene Sequence into its Corresponding Amino Acid Sequence

STANDARD OPERATING PROCEDURES FOR BIOLOGICAL SEQUENCE ANALYSIS

SOP-1
EXPLORATION OF RESOURCES AVAILABLE IN NCBI AND PUBMED

1. Aim of the experiment
To explore the resources available on NCBI and PUBMED.

2. Materials required
Computer with internet connectivity to access the World Wide Web.

3. Principle and Theory
The NCBI database provides information on the structure of assembled genomes, assembly names and other metadata, statistical reports, and links to genomic sequence data. Of particular relevance to genome mapping is the Genomes Division of Entrez. Entrez provides integrated access to different types of data for over 600 organisms, including nucleotide sequences, protein sequences with structures, PubMed, MEDLINE and genomic mapping information.

The NCBI Human Genome Map Viewer is a new tool that presents a graphical view of the available human genome sequence data as well as cytogenetic, genetic, physical, and radiation hybrid maps. The Map Viewer provides human genome sequence for finished contigs, the BAC tiling path of finished and draft sequence, and the location of genes, STSs, and SNPs on finished and draft sequences; it is a useful tool for integrating maps and sequence.

There are many other tools and databases at NCBI that are useful for gene mapping projects, including BLAST, GeneMap’99, LocusLink, OMIM, dbSTS, dbSNP, dbEST, and the UniGene databases. BLAST can be used to search DNA sequences for the presence of markers, and to confirm and refine map localizations. LocusLink (Pruitt et al., 2000) presents information on official nomenclature, aliases, sequence accessions, phenotypes, EC numbers, MIM numbers, UniGene clusters, homology, map locations, and related Web sites. The dbSTS and dbEST databases themselves play a lesser role in human and mouse gene mapping endeavors, as their relevant information has already been captured by other more detailed resources (LocusLink, GeneMap’99, UniGene, MGD, and eGenome), but are currently the primary source of genomic information for
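
The integrated access that Entrez offers can also be reached from a program. The following is a minimal sketch, assuming NCBI's public E-utilities web service and its ESearch endpoint, neither of which is named in this SOP; the function names build_esearch_url and fetch_ids are hypothetical and chosen only for illustration. It shows how one query pattern can be pointed at different Entrez databases by changing the db parameter.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base address of NCBI's public E-utilities service (not given in the SOP).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 5) -> str:
    """Build an ESearch URL asking one Entrez database for matching record IDs."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def fetch_ids(url: str) -> list:
    """Download an ESearch result and return its list of record IDs.

    Requires internet connectivity; not called automatically below.
    """
    with urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


if __name__ == "__main__":
    # The same term can be searched in the literature and sequence databases.
    for database in ("pubmed", "nucleotide", "protein"):
        print(build_esearch_url(database, "amylase gene"))
```

Only the URLs are printed here, so the sketch runs without a network connection; calling fetch_ids on one of them would perform the actual search. Heavy or automated use of the service is subject to NCBI's own usage guidelines.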
