
POINT ACCEPTED MUTATION (PAM) AND BLOCK SUBSTITUTION MATRIX (BLOSUM) INTERACTIVE PROGRAMS ON KELLEYBIOINFO.ORG
_______________
A Project
Presented to the
Faculty of
San Diego State University
_______________
In Partial Fulfillment
of the Requirements for the Degree
Master of Science in Bioinformatics and Medical Informatics
with a Concentration in
Professional Science
_______________
by
Ari Widjaja
Spring 2017

Copyright © 2017
by
Ari Widjaja
All Rights Reserved

DEDICATION
I dedicate this project to my father, Wahono Widjaja, and my mother, Lanny Harrijanto, who raised me to be the person I am today. I also dedicate this project to my lovely wife, Dewayani A. Windy, and our daughter, Widya Suryadewi, for their unconditional love, support, and encouragement to finish this project and to become the best that I can be. I thank the Lord Jesus Christ for all my achievements and for the journey toward becoming who I want to be. I am grateful for the bad and good days, the tears and laughter, and for everything that I went through.

"Strength doesn't come from what you can do. It comes from overcoming the things you once thought you couldn't." - Rikki Rogers

ABSTRACT OF THE PROJECT
Point Accepted Mutation (PAM) and Block Substitution Matrix (BLOSUM) Interactive Programs on Kelleybioinfo.org
by
Ari Widjaja
Master of Science in Bioinformatics and Medical Informatics with a Concentration in Professional Science
San Diego State University, 2017

Kelleybioinfo.org is interactive bioinformatics software, built by Dr. Scott Kelley's team in 2013, that teaches basic bioinformatics theory in the BIOMI 563 class at San Diego State University (SDSU). Its five modules each cover two algorithms, giving students an interactive learning platform for 10 bioinformatics algorithms. Kelleybioinfo.org is accessible through web browsers, handheld devices, and tablets.
The National Science Foundation (NSF) has recently awarded a grant for further development of this software; as part of that development, we introduce and develop a scoring matrix module. The module covers the basic concepts of Point Accepted Mutation (PAM) and Block Substitution Matrix (BLOSUM). Scoring matrices such as PAM, BLOSUM, and the Gonnet matrix have been widely used to score matches and mismatches (substitutions) in sequence alignments. The module simulates global (PAM) or local (BLOSUM) protein sequence alignment and determines the mutability of amino acids from the log-odds matrix. There has been no interactive tutorial showing how the PAM or BLOSUM log-odds matrices are generated, so having this module on kelleybioinfo.org will help bioinformatics students understand the concepts behind PAM and BLOSUM matrices.

TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
ACKNOWLEDGEMENTS
CHAPTER
1 INTRODUCTION
2 MATERIALS AND METHODS
   Observed Frequency Calculation
   Expected Frequency Calculation
   Log-Odd Matrix
   Methods
3 RESULTS
   Module Main Page
   Concept Mode
   Interactive Mode
4 DISCUSSION
REFERENCES

LIST OF FIGURES
Figure 1. BLOSUM_General.py script.
Figure 2. BLOSUM_Interactive.html. The JavaScript Object Notation (JSON) data produced by both the PAM and BLOSUM Python scripts are fetched from the server and handled with the jQuery $.getJSON method.
Figure 3. Kelley Bioinformatics main page.
Figure 4. Probability module's icon.
Figure 5. BLOSUM concept mode page. In concept mode, the user can follow the steps to calculate sequence probabilities and generate the subsequent matrices.
Figure 6. PAM interactive (quiz) mode with blank tables.
Figure 7. BLOSUM interactive (quiz) mode with blank tables.

ACKNOWLEDGEMENTS
I would like to express special gratitude to my graduate advisor, Dr. Scott T. Kelley, for giving me the opportunity to work on this project, and I would like to thank my project panel members, Dr. Robert Edwards and Dr. Barbara Bailey, for their support and advice. Additionally, I would like to thank my colleague and friend Dennis Didulo for his guidance and technical consultation throughout this project.

CHAPTER 1
INTRODUCTION
In bioinformatics, one of the most fundamental tasks is to decipher the information hidden in our DNA and to estimate how one species is related to another. Sequence alignment is one of the main approaches bioinformatics uses to answer questions such as how apes are related to humans. Sequence alignment is the process of arranging DNA, RNA, or protein sequences to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences [1]. Alignment algorithms such as Smith-Waterman and Needleman-Wunsch can be used to match sequences and, further, to infer a hypothetical ancestral sequence from two aligned sequences, but a scoring matrix is needed to quantify how similar the aligned sequences are.
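As a minimal sketch of how a scoring matrix quantifies an alignment's similarity: for an ungapped alignment, the score is simply the sum of the matrix entries for each aligned pair of residues. The three-letter alphabet and the scores in the hypothetical TOY_MATRIX are illustrative placeholders, not an authoritative excerpt of PAM or BLOSUM, and gapped alignments (as produced by Needleman-Wunsch or Smith-Waterman) would additionally need gap penalties, which this sketch omits.

```python
# Hypothetical symmetric substitution scores for a toy three-letter alphabet.
# Values are illustrative placeholders only; only one order of each pair is stored.
TOY_MATRIX = {
    ("A", "A"): 4, ("A", "G"): 0, ("A", "W"): -3,
    ("G", "G"): 6, ("G", "W"): -2,
    ("W", "W"): 11,
}


def pair_score(a: str, b: str) -> int:
    """Look up the score for an aligned pair, trying both orders (matrix is symmetric)."""
    if (a, b) in TOY_MATRIX:
        return TOY_MATRIX[(a, b)]
    return TOY_MATRIX[(b, a)]


def score_ungapped(seq1: str, seq2: str) -> int:
    """Sum the substitution scores of two equal-length, gap-free aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("an ungapped alignment requires equal-length sequences")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


print(score_ungapped("AGWA", "AGWG"))  # → 21 (4 + 6 + 11 + 0)
```

Identities contribute large positive scores and the single A/G mismatch contributes 0, so the total rewards the mostly conserved alignment.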
The score assigned to each possible substitution or identity is derived from how frequently that pair occurs in alignments of related proteins, and it also reflects how frequently each amino acid occurs in nature [2]. By convention, the score indicates how probable it is that two amino acids are aligned by chance. A high score indicates that the two amino acids are evolutionarily related and that the probability of their being aligned by chance is low; conversely, a low score indicates a high probability that the two amino acids are aligned by chance and are evolutionarily unrelated [2].

Although numerous scoring matrices exist today, such as Gonnet [3], PET [4], Risler [5], and Overington [6], the two fundamental scoring matrices are Point Accepted Mutation (PAM) and Block Substitution Matrix (BLOSUM). The PAM matrix was introduced by Margaret Dayhoff in “A Model of Evolutionary Change in Proteins” [7] as a scoring matrix built from the observed amino acid substitutions of 34 closely related protein superfamilies grouped into 71 evolutionary trees. PAM matrices are based on a Markov model of protein evolution [8], using mutations observed in ungapped, global alignments. The PAM1 matrix, which serves as the basis for the series, estimates the rate of substitution expected when 1% of the amino acids have changed; matrices for larger evolutionary distances, such as PAM250, are extrapolated by repeatedly multiplying PAM1 by itself. BLOSUM, on the other hand, was developed by Steven Henikoff and Jorja G. Henikoff in their 1992 paper “Amino Acid Substitution Matrices from Protein Blocks” [9]. In contrast to PAM matrices, BLOSUM matrices were derived from 2000 blocks of aligned sequence segments representing more than 500 groups of related proteins [9], and they are based on an implicit model of evolution, using local alignments of closely related proteins [2].
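The PAM1 extrapolation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the module's actual implementation: the three-letter alphabet (A, B, C), the PAM1-style probabilities in the hypothetical MUTATION matrix, and the uniform BACKGROUND frequencies are all made up for the example. Here MUTATION[i][j] is taken to be the probability that residue i is replaced by residue j in one PAM unit (1% of residues changed), PAM-n is obtained by multiplying that matrix by itself n times, and each entry is converted to a log-odds score by comparing the PAM-n replacement probability with the background frequency of the replacing residue, on an assumed 10·log10 scale.

```python
import math

# Hypothetical PAM1-style mutation probabilities for a toy 3-letter alphabet.
# MUTATION[i][j] = probability that residue i is replaced by residue j in one
# PAM unit; each row sums to 1 and 1% of residues change per unit.
ALPHABET = ["A", "B", "C"]
MUTATION = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]
# Assumed background (natural) frequency of each residue.
BACKGROUND = [1 / 3, 1 / 3, 1 / 3]

Matrix = list[list[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pam_n(m: Matrix, n: int) -> Matrix:
    """Extrapolate PAM-n by multiplying the PAM1 matrix by itself n times."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


def log_odds(m: Matrix, freqs: list[float], scale: float = 10.0) -> list[list[int]]:
    """Round scale * log10(replacement probability / background frequency)."""
    size = len(m)
    return [[round(scale * math.log10(m[i][j] / freqs[j])) for j in range(size)]
            for i in range(size)]


for n in (1, 50):
    print(f"PAM{n}:", log_odds(pam_n(MUTATION, n), BACKGROUND))
```

With this toy alphabet the PAM1 scores strongly favour identities, while the PAM50 scores are much flatter, mirroring how higher PAM numbers correspond to greater evolutionary distance.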
Additionally, BLOSUM matrices focus on highly conserved regions of proteins. Unlike PAM matrices, which are extrapolated from PAM1, each BLOSUM matrix is derived directly from blocks whose sequences are clustered at a given percent identity; larger numbers in the naming scheme, such as BLOSUM80, therefore denote matrices built from more similar sequences and correspond to smaller evolutionary distances. Both BLOSUM and PAM matrices are log-odds matrices: each entry is the logarithm of the ratio between how often an amino acid pair occurs in the observed data (observed frequency) and how often that pair would be expected to occur by chance (expected frequency) [9]. Scoring matrices are a standard topic in bioinformatics courses; however, there is currently no interactive tutorial that guides students through constructing them. Kelleybioinfo.org (also known as "Kelley Bioinformatics") is a great venue
