The Making of Margaret O. Dayhoff's Atlas of Protein Sequence And
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Lecture 5: Sequence Alignment – Global Alignment
Sequence Alignment COSC 348: Computing for Bioinformatics • Sequence alignment is a way of arranging two or more sequences of characters to identify regions of similarity – b/c similarities may be a consequence of functional or Lecture 5: evolutionary relationships between these sequences. Sequence Alignment – Global Alignment • Another definition: Procedure for comparing two or more sequences by searching for a series of individual characters that Lubica Benuskova, Ph.D. are in the same order in those sequences – Pair-wise alignment: compare two sequences – Multiple sequence alignment: compare > 2 sequences http://www.cs.otago.ac.nz/cosc348/ 1 2 Similarity versus identity Sequence alignment: example • In the process of evolution, from one generation to the next, and from one species to the next, the amino acid sequences of • Task: align abcdef with somehow similar abdgf an organism's proteins are gradually altered through the action of DNA mutations. For example, the sequence: • Write second sequence below the first one – ALEIRYLRD • could mutate into the sequence: ALEINYLRD abcdef abdgf • in one generation and possibly into AQEINYQRD • Move sequences to give maximum match between them. • over a longer period of evolutionary time. – Note: a hydrophobic amino acid is more likely to stay • Show characters that match using vertical bar. hydrophobic than not, since replacing it with a hydrophilic residue could affect the folding and/or activity of the protein. 3 4 Sequence alignment: example Quantitative global alignments abcdef • We are looking for an alignment, which || – maximizes the number of base-to-base matches; abdgf – if necessary to achieve this goal, inserts gaps in either sequence (a gap means a base-to-nothing match); • In order to maximise the alignment, we insert gap between – the order of bases in each sequence must remain and in lower sequence to allow and to align b d d f preserved and abcdef – gap-to-gap matches are not allowed. -
Novel Bioinformatics Applications for Protein Allergology
AND ! "#$% &'()* +% + ,-.,-/,0 + 121,..0-10- ! 3 4 33!!3 ,,,1/ !"# $% # $# &'()$ $*+,'-./ $ "Por la ciencia, como por el arte, se va al mismo sitio: a la verdad" Gregorio Marañón Madrid, 19-05-1887 - Madrid, 27-03-1960 List of Papers This thesis is based on the following papers, which are referred to in the text by their Roman numerals. I Martínez Barrio, Á., Soeria-Atmadja, D., Nister, A., Gustafsson, M.G., Hammerling, U., Bongcam-Rudloff, E. (2007) EVALLER: a web server for in silico assessment of potential protein allergenicity. Nucleic Acids Research, 35(Web Server issue):W694-700. II Martínez Barrio, Á.∗, Lagercrantz, E.∗, Sperber, G.O., Blomberg, J., Bongcam-Rudloff, E. (2009) Annotation and visualization of endogenous retroviral sequences using the Distributed Annotation System (DAS) and eBioX. BMC Bioinformatics, 10(Suppl 6):S18. III Martínez Barrio, Á., Xu, F., Lagercrantz, E., Bongcam-Rudloff, E. (2009) GeneFinder: In silico positional cloning of trait genes. Manuscript. IV Martínez Barrio, Á., Ekerljung, M., Jern, P., Benachenhou, F., Sperber, -
Lie Therory and Hermite Polynomials
International Journal For Research In Advanced Computer Science And Engineering ISSN: 2208-2107 A Computational Method for Sequence analyzing Rekardo Fosalli, Teriso Joloobi School of Computer Science, University of De Cyprus, Cyprus Abstract: To study how normal cellular activities are altered in different disease states, the biological data must be combined to form a comprehensive picture of these activities. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data. This includes nucleotide and amino acid sequences, protein domains, and protein structures. Common uses of bioinformatics include the identification of candidate genes and nucleotides (SNPs). Often, such identification is made with the aim of better understanding the genetic basis of disease, unique adaptations, desirable properties (esp. in agricultural species), or differences between populations. In a less formal way, bioinformatics also tries to understand the organisational principles within nucleic acid and protein sequences. Keywords: Clustering, RNA sequences, RNA-Seq, Data Mining, Bioinformatics, Graph Mining. 1. INTRODUCTION Bioinformatics has become an important part of many areas of biology. In experimental molecular biology, bioinformatics techniques such as image and signal processing allow extraction of useful results from large amounts of raw data. In the field of genetics and genomics, it aids in sequencing and annotating genomes and their observed mutations. It plays a role in the text mining of biological literature and the development of biological and gene ontologies to organize and query biological data. It also plays a role in the analysis of gene and protein expression and regulation. -
Sophie Dumont
Sophie Dumont University of California, San Francisco 513 Parnassus Ave, HSW-613, San Francisco, CA 94143-0512, USA Office: (415) 502-1229 / Cell: (510) 229-9846 [email protected] ; http://dumontlab.ucsf.edu/ EDUCATION 9/2000-12/2005 Ph.D., Biophysics, University of California, Berkeley, CA 10/1999-8/2000 Candidate for D.Phil., Theoretical Physics, University of Oxford, UK 9/1995-6/1999 B.A., Physics, magna cum laude, Princeton University, Princeton, NJ RESEARCH POSITIONS & TRAINING 7/2012-present Assistant Professor, University of California, San Francisco Dept of Cell & Tissue Biology and Dept of Cellular & Molecular Pharmacology Member, QB3 California Institute for Quantitative Biosciences Member, NSF Center for Cellular Construction Affiliate Member, UCSF Cancer Center Graduate program member: Bioengineering, Biomedical Sciences, Biophysics, Tetrad 7/2006-5/2012 Postdoctoral Fellow, Harvard Medical School Mentor: Prof. Timothy Mitchison (Systems Biology) 7/2006-6/2009 Junior Fellow, Harvard Society of Fellows 9/2000-12/2005 Graduate Student, University of California, Berkeley Advisor: Prof. Carlos Bustamante (Physics and Molecular & Cell Biology) 10/1999-8/2000 Graduate Student, University of Oxford Advisor: Prof. Douglas Abraham (Theoretical Physics) 6/1997-5/1999 Undergraduate Student, Princeton University Advisor: Prof. Stanislas Leibler (Physics) AWARDS 2016-2021 NSF CAREER Award 2016 Margaret Oakley Dayhoff Award of the Biophysical Society 2015-2020 NIH New Innovator Award (DP2) 2013-2018 Rita Allen Foundation and Milton -
Information-Theoretic Bounds of Evolutionary Processes Modeled As a Protein Communication System
INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM Liuling Gong, Nidhal Bouaynaya∗ and Dan Schonfeld University of Illinois at Chicago, Dept. of Electrical and Computer Engineering, ABSTRACT can be investigated in the context of engineering communica- In this paper, we investigate the information theoretic bounds tion codes. In particular, it is legitimate to ask at what rate of the channel of evolution introduced in [1]. The channel of can the genomic information be transmitted. And what is the evolution is modeled as the iteration of protein communica- average distortion between the transmitted message and the tion channels over time, where the transmitted messages are received message at this rate? Shannon’s channel capacity protein sequences and the encoded message is the DNA. We theorem states that, by properly encoding the source, a com- compute the capacity and the rate-distortion functions of the munication system can transmit information at a rate that is protein communication system for the three domains of life: as close to the channel capacity as one desires with an arbi- Achaea, Prokaryotes and Eukaryotes. We analyze the trade- trarily small transmission error. Conversely, it is not possi- off between the transmission rate and the distortion in noisy ble to reliably transmit at a rate greater than the channel ca- protein communication channels. As expected, comparison pacity. The theorem, however, is not constructive and does of the optimal transmission rate with the channel capacity in- not provide any help in designing such codes. In the case dicates that the biological fidelity does not reach the Shan- of biological communication systems, however, evolution has non optimal distortion. -
Testing the Independence Hypothesis of Accepted Mutations for Pairs Of
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Computer Science and Engineering: Theses, Computer Science and Engineering, Department of Dissertations, and Student Research 12-2016 TESTING THE INDEPENDENCE HYPOTHESIS OF ACCEPTED MUTATIONS FOR PAIRS OF ADJACENT AMINO ACIDS IN PROTEIN SEQUENCES Jyotsna Ramanan University of Nebraska-Lincoln, [email protected] Follow this and additional works at: http://digitalcommons.unl.edu/computerscidiss Part of the Bioinformatics Commons, and the Computer Engineering Commons Ramanan, Jyotsna, "TESTING THE INDEPENDENCE HYPOTHESIS OF ACCEPTED MUTATIONS FOR PAIRS OF ADJACENT AMINO ACIDS IN PROTEIN SEQUENCES" (2016). Computer Science and Engineering: Theses, Dissertations, and Student Research. 118. http://digitalcommons.unl.edu/computerscidiss/118 This Article is brought to you for free and open access by the Computer Science and Engineering, Department of at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in Computer Science and Engineering: Theses, Dissertations, and Student Research by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln. TESTING THE INDEPENDENCE HYPOTHESIS OF ACCEPTED MUTATIONS FOR PAIRS OF ADJACENT AMINO ACIDS IN PROTEIN SEQUENCES by Jyotsna Ramanan A THESIS Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfilment of Requirements For the Degree of Master of Science Major: Computer Science Under the Supervision of Peter Z. Revesz Lincoln, Nebraska December, 2016 TESTING THE INDEPENDENCE HYPOTHESIS OF ACCEPTED MUTATIONS FOR PAIRS OF ADJACENT AMINO ACIDS IN PROTEIN SEQUENCES Jyotsna Ramanan, MS University of Nebraska, 2016 Adviser: Peter Z. Revesz Evolutionary studies usually assume that the genetic mutations are independent of each other. However, that does not imply that the observed mutations are indepen- dent of each other because it is possible that when a nucleotide is mutated, then it may be biologically beneficial if an adjacent nucleotide mutates too. -
Sophie Dumont
Sophie Dumont University of California, San Francisco 513 Parnassus Ave, HSW-613, San Francisco, CA 94143-0512, USA Office: (415) 502-1229 / Cell: (510) 229-9846 [email protected] ; http://dumontlab.ucsf.edu/ EDUCATION 9/2000-12/2005 Ph.D., Biophysics, University of California, Berkeley, CA 10/1999-8/2000 Candidate for D.Phil., Theoretical Physics, University of Oxford, UK 9/1995-6/1999 B.A., Physics, magna cum laude, Princeton University, Princeton, NJ RESEARCH POSITIONS & TRAINING 7/2012-present Assistant Professor, University of California, San Francisco Dept of Cell & Tissue Biology Dept of Cellular & Molecular Pharmacology Member, NSF Center for Cellular Construction Member, QB3 California Institute for Quantitative Biosciences Associate Member, UCSF Cancer Center Graduate program member: Biomedical Sciences Biophysics Bioengineering Tetrad 7/2006-5/2012 Postdoctoral Fellow, Harvard Medical School 7/2006-6/2009 Junior Fellow, Harvard Society of Fellows Mentor: Prof. Timothy Mitchison (Systems Biology) Collaborator: Prof. Edward Salmon (UNC Chapel Hill) Topic: Mechanochemistry of cell division 9/2000-12/2005 Graduate Student, University of California, Berkeley Advisor: Prof. Carlos Bustamante (Physics and Molecular & Cell Biology) Collaborators: Prof. Ignacio Tinoco & Prof. Jan Liphardt (UC Berkeley) Prof. Anna Marie Pyle (Yale University) Thesis: Force and helicase-catalyzed mechanical unfolding of single RNA molecules 10/1999-8/2000 Graduate Student, University of Oxford Advisor: Prof. Douglas Abraham (Theoretical Physics) Topic: Statistical mechanics of neural networks 6/1997-5/1999 Undergraduate Student, Princeton University Advisor: Prof. Stanislas Leibler (Physics) Thesis: Thermal sensing network of E. coli Page 1 of 5 AWARDS 2016-2021 NSF CAREER Award 2016 Margaret Oakley Dayhoff Award of the Biophysical Society 2015-2020 NIH New Innovator Award (DP2) 2013-2018 Rita Allen Foundation and Milton E. -
A Thesis Entitled Homology-Based Structural Prediction of the Binding
A Thesis entitled Homology-based Structural Prediction of the Binding Interface Between the Tick-Borne Encephalitis Virus Restriction Factor TRIM79 and the Flavivirus Non-structural 5 Protein. by Heather Piehl Brown Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Biomedical Science _________________________________________ R. Travis Taylor, PhD, Committee Chair _________________________________________ Xiche Hu, PhD, Committee Member _________________________________________ Robert M. Blumenthal, PhD, Committee Member _________________________________________ Amanda Bryant-Friedrich, PhD, Dean College of Graduate Studies The University of Toledo December 2016 Copyright 2016, Heather Piehl Brown This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author. An Abstract of Homology-based Structural Prediction of the Binding Interface Between the Tick-Borne Encephalitis Virus Restriction Factor TRIM79 and the Flavivirus Non-structural 5 Protein. by Heather P. Brown Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Biomedical Sciences The University of Toledo December 2016 The innate immune system of the host is vital for determining the outcome of virulent virus infections. Successful immune responses depend on detecting the specific virus, through interactions of the proteins or genomic material of the virus and host factors. We previously identified a host antiviral protein of the tripartite motif (TRIM) family, TRIM79, which plays a critical role in the antiviral response to flaviviruses. The Flavivirus genus includes many arboviruses that are significant human pathogens, such as tick-borne encephalitis virus (TBEV) and West Nile virus (WNV). -
The Origins of Bioinformatics.Pdf
PERSPECTIVES d’éthique en désaccord avec la directive européenne. Le Medicine in the 21st Century, December 4–5, 1998, 2nd nomic environment that includes various Monde 15 June (2000). edn 47–53 (AACC, Washington DC, 1998). social values, research practices and business 7. Kolata, G. Special Report: Who owns your genes? New 26. Caulfield, T. & Gold, E. R. Genetic testing, ethical York Times 15 May (2000). concerns, and the role of patent law. Clin. Genet. 57, pressures. We are mindful that, in some situa- 8. Ramirez, A. School given patent to clone humans. 370–375 (2000). tions, modifying patent law may reduce one National Post 16 May (2000). 27. Bruzzone, L. The research exemption: a proposal. Am. 9. Sagar, A., Daemmrich, A. & Ashiya, M. The tragedy of Intell. Prop. Law Assoc. QL 21, 52 (1993). problem (such as permitting more competi- commoners: biotechnology and its publics. Nature 28. Parker, D. Patent infringement exemptions for life science tion), while magnifying others (such as Biotechnol. 18, 2–4 (2000). research. Houston J. Intl Law 16, 615 (1994). 10. Angell, M. Is academic medicine for sale? N. Engl. J. 29. Gold, E. R. in Commercialization of Genetic Research: reducing incentives to conduct research and Med. 20, 1516–1518 (2000). Ethical, Legal and Policy Issues (eds Caulfield, T. & development). Nevertheless, although care 11. Pottagem, A. The inscription of life in law: gene, patents, Williams–Jones, B.) 63–78 (Plenum, New York, 1999). and bio-politics. The Modern Law Review 61, 740–765 30. Schissel, A., Merz, J. F. & Cho, M. K. Survey confirms must be taken, this debate needs to progress (1998). -
History and Epistemology of M Olecular Biology and Beyond
M A X - P L A N C K - I N S T I T U T F Ü R W I S S E N S C H A F T S G E S C H I C H T E M ax Pl anc k Ins t i tut e for the His tory of Sc i enc e 2 0 0 6 P R E P R I N T 3 1 0 W o r k s h o p H i s t o r y and Ep i s t e m o logy of M o lecu la r B i o logy and Beyond : P r ob le m s and Pe r spec t i ves Collecting and Experimenting: The Moral Economies of Biological Research, 1960s-1980s. Bruno J. Strasser I. Introduction Experimentation is often singled out as the most distinctive feature of modern science. Our very idea of modern science, that we trace back to the Scientific Revolution, gives a central place to this particular way of producing knowledge. The current epistemic, social and cultural authority of science rests largely on the possibility of experimentation in the laboratory. The history of the sciences over the last four centuries reminds us that experimentation has only been one out of many ways in which scientific knowledge has been produced. However, by most accounts, experimentation has progressively come to dominate all the others, in most fields of science, from high energy physics to molecular biology. In the life sciences, the rise of experimentalism has been cast against the natural history tradition, leading to its progressive demise. -
Coding Sequences: a History of Sequence Comparison Algorithms As a Scientiªc Instrument
Coding Sequences: A History of Sequence Comparison Algorithms as a Scientiªc Instrument Hallam Stevens Harvard University Sequence comparison algorithms are sophisticated pieces of software that com- pare and match identical or similar regions of DNA, RNA, or protein se- quence. This paper examines the origins and development of these algorithms from the 1960s to the 1990s. By treating this software as a kind of scien- tiªc instrument used to examine sets of biological objects, the paper shows how algorithms have been used as different sorts of tools and appropriated for dif- ferent sorts of uses according to the disciplinary context in which they were deployed. These particular uses have made sequences themselves into different kinds of objects. Introduction Historians of molecular biology have paid signiªcant attention to the role of scientiªc instruments and their relationship to the production of bio- logical knowledge. For instance, Lily Kay has examined the history of electrophoresis, Boelie Elzen has analyzed the development of the ultra- centrifuge as an enabling technology for molecular biology, and Nicolas Rasmussen has examined how molecular biology was transformed by the introduction of the electron microscope (Kay 1998, 1993; Elzen 1986; Rasmussen 1997).1 Collectively, these historians have demonstrated how instruments and other elements of the material culture of the labora- tory have played a decisive role in determining the kind and quantity of knowledge that is produced by biologists. During the 1960s, a versatile new kind of instrument began to be deployed in biology: the electronic computer (Ceruzzi 2001; Lenoir 1999). Despite the signiªcant role that 1. One could also point to Robert Kohler’s (1994) work on the fruit ºy, Jean-Paul Gaudillière (2001) on laboratory mice, and Hannah Landecker (2007) on the technologies of tissue culture. -
Bioinformatics Scoring Matrices
Bioinformatics Scoring Matrices David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Scoring Matrices • Learning Objectives – To explain the requirement for a scoring system reflecting possible biological relationships – To describe the development of PAM scoring matrices – To describe the development of BLOSUM scoring matrices (c) David Gilbert 2008 Scoring matrices 2 Scoring Matrices • Database search to identify homologous sequences based on similarity scores • Ignore position of symbols when scoring • Similarity scores are additive over positions on each sequence to enable DP • Scores for each possible pairing, e.g. proteins composed of 20 amino acids, 20 x 20 scoring matrix (c) David Gilbert 2008 Scoring matrices 3 Scoring Matrices • Scoring matrix should reflect – Degree of biological relationship between the amino-acids or nucleotides – The probability that two AA’s occur in homologous positions in sequences that share a common ancestor • Or that one sequence is the ancestor of the other • Scoring schemes based on physico-chemical properties also proposed (c) David Gilbert 2008 Scoring matrices 4 Scoring Matrices • Use of Identity – Unequal AA’s score zero, equal AA’s score 1. Overall score can then be normalised by length of sequences to provide percentage identity • Use of Genetic Code – How many mutations required in NA’s to transform one AA to another • Phe (Codes UUU & UUC) to Asn (AAU, AAC) • Use of AA Classification – Similarity based on properties such