THE BIOINFORMATICIST’S TOOLBOX IN THE POST-GENOMIC AGE: APPLICATIONS AND DEVELOPMENTS

By
DAVID R. SCHREIBER

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2007

Copyright 2007 By David R. Schreiber

ACKNOWLEDGMENTS

Tremendous thanks go to James Deyrup—for the opportunity, the faith, understanding, generosity, goodwill, humanity and love. Thanks also go to Steve Benner—for the freedom, the enthusiasm, the privilege, and for an absurd portion of patience. And thanks go to my family—for being there, wherever “there” chose to be.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT

EXOBIOLOGY AND POST-GENOMIC SCIENCE
    Introduction
    The Conventional Evolutionary Paradigm
    Post-Genomic Science: Modeling Molecular Evolution
        The Markov Model
        Non-Markovian Protein Evolution as a Post-Genomic Tool for Structure Prediction
        Structure Prediction as a Tool for Identifying Long Distance Homologs
    Recruitment of Function
    Correlating the Paleontological Record with Episodes of Sequence Evolution
    Identification of in vitro Behaviors that Contribute to Physiological Function
    Structure Prediction and a Rapidly Searchable Database
    Conclusions

EXTENDING THE PBL METHOD FOR ESTIMATING NUMBERS OF SYNONYMOUS AND NONSYNONYMOUS MUTATIONS TO ACCOUNT FOR AMBIGUITY IN INFERRED ANCESTRAL SEQUENCES
    Introduction
    An Overview of the PBL Method
    Extending the PBL Method to Phylogenetic Trees with Inferred Ancestral Sequences: Putting Ambiguity Back into the Equations

THE ADAPTIVE EVOLUTION DATABASE (TAED): APPLICATION OF THE EXTENDED PBL METHOD TO A LARGE DATASET
    Background
    Results
    Discussion
    Conclusions
    Materials and Methods

DETECTING COMPENSATORY COVARIATION SIGNALS IN PROTEIN EVOLUTION USING RECONSTRUCTED ANCESTRAL SEQUENCES
    Introduction
    Results
        Charge Compensation and the Surface of the Folded Protein
        Charge Compensation in Both Contiguous and Non-Contiguous Position Pairs
        Enhancing the Charge Compensation Signal
        Charge Compensation in Specific Secondary Structural Elements
        Charge Compensation in Buried Residues
    Discussion
        Accounting for the Stronger Signal From Node-Node Comparison
        A Model-Independent Method to Evaluate an Evolutionary Tree
        Darwinian Requirements for Compensatory Covariation
    Methods

THE PLANETARY BIOLOGY OF CYTOCHROME P450 AROMATASES
    Background
    Results
    Discussion
    Conclusions
    Methods

NAMES AND ABBREVIATIONS OF NUCLEIC ACID BASES AND AMINO ACIDS
AN IMPLEMENTATION OF THE EXTENDED PBL METHOD CODED IN JAVA 1.1
LIST OF REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

3-1 A sample listing from TAED
4-1 Frequencies of the average contiguous position pairs
4-2 List of 71 protein families used in this analysis
5-1 Frequency distributions of stem pig duplication substitutions
5-2 Distributions of stem pig duplication substitutions
A-1 Names and abbreviations of nucleic acid bases
A-2 One and three letter symbols for the amino acids

LIST OF FIGURES

1-1 Predicted surface, interior, and secondary structure assignments
1-2 Evolutionary tree showing the evolutionary history of the leptins
1-3 Evolutionary tree showing the evolutionary history of the extracellular
2-1 Two possible ways to get from TCA to CAA
2-2 A hypothetical phylogeny for sequences t1, t2 and t3.
2-3 Parsimony for each of the nucleotides in sequences t1, t2 and t3
2-4 EPSs for ancestral node a2.
2-5 Sequences t1, t2 and t3, numbers of degenerate sites, and average degeneracies
2-6 Numbers of transitions and transversions for each pair of sequences.
2-7 Tallies of average degeneracies and mutations for each pair of sequences
4-1 A leaf-leaf comparison (red) traverses more evolutionary distance
4-2 Attempted detection of charge compensatory covariation signal using leaf-leaf
4-3 Detecting charge compensatory covariation signal using explicitly reconstructed
4-4 Predicting secondary structure using contiguous pairs of compensatory changes.
4-5 Distribution of distances between charge anti-compensatory pairs
4-6 Surface accessibility of charged residues
4-7 A schematic illustration of the use of compensatory