Engineering Proteins from Sequence Statistics: Identifying and Understanding the Roles

Engineering Proteins from Sequence Statistics: Identifying and Understanding the Roles of Conservation and Correlation in Triosephosphate Isomerase

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By
Brandon Joseph Sullivan, B.S.

Graduate Program in the Ohio State Biochemistry Program

The Ohio State University
2011

Dissertation Committee:
Thomas J. Magliery, Advisor
Mark P. Foster
William C. Ray

Copyright by Brandon Joseph Sullivan 2011

Abstract

The structure, function and dynamics of proteins are determined by the physical and chemical properties of their amino acids. Unfortunately, the information encapsulated within a position or between positions is poorly understood. Multiple sequence alignments of protein families allow us to interrogate these questions statistically. Here, we describe the characterization of bioinformatically-designed variants of triosephosphate isomerase (TIM).

First, we review the state-of-the-art for engineering proteins with increased stability. We examine two methodologies that benefit from the availability of large numbers - high-throughput screening and sequence statistics of protein families.

Second, we have deconvoluted what properties are encoded within a position (conservation) and between positions (correlations) by designing TIMs in which each position is the most common amino acid in the multiple sequence alignment. We found that a consensus TIM from a raw sequence database performs the complex isomerization reaction with weak activity as a dynamic molten globule. Furthermore, we have confirmed that the monomeric species is the catalytically active conformation despite being designed from 600+ dimeric proteins. A second consensus TIM from a curated dataset is well folded, has wild-type activity and is dimeric, but it only differs from the raw consensus TIM at 35 nonconserved positions.
These two TIMs differ in the fraction of dataset sequences from eukaryotes and prokaryotes. These distribution differences have led to the breaking and altering of networks of statistical correlations at nonconserved positions, which we demonstrate with mutual information and subset perturbation calculations. Additionally, we show that the curated consensus TIM is an extreme thermostable enzyme. The protein remains half folded at 95 °C and may be the only TIM to completely refold after thermal denaturation.

Third, we wished to understand the determinants of protein stability -- one of biochemistry's most difficult questions. It has been shown that consensus mutations improve the stability of native proteins approximately half the time, but there is no a priori technique to predict which consensus mutations will be stabilizing. We have developed a double-sieve filter that selects stabilizing mutations based on extent of conservation and statistical independence from other positions within the multiple sequence alignment. These two mathematical tests reliably predict stabilizing mutations with greater than 90% accuracy. The statistical algorithm was used to select 15 consensus mutations that together improved the melting temperature of wild-type TIM by nearly 10 °C.

Finally, we designed and characterized a model system for testing the effects of statistically correlated residues. The TIM-knockout from the Keio Collection was engineered for T7 expression and tested for TIM activity complementation. The single gene knockout exhibits differential growth that correlates well to in vitro specific activities. The design and characterization of two libraries are proposed to test the relationship between correlations and protein fitness.

Dedication

Mòran taing

To my family - Heidi, Keegan, Killian, Merlin and Addison.
To my Parents - Brian and Kathy Sullivan

Acknowledgments

Trí na chéile a thógtar na cáisléain

As Venuka, Tom and I developed this project we observed several residues that interact with many other positions - we deemed these as very important residues that define the protein. Looking back, I have adopted a different perspective. I believe that these residues alone are meaningless and only achieve importance through the contribution of the other positions. In that same light, all of my accomplishments have been dependent on the support of my family, friends, mentors and labmates. For that, I owe them everything.

I first want to thank my undergraduate advisor, David Wells, and my many professors who developed my love for science. When I wished to return to academics, it was Sean Taylor who forwarded my curriculum vitae to a newly hired faculty member, Thomas Magliery. I thank Sean for the introduction and thank Tom for the great opportunity to join and start his lab.

I am highly appreciative of all the administrative help and talent I have been blessed with throughout these years, particularly Che Maxwell, Nicole Wade, Peter Sanders, Judith Brown and Jennifer Hambach. I also thank Kevin Dill and Jerry Park for all their help.

I have thoroughly loved my opportunities to teach for the chemistry department. I thank the entire chemistry staff for their training, support, advice and guidance - especially Yiying Wu, Steven Kroner, Christopher Callam, Matthew Stoltzfus, Mary Bailey, Robert Tatz, Eric Heine, Holly Wheaton, Tami Sizemore and Christopher Hadad. I am also grateful to my many students who have brought me great pride and inspiration.

On the other side of the desk, I have enjoyed all of my coursework because of my amazing professors: Dehua Pei, Russ Hille, David Bisaro, Paul Herman, George Marzluf, Mark Foster, Ross Dalbey, Mark Pfeiffer, Jovica Badjic, Thomas Magliery, Charles Bell, Chenglong Li, Michael Chan and Will Ray.
I am also highly appreciative of Ross Dalbey and Jill Rafael-Fortney for directing the Ohio State Biochemistry Program and providing incredible mentorship. I thank David Hart for the weekly Science magazines and for being awesome. I thank surface tension for saving numerous experiments in my graduate career.

I am incredibly thankful for our Graduate School and the wonderful work of Kathleen Wallace, Karen Mayer and Dean Patrick Osmer. I am humbled by their nominations and kind words. Additional thanks are due to Kathleen Wallace and Karen Mayer for their efforts regarding the Preparing Future Faculty Program and my mentor Heather Rhodes of Denison University.

I am also grateful for the many professors who have inspired me in my tenure at Ohio State. Their efforts continue to push and inspire me, especially: Pehr Harbury, Vern Schramm, Jay Keasling, Amy Keating, David Liu, Shelley Copley, David Baker, Peter Schultz, Julius Rebek, David Eisenberg, Dan Bolon, Patricia Babbitt and Daniel Nocera. I am additionally thankful to Pehr and his student Kierstin for MPAX training.

Receiving a doctorate is truly an endurance event. I often feel that completing the journey has been made easier through my participation in endurance sports. Whether it has been a century ride, a triathlon, a marathon or a 15-mile swim, the mental strength, stubbornness and stress relief have been invaluable. These traits have been second only to the friends and teammates who have inspired and pushed me to completion. In particular, I am thankful to labmates turned workout buddies: Matthew Heberling, Ely Porter, Sarah Johnston and Ted Schoenfeldt. I thank my swim team, the Columbus Sharks, for making 5:30 am workouts seem like a good idea. In particular I thank my coaches Tracy Hendershot and Bo Martin and teammate Evan Morrison for cultivating some truly insane swims.

I am grateful for the many friends and classmates I have made in graduate school.
In particular, I wish to thank Christopher Jones, Jeffrey Joyner, John Shimko, Ross Wilson, Ian Kleckner and Kevin Fiala. These students set high benchmarks and provided friendly competition.

I am deeply thankful for my talented (and crazy) labmates. Jason Lavinder was a great friend, fellow football fan, and labmate. If I had known Jason before graduate school, I would have bought stock in Wendy's and retired after receiving my Ph.D. Some of my favorite Magliery memories took place on the golf course with Jason and the awkward brilliance of Sanjay Hari. Quiz team, snow pants, printer cartridges, the -20, lab lingo, wooden seat, B, the Inner Game. Man - I miss Sanjay.

I am also incredibly fortunate to have adopted two lab sisters in Lihua Nie and Brinda R-a-m-a-s-u-b-r-a-m-a-n-i-a-n. Lihua has taught me that no detail is too small, to be persistent and never come up short, and that setbacks are often little. She has also taught me that a trilingual-pint-sized-analytical-organic-biological-chemist wielding a hammer is a force to be reckoned with. In all seriousness, Lihua is the hardest working individual I have ever met. It has been an honor working with her. Brinda has been a fantastic friend. She has worked on one of the most difficult projects in the lab with grace, patience and unparalleled persistence. It was a great joy for our families to bring Sahana Umbah@@ and Keegan into the world only six weeks apart. Brinda also taught me that "isosbestic point" is a curse word.

I also wish to thank the Genomic Design Group, especially Venuka Durani, Nicholas Callahan, Deepamali Perera and Sidharth Mohan. Of this group, Venuka deserves special thanks. This project would not have been possible without her help. She continually provided invaluable feedback, suggestions, time and the techniques that brought this dissertation to fruition. I am most grateful and happy to have shared authorships with her.
I thank David Mata for autoclaving the rotor in the middle of my electrocompetent cell preparation.

One of my favorite aspects of graduate school has been the mentoring of undergraduates. I have been so lucky to have shared my project with incredibly talented people including Trixy Syu, Miriam Thomas, Deepti Mathur, Tran Nguyen and Samantha Rojas. These students continually surprised me, contributed significantly to the work written here, and brightened each day in the lab. I am so happy that I was able to work side by side with these amazing undergraduates.
