Customizing Scoring Functions in Molecular Docking

Customizing Scoring Functions rjy Tuan Anh Pham DISSERTATION Submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in Biological and Medical Informatics in the GRADUATE DIVISION of the Copyright 2007 by Tuan Anh Pham ii Acknowledgements First the required content: The text of Chapter11 5 and Chapter11 6 of this dissertation is a revised reprint of the material as it appears in the Journal of Medicinal Chemistry and Journal of Computer- Aided Molecular Design, respectively. The coauthor Ajay Jain listed in these publications directed and supervised the research that forms the basis for the dissertation. Now the good stuff – recognizing those who lent me a hand in this epic journey called graduate school: First and foremost, I must thank my thesis advisor, Ajay Jain. I feel incredibly fortunate to have found his lab and mentorship. In a confusing time when I was wondering why I was in grad school, he gave me direction and focus. From the outset, our minds were aligned: we both wanted to devote our time to real-world problems. As an engineer with a Computer Science degree, I feel my interest peaks when I can perceive people making use of my work. In that sense, drug discovery and molecular docking were the perfect fit for me in this program. Having that fit between student and advisor has been a pleasure. Thank you, Ajay. My steady progression from orals to thesis defense was made possible by my thesis committee. Together with Ajay, Andrej Sali and Mark Segal offered excellent insights that helped me forge ahead with my graduate career. Thank you, Andrej and Mark for letting me graduate. iii BMI continues to grow and evolve through the hard work and effort of our directors, Patsy Babbitt & Tom Ferrin. Thanks, Tom. Patsy is so incredibly busy; she always seems on the go from one meeting to the next. Yet every time she sees me, she always stops for a second just to say hi. It’s the little things like that which make her a heartwarming figure to me and this program. Also, thank you Patsy for letting me squat in your lab in QB3 the last couple of years. I had more pictures up at my squattin’ desk in the Babbitt Lab than in my own office at the CRI. Granted, some of those were to fend off other potential squatters, but still. You always made me feel welcome. Thank you, Patsy. The program runs like a well-oiled machine because of our excellent program coordinators. When I first applied, Barbara Paschke was the face of BMI for me. She was the one who encouraged me to apply to the Ph.D. program. Thanks for the nudge, Barbara. Denise picked up the reins admirably after a year of chaos when Barbara left. Good work, Denise! Then Becca Brown took the job to new heights. So high, she was scooped up by the another program, Biophysics. I’m glad that I get to graduate under her watch. To the Appreciativest Valorwoman, you made my volleyball dreams come true. Our newest coordinator Julia will no doubt continue this legacy of excellence. Friends are what make the long days of graduate school enjoyable. There’s one group in particular that I have to give mad props to: Ben “Sleeveless Polo” Lauffer, Hesper “Sista Assista” Rego, Holly “The Recruiter” Atkinson, Kris “Thunder” Kuchenbecker, Laura “Kickeyball” Lavery, Liz “Smiley” Clarke, and Ranyee “Clutch” Chiang. Thank you, Thunderforce Volleyball for all the Championship Shorts. iv My lab mates were my brothers and sisters in science; they made it worthwhile to come into work. As postdocs, Jane Fridlyand and Taku Tokuyasu represented the finish line of where I wanted to be. My office mate, Chris Kingsley introduced me into the sordid world of professional cycling, among other interesting things on the interwebs. Lawrence Hon showed a novice on the tennis court what a sweet stroke looks like. His love of side gadget projects was also highly contagious. When Barbara Novak turned it down, I became Lab Manager by default. We hung out in our office and shared many a lunch. When we weren’t shooting the breeze, we encouraged each other to get a move on towards graduation. When James Langham joined the lab a few months before my graduation, I was finally no longer the baby in the lab – now he had the youngest tenure. James, you were a fantastic, fantastic resource for me while I was writing. Thanks to you, I now know more about the currency markets than I ever did. To Ann, Rebecca, and Hannah you made the Jain Lab feel like family. That’s the highest compliment I can give. Thank you, everyone in the Jain Lab. Shout outs to my fellow classmates! Mark Peterson, it’s a shame your body broke down in shambles because you’re the best tennis player alive (that I know). Mike Kim taught me everything I know about computers. Libusha Kelly, keep it down will ya? I can hear that happy laugh across our entire floor. Juanita Li, what a voice! John Chuang, we sure watched some terrible movies together. Simona Carini, you opened up your beautiful homes to us, but it’s the tiramisu I’ll always treasure. Christina Chaivorapol runs so much faster than I do it’s sorta not fair. Ranyee Chiang, you’re not that mean after all. All good people; you made grad school for me. Thanks guys. v I am who I am because of my family. Mom & Dad, thank you for letting me monopolize the dining room table all day and reminding me to eat every time I came over. Sometimes you just need to blow off steam and not think about work anymore – Lee was my outlet for tennis, videogames, and fun, period. Thanks, bro. Since I can remember, education has always been a priority. They can take away everything you have, but they can’t take away your education. That’s what my Bo taught me. Thank you, Bo. My mother taught me the importance of persevering through hardship to attain my goals. She also taught me to do things with compassion. Thank you, Me. You can’t ask for a better support network than this Phamily. Love you guys. And lastly to my Wifey – She is the bedrock upon which I build this life. This work would never have been completed without her encouragement. Through the highs and lows of the last five years and one quarter, she has stood by me. She’s celebrated with me, giving me the Thunder Punch. She’s even been to work with me. We have traveled the world together. We have traveled this road together. It is to her with love that I dedicate this dissertation. Thank you, An. vi Abstract Customizing Scoring Functions in Molecular Docking Tuan A. Pham In drug discovery, where a model of the protein structure is known, molecular docking is a well-established approach for predictive modeling. Docking algorithms utilize a search strategy for exploring ligand poses within an active site and a scoring function for evaluating the poses. This dissertation explores improvements to both aspects of docking, emphasizing the use of machine learning methods for improving scoring functions. The work is built upon an extensible software platform for modeling molecular interactions, called Surflex. Performance evaluation has been carried out on benchmarks that have been made publicly available, some of which were constructed in the course of this work. The novel tool pdbgrind, developed as part of the infrastructure for this work, was used to generate the large amount of data necessary to create adequate training and test sets. While the dissertation focuses most strongly on the scoring function problem in docking, some effort was also spent on the tightly coupled problem of search, and modest improvements were shown by enhancing Surflex’s representation of protein active sites. The bulk of the work describes improvements to empirical scoring functions for protein-ligand interactions. This dissertation demonstrates a robust method for tuning scoring function parameters to improve modeling of known binding phenomena. Penalties for inter-atomic overlap and same-charge repulsion were learned using vii synthetic negative data. The new function was shown to be equivalent or better than the original function in terms of screening utility on a large and diverse benchmark. This approach was generalized for the entire scoring function to support the use of multiple constraints in refining scoring function parameters. Using the constraint-based optimization procedure, users can exploit multiple types of data to customize functions to suit a particular task or a particular protein target or family of targets. Significant improvement to screening utility was shown using data typical of applications in docking. The main contributions of this dissertation are generalizable methods for generating and exploiting multiple types of data in refining scoring functions for docking. The approaches can be extended to other areas, including quantitative structure activity prediction or protein folding. viii Table of Contents Chapter 1 Introduction.................................................................................................. 2 1.1. Drug Development Cycle .............................................................................. 3 1.2. High-Throughput Screening.......................................................................... 5 1.3. Protein-Ligand Binding................................................................................. 7 1.4. Molecular Mechanics Force Fields.............................................................. 10 1.5. Machine Learning........................................................................................ 12 1.6. Conclusion ..................................................................................................

Load more