Combinatorial Approaches to Study Protein Stability: Design and Application of Cell-Based Screens to Engineer Tumor Suppressor Proteins
DISSERTATION
Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of the Ohio State University
By
Brinda Ramasubramanian, M.Sc.
Graduate Program in Chemistry
The Ohio State University
2012
Dissertation Committee:
Thomas J Magliery, Advisor
Ross E Dalbey
Karin Musier-Forsyth
Copyright by
Brinda Ramasubramanian
2012
Abstract
Tumor suppressor protein p53 is a transcription activation factor that is found mutated in
more than 50 percent of human cancers. Despite its pathological significance, there is no
robust, in vivo, bacterial screen to select for functional mutants of p53. We have developed a transcription interference screen for p53 core domain based on an artificial p53-responsive lac operon, controlling the expression of GFPuv in the host plasmid pGFPuv. The operator region of the lac promoter was replaced with a library of p53 binding elements based on the consensus sequence (RRRCWWGYYY)2. Wild type or
wt-like p53 binds to this site, blocking the polymerase leading to a non-fluorescent
phenotype. p53-Quad was expressed under the control of another promoter. The host
plasmid pACBAD-p53 can be co-maintained in the cell with pGFPuv. Known hotspot
mutants of p53, V143A, R175H, R249S and R273H were constructed by overlap PCR
and expressed from the pACBAD-p53 host plasmid. Our results show a marked decrease
in the fluorescence of pGFPuv when co-transformed with p53-Quad (wt like) and higher
fluorescence when co-transformed with the hotspot mutants of p53. We have successfully
designed a screen which discriminates between p53 which shows DNA binding activity
and variants of p53 that cannot bind or weakly bind to DNA. Our screen provides a
simple and quick method to screen for stable and functional variants of p53 and function
correlated to stability. The validity of the screen was further proved by biophysical
characterizations of stable and functional variants resulting from the screen. Further, the
screen can potentially be used in combination with small molecule libraries, for
identifying lead compounds that may stabilize p53.
BRCA1 is another tumor suppressor protein for which the mutations are implicated in
familial cancers, and its function is governed by its ability to form complexes with its
various interacting partners. Its ability to interact with BARD1 through its RING finger
domain to form a heterodimeric complex is particularly important for its function. In vivo analysis of a comprehensive list of cancer-associated mutations of BRCA1 using a split
GFP complementation assay indicates that the interface between BRCA1 and BARD1 is
surprisingly robust to mutations. We further studied this interface in vitro, choosing a few
mutants of BRCA1, V11A, M18K, I21V, L52F and R71G in the absence of GFP fusion.
V11, M18 and I21 are at the interface while L52 and R71 are away from the interface.
Among these only M18K and V11A significantly affected the complex formation. Since
these proteins are soluble only as a complex we utilized this to analyze the interaction in
vitro. Co-expression of the various mutants of BRCA1 with wild type BARD1 using compatible plasmids with orthogonal tags showed that at constant expression levels of
BRCA1, the amount of the complex that can be purified from the soluble fraction
obtained is a function of the interaction between BRCA1 and BARD1. Using in vitro
analysis to complement the in vivo results, we were able to decisively prove that this
interface is indeed quite robust to mutations.
Dedication
To my family, Shiva and Sahana
My parents, my brother and my sister
Acknowledgements
I have come across many individuals who have encouraged and supported me and given me the resilience to work towards this degree. I would not have found myself here, on the verge of receiving my degree, without their contributions. I would like to thank one and all for their efforts.
First and foremost I would like to thank my advisor Dr. Thomas Magliery for giving me an opportunity to work in his lab. He is an incredible teacher and I have been fortunate to learn from him. He achieves the perfect balance of helping when required but allowing independent thought at the same time. He has provided encouragement and support through my time here and has helped whenever I needed with my project. It would not have been possible for me to work on the challenging aspects of this project without his input and expertise.
I am highly appreciative of various Chemistry personnel, especially Judy Brown and
Jennifer Hambach for the various timely reminders, absolute patience and incredible helpfulness. I am grateful to Dr. Sean Taylor for allowing me to work in his lab as a volunteer researcher and for encouraging me to pursue a graduate career. I also thank my various committee members Dr. Ross Dalbey, Dr. Karin Musier-Forsyth, Dr. Christopher
Jaroniec and Dr. Kimerly Powell for taking time from their busy schedules and offering
valuable suggestions.
As important as the work are the people one works with. I would like to thank my
colleagues at Hindustan Lever Research Center, Bangalore for supporting me on my
career ambitions. I would like to thank Matt Heberling, Ely Porter, Luke Smith, Grace
Cooper and David Bowles for the contributions that they have made to my project. Matt
joined the Magliery lab a few months after I did and we learnt together the various facets
of working with this project. Even before Ely joined the lab as a researcher, he impressed
me as a quiet and smart student. When he joined the lab I found that he is very outgoing,
funny and a very quick learner. He was able to make good progress in the project that he
worked on in addition to offering timely help for my late night experiments. It was equally fun to work with Luke. He was very sincere and started working on a somewhat
different and challenging aspect of the project. Grace Cooper who spent one summer
doing research with me was quiet and very productive and was a pleasure to work with. I am glad that David Bowles is continuing with this project and I wish him all success.
I have been fortunate to work with a dynamic and intelligent group without whom my years here would have been less productive and less fun. I thank all Magliery lab members for their support and help. I thank Rachel Baldauff, Shila Sen, Chau Ngyuen, Danielle Williams,
Sarah Johnston, Nishanthi Paneerselvam, David Mata, Ted Schoenfeldt, Tran Ngyuen,
Kimberly Stephany, George Matic and Nick Callahan for contributing to the friendly atmosphere of the lab, for the various insightful chats on science and life and for being entertaining, most of the time without even realizing it. I thank Christina Harsch and
Mohosin Sarkar for making me feel welcome during my initial days in the lab. Of the various people who helped me with my work, Dr. Jason Lavinder, who was a graduate student at the time, deserves special mention. Jason seemingly believed in the “perfect” experimental setup and I was fortunate to have an opportunity to learn from him. He is also one of the happiest people I have met and his enthusiasm made working in the lab and putting in long hours seem pleasant. I also have to thank Jason and Sanjay for my present, although meager, knowledge of American pop culture.
I am grateful for the many friends that I have made during my years here. I thank Dr.
Vivekanand Shete for helping me with various instruments and for words of encouragement at the times that I needed. Dr. Lihua Nie sets an example for hard work and I am fortunate to have worked beside her. I thank her for helping me out numerous times over weekends. I thank Srividya Murali for being her dignified self. I am grateful to
Brandon Sullivan for being an awesome co-worker and a great friend. During my first quarter in Magliery lab, Brandon asked me “How long is your rotation going to last?” My
‘rotation’ in the lab lasted for over six years and we built a friendship which I know will last for the years to come. Apart from his proactive helpfulness in the lab, his sunny personality often acts as a stress relief. It is hard for me to imagine Magliery lab without him and I am glad that we will receive our doctoral degrees around the same time. I am
fortunate to have worked with Venuka Durani who has donned the roles of a great friend,
valuable coworker, and my personal counselor in a seamless fashion. We have shared the
jinxed ‘β-sheet bench space’ for over five years and our projects have had ups and downs
over the course of this time. I could always count on her to be there for the countless low
moments in my project and offer suggestions and reassurance. I thank her for the smiles, inside jokes, and invaluable memories which go beyond the confines of the lab.
“Home is the place you complain the most and are treated the best”. I am fortunate to have a loving family and I thank my parents for believing in me and supporting me on my various decisions. My advisor says PhD is a lifestyle where our lives are dictated by
growing cells and ongoing experiments and I have come to accept that. It can be challenging for someone who is not a graduate student to accept that lifestyle. I am lucky to have Shiva in my life, who has gracefully accepted this lifestyle and proved to be a tower of support. He has encouraged me to keep going and not falter in the face of difficulties. I would not have been able to complete this degree without his loving support. I appreciate Sahana, my daughter, for being pure joy and her ability to make me forget the world when I am with her. She is the best entertainment I could ever ask for and I am thankful that she is part of our life for now and ever.
Vita
March 1994 ...... Nirmala High School, Aluva, India
1999 ...... B.S. Chemistry, Mahatma Gandhi University
2001 ...... M.Sc. Applied Chemistry, NIT Tiruchirapalli
2003 ...... Research Assistant, Hindustan Lever Research Centre, Bangalore
2005 to present ...... Graduate Teaching Associate, Department of Chemistry, The Ohio State University
Fields of Study
Major Field: Chemistry
Specialization: Biological Chemistry
Table of Contents
Abstract
Dedication
Acknowledgements
Vita
Table of Contents
List of Figures
Chapter 1: Introduction
1.1 Importance of protein stability
1.2 Protein Engineering
1.3 Screens and selections for folded proteins
1.4.1 Survey of Characterization of the DNA Binding Domain of p53
1.4.2 Biophysical Characterization
1.4.3 Characterization of the Response Elements of p53 Core Domain
1.5 Rescue strategies for p53C and mutants
1.6 Identification of functional mutants of p53 using genetic screens
1.7 Thesis Synopsis
Chapter 2: A cell based screen for the functional core domain variants of tumor suppressor protein p53
Contributions
2.1 Summary
2.2 Introduction
2.3 A functional screen for p53 core domain
2.3.1 Optimization of Growth Conditions
2.4 Proof of principle using known hotspot mutant
2.5 P53 responsive lac operon with robust transcription using a combinatorial library
2.6 Discussion
2.7 Materials and Methods
2.7.1 Construction of Reporter Plasmid – pGFPuv BDx
2.7.2 Construction of Expression Plasmid – pACBADp53
2.7.3 Choosing the Reporter Plasmid from a Library of positives
Chapter 3: Utilizing the cell based screen to identify functional p53 mutations
Contributions
3.1 Summary
3.2 Significance of studying libraries for core directed design
3.3 Core randomized libraries of p53 DNA binding domain
3.3.1 Four-position Library
3.3.2 AA and TI Sub-libraries
3.4 In vitro characterization of library variants
3.4.1 Stability Measurements using Urea Denaturation
3.4.2 Thermal Melts Monitored Using Circular Dichroism
3.4.3 DNA Binding Using Fluorescence Anisotropy
3.5 Discussion
3.6 Materials and Methods
3.6.1 Construction of Libraries
3.6.2 Protein Expression and Purification
3.6.3 Chemical Denaturation Using Urea Monitored by Fluorescence
3.6.4 Thermal Denaturation Monitored by Circular Dichroism
3.6.5 DNA Binding Studies Using Fluorescence Anisotropy
Chapter 4: Engineering the S7/S8 loop of the p53 core domain to improve stability
Contributions
4.1 Summary
4.2 Introduction
4.3 Study of p53 in C. elegans as a potential method to stabilize human p53
4.4 Experimental studies to parallel the in silico results
4.6 Future work
4.7 Materials and Methods
4.7.1 Cloning and Expression of Chimera Variants
4.7.2 Screening for Positives
4.7.3 Protein Expression and Purification
4.7.4 Urea Mediated Chemical Denaturation Monitored by Fluorescence
Chapter 5: In vivo and in vitro studies of cancer associated mutations in BRCA1
Contributions
5.1 Summary
5.2 Introduction
5.3.1 Study of Cancer-associated Mutants of BRCA1 for their interaction with BARD1
5.3.2 In vitro Analysis of Binding Interaction
5.4 Future Directions
5.5 Materials and Methods
5.5.1 Construction of BRCA1 Mutants
5.5.2 Screening for Positives
5.5.3 Affinity purification of fusion protein and interaction partners
5.5.4 Western Blots Using Anti-HA Antibody
5.5.5 Purification of BRCA1/BARD1 Complex
Chapter 6: Extensions to the p53 bacterial screen
6.1 Introduction
6.2 Optimization of SICLOPPS Expression from pGFPuv-BDI
6.3 Engineering the pEC vector to express SICLOPPS
6.5 Materials and Methods
6.5.1 Construction of SICLOPPS Library
6.5.2 Re-engineering the pEC Vector
6.5.3 Screening in BL21 (DE3) Cells
Chapter 7: Materials and Methods
7.1 Materials
7.2 Methods
7.2.1 Molecular Cloning
7.2.2 Ligations and Transformations
List of Figures
Figure 1: Domain organization and structure of p53
Figure 2: p53 pathway
Figure 3: Structure of the core domain of p53
Figure 4: Model for DNA binding of p53 as a tetramer
Figure 5: Binding of full length p53 to its DNA
Figure 6: Conformational change between dimer to tetramer
Figure 7: Properties of the species formed during denaturation
Figure 8: Hotspot mutations on the core domain
Figure 9: Regulatory network of p53
Figure 10: Rescue of p53 by binding to MDM2
Figure 11: URA3 dependent screen for p53 in Yeast
Figure 12: Red white screening in yeast
Figure 13: A schematic representation of the screen
Figure 14: Schematic representation of the screen
Figure 15: Initial optimizations of the screen
Figure 16: Mutant Properties
Figure 17: Optimization of conditions for screening
Figure 18: Selection of variants from the binding domain library
Figure 19: Screen with improved dynamic range
Figure 20: Sequence of Quad mutant
Figure 21: Results from AATI library
Figure 22: Results from AA library
Figure 23: Positives from TI library
Figure 24: Negatives from TI library
Figure 25: Actives from T253 I255 library
Figure 26: Wavelength scans for the different variants
Figure 27: Characterization of Actives
Figure 28: CD wavelength scans for TI library variants
Figure 29: Data from TI variants characterizations
Figure 30: Binding studies of the TI library variants
Figure 31: Proposed TIVA library for the p53 core
Figure 32: Schematic representation of the library cloning scheme
Figure 33: human p53C aligned with Cep1
Figure 34: A comparison of the core domain structures of human and worm p53
Figure 35: Screening results and designed variants
Figure 36: Urea mediated denaturation of 4ΔL+EGG
Figure 37: Analysis of the S7S8 loop
Figure 38: Proposed mutations to study the S7S8 loop
Figure 39: Structural network organization of BRCA1 and BARD1
Figure 40: Initial set of mutations studied by Sarkar and Magliery
Figure 41: Cancer associated mutations on BRCA1
Figure 42: Cloning and screening of the cancer associated mutants of BRCA1
Figure 43: Vectorology for in vitro expression of the BRCA1/BARD1 complex
Figure 44: Screening and in vivo results for a selected set of variants
Figure 45: Vector map with HA tag for western blots
Figure 46: Principle of production of SICLOPPS
Figure 47: Vector maps screening for active p53 in presence of SICLOPPS
Figure 48: Optimized screening conditions in BL21
Figure 49: Comparison of DH10B and BL21 Cell lines
Figure 50: pEC vector and copy number
Figure 51: Proposed SICLOPPS in pEC vector
Figure 52: A comparison of the Quad and the Hexa structures
Chapter 1: Introduction
1.1 Importance of protein stability
Understanding protein folding and the basis of protein stability is of paramount
importance, as it enables us to design or redesign proteins for pharmacological and industrial
applications. Protein-based therapeutics including engineered antibodies are becoming an
effective way of treating many diseases including diabetes mellitus, rheumatoid arthritis
and thrombocytopenia.2 Successful design of such therapeutics relies on a detailed understanding of the structure of proteins and the forces behind it. The biochemical applications of proteins as analytical tools and enzymes are hugely improved with the stability of proteins, and hence a large effort is oriented towards improving the stability of proteins with pharmaceutical applications in order to tailor these proteins for the harsh conditions that industrial processing often requires. In addition, a higher stability also renders the protein more tolerant to point mutations as they do not fall below the
minimum threshold of stability. Therefore, understanding the sequence-structure
relationship of proteins and using the collective knowledge to engineer proteins of
improved stability and function has been an area of widespread interest.
Proteins fold into highly ordered structures forming α-helices or β-sheets to exclude
surrounding solvent. This hydrophobic collapse gives rise to modest net stabilization of
the protein. The large contributing factors (enthalpic gain due to van der Waals interactions, hydrogen bonds, salt bridges and other electrostatic interactions of the protein; entropic gain of the solvent, typically water; and entropic loss due to the resultant ordered structure) nearly cancel one another, and the small difference between these large opposing forces accounts for the marginal stabilization of most proteins. Most proteins are stabilized by only
5-15 kcal mol⁻¹ at physiological temperatures.3 This marginal stabilization has profound
implications on their response to mutations. A large number of pathological mutations in
proteins render them non-functional simply due to reduced stability. The stability of a
protein determines its ability to resist proteolysis and carry out its biological function.
Protein stability is also known to play a significant role in evolution. Although evolution optimizes for function, a protein’s fitness and evolution do depend at least to some extent on its stability.4 It is therefore of immense value to be able to predict the effect of
various mutations on the structure and stability of proteins and this has been an area of
research for the past four decades.
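The consequence of marginal stability can be illustrated with a simple two-state folding model, in which the fraction of folded protein is 1/(1 + exp(-ΔG/RT)) and ΔG is the free energy of unfolding. The sketch below is only an illustration under that two-state assumption; the function name, the temperature of 310 K and the example ΔG values are chosen here for illustration and are not measurements from this work.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def fraction_folded(dg_unfold_kcal, temp_k=310.0):
    """Two-state estimate of the folded fraction for an unfolding free energy in kcal/mol."""
    return 1.0 / (1.0 + math.exp(-dg_unfold_kcal / (R_KCAL * temp_k)))


# Illustrative values only: a marginally stable protein, and the same protein
# after a hypothetical mutation that costs 4 kcal/mol of stability.
for dg in (5.0, 1.0):
    print(f"dG = {dg:.1f} kcal/mol -> fraction folded = {fraction_folded(dg):.4f}")
```

Under these assumptions a protein stabilized by 5 kcal mol⁻¹ is almost entirely folded, whereas the loss of 4 kcal mol⁻¹ leaves a noticeable unfolded population, which is why a single destabilizing mutation can compromise function.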
Based on RNase refolding experiments, Anfinsen proposed that the native structure of a
protein, which is in equilibrium with its unfolded state, is the thermodynamic minimum
or the lowest energy state in a given environment, and that this structure is predetermined
by the amino acid sequence of the protein.5 Since then scientists have devoted
themselves to deciphering how the secondary and tertiary folds of proteins are determined by
their primary structure. Accurate modeling of the equilibrium between the native and the
unfolded states has been a challenge due to the marginal stabilization of proteins. The
astronomically large conformational space available even for small proteins further
complicates their modeling. This is defined as the protein folding problem.6 If a protein
sampled all the conformations available to it, folding would take an astronomically long
time. However, nature achieves the folding of proteins on timescales as short as microseconds,7 and this observation led to Levinthal’s paradox. To find a way around this, folding funnel models have been proposed.8 The protein folding funnel is a simplified
representation of the energy landscape along which a protein converges from its disordered,
denatured state to its ordered native state. It depicts that when the polypeptide
chain has higher conformational freedom, as in the denatured state, it is at a high energy
state, and as the energy decreases, the number of conformations available to the chain
also decreases. It also conveys the high entropy of the denatured state and the compact, low
entropy state of the folded protein.
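The scale of Levinthal’s paradox can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, a chain of 100 residues with three accessible conformations per residue and a sampling rate of 10^13 conformations per second; these numbers and the function name are assumptions made here and are not taken from the cited work.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_years(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Years needed to visit every conformation once under the stated assumptions."""
    conformations = states_per_residue ** n_residues
    return conformations / samples_per_second / SECONDS_PER_YEAR


# Hypothetical 100-residue chain; all parameters are illustrative.
years = exhaustive_search_years(100)
print(f"Exhaustive search would take about {years:.1e} years")
```

Even with such generous assumptions, the estimate exceeds the age of the universe by many orders of magnitude, which is why a random search cannot account for observed folding times.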
Apart from the challenge in accurate energy prediction for a given structure, design
strategies to arrive at the most optimal sequence for a given structure are also
complicated by the observation that proteins with low sequence similarity do fold into similar folds and conversely high sequence similarity can lead to proteins of different folds. This makes it a difficult problem for modeling and computational analyses for the accurate prediction of stability.9 Therefore predicting the structure that a sequence adopts
or deriving the sequence from a particular structure are both challenging. An approach to
study this would be to test for sequences that can be tolerated in a particular backbone10 thus studying the inverse of the protein folding problem. In silico modeling achieves this by fitting a large number of sequences by energy minimization parameters to one structure for the design of a protein.11 The prediction of the tertiary structure from its
primary structure using computational modeling requires a potential energy function, the
global minimum of which coincides with the native state of the protein.12 Physics-based
methods, which consider a protein at its atomic level, include empirical components in
the potential function to enable reasonably effective conformational searching.13
Knowledge-based methods use experimentally derived data to build the empirical component and are often found to be more successful.14; 15; 16 The accuracy of
computational structure predictions is vastly improved if they are trained on a larger
database of experimental sequences and structures. Mutations introduced to the protein of
interest followed by structural or functional analysis allow us to decipher the effect of
such mutations on the structure and function of the protein and also generate a larger dataset
for “training” and testing. Combinatorial methods to generate large libraries of variants of
the protein and analysis using screens or selections improve the throughput of such
experiments. In this inverse approach the tolerance of a particular structure to changes in
sequence is analyzed to arrive at the rules of protein folding.17 On the road to
understanding proteins a variety of techniques including high resolution X-ray crystal
structures, energy refinement, recombinant DNA technology, molecular dynamics simulations, site directed mutagenesis, protein NMR, etc., have been developed and have
led to our collective knowledge of protein folding.18
1.2 Protein Engineering
Understanding the molecular basis of various functions of proteins will enable us to tailor
improved functionality to the protein or, in other words, evolve the protein to possess
a specific function. Rational design and combinatorial approaches can be used to decipher
the fold, function and stability of proteins. Protein engineering approaches have been
successful in improving the thermal stability of biocatalysts to allow the use of these at
high temperatures for industrial applications.19 Also recombinant DNA technologies
combined with rational and combinatorial approaches have led to the generation of high affinity antibodies with industrial and pharmaceutical relevance. Protein engineering
approaches have also been a powerful tool in the discovery of small molecule drugs that
mediate or disrupt specific protein-protein interactions.20 Computational design of
proteins using potential energy minimizations is gaining success, especially for small
proteins. Dahiyat and Mayo used an algorithm based on a backbone template to redesign
a Zn-binding sequence so that it contains no metal binding sites.21 The redesigned protein has
only 40% sequence similarity with the parent template and is well folded and soluble.
The design of a 93 residue αβ motif using an algorithm that allowed simultaneous
optimization of sequence and structure led to the generation of a novel motif. This study
demonstrated that such novel architectures can be explored using computational
approaches.22 De novo design strategies have evolved to design novel folds, function and
to improve the stability of proteins.23 The advances in the various fields of design of
proteins have allowed us to improve the stability of proteins and impart function, but we
are still unable to accurately predict the structure of a sequence or the energetic and
functional consequences of mutations to a particular protein.
One of the approaches that can be taken to answer this inverse protein folding problem is
to apply combinatorial methods to study large libraries of proteins and to use screens and
selections to interrogate the effect of various mutations on the function and stability of a
protein. The advances in DNA synthesis allow the synthesis of degenerate codons in
oligonucleotides which can be utilized to generate libraries of genes using PCR based
techniques. Functional screens or selections that link the phenotype to the genotype can
be utilized to infer the stability of the protein that resulted from the library, and
sequencing allows the identification of the variants. This approach has been successfully
used in our lab to design core and loop libraries of the four helix bundle protein ROP to
elucidate the determinants of stability of this protein.24; 25 Using a cell-based screen for the function of ROP and high-throughput methods to determine the stability of the variants, the role of various amino acids in positions of interest has been studied and an overall picture of the stability determinants of this protein is emerging. The stability and fold of tumor suppressor proteins like p53 and BRCA1 is another area of immense
interest. Many cancer-associated mutations of these proteins are known to be
destabilizing, and they render the protein non-functional. Therefore improving the
stability of these proteins is an area of interest for applications to targeted cancer therapy.
It is beneficial to accumulate data that provide information about the role of various amino acids in the folding and stability of these proteins. One of the key requirements to assimilate such large datasets is a method to sort a large number of variants in a high-throughput format and report the functionality of the protein. Inferential methods such as genetic screens and selections, which assume that the function of a protein is a consequence of it being well-folded, serve as a powerful tool for this purpose. For example, development of a cell-based screen for the function of ROP has been an important tool to segregate functional variants from non-functional ones. Because the residues important for function are left intact in the various libraries, a functional protein implies foldedness. The functional variants obtained following the screening were analyzed using high throughput analyses to arrive at the rules of folding of this four helix bundle protein.
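The way a degenerate codon defines the diversity of a library can be sketched in a few lines. The example below expands a degenerate codon written in IUPAC nucleotide codes into its individual codons and tallies the residues they encode using the standard genetic code. The NNK codon is used only because it is a common randomization scheme; it is an assumption made for illustration and not necessarily the scheme used for the libraries described in this work, and the function names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes used when specifying degenerate oligonucleotides.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand_degenerate_codon(codon):
    """List every unambiguous codon represented by a degenerate codon."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in codon.upper()))]


def encoded_residues(codon):
    """Count how many codons encode each residue ('*' marks a stop codon)."""
    counts = {}
    for c in expand_degenerate_codon(codon):
        aa = CODON_TABLE[c]
        counts[aa] = counts.get(aa, 0) + 1
    return counts


# Illustrative use with the NNK scheme (an assumed example, see above).
nnk = encoded_residues("NNK")
n_amino_acids = len([aa for aa in nnk if aa != "*"])
print(len(expand_degenerate_codon("NNK")), "codons encoding", n_amino_acids, "amino acids")
```

A tally of this kind shows how unevenly a degenerate codon represents each residue and whether stop codons are present, both of which affect how many clones must be screened to cover a library.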
Our present understanding of protein structure and stability has been derived from site
directed mutagenesis experiments on proteins like T4 lysozyme, λ repressor, B1 domain
of protein G, staphylococcal nuclease, barnase and ROP.26 The majority of these are
alpha helical proteins and most of our knowledge in β-sheet proteins has originated from
the studies on the GB1 protein. Alpha helices are inherently more robust than β-sheets because of the effective exclusion of the solvent water by the protein backbone in these
structures. In comparison the entropic gain for the surrounding water for β-sheet
structures is limited due to the fact that the side chains are not as shielded in the β-sheet
structures as in alpha helices.27 This presumably makes the alpha helical system more
amenable to mutations and engineering studies, and this has led to a more well-defined
understanding of the sequence-structure relationship of the helical structures. The
groundbreaking studies of Chou and Fasman led to the prediction of β-sheet propensities
of various amino acids based on statistical analyses.28 Following this, various studies were conducted, mainly of GB1, to study the role of amino acids at various positions.27; 29; 30; 31
The initial studies were limited by the lack of technology available at the time to synthesize and analyze large numbers of variants. But these initial studies led to the
conclusion that the significance and propensity of various amino acids in β-sheets is
highly context dependent, which is to say that their role is determined to a large extent by
the nature of surrounding residues. Following this, significant progress has been made in
studying this protein, mainly using phage display techniques. Various studies indicate
that the structural context of each of the residues is a significant factor in determining the
role of the mutations at that position. In other words general rules of the ‘jigsaw puzzle’
or the ‘oil drop’ models of the protein core that can be used to explain the fold and
stability of helical proteins cannot be directly used to explain the properties of β-sheet
structures. The integrity of β-sheet structures is governed by both short range interactions
between residues and long range forces. Setting up a platform to study β-sheet proteins in
high throughput format to learn the sequence-structure-stability relationship is therefore
valuable to further our knowledge of the determinants of packing and stability.
We wanted to implement the analysis strategy on a physiologically relevant β-sheet
protein with the goal to gain insight into the sequence structure stability relationship of a
β-sheet protein and at the same time address pharmaceutically relevant questions. We chose to work on tumor suppressor protein p53, the mutants of which are implicated in
more than 50% of human cancers.32 It is a sequence specific transcription activation factor
and therefore allows us to develop genetic screening systems in which the phenotype can
be conveniently linked to the genotype, enabling the analyses of interesting variants.
1.3 Screens and selections for folded proteins
Genotypic screens serve as a key tool in the development of high-throughput platforms to
study various mutants of a protein. The basis of any screen or selection is to link the observed phenotype to its genotype in order to decipher the sequences of proteins with ease from their DNA sequences. The various screens and the principle behind them are reviewed by Magliery and Regan.26 Historically, selections were applied only to those
proteins that were known to be important for cell survival. Examples include selections for
tryptophan synthase, lambda repressor and lac repressor. A folded and functional
tryptophan synthase is required for the growth of cells in media that lacks tryptophan.
Therefore this was used to select for variants of tryptophan synthase that were functional.
Lambda repressor blocks superinfection by lytic phage and again is essential for the survival of the bacteria. On the other hand the lac repressor dictates the survival of cells in lactose minimal media. The ‘blue white’ screen is one of the earliest and best
known applications of genetic screening. This is based on lac repressor, which can block
the hydrolysis of the chromogenic galactoside, bromo-chloro-indolyl-galactopyranoside
(abbreviated as BCIG, popularly known as X-gal), resulting in white colonies, whereas a
non functional lac repressor would be indicated by blue colonies. This represents one of
the systems that can be used both as a screening system and as a selection. Screening systems
which rely on functions not linked to the survival of the cells allow the analysis of both
functional and non functional variants of the protein of interest enabling us to interrogate
the rules of folding and function of that protein in greater depth. This is a distinct
advantage of screening systems over selections which rely on functions of the protein that
determine the ability of the cell to survive and do not provide direct data about the non-
functional variants. The key challenge for the design of screening systems is to link the
phenotype to the genotype so that analysis of interesting variants is facilitated. The
advances in sequencing of DNA have made it easy and economical to sequence a large
number of variants allowing the study of large libraries of proteins that pass the screen giving the right phenotype.
Following the studies on proteins that have a genetic selection, various methods that rely on the binding function of a protein have been developed to screen for function. The basic premise of such studies is that a protein needs to be folded into the correct conformation and has to be stabilized beyond a minimal threshold for it to be functional. In other words, these inferential methods assume that a functional protein is required to be folded and native-like. Screens using mRNA display, ribosome display, phage display, yeast-2-hybrid
systems and bacteria have been developed based on this principle. mRNA display
relies on the fusion of a puromycin tagged DNA with the corresponding RNA fusion.
During translation, when the ribosome pauses at the RNA-DNA junction, puromycin gets covalently linked to the nascent peptide. Reverse transcription followed by DNA sequencing effectively results in gaining the information of sequence structure relationship.33 Although large libraries can be screened using this technique, it is
complicated by the requirement for an in vitro compartmentalization method that involves
making oil droplets. The size of the droplets needs to be controlled with great precision to
ensure that only one variant is enclosed in one droplet. On the other hand, phage display
utilizes the display of the protein of interest in the phage coat protein and overcomes the
compartmentalization issues with the mRNA display. Screening is typically based on the
binding of the displayed protein to a target of interest and the stringency of selection can be
increased by multiple rounds of panning.
Phage display is also one of the few methods that have been adapted to screen purely for
stability or fitness of the protein of interest. The proteins can be screened for their protease
resistance, thus moving away from being an inferential screening method to become a
direct measurement of protein fitness.34 Another method for direct measurement of
fitness of a protein was developed in yeast. Hagihara and Kim have utilized the quality
control mechanisms inherent to yeast to select for soluble peptides formed from random
sequences generated by combinatorial library methods. It has been shown that the
secretion efficiency correlates with the stability of protein and this is another property
that has been used as a screening method.35 One of the disadvantages of this method is that some misfolded proteins were selected and required a secondary characterization
method to validate the technique. Yeast surface display which relies on the fusion of the
protein of interest with a cell wall mating protein on the surface of yeast has been
employed to display folded and stable mammalian peptides. The displayed protein can
then be selected based on various binding assays similar to the techniques used in phage
display. The use of a eukaryotic system facilitates post translational modification and
allows the mammalian proteins to achieve a near native conformation.36; 37; 38; 39 Yeast n-hybrid
systems are well established to decipher protein-protein interactions.40 A related
method is the bacterial two hybrid system which offers the advantages of being faster and
not requiring nuclear localization.41; 42
Bacterial systems are especially amenable to combinatorial approaches since large
libraries (~10^9) can be generated in bacteria. These systems have been adapted to study
protein-DNA and protein-protein interactions. In a combinatorial context, libraries of
mutants can be studied and residue specific information on the determinants of the
interaction surface can be delineated. In addition, screens that simply monitor the
solubility of the protein of interest have also been developed.43 A reporter protein such as
GFP is fused to the C-terminus of the protein of interest. The folding of the protein of interest leads to the folding of this C-terminal fusion protein and its fluorescence in turn reports the folding of the protein of interest. A selection method based on the solubility of the protein is established when the C-terminal reporter fusion is required for the survival
of the cells. This idea has been implemented in selections based on fusion to antibiotic resistance genes. Maxwell et al. selected for soluble variants of HIV integrase based on the observation that resistance to chloramphenicol is a function of the solubility of the
HIV integrase when expressed as an N-terminal fusion to the chloramphenicol acetyl transferase gene.44 Mansell et al. have recently designed a rapid folding assay for proteins expressed in the bacterial periplasm. They express the protein as a sandwich between an
N-terminal export system and a C-terminal selectable marker, TEM1 β-lactamase. They demonstrate that the folding efficiency of various target proteins correlates directly with in vivo β-lactamase activity and thus dictates survival.45
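As a rough illustration of what a bacterial library on the order of a billion clones allows, the short Python sketch below counts how many positions can be fully randomized before the theoretical diversity exceeds the library size. The function name and the choice of 32 codons per position (an NNK-type degenerate codon) are assumptions made for this example and are not taken from the studies cited above; the count also ignores the oversampling that would be needed for good coverage in practice.

```python
def max_randomized_positions(library_size: int, variants_per_position: int) -> int:
    """Largest number of fully randomized positions whose theoretical
    diversity (variants_per_position ** n) still fits in library_size.
    Integer arithmetic is used to avoid floating point rounding."""
    positions = 0
    diversity = variants_per_position
    while diversity <= library_size:
        positions += 1
        diversity *= variants_per_position
    return positions


# Assumed values for illustration only.
LIBRARY_SIZE = 10**9      # approximate number of transformants
CODONS_NNK = 32           # hypothetical NNK-type codon set (DNA level)
AMINO_ACIDS = 20          # protein level

print(max_randomized_positions(LIBRARY_SIZE, CODONS_NNK))
print(max_randomized_positions(LIBRARY_SIZE, AMINO_ACIDS))
```

Under these assumptions only a handful of positions can be sampled exhaustively, which is why such libraries are usually targeted to a structurally defined set of residues.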
1.4 Tumor Suppressor protein p53
Ever since its discovery as a 54 kDa (p53) cellular SV40 tumor antigen, extensive research has been conducted to provide us with the information we now have about p53.46 The protein p53 as we know today is a multidomain protein consisting of an N-terminal transactivation domain, a sequence specific DNA binding domain and a C-terminal domain which in turn is composed of a tetramerization domain and a terminal regulatory domain. A proline rich region is sandwiched between the N-terminal and DNA binding core domains (Figure 1). The function of p53 is to regulate cell differentiation under conditions of cellular stress mainly by transcriptional methods in addition to non transcriptional modes.47 It is a tumor suppressor protein and is found to be mutated in more than 50% of human cancers. A tremendous amount of work, which has resulted in more than 50,000 publications, has changed the view from p53 being a tumor antigen to being called the “guardian of the genome.” Despite the large amount of research being done on p53, we are still far away from completely understanding this protein structurally or functionally. The folded and intrinsically disordered domains function in a concerted fashion allowing the widespread yet specific DNA binding properties of p53.
Figure 1 Domain organization and structure of p53 a) Domain organization of p53 including N-terminal transactivation domain, DNA binding core domain, C-terminal tetramerization domain. The vertical lines represent the occurrence of hotspot mutations b) the structure of DNA binding domain of p53 (pdb id 1TSR).48
p53 was initially thought to be an oncogene linked to the viral transformation process since various studies showed that the functions of p53 were closely involved with the viral replication and tumorigenesis by the small DNA tumor viruses. The decade of the
1980s saw some research towards elucidating the cellular function of p53 which led to it being thought of as an oncogene.49; 50; 51; 52 Continued research showed that the variants of p53 in tumor cells are mutated and these led to its accumulation in tumor cells and these observations helped the classification of p53 as a tumor suppressor protein.53; 54 The research that followed in the decade of the 1990s further established that it plays a central role in preventing cancer in humans and animals. Malkin et al. showed that inheriting a mutant allele of this gene leads to cancer with 100% penetrance, multiple groups showed that knockout mice developed cancer at a very young age when they were complemented with loss-of-function mutants of p53,55; 56 and up to 50% of human cancers contain mutations in both the alleles of p53 accompanied with aberrantly high levels of protein p53 in tumor cells.59 Germline mutations in the p53 gene are an indicator for Li-Fraumeni syndrome, which is defined as a predisposition towards different kinds of cancer, the onset of which may occur at a very young age.57
Figure 2 p53 pathway p53 is activated as a result of cellular stress leading to changes in the chromatin structure. Activated p53 accumulates to initiate a cascade of downstream effects ultimately leading to cell cycle arrest, apoptosis, or senescence. Image from Sengupta and Harris (2005).58
The decade of the 1990s also saw substantial research being done to elucidate the structural properties of this protein. Studies to decipher the cellular function of p53 show that it is at the hub of a variety of signaling pathways that control the cell cycle and maintain the integrity of the genome (Figure 2). In response to cellular stress that includes DNA damage and dysregulated growth, the p53 pathway is activated which eventually leads to cell cycle arrest, apoptosis or senescence. This multifaceted role of p53 is mirrored in its complex and intricate structural biology. Understanding the individual components of the p53 structure, such as DNA binding domain and tetramerization domain, has laid the framework for understanding the effects of common cancer mutations. Prives et al. have shown that the manipulation of mutant p53 to bind
DNA can have implications for possible therapeutic applications.60 The DNA binding of wild type p53 is a complex function of its affinity to the DNA and other interacting proteins in addition to the DNA binding properties of competing proteins, and is regulated by phosphorylation, acetylation and other lysine modifications.61 The presence of the tetramerization domain which shows non-specific DNA binding and an unstructured C-terminal regulatory domain adds to the complexity of the picture. This structural intricacy has also proven to be a challenging target for elucidation of the structural basis of the function of p53. As a result many aspects of p53 function still remain elusive.
Figure 3: Structure of the core domain of p53 The structure of p53 core domain bound to DNA solved by Cho et al. PDB ID 1tsr is shown.62 Two of the monomers shown in cyan and blue make extensive contacts to the DNA while the third shown in warm pink does not interact with the DNA but the protein protein contacts stabilize the complex. The figure is rendered using Pymol.
1.4.1 Survey of Characterization of the DNA Binding Domain of p53
1.4.1.1 Structural Characterizations
The first step towards elucidating the structural biology of p53 came when the crystal
structure of the DNA bound core domain was reported by the Pavletich group.62 They reported that the core domain consists of residues 102 to 292 and forms a β-sandwich that serves as a scaffold for two large loops and a loop-sheet-helix motif (Figure 3). The two large loops are held together by a tetrahedrally coordinated Zn2+ ion. The sheets in the β-
sandwich are packed face-to-face forming a Greek key topology. Most of the mutations
that inactivate p53 are found to be in the four conserved regions within the core domain.
This first crystal structure showed three monomers of the core domain bound to the
consensus DNA binding sequence, one of which bound at its central region and made
extensive contacts to the bases and the phosphate backbone. The various cancer
associated mutations of p53 were mapped on to the DNA binding domain of this protein.
The solution NMR structure of the core domain, which was solved much later, provides a
picture with only subtle changes from the crystal structure.63 The overall structure although similar was found to be far more mobile than expected from the crystal structure. This mobility was mainly contributed by changes in the loop 1 conformations and the presence of unsatisfied hydrogen bond donors in the hydrophobic core of the protein.
Figure 4 Model for DNA binding of p53 as a tetramer The proposed model shows the monomers making extensive PPIs as well as protein-DNA interactions to bind to DNA as a tetramer. a and b represent the closed and open configurations of the tetramerization domain respectively. c and d show the cartoon model based on the individual crystal structures of the core and the tetramerization domains. d is a 90° rotation view of c.64
Following this landmark crystal structure, which paved the way for a variety of biophysical
studies on this protein, a number of crystal structures have been published. These
combine to give us an overall picture of how this protein binds to DNA and the
implications of this binding for its function. p53 exists as a dimer in the unbound form
and its consensus DNA binding sequence provides a scaffold for the protein to
tetramerize as a dimer of dimers. A variety of interactions occur in this complex. The
protein-DNA interface is the most conserved and various crystal structures solved
confirm this observation. The protein DNA binding is mediated by the sequences that
comprise the loop-sheet-helix motif and the large loop. The crystal structure of the core
domain from mouse in the absence of DNA reveals that the loop L1 and the C-terminal
end of helix H2 undergo significant structural changes upon binding to DNA.65 The helix
H2 and the loop L1 from the loop-sheet-helix motif pack against the major groove of the
DNA while an arginine residue from the large loop L3 packs against the minor groove of
the DNA. Cho et al.62 and later Kitayner et al.64 proposed a model for the binding of p53
as a tetramer to DNA (Figure 4). Their proposed model was essentially confirmed by the
crystal structures that were solved subsequently. The present picture proposes that the
dimer-dimer interface contacts and protein-protein interactions within the dimer, in
addition to the dimer-DNA interaction surface contribute to the highly cooperative binding of p53 to DNA. Mutations that lead to the loss of some of the core domain-core domain interactions can still lead to p53-dependent transactivation if the tetramerization domain remains intact64 as the tetramerization domain contributes to the protein-DNA complex formation.
The dimer contacts involve the H1-S5 loop of one subunit and the S4-H1 (L2) and S6-S7
loops of the other dimer subunit. Zhao et al. have solved the structure of mouse p53 core domain, which is highly homologous to the human p53 with an overall sequence identity of 89%, in the absence of DNA at 2.7 Å resolution.65 Comparison of this free form of the
core domain with the DNA bound form62 indicates that the core domain binds to DNA
via extensive reconfiguration of Loop 1. Also, the physiologically relevant low affinity
dimer of the core domain is in a configuration that is incompatible with simultaneous
binding of both subunits to duplex DNA. The NMR structure of the 58 kDa dimer,
complexed with DNA, shows that the core domain undergoes substantial conformational
changes on binding to DNA.66 Furthermore, the various chemical shift analyses indicate
that the helix 1 and the neighboring G244 regions may form a possible dimerization
interface. In order to obtain the full picture of how the core domain binds as a tetramer
to the DNA, various groups have employed chemical crosslinking between Cys277 and
Cyt18 modified to a cystamine to trap the protein DNA complex using a disulfide
linkage64; 67; 68 to study the molecular basis of the binding of p53 dimer and later the
tetramer to DNA. Recently, Chen et al. have crystallized the self assembled tetramer bound to the full consensus site and their findings explain the significance of the zero base pair separation observed between the half sites and provide further insight into the high cooperativity and kinetic stability of the p53-DNA complex.69 They propose that
p53 core domain forms a planar tetramer complex to bind to the minor groove face of the
DNA. The geometry and symmetry of this tetramer are exquisitely molded to match that of
the major groove of the DNA. The DNA largely remains unchanged upon binding to the
Figure 5 Binding of full length p53 to its DNA. The crystal structure of full-length p53 (PDB ID 3KMD).69 a and b show the tetramer bound to consensus DNA. b shows the envelope form of the protein. c and d show a 90° rotation of a and b respectively. The figures are rendered using Pymol.
protein but undergoes significant sliding at the central base pairs between the two half
sites. The tetramer forms a trough-like structure to bind to the DNA employing the L2
loop region and the S7S8 turn region for the individual dimers to come together (Figure
5). The loop 1 region is predominantly used in the binding to DNA. The significance of
the S7S8 turn is reflected in the unstable variants that resulted from the mutations of this
region as discussed in Chapter 4. Malecka et al. suggest that the variability observed
between the different structures for the dimer-dimer contact interface may be a function
of the specific sequence of the DNA used.67 As suggested by previous studies, although
a dimer can bind to the DNA, the complex is stabilized by tetramerization of the core
domain and this is reflected in the improved binding and half life of the complex.
Therefore the DNA-induced interactions that lead to the tetramerization of the core
domain play a significant role in the cooperativity of binding of the core domain. The
tetramerization is further assisted by the dedicated tetramerization domain of p53 which is connected to the core domain by a 30 residue linker region which is highly sensitive to
proteolytic digestion. Overall, when the core domain binds to the DNA as a tetramer, the
N-terminal regions face against each other with the DNA lacing through them and the C-
termini point toward one face of the complex, parallel to the axis of the DNA. This
facilitates the positive regulation of the C-terminal tetramerization domain (CTD) and
provides additional stabilization to the sequence specific DNA-core complex by non
specific electrostatic interactions between the positively charged CTD and the DNA
backbone.
Tidow et al. have employed a multitechnique approach to solve and model the structure of full length p53. An array of techniques including small angle X-ray scattering (SAXS), electron microscopy (EM) and NMR spectroscopy was used in combination with the various crystal structures solved to determine the structure of the tetrameric human full-length p53.70 The structure solved in this manner agrees well with the model proposed by Kitayner et al. In this study the authors used the optimized spatial arrangement obtained from the SAXS and EM data to fit to the available high resolution crystal structure. They found that there is some difference between the structure of the unliganded form and the DNA bound form. The authors propose that the loosely tethered dimers in the unliganded form can readily undergo conformational changes that are required for the DNA binding. The unliganded p53 predominantly exists in an open conformation of two separate pairs of core domains. One of the dimers binds to the DNA first and the flexible linkers between the core and the tetramerization domains allow the second pair of the core domains to bind to the remaining sites on the DNA, thus burying it within the protein. The model for the existence of loosely tethered dimers in solution gains credibility from various previous studies. The Hill coefficient for the binding of p53 core domain to DNA was found to be 1.8. Since at physiological conditions p53 exists as a dimer, and binds to DNA as a dimer of dimers, this indicates the formation of a highly cooperative complex containing two molecules.71 This was further confirmed when
Veprintsev et al. solved the NMR structure for the full length tetramer.72 NMR serves as a complementary technique to explore the solution effects of the protein that is not constrained by the crystal packing effects. The size of the full length tetramer confirmed
to be ~170 kDa using analytical ultracentrifugation experiments is well beyond the
typically prescribed size limit for conventional NMR spectroscopy. In this seminal work
the authors collected high resolution NMR data using 15N-1H HSQC and 15N-1H
TROSY methods for the tetrameric complex and compared them with the spectral data available for the smaller domains. Their results indicated the presence of a self-complementary core domain interaction surface. They confirmed this using mutational analysis of the various charged residues on the putative surface. They propose that the p53 monomers exist as a head-to-head dimer and need to undergo a 70° rotation upon binding to the DNA (Figure 6) in order to facilitate the various interactions with the DNA which agrees with their previous observation that the DNA-dimer complex in solution is incompatible to bind the DNA as a tetramer. The biological implication of this dimer of dimers was further analyzed by Natan et al. and their studies show that the association of the dimers to form tetramers is ultraslow in the absence of DNA.73 In addition, hetero-oligomerization in the presence of mutants was found to be caused only from homodimers.
This indicates that p53 is targeted for degradation in a timescale that is much faster than
the rate of tetramerization. The tight control on the rates of oligomerization may serve as
an additional control mechanism to regulate the levels of p53 under normal cell
conditions. Under cellular stress, the upregulation of p53 allows for p53 to ‘find’ its DNA
in a more facile manner leading to the formation of highly stable protein DNA
complexes.
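The meaning of a Hill coefficient of 1.8 can be illustrated with the Hill equation, theta = c^n / (K^n + c^n), where theta is the fraction of DNA bound, c is the protein concentration and K is the concentration at half saturation. The sketch below is a minimal Python version; the function names and the half-saturation constant are hypothetical, and only the value n = 1.8 comes from the text.

```python
def hill_fraction_bound(protein_conc: float, k_half: float, hill_n: float) -> float:
    """Fraction of DNA sites occupied according to the Hill equation."""
    x = (protein_conc / k_half) ** hill_n
    return x / (1.0 + x)


def switch_ratio(hill_n: float, low: float = 0.1, high: float = 0.9) -> float:
    """Fold increase in protein concentration needed to move the
    occupancy from `low` to `high`; smaller means a sharper response."""
    def odds(fraction: float) -> float:
        return fraction / (1.0 - fraction)
    return (odds(high) / odds(low)) ** (1.0 / hill_n)


K_HALF = 1.0  # hypothetical half-saturation concentration, arbitrary units

for n in (1.0, 1.8):
    # Occupancy at half and at twice the half-saturation concentration,
    # followed by the 10% to 90% switching ratio.
    print(n,
          hill_fraction_bound(0.5 * K_HALF, K_HALF, n),
          hill_fraction_bound(2.0 * K_HALF, K_HALF, n),
          switch_ratio(n))
```

For non-cooperative binding (n = 1) an 81-fold change in concentration is needed to go from 10% to 90% occupancy, whereas for n = 1.8 the same transition takes roughly an 11-fold change, which is the sense in which the complex is described as highly cooperative.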
Figure 6 Conformational change between dimer to tetramer The dimer (top) in solution undergoes a conformational change, with one of the monomers rotating substantially to enable DNA-protein contact.72
Apart from providing significant insight into the intricate molecular mechanisms
employed by p53 in its function as the guardian of the genome, X-ray crystallography has
also facilitated the design of small molecule drugs and our understanding of the role of
various cancer associated mutations which are largely clustered in the core domain.
Kussie et al. solved the crystal structure of MDM2 bound to a p53 peptide which is part
of the N-terminal transactivation domain.74 MDM2 is an oncoprotein identified as an
amplified gene product in a transformed mouse cell line. MDM2 leads to negative
regulation of p53 and therefore the MDM2-p53 interaction has served as a target for the
anti-cancer therapies.75 MDM2 negatively regulates p53 via monoubiquitination of p53
and the observation that disrupting this interaction is beneficial has led to generation of
small molecule drugs against cancer. It was found that an 11 amino acid region which is
part of the 12 kD conserved amino terminal domain of p53 containing the sequences responsible for transactivation by p53 is sufficient to bind to MDM2. This p53 peptide forms an amphipathic helix that binds to a cleft in MDM2 which is formed by two helices and a β-sheet and is lined with hydrophobic residues including aromatic amino acids.
The amphipathic helix is loosely folded in the absence of MDM2 and uses the hydrophobic residues to interact with the MDM2 binding pocket. Many transcription factors are composed of transactivation regions that are amphipathic helices. This structure shows that the buried volume is suitable for small molecule inhibitors to prevent the interaction of the oncogenic MDM2 with the tumor suppressor protein p53. The
Nutlins, for example, are an important class of p53 activators that function by disrupting the MDM2-p53 interaction, thus upregulating p53 and leading to the normal consequences of p53 accumulation in cell, namely apoptosis and cell cycle arrest. Some
of the other small molecules that rescue p53 function by binding to MDM2 are discussed
later in this chapter.
The structures of various cancer associated mutants of p53 have added to our knowledge
of the biological role of these mutations and have helped the design of specific second
site mutations that nullify the effect of the cancer associated mutation. Joerger et al. have
crystallized five different mutants of p53 providing a mechanism of binding and second
site suppression by additional mutations.76 They reported that R273H annihilates DNA binding exclusively by losing the contact by R273 to DNA backbone. This mutation leaves the global structure of the core domain undisturbed and is reflected in the minimal destabilization caused by this mutation. Therefore this mutation is dubbed as the ‘contact’ mutation. On the other hand, R249S which is another cancer associated mutation leads to substantial changes in the DNA binding surface specifically to loop L3. This structural disruption leads to a non-native conformation of the loop L3 by displacing the M243 residue that packs against the DNA. In addition, this mutation causes partial unfolding of the core domain leading to a more flexible structure. The structure with the known rescue mutant H168R mimics the R249 DNA contact and renders the protein functional with respect to DNA binding and transactivation. The above studies were further confirmed by Suad et al.77
1.4.2 Biophysical Characterization
Although p53 was discovered as a tumor suppressor protein about 30 years ago,
significant biophysical characterization was made much later, starting only in the late
1990s. The characterization provides an overall picture of the structural basis of
destabilization and the implications of stability on the function of the protein. p53 is a modular protein with five domains, most of the stability of which is determined by the
DNA binding core domain. The C-terminal domain imparts stability to the complex of p53 with the DNA and has been implicated in the regulation of function. The N-terminal domain of this protein is categorized among natively unfolded proteins and plays a role in
the transactivation function of the protein. This is followed by the proline rich domain.
p53 is found to be mutated in a variety of human cancers with most of the mutations being located in the core domain. Understanding the structural basis of instability of the protein and identifying the binding sites for its various partners has led to the targeted design of peptidic and non-peptidic drugs to rescue the function of this protein.
1.4.2.1 Understanding the structure and function relationship of the core domain
The first attempt to characterize this protein biophysically was made by the Fersht lab.78 This lab has done substantial work on understanding the biophysical nature of this
protein, leading to advances in drug discovery and therapeutics. Bullock et al. have set up
a robust system to quantitatively measure the structure-activity relationship of the p53
core domain. Using differential scanning calorimetry (DSC) and urea denaturation, they
have characterized this protein as being marginally stabilized at room temperature. DSC
Figure 7 Properties of the species formed during denaturation a) spectral properties of the various species formed during the urea mediated denaturation of p53 core domain. b) equilibrium showing the various species formed during the denaturation of p53 core domain.78
experiments showed that p53 core domain does not unfold reversibly with temperature,
and qualitatively, the stability of the protein increases on binding to the consensus DNA.
Following this the authors have set up a system to measure the reversible unfolding of
p53 core domain by urea denaturation monitored by following the fluorescence of the
tryptophan residue in the core domain and estimating the free energy of unfolding. They
have defined the spectral properties of the native state, the denatured state and an
aggregated species that is formed under certain denaturing conditions (Figure 7). The aggregation was confirmed by gel filtration chromatography. Since dithiothreitol (DTT) was essential for reversible chemical denaturation, they have also characterized the fate of the Zn2+ in these experiments. Direct measurement using spectrophotometric assays
showed that Zn2+ is bound in 1:1 ratio to the core domain in the native state and remains
bound even in the presence of 5 M urea. Therefore the equilibrium that we measure using
urea mediated denaturation is between holo-native and Zn2+-bound denatured state
(Figure 7). All literature that has followed uses this method to purify and characterize the stability of this protein.
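The free energy of unfolding is commonly extracted from urea titrations of this kind with a two-state linear extrapolation model, in which the free energy of unfolding decreases linearly with denaturant concentration. The following Python sketch assumes that model; it is not necessarily the exact fitting procedure used in the cited work, it ignores the aggregated species described above, and the parameter values in the usage lines are placeholders chosen only to show the shape of the calculation.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def delta_g_unfolding(urea_molar: float, dg_water: float, m_value: float) -> float:
    """Linear extrapolation model: dG([urea]) = dG_water - m * [urea]."""
    return dg_water - m_value * urea_molar


def fraction_unfolded(urea_molar: float, dg_water: float, m_value: float,
                      temp_k: float = 298.15) -> float:
    """Two-state fraction of unfolded protein at a given urea concentration."""
    dg = delta_g_unfolding(urea_molar, dg_water, m_value)
    k_unfold = math.exp(-dg / (R_KCAL * temp_k))
    return k_unfold / (1.0 + k_unfold)


def midpoint_urea(dg_water: float, m_value: float) -> float:
    """Urea concentration at which half of the protein is unfolded."""
    return dg_water / m_value


# Placeholder parameters (NOT measured values): dG in kcal/mol, m in kcal/mol/M.
DG_WATER = 6.0
M_VALUE = 2.0

for urea in (0.0, 2.0, midpoint_urea(DG_WATER, M_VALUE), 4.0, 5.0):
    print(urea, fraction_unfolded(urea, DG_WATER, M_VALUE))
```

In an actual analysis the fluorescence signal at each urea concentration would be fitted to this expression to obtain the free energy in water and the m-value; here the relationship is only evaluated in the forward direction.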
The free energy of unfolding of wt-p53 core domain is reported to be 8.6 kcal mol-1, confirming that p53 is only marginally stabilized at physiological temperatures. In addition, most of the tumor-derived mutations destabilize the protein. Mutations also lead to a loss of the delicate balance between p53 and its negative regulators resulting in accumulation of non-functional p53. Gene therapy approaches that aim at delivering functional p53 to tumor cells to re-establish the p53 dependent pathways are gaining success. This triggered the design of a stable functional variant of p53. Nikolova et al. have employed a molecular evolution strategy to design a superstable variant of p53 core domain with wt-like DNA binding properties.79; 80 They mutated 20 different positions of
different solvent accessibilities to the consensus residues obtained from the comparison
of 22 different homologous p53 proteins. Four of the stabilizing mutations, M133L,
V203A, N239Y and N268D, were also confirmed to provide an additive effect, generating the
highly stabilized Quadruple (Quad) mutant. The main contribution to stability increase
came from the N239Y and N268D substitutions which are also known to act as second-
site suppressors for various cancer-associated mutations.81 The crystal structure of this
mutant (PDB 1UOL) revealed that it folds into a native structure, and binding studies
using surface plasmon resonance (SPR) showed that the Quad mutant binds to DNA with
wt-like affinity. The N268D mutation leads to favorable hydrogen bonding between the
S7 and S8 strands and the N239Y leads to the rigidification of loop L3. These factors
explain the improved thermal stability of the Quad mutant and provide insight into the
mechanism of action of the two stabilizing rescue mutations. Also this study clearly
reveals that stabilizing the protein leads to rescue of function and this approach has been
pursued for targeted drug design ventures. The Quad mutant itself serves as a potential
candidate for preliminary trials for gene therapy. In addition, this stabilized mutant of p53
has served as model system for various structural and biophysical studies and has
contributed immensely to our understanding of the structure-function relationship of this
protein.
The inherent plasticity of the hydrophobic core of this protein was analyzed using NMR
studies of the core domain. The authors found several buried polar groups which
explained the structural reasons for the instability. NMR spectroscopy, with its ability to
detect protons, located buried hydroxyl and sulfhydryl groups that form suboptimal
hydrogen-bond networks, one of which the authors pursued further. Tyr-236 and Thr-253
which are located in the hydrophobic core and away from the DNA binding motifs were
mutated to Phe-236 and Ile-253.63 These residues were chosen based on the structural
alignment with p63 and p73, the stable paralogs of p53. These mutations stabilized p53 by 1.6 kcal mol-1. NMR analyses of the mutant showed differences in the conformation of
a mobile loop that might reflect the existence of physiologically relevant alternative conformations. These mutations, when combined with the previously characterized Quad mutant, led to further stabilization.82 The core domain of this ‘Hexa’ mutant has the overall fold of the wt core domain, as its crystal structure (PDB 2WGX) was virtually identical but for the immediate hydrophobic environment of the mutated residues. The mutations lead to a change from the suboptimal hydrogen bonding to favorable van der Waals contact between F236 and I253. In addition to the specific interaction between these residues, the change from polar to hydrophobic residues allowed the neighbouring hydrophobic environment to repack and form more favorable interactions. This mutant also showed native-like DNA binding to the fluorescein-labelled 30-mer double stranded DNA (dsDNA) containing the 3′ p21 response element, as indicated by the anisotropy experiments with the full length version of the mutant. One of the consequences of these mutations, as indicated by urea mediated equilibrium denaturation, is that they stabilize an intermediate in the unfolding pathway, moving away from the cooperative two state denaturation exhibited by the wt and Quad core domains. This behavior, different from that of the wt protein, may have implications for the in vivo function of the full length protein and makes it a less attractive template for in vitro characterizations.
Figure 8: Hotspot mutations on the core domain. The six known hot spot mutations (R175H, G245S, R248Q, R249S, R273H and R282) are highlighted as red sticks. Crystal structure of p53 bound to DNA (PDB ID 1tsr) was rendered using Pymol.
P53 is arguably one of the most studied proteins and the common goal of these investigations is to battle cancer. According to the latest statistics, over 26000 somatic mutations and over 500 germline mutations have been reported to occur in the p53 gene.
More than 2300 of these mutations have been reported to have functional consequences83
(http://www-p53.iarc.fr version R15). The initial data for the various non-functional mutants of p53 were derived using yeast based transcriptional activation studies. Some of the initial biophysical characterizations to elucidate the mechanism of action of these mutations came from NMR studies conducted by Wong et al.84 They monitored the effect of five different ‘hotspot’ mutations of p53 (V143A, G245S,
R248Q, R249S, and R273H) by observing the changes in the chemical-shift pattern. The location of these mutations on the crystal structure of p53 is highlighted in Figure 8. The
extent of structural perturbation provided insight into the nature of these mutations.
R273H was categorized as a ‘contact’ mutant as this mutation mainly leads to
perturbation of the loop-sheet-helix motif and the L3 loop of the core domain. In
addition, this mutation leads to loss of a salt-bridge interaction. These changes minimally
affect the overall stability of the core domain but lead to considerably decreased
transcription of the p21 gene. Mutations R249S and G245S are characterized to be
‘structural’ mutations as they lead to global perturbation of the core domain structure and
cause the protein to be destabilized by >2 kcal mol-1. The R248Q mutation leads to structural
destabilization and loss of DNA binding indicating that this is a structural and contact
mutant. The mutation V143A which is buried deep inside the hydrophobic core of the β-
sandwich was found to cause perturbations to all the residues in the core. The
understanding of these mutations allowed Nikolova et al. to analyze the mechanism of
rescue of some of the cancer associated mutants by second site suppressor mutations.85
These studies allowed them to categorize targeted drug discovery into two groups. The first group will consist of drugs that can globally stabilize the protein, and these will be effective against the vast majority of cancer-associated p53 mutations which are known to be destabilizing. These include structural mutants such as the previously studied
V143A and G245S. On the other hand contact mutants like R273H require the drug to restore the DNA binding activity in order to be effective in rescuing the mutant p53 to possess wt-like activity. Furthermore, Bullock et al. have conducted a comprehensive analysis of all the hotspot mutations and found that most of the mutations lead to destabilization of the core domain.86 They categorized the cancer associated mutants into
different classes: DNA contact, DNA region, Zn region and β sandwich, providing an
overall perspective on the various p53 mutations based on their location on the core
domain affording a direction for future drug design ventures. In subsequent studies, the
crystal structures of R273H and R249S were solved.87 These structures reinforce the
hypothesis on the reason behind the loss of function of these mutants. They also solved
the structure of Y220C, one of the mutations in p53 that occurs with high frequency.
Understanding the structural effects of this mutation has led to the discovery of a class of
compounds that can potentially serve as lead molecules for the rescue of a broad range of
p53 mutations.88; 89
A significant consequence of the development of the various characterization methods is
the ability to interrogate the sequence-structure-function relationship of these variants.
Khoo et al. analyzed various stabilizing and destabilizing mutants for their transcriptional and apoptotic activities in tumor derived cell lines.90 They found that the stability of the various mutants correlates with the serum half life of these proteins and by and large an increased stability leads to improved activity. The outlier for this hypothesis was the
V143A mutation in the Quad and the Hexa mutant context, which showed WT-like transcriptional activity despite being a highly destabilized variant. Mutational analysis of these variants indicated that the improved activity of these mutants is due to the N239Y which is universally present in all these variants. It was also shown that this mutation leads to a higher apoptotic activity in a transcription dependent manner. This may indicate an effect of this mutation on the binding efficiency of the p53-DNA complex.
Comparative binding studies of the various DNA targets indicated that the transcriptional
consequence of p53 binding is determined by the tightness (KD) of binding.91 Typically
p53 exhibited tighter binding to its apoptotic targets as compared to the cell cycle arrest complexes. Therefore this mutation may improve the binding to levels that are required for the apoptotic activity. This implies that specificity of the response elicited by targeted p53 therapy may be more challenging to engineer.
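The dependence of promoter occupancy on KD can be made concrete with a simple single-site binding isotherm. This is a minimal sketch that assumes the DNA is limiting, so that free protein is approximately equal to total protein; the two KD values and the protein concentration are hypothetical and are not taken from the cited studies.

```python
def fraction_bound(protein_nM, kd_nM):
    """Fractional occupancy of a DNA site for a simple 1:1 binding equilibrium.

    Assumes free protein ~ total protein (DNA limiting).
    """
    return protein_nM / (kd_nM + protein_nM)


# Hypothetical dissociation constants for a tight and a weak response element.
TIGHT_SITE_KD_NM = 10.0
WEAK_SITE_KD_NM = 100.0

if __name__ == "__main__":
    protein = 10.0  # nM, hypothetical active p53 concentration
    print("tight site occupancy:", fraction_bound(protein, TIGHT_SITE_KD_NM))
    print("weak site occupancy: ", fraction_bound(protein, WEAK_SITE_KD_NM))
```

At a fixed protein concentration the tighter site is occupied to a much greater extent, which is why a mutation that shifts KD could change which class of target genes responds.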
1.4.2.2 The role of unstructured C-terminal regulatory Domain of p53
The mode of DNA binding by p53 and the role of C-terminal domain in the regulation of p53 is also an active area of research. The binding of p53 to DNA is found to be affected
by post-translational modifications such as phosphorylation and acetylation, in addition
to the specific sequence of the DNA. The role of the unstructured C-terminal domain is
probably the least understood among the various domains of p53. Early studies
demonstrated that truncation of the CTD or blocking the CTD by an antibody led to
improved DNA binding by the core domain.92 The role of specific post-translational
modifications on the C-terminal domain was studied by Friedler et al.93 Their results
show that acetylation of specific lysine residues reduces the binding efficiency of the C-terminal
domain (CTD) peptide to a long non-specific DNA sequence derived from
sheared herring sperm. Concentration-dependent measurements show that there is an
exponential effect in the binding when p53 is present as monomer versus dimers versus
tetramers. This further added to the proof that the CTD may negatively regulate the DNA
binding of p53 core domain. They also reported that the phosphorylation of S392 in the
CTD did not affect the binding to any degree, which contradicts previous results. The
authors suggest that S392 may not affect the binding and that previous work may have involved phosphorylation of other serine residues in addition to the S392. The observed dependence on phosphorylation of serine might be due to S376 and S378 which are in the
DNA binding stretch of the CTD. These experiments led to the hypothesis that the CTD negatively affects the sequence specific DNA binding by the core domain and an allosteric regulation of DNA binding by the CTD was suggested.
This allosteric regulation mechanism was challenged by experimental results that indicated that the DNA binding core domain is unaffected by changes in the CTD.
Specific constructs known to inhibit tetramerization and the constructs with the CTD were compared using NMR spectroscopy, and the structure of the core domain was found to be unvarying irrespective of the presence of the CTD.94 A comparison of the sequences of
the CTD of p53 and its paralogs p63 and p73 revealed significant differences among the
proteins supporting the tight binding of DNA by p73 and weak binding by p53.95 The influence of the CTD on DNA binding was also found to depend on the length of the target DNA studied. It was shown that the CTD facilitates binding to long stretches of
DNA while it negatively affected binding to short DNA. The presence of the CTD was found to improve both the kinetic and thermodynamic stability of the p53-DNA complex.
These results imply a positive regulatory role for the C-terminal domain in DNA binding by the core and tetramerization domains.
The tetramerization domain does seem to facilitate the cooperative binding of the protein to DNA, whereas the mechanism of the C-terminal regulatory domain is still unclear.
Recent work reported by Tafvizi et al. provides a feasible mechanism by which p53 binds to sequence specific DNA.96 They used single molecule experiments to probe the mode of binding of each of the various domains in p53. It is known that non-specific DNA binding is independent of ionic strength while ionic strength does play a role on sequence specific DNA binding. Their experiments showed that the C-terminal domain binds DNA via non-specific electrostatic interactions and the sequence specific binding is mediated by the core domain. The fluorescence profiles of the labeled proteins also indicate that the C-terminal domain employs a sliding mechanism while the core domain engages in a
‘hopping’ mechanism to search through the DNA. DNA binding proteins are known to undergo conformational changes when they change from the ‘search’ mode to the ‘binding’ mode. In the case of p53, the function is split between two domains: the C-terminal domain slides along the DNA, aiding the ‘search’ process, and the core domain binds the specific sequence of the DNA. The hopping mechanism followed by the core domain allows for the rapid traversing of long stretches of DNA by the protein. It is proposed that the protein spends a large amount of time in the ‘search’ configuration as compared to the
‘binding’ conformation. The authors propose that different roles of the domains in DNA binding explain the contradictory role of C-terminal domain on DNA binding. Truncation of the C terminus or binding by specific antibodies eliminates sequestration and leads to better binding to the cognate sites on short DNA fragments, while making binding to long
DNA molecules kinetically inefficient. Therefore the C-terminal domain might
kinetically favor binding to long DNA, which is the most likely scenario in vivo, and
thermodynamically hinder the binding of the core domain to short DNA by sequestering
the protein on to non-specific DNA.
In vitro experiments comparing the effects of various mutations on the core domain in isolation versus in the presence of tetramerization domains showed similar trends in the
DNA binding activity. Weinberg et al. tested the effects on sequence-specific and non-specific DNA using fluorescence anisotropy and analytical ultracentrifugation.71 The Hill plot indicated that the binding of the core domain is completely cooperative both in isolation and in the presence of the tetramerization domain. In addition, the amount of destabilization imparted by each of the mutants tested followed a similar trend in both contexts. This establishes that the core domain, which dictates the stability of the protein, also determines the sequence specific DNA binding and can be used as a model to study the effect of mutations on the protein. This paints a picture in which the DNA binding and the tetramerization domains act synchronously to bind DNA while the regulatory domain allows for fast traversal of the DNA to find the right binding targets. In the presence of non-native, non-specific or short fragments of DNA, this regulatory domain sequesters the protein by non-specific electrostatic interactions with
DNA. This sequestering kinetically favors the degradation of the protein, which is the most likely scenario at normal cell conditions.
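For readers unfamiliar with Hill analysis, the cooperativity mentioned above is read from the slope of a Hill plot, log(θ/(1-θ)) against log[protein]. The sketch below illustrates the generic Hill equation only; the half-saturation concentration and the Hill coefficients are hypothetical and are not the values measured in the cited study.

```python
import math


def hill_fraction_bound(conc, k_half, n_hill):
    """Hill equation: fraction of sites bound at a given protein concentration."""
    return conc ** n_hill / (k_half ** n_hill + conc ** n_hill)


def hill_plot_point(conc, k_half, n_hill):
    """Return (log10[conc], log10(theta / (1 - theta))) for one titration point."""
    theta = hill_fraction_bound(conc, k_half, n_hill)
    return math.log10(conc), math.log10(theta / (1.0 - theta))


def hill_slope(c1, c2, k_half, n_hill):
    """Slope of the Hill plot between two concentrations; equals the Hill coefficient."""
    x1, y1 = hill_plot_point(c1, k_half, n_hill)
    x2, y2 = hill_plot_point(c2, k_half, n_hill)
    return (y2 - y1) / (x2 - x1)


if __name__ == "__main__":
    K_HALF = 10.0  # hypothetical half-saturation concentration (nM)
    for n in (1, 4):  # hypothetical: non-cooperative vs strongly cooperative
        print(f"n = {n}: Hill-plot slope = {hill_slope(5.0, 20.0, K_HALF, n):.3f}")
```

A slope near 1 indicates independent binding, whereas a slope approaching the number of interacting subunits indicates strongly cooperative assembly on the DNA.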
In the course of vertebrate evolution, p53 has probably evolved to be kinetically unstable
at the organismal temperature with a short half-life in the cell to allow a mechanism for
spontaneous degradation in addition to regulatory pathways such as MDM2. This rapid
turnover of p53 ensures that the protein will remain in the cell no longer than required,
unless it is in complex with DNA or other proteins. Some experimental data to support
this comes from the studies done by Khoo et al.97 They analyzed the changes imparted to
the sequence, structure and function of p53 in various lower organisms and its paralogs
p63 and p73. P53 is functionally evolved to be a tumor suppressor protein whereas the
main cellular function of p63 and p73 is not growth suppression. In the process of this
functional evolution, p53 has acquired destabilizing mutations that render the protein to
be highly unstable at the body temperature of the organism. This allows an additional
regulatory mechanism to be available for p53 to be targeted for degradation under normal
cell conditions. Studies on the tetramerization domains of these homologs indicate that the tetramerization domain of p53 has evolved to be less complex than in p53 from lower
invertebrates and the p63 and p73 in humans.98 This acquired low promiscuity may have
defined the functional drift of p53 from its paralogs. Its significance in human cancer has
made it a target for various drug design studies.
1.4.3 Characterization of the Response Elements of p53 Core Domain
Considerable efforts have been made towards understanding the sequence and
conformation of the DNA response elements (REs) of p53. The first report on the
sequence of the DNA that binds to p53 was made by Kern et al.99; 100 Their experiments,
which utilized radiolabeling of the DNA, showed that a human DNA
fragment as small as 33 bp can bind to p53. Using methylation interference assays, they
were also able to assign the residues that are significant for this binding. These
experiments first claimed that the function of p53 may be mediated by its ability to bind to specific DNA sequences in the human genome, and that this ability is altered by mutations that occur in p53 found in human tumors.
P53 was then defined as a sequence specific DNA binding protein by El-Deiry et al., and
they defined the currently known ‘consensus’ binding sequence for p53.101 They
identified and analyzed 18 different p53 binding sequences from the human genomic
DNA. Fragmented DNA was analyzed for binding to p53 using p53-specific antibody,
and the DNA bound to p53 was identified using sequencing. The significance of each of
the DNA bases was tested using methylation interference (MI) and immunoprecipitation
(IP) assays. Their results showed a striking pattern for the binding sequences which
consisted of two copies of the 10 bp motif 5’-PuPuPuC(A/T)|(T/A)GPyPyPy-3’
separated by 0-13 bp, and this was referred to as the ‘definition of the consensus.’ Their
experiments also showed that the ‘inversion’ of the half site as well as the C and G bases
at the positions 4 and 7 are significant for p53 binding.
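The consensus definition above translates directly into a pattern search. The following is a minimal sketch using a regular expression, with Pu = A/G and Py = C/T; the function name and the test sequence are hypothetical and serve only to illustrate the 10 bp half-site and the 0-13 bp spacer.

```python
import re

# One half-site: PuPuPu C (A/T)(A/T) G PyPyPy
HALF_SITE = "[AG]{3}C[AT]{2}G[CT]{3}"

# Two half-sites separated by 0-13 bp. The lookahead lets overlapping
# candidates be reported; the lazy spacer prefers the shortest spacing.
FULL_SITE = re.compile(f"(?=({HALF_SITE})([ACGT]{{0,13}}?)({HALF_SITE}))")


def find_response_elements(seq):
    """Return (start, half_site_1, spacer_length, half_site_2) for each match."""
    seq = seq.upper()
    return [
        (m.start(), m.group(1), len(m.group(2)), m.group(3))
        for m in FULL_SITE.finditer(seq)
    ]


if __name__ == "__main__":
    # Hypothetical sequence: two half-sites separated by a 3 bp spacer.
    example = "TTAGACATGCCTACGGAACATGTCCAA"
    print(find_response_elements(example))
    # Replacing the conserved C at position 4 of the first half-site abolishes the match.
    print(find_response_elements("TTAGAGATGCCTACGGAACATGTCCAA"))
```

A strict match like this is only a first filter, since many functional response elements deviate from the consensus at one or more positions.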
In a parallel study, Funk et al. used an iterative selection procedure (CASTing: cyclic
amplification and selection of targets) to identify new specific binding sites for p53, using
Figure 9: Regulatory network of p53. The various stresses that lead to changes in chromatin lead to activation of the p53 pathway, which ultimately leads to cell cycle arrest, apoptosis or senescence depending on the severity of cellular stress. MDM2 plays a major role in the inhibition of p53 and this interaction is the target of many small molecule drugs. Figure from Kim and Dass 2011.102
nuclear extracts from normal human fibroblasts as the source of p53 protein.103 A
completely degenerate DNA sequence flanked by primer-specific amplification
sequences bound to p53 was isolated using magnetic beads coated with p53-specific
antibody. The DNA was released from this complex by denaturation and was amplified.
Multiple cycles improved the stringency of selection and yielded specific DNA binding
sequences for p53. The preferred consensus was the palindrome
GGACATGCCC|GGGCATGTCC. In vitro binding was assessed by Electrophoretic
Mobility Shift Assay (EMSA), and placing the identified sequences upstream of a
promoter in a yeast assay led to transcription by WT p53. Yeast one-hybrid methods
have been modified to identify REs from the human genome. Initially Tokino et al.
cloned putative p53 REs upstream of a basal promoter system controlling the expression
of HIS3.104 Only those cells carrying sequences that can be transactivated by p53, expressed from a GAL4 promoter on a different vector, will survive on media lacking histidine. They were able to identify 200 to 300 REs from the human DNA and they also reported that the spacing between the consensus half sites is significant. Later, Hearnes et al. combined Chromatin Immunoprecipitation (ChIP) with this yeast screen to identify novel binding sites from the whole genome.105 They were able to identify sequences that matched the consensus completely, and they observed that the spacing between the half sites was 2 bp or less.
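That the preferred CASTing site GGACATGCCC|GGGCATGTCC is a palindrome can be checked by comparing the sequence with its own reverse complement. The helper names below are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence reads the same on both strands (5' to 3')."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    half_1, half_2 = "GGACATGCCC", "GGGCATGTCC"
    print(reverse_complement(half_1) == half_2)  # second half-site mirrors the first
    print(is_palindromic(half_1 + half_2))       # the full 20 bp site is palindromic
```

Each half-site is the reverse complement of the other, which is consistent with the inverted arrangement of half-sites noted for the consensus.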
1.5 Rescue strategies for p53C and mutants
The rescue strategies developed for p53 can be categorized into two groups.102; 106; 107 One
of the methods is direct rescue by gene therapy. In this method phage delivery systems
are targeted to deliver the cargo to the tumor cells. The ill-effects of the immunological
response to the virus and the accompanying side effects have been a major detriment for
this approach. The second method is the indirect rescue by small molecule activators, and
the effectiveness of a number of these molecules is currently being tested by pre-clinical
trials. P53 is maintained at low levels using tightly-regulated pathways, and the knowledge about this regulatory network has paved the way for this class of p53 rescue
(Figure 9). This class can be further subdivided into those molecules that reactivate
mutant p53 and those which confer activity to wt p53 by interacting with its negative
regulator MDM2. Despite a general understanding of the effect of these molecules, the
specific mechanism of action of a number of these compounds is still poorly understood.
Studies are underway to gain a better understanding of these in order for easier and more
effective design of drugs in the future. For the few compounds for which the mechanism of
action is clear, it appears that stabilizing p53 is a major determinant of the effectiveness
of the drug.
The class of small molecule activators that function by disrupting the interaction of p53 with
the protein MDM2108 is seemingly more established, with a few compounds in pre-clinical
studies. The high resolution crystal structure of MDM2 bound to p53 shows that
Figure 10: Rescue of p53 by binding to MDM2 a) shows the various strategies to rescue the function of p53. Figure from Mandinova et al.109 b) shows the structures of p53 targeted drugs Nutlin 3a and MI219 bound to the p53 peptide binding pocket of MDM2110
this binding pocket is ideal in size for small molecule binding and has facilitated the
design and discovery of non-peptidic inhibitors of this interaction. Among the various
compounds identified, Nutlin3 and MI219 have all the desirable properties: namely, (a) high
binding affinity and specificity to MDM2, (b) potent cellular activity in cancer cells with
wild-type p53, and (c) a highly desirable pharmacokinetic (PK) profile and are among the
small molecules which are in clinical trials now.110; 111 The structure of Nutlin3 bound to
MDM2 has been solved and shows that it binds to the pocket where the p53 peptide
binds112 (Figure 10). RITA is another small molecule drug that has been reported to
activate WT p53 by disrupting its interactions with MDM2,113 but its in vivo
pharmacokinetic properties are yet to be established.
53BP2 is a positive regulator of p53, and based on this, Friedler et al. have designed a
peptide CDB3 that can reactivate the DNA binding property of mutant p53.114; 115 Using
NMR studies they were able to localize the binding of the peptide to the edge of the DNA binding site and show that incubation with CDB3 improved the stability of the p53 core domain. Furthermore, when mutants of p53 were incubated with the peptide, it led to improved binding of the mutant. Using NMR chemical shifts, it was shown that this peptide can shift the conformation of structural mutant R249S to be WT-like.116 When mutants are in a dynamic equilibrium between the WT-like and non-native conformation, the chaperone effect of the CDB3 peptide assists in shifting the equilibrium towards the native like form. Thus CDB3 potentially rescues the function, simply by stabilizing the
mutant. The cellular uptake and the in vivo effect of the FL CDB3 were validated in three
different human cancer derived cell lines.117
Representative structural and contact mutants R175H and R273H were rescued by this
peptide. ELISA experiments using PAB1620, an antibody that recognizes properly folded
p53, confirmed the chaperone-like activity of the peptide. In the context of the contact
mutant, the authors propose the total upregulation of p53 in cells might be the reason for
the rescue of function by the peptide. In a separate study, the authors have shown that the
kinetic stability of the p53 mutants correlates with their thermodynamic stability. The
kinetic instability might explain the loss of activity by many of the structural mutants.
The authors tested the effect of stabilizing small molecules on the half life of various
mutants including WT p53. The presence of peptide drugs like the CDB3, which is
proposed to act as a chaperone in the folding of the protein, improved the kinetic stability
of the protein.118 This presents a unique strategy for stabilizing the mutant which might
lead to the rescue of structural mutants. A general mechanism for the binding of this
peptide was found by NMR combined with alanine mutation studies of the CDB3 peptide. Their studies revealed that the DNA binding region of p53 is highly positively charged and binding by the peptide is governed by non-specific electrostatic interactions.
An analysis of the binding of various p53 partners reveals that regions overlapping with the DNA binding area of the p53 form a promiscuous site for binding.119 This site
consists of the residues Leu114, His115, Gly117, Thr118, Val122, and Thr123 from loop
L1, Arg280, Arg282, Arg283, Thr284, and Glu286 from helix H2, and Tyr126, Thr140,
and Glu198 from the rest of the protein. Residues Val143, His179, Asn239, Gly244,
Val272, Cys277, and Gly279 also form part of the binding site. The DNA-binding interface in
p53CD serves as a multipurpose promiscuous protein-binding site. This site mediates
binding of most of the p53CD-binding proteins, including Rad51, HIF-1α, Bcl-XL, CTF2,
53BP1, 53BP2 and heparin. The rigid hydrophobic core and the multitude of flexible loops in the core domain of p53 allow the promiscuous binding to its various partners.
Further experimental evaluation and possibly engineering are required to establish the specificity of CDB3 in vivo.
PRIMA1 and its methylated derivative PRIMA1MET are two non-peptidic small-molecule
drugs which rescue the activity of p53 mutants and have been shown to be active against
various cell lines and carcinomas.120; 121 A library of small molecules that suppress the
growth of human tumor cells in a mutant p53–dependent manner was screened using an
assay based on Saos-2-His-273 cells carrying tetracycline-regulated mutant p53. This
study identified PRIMA1 as a broad spectrum p53 rescue drug which reactivated 13 of 14
different mutants. In vivo studies show that PRIMA1 has tumor suppressor activity in
animal models and also induces other p53 dependent genes including MDM2. This
indicated that PRIMA1 reactivates p53 by inducing WT-like conformation of mutant p53.
In vitro studies in tumor cell lines and subsequent in vivo studies in mouse models
identified PRIMA1MET, a methylated derivative of PRIMA1, as a more potent p53
activator.122; 123 An insight into the mechanism of action of these compounds came from
in vivo and in vitro decomposition studies of PRIMA1 and PRIMA1MET.124 Their
decomposition leads to the formation of products that contain thiol reactive groups (for
example, methylene quinuclidinone, MQ). These decomposition products alkylate the
thiol groups in p53 leading to an unscrambling of the misfolded mutant. This mechanism
of action based on thiol reactivity is similar to other p53 reactivating drugs like MIRA,
STIMA, and CP-31398. The effect of PRIMA1MET on global gene expression was studied
by microarray analyses of tumor cell lines expressing p53 mutant and showed that various transcription dependent and independent p53 targets were regulated in the
presence of the small molecule in a p53 dependent manner.125 This further reinforced that
PRIMA1MET rescues the wt-like functions of p53 and potentially has minimal possibilities
of the development of drug resistance.
Using the same screening system that was used for the discovery of PRIMA1, the Bykov
group identified a novel compound STIMA1 that improved the DNA binding and
transcriptional activation of mutant p53.126 The structural scaffold of 2-vinylquinazolin-
4(3H) is similar to the previously characterized mutant p53-reactivating compound CP-
31398. A set of 26 different derivatives of this scaffold were tested to identify STIMA1.
In vitro studies indicate that STIMA1 is specific toward cells expressing mutants of p53 and that WT p53 is less sensitive to the effects of STIMA1. Just as in the case of
PRIMA1, the authors propose a thiol reactive mechanism for the activity of STIMA1.
The compromised solubility of STIMA1 has hindered the in vivo studies of this compound and derivatizations to improve the solubility are being explored.
PhiKan083, which belongs to a class of carbazole derivatives and can rescue the function
of Y220C, was discovered by in silico screening followed by experimental testing.87; 89
This mutation leads to a cavity in the p53 core domain distant from the surface regions that are known to be involved in DNA recognition or protein–protein interactions, making it a particularly attractive target site for stabilizing small-molecule drugs.
PhiKan083 was demonstrated to improve both the thermodynamic and kinetic stability of the mutant. Since Y220 is not in the region that binds DNA, PhiKan083 can serve as a lead compound for the design of generic p53 activators. The binding of this compound to the cavity generated by the Y220C mutation was unambiguously demonstrated when the complex was crystallized with PhiKan083 bound in that cavity
(PDB 2YUK). The effectiveness of PhiKan083 demonstrates the utility of stabilization of p53 for rescue of function.
The significance of high throughput screening systems is accentuated by the fact that such discoveries were made via such screening systems albeit at a small scale. Despite the fact that screening systems provide a powerful tool in the identification of small molecules, only a few yeast based screening systems are prevalent for the identification of functional p53. The number of variants that can be studied in a high-throughput manner is limited by the transformation efficiency in yeast. This explains the small libraries of variants that have been used in the studies thus far. A review of the presently available screens for p53 is detailed below.
1.6 Identification of functional mutants of p53 using genetic screens
One of the earliest screening systems developed for p53 is a negative selection scheme
based on a yeast reverse one-hybrid system. Brachmann et al. used a URA3 based
reporter system and screened for both survival in uracil-free media and sensitivity to
FOA (5-fluoro-orotic acid). P53 expression was driven from a different plasmid under the
control of an ADH1 promoter127; 128 (Figure 11). In this assay, which depends on the
activation of URA3 placed downstream of a p53 dependent promoter, cells expressing WT p53 survive on media lacking uracil and do not survive on plates containing FOA (FOAS).
The binding sequence of p53 was based on the consensus reported by El-Deiry et al.
Mutations of p53 were identified based on their ability to rescue the FOAS phenotype. They
isolated 49 different mutations, most of which were previously reported in cancer, and
they found that most of the mutations clustered around three of the six hotspot sites. The degree of dominant negative effect exerted by these mutants was further explored by screening in presence of one or two copies of WT p53. The same screen was later used in
order to identify second site suppressor mutations.81 Using PCR and gap repair, a library of mutants was analyzed using the same screen, which identified second site suppressor mutations for V143A, R249S and G245S. They also showed that the mutations that
Figure 11: URA3 dependent screen for p53 in yeast. PCR mutagenesis and gap repair of different regions of the p53 ORF (PCR products A and B) were used to assess the transactivation capacity of various mutants of p53, monitored by the survival rate on uracil drop-out media. Survival on the histidine drop-out media denoted successful gap repair.127
resulted from the yeast screen led to transcription of the reporter gene and natural p53
RE in mammalian cells. Ishioka et al. reported a functional screen for p53 based on the
p53-dependent expression of the HIS3 gene.129 The expression plasmid that contained the DNA encoding the p53 fragments from the analyte was selected based on survival on plates lacking leucine. This tests the gap repair efficiency in yeast and selects for clones expressing full-length p53. These were then transformed with the reporter plasmid that utilized a p53-dependent HIS3 system and replica plated on histidine and leucine drop-out plates. This assay is cumbersome due to the requirement of replica plating. A modification of this system, developed by Flaman et al., depends on the p53-dependent expression of the ADE2 gene from a plasmid that is integrated into the yeast strain130 (Figure 12). In the presence of limiting amounts of adenine, the absence of ADE2 expression leads to the accumulation of a colored intermediate in the biosynthesis of adenine, turning the cells red. Therefore, when ADE2 is expressed in a p53-dependent manner, WT p53 leads to white colonies while mutants result in red colonies. This screen also identified marginally active variants, which formed pink colonies. A detailed analysis showed that these were temperature-sensitive mutants which resulted in pink colonies at 25 °C and white colonies at 37 °C. This screen can be utilized to identify functional p53 from clinical samples including cell lines, peripheral blood lymphocytes and tumors. The advantage of this screen over the previous URA3-based screen is that it avoids the replica plating step.
The previous screen, however, has the distinct advantage that URA3 is regulated by a tightly controlled promoter, limiting basal expression in the absence of p53.
Figure 12: Red-white screening in yeast. p53 response elements engineered upstream of ADE2 allow red-white screening due to the colored intermediate that accumulates in adenine biosynthesis when ADE2 is not expressed.130
A number of modifications of this basic yeast one-hybrid screening system have been developed to analyze active p53 or p53 REs. The p53-dependent transactivation has been monitored using reporters such as luciferase and auxotrophic markers for tryptophan, uracil and histidine. A significant drawback of yeast-based screens is that efficient nuclear import of the protein is required. In addition, the transformation efficiency of yeast heavily limits the number of variants that can be studied at the same time, which is an important deterrent for combinatorial experiments. mRNA display was used by the Ghadessy group to identify p53 variants that bound to the RE in the p21 gene.131; 132 In vivo results showed that in addition to the transactivation from p21, the endogenous levels of p21 were also increased by some of the variants tested. In vitro methods offer the advantage of avoiding the transformation step, which usually limits the library size, but they are also
complicated by the in vitro translational machinery required. A solubility screen was
developed by Mayer et al. in E. coli.132 The authors monitored the levels of a p53 core
domain with a C-terminal EGFP fusion as a function of cellular fluorescence. They
observed that the levels of p53 correlated with the thermodynamic stability of the variants tested. One of the problems with the solubility screen is that it does not require the folded conformation of the protein. If the chromophore is formed from misfolded proteins, false positives may occur.
The tremendous amount of work that has been done on p53 has led to a clearer picture of the mechanism of action of p53 as a tumor suppressor protein. Both the kinetic and
thermodynamic stability of the full-length protein are under tight regulation under
physiological conditions. It appears that the CTD may facilitate the search process for
sequence specific DNA binding while the tetramerization domain promotes the
cooperative binding of p53 to DNA. The N-terminal domain is implicated in the transactivation function of p53 and also interacts with MDM2, which is the primary negative regulator of p53. Understanding the structural basis of instability has led to the identification of various drug targets. High throughput screening systems and combinatorial methods to generate large libraries of p53 can aid in interrogation of the sequence-structure-function relationship of this protein which is still unclear.
1.7 Thesis Synopsis
What determines the fold that a particular protein adopts? In other words, how much
perturbation can a structure tolerate with respect to its sequence before it is unable to form the ‘native’ structure? Also, is there any difference between the stability determinants in β-sheets versus α-helices? Does the large contribution of long-range
forces make the β-sheet systems have different tolerances to sequences? We have
attempted to answer some of these questions using combinatorial methods to study the
tumor suppressor protein p53. P53 is a physiologically important protein with a common
β-immunoglobulin fold. We have also analyzed the specificity of the interface of a
heterodimeric coiled coil formed by the interaction between BRCA1 and BARD1.
We developed a transcription interference screen for p53 core domain based on an
artificial p53 responsive lac operon, controlling the expression of GFPuv in the host
plasmid pGFPuv. We report a novel p53 responsive lac operator which was derived by
the simultaneous optimization of the transcription of GFP and binding to wt-like Quad
from a library of binding sequences. We have successfully designed a screen which
discriminates between p53 which shows DNA binding activity and variants of p53 that
cannot bind or weakly bind to DNA. As in the case of most proteins, the function of p53
is closely related to its stability. Therefore our screen provides a simple and quick method
to screen for stable and functional variants of p53. The design, optimization and results
from the screen are described in Chapter 2.
The various applications of the screen are the subject of discussion in Chapters 3 and 4.
Chapter 3 describes the application of the screen to find stable functional variants of p53
core domain from a library of core randomized variants. Initially, four core residues were
randomized to all twenty amino acids. The number of positives obtained as a result of
screening this library was very small, and therefore we generated two smaller sub-
libraries randomizing only two residues at a time to investigate the role of these four
residues. The results obtained from the sub-libraries showed that two of the four positions
randomized were more stringent to the size requirements in the β-sheet core than the
other two, thus explaining the low occurrence of positives from the 4-position library.
The validity of the screen was further proved by biophysical characterization of stable
and functional variants resulting from the screening. The characterization shows that the
variants that pass the screen are as stabilized as the Quad mutant. In silico studies indicate
that decreasing the global dynamics of the protein by reducing the loop length of the
S7S8 turn in the p53 core domain can lead to significant stabilization of the protein. But
this in silico designed variant failed to give a positive phenotype in our screen. Following
this, we rationally designed various deletion mutants and utilized our screen to explore
the stabilization of core domain through mutagenesis of this loop. We were able to
identify one such variant which is intermediate in stability with respect to the in silico designed mutant and Quad. Using the screen to identify rationally designed loop variants
of p53 is detailed in Chapter 4.
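The difference in scale between the 4-position library and the 2-position sub-libraries can be made concrete with a short calculation. The following is a minimal sketch that counts protein-level diversity only; it assumes each randomized position samples all twenty amino acids and ignores codon degeneracy and transformation efficiency. The function name is illustrative and is not part of the experimental workflow.

```python
# Protein-level diversity of a saturation library: every randomized
# position may be any of the 20 standard amino acids.
AMINO_ACIDS = 20

def protein_diversity(positions: int) -> int:
    """Number of distinct protein sequences when `positions` residues are fully randomized."""
    if positions < 0:
        raise ValueError("positions must be non-negative")
    return AMINO_ACIDS ** positions

four_site = protein_diversity(4)  # library randomizing four core residues
two_site = protein_diversity(2)   # each sub-library randomizing two residues

print(f"4-position library: {four_site} variants")
print(f"2-position sub-library: {two_site} variants")
print(f"ratio: {four_site // two_site}x")
```

Under these assumptions the 4-position library contains 400 times as many protein sequences as each sub-library, so an equal number of screened colonies samples the sub-libraries far more completely.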
BRCA1 is another important tumor suppressor protein, and some oncogenic mutations in BRCA1 lie in the putative binding regions for key interaction partners. For example, BRCA1 and BARD1 interact with each other through their RING finger domains to form a heterodimeric four-helix bundle. The occurrence of oncogenic mutations is much more widespread in BRCA1 than in BARD1. Sarkar et al. designed a screen based on the split complementation of GFP to screen for BRCA1 variants that can bind to BARD1 and thus provide information on the interaction surface between these proteins. The results obtained from the initial study of a few mutants indicated that the interface is fairly insensitive to mutations. Analysis of all 36 known cancer-associated mutations of BRCA1 using this method also indicated the presence of a robust interface which is not highly sensitive to mutations. As a further validation of the results obtained from the in vivo screen, we have characterized these interactions in vitro.
A set of mutations both in and away from the interface was selected, and variants with these mutations were co-expressed in the absence of the GFP tags. Both
BRCA1 and BARD1 partition into inclusion bodies when expressed separately in E. coli and are soluble only as a complex. We expressed the two proteins with orthogonal tags, so that purification of the complex from the soluble fraction reports on the interaction between WT BARD1 and BRCA1 variants. The observations from the in vitro studies confirm that the interface between these proteins is quite robust to mutations. The details of this study are described in Chapter 5.
Steps to improve the dynamic range of the screen have been initiated. Two different approaches are being taken: optimizing the screen for a more stabilized core mutant,
‘hexa’ p53, and screening in the presence of the tetramerization domain to enable us to detect weaker DNA binding of the core domain. The screen can potentially be used for screening therapeutic targets. As a first step towards this, cyclic peptide libraries will be screened to identify molecules that can stabilize and rescue the function of known hotspot mutants of p53 core domain. These extensions to the screen are described in Chapter 6.
Given the pathological significance of p53 in human cancers and the high throughput nature of the functional screen for p53, our screen is a powerful tool in identification of destabilized mutants. In combination with other small molecule libraries, the screen can also be used as a tool to identify drugs to rescue various destabilized variants. Since the screen is bacterial and phenotypic, it provides a quick approach to identify lead compounds. A repertoire of such potential drugs which can be identified using a combinatorial library approach will be powerful against various oncogenic mutants of p53.
Chapter 2: A cell based screen for the functional core domain variants of tumor
suppressor protein p53
Contributions
The material presented in Chapters 2 and 3 will be published as a full paper co-authored
by Brinda Ramasubramanian and Thomas J Magliery. The work summarized in
this chapter was produced and written by the primary author. The experimental design
and data analyses were accomplished by the primary and the corresponding authors.
2.1 Summary
We have developed a high throughput phenotypic bacterial screen for the core domain of the tumor suppressor protein p53. The screen relies on the transcriptional interference of
an artificial p53 responsive lac operon, controlling the expression of GFPuv in the host plasmid pGFPuv in a p53-dependent manner. We modified the operator region of the lac operon to contain a p53 binding site. Wild type-like p53 binds to this site, blocking the polymerase and leading to a non-fluorescent phenotype. In the presence of mutant p53 whose ability to bind DNA is compromised, transcription of GFPuv is undeterred and the
resulting cells are fluorescent. In addition to utilizing some of the known binding
sequences of p53, we have also used combinatorial methods based on the consensus p53
binding sequence to generate a novel p53 construct (Binding Domain-1, BD-1) which, in
the context of the lac promoter optimizes the transcription from the operon in the absence
of functional p53. P53 Quad (an engineered stable variant of WT p53 core domain) and various mutants were expressed under the control of an arabinose promoter, a tunable system. The host plasmid pACBAD-p53 can be co-maintained in the cell with pGFPuv-
p53 binding domain (pGFPuv BD-x). Known hotspot mutants of p53, V143A, R175H,
R249S and R273H, which are known to span the spectrum of structural and contact
mutants of p53, were chosen as the negative controls for the screen. Our results show a
marked decrease in the fluorescence of pGFPuv-BD-1 when co-transformed with p53-
Quad (wt like) and higher fluorescence when co-transformed with the hotspot mutants of
p53. We have successfully designed a screen which discriminates between p53 which
shows DNA binding activity and variants of p53 that cannot bind or weakly bind to
DNA. As in the case of most proteins, the function of p53 is closely related to its
stability. Therefore our screen provides a simple and quick method to screen for stable
and functional variants of p53.
2.2 Introduction
Proteins play a central role in almost every biological process including signal
transduction, cell cycle regulation, DNA transcription, translation, cell cycle arrest,
apoptosis, etc. Amazingly, their large diversity is derived from varying just twenty amino
acids. The functional diversity of proteins is a result of the different chemical properties
of these twenty amino acids and their architecture in the native state. The wide
involvement of proteins in various cellular processes also means that mutations that
destabilize proteins lead to impaired biological functions and thus lead to various human
diseases. Cystic fibrosis, sickle-cell anemia, Alzheimer’s, heart diseases etc. are examples
of human diseases caused by functionally impaired proteins that are destabilized due to
mutations. One of the devastating consequences of impaired function of mutant proteins
is cancer. Census reports from the American Cancer Society show that about a million
new cases of cancer were diagnosed in patients over the last 10 years and more than half
a million fatalities were reported to be due to cancer in the past years in the United States
alone.133 Understanding the correlation between protein sequence and stability, structure and function is therefore of pivotal importance to human health.
Carcinogenesis is caused by the impaired balance between cell proliferation and
apoptosis. Human p53 was identified as a tumor suppressor protein as early as 1992.134
P53 is a sequence specific transcription activation factor which is at the hub of a network
of signaling pathways and its function includes the direct or indirect activation of
multiple genes involved in cell cycle arrest, apoptosis, cell adhesion etc.135 It can cause
temporary cell cycle arrest and apoptosis in response to carcinogenic cell stress. It is
found mutated in more than 50% of human cancers and leads to complex functional consequences including specificity of mutation to cancer prognosis and drug response.48
P53 is a multidomain protein containing an N-terminal transactivation domain, a DNA
binding core domain, a tetramerization domain and a C-terminal regulatory domain. A
proline rich domain is sandwiched between the transactivation and the DNA binding
domains.136 Proteolytic digestion experiments have shown that the sequence specific
DNA binding core domain coincides with the major hot spots for oncogenic mutations
and helps us understand why cancer derived mutants are defective in DNA binding.137
Wild type p53 is tightly regulated at low levels under normal cellular conditions and is a
marginally stabilized protein under physiological conditions. Mutant p53 falls outside of
this feedback loop, and accumulates in cancerous cells. Consequently, the activity of p53
relies on its intact conformation which is disrupted even by single amino acid mutations.
Many of these mutations are simply destabilizing and the reduced cellular levels lead to
the loss of activity.48 These results reinforce the fact that identifying stable variants or
stability determinants of p53 is valuable to information-based drug discovery. The sequence-structure-function relationship of proteins has been among the most elusive
concepts to understand. One of the most successful approaches to study this relationship
is the use of combinatorial methods to study the tolerance of sequence changes to
structure and stability in a given backbone structure, and this inverse protein folding
approach has been applied to a variety of proteins including ROP138, T4 lysozyme,139
GB1,31; 34 ubiquitin140 and others. High throughput analyses of large number of variants
using combinatorial experiments rely on linking the phenotype to its genotype. Therefore
an essential tool for such experiments is the development of a screen which reports on the
function of the protein of interest (POI).
P53, which has been named the ‘guardian of the genome’, is a physiologically important
protein which has a β-immunoglobulin structure. Studies show that most of the
cancer-associated missense mutations render the p53 non-functional due to the reduced
stability of the protein. Therefore stabilizing the protein can lead to rescue of function of the protein. Small molecule rescue for a particular destabilized variant Y220C has recently been achieved using in silico screening and docking to the available crystal structure of the mutant.87 Phikan083 binds to the hydrophobic pocket generated by the
mutation and leads to improved thermal stability of the protein.89 This was found to be
sufficient to reinstate the DNA binding of the mutant. Attempts at rescue of the function of this protein target the DNA binding properties as well as stabilizing it. Molecules like
PRIMA1 and PRIMA1MET rescue the transcriptional activation by preventing the sequestering of the various folding intermediates in local minima in the folding funnel.141
The effect of mutations on the residues in its β-sheet core domain varies, depending on the hydrogen bonding pattern of that particular residue. Also, there is a higher prevalence of long range interactions for β-sheet proteins when compared to the alpha helices. Such differences have made any predictions of sequence changes on stability quite challenging.
This in turn has led to a lesser number of studies done on β-sheet proteins in comparison to the studies based in alpha helical proteins. Most of the drugs that have been developed are based on rational design, deriving information from the known scaffolds that can bind to p53. Understanding the tolerance of this protein to sequence changes and the
consequences of the various changes to stability and function of the protein will allow us
to gain insight into the structure-function relationship of this protein and to use the
information for targeted drug design. Combinatorial methods using phage or bacteria
allow the screening of ~10^9 variants at a time, and this can significantly speed up the
process of drug discovery. Therefore it is advantageous to develop a screen as a tool to
identify stable functional variants of p53 and identify drug targets to rescue known cancer
mutations of p53. The in vivo approach will aid in understanding the determinants of
protein stability under conditions somewhat native to the cell. Many factors like the effect
of molecular crowding, chaperones and proteolytic degradation are masked when the
proteins are studied in vitro.132 It has been shown that the stability of p53 is parallel to
the stability of its DNA binding core domain (p53C) and therefore a screen that can
identify stable variants of p53C can also be used as a tool to screen for therapeutic agents
for known p53 mutations.142
A few cell based assays mainly based on yeast one-hybrid systems have been developed
to study the effect of various mutations in this protein.143 These rely on the expression of
a gene of interest from a promoter modified to contain p53 response elements (REs). The transcription of these genes is effected only in the presence of a functional p53.
Selections based on this principle have auxotrophic marker genes downstream of this p53 responsive promoter whereas screens contain quantifiable genes such as ADE, luciferase,
Gal4, GFP and others downstream of the p53 dependent promoter. Selections monitor cell survival in selective media lacking the respective component such as uracil, histidine
and others. There has been an array of modifications to the original version of the system,
FASAY,144 to study the effect of mutations on p53-DNA interactions, in a region-specific
manner, expression-dependent manner and promoter-dependent manner. The system has
also been adapted to study the interaction of p53 with other proteins. The use of a
eukaryotic system facilitates the various post-translational mechanisms by which p53 is
regulated. One of the significant requirements of yeast hybrid screening systems is the
need for nuclear import of the protein for the assay to work. Also, the outcome of these
assays is a complex function of other protein-protein interactions in addition to the
stability and DNA binding efficiency of p53. Therefore it is challenging to decipher the
specific effect of any particular mutation on the structure and function of the protein. The
library sizes that can be studied using yeast-based methods are also limited by the low
transformation efficiency of yeast strains in general. Recently a bacterial screening
system, based on the solubility of the mutants reported by a C-terminal eGFP fusion has
also been reported. However, mutations may alter the conformation of the protein, and
since solubility does not demand the native conformation of the protein, additional
experiments are required to confirm the function of the protein. Here we report a robust,
in vivo bacterial screen that directly measures the DNA binding function of native and
mutant p53.
2.3 A functional screen for p53 core domain
The screen is based on the transcription interference of GFPuv (Green Fluorescent
Protein which contains the mutations F99S, M153T and V163A, also known as the ‘cycle
3’ variant of GFP145) gene in a p53 dependent manner. The overall scheme of the screen
is shown in Figure 13. The native function of p53 is to act as a transcription activation
factor that triggers downstream targets leading to cell cycle arrest, apoptosis, senescence
or cellular death. The response evoked by p53 is found to depend on its binding efficacy
to its target.
Figure 13: A schematic representation of the screen. The operator region of the lac promoter upstream of GFPuv is modified to bind p53. If the p53 variant expressed from a different plasmid is well folded and functional, it binds to its DNA Binding Domain (DBD) and inhibits the transcription of GFP, resulting in low or no cellular fluorescence. If the p53 variant is misfolded and non-functional, it fails to bind to the DBD and can result in high cellular fluorescence.
It is reported to bind to DNA containing an inverted repeat of PuPuPuC(A/T)(A/T)GPyPyPy,91; 101 where Pu is a purine and Py is a
pyrimidine. The repeat of the consensus sequence can be separated by 2-13 base pairs,
although a majority of the natural binding sequences are tandem repeats with no base
pairs separating them.91; 146 We decided to exploit the DNA binding function of p53 to
generate a simple phenotypic cellular screen. We modified the operator region of a lac
promoter which encodes GFPuv to contain the consensus DNA binding site for p53. The
lac operon system is one of the most studied genetic systems. In its native state, the lac
system includes lacI, which encodes the repressor of the lac operon. In the absence of
lactose, LacI binds to the operator region and inhibits the transcription of the downstream
genes lacZ, lacY and lacA. When lactose is present, its metabolite allolactose binds to LacI and releases it from the operator, allowing
RNA polymerase to transcribe the downstream genes. Bacterial one- and two-hybrid
systems have utilized the properties of the lac operator to study protein-DNA and protein-protein interactions.147 We have modified the operator of the lac operon to bind p53 so that bound p53
blocks the RNA polymerase and effectively causes transcriptional interference. The
details of the various modifications to the operon and the plasmids are described later in
this chapter. If the p53 variants are well folded and can bind to the consensus DNA
binding domain, the transcription of the downstream GFPuv will be reduced, and therefore will lead to cells in which fluorescence levels are low. On the other hand when the p53 variant expressed is not wt-like, it will not bind to the DNA and therefore leads to strongly fluorescent cells. Thus this screen that we have developed can be called a negative phenotypic screen.
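The consensus described above, two copies of PuPuPuC(A/T)(A/T)GPyPyPy, lends itself to a simple pattern search. The sketch below is illustrative only and is not the procedure used to design the binding sites in this work. It assumes that a spacer of 0 to 13 bp is allowed, so that tandem repeats with no spacer are also found, and the two test sequences are hypothetical examples, not BD1, BD2 or BD3.

```python
import re

# IUPAC degenerate codes in the consensus: R = A/G (purine), W = A/T, Y = C/T (pyrimidine).
HALF_SITE = "[AG]{3}C[AT]{2}G[CT]{3}"  # RRRCWWGYYY
MAX_SPACER = 13                         # assumption: 0-13 bp between the two half-sites

# Two half-sites separated by the shortest possible spacer (lazy quantifier).
FULL_SITE = re.compile(
    "(" + HALF_SITE + ")([ACGT]{0," + str(MAX_SPACER) + "}?)(" + HALF_SITE + ")"
)

def find_p53_sites(seq):
    """Return (start, half_site_1, spacer_length, half_site_2) for each non-overlapping match."""
    hits = []
    for m in FULL_SITE.finditer(seq.upper()):
        hits.append((m.start(), m.group(1), len(m.group(2)), m.group(3)))
    return hits

# Hypothetical sequences for illustration only.
tandem = "TT" + "GGGCATGTCC" + "GGGCATGTCC" + "AA"   # two half-sites, no spacer
spaced = "AGACATGCCT" + "ACG" + "GAACTTGTTT"         # two half-sites, 3 bp spacer

print(find_p53_sites(tandem))
print(find_p53_sites(spaced))
```

A search of this kind only reports sequences that match the consensus exactly; it says nothing about binding affinity or about transcription from the modified promoter, which is why the operator variants were compared experimentally.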
We chose a plasmid pGFPuv, which encodes GFPuv under the control of a lac promoter as the reporter plasmid. pGFPuv has a ColE1 origin and an ampicillin resistance marker.
Figure 14: Schematic representation of the screen. a) Plasmid maps of pACBAD-p53 and pGFPuv-p53bs (binding site) constructed for the screening of functional p53 variants. The pACBAD-p53 plasmid expresses p53 variants, which can interact with the pGFPuv-p53bs plasmid, whose modified lac promoter facilitates the binding of p53 variants.
The presence of GFPuv allows us to decipher the effects of binding from cellular fluorescence and thus provides a quick phenotypic screen. In order to express GFPuv in a
p53-dependent manner, we modified the promoter encoding GFPuv to contain an
artificial p53-responsive operon. The original lac promoter was modified to contain known p53 binding elements at the +1 site of the operator, resulting in the plasmid that we call pGFPuv-BDx, where x indicates the identity of the binding sequence used. Three different p53 binding sequences derived from the literature were used separately to modify
the operator so that we could choose the optimal operator sequence. The ideal
binding sequence leads to uninterrupted and robust transcription of GFPuv in the absence of any p53 and expresses GFPuv in a p53-dependent manner in the presence of p53. Two of the three variants of the binding sequence (BD1, BD2) were derived from a sequence previously used by Sakaguchi et al. in a decoy experiment for p53 binding.148 We used this sequence as is and as a tandem repeat to yield two variants with which we modified the operator sequence. The third sequence (BD3) was reported by Kern et al., who used methylation interference and immunoprecipitation assays to decipher it. This sequence is also the first reported natural binding sequence for p53.100 The various sequences are depicted in Figure 14. Oligonucleotides encoding these sequences, flanked by sequences complementary to the parent vector, pGFPuv, were used in a PCR to replace the original lac operon sequences with the p53 REs. Following this, overlap extension PCR was used to generate sequences with appropriate restriction endonuclease sites, and the final 750 bp fragments were ligated between the AlwNI and HindIII sites in the pGFPuv vector to yield pGFPuv-BD1, pGFPuv-BD2 and pGFPuv-BD3.
For the expression system, we constructed a plasmid which has an orthogonal origin of
replication and resistance marker compared to the reporter system. We also engineered an
arabinose promoter in this vector so that the expression levels of the p53 variants can be
tightly regulated. This vector was generated from commercially available vector
pACYC177 and pUCBADGFPuv. The p15A origin and the kanamycin resistance gene
were amplified using oligonucleotides which encoded BglII at the 5’ end and NotI at
the 3’ end. These were chosen so that the vector can be co-maintained with pGFPuv, which has a ColE1
origin and encodes ampicillin resistance.149 Using oligonucleotides which encoded for
the same restriction endonuclease sites, we also amplified the arabinose promoter region
and the GFPuv gene from the pUCBADGFPuv vector. These PCR products were ligated to generate the pACBADGFPuv vector. The GFPuv in this case serves as a “stuffer” gene in the multiple cloning site of the pACBAD vector, which aided in the quick identification of successful ligation reactions when replaced with p53 variants. P53 “Quad”, which is a stable, engineered variant of human p53 core domain, was used as the “wild type” for the screen.79 Since WT p53 is only marginally stabilized at room temperature (rt), in vitro
characterization of the WT has proved to be challenging. Using Quad as the “wild type”
will allow us to perform in vitro characterizations of the different variants that will result
from our screening studies. P53-Quad is expressed under the control of an arabinose
promoter which is tightly regulated. The details of the vectors are shown in Figure 14.
We generated a screening strain by transforming pGFPuv-BDx into DH10B E. coli and competent cells of this strain were transformed with pACBAD-p53. The level of GFPuv
fluorescence will depend on the efficacy with which the p53 variants expressed are able
to bind to its consensus DNA. In other words, well folded p53 variants will lead to cells that are lower in fluorescence, and mutants which are not well folded will lead to fluorescent cells. All three of these modified lac operons did show some response to the presence of p53, and the resulting phenotype reflected lowered amounts of GFPuv.
The Kern sequence showed the maximum p53 dependent cellular fluorescence.
2.3.1 Optimization of Growth Conditions
The growth temperature and the arabinose concentration required for the optimal difference between p53 Quad and a linker used as a negative control were explored. The optimum temperature is one that does not destabilize the expressed p53 variants and allows for the efficient formation of the GFP chromophore. The arabinose concentration, in turn, sets the amount of p53 variant expressed and was chosen for maximum transcriptional interference of the GFP gene. Cells were grown at 37 °C and at lower temperatures such as 30 °C and rt. Incubation times were varied, and a range of conditions, such as 12 h at 30 °C followed by incubation at 4 °C for 12 h, was also tested. Optimal results were observed for plates incubated at 30 °C for 48 hours. Under these conditions the difference in the fluorescence levels between the cells that contain the Quad mutant and the cells not expressing any p53 was maximal.
The fluorescence of cells with pGFPuv in the presence of Quad was comparable to that of cells that did not express any p53.
The concentration of arabinose was varied from 0.0005% to 0.2%; the concentrations tested were 0.0005%, 0.002%, 0.005%, 0.02%, 0.05%, and 0.2%. When grown at 30 °C for 48 h, cells on 0.005% arabinose showed the maximum difference between the positive and the negative phenotypes (Figure 15).
Based on these observations, routine screenings were done on freshly made plates with
0.005% arabinose, and the plates were incubated at 30 °C for 48 hours.
To quantify the difference in fluorescence between the negatives and the positive, whole-cell fluorescence was measured from normalized amounts of cells. For this, cells were grown at 30 °C overnight. The amount of cells was normalized based on OD600, and equal amounts of cells were pelleted and resuspended in sodium phosphate buffer, pH 7.2. Fluorescence was measured with excitation at 395 nm and emission at 509 nm, which monitors GFPuv. The cells needed to be induced with higher amounts of arabinose (0.1% versus 0.005%) to collect data above the noise level. This might be due to the different expression levels from the two plasmids: GFPuv, which is expressed from a ColE1-origin vector, may be expressed at higher levels than p53, which is expressed from a low-copy p15A-origin vector. Despite the additional arabinose, the data collected from these measurements were not reproducible between trials. Extensive optimization might yield robust fluorescence data in liquid culture, but since quantitating the fluorescence levels was not a requirement for us, we did not pursue this further. A sample of the data collected from liquid cultures is shown in Figure 15.
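The OD600-based normalization described above can be sketched as follows. This is only a minimal illustration, not the analysis used in this work: the function names, the target amount of cells, the buffer blank and the sample readings are all hypothetical.

```python
def volume_for_equal_cells(od600, target_od_units=1.0):
    """Culture volume (mL) containing `target_od_units` (OD600 x mL) of cells."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return target_od_units / od600


def normalized_fluorescence(raw_fluorescence, buffer_blank, od600):
    """Blank-subtract a reading (ex 395 nm / em 509 nm) and divide by OD600."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return (raw_fluorescence - buffer_blank) / od600


# Hypothetical readings, for illustration only (not data from this study).
samples = {
    "pGFPuv-BD1 alone": {"od600": 2.0, "raw": 1050.0},
    "pGFPuv-BD1 + Quad": {"od600": 2.5, "raw": 300.0},
}
blank = 50.0  # assumed fluorescence of buffer alone

for name, s in samples.items():
    vol = volume_for_equal_cells(s["od600"])
    norm = normalized_fluorescence(s["raw"], blank, s["od600"])
    print(f"{name}: pellet {vol:.2f} mL, normalized fluorescence {norm:.1f}")
```

Pelleting equal OD600 units and dividing a reading by OD600 are two alternative ways of correcting for cell density; either one alone would be sufficient in practice.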
Figure 15: Initial optimizations of the screen. a) Co-transformation of the Quad mutant with pGFPuv-BD3, in comparison to pACYC177, a non-fluorescent vector, and pGFPuv-BD1 co-transformed with an empty vector (pACT7lacCAM). b) Fluorescence levels of normalized amounts of cells grown in liquid media. Co-transformation of Quad with pGFPuv-BD1 is compared with pGFPuv-BD1 in isolation and with pGFPuv-BD1 co-transformed with a pAC-linker vector.
2.4 Proof of principle using known hotspot mutants
Now that we had a system that gave good screening results with the wt-like Quad mutant of p53, we wished to examine and establish the dynamic range of the screen. Four of the
known physiologically relevant mutants of p53 (V143A, R175H, R249S and R273H) were constructed by overlap extension PCR,150 using p53 Quad as the template. V143A is a temperature-sensitive mutant that is inactive at temperatures above 32 °C.84 R175H and R249S are structural mutants that destabilize the core domain to different extents. R273H is a contact mutation: it does not destabilize the core, but since R273 is close to the Zn2+ binding site, this mutation destroys the core domain-DNA interaction.87 Their differences in stability86 and function allow us to establish the ability of our screen to report on functional, folded core variants. The mutants used as negative controls are summarized in Figure 16. The dynamic range of the screen was demonstrated by the different phenotypes obtained for the different mutants, as shown in Figure 17. A low fluorescence level was obtained for the wt-like Quad mutant, showing that transcription from the modified lac operon is compromised in its presence. Intermediate fluorescence levels were obtained for the co-transformations with the temperature-sensitive V143A and with R249S. The most common mutant, R175H, and the contact mutant, R273H, gave the highest fluorescence. The optimizations for temperature and arabinose concentration detailed above were repeated for Quad and the mutants. The comparison showed that cells grown under the previously optimized conditions (48 h at 30 °C in media containing 0.005% arabinose) showed the maximum difference in fluorescence between Quad and the various mutants. However, the fluorescence intensities in the presence of the mutants were not as high as desired, although in the presence of Quad the cells were as dim as in the absence of any GFP.
Figure 16: Mutant properties. a) The four mutants selected as negatives (V143A, R175H, R249S and R273H) are shown as sticks on the structure. Picture made from the structure of Quad (PDB 1UOL) using PyMOL. b) Categorization of the mutants and the effect of each mutation on stability.
Figure 17: Optimization of conditions for screening. Plates were screened at different concentrations of arabinose (indicated by the percentages in yellow on each plate). The four mutants and Quad, which was used as the WT, were subjected to the optimizations. Plates screened at 0.005% arabinose showed the maximum discrimination between the mutants and Quad.
2.5 P53 responsive lac operon with robust transcription using a combinatorial library
We wanted to further improve the difference in fluorescence levels that reports the binding of p53. Using the literature reported sequences to modify the lac operon, we succeeded in making the transcription of GFPuv p53-dependent, but with an overall
Figure 18: Selection of variants from the binding domain library. (a, b) Co-transformations of the binding domain library with the Quad mutant plated on media with arabinose (Quad over-expressed) and without arabinose (Quad not expressed), respectively. The orange square shows an example of a colony that exhibited low cellular fluorescence in the presence of p53-Quad and higher cellular fluorescence in its absence. (c) The sequences of twelve such colonies chosen from the DNA binding domain library. (d) The twelve colonies showing efficient expression of GFP after separation of the plasmid expressing p53 by digestion. Pictures of the plates were taken on a UV illuminator using a long-wave UV lamp (365 nm).
reduction in the transcriptional efficiency of GFPuv, even in the absence of p53. We aimed to improve the transcription efficiency from this artificial p53-responsive lac operon while preserving the ability of the p53 core domain to bind to it. To do this, we replaced the operator with a library of p53-binding DNA sequences based on the reported consensus binding sequence,101 PuPuPuCWWGPyPyPy, in tandem repeat. Degenerate oligos encoding (RRRCWWGYYY)2, flanked by sequences complementary to the pGFPuv plasmid, were PCR amplified using pGFPuv as the template. This library has 65,536 possible variants (two possibilities at each of 16 positions gives 2^16 variants), and a library of ~10^5 members was generated. We inverted the scheme of transformation in this case to achieve maximum transformation efficiency: we generated a strain containing pACBADp53-Quad, and competent cells of this strain were transformed with the pGFPuv-BD library. Again, we wanted to choose the optimal variants, i.e., those that gave maximum cellular fluorescence in the absence of Quad and minimum fluorescence in the presence of Quad. Since p53 is expressed under the control of the highly tunable arabinose promoter, we were able to simulate these conditions simply by growing the cells in the presence and absence of arabinose. We used replica plating to allow us to spatially identify the optimal library members. Cells containing both pGFPuv-BDlib and pACBAD-p53 Quad were initially plated on solid media containing ampicillin and kanamycin. This represents a system where the Quad variant is not expressed. Overnight cultures were carefully transferred to solid media containing ampicillin, kanamycin and 2% arabinose, where the Quad variant is
overexpressed, using nitrocellulose membranes, and both plates were further incubated at 30 °C. The plates were analyzed for colonies with diminished fluorescence in the presence of arabinose, which were considered optimal. Twelve such colonies were isolated and analyzed. The colonies that maintained excellent expression of GFPuv upon clearing of the p53-expressing plasmid were considered good candidates for the reporter plasmid of the screen. The pACBADp53-Quad plasmid was cleared by digesting the co-transformed plasmid DNA with restriction endonucleases that cut only the p53 expression plasmid, followed by re-transformation. Clearing of the plasmid was confirmed by the absence of growth on media containing kanamycin. The fluorescence levels of these plasmids are shown in Figure 18. Four of these were then tested for their efficacy in differentiating between the Quad variant and the hotspot mutants under the previously optimized conditions. The one that exhibited maximum discrimination between Quad and the mutants was used in further screening experiments. The sequence of this DNA binding domain was found to be “GAACTTGCCCGGGCTTGCCC”. We compared this sequence with the p21 sequence, GAACATGTCCCAACATGTTG, which is a known tight-binding sequence for p53. A significant difference lies in the CATG stretch sandwiched between the purines and the pyrimidines. Most of the known binding sequences of p53 exhibit this CATG pattern, whereas the sequence resulting from our screen has CTTG at this position. This may have been important for the transcriptional activity of the lac promoter. A comparison of the colonies that resulted from the transformation of Quad and the various negatives with the library-derived binding domain and the Kern sequences is shown in Figure 19.
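The size of the degenerate library and the fit of a given sequence to the consensus can be checked with a short script. The sketch below is illustrative only and was not part of the original work; the function names are hypothetical, and the coverage estimate assumes that all variants are equally represented among ~10^5 independent clones, which a real library made from degenerate oligos may not satisfy.

```python
import math
import re

# Standard IUPAC degenerate base codes used in the consensus sequence.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "W": "AT"}

CONSENSUS = "RRRCWWGYYY" * 2  # (RRRCWWGYYY)2


def library_size(degenerate):
    """Number of distinct sequences encoded by a degenerate sequence."""
    return math.prod(len(IUPAC[base]) for base in degenerate)


def matches(degenerate, sequence):
    """True if `sequence` is one of the sequences encoded by `degenerate`."""
    pattern = "".join(f"[{IUPAC[base]}]" for base in degenerate)
    return re.fullmatch(pattern, sequence) is not None


def expected_coverage(n_clones, n_variants):
    """Expected fraction of variants sampled at least once (Poisson approximation)."""
    return 1.0 - math.exp(-n_clones / n_variants)


size = library_size(CONSENSUS)
print(size)                                         # 2**16 = 65536
print(matches(CONSENSUS, "GAACTTGCCCGGGCTTGCCC"))   # selected binding domain
print(matches(CONSENSUS, "GAACATGTCCCAACATGTTG"))   # p21 site
print(round(expected_coverage(1e5, size), 2))
```

Under these assumptions, the selected binding domain matches the consensus at every position, whereas the p21 site deviates from the strict consensus in its second half-site. The equal-representation estimate suggests that roughly three quarters of the theoretical variants would be sampled at least once.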
Figure 19: Screen with improved dynamic range.
A comparison of the cellular fluorescence levels when the hotspot mutants were transformed with the library member (Binding Domain 1, BD1) on the left and the p53 DNA binding sequence reported by Kern et al. on the right. The different variants are indicated in yellow on the plate.
2.6 Discussion
We have successfully generated a high-throughput system that screens for the function of the p53 core domain. V143A, which is a thermosensitive variant, is only slightly fluorescent when screened at 30 °C, whereas R175H, one of the most prominent mutations in cancer, gives the maximum fluorescence. The other structural mutants fall between these two mutations on the scale of fluorescence. Our screen is based on the sequence-specific DNA binding property of p53. We have engineered the lac operon so that the downstream GFP gene is transcribed in a p53-dependent manner. The level of GFP expression is therefore a direct readout of the function of p53, which in turn indicates the structural integrity of p53. When we used the literature-reported p53 binding sequences to modify the lac promoter, the Quad mutant bound strongly to the operator modified with the Kern sequence, resulting in very low cellular fluorescence. But when screened with the negatives, or even in the absence of any p53, the cellular fluorescence was lower than that of unmodified pGFPuv. This showed that although the Kern sequence exhibited optimal binding to the p53 variants, it compromised the transcriptional activity of the lac promoter. We were able to consistently observe differences in cellular fluorescence in the presence and absence of p53 variants when screened on plates. The same differences were not reflected when the cells were grown in liquid media. In these experiments, although Quad gave the lowest fluorescence, the trend in fluorescence levels across the different mutants was not captured. Expression of the p53 variants needed to be induced with higher amounts of arabinose (0.1%) in the liquid
culture experiments to obtain reproducible data for the Quad. But at this higher
expression level of p53, the small differences in fluorescence between the different
hotspot mutations fell below the noise level of these experiments.
We replaced the operator sequence with a library of DNA binding sequences based on the
consensus binding sequence for p53 and simultaneously optimized for transcriptional
activity and p53 binding. Using replica plating, we compared the same cells grown in the presence and absence of Quad. Three different phenotypes were observed. In the first, the cells were fluorescent irrespective of the presence of Quad. These cells showed robust transcription from the modified lac promoter, but the promoter was no longer sensitive to p53 binding. In the second, the cells remained non-fluorescent whether Quad was expressed or not, which clearly indicated compromised transcriptional ability. In the third, the cells showed different levels of fluorescence in response to Quad. This indicated that the promoter in these cells was able to drive transcription of GFP in the absence of Quad and was blocked in the presence of Quad, and thus was optimized for both transcriptional activity and p53 binding. The conditions under which the cells were screened, namely temperature and arabinose concentration, were also optimized. The growth temperature has implications for the stability of the various p53 mutants studied, in addition to the chromophore maturation of GFP. Varying the concentration of arabinose allowed us to tightly regulate the expression of the p53 variants.
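The three phenotypes described above amount to a simple decision rule on paired fluorescence observations from the replica plates. The sketch below is purely illustrative: colonies in this work were compared by eye under UV light, and the 0-1 fluorescence scale, the thresholds and the function name are hypothetical.

```python
def classify_colony(fluor_without_quad, fluor_with_quad, bright=0.5, min_drop=0.5):
    """Assign a replica-plated colony to one of the three phenotypes.

    Fluorescence values are on an arbitrary 0-1 scale; `bright` and
    `min_drop` are hypothetical thresholds chosen only for illustration.
    """
    if fluor_without_quad < bright:
        # Dim even without Quad: the modified promoter transcribes poorly.
        return "non-fluorescent: transcription compromised"
    drop = (fluor_without_quad - fluor_with_quad) / fluor_without_quad
    if drop >= min_drop:
        # Bright without Quad, dim with Quad: the desired reporter behavior.
        return "responsive: transcribes GFP and is blocked by Quad"
    # Bright in both conditions: Quad no longer blocks transcription.
    return "always fluorescent: insensitive to p53 binding"


# Hypothetical (no arabinose, 2% arabinose) readings for three colonies.
for pair in [(0.9, 0.85), (0.1, 0.1), (0.9, 0.1)]:
    print(pair, "->", classify_colony(*pair))
```

Only colonies in the "responsive" class would be candidates for the reporter plasmid.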
Screening methods are a valuable tool for high-throughput analysis of proteins and help us decipher the changes that result from mutations. A variety of screening systems have been developed, allowing proteins to be chosen on the basis of properties such as cellular expression, resistance to proteolysis, ligand binding and catalytic activity. Many methods, including phage display, mRNA display, in vitro proteolytic treatment, NMR and mass spectrometry screening, have been used to analyze proteins. In vivo screening offers the advantage of providing native-like conditions for the selections. Maintenance of the effective concentration of the protein of interest (POI), together with other cellular factors such as molecular crowding and the presence of chaperones, is an added advantage of in vivo systems. Yeast, bacteria and phage are typically used as host systems for such screens. Each of these offers unique advantages and has different disadvantages. The eukaryotic machinery in yeast allows the POI to undergo post-translational modification, which is important for proteins whose function depends on such modification. The occurrence of false positives is a major drawback of this system. Another disadvantage of yeast n-hybrid systems is that the POI needs to be imported into the nucleus. Nuclear import, especially of foreign proteins, is often inefficient, and in some cases non-specific interactions lead to false positives. In addition, various other protein-protein interactions in yeast might lead to the positive phenotype. The library size that can be screened using the yeast system is limited by the low transformation efficiency of yeast. Phage display, usually as a pVIII fusion, is another widely used system to screen for various properties, including ligand binding and resistance to proteolysis. The latter offers a unique opportunity to screen for proteins based only on fitness.
This screen that we have developed will allow us to study the effect of mutations on the structure of the protein and provides us with a high throughput method to study the determinants of stability of this protein. We have utilized the DNA binding property of p53 to develop a functional screen for p53, a β-sheet scaffold. A parallel screening system was developed to study ROP, a model protein which has coiled coil architecture.24
ROP regulates the copy number of ColE1-origin plasmids in bacteria. When GFP is expressed from a ColE1-origin plasmid, the level of GFP is a direct readout of the function of ROP. Structural analysis of the functional variants selected from this screen showed that they can be WT-like or destabilized to the point of being molten globular.25 This wide range of stabilities tolerated by the screen allowed the compilation of a large data set on the core packing propensity of this protein.
Detailed understanding of the sequence-structure relationship of p53 can aid in engineering this protein for pharmaceutical applications in addition to helping us predict the effect of various mutations to the core. As an initial step towards this, we have constructed and analyzed core randomized libraries of this protein. The results obtained from these are discussed in Chapter 3. The application of the screen to stabilization of loop regions by rational design is discussed in Chapter 4. The screen can also be used in
conjunction with small molecules that can rescue the function of the protein. Some
studies towards this goal are discussed in Chapter 6.
2.7 Materials and Methods
2.7.1 Construction of Reporter Plasmid – pGFPuv BDx
The lac operon region of pGFPuv was modified to contain three known binding domains of p53. These literature reported binding domains were constructed by reassembly of synthetic genes followed by amplification using primers containing the respective restriction sites and ligated between AlwNI and HindIII sites in pGFPuv. A part of the vector starting from the PvuII site, 5’ CAGCTGGC ACGACAGGTT TCCCGACTGG
AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC TCACTCATTA
GGCACCCCAG GCTTTACACTTTATGCTTCC GGCTCGTATG TTGTGTG 3’ was amplified using oligos 5’ AATAATCAGCTGGCACGACAGGTTC 3’ at the 5’ end and
5’ CCACACAACATACGAGCCG 3’ at the 3’ end . The region of the lac promoter containing the operator sequence was reassembled from oligos: 5’
CGGCTCGTATGTTGTGTGGTACAGAACATGTCTAAGCATGCTGGGG
TCACACAGGAAACAGCTATGACC 3’ and 5’ ATTATTAAGCTTGGCGTA
ATCATGGTCATAGCTGTTTCCTG 3’ to generate BD1 where the operator, shown
with the bases underlined replaced the original sequence of the operator :
GGCTCGTATG TTGTGTGGAA TTGTGAGCGG ATAACAATTT CACACAGGAA
ACAGCTATGA CCATGATTAC GCCAAGCTT 3’. This reassembly was mixed with
the previous PCR product generated and amplified using 5’ and 3’ primers, 5’
AATAATCAGCTGGCACGACAGGTT 3’ and 5’ ATTATTAAGCTT
GGCGTAATCATG 3’ respectively. The overlapping region for the original PCR product
and the reassembly reaction is shown with dashed underline. Following the same scheme,
BD2 was generated using 5’ CGGCTCGTATGTTGTGTGG TACAGAACATGTCTA
AACATGCTGGGG T ACA GAACATGTCTA AGCATGCTGGGG TCACACAGGA
AACAGCTATGACC 3’ and BD3 was generated using 5’ CGGCTCGTATGTT
GTGTGGCCTTGCCTGGACTTGCCTGG CCTTGCCTTTTCT TCACACAGGAA
ACAGCTATGACC 3’.
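The overlap between the first PCR product and the reassembled operator region can be checked by taking the reverse complement of the 3’ primer. The sketch below is a hypothetical sanity check, not part of the original protocol; the sequences are copied from the text above and the helper function is illustrative.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# 3' primer used to amplify the fragment starting at the PvuII site.
pcr_3prime_primer = "CCACACAACATACGAGCCG"
# 5' end of the first oligo used to reassemble the BD1 operator region.
bd1_oligo_start = "CGGCTCGTATGTTGTGTGGTACAGAACATGTCTAAGCATGCTGGGG"

overlap = reverse_complement(pcr_3prime_primer)
print(overlap)
print(bd1_oligo_start.startswith(overlap))
```

If the primer and the oligo are as listed, the reverse complement of the 3’ primer should coincide with the first 19 bases of the reassembly oligo, which is the region shared by the two fragments in the final amplification.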
Overlap PCR was used to insert an NheI site into the original pGFPuv plasmid to enable direct cloning of the binding domain library. A library of p53 binding sites, based on the consensus binding sequence RRRCWWGYYYRRRCWWGYYY was constructed by thermally balanced inside out (TBIO) PCR of degenerate oligos 151 using the following
oligos: 5’AATAATAATGCTAGCATTA ATGTGAGTTA GCTCACTCAT TAGGCA
CCCCAGGCTTTACA 3’ , 5’ TAGGCACCCCAGGCTTTACACTTTATGCTTCCGG
CTCGTA TGTTGTGTGG 3’, 5’ CATAGCTGTTTCCTGTGTGARRRCWWGYYY
YYYCWWGRRR CCACACAACATACGAGCCGG 3’ , 5’ ATTATTATTAAG
CTTGGCGTAATCATGGTCATAGCTGTTTCCTGTGTGA 3’ . pGFPuv binding
domain library (pGFPuv-BD) was obtained by ligation of the PCR amplified product
between NheI and HindIII sites. About 200 fluorescent colonies were selected from this
library to obtain a library of positives. The cells that were fluorescent when grown on
Figure 20: Sequence of Quad mutant
The DNA sequence of the Quad mutant used is shown in black. The corresponding amino acid sequence in our construct is shown in blue, aligned to the codons. The various hotspot mutations studied are highlighted in red
plates containing ampicillin were selected using a hand-held UV lamp emitting 365 nm
UV radiation.
2.7.2 Construction of Expression Plasmid – pACBADp53
A kanamycin-resistant plasmid containing a p15A origin, which encodes the p53 core domain gene under the control of an arabinose promoter, was constructed. This was done by PCR amplification of the region encoding the p15A origin and kanamycin resistance from the commercially available vector pACYC177. The region encoding the arabinose promoter and GFP was amplified from pUCBADGFP. The two PCR products were ligated using NotI and BglII sites to obtain pACBADGFP, which serves as a null vector into which p53 variants are cloned. The p53 core domain variants p53-Quad, V143A, R175H, R249S and R273H were cloned into this vector between NdeI and EcoRI sites. The gene for the p53 core domain variant “Quad” was constructed by Stemmer reassembly of synthetic oligonucleotides encoding residues 94-298,152 followed by an amplification reaction using primers containing the required restriction sites, and cloned into pACADGFPuv between NdeI and EcoRI sites. Four of the known hotspot mutants of p53, V143A, R175H, R249S and R273H, were constructed by overlap PCR, using p53 Quad as the template. The DNA sequence of the Quad mutant is shown in Figure 20.
2.7.3 Choosing the Reporter Plasmid from a Library of Positives
Replica plating was used to isolate the pGFP-BD library member which exhibited the
maximum difference in fluorescence in the presence of Quad and when Quad is absent.
pGFP-BD library was transformed into electrocompetent DH10B cells containing p53-
Quad. The transformation was initially plated on to LB kanamycin (kan) and ampicillin
(amp). Following 18 h of growth, the colonies were transferred on to plates containing
LB kan amp and 2% arabinose using nitrocellulose membranes. Both the plates were incubated at 37 °C for 18 h. Careful comparison of the plates allowed for selection of
colonies that gave low or no fluorescence in presence of Quad (2% arabinose) and high
fluorescence in the absence of Quad (no arabinose). Twelve such colonies were selected
for further analyses. The plasmid containing Quad was separated by digesting with NcoI
and SphI and transformed. The restriction digests linearize the DNA and therefore reduce its transformation efficiency. The colonies resulting from the re-transformation were grown to saturation and plated on LB amp to analyze the fluorescence in the absence of Quad. Plasmid separation was confirmed when no growth was observed upon plating the cultures on LB kan. Four of the twelve variants (labeled 1, 4, 7 and 12) were co-transformed with the p53 hotspot mutants and, based on the phenotypes of the positives and negatives, BD1 was chosen to be carried forward in further studies.
Chapter 3: Utilizing the cell-based screen to identify functional p53 mutations
Contributions
The material presented in chapters 2 and 3 will be published as a full paper co-authored
by Brinda Ramasubramanian and Thomas J Magliery. The work summarized in this chapter was produced by the primary author. The experimental design and data analyses were accomplished by the primary and the corresponding authors.
3.1 Summary
We developed a functional screen for the core domain of p53 based on its DNA binding function (Chapter 2). The screen, as discussed previously, utilizes the DNA binding
function of the transcription factor p53 to displace the RNA polymerase from a promoter which expresses GFPuv from an engineered lac operon. A non-fluorescent or negative phenotype indicates a functional p53 and a fluorescent phenotype results when the p53 present is inactive. We have used the screen to identify mutations that retain the function of p53 from a library of p53 core variants. We generated three different core-randomized libraries to test the screen. The initial library in which four hydrophobic residues in the
core domain away from the DNA binding region were randomized had a low frequency
of occurrence of positives. These results indicated that either the mutability of these
residues is low and/or the stability tolerance of the screen is very narrow. In order to
deconvolute these effects, two smaller libraries, each a subset of the initial library, were
generated. Generating smaller libraries will result in a larger fraction of positives that can
be analyzed. This in turn will allow us to understand the low occurrence of positives in
the four-residue library. To this end, I255 and T253 were randomized separately from A161 and A159. The results obtained from screening these smaller libraries indicate that mutations at positions A161 and I255 are destabilizing, and that these positions are considerably less tolerant to mutation than positions A159 and T253. While position A161 is highly selective for small residues, position I255 appears to be dominated by residues with high β-sheet propensity. In addition, positions A159 and T253 are biased towards smaller residues. A set of 8 positives obtained from the IT library was analyzed in vitro for stability and DNA binding ability. The in vitro characterization shows that the variants that had a positive phenotype in the screen were well folded and very close to the parent Quad in terms of stability and function. We hypothesize that improving the tolerance of the screen will be useful for analyzing a larger number of positives to gain insight into the folding and stability of the p53 core domain. Adjacent and cross-strand alanines have been shown to contribute to the stability of GB1. Therefore, this alanine pairing might make a significant contribution to packing and stability in the context of the p53 core domain. Similar studies can be done by choosing other core domain residues in this protein and
randomizing them to identify the stability determinants of p53. Some of the prospective residues that can be chosen are discussed at the end of this chapter.
3.2 Significance of studying libraries for core directed design
The hydrophobic core of a protein is a major determinant of its stability.153; 154 It is hypothesized that the degree of solvent exposure of surface-exposed and loop residues remains similar in the native and unfolded states and that, therefore, changes to surface residues are not as destabilizing to the protein as mutations in the core.155; 156 Hence, a major avenue for the design of proteins has been core directed.157; 158 Computational and combinatorial approaches have been utilized to design or re-design the cores of proteins. Well-behaved proteins such as T4 lysozyme, λ repressor, ubiquitin and GB1 have been used as model systems to implement various core-directed design strategies.26
The significance of core packing and its contribution to the stability of the protein has
been extensively studied by the Matthews group using T4 lysozyme as a model system.159
The array of studies on this protein establishes the dominance of hydrophobic effect.
They found that this protein is surprisingly tolerant to various substitutions in the core, which result in folded, active proteins. In initial studies, replacement of an Ile in the core with 13 other residues was found to destabilize the protein, and the extent of destabilization correlated with the increase in solvent-exposed surface area.160 Following this, ‘cavity
filling’ and ‘cavity creating’ mutations were engineered into the core using site directed
mutagenesis and their results established that the effect of substitutions to the core
depends on the context of substitutions even when the packing of the residue under study
is optimum.161; 162 These studies established that although the hydrophobic effect plays a
dominant role in the stability of the protein, side chain packing interactions also
contribute significantly.
The significance of core packing was examined by the Fersht group using barnase as a
model system.163 All the positions in barnase, a protein that has a 13-residue helix
packing against a 5-strand anti-parallel β-sheet were randomized in 3 stages. Overall, the
protein was tolerant to these changes and a significant percentage of the mutants showed
some activity. The authors suggest that the specificity of the fold encoded in the
hydrophobic core is minimal and therefore the first step in evolution, the achievement of
activity, has a low barrier. Of note here is that the percentage of functional protein
obtained when the six residues on the sheet side were varied was much lower than that
observed when the six positions in the helix side were randomized. Since the sheet-side residues were randomized in stage 2, when the helix-side residues already incorporated changes, this might reflect the large cumulative changes to the core; it might also imply a lower mutability of the sheet-side residues.
The ground-breaking studies of Woolfson and co-workers utilized phage display to generate large libraries of ubiquitin.164; 165 Their selection was based on the resistance to
degradation by proteases by well-folded proteins. The variants of ubiquitin were
expressed as fusions to a 6×His tag and were bound to Ni-NTA agarose columns. Phage
resistant to protease infected bacteria and thus could be amplified to decipher the
sequences. Multiple rounds of these were designed to yield stabilized variants of
ubiquitin. This was the first report of a combinatorial selection based purely on the
stability of the protein and not linked to the function of the protein. Their studies showed that ubiquitin is surprisingly restrictive to sequence changes in its hydrophobic core.
Phage display is a well established technique, and when combined with shotgun scanning methods, provides a powerful tool to analyze large libraries of variants and
screen them, based purely on stability in addition to the ability to screen based on ligand
binding function.166
The Sauer group examined the effect of core packing in λ repressor using cassette
mutagenesis. In initial studies that randomized eight positions in the dimerization interface, two of the positions were found to contain most of the information required for effective dimerization.167 Using the same methodology, the central core of the protein was
repacked and their results show that most of the information for the stability of the
protein is contained in the hydrophobic core of the protein.168 This implies that not all
hydrophobic cores pack to yield active protein, and that, additional steric considerations
that specify the packing do need to be considered. The role of steric compatibility is
illustrated by the observation that the number of functional substitutions at a particular
position is governed by the number of other residues that are allowed to co-vary. Their
studies indicate that the core residues are highly interdependent.
Computational algorithms essentially specify the structural co-ordinates of the target
protein. An energy minimization function, a rotamer library of permissible conformations
of each residue and a search method to find sequences of lowest energy represent the
algorithm.17 The specifications of the inputs are different for different algorithms. In their
pioneering work, Desjarlais and Handel re-packed the hydrophobic core of phage 434-cro
protein using a “custom made” rotamer library generated for the specific backbone of
interest and a genetic algorithm for energy minimization.169 They characterized three of
the designed variants which had 4, 7 and 8 sequence changes from the WT core. Their results indicated that this protein can accommodate such large changes to its core. Also, one of the designed variants resulted in slight stabilization over the WT. Dahiyat and
Mayo implemented a side-chain selection program which explicitly considers the specific packing interaction to choose for optimal side chains and their conformation as their
design criteria.170 By varying the strengths of packing constraints, they were able to
assess both the extent to which such packing constraints are important in computational
protein design and the tolerance of the hydrophobic core to sequence changes. Four
different designed variants were selected for in vitro characterization. Their results
indicate that the packing constraints correlate with the foldedness of the designed cores and that relaxing the packing constraints can yield highly mobile molten globular structures and completely disordered chains. The Mayo group has also developed
computational algorithms that can sample a vast combinatorial library of sequences for a
target ββα motif. 171 This was one of the pioneering studies on a motif that contained
three secondary structure elements helix, sheet and turn.
These studies emphasize the role of the hydrophobic core in the stability of a protein and
also imply the role of other interactions including steric packing of the side chains to
specify the unique fold and function of each protein. The hydrophilic residues determine
the solubility of a protein and in addition, allow the exclusion of the hydrophobic core
residues from the solvent and may contribute to the stability in this manner. The significant coupling found among the core residues offers further challenges to the design of
proteins. This emphasizes the need of large bodies of empirical data that are accessible
using combinatorial experiments in the context of different proteins to provide maximum
information for design efforts.
Bacterial systems are one of the most studied and understood systems and can be easily
manipulated. They have a short doubling time and are therefore ideal for the development
of screens. These screens will aid in interrogating large libraries of variants at a time and
find the ones with the desired properties. Linkage of the genotype to the phenotype
allows us to study the effect of mutations on a large number of variants, thus contributing
the repertoire of data that will, in an ideal scenario, allow us to predict a structure from a
sequence or predict the effect of point mutations to a particular protein. To that end we
have used a bacterial cell-based screen to analyze libraries of core randomized libraries of
the tumor suppressor protein p53. The data collected on this protein will provide a unique
opportunity to both understand the sequence structure stability relationship of this protein
and also utilize stabilizing mutations to design novel drugs against cancer.
3.3 Core randomized libraries of p53 DNA binding domain
3.3.1 Four-position Library
Our lab has extensively studied the core packing requirements of the model protein ROP, which forms an anti-parallel four-helix bundle, by generating combinatorial libraries of core-randomized variants of this protein and screening them for functional variants using a cell-based screen. Initially, a four-position library that randomized the residues
I15, T19, L41 and A45 in the two central layers of the core to all hydrophobic amino acids and the alcohols was studied. In vitro characterization of active variants of
ROP from this library showed that folded variants with a wide range of stabilities were obtained as positives from the screen. This allowed the study of a large number of variants from this library. We applied a similar approach to the p53 core domain to interrogate the effects of core mutations on the packing and stability of a β-sheet protein, using the screen to identify stable and functional variants from a library of p53 variants. We chose to randomize four residues, A159, A161, T253 and I255, which lie in the core of p53 away from the DNA binding region, in order to analyze the effects of such mutations. The similarity between the four residues AATI in the p53 core domain and the ITLA residues in the ROP core domain, whose randomization previously enabled the successful study of a large number of variants, lent added credibility to the choice of residues.
Initially, a BsrGI site was engineered into the p53 gene using a silent mutation in the codon for T125. An NNK library that randomizes positions A159, A161, T253 and I255 to all 20 amino acids was generated using PCR and ligated between the BsrGI and BsaI sites in
the p53 gene. In order to screen for active variants, the library was transformed into
competent cells of a DH10B strain that already contained the pGFPuv screening vector.
This library yielded ~10⁶ colonies, indicating a 10-fold coverage of the theoretical library
space (20⁴ for four-position randomization to all 20 amino acids). Cells were grown at
30 °C for 48 h with 0.005% arabinose. The colonies that appeared as positives (low cellular fluorescence) were grown to saturation, re-plated and grown again under screening conditions. The results from this NNK-4 library of the p53 core domain were complicated in several ways.
First, the positives, i.e., cells exhibiting low cellular fluorescence, made up only about 0.025% of this AATI library. Colonies that exhibited low cellular fluorescence were isolated from solid media and re-streaked to confirm the fluorescence levels. On re-streaking, colonies initially scored as non-fluorescent could turn out to be fluorescent. The initially observed non-fluorescent phenotype may have been erroneous because of the large number of colonies screened and the differing apparent fluorescence emitted by colonies of different sizes. Each plate contained about 4000 colonies, and only one or two of them had the required phenotype. This is an issue because the overall goal of this library was to collect data on a statistically significant number of variants and thus to contribute to the existing statistical model for the occurrence of stabilizing residues in the core of a predominantly β-sheet protein. The low occurrence of positives precludes this.
About 36 positives obtained from approximately 40 plates were verified by re-streaking,
and their sequences were checked by DNA sequencing. Only 50% of these positives
encoded complete full-length proteins; the rest of the sequences contained insertions
or deletions, leading to nonsense protein sequences. Even among the 19 or so positives that gave complete reads, a quarter had stop codons in their
sequences, leading to truncated proteins. This makes the actual occurrence of positives
in the screen much lower than the observed 0.025%. One of the disadvantages of a screen
that is based on a negative phenotype is that any spontaneous mutation in the reporter
plasmid will lead to a ‘positive’ phenotype if GFP expression is impaired for any reason
other than blockage of transcription by well-folded p53 variants. The extremely low
rate of occurrence of positives complicates the picture and makes it difficult to draw
conclusions about the structure-function relationship of the protein. When 10 of the naïve
sequences were analyzed, only seven of them encoded the full-length protein; deletions and
stop codons were also found, in addition to charged residues packed into
the core. These observations raise the possibility that the low occurrence of positives
from the library may have been due to errors in the library itself. The other possibility is
that it might be due to the failure of GFP expression. If this were the case, both authentic
clones and insertion and deletion variants would appear as positives. Since this is not
reflected in the various sequencing results, it appears that the low rate of positives is due
at least in part to the errors in the library. It is also feasible that the low rate of positives
from this library is due to the high stringency of packing at these positions. This needed
to be rigorously explored.
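The bookkeeping behind the statement that the true rate of positives is well below 0.025% can be sketched in a few lines of Python. This is only an illustrative calculation using the approximate numbers quoted above (about 40 plates, about 4000 colonies per plate, 36 re-streaked positives, roughly 19 complete reads, a quarter of which carried stop codons). The variable names are introduced here for illustration, and the result is a rough estimate rather than a measured value.

```python
# Rough estimate of the corrected positive rate for the AATI library.
# All inputs are the approximate figures quoted in the text.
plates = 40
colonies_per_plate = 4000
apparent_positives = 36      # verified by re-streaking
complete_reads = 19          # "19 or so" full-length sequences
stop_codon_fraction = 0.25   # a quarter of the complete reads had stop codons

colonies_screened = plates * colonies_per_plate
apparent_rate = apparent_positives / colonies_screened

# Positives that encode a full-length protein without a stop codon.
plausible_positives = complete_reads * (1 - stop_codon_fraction)
corrected_rate = plausible_positives / colonies_screened

print(f"colonies screened: {colonies_screened}")
print(f"apparent rate:     {apparent_rate:.4%}")
print(f"corrected rate:    {corrected_rate:.4%}")
```

Under these assumptions the corrected rate is less than half of the apparent rate, which is consistent with the difficulty of separating authentic positives from reporter failures at this frequency.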
Finally, interpreting the sequences of the core variants that yielded the positive phenotype was challenging. These sequences had a high occurrence of proline, which would not be expected to be a stabilizing mutation in the core of any secondary structure. Some of the positives also contained a tryptophan, which is not an isosteric replacement for the original alanine, threonine and isoleucine residues. In addition, the positives that contained tryptophan also had phenylalanine, another large residue. Together, these variants appear to be highly overpacked. A summary of the apparent positive sequences obtained from this library is given in Figure 21. These results required us to do further experiments to assess
the validity of the screen and the positions that were randomized in the core domain.
Figure 21: Results from AATI library a) Crystal structure of Quad mutant (PDB 1UOL) with A159, A161, T253 and I255, the four residues that were randomized, shown as spheres. The spheres are colored green for carbon, red for oxygen and blue for nitrogen. The picture was rendered using PyMOL (DeLano Scientific). b) Sequences of apparent positives obtained from the AATI library.
Considering the various sequencing results, we concluded that the positions that were randomized to generate this library have a low inherent mutability, in addition to the
presence of errors in the library. Since p53 is only marginally stable under
physiological conditions, it is conceivable that additional mutations to the core further
destabilize the protein, leading to the low observed occurrence of stable,
functional variants in the screen. Alignment of the sequences of the p53 core domain in the
Pfam database provides further evidence that the residues at these positions are highly
conserved and that most of the mutations found lead to loss of function of the protein, making
them disease-causing mutations. To further investigate the mutability of these residues,
we generated two smaller sub-libraries. The results obtained from these and further
biophysical characterizations provide insight into the properties of the screen and the
significance of these residues.
3.3.2 AA and TI Sub-libraries
The low rate of occurrence of positives in the AATI library makes it difficult to analyze
the positives to understand the packing specificity at these positions. Since the rate of
occurrence of positives is low and is similar to the rate of occurrence of false positives, it
is difficult to deconvolute the various effects: packing stringency at these positions, errors
in the library, and errors in GFP transcription, all of which result in low cellular
fluorescence indicating a positive phenotype. To make the analysis of these possibilities
easier, we generated two smaller sub-libraries, each randomizing only two positions.
Each clone in the smaller libraries represents 1 in 400 possibilities, which is a much larger
fraction than 1 in 4000, the observed rate of occurrence of apparent positives in the
AATI library. The higher frequency of positives will allow us to analyze a larger number of variants and therefore lead to a better understanding of the importance of these
positions.
The residues A159 and A161 were randomized to all 20 amino acids using oligonucleotides that
encoded degenerate NNK codons at these positions. The library, generated using PCR with Quad as the template, was ligated into the pACBADp53-Quad vector between the BsrGI and BsaI sites. The observed library size was well beyond the theoretical library size of
400 clones (20² for randomization of two positions to all 20 amino acids). The library was
screened for activity by transforming it by electroporation into DH10B cells that already contained the
screening plasmid pGFPuv-BD1. The resulting colonies were
analyzed for actives based on their cellular fluorescence.
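Because NNK codons do not encode all amino acids equally often, raw counts of residues among the positives are biased toward amino acids with more NNK codons. The sketch below enumerates the 32 NNK codons from the standard genetic code and shows one plausible way to adjust observed counts for this bias. The function name `codon_adjusted_frequencies` and the normalization (dividing each count by its codon multiplicity and renormalizing) are assumptions made for illustration; the exact adjustment used for the figures is not specified here.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: N = any base at positions 1 and 2, K = G or T at position 3.
nnk_codons = ["".join(c) for c in product(BASES, BASES, "GT")]
nnk_counts = Counter(CODON_TABLE[c] for c in nnk_codons)


def codon_adjusted_frequencies(observed):
    """Divide observed counts by NNK codon multiplicity, then normalize.

    `observed` maps one-letter amino acid codes to raw counts.
    This normalization is an assumed scheme, not a documented procedure.
    """
    weighted = {aa: n / nnk_counts[aa] for aa, n in observed.items()}
    total = sum(weighted.values())
    return {aa: w / total for aa, w in weighted.items()}


# Hypothetical counts, for illustration only.
example = codon_adjusted_frequencies({"A": 10, "C": 5})
print(len(nnk_codons), dict(nnk_counts))
print(example)
```

In this scheme, an amino acid with two NNK codons (such as Ala) that is observed twice as often as one with a single codon (such as Cys) ends up with the same adjusted frequency.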
The results from the AA library are summarized in Figure 22. These positions show a
striking preference for alanine and cysteine. Position A161 shows high selectivity for
small residues. In fact, the only residues that are tolerated at this site are Ala, Cys, Ser,
Asn and Val. Ser and Asn are small polar residues, and Val is highly favorable in the core
of β-sheet proteins. However, there is an overwhelming preference for Ala and Cys at this
position. Position A159 has a wider distribution of residues than position
A161. A considerable number of larger residues such as Leu, Ile, Val, Met and Thr are
found at this position, although Ala is still highly preferred over the other residues.
Analysis of the pairwise distribution shows that when any of these larger residues occurs at position 159, position 161 is restricted to small residues. For example, when methionine
occurs at position 159, position 161 is always an alanine. This implies significant
coupling between these residues. Also, A161 packs against I195 in an opposing helix in
the crystal structure of Quad. This large residue may also restrict the residues that
can be accommodated at A161. The absence of charged residues, large hydrophobic
residues, glycine and proline further establishes the packing stringency of these positions.
Alanine, which is the WT residue, is one of the most frequent residues,
and the AA pairing is also observed at a high frequency. The high frequency of
occurrence of cysteine was intriguing, and we decided to characterize the consensus ‘CL’
variant in vitro.
Figure 22: Results from AA library a) Occurrence of positives in the AA library. Codon-adjusted frequency of occurrence for each of the positions in the AA library.
The residues T253 and I255 were also randomized to all 20 amino acids using oligonucleotides that
encoded degenerate NNK codons at these positions. The library was generated using PCR
with Quad as the template, with a cloning scheme similar to that of the AA library. The
observed library size was well beyond the theoretical library size of 400 clones (20² for randomization of two positions to all 20 amino acids). The library was transformed and screened for activity as described above for the AA library.
The observed occurrence of positives in the TI library was close to 13%, which corresponds to about
50 of the 400 possibilities if every clone is valid. We sequenced about 32 active and 64 inactive variants of p53 from this library. The results from the positives in the library are summarized in Figure 23. The occurrence of the various amino acids at these positions
indicates that all the amino acids except large hydrophobic residues like Phe, Trp and Tyr, charged residues, Pro and Gly are present in the sequences of the positives. The high flexibility of Gly and the restricted geometry of Pro make these residues non-ideal in the core of the protein. The absence of the large hydrophobic residues among the positives indicates that overpacking these positions leads to destabilization of the protein. The high occurrence of Val and Ile at position I255 indicates that this position is highly selective for residues with high β-sheet propensity. The conformational flexibility of Met may be the determining factor that allows it to occur at position I255 at a relatively high frequency. The distribution of residues at position T253 is broader, allowing for small and polar residues. Thr may make hydrogen bonding interactions in the core with the backbone or with other residues. The higher occurrence of polar residues like
Gln and Asn at this position indicates that such hydrogen bonding interactions may be important here. However, the higher preference for cysteine at this position indicates that size may be the major determining factor at the T253 site. An analysis of the pairwise distribution of the various amino acids confirms that the size and β-sheet propensity of the residues are the two major determinants of packing at these positions. The absence of large hydrophobic residues like tryptophan, phenylalanine and tyrosine (a large alcohol) suggests that these may lead to clashes in the hydrophobic packing and are therefore poor substitutes for the smaller amino acids Ile and Thr. The overall preference for Val and Ile at both these positions, compared with the lower preference for Leu, supports the preference for residues with high β-sheet propensity. We also see that charged residues such as Asp,
Glu, Lys and Arg are not represented in the positives selected from this library,
presumably because these are not tolerated within the hydrophobic core of the protein
unless the charge interactions are satisfied by a salt bridge. If size were the only
constraint, we would have expected pairs like TI, LT and TL. Their absence may have been due to
undersampling, since only 32 positives were sequenced. Considering that 13% of the
sequences are positives, 52 clones will be unique positive sequences, and in order to cover
all the positives we will need to sequence a 3-10-fold excess of that, approximately 150-520 clones.
On the other hand, the preference for polar residues at T253 and the bias
towards residues with β-branching at I255 indicate that the packing is determined by
more constraints than just size. In order to analyze this rigorously, a larger number of
colonies will need to be sequenced.
Figure 23: Positives from TI library a) Positives at each of the positions from the TI library. The frequency of occurrence is adjusted for the codon bias of the NNK library.
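The sampling argument above can be made concrete with a short calculation. The sketch below reproduces the estimate of about 52 unique positives and the 3-10-fold sequencing range. It also adds a simple estimate of how many of those unique positives 32 sequenced clones would be expected to cover. That last estimate rests on an assumption that is not made in the text, namely that every positive sequence is equally likely to be picked (NNK codon bias and differences in colony recovery are ignored); the function name is hypothetical.

```python
library_size = 20 ** 2      # two positions, 20 amino acids each
positive_fraction = 0.13    # observed rate of positives in the TI library
sequenced_positives = 32

unique_positives = round(library_size * positive_fraction)
depth_range = (3 * unique_positives, 10 * unique_positives)


def expected_coverage(n_sequenced, n_unique):
    """Expected fraction of unique sequences seen at least once.

    Assumes uniform, independent sampling of the unique sequences.
    """
    return 1 - (1 - 1 / n_unique) ** n_sequenced


coverage = expected_coverage(sequenced_positives, unique_positives)
print(unique_positives, depth_range)
print(f"expected coverage with {sequenced_positives} clones: {coverage:.0%}")
```

Under the uniform-sampling assumption, 32 clones would be expected to cover a little under half of the unique positives, which is in line with the concern about undersampling.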
In the negatives, when we look only at the identity of the residues at the positions that were
randomized, all residues seem to occur. We need to analyze the
pairwise distribution of the negatives in order to draw conclusions about why a given pair
was not well folded and therefore not functional. The frequency of pairwise
distribution of amino acids in the TI library is summarized in Figure 24. These results
make it clear that certain residues occur in the list of negatives because of the residue with
which they are paired. For example, isoleucine, which is the native residue, leads to highly fluorescent cells when paired
with aspartic acid. Similarly, threonine occurs in the
negatives when it is paired with glycine, proline or arginine. As in the case of the
positives, not all possible negatives have been explored. Out of the 66 negatives sent for
sequencing, 22 did not produce any sequence of the p53 core domain, and the remaining 44 were analyzed.
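A pairwise tally of this kind is straightforward to set up. The sketch below counts residue pairs and records, for each residue at position 253, the partners it was seen with at position 255. The input list is hypothetical and serves only to illustrate the bookkeeping: it borrows the pairings named above, and the assignment of each residue to position 253 or 255 is an assumption, not the sequencing data.

```python
from collections import Counter

# Hypothetical (residue at 253, residue at 255) pairs, for illustration only.
negatives = [("D", "I"), ("T", "G"), ("T", "P"), ("T", "R"), ("D", "I")]

pair_counts = Counter(negatives)
counts_253 = Counter(r253 for r253, _ in negatives)
counts_255 = Counter(r255 for _, r255 in negatives)

# For each residue at position 253, collect the partners seen at position 255.
partners = {}
for r253, r255 in negatives:
    partners.setdefault(r253, set()).add(r255)

print(pair_counts.most_common())
print(partners)
```

Looking at `counts_253` or `counts_255` alone would suggest that native residues such as Thr and Ile are not tolerated, whereas `partners` shows that they appear among the negatives only in combination with particular residues.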
These results demonstrate a clear advantage of screening systems over
selections.26 Screens allow the negatives to be analyzed, and we are able to
conclude that the screen allows only functional variants to exhibit the right phenotype. Positions T253 and I255 tolerate various mutations to small hydrophobic residues, residues with β-branching and small polar residues. The residues
I255 and A161 are more restrictive to mutations. At position A161 the sequences are mainly governed by the size of the residues, and position I255 is selective for residues with
β-branching. The results obtained from the AATI library do indicate that randomizing those four residues leads to destabilization and loss of function in a large population of the resulting mutants.
Figure 24: Negatives from TI library a) Pairwise distribution of negatives from the TI library b) p53 Quad mutant (PDB 1UOL) with I255 and T253 highlighted as sticks enveloped by transparent spheres. The image was rendered using PyMOL.
The sequences of the positives from the TI and AA libraries show that two of the positions are more restrictive than the others, and the results from the AA library also show that these positions contribute to the observed low number of positives in the AATI library.
Previous studies show that adjacent alanine residues in the core of a β-sheet structure
contribute energetically to the structure, although alanine is not considered to have high β-sheet propensity.172 Disrupting alanine-alanine pairing in the same chain has been
shown to have energetic consequences when it occurs in the core of a β-sheet protein.
3.4 In vitro characterization of library variants
The ultimate proof of the validity of the screen can come only from the in vitro
characterization of the different positives resulting from the screen. We have
characterized a few of the interesting variants using two standard methods for biophysical
characterization: urea-mediated chemical denaturation and thermal denaturation. Nine
variants that were positives by the screen, eight from the TI library and one from the AA library,
were selected for characterization based on their sequences (Figure 25). Variants with
polar amino acids packed against hydrophobic amino acids, such as N253 I255 and
S253 V255, and variants with all-hydrophobic pairs, such as V253 V255 and L253 V255, were
chosen. These variants were amplified from the pACBADp53 vector using PCR and were
ligated into the pET11a vector between the NdeI and BamHI sites. The variants can be
overexpressed under the control of a T7 promoter from the pET11a vector when transformed
into cells that have the T7 polymerase gene lysogenized into them. We have used C41
(DE3), a mutant strain of BL21 (DE3), for the expression of all the variants. This strain
has been reported to allow the expression of proteins that are toxic to the cells and has
also been used previously for the expression of p53 variants.50; 71
Figure 25: Actives from T253 I255 library a) The sequences at positions T253 and I255 for the actives b) Screening results for the various actives chosen from the TI library (plate sectors labeled Quad, R175H and variants 1 to 8).
The L159 C161 variant from the AA library and the L253 V255 variant from the TI library failed to express, and initial attempts at purification did not yield these proteins at the concentrations required for characterization. This may indicate that these are destabilized variants. The remaining seven variants expressed well and could be purified.
P53 was previously purified via cation exchange of the soluble fraction of the cell lysate.78 We have optimized the purification using a Ni-NTA agarose column for the
6×His-tagged version of this protein; the protein was eluted from the column by cleavage of the 6×His tag with TEV (Tobacco Etch Virus) protease while the protein was
still bound to the Ni-NTA agarose column. On-column cleavage leads to efficient
separation between the protein molecules and minimizes exposure to imidazole during
the cleavage and elution steps. This also minimizes the chance of aggregation and
promotes efficient cleavage of the tag. The characterizations were optimized for the Quad
mutant and four of the known hotspot mutants. The urea melt data for these proteins are
shown, and the data agree well with the literature-reported D50 values (the concentration of
urea at which 50% of the protein is unfolded) for these proteins.84
Since p53 is a DNA binding protein and the screen is based on the sequence-specific
DNA binding ability of this protein, we intended to characterize the binding of
a few positives obtained from the libraries to the DNA sequence obtained from the
binding domain library and compare it with their binding to the literature-reported
consensus sequence using fluorescence anisotropy. P53-DNA binding was studied using anisotropy experiments with 5’ fluorescein (FL)-labeled GADD45 DNA.
FL-GADD45 DNA (20 nM), which has been reported to bind Quad p53 with a dissociation constant KD of
100 nM,71; 79 was used in all the experiments. The details of these characterizations are
discussed in the following sections.
3.4.1 Stability Measurements using Urea Denaturation
Urea denaturation was measured as a function of the change in intrinsic fluorescence of tryptophan at increasing concentrations of urea, as detailed by Bullock et al.78 In the case of the p53 core domain, the intrinsic fluorescence of the single tryptophan increases when the
protein is denatured. Therefore, the denaturation of the protein can be monitored by the increase in fluorescence at 356 nm. The fluorescence maximum for tyrosine at 310 nm decreases as the protein unfolds. In addition, it has been documented that an aggregated species of the protein gives a maximum at 340 nm. The natively folded wt protein is known to unfold via a two-state mechanism.86 Because we used a purification scheme different from that reported in the literature, we wanted to ensure that we were purifying the protein in the non-aggregated form.
We characterized 8 different variants from the TI library and one from the AA library, choosing the variants as previously described. The LC variant from the AA library failed to express well and therefore may have been a destabilized variant. The proteins, buffered in sodium phosphate pH 7.2, were incubated with various concentrations of urea
(0 to 7 M) for at least 8 h. A protein concentration of 2 µM was maintained in all the samples. The samples were excited at 280 nm, and the emission spectra between 300 nm and 400 nm were collected. Monitoring the change in fluorescence at 356 nm with the concentration of urea allowed us to follow the chemical denaturation of these variants.
The Quad mutant is known to unfold via a two-state mechanism, exhibiting an isofluorescence point at ~320 nm. The full scans for the denaturation curves show that
most of the variants are well folded and do not have a third species present. However, some of
the variants, like VV and QL, do have some amount of aggregated species present and
do not exhibit the isofluorescence point that is characteristic of a two-state transition. The presence
of the third species may be a function of the reaction conditions. We optimized the
purifications for Quad and sought to decipher the properties of the mutants under those
conditions. The results obtained from the different variants suggest that some of them
do not unfold via a two-state mechanism. An uncharacterized third species, evident from the scans of the fluorescence melts, seems to exist for some of the variants.
The various scans are shown in Figure 26.
Figure 26: Wavelength scans for the different variants
2 µM of each of the proteins, buffered in sodium phosphate pH 7.2, 5 mM DTT, was incubated in various concentrations of urea for at least 8 h. Data were collected by following the emission spectra between 300 and 400 nm after excitation at 280 nm. The curves in each graph show the spectra for proteins incubated with 0 to 7 M urea. An isofluorescence point was observed, within the error of measurement, for all variants except V253 M255, V253 V255 and M253 Q255.
Urea denaturation melts showed all the variants that were characterized to be native-like and similar to the Quad variant in stability. The chemical denaturation melts were fit to the Clarke and Fersht equation, and the resultant D50 values are presented in Figure 27.
These were very close to each other and to that of the Quad variant. This indicates that only highly stable variants can appear as positives from the screen. This result suggests that the screen is highly stringent, which is one of the likely reasons for the observed low frequency of positives from the AATI library.
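For readers who want to reproduce this type of fit, the two-state model with linear native and denatured baselines, in the form generally attributed to Clarke and Fersht, can be written as a small Python function. This is a sketch under stated assumptions rather than the fitting code used in this work: the function and parameter names are introduced here, the temperature default of 298.15 K is assumed, and the parameter values in the usage lines are hypothetical. In practice the function would be passed to a nonlinear least-squares routine together with the fluorescence at 356 nm.

```python
import math

R = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def two_state_signal(urea, d50, m, a_n, b_n, a_d, b_d, temp_k=298.15):
    """Observed signal for a two-state unfolding transition.

    urea     : denaturant concentration (M)
    d50      : urea concentration at which half the protein is unfolded (M)
    m        : dependence of the unfolding free energy on urea (kcal mol^-1 M^-1)
    a_n, b_n : intercept and slope of the native baseline
    a_d, b_d : intercept and slope of the denatured baseline
    temp_k   : temperature in kelvin (assumed value by default)
    """
    k = math.exp(m * (urea - d50) / (R * temp_k))
    native = a_n + b_n * urea
    denatured = a_d + b_d * urea
    return (native + denatured * k) / (1 + k)


def fraction_unfolded(urea, d50, m, temp_k=298.15):
    """Fraction of unfolded protein at a given urea concentration."""
    k = math.exp(m * (urea - d50) / (R * temp_k))
    return k / (1 + k)


# Hypothetical parameters, for illustration only.
midpoint_signal = two_state_signal(3.0, d50=3.0, m=3.0,
                                   a_n=100.0, b_n=0.0, a_d=300.0, b_d=0.0)
print(midpoint_signal, fraction_unfolded(3.0, d50=3.0, m=3.0))
```

At the midpoint the signal is the average of the two baselines and the fraction unfolded is one half, which is the definition of D50 used in the text.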
3.4.2 Thermal Melts Monitored Using Circular Dichroism
To confirm the results from chemical denaturation experiments, we measured the thermal
denaturation trend of the various variants. It is known that p53 core domain unfolds
irreversibly with temperature. Therefore it is not possible to arrive at the thermodynamic
melting temperatures with these experiments, but we can observe the relative trends for
stability among the variants. Ang et al. have shown that the relative trends of stability
obtained from the thermal denaturation of p53 corresponded well with the trend obtained
from the chemical denaturation experiments.142 The wavelength scans of the variants
showed a positive peak at 234 nm and a minimum at 218 nm. Thermal melts were
monitored by the change in signal at 218 nm. Our thermal melt data showed that all the
variants had similar apparent melting temperatures and that all the variants aggregate upon
denaturation. The most and the least stable variants differed by only 5 °C. The
results from the thermal denaturation curves are given in Figure 27, and a comparison of
the D50 and T50 values is summarized in Figure 29. These additional experiments further
confirm that all the variants selected are close in stability.
3.4.3 DNA Binding Using Fluorescence Anisotropy
A final test of the validity of the screen is to measure and understand the DNA binding
function of the positives from the screen in vitro. Balagurumoorthy et al. have shown
that the p53 core domain forms a 4:1 complex with its consensus DNA in a cooperative
manner.173 It has been shown that the core domains do not associate with each other to
form tetramers except in the presence of DNA. Although the affinity of the p53 core
domain for DNA is about three orders of magnitude weaker than in the presence of the tetramerization domain, it has been shown that the core domain interaction stabilizes
the p53-DNA complex and that the core domain binds DNA with nanomolar KD values.79 We used
fluorescence anisotropy to measure the affinity of the various mutants from our library for
the consensus DNA binding site for p53 (GADD45: GTACAGAACATGTCTAAGCATGCTGGGGAC) and for the DNA binding site used in our screen, which was arrived
at via library methods.
Figure 27: Characterization of Actives a) Urea-mediated denaturation melts for the actives from the library. The change in fluorescence at 356 nm is monitored as a function of the concentration of urea. b) Thermal denaturation melts for the actives from the library. The change in CD signal at 218 nm is plotted as a function of temperature.
Figure 28: CD wavelength scans for TI library variants. CD scans of the various mutants in 50 mM sodium phosphate, 300 mM NaCl, pH 7.2. The concentrations of the proteins were kept constant at 15 µM, measured using UV absorbance.
Figure 29: Data from TI variants characterizations
The T1/2 and D50 values were obtained by fitting the data from the CD thermal melts and urea denaturation melts, respectively, to the Clarke and Fersht equation.
Fluorescence anisotropy specifically measures the tumbling rate of fluorescent molecules
in solution. The linearly polarized excitation light is depolarized to different extents by the sample, depending on its tumbling rate which varies when it is in the free form versus when it is bound to bulky substrates such as proteins. The retention of polarization of a bound sample in comparison to its free form is what we measure in anisotropy experiments. The dissociation constant can be measured as a function of the anisotropy.
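One simple way to relate anisotropy to a dissociation constant is a saturation binding curve with an optional Hill coefficient to allow for cooperativity. The sketch below is an illustration under stated assumptions and is not the fitting model used in this work: the function name and parameters are introduced here, the anisotropy values are hypothetical, and the model assumes that the labeled DNA concentration is low enough that free protein is approximately equal to total protein. With 20 nM DNA and a KD near 100 nM, that approximation is only rough.

```python
def anisotropy_hill(protein_nM, kd_nM, r_free, r_bound, hill_n=1.0):
    """Anisotropy of labeled DNA as a function of protein concentration.

    protein_nM : total protein concentration (nM), taken as free protein
    kd_nM      : apparent dissociation constant (nM)
    r_free     : anisotropy of free labeled DNA
    r_bound    : anisotropy of fully bound labeled DNA
    hill_n     : Hill coefficient (1.0 means non-cooperative binding)
    """
    if protein_nM <= 0:
        return r_free
    bound = protein_nM ** hill_n / (kd_nM ** hill_n + protein_nM ** hill_n)
    return r_free + (r_bound - r_free) * bound


# Hypothetical anisotropy values, for illustration only.
r_at_kd = anisotropy_hill(100.0, kd_nM=100.0, r_free=0.05, r_bound=0.15)
print(r_at_kd)
```

At a protein concentration equal to the apparent KD, the anisotropy lies halfway between the free and bound values regardless of the Hill coefficient, which is what makes the midpoint of the titration a convenient read-out.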
Fluorescence anisotropy measures the change in polarization of plane-polarized light brought about by the sample. When the rotational motion of a molecule is faster than the fluorescence lifetime of the fluorophore attached to it, the plane-polarized light is depolarized by the sample. On the other hand, for a molecule with restricted rotational freedom, such as one bound to another large species, the rotational motion is slower than the fluorescence lifetime of the fluorophore, and therefore the emitted light is still polarized. The extent of polarization of the emitted light is indicative of the reduced rotational motion of the probe bound to the sample. This is measured by adding another polarizer before the detector, which can toggle between vertical and horizontal orientations. The measured intensities of the parallel and
perpendicular polarized light allow the calculation of the anisotropy of the samples using the equation,